New uses of underspecified variables in the ERG 2018?

I notice that the MRS outputs of the ERG 2018 contain i variables more often than those of the 1214 version. I’m used to seeing them for dropped x arguments, but here I’m asking about their use as intrinsic variables (ARG0s, abbreviated “IV” below). The parses I show come from ACE using the ERG grammar images Woodley distributes, and the examples are mostly taken from the mrs test suite.

Number constructions

1214 already used i for some IVs of number constructions, as below for “Twenty three dogs go” (i-id 631):

[ TOP: h0
  INDEX: e2
  RELS: < [ udef_q<0:12> LBL: h4 ARG0: x3 RSTR: h5 BODY: h6 ]
          [ card<0:6> LBL: h7 ARG0: e9 ARG1: x3 CARG: "20" ]
          [ plus<0:6> LBL: h10 ARG0: i11 ARG1: x3 ARG2: h7 ARG3: h12 ]
          [ card<7:12> LBL: h12 ARG0: i14 ARG1: x3 CARG: "3" ]
          [ _dog_n_1<13:17> LBL: h10 ARG0: x3 ]
          [ _go_v_1<18:21> LBL: h1 ARG0: e2 ARG1: x3 ] >
  HCONS: < h0 qeq h1 h5 qeq h10 > ]

But 2018 changes the distribution: both card EPs now have i variables as their IVs (and as their ARG1s), and the IV of plus is now an e:

[ TOP: h0
  INDEX: e2
  RELS: < [ udef_q<0:12> LBL: h4 ARG0: x3 RSTR: h5 BODY: h6 ]
          [ card<0:6> LBL: h7 ARG0: i9 ARG1: i10 CARG: "20" ]
          [ plus<0:6> LBL: h11 ARG0: e12 ARG1: x3 ARG2: h7 ARG3: h13 ]
          [ card<7:12> LBL: h13 ARG0: i15 ARG1: i16 CARG: "3" ]
          [ _dog_n_1<13:17> LBL: h11 ARG0: x3 ]
          [ _go_v_1<18:21> LBL: h1 ARG0: e2 ARG1: x3 ] >
  HCONS: < h0 qeq h1 h5 qeq h11 > ]

I think this is an improvement, but I’m still not certain why the cards have i IVs at all. Normally they are e in this context, as in the MRS for “Two dogs go”, although they can be x, as in “Two is a number”. I suppose if they were x-like you would not want to quantify them all, but is there more to the story?

Scopal modifiers

In 1214, scopal modifiers (by which I mean things like “probably”, or is the term “scopal operator”?) used e variables, as below for “The dog probably barked” (i-id 451):

[ TOP: h0
  INDEX: e2
  RELS: < [ _the_q<0:3> LBL: h4 ARG0: x3 RSTR: h5 BODY: h6 ]
          [ _dog_n_1<4:7> LBL: h7 ARG0: x3 ]
          [ _probable_a_1<8:16> LBL: h1 ARG0: e8 ARG1: h9 ]
          [ _bark_v_1<17:24> LBL: h10 ARG0: e2 ARG1: x3 ] >
  HCONS: < h0 qeq h1 h5 qeq h7 h9 qeq h10 > ]

But in 2018 they use i variables:

[ TOP: h0
  INDEX: e2
  RELS: < [ _the_q<0:3> LBL: h4 ARG0: x3 RSTR: h5 BODY: h6 ]
          [ _dog_n_1<4:7> LBL: h7 ARG0: x3 ]
          [ _probable_a_1<8:16> LBL: h1 ARG0: i8 ARG1: h9 ]
          [ _bark_v_1<17:24> LBL: h10 ARG0: e2 ARG1: x3 ] >
  HCONS: < h0 qeq h1 h5 qeq h7 h9 qeq h10 > ]

If this is an intentional change and it’s linguistically sound, then maybe we should describe it as a new criterion for well-formed MRSs. Doing so would allow me to improve a couple of semantic operations I do in PyDelphin (which I won’t get into here).
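A minimal sketch of how such a criterion might be probed, assuming EPs are encoded as plain (predicate, arguments) pairs copied from the two parses of “The dog probably barked” above. The encoding and the names `var_type` and `scopal_iv_types` are made up for illustration and are not PyDelphin’s API; treating “has an RSTR role” as the test for quantifiers is also an assumption.

```python
# EPs as (predicate, {role: variable}); data copied from the two MRSs above.
DOG_BARKED_1214 = [
    ("_the_q", {"ARG0": "x3", "RSTR": "h5", "BODY": "h6"}),
    ("_dog_n_1", {"ARG0": "x3"}),
    ("_probable_a_1", {"ARG0": "e8", "ARG1": "h9"}),
    ("_bark_v_1", {"ARG0": "e2", "ARG1": "x3"}),
]
DOG_BARKED_2018 = [
    ("_the_q", {"ARG0": "x3", "RSTR": "h5", "BODY": "h6"}),
    ("_dog_n_1", {"ARG0": "x3"}),
    ("_probable_a_1", {"ARG0": "i8", "ARG1": "h9"}),
    ("_bark_v_1", {"ARG0": "e2", "ARG1": "x3"}),
]


def var_type(var):
    """Return the type prefix of a variable, e.g. 'e' for 'e8'."""
    return var.rstrip("0123456789")


def scopal_iv_types(eps):
    """Map each non-quantifier EP with a handle argument to its ARG0 type."""
    found = {}
    for pred, args in eps:
        if "RSTR" in args:  # assumption: only quantifiers carry RSTR
            continue
        if any(var_type(v) == "h" for v in args.values()):
            found[pred] = var_type(args["ARG0"])
    return found


print(scopal_iv_types(DOG_BARKED_1214))  # {'_probable_a_1': 'e'}
print(scopal_iv_types(DOG_BARKED_2018))  # {'_probable_a_1': 'i'}
```

Run over a whole profile, a check like this would show whether “scopal EP implies i-typed IV” holds without exception in a given release.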

I note that the ERG already did this for neg in 1214, as for “Don’t bark!” (i-id 1061):

[ TOP: h0
  INDEX: e2
  RELS: < [ pronoun_q<0:5> LBL: h4 ARG0: x5 RSTR: h6 BODY: h7 ]
          [ pron<0:5> LBL: h8 ARG0: x5 ]
          [ neg<0:5> LBL: h1 ARG0: i9 ARG1: h10 ]
          [ _bark_v_1<6:11> LBL: h11 ARG0: e2 ARG1: x5 ] >
  HCONS: < h0 qeq h1 h6 qeq h8 h10 qeq h11 > ]

and things like addressee, as in “It rained, Abrams” from the esd test suite (i-id 1151):

[ TOP: h0
  INDEX: e2
  RELS: < [ _rain_v_1<3:10> LBL: h1 ARG0: e2 ]
          [ addressee<11:18> LBL: h1 ARG0: i4 ARG1: x5 ARG2: e2 ]
          [ proper_q<11:18> LBL: h6 ARG0: x5 RSTR: h7 BODY: h8 ]
          [ named<11:18> LBL: h9 ARG0: x5 CARG: "Abrams" ] >
  HCONS: < h0 qeq h1 h7 qeq h9 > ]


There are plenty of other examples in Redwoods but I’ll stop here. I just want to know if there was some new principle applied in the ERG’s MRS outputs to explain these changes, since I didn’t notice anything about it in the release announcement. Or maybe we can categorize when we should expect to see i variables: (1) dropped x arguments; (2) underspecified x/e; (3) scopal modifiers; (4) ???. I also note that there are no p-variable IVs in Redwoods, although they appear in other argument positions for dropped scopal arguments. And if dropped e arguments are possible (all my examples insert ellipses or unknowns), would they be i or p?
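As a first step toward such a categorization, here is a small sketch that sorts the i variables of one MRS by where they occur: as an IV (ARG0) or in some other argument slot. The data is the 2018 parse of “Twenty three dogs go” from above with CARG constants left out; the encoding and the function name `i_variable_positions` are assumptions for illustration only. Position alone does not tell a dropped argument from an underspecified one, so this only narrows the question.

```python
# EPs as (predicate, {role: variable}); CARG constants are omitted because
# they are not variables.
TWENTY_THREE_DOGS_2018 = [
    ("udef_q", {"ARG0": "x3", "RSTR": "h5", "BODY": "h6"}),
    ("card", {"ARG0": "i9", "ARG1": "i10"}),
    ("plus", {"ARG0": "e12", "ARG1": "x3", "ARG2": "h7", "ARG3": "h13"}),
    ("card", {"ARG0": "i15", "ARG1": "i16"}),
    ("_dog_n_1", {"ARG0": "x3"}),
    ("_go_v_1", {"ARG0": "e2", "ARG1": "x3"}),
]


def i_variable_positions(eps):
    """Group i-typed variables by whether they fill ARG0 or another role."""
    positions = {"intrinsic": [], "argument": []}
    for pred, args in eps:
        for role, var in args.items():
            if var.rstrip("0123456789") == "i":
                key = "intrinsic" if role == "ARG0" else "argument"
                positions[key].append((pred, role, var))
    return positions


for kind, hits in i_variable_positions(TWENTY_THREE_DOGS_2018).items():
    print(kind, hits)
```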

And finally, to clarify, I want to distinguish what we can consider core principles of DELPH-IN MRSs (e.g., the intrinsic variable property, which we’ve adopted) from what are ERG MRS conventions.



It’s been about a month but I’m still hoping to get some feedback about this so we can have a more directed conversation about it in Cambridge.

Also a small correction:

And if dropped e arguments are possible (all my examples insert ellipses or unknowns), would they be i or p?

That should have been i or u (p is not a supertype of e). But maybe we’d actually want a variable underspecified over e and h?
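The correction can be checked against the variable type hierarchy. The sketch below assumes the usual arrangement (u at the top; i over e and x; p over x and h); the `PARENTS` table and the helper names are written out here for illustration and are not taken from any grammar file.

```python
# Assumed hierarchy: each type maps to its immediate supertypes.
PARENTS = {
    "u": [],
    "i": ["u"],
    "p": ["u"],
    "e": ["i"],
    "x": ["i", "p"],
    "h": ["p"],
}


def supertypes(vtype):
    """Return all proper supertypes of a variable type."""
    seen = set()
    stack = [vtype]
    while stack:
        current = stack.pop()
        for parent in PARENTS[current]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def subsumes(general, specific):
    """True if `general` is `specific` or one of its supertypes."""
    return general == specific or general in supertypes(specific)


print(subsumes("i", "e"))  # True
print(subsumes("p", "e"))  # False
print(supertypes("e") & supertypes("h"))  # {'u'}
```

Under these assumptions the only existing type that covers both e and h is u, so a variable underspecified over exactly e and h would need a new type.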

Some of the responses I got at Cambridge are:

  • The examples for ERG 2018 above are consistent; in the cardinal number and scopal modifier cases, the i ARG0s are meant to be underspecified because their status as eventualities or instances is either ambiguous/undecidable or Dan is not willing/ready to make such a decision.
  • A p argument is meant to be a dropped x that, if not dropped, cannot be an e, not one that can be an x or an h.
  • Supposedly h arguments cannot be dropped. This surprised me. So for “Kim believed”, the dropped argument must have been an instance like “Sandy” or “that fact”, not a proposition like “it would rain today”.
  • It is OK to alter our condition about every x being quantified to “Every x that is the ARG0 of some EP must be quantified”. So x variables may be used for dropped arguments (not ARG0s). This seems to be the way forward to avoid the conflation with underspecified i arguments.

See the discussion minutes.

I think this last take-away is not 100% accurate. Losing the quantifier of ‘x’ is (arguably) safe in cases where it can be losslessly inferred to be a bland existential. The conditions are then something like:

  • The scope the quantifier takes is unconstrained, i.e. no constraints on its LTOP and BODY beyond those implied by x having to be bound already before someone can use it. We currently assume this holds of all quantifiers.
  • The quantifier’s predicate name is predictable, like “exist_q” or “udef_q” (pick one and stick with it).
  • The RSTR of the quantifier is empty, i.e. the vacuous proposition “true.”

Typically this last condition would correspond to x not being the ARG0 of anything, as you said. In cases where people like David with Nuuchahnulth and Emily with Wambaya are pushing the envelope, we might conceivably encounter a situation where a RSTR consists only of something like [red_a ARG0: e ARG1: x] or [eat_v ARG0: e ARG1: y ARG2: x]. These would violate the bound variable assumption, making them impossible to translate to DMRS, so they may or may not be something that you care about, but I still think such possibilities should be considered in this discussion: the quantifier could NOT be dropped in these situations.

Thanks, Woodley. That’s good information but I think we may be talking about separate points.

First, when I mentioned not quantifying every ‘x’, I meant specifically for the case of using a variable of type ‘x’ (instead of ‘i’ or ‘p’) to represent dropped arguments. Previously, every ‘x’ variable needed a quantifier, but if we start using ‘x’ for dropped arguments, we don’t want to quantify those. So the condition was altered so that every ‘x’ that is an ARG0 needs a quantifier. For example, “Cats were chased” (with ERG 2018):

< h0, e2,
  { h4:udef_q<0:3>(ARG0 x3, RSTR h5, BODY h6),
    h7:_cat_n_1<4:7>(ARG0 x3),
    h1:_chase_v_1<12:18>(ARG0 e2, ARG1 i8, ARG2 x3) },
  { h0 qeq h1,
    h5 qeq h7 },
  { e2 topic x3 } >

The proposal is that the i8 on _chase_v_1 above would be x8 but it wouldn’t need a quantifier simply by virtue of being an ‘x’ variable. Also, as is the case currently, there would be no EP introduced where it is an ARG0. It does not include dropping quantifiers for ‘x’ variables that are the ARG0 of some overt EP, like udef_q for the _cat_n_1 EP above, as that is a separate proposal.
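A minimal sketch of the difference between the old and the altered condition, assuming the proposed variant of the MRS above in which i8 is written x8. The encoding and the name `unquantified_x` are illustrative assumptions, as is identifying quantifiers by the presence of an RSTR role.

```python
# "Cats were chased" with the proposed x8 in place of i8.
CATS_WERE_CHASED = [
    ("udef_q", {"ARG0": "x3", "RSTR": "h5", "BODY": "h6"}),
    ("_cat_n_1", {"ARG0": "x3"}),
    ("_chase_v_1", {"ARG0": "e2", "ARG1": "x8", "ARG2": "x3"}),
]


def is_quantifier(args):
    # Assumption: only quantifiers carry an RSTR role.
    return "RSTR" in args


def unquantified_x(eps, arg0_only):
    """List x variables lacking a quantifier.

    arg0_only=False checks every x in the MRS (the old condition);
    arg0_only=True checks only x variables that are the ARG0 of some
    non-quantifier EP (the altered condition).
    """
    bound = {args["ARG0"] for _, args in eps if is_quantifier(args)}
    if arg0_only:
        candidates = {args["ARG0"] for _, args in eps
                      if not is_quantifier(args)}
    else:
        candidates = {var for _, args in eps for var in args.values()}
    return sorted(v for v in candidates
                  if v.startswith("x") and v not in bound)


print(unquantified_x(CATS_WERE_CHASED, arg0_only=False))  # ['x8']
print(unquantified_x(CATS_WERE_CHASED, arg0_only=True))   # []
```

The old condition flags x8 as a violation, while the altered one accepts the MRS because x8 is never an ARG0.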

I think what you’re talking about is that separate proposal, i.e., not outputting what I’ve called “default” quantifiers (udef_q, proper_q, pronoun_q, etc. in the ERG). Your conditions look good, although I need some clarification about the last one (below). Paraphrasing to make sure I understood correctly, you can only drop them if they can be unambiguously reinserted, which requires that the scope tree is not partially or fully resolved and the quantifier’s predicate is predictable from its target (the predicate of the EP whose ARG0 is its bound variable). Yes?

Regarding your last point, you don’t mean BODY, do you? And I don’t fully follow your second paragraph. Are you saying that the restriction can select a scope containing referents and their modifiers (as in a scope-underspecified form), but Emily and David are putting partially scope-resolved structures in the MRS?

You’re right that I care about DMRS (and EDS) conversion, although I say that PyDelphin supports conversion for MRSs that follow some conditions, so it’s ok to not convert as long as I can accurately document those conditions. Also PyDelphin v1.0.0 makes steps towards scope resolution, and for this I’d like to properly account for all valid variations.

We’re in agreement that quantifiers of dropped arguments in vanilla cases should be possible to leave out (assuming no person_rel or pron_rel or thing_rel or the like is put in).

No, that’s not what I was talking about. My point instead was about situations that violate the intrinsic variable property. In some languages, the phrase that embodies an “x” argument of a verb can be headed by something other than a noun – in Ancient Greek you can say “the red” meaning “the red one” or “the eating” meaning “the one who is eating,” and my understanding is Nuuchahnulth and Wambaya both have comparable phenomena. One way of writing the semantics for situations like those is: exist_q(x2, RSTR: red(e1, x2)). I realize you are not actively proposing dropping quantifiers from cases like this, where the “NP” is overt, but the criterion you made (quantifiers must be explicit if BV is the ARG0 of some EP) would seem to license dropping the quantifier in these cases, which would be wrong. This problem/difference goes away if you only concern yourself with MRSes that obey the intrinsic variable property, but I think entangling the solution to dropping quantifiers of dropped arguments with the (quite) mildly controversial intrinsic variable property is unnecessary.

One more potential wrinkle is that argument dropping does not always mean we know nothing about the argument. The data that is most frequently available about the dropped argument generally shows up as variable properties on the ‘x’. Variable properties don’t get much attention, but formally I think they might be understood as one-place predications, e.g. “singular(x)” and “past-tense(e)”. In that point of view, their place in the scope tree would always be equal to the variable’s quantifier’s RSTR, whether said quantifier is implicit or explicit, so no problem arises. The potential wrinkle would be if some language happens to mark something about a dropped argument that doesn’t fit neatly into a variable property, and instead is best expressed as an EP. In that case, the EP’s label would have to be explicitly equated with the quantifier’s RSTR, thereby rendering an implicit quantifier problematic (how would you know whether the EP scopes under RSTR or BODY?). Whether a compelling situation of that sort arises or not in language is not something I want to make a claim about though.

Ok, sorry for misunderstanding and thanks for the clarification.

So this is like “The old are experienced.”, except that Dan handles (some of) these with special predications like _old_n_1 (although it must be plural… “The old is familiar” doesn’t parse). So then, as I understand it, in Nuuchahnulth, Wambaya, and Ancient Greek these are more productive and the grammar developers decided not to insert some generic_entity for the unspecified thing that has the stated property, leading to MRSs somewhat like this:

[ "The old are experienced."
  TOP: h0
  INDEX: e2 [ e SF: prop TENSE: pres MOOD: indicative PROG: - PERF: - ]
  RELS: < [ _the_q<0:3> LBL: h4 ARG0: x3 [ x PERS: 3 NUM: pl IND: + ] RSTR: h5 BODY: h6 ]
          [ _old_a_1<4:7> LBL: h7 ARG0: e8 [ e SF: prop TENSE: untensed MOOD: indicative PROG: bool PERF: - ] ARG1: x3 ]
          [ _experienced_a_in<8:20> LBL: h1 ARG0: e2 ARG1: x3 ARG2: i10 ] >
  HCONS: < h0 qeq h1 h5 qeq h7 > ]

…Where the EP that would have ARG0: x3 (i3 if using the prevailing convention) is not expressed and the RSTR of _the_q is the _old_a_1 EP rather than some instance EP. Assuming I’ve got it right this time, these are indeed interesting examples, and they are both appealing and worrying for the reasons you highlighted earlier.
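To make the worry concrete, the sketch below applies two candidate tests to _the_q in the MRS above: the ARG0-based criterion (a quantifier may be left out when its bound variable is not the ARG0 of any non-quantifier EP) and the empty-RSTR condition. The encoding, the `HCONS` mapping from the high to the low handle of each qeq, and both function names are assumptions for illustration.

```python
# EPs as (predicate, label, {role: variable}) for "The old are experienced."
THE_OLD = [
    ("_the_q", "h4", {"ARG0": "x3", "RSTR": "h5", "BODY": "h6"}),
    ("_old_a_1", "h7", {"ARG0": "e8", "ARG1": "x3"}),
    ("_experienced_a_in", "h1", {"ARG0": "e2", "ARG1": "x3", "ARG2": "i10"}),
]
HCONS = {"h0": "h1", "h5": "h7"}  # each entry reads: key qeq value


def droppable_by_arg0(q_args, eps):
    """ARG0-based test: no non-quantifier EP has the bound variable as ARG0."""
    bound_var = q_args["ARG0"]
    return not any(args["ARG0"] == bound_var
                   for _, _, args in eps if "RSTR" not in args)


def rstr_is_empty(q_args, eps, hcons):
    """Empty-RSTR test: the RSTR handle reaches no EP label."""
    labels = {label for _, label, _ in eps}
    rstr = q_args["RSTR"]
    return hcons.get(rstr, rstr) not in labels


the_q_args = THE_OLD[0][2]
print(droppable_by_arg0(the_q_args, THE_OLD))     # True
print(rstr_is_empty(the_q_args, THE_OLD, HCONS))  # False
```

The two tests disagree here: the ARG0-based one would license leaving the quantifier out, while the RSTR-based one would not, since h5 qeq h7 points at _old_a_1.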

Let me turn for a minute to the challenges these bring to the intrinsic variable property (“IVP” below) and how my working definition differs from Ann’s. From Copestake 2009:

[…] for every variable, there is a unique EP which has that variable as its ARG0. I will assume for this paper that all EPs, apart from quantifier EPs, have such an ARG0.

Since “variable” was unclear here (apparently it doesn’t include those for dropped arguments), I adjusted the definition in Goodman 2018:

[…] every EP has an intrinsic variable and each intrinsic variable is unique to a single EP (excepting quantifiers, which do not have intrinsic variables).

By these definitions, the MRSs under discussion violate Ann’s IVP but not my version. Let’s break down the IVP into individual claims:

  1. Every (presumably ‘e’ or ‘x’) variable is the ARG0 of some non-quantifier EP
  2. Every non-quantifier EP has a (presumably ‘e’ or ‘x’) variable as its ARG0
  3. The ARG0 of each EP is not the ARG0 of any other EP

Ann uses claims (1) and (3) (although (2) is “assumed”), whereas I only use (2) and (3).
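The three claims can be tested separately. The sketch below does so for the “The old are experienced.” MRS, reading “variable” as ‘e’ or ‘x’ only and restricting claim (3) to non-quantifier EPs; both readings, the encoding, and the name `ivp_claims` are assumptions for illustration.

```python
# EPs as (predicate, {role: variable}) for "The old are experienced."
THE_OLD = [
    ("_the_q", {"ARG0": "x3", "RSTR": "h5", "BODY": "h6"}),
    ("_old_a_1", {"ARG0": "e8", "ARG1": "x3"}),
    ("_experienced_a_in", {"ARG0": "e2", "ARG1": "x3", "ARG2": "i10"}),
]


def ivp_claims(eps):
    """Return (claim1, claim2, claim3) as booleans for the given EPs."""
    # Assumption: quantifiers are the EPs that carry an RSTR role.
    non_q = [args for _, args in eps if "RSTR" not in args]
    arg0s = [args.get("ARG0") for args in non_q]
    ex_vars = {var for _, args in eps for var in args.values()
               if var[0] in "ex"}
    # (1) every e/x variable is the ARG0 of some non-quantifier EP
    claim1 = ex_vars <= set(arg0s)
    # (2) every non-quantifier EP has an e/x variable as its ARG0
    claim2 = all(a is not None and a[0] in "ex" for a in arg0s)
    # (3) no two non-quantifier EPs share an ARG0
    claim3 = len(arg0s) == len(set(arg0s))
    return claim1, claim2, claim3


print(ivp_claims(THE_OLD))  # (False, True, True)
```

Claim (1) fails because x3 is not the ARG0 of any non-quantifier EP, while (2) and (3) hold, which matches the observation that the MRS violates one definition but not the other.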

I’ll ignore the similar claims we could make about the bound variables of quantifiers, but briefly note the issue posed by the use of ‘i’ variables as both underspecified ‘x’/‘e’ variables and a dropped ‘x’. If ‘i’ is used only for the underspecified case we get rid of one problem; we just adjust (1) and (2) to drop the hedging and specify ‘i’, ‘e’, or ‘x’ variables. But we still don’t have an answer for dropped arguments without posing a new variable type (‘d’, as I proposed at Cambridge) or changing the IVP. The agreement at Cambridge to use ‘x’ for dropped arguments forces us (Ann, rather) to abandon claim (1) above, leading to the following revised IVP:

  1. Every overt non-quantifier EP has an individual variable (‘i’, ‘e’, or ‘x’) as its ARG0
  2. Every individual variable that is not the ARG0 of an overt EP is assumed to be the ARG0 of some covert (or unexpressed) EP
  3. The ARG0 of each EP is not the ARG0 of any other EP

I imagine that (2) might be somewhat controversial as it hypothesizes the presence of unknown entities. Also maybe it should be more specific and pertain only to ‘x’ variables as I’m not sure that ‘e’ arguments can be dropped. Dan said, at least, that propositions cannot be dropped (mentioned earlier in this thread).

Still, as you say, the possibility of “variable properties” on dropped arguments (also reentrancies that target them) is a complication. In good ol’ MRS the variable suffices as the locus of these relationships, but in variableless DMRS the unexpressed nodes must be made overt to encode the same information. Where there are no properties or reentrancies necessitating the presence of these new nodes, they may be dropped with the only consequence being that the arity of the predications taking them as arguments becomes opaque, thus requiring a SEM-I for reconstitution, but this is a consequence we have already willfully accepted in DMRS.

Ok, so with this in mind, you’re saying that in the MRS above (“old are experienced”), _the_q should not be dropped because (a) it is not a bland (or default) quantifier and (b) its RSTR selects a proposition?

Yes, that is what I am saying. Thanks for putting together a concrete MRS to talk about :-). I’m not fluent enough in DMRS to be able to easily evaluate some of what you said, although it certainly looks reasonable to me. Regarding the various aspects of IVP, I follow your story and see what you mean about the revised version of it not conflicting here so badly. I should maybe go on record that I also have mild hesitations about (3) “The ARG0 of each EP is not the ARG0 of any other EP,” but not for nearly as principled reasons, and it’s orthogonal to the present discussion.

I think positing the existence of “covert EPs” requires some thought. To the extent that quantifiers are EPs (questionable), the whole thread is already assuming such beasts exist. Covert EPs sound to me a bit like a band of monsters hiding in Pandora’s box ready to attack at a moment’s notice. It’s saying the meaning of a sentence is more than what it says. If you allow pragmatics into the discussion, then it’s a true claim, of course, but we generally try to capture sentence meaning rather than speaker meaning. If we were to open that box I think we would need some pretty good fences in place to contain the chaos.

Yes, I forgot that this one can be controversial for Joshua, Sanghoun, etc. for their valence-changing affixes, causatives, serial verbs, etc. We even discussed it briefly at Cambridge.

What is perhaps relevant is the cause of the tension here. I think this is due to us overloading ARG0s to mean both the quantified variables (for instances) or event variables (for eventualities) and EP/node identifiers. RMRS has “anchors” which serve the latter purpose, but MRS has no such mechanism. If we had some other way to identify EPs in MRS (or define some principle of when in composition they may no longer be identifiable, one that is compatible with the kinds of analyses we want to make), then I would have less concern about reusing ARG0s. As it stands, though, this condition is pretty much necessary if you want to be able to convert to a dependency representation.

Yes, this is what I anticipated. I definitely don’t think we should allow grammars to construct imaginary castles and forests on top of simple graphs, but only for the constrained case where we know there is something lurking in the shadows, and we may even know something specific about it, but not its name. If you’ll allow me to change metaphors from fantasy to fancy feasts, we don’t set the table for every empty seat, but we might set an extra appropriately when a guest says their one-armed vegetarian friend might turn up for dinner.

I’ve been tempted to stop calling quantifiers EPs (maybe EQs, or just Qs) because they are treated so differently. And yet, they have a label and may be modified, so they share some properties with EPs. And scopal arguments on non-Q EPs also affect the scope tree and insert HCONS, so there is overlap both ways.

Agreed. But also it’s high time we start modeling discourse properly. Maybe what we learn there can help us contain the chaos we now face.