Difference between 2018 and 2020 ERG: ARG1 of _be_v_id

After switching to the 2020 ERG, I’ve noticed that _be_v_id changed how it uses ARG1. In 2018, ARG1 was the subject of the sentence in both of the following phrases. In 2020, it depends on the sentence.

  • “Which color is the door?” 2018 puts “door” in ARG1 of _be_v_id; 2020 puts “color”.
  • “Are you a fireman?” 2018 puts “you” in ARG1 of _be_v_id. So does 2020.

I preferred the 2018 behavior, since ARG1 was consistently the subject, which made reporting errors straightforward. Which year is “right”? Should ARG1 be something consistent, or was that just a coincidence in 2018?
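
For concreteness, here is roughly the extraction I have in mind, written as a sketch with pydelphin (be_v_id_args is my own hypothetical helper, not library API):

from delphin.codecs import simplemrs

def be_v_id_args(mrs_text):
    """Return the predicates filling ARG1 and ARG2 of _be_v_id."""
    m = simplemrs.decode(mrs_text)
    # Map each intrinsic variable (ARG0) of a non-quantifier EP to its
    # predicate, e.g. x3 -> _door_n_1 (quantifiers are the EPs with RSTR).
    nouns = {ep.args.get('ARG0'): ep.predicate
             for ep in m.rels if 'RSTR' not in ep.args}
    for ep in m.rels:
        if ep.predicate == '_be_v_id':
            return (nouns.get(ep.args.get('ARG1')),
                    nouns.get(ep.args.get('ARG2')))
    return None

On the 2018 MRS below this returns ('_door_n_1', '_color_n_1'); on the 2020 one it returns ('_color_n_1', '_door_n_1').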

2018 grammar readings

"Which color is the door?"

[ TOP: h0
INDEX: e2
RELS: < 
[ _which_q LBL: h4 ARG0: x5 [ x PERS: 3 NUM: sg ] RSTR: h6 BODY: h7 ]
[ _color_n_1 LBL: h8 ARG0: x5 [ x PERS: 3 NUM: sg ] ]
[ _the_q LBL: h9 ARG0: x3 [ x PERS: 3 NUM: sg IND: + ] RSTR: h10 BODY: h11 ]
[ _door_n_1 LBL: h12 ARG0: x3 [ x PERS: 3 NUM: sg IND: + ] ]
[ _be_v_id LBL: h1 ARG0: e2 [ e SF: ques TENSE: pres MOOD: indicative PROG: - PERF: - ] ARG1: x3 ARG2: x5 ]
>
HCONS: < h0 qeq h1 h6 qeq h8 h10 qeq h12 > ]
"Are you a fireman?"

[ TOP: h0
INDEX: e2
RELS: < 
[ pronoun_q LBL: h6 ARG0: x3 [ x PERS: 2 IND: + PT: std ] RSTR: h7 BODY: h8 ]
[ pron LBL: h5 ARG0: x3 [ x PERS: 2 IND: + PT: std ] ]
[ _a_q LBL: h9 ARG0: x4 [ x PERS: 3 NUM: sg IND: + ] RSTR: h10 BODY: h11 ]
[ _fireman_n_1 LBL: h12 ARG0: x4 [ x PERS: 3 NUM: sg IND: + ] ]
[ _be_v_id LBL: h1 ARG0: e2 [ e SF: ques TENSE: pres MOOD: indicative PROG: - PERF: - ] ARG1: x3 ARG2: x4 ]
>
HCONS: < h0 qeq h1 h7 qeq h5 h10 qeq h12 > ]

2020 grammar readings

"Which color is the door?"

[ TOP: h0
INDEX: e2
RELS: < 
[ _which_q LBL: h4 ARG0: x3 [ x PERS: 3 NUM: sg ] RSTR: h5 BODY: h6 ]
[ _color_n_1 LBL: h7 ARG0: x3 [ x PERS: 3 NUM: sg ] ]
[ _the_q LBL: h9 ARG0: x8 [ x PERS: 3 NUM: sg IND: + ] RSTR: h10 BODY: h11 ]
[ _door_n_1 LBL: h12 ARG0: x8 [ x PERS: 3 NUM: sg IND: + ] ]
[ _be_v_id LBL: h1 ARG0: e2 [ e SF: ques TENSE: pres MOOD: indicative PROG: - PERF: - ] ARG1: x3 ARG2: x8 ]
>
HCONS: < h0 qeq h1 h5 qeq h7 h10 qeq h12 > ]
"Are you a fireman?"

[ TOP: h0
INDEX: e2
RELS: < 
[ pronoun_q LBL: h6 ARG0: x3 [ x PERS: 2 IND: + PT: std ] RSTR: h7 BODY: h8 ]
[ pron LBL: h5 ARG0: x3 [ x PERS: 2 IND: + PT: std ] ]
[ _a_q LBL: h9 ARG0: x4 [ x PERS: 3 NUM: sg IND: + ] RSTR: h10 BODY: h11 ]
[ _fireman_n_1 LBL: h12 ARG0: x4 [ x PERS: 3 NUM: sg IND: + ] ]
[ _be_v_id LBL: h1 ARG0: e2 [ e SF: ques TENSE: pres MOOD: indicative PROG: - PERF: - ] ARG1: x3 ARG2: x4 ]
>
HCONS: < h0 qeq h1 h7 qeq h5 h10 qeq h12 > ]

Are these the top-ranked parses only? I would expect both readings to be possible, but the parse-ranking model may have changed which one is ranked at the top. English grammar makes copula questions ambiguous:

the door is ??? — which X is the door? / what is the door?
??? is the door — which X is the door? / what is the door?

But they have different word order as embedded questions:

we know the door is ??? — we know which X the door is / we know what the door is.
we know ??? is the door — we know which X is the door / we know what is the door.
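
One way to check is to walk down the ranked parses and see where each reading surfaces. A rough sketch using pydelphin’s ACE bindings, assuming ACE is installed and with 'erg-2020.dat' standing in for wherever your compiled grammar image lives:

from delphin import ace

# Print the _be_v_id arguments of every ranked reading, mapping the
# variables back to the noun predications that introduced them.
response = ace.parse('erg-2020.dat', 'Which color is the door?')
for rank, result in enumerate(response.results(), start=1):
    m = result.mrs()
    nouns = {ep.args.get('ARG0'): ep.predicate
             for ep in m.rels if 'RSTR' not in ep.args}
    for ep in m.rels:
        if ep.predicate == '_be_v_id':
            print(f"parse {rank}: ARG1={nouns.get(ep.args['ARG1'])}, "
                  f"ARG2={nouns.get(ep.args['ARG2'])}")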

Yep, that appears to be it. Those were the top-ranked parses only, and when I look down through the lower-ranked ones, I see the alternative. It looks like it is just the ranking that has changed. It didn’t occur to me that the question itself could be ambiguous in this way; I suppose I should have figured that out by now.

Hmmm. This means that reporting errors just got way harder for me. For example, if the door doesn’t have a color, the answer to “Which color is the door?” is one of:

  1. “A color is not the door”
  2. “The door is not a color”

Your examples make me realize that both are actually valid responses (before, I considered the first to be a bug); it is just that the first is much less likely to be the “right error” in this example, since colors are very rarely doors, but doors can often be colors.

It also makes me realize yet again how well the ranking has been working: if nothing “succeeds”, I report the first error, which only makes sense if the ranking is doing a good job.

While I could certainly try to build in some logic to report the right error, I think this might better fall into the category of “need the ranking engine to change to fix this problem”. I haven’t hit many of those. The other one that comes to mind, from the 2018 grammar, was the propensity to interpret “what do you see on the table?” as “what do you see [while you are] on the table?” as the primary parse, which was almost never right, at least for my scenarios.
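
If I did try it, the logic would probably look something like this purely hypothetical sketch (treating _which_q/which_q as the markers of the question variable is my own assumption):

from delphin.codecs import simplemrs

# Hypothetical heuristic: prefer reporting the reading whose _be_v_id
# ARG1 is NOT the wh-bound variable, so "The door is not a color" wins
# over "A color is not the door". WH_QUANTIFIERS is my guess at the
# relevant predicates, not an official list.
WH_QUANTIFIERS = {'_which_q', 'which_q'}

def preferred_error_reading(ranked_mrs_texts):
    for text in ranked_mrs_texts:
        m = simplemrs.decode(text)
        wh_vars = {ep.args.get('ARG0') for ep in m.rels
                   if ep.predicate in WH_QUANTIFIERS}
        for ep in m.rels:
            if (ep.predicate == '_be_v_id'
                    and ep.args.get('ARG1') not in wh_vars):
                return m
    return None  # fall back to the top-ranked reading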
