Constraining MRSs to reduce ERG outputs

I have a number of questions about whether I can constrain certain MRSs that I have (with variable properties or otherwise) to reduce the number of outputs the ERG generates for each one.

Note that for each MRS in order to actually generate results with the ERG I do a “preparation” step where I quantify the top level x variable and then add the unknown predicate to make the whole MRS have an e-type index. The MRSs I show don’t have this preparation step done for simplicity, but in order to actually get the strings that I show this step is applied.

Prevent modifiers between nouns in a compound noun

I have an MRS that I’m intending to use to generate strings like “the black car key.” I do get those results out, but I also get things like “the car black key.”

Here is the MRS:

[ TOP: h13                                                                       
   INDEX: x1                                                                      
   RELS: < [ _black_a_1 LBL: h13 ARG0: i14 ARG1: x1 ]                             
           [ compound LBL: h13 ARG0: e10 ARG1: x1 ARG2: x3 ]                      
           [ udef_q LBL: h8 ARG0: x3 RSTR: h6 BODY: h7 ]                          
           [ _car_n_1 LBL: h4 ARG0: x3 ]                                          
           [ _key_n_1 LBL: h13 ARG0: x1 ] >                                       
   HCONS: < h6 qeq h4 > ]  

When I parse the string “the black car key” with the ERG I get the one I expect:

but I also get others that don’t feel to me like they’re grammatical:

To be fair, I don’t know what the syntax rules being shown actually mean, but I can’t personally accept “the [[black car] key]” as a noun phrase. Maybe another example would clarify what this construction is meant to be for, but my instinct is that a modifier in front of a compound noun would always be modifying the whole compound noun and not just the first member of the compound.

So my questions here are:

  1. Why is the parse where the modifier modifies only the non-head noun acceptable?
  2. Is there a way to constrain the MRS in such a way that it won’t generate strings like “the car black key”?

Constraining passive participles

The next MRS I have is one that I want to generate strings like “the locked car, west of the green bush.”

Here is the MRS:

[ TOP: h19                                                                       
   INDEX: x1                                                                      
   RELS: < [ loc_nonsp LBL: h19 ARG0: i16 ARG1: x1 ARG2: x13 ]                    
           [ def_implicit_q LBL: h30 ARG0: x13 RSTR: h28 BODY: h29 ]              
           [ _west_a_1 LBL: h15 ARG0: i12 ARG1: x13 ARG2: x22 ]                   
           [ def_udef_a_q LBL: h25 ARG0: x22 RSTR: h23 BODY: h24 ]                
           [ _green_a_2 LBL: h11 ARG0: i9 ARG1: x22 ]                             
           [ _bush_n_1 LBL: h11 ARG0: x22 ]                                       
           [ place_n LBL: h15 ARG0: x13 ]                                         
           [ _lock_v_cause LBL: h19 ARG0: e3 ARG1: i4 ARG2: x1 ]                  
           [ _car_n_1 LBL: h19 ARG0: x1 ] >
   HCONS: < h23 qeq h11 h28 qeq h15 > ]    

In addition to the strings I want, I am also getting things like the following:

The car locked west of the green bush  
The car to lock west of the green bush  
The car west of the green bush locked  
The car west of the green bush to lock

  1. Is there a way to eliminate the “to lock” ones? I’m hoping there’s some tense constraint I can use.
  2. Is there a way to prevent the modifier of car from moving after it? This seems to be similar to the issue I was having with the MRS in the previous example.

Constraining subject relative clauses

The last MRS I have (for this post…) contains a subject relative clause. I want it to generate strings like “the person who ate the cookie.”

[ TOP: h11
  INDEX: x12 [ x NUM: sg ]
  RELS: < [ _eat_v_1 LBL: h11 ARG0: e1 [ e TENSE: tensed ] ARG1: x12 ARG2: x5 [ x NUM: sg ] ]
          [ def_udef_a_q LBL: h10 ARG0: x5 RSTR: h8 BODY: h9 ]
          [ _cookie_n_1 LBL: h6 ARG0: x5 ]
          [ _person_n_1 LBL: h11 ARG0: x12 ] >
  HCONS: < h8 qeq h6 > ]

Here is a sample of the strings I get out:

The person that the cookie was eaten by
The person that the cookie was eaten by.
The person which the cookie was eaten by
The person by who the cookie was eaten.
The person by which the cookie was eaten.
The person by who the cookie was eaten
The person by which the cookie was eaten
The person who the cookie was eaten by
The person which the cookie was eaten by.
The person who the cookie was eaten by.
The person who ate the cookie
The person the cookie was eaten by
The person that ate the cookie
The person the cookie was eaten by.
The person which ate the cookie
The person who ate the cookie.
The person that ate the cookie.
The person which ate the cookie.

Is there a way to get rid of the ones with “The person which”? Or is there no animacy in the ERG?

Thanks in advance :slight_smile:

Hi @ecconrad,

Lots here, but some thoughts:

  1. I, too, am surprised at “the car black key” as licensed output. @Dan what are the examples where we want to allow the head noun in a noun-noun compound to be modified (other than by another noun-noun compound pre-head noun)?

  2. For the locked examples, there seem to be two separate things going on:

    1. locked v. to lock – the question here is whether these can be distinguished at the MRS level
    2. The locked modifier showing up to the right of the head noun. Is this because it is in fact being modified by the west of thing, rather than both being modifiers of the noun? Or is there something else behind that ordering?

Emily

First, on modifiers of non-head nouns within a noun-noun compound:

Compounds with the structure [[Adj N] N] are pretty common, as in these examples

open source software
big cat sanctuary
new car smell
third place finish
private interest conflict

But since the semantics of [[Adj N] N] is different from [Adj [N N]], I don’t think the existence of this alternative analysis of Adj N N strings should get in your way when generating.

For the more objectionable [[N Adj] N] generator outputs like your “car black key”, they will only appear if you give the generator the less likely MRS for “black car key” where “black” is modifying “car” instead of “key”. But if you want to protect yourself from these odd-sounding outputs in general, you could choose to block the two syntactic rules that license the [[N Adj] N] structures so the generator won’t use them. You do this by adding the following two lines to the end of the file erg/lkb/nogen-rules.set:

n-j_j-cpd_c
n-j_j-t-cpd_c

Then once you recompile the grammar (for ACE) or reload it (for the LKB), you won’t see such outputs from the generator even if the MRS was not quite what you intended for “black car key”. Note that these two rules are needed for parsing, to account for examples like the following from corpora, often but not always with a hyphen connecting the first noun and the following adjective:

pain-free existence
toll free number
traffic-free road
state subsidized company
user friendly software
avalanche-safe site
water-repellent coat
school-internal shuttle
cost-effective solution
tax deductible contribution
family friendly beach
drought tolerant plants
world-famous actor
color blind policy
power hungry politician
rock steady grip
coal-black night
care-free existence
fur-clad ancestors

While I think these N-Adj nouns should have a somewhat different semantics from the corresponding Adj+N ones, I so far have not landed on a satisfying MRS for them, so the MRSs are sadly still the same, which is why the N-Adj ones get generated from the Adj+N ones. Hence it seems okay for you to block the two rules above for the generator.
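The edit to nogen-rules.set described above can be scripted. Here is a minimal Python sketch, assuming the file lists one rule name per line (that layout is an assumption about the file, and the helper name `block_rules_for_generation` is made up for illustration). It appends only the rule names that are not already present, so running it twice is harmless.

```python
from pathlib import Path

# Rule names as given in the post above.
NOGEN_RULES = ["n-j_j-cpd_c", "n-j_j-t-cpd_c"]


def block_rules_for_generation(nogen_file, rules=NOGEN_RULES):
    """Append each rule name to nogen_file unless it is already listed.

    Assumes one rule name per line (an assumption about the file layout).
    Returns the list of rule names that were actually appended.
    """
    path = Path(nogen_file)
    text = path.read_text(encoding="utf-8") if path.exists() else ""
    present = {line.strip() for line in text.splitlines()}
    missing = [rule for rule in rules if rule not in present]
    if missing:
        # Make sure the new entries start on their own line.
        if text and not text.endswith("\n"):
            text += "\n"
        text += "".join(rule + "\n" for rule in missing)
        path.write_text(text, encoding="utf-8")
    return missing
```

You would call it as `block_rules_for_generation("erg/lkb/nogen-rules.set")` from the directory containing your ERG checkout, and then recompile or reload the grammar as usual.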

On your second topic, constraining passive participles, I think we can do a little better, though the solution is only partly theoretically sound. Since the passive lexical rule has an information-structure effect, there is an ICONS “topic” constraint linking the index of the passivized object with the event of the verb. When you add this constraint to the input MRS for the generator, you’ll only see passive relative clauses, not the infinitival ones. So in your MRS for “the locked car …” you can add this line after the HCONS constraints:

ICONS: < e3 topic x1 >

where “e3” is the ARG0 of the “lock” EP and “x1” is the ARG2 of that EP. With this added constraint, you won’t generate “the car to lock”.
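If you are building the MRS input as text, adding the ICONS line can be done with plain string handling. The following Python sketch assumes the MRS is in the bracketed format shown in this thread, ends with a single closing `]`, and has no ICONS list yet; the function name `add_icons` is hypothetical and not part of any DELPH-IN tool.

```python
def add_icons(mrs_text, constraints):
    """Return mrs_text with an ICONS list inserted before the final ']'.

    constraints is a list of (left, relation, right) tuples,
    for example [("e3", "topic", "x1")].
    Assumes the bracketed text format used in this thread.
    """
    body = mrs_text.rstrip()
    if not body.endswith("]"):
        raise ValueError("expected the MRS text to end with ']'")
    if "ICONS:" in body:
        raise ValueError("the MRS already has an ICONS list")
    items = " ".join(f"{left} {rel} {right}" for left, rel, right in constraints)
    # Drop the final ']' and re-add it after the new ICONS line.
    return f"{body[:-1].rstrip()}\n  ICONS: < {items} > ]"
```

For the MRS in the question, the call would be `add_icons(mrs_text, [("e3", "topic", "x1")])`, where `mrs_text` holds the prepared MRS string.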

In order to also block generation of “the car locked”, we have to resort to a less satisfying solution which takes advantage of a slight difference in the MRSs of “locked car” and “car locked”, namely that the grammar constrains the aspect of the ARG0 of “locked” when it’s pre-nominal to be [PERF -] (non-perfect), but for no deep reason fails to assert this of “locked” when it’s a post-nominal modifier. So if you add [PERF -] to that ARG0 in your MRS input to the generator, you won’t get the output “car locked” because the generator has to produce an output which accounts for every element of the input MRS, and the generated structure for “car locked” won’t say anything about the value of PERF for the ARG0 of “lock”. That is, in your example MRS, the EP for _lock_v_cause should read

[ _lock_v_cause LBL: h19 ARG0: e3 [e PERF: -] ARG1: i4 ARG2: x1 ]

With these two additions to the MRS you started with, you should see the behavior you want for passive participles. Note that while “the car locked” doesn’t sound great, this post-modifier structure is not infrequent, as you see in examples such as “the papers rejected were still of good quality” and “the topics discussed cover a wide range of concerns”. I don’t yet have an account of how the unhappy “car locked” is different from the fine “topics discussed”, so the grammar doesn’t distinguish them.
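The PERF addition can likewise be applied to the MRS text programmatically. This is only a sketch under stated assumptions: the EP is written as `[ pred LBL: hN ARG0: eN ... ]` as in this thread, and its ARG0 carries no property list yet. The helper name `add_arg0_property` is hypothetical.

```python
import re


def add_arg0_property(mrs_text, predicate, prop, value):
    """Attach '[ e PROP: value ]' to the ARG0 of the first EP for predicate.

    Assumes the EP begins '[ predicate LBL: hN ARG0: eN' and that the
    ARG0 has no property list yet; raises ValueError otherwise.
    """
    pattern = re.compile(
        r"(\[\s*" + re.escape(predicate) + r"\s+LBL:\s*h\d+\s+ARG0:\s*)"
        r"(e\d+)\b(?!\s*\[)"
    )
    new_text, count = pattern.subn(
        lambda m: f"{m.group(1)}{m.group(2)} [ e {prop}: {value} ]",
        mrs_text,
        count=1,
    )
    if count == 0:
        raise ValueError(f"no plain ARG0 found for {predicate}")
    return new_text
```

Calling `add_arg0_property(mrs_text, "_lock_v_cause", "PERF", "-")` on the MRS from the question rewrites the ARG0 of that EP to `e3 [ e PERF: - ]`, which says the same thing as the EP shown above with slightly different spacing.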

Regarding your third question, about distinguishing “who” from “which” relative clauses, the difference is not one of animacy, but about the human/nonhuman contrast. Note that animals are animate, but (apart from dearly beloved pets) occur with the relative pronoun “which”, not “who”. Unfortunately, the ERG does not yet have a useful treatment of this lexical-semantics property, though I hope to do better once we have finished adding WordNet senses to the ERG lexicon, since we can then do a better job of distinguishing human from non-human senses of nouns at scale. For example, the noun “kid” currently has only one lexical entry, but when enriched from WordNet, there are two, one in the synonym set with “child” and one in the set with “baby goat”, where the class of “child” nouns (and similar human-denoting classes) would have their semantic index be marked as [SORT human] while most other noun classes would be marked [SORT nonhuman].

But at present the only way you might have to restrict generator outputs would be to restrict yourself to a small lexicon of nouns, and manually add the [SORT human] or [SORT nonhuman] constraint to each noun lexical entry. You would also need to add [SORT nonhuman] to the lexical entry for the relative pronoun “which”, though the entry for “who” is already correctly constrained. If you’re working with a bounded vocabulary for your task, this approach might be manageable, but I think it’s the only avenue available for now.