I previously asked this in a thread that contained several questions. To keep each thread to a single topic, I’m reposting them as separate threads.
I have an MRS that I intend to use to generate strings like “the black car key.” I do get that string, but I also get outputs like “the car black key.”
Here is the MRS:
[ TOP: h13
INDEX: x1
RELS: < [ _black_a_1 LBL: h13 ARG0: i14 ARG1: x1 ]
[ compound LBL: h13 ARG0: e10 ARG1: x1 ARG2: x3 ]
[ udef_q LBL: h8 ARG0: x3 RSTR: h6 BODY: h7 ]
[ _car_n_1 LBL: h4 ARG0: x3 ]
[ _key_n_1 LBL: h13 ARG0: x1 ] >
HCONS: < h6 qeq h4 > ]
Is there a way to constrain the MRS in such a way that it won’t generate strings like “the car black key”?
In my original question I also asked why we need parses like “[[black car] key]”, because that didn’t sound grammatical to me. But Dan clarified in his response that this structure is fairly common (e.g. “open-source software”). So “[[black car] key]” would refer to a key specifically for a black car, which is an acceptable construction but an unusual reading in the context of a car key.
Dan also answered the question about “car black key” in his response, which I will paste here:
For the more objectionable [[N Adj] N] generator outputs like your “car black key”, they will only appear if you gave the generator the less likely MRS for “black car key” where “black” is modifying “car” instead of “key”. But if you want to protect yourself from these odd-sounding outputs in general, you could choose to block the two syntactic rules that license the [[N Adj] N] structures so the generator won’t use them. You do this by adding the following two lines to the end of the file erg/lkb/nogen-rules.set:
n-j_j-cpd_c
n-j_j-t-cpd_c
Then once you recompile the grammar (for ACE) or reload it (for the LKB), you won’t see such outputs from the generator even if the MRS was not quite what you intended for “black car key”. Note that these two rules are needed for parsing, to account for examples like the following from corpora, often but not always with a hyphen connecting the first noun and the following adjective:
pain-free existence
toll free number
traffic-free road
state subsidized company
user friendly software
avalanche-safe site
water-repellent coat
school-internal shuttle
cost-effective solution
tax deductible contribution
family friendly beach
drought tolerant plants
world-famous actor
color blind policy
power hungry politician
rock steady grip
coal-black night
care-free existence
fur-clad ancestors
While I think these N-Adj nouns should have a somewhat different semantics from the corresponding Adj+N ones, I so far have not landed on a satisfying MRS for them, so the MRSs are sadly still the same, which is why the N-Adj ones get generated from the Adj+N ones. Hence it seems okay for you to block the two rules above for the generator.
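The edit Dan describes can be sketched as a small Python helper that appends the two rule names to erg/lkb/nogen-rules.set only if they are not already listed. This is a minimal sketch, not part of the ERG tooling: the function name block_rules is hypothetical, and it assumes the file holds one rule name per line, as Dan’s instruction to add “two lines” suggests.

```python
from pathlib import Path

# Rule names taken from Dan's reply.
NOGEN_RULES = ["n-j_j-cpd_c", "n-j_j-t-cpd_c"]


def block_rules(path, rules=NOGEN_RULES):
    """Append each rule name to the file unless it is already present.

    Hypothetical helper. Assumes one rule name per line.
    Raises FileNotFoundError if the file does not exist.
    Returns the list of rule names that were actually added.
    """
    path = Path(path)
    text = path.read_text()
    present = {line.strip() for line in text.splitlines()}
    missing = [rule for rule in rules if rule not in present]
    if missing:
        # Make sure the new entries start on their own line.
        prefix = "" if text == "" or text.endswith("\n") else "\n"
        with path.open("a") as handle:
            handle.write(prefix + "\n".join(missing) + "\n")
    return missing
```

Calling block_rules("erg/lkb/nogen-rules.set") from the directory that contains the erg checkout would make the change; the grammar still has to be recompiled (ACE) or reloaded (LKB) afterwards, as described above. Running it a second time adds nothing, so it is safe to repeat.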
Thanks @Dan for responding on the other thread. I’m splitting the threads up now to make it easier.
You mention in your response that I should only generate “car black key” if the MRS I use for generation has “black” modifying “car” instead of “key,” but I’m fairly certain the MRS I gave the generator does have “black” modifying “key” (unless I’m misunderstanding):
[ TOP: h13
INDEX: x1
RELS: < [ _black_a_1 LBL: h13 ARG0: i14 ARG1: x1 ]
[ compound LBL: h13 ARG0: e10 ARG1: x1 ARG2: x3 ]
[ udef_q LBL: h8 ARG0: x3 RSTR: h6 BODY: h7 ]
[ _car_n_1 LBL: h4 ARG0: x3 ]
[ _key_n_1 LBL: h13 ARG0: x1 ] >
HCONS: < h6 qeq h4 > ]
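To double-check which noun “black” modifies, here is a rough Python sketch that reads the RELS list of the MRS above and reports which predicate’s ARG0 matches the ARG1 of _black_a_1. It uses only a simple regular-expression reader written for this one MRS, not a general MRS parser; the function names parse_rels and modified_by are made up for illustration.

```python
import re

MRS_TEXT = """
[ TOP: h13
INDEX: x1
RELS: < [ _black_a_1 LBL: h13 ARG0: i14 ARG1: x1 ]
[ compound LBL: h13 ARG0: e10 ARG1: x1 ARG2: x3 ]
[ udef_q LBL: h8 ARG0: x3 RSTR: h6 BODY: h7 ]
[ _car_n_1 LBL: h4 ARG0: x3 ]
[ _key_n_1 LBL: h13 ARG0: x1 ] >
HCONS: < h6 qeq h4 > ]
"""


def parse_rels(text):
    """Return a list of (predicate, {role: value}) pairs from the RELS list."""
    rels_body = re.search(r"RELS:\s*<(.*?)>", text, re.S).group(1)
    rels = []
    for ep in re.findall(r"\[\s*(.*?)\s*\]", rels_body, re.S):
        pred, *rest = ep.split()
        # rest alternates "ROLE:" and value tokens.
        names = [token.rstrip(":") for token in rest[0::2]]
        rels.append((pred, dict(zip(names, rest[1::2]))))
    return rels


def modified_by(rels, modifier_pred):
    """Non-quantifier predicates whose ARG0 is the ARG1 of modifier_pred."""
    targets = {roles["ARG1"] for pred, roles in rels if pred == modifier_pred}
    return [
        pred
        for pred, roles in rels
        if pred != modifier_pred
        and not pred.endswith("_q")  # skip quantifiers, which bind ARG0
        and roles.get("ARG0") in targets
    ]


rels = parse_rels(MRS_TEXT)
print(modified_by(rels, "_black_a_1"))
```

For this MRS the ARG1 of _black_a_1 is x1, which is the ARG0 of _key_n_1, and the two predications also share the label h13. So, as far as I can tell, “black” is attached to “key” and not to “car” (x3).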