I originally asked this question in a thread that contained several questions. To keep each thread on a single topic, I’m reposting each question in its own thread.
I have an MRS that contains a subject relative clause. I want it to generate strings like “the person who ate the cookie.”
[ TOP: h11
INDEX: x12 [ x NUM: sg ]
RELS: < [ _eat_v_1 LBL: h11 ARG0: e1 [ e TENSE: tensed ] ARG1: x12 ARG2: x5 [ x NUM: sg ] ]
[ def_udef_a_q LBL: h10 ARG0: x5 RSTR: h8 BODY: h9 ]
[ _cookie_n_1 LBL: h6 ARG0: x5 ]
[ _person_n_1 LBL: h11 ARG0: x12 ] >
HCONS: < h8 qeq h6 > ]
Here is a sample of the strings I get out:
The person that the cookie was eaten by
The person that the cookie was eaten by.
The person which the cookie was eaten by
The person by who the cookie was eaten.
The person by which the cookie was eaten.
The person by who the cookie was eaten
The person by which the cookie was eaten
The person who the cookie was eaten by
The person which the cookie was eaten by.
The person who the cookie was eaten by.
The person who ate the cookie
The person the cookie was eaten by
The person that ate the cookie
The person the cookie was eaten by.
The person which ate the cookie
The person who ate the cookie.
The person that ate the cookie.
The person which ate the cookie.
Is there a way to get rid of the ones with “The person which”? Or does the ERG not encode animacy at all?
Dan answered this question in his response on the other thread, which I will paste here:
Regarding your third question, about distinguishing “who” from “which” relative clauses, the difference is not one of animacy but of the human/nonhuman contrast. Note that animals are animate, but (apart from dearly beloved pets) occur with the relative pronoun “which”, not “who”. Unfortunately, the ERG does not yet have a useful treatment of this lexical-semantics property, though I hope to do better once we have finished adding WordNet senses to the ERG lexicon, since we can then do a better job of distinguishing human from non-human senses of nouns at scale. For example, the noun “kid” currently has only one lexical entry, but once enriched from WordNet it will have two, one in the synonym set with “child” and one in the set with “baby goat”. The class of “child” nouns (and similar human-denoting classes) would then have their semantic index marked as [SORT human], while most other noun classes would be marked [SORT nonhuman].
But at present the only way you might have to restrict generator outputs would be to restrict yourself to a small lexicon of nouns, and manually add the [SORT human] or [SORT nonhuman] constraint to each noun lexical entry. You would also need to add [SORT nonhuman] to the lexical entry for the relative pronoun “which”, though the entry for “who” is already correctly constrained. If you’re working with a bounded vocabulary for your task, this approach might be manageable, but I think it’s the only avenue available for now.
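For anyone who cannot edit the lexicon, a rough stopgap is to filter the generator's output strings after the fact. This is not the grammar-internal fix Dan describes: it is only a string-level sketch in Python that borrows the same bounded-vocabulary idea. The `NOUN_SORTS` table, the function names, and the regular expression are all hypothetical, and it assumes the head noun sits immediately before the relative pronoun (optionally with “by” in between), as in the outputs above.

```python
import re

# Hypothetical hand-maintained table: noun -> sorts it can denote.
# A noun such as "kid" (child vs. baby goat) may carry both sorts.
NOUN_SORTS = {
    "person": {"human"},
    "child": {"human"},
    "kid": {"human", "nonhuman"},
    "cookie": {"nonhuman"},
    "dog": {"nonhuman"},
}

# Sort that each relative pronoun requires of its head noun.
PRONOUN_SORT = {"who": "human", "which": "nonhuman"}

# Matches "<noun> who", "<noun> which", "<noun> by who", "<noun> by which".
HEAD_AND_PRONOUN = re.compile(r"\b(\w+)\s+(?:by\s+)?(who|which)\b", re.IGNORECASE)


def is_compatible(sentence):
    """Return False if a known noun is paired with the wrong relative pronoun."""
    for noun, pronoun in HEAD_AND_PRONOUN.findall(sentence):
        sorts = NOUN_SORTS.get(noun.lower())
        if sorts is None:
            continue  # noun not in the table: keep the string rather than guess
        if PRONOUN_SORT[pronoun.lower()] not in sorts:
            return False
    return True


def filter_outputs(sentences):
    """Keep only the generated strings that pass the who/which check."""
    return [s for s in sentences if is_compatible(s)]


if __name__ == "__main__":
    generated = [
        "The person who ate the cookie",
        "The person which ate the cookie",
        "The person by which the cookie was eaten.",
        "The person that ate the cookie.",
    ]
    for line in filter_outputs(generated):
        print(line)
```

Because this works on surface strings, it still pays the cost of generating the unwanted realizations and will miss cases where the head noun is not adjacent to the pronoun. Adding the [SORT human] / [SORT nonhuman] constraints in the lexicon, as described above, remains the cleaner route.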