The ERG config-dict.tdl vs config.tdl

I am using ace/config-dict.tdl for parsing WordNet definitions. I am not making any strong claim, but I haven’t seen much difference compared to config.tdl in the cases I have looked at so far. For instance, for the definition of nag:

% ace -g erg-dict.dat -Tf1
an old or over worked horse
SENT: an old or over worked horse
[ LTOP: h0
INDEX: e2 [ e SF: prop-or-ques ]
RELS: < [ unknown<0:27> LBL: h1 ARG0: e2 ARG: x4 [ x PERS: 3 NUM: sg IND: + ] ]
 [ _a_q<0:2> LBL: h5 ARG0: x4 RSTR: h6 BODY: h7 ]
 [ _old_a_1<3:6> LBL: h8 ARG0: e9 [ e SF: prop TENSE: untensed MOOD: indicative PROG: bool PERF: - ] ARG1: x4 ]
 [ _or_c<7:9> LBL: h8 ARG0: e10 [ e SF: prop TENSE: untensed MOOD: indicative PROG: bool PERF: - ] ARG1: e9 ARG2: e11 [ e SF: prop TENSE: untensed MOOD: indicative PROG: bool PERF: - ] ]
 [ _over_p_dir<10:14> LBL: h8 ARG0: e12 [ e SF: prop ] ARG1: i13 ]
 [ _work_v_1<15:21> LBL: h8 ARG0: e11 ARG1: i14 ARG2: x4 ]
 [ _horse_n_1<22:27> LBL: h8 ARG0: x4 ] >
HCONS: < h0 qeq h1 h6 qeq h8 >
ICONS: < e11 topic x4 > ]
NOTE: 1 readings, added 2078 / 625 edges to chart (173 fully instantiated, 189 actives used, 146 passives used)	RAM: 7665k

and

% ace -g erg.dat -Tf1
an old or over worked horse
SENT: an old or over worked horse
[ LTOP: h0
INDEX: e2 [ e SF: prop-or-ques ]
RELS: < [ unknown<0:27> LBL: h1 ARG0: e2 ARG: x4 [ x PERS: 3 NUM: sg IND: + ] ]
 [ _a_q<0:2> LBL: h5 ARG0: x4 RSTR: h6 BODY: h7 ]
 [ _old_a_1<3:6> LBL: h8 ARG0: e9 [ e SF: prop TENSE: untensed MOOD: indicative PROG: bool PERF: - ] ARG1: x4 ]
 [ _or_c<7:9> LBL: h8 ARG0: e10 [ e SF: prop TENSE: untensed MOOD: indicative PROG: bool PERF: - ] ARG1: e9 ARG2: e11 [ e SF: prop TENSE: untensed MOOD: indicative PROG: bool PERF: - ] ]
 [ _over_p_dir<10:14> LBL: h8 ARG0: e12 [ e SF: prop ] ARG1: i13 ]
 [ _work_v_1<15:21> LBL: h8 ARG0: e11 ARG1: i14 ARG2: x4 ]
 [ _horse_n_1<22:27> LBL: h8 ARG0: x4 ] >
HCONS: < h0 qeq h1 h6 qeq h8 >
ICONS: < e11 topic x4 > ]
NOTE: 1 readings, added 2050 / 621 edges to chart (173 fully instantiated, 184 actives used, 144 passives used)	RAM: 7599k

The annoying thing here is the unknown predication, which records the fact that the input is a sentence fragment. We can avoid this predication if we can somehow automatically complete the definition into a full sentence:

% ace -g erg-dict.dat -Tf1
A nag is an old or over-worked horse
SENT: A nag is an old or over-worked horse
[ LTOP: h0
INDEX: e2 [ e SF: prop TENSE: pres MOOD: indicative PROG: - PERF: - ]
RELS: < [ _a_q<0:1> LBL: h4 ARG0: x3 [ x PERS: 3 NUM: sg IND: + ] RSTR: h5 BODY: h6 ]
 [ _nag_n_1<2:5> LBL: h7 ARG0: x3 ]
 [ _be_v_id<6:8> LBL: h1 ARG0: e2 ARG1: x3 ARG2: x8 [ x PERS: 3 NUM: sg IND: + ] ]
 [ _a_q<9:11> LBL: h9 ARG0: x8 RSTR: h10 BODY: h11 ]
 [ _old_a_1<12:15> LBL: h12 ARG0: e13 [ e SF: prop TENSE: untensed MOOD: indicative PROG: bool PERF: - ] ARG1: x8 ]
 [ _or_c<16:18> LBL: h12 ARG0: e14 [ e SF: prop TENSE: untensed MOOD: indicative PROG: bool PERF: - ] ARG1: e13 ARG2: e15 [ e SF: prop TENSE: untensed MOOD: indicative PROG: bool PERF: - ] ]
 [ _work_v_1<19:30> LBL: h12 ARG0: e15 ARG1: i16 ARG2: x8 ]
 [ _over-_a_1<19:30> LBL: h12 ARG0: e15 ARG1: e15 ]
 [ _horse_n_1<31:36> LBL: h12 ARG0: x8 ] >
HCONS: < h0 qeq h1 h5 qeq h7 h10 qeq h12 >
ICONS: < e15 topic x8 > ]
NOTE: 1 readings, added 2793 / 790 edges to chart (265 fully instantiated, 214 actives used, 194 passives used)	RAM: 12925k
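
The completion step for a noun definition like the one above could be sketched roughly as follows. This is only an illustration: `complete_noun_definition` is a hypothetical helper, the copular frame "A <lemma> is <gloss>" is an assumption taken from the single example in this post, and the article choice is deliberately naive.

```python
def complete_noun_definition(lemma: str, gloss: str) -> str:
    """Wrap a noun gloss in a copular frame: 'A <lemma> is <gloss>'.

    Hypothetical helper, not part of ACE or the ERG.
    """
    lemma = lemma.strip()
    if not lemma:
        raise ValueError("lemma must not be empty")
    # Naive article choice based on the first letter only (an assumption;
    # it gets words such as "hour" or "unicorn" wrong).
    article = "An" if lemma[0].lower() in "aeiou" else "A"
    return f"{article} {lemma} is {gloss.strip()}"


if __name__ == "__main__":
    print(complete_noun_definition("nag", "an old or over-worked horse"))
```

Verb or adjective definitions would need different frames, which this sketch does not attempt.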

So what is the expected behavior when using config-dict.tdl? Isn’t it precisely to handle those fragments differently?

Hi,

In 2004, when we didn’t have nice tools like pydelphin, we added a root that allowed NPs to head utterances directly <https://aclanthology.org/C04-1193.pdf>. Now I think it is easy to just look down the unknown_rel’s ARG: it is pretty consistent for noun fragments, and it gives modifiers, as in “In golf, a kind of club”, somewhere to anchor to.
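
To illustrate what "looking down the ARG" means, here is a rough, standard-library-only sketch that reads the `-T` output pasted earlier in the thread and returns the predication(s) whose ARG0 is the ARG of `unknown`. The regular expressions and the function names `predications` and `fragment_head` are my own assumptions for this post; a real MRS library such as pydelphin would be the more robust choice.

```python
import re

# Start of an elementary predication, e.g. "[ _horse_n_1<22:27> ".
EP_START = re.compile(r"\[ (?P<pred>[^\s<\[\]]+)<\d+:\d+> ")
# A role whose value is a variable, e.g. "ARG0: x4" (skips "PERS: 3", "SF: prop").
ROLE = re.compile(r"\b(?P<role>[A-Z][A-Z0-9]*): (?P<var>[a-z]\d+)\b")


def predications(mrs_text):
    """Return [(predicate, {role: variable})] for the RELS list."""
    rels = mrs_text.split("RELS: <", 1)[1].split("HCONS:", 1)[0]
    starts = list(EP_START.finditer(rels))
    eps = []
    for i, m in enumerate(starts):
        end = starts[i + 1].start() if i + 1 < len(starts) else len(rels)
        roles = {}
        for r in ROLE.finditer(rels[m.end():end]):
            roles.setdefault(r.group("role"), r.group("var"))
        eps.append((m.group("pred"), roles))
    return eps


def fragment_head(mrs_text):
    """Predicates whose ARG0 is the ARG of `unknown`; None if not a fragment."""
    eps = predications(mrs_text)
    arg = next((roles.get("ARG") for pred, roles in eps if pred == "unknown"), None)
    if arg is None:
        return None
    # Quantifiers share the variable as ARG0 but carry RSTR, so leave them out.
    return [p for p, roles in eps if roles.get("ARG0") == arg and "RSTR" not in roles]


# MRS copied from the first ACE run in this thread.
SAMPLE = """[ LTOP: h0
INDEX: e2 [ e SF: prop-or-ques ]
RELS: < [ unknown<0:27> LBL: h1 ARG0: e2 ARG: x4 [ x PERS: 3 NUM: sg IND: + ] ]
 [ _a_q<0:2> LBL: h5 ARG0: x4 RSTR: h6 BODY: h7 ]
 [ _old_a_1<3:6> LBL: h8 ARG0: e9 [ e SF: prop TENSE: untensed MOOD: indicative PROG: bool PERF: - ] ARG1: x4 ]
 [ _or_c<7:9> LBL: h8 ARG0: e10 [ e SF: prop TENSE: untensed MOOD: indicative PROG: bool PERF: - ] ARG1: e9 ARG2: e11 [ e SF: prop TENSE: untensed MOOD: indicative PROG: bool PERF: - ] ]
 [ _over_p_dir<10:14> LBL: h8 ARG0: e12 [ e SF: prop ] ARG1: i13 ]
 [ _work_v_1<15:21> LBL: h8 ARG0: e11 ARG1: i14 ARG2: x4 ]
 [ _horse_n_1<22:27> LBL: h8 ARG0: x4 ] >
HCONS: < h0 qeq h1 h6 qeq h8 >
ICONS: < e11 topic x4 > ]"""

if __name__ == "__main__":
    print(fragment_head(SAMPLE))
```

For the nag fragment this picks out `_horse_n_1`, which is the noun the definition is built around.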


The semantics of fragments is the same for the standard grammar and for the one compiled with config-dict.tdl. The unknown_rel is introduced for fragments in order to provide a consistent entry point for the semantic graph regardless of whether the fragment is a nominal phrase, a modifier phrase, or a verbal phrase. The main difference with config-dict.tdl is the addition of rules for fragment types that are not ordinarily well-formed, such as the missing object in the definitions “to devour” or “to refer to”, and the missing determiner in a definition such as “flavor used in ice cream”.
