Token mapping rules for the SRG

I have a grammar which has: (1) token mapping types; (2) a lexical filtering rule; and (3) a couple of sample generic and native lexical entries as follows:

In generics.tdl:

v_np_ge := v_np*_le &
[ STEM < "v_np*_le" >,
  TOKENS.+LIST generic_token_list & 
                < [ +POS.+TAGS  < "vmip3p0" > ] >,
  SYNSEM.LKEYS.KEYREL [ PRED "_generic_v_rel" ] ].

In lexicon.tdl:

dormir_v := v_-_le & 
  [ STEM < "dormir" >, 
    SYNSEM.LKEYS.KEYREL.PRED "_dormir_v_rel",
    TRAITS native_token_list
     ].

This grammar behaves fine in the sense that it filters out the generic entry for something like

(1, 0, 1, <0:5>, 1, "ellos" "ellos", 0, "pp3mp00", "pp3mp00" 1) (2, 1, 2, <6:13>, 1, "dormir" "duermen", 0, "vmip3p0", "vmip3p0" 1)

Ellos duermen (they sleep) and successfully parses something like Ellos sobreandan (they float [over something]; the corresponding input in YY mode, as above), even though sobreandar is not in the lexicon.
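For readers unfamiliar with the YY input shown above, here is a minimal sketch of where the POS tags sit in each token. The field names (`form`, `surface`, `ipos`, `lrule`) are my own labels for the positions in the example line, and the regular expression only covers the exact shape shown in this thread, not the full YY format.

```python
import re

# One token in the shape used in this thread:
# (id, start, end, <from:to>, path, "form" "surface", ipos, "lrule", "tag" prob ...)
# Field names are assumptions made for illustration only.
TOKEN_RE = re.compile(
    r'\(\s*(?P<id>\d+),\s*(?P<start>\d+),\s*(?P<end>\d+),\s*'
    r'<(?P<cfrom>\d+):(?P<cto>\d+)>,\s*(?P<path>\d+),\s*'
    r'"(?P<form>[^"]*)"\s+"(?P<surface>[^"]*)",\s*(?P<ipos>\d+),\s*'
    r'"(?P<lrule>[^"]*)",\s*(?P<pos>[^)]*)\)'
)
# A POS tag followed by its probability, e.g. "vmip3p0" 1
POS_RE = re.compile(r'"([^"]*)"\s+([0-9.]+)')


def parse_yy(line):
    """Return one dict per token found in a YY-style input line."""
    tokens = []
    for match in TOKEN_RE.finditer(line):
        tok = match.groupdict()
        pos_field = tok.pop("pos")
        # Keep every (tag, probability) pair the tagger supplied.
        tok["tags"] = [(tag, float(p)) for tag, p in POS_RE.findall(pos_field)]
        for key in ("id", "start", "end", "cfrom", "cto"):
            tok[key] = int(tok[key])
        tokens.append(tok)
    return tokens


line = ('(1, 0, 1, <0:5>, 1, "ellos" "ellos", 0, "pp3mp00", "pp3mp00" 1) '
        '(2, 1, 2, <6:13>, 1, "dormir" "duermen", 0, "vmip3p0", "vmip3p0" 1)')
for tok in parse_yy(line):
    print(tok["form"], tok["surface"], tok["tags"])
```

Each token here carries exactly one tag with probability 1; that tag is what the generic entry's +POS.+TAGS constraint has to unify with.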

However, I understand that eventually I will also need the token mapping rules, such as the ones found here: srg/tmr at olzama-dev · delph-in/srg · GitHub

I copied those over from Zhong: only three files, prelude, pos, and finis. If I actually include them in the grammar, it stops parsing sentences.

I think I can spot that it loses the POS information for the tokens, the list of TAGS is empty:

[image: token feature structure in which the +TAGS list is empty]

Instead, it should look like this in order to further unify with the POS-tag-specific “inflectional rule” and then further go into the head-subject rule (this is from the grammar which does not include token mapping rules):

Does anyone more familiar with how the token mapping rules work see what would cause the POS TAGS list to become empty after including the token mapping rules in the grammar?

Here’s some ACE standard output for Ellos duermen, which it ultimately cannot parse. All of the messages seem to have to do with the generics, so they may not be relevant for the native entry, though the unknown word doesn’t get parsed either in the corresponding example:

LATTICE-MAPPING: applying default_carg_tmr
LATTICE-MAPPING: applying default_carg_tmr
LATTICE-MAPPING: applying pos_terminate_tmr
LATTICE-MAPPING: applying pos_terminate_tmr
LATTICE-MAPPING: applying generic_pred_tmr
LATTICE-MAPPING: applying generic_pred_tmr
LATTICE-MAPPING: applying default_class_tmr
LATTICE-MAPPING: applying default_class_tmr
LATTICE-MAPPING: applying generic_non_ne+native_lfr
LATTICE-MAPPING: applying generic_non_ne+native_lfr
LATTICE-MAPPING: applying generic_non_ne+native_lfr
LATTICE-MAPPING: applying generic_non_ne+native_lfr
LATTICE-MAPPING: applying generic_non_ne+native_lfr
LATTICE-MAPPING: applying generic_non_ne+native_lfr
LATTICE-MAPPING: applying generic_non_ne+native_lfr
LATTICE-MAPPING: applying generic_non_ne+native_lfr
LATTICE-MAPPING: applying generic_non_ne+native_lfr
LATTICE-MAPPING: applying generic_non_ne+native_lfr
LATTICE-MAPPING: applying generic_non_ne+native_lfr
LATTICE-MAPPING: applying generic_non_ne+native_lfr
SKIP: (yy mode)

I just noticed that in fact when I am compiling the grammar with ACE, I get this warning:

WARNING: lattice mapping rule 'default_carg_tmr' had +CONTEXT *list* (not *null* or *cons*)
WARNING: suppressing more lattice mapping warnings

Probably relevant to the fact that the grammar doesn’t work with default_carg_tmr (and other tmr) included?

In interactive unification, I am getting no failures anywhere, including when dragging token edges onto the token portion of the lexemes. And yet I get a SKIP from ACE when parsing.

In ERG (and probably Jacy, where you got these token mapping rules), input tokens are presumed to come with a list of POS tags, i.e. possibly more than one guess from the tagger, with different probabilities. The token mapping rules take an input token that looks like, for example, [+TRAIT trait, +POS.+TAGS ["JJ", "VBG"] ], indicating a word that the tagger thought might be an ordinary adjective and might be a gerund (forgive me if my memory of PTB tags is imperfect; this is just by way of explanation). The rules have the effect of converting this into three new tokens, looking something like this: [+TRAIT generic, +POS.+TAGS ["JJ"] ], [+TRAIT generic, +POS.+TAGS ["VBG"] ], [+TRAIT native, +POS.+TAGS null ].

Next, lexical lookup is applied to all the tokens. Native lexical entries stipulate [+TRAIT native] and therefore only fire for (at most) one of the three tokens. Generic entries stipulate [+TRAIT generic] as well as a specific POS tag, and therefore also fire for (at most) one of the tokens. If both generic and native lexical entries are licensed, then the lexical filtering rule phase eliminates the generics. As far as I can see, that likely explains why you are seeing tokens with empty POS lists: they are the native copies.
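The expansion, lookup, and filtering steps described above can be sketched as a toy model. This is not how ACE or the TDL rules are implemented: the `Token` and `Entry` classes and the `expand`, `licenses`, and `lookup` functions are hypothetical names made up for illustration, and the model looks at one input token at a time rather than at a chart. The two lexicon entries are the ones quoted at the top of the thread.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Token:
    form: str
    trait: str                   # "trait" (underspecified), "generic" or "native"
    tags: Tuple[str, ...] = ()   # POS tags still attached to this token


@dataclass(frozen=True)
class Entry:
    name: str
    trait: str                   # which kind of token this entry accepts
    stem: Optional[str] = None   # native entries match on the stem
    tag: Optional[str] = None    # generic entries match on one POS tag


def expand(token: Token) -> List[Token]:
    """Split one tagged token into one generic token per tag, plus one
    native token whose tag list is empty."""
    generics = [Token(token.form, "generic", (tag,)) for tag in token.tags]
    return generics + [Token(token.form, "native", ())]


def licenses(entry: Entry, token: Token) -> bool:
    """An entry fires only for a token of its own trait."""
    if entry.trait != token.trait:
        return False
    if entry.trait == "native":
        return entry.stem == token.form
    return token.tags == (entry.tag,)


def lookup(lexicon: List[Entry], token: Token) -> List[str]:
    """Look up every expanded token, then drop generics if a native fired."""
    hits = [e for t in expand(token) for e in lexicon if licenses(e, t)]
    if any(e.trait == "native" for e in hits):
        hits = [e for e in hits if e.trait == "native"]
    return [e.name for e in hits]


LEXICON = [
    Entry("dormir_v", "native", stem="dormir"),
    Entry("v_np_ge", "generic", tag="vmip3p0"),
]

print(lookup(LEXICON, Token("dormir", "trait", ("vmip3p0",))))
print(lookup(LEXICON, Token("sobreandar", "trait", ("vmip3p0",))))
```

In this model the native copy of every token has an empty tag list by construction, so an entry that demands both the native trait and a specific POS tag can never fire.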

You can see what the token chart looks like after each rule application with :break all in LUI mode. That may help with understanding what each rule does.

I don’t see an error from ACE about a lexical gap in the output you provided. That suggests to me that there are lexical entries being licensed for every input token. Are you able to pull up a parse chart using :l after the parse fails?

Hi Woodley @sweaglesw! Thanks a lot for taking a look. Yesterday we had a Zoom meeting, and @Dan noticed that the first rule in tmr/pos.tdl was essentially deleting the POS tags (turning the list into null, which somehow was what was needed in the Jacy/Zhong universe; I am not very clear on this part). Simply getting rid of that rule seems to have fixed things, as the next rule in that file carefully copies the tags (which is what I wanted).

The previous post isn’t quite right: what is needed (thanks, @Dan) is to completely exclude the files prelude.tdl and pos.tdl. finis.tdl can be included in a modified form.
