Generic entries with inflections unexpectedly pass constant lrules

This is a continuation of a prior topic about token-mapping.

A problem that I cannot solve at present: for a generic lexical entry that allows both lrule and irule inflections, when I try to parse an inflected form of the LE (i.e. one that requires an explicit suffix from an irule), the LKB also lets the lrule version, which adds no suffix, pass into the phrase rules.

To be more explicit, I have the following type in gle.tdl:

generic_card_ne := numeral-adj-lex &
  [ STEM < "_generic_card_ne_" >,
    TOKENS.+LIST < [ +CLASS card_ne, +CARG #carg ] >, 
    SYNSEM.LKEYS.KEYREL.CARG #carg ].

Related tmr definition:

card_ne_2_tmr := basic_ne_tmt &
[ +INPUT < [ +FORM ^([0-9]+)(\-[a-w]+)?$ ] >,
  +OUTPUT < [ +CLASS card_ne, +CARG "${I1:+FORM:1}" ] > ].
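To see what the +FORM pattern in card_ne_2_tmr does and does not capture, here is a quick check in Python. This is only an approximation: Python's re module stands in for the LKB/ACE regex engine, which should behave the same for a simple pattern like this one, and the helper name card_ne_carg is hypothetical. Group 1 (what "${I1:+FORM:1}" copies into +CARG) is just the digits; the suffix lands in group 2, which the rule does not reference.

```python
import re
from typing import Optional

# The +FORM pattern from card_ne_2_tmr, written as a Python regex.
CARD_NE_FORM = re.compile(r"^([0-9]+)(\-[a-w]+)?$")

def card_ne_carg(form: str) -> Optional[str]:
    """Return capture group 1 (what "${I1:+FORM:1}" copies into +CARG),
    or None if the form does not match and the rule would not fire."""
    m = CARD_NE_FORM.match(form)
    return m.group(1) if m else None

for form in ["15", "15-nik", "15-p", "15-xyz", "atausiq-nik"]:
    print(form, "->", card_ne_carg(form))
```

Here "15", "15-nik" and "15-p" all yield "15"; "15-xyz" fails because x, y and z fall outside [a-w], and "atausiq-nik" fails because it does not start with digits.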

And tmt definitions:

basic_ne_tmt := one_one_tmt &
[ +INPUT < [ +FORM #form, +CLASS no_class, 
             +TRAIT [ +LB #lb, +RB #rb, +LD #ld, +RD #rd ],
             +PRED #pred, +CARG #carg ] >,
  +OUTPUT < [ +FORM #form, +CLASS named_entity,
              +TRAIT [ +LB #lb, +RB #rb, +LD #ld, +RD #rd ],
              +PRED #pred, +CARG #carg ] >,
  +CONTEXT <> ].

basic_one_one_tmt := token_mapping_rule &
[ +INPUT.FIRST [ +ID #id, +FROM #from, +TO #to ],
  +OUTPUT.FIRST [ +ID #id, +FROM #from, +TO #to ] ].

one_one_tmt := basic_one_one_tmt &
[ +INPUT < [] >,
  +OUTPUT < [] >,
  +POSITION "O1@I1" ].
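Putting basic_one_one_tmt, basic_ne_tmt and card_ne_2_tmr together, the sketch below models what the output token looks like. It is a toy Python model, not the LKB's chart-mapping implementation: the Token class and the function are hypothetical, only the features relevant here are modelled, and TRAIT and PRED (also coindexed in basic_ne_tmt) are left out. CARG simply takes the value card_ne_2_tmr specifies, matching the CARG "15" reported in this thread. The point it illustrates is that +ID, +FROM, +TO and +FORM are coindexed between input and output, so the output token still has FORM "15-nik", which is consistent with the trace line "Adding edge 9 for lexical entry generic_card_ne 15-nik" reported later in this thread.

```python
import re
from dataclasses import dataclass, replace
from typing import Optional

@dataclass(frozen=True)
class Token:
    id: int                     # +ID
    start: int                  # +FROM
    end: int                    # +TO
    form: str                   # +FORM
    cls: str                    # +CLASS
    carg: Optional[str] = None  # +CARG

CARD_NE_FORM = re.compile(r"^([0-9]+)(\-[a-w]+)?$")

def card_ne_2_tmr(tok: Token) -> Optional[Token]:
    # +INPUT: CLASS must be no_class (basic_ne_tmt) and FORM must match
    # the pattern (card_ne_2_tmr); otherwise the rule does not fire.
    if tok.cls != "no_class":
        return None
    m = CARD_NE_FORM.match(tok.form)
    if m is None:
        return None
    # +OUTPUT: ID, FROM, TO (basic_one_one_tmt) and FORM (#form in
    # basic_ne_tmt) are coindexed with the input, so they are copied;
    # CLASS and CARG take the values that card_ne_2_tmr specifies.
    return replace(tok, cls="card_ne", carg=m.group(1))

out = card_ne_2_tmr(Token(id=1, start=0, end=6, form="15-nik", cls="no_class"))
print(out.form, out.cls, out.carg)
```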

numeral-adj-lex can go through the following irules:

pl_abs_attr_adj-suffix :=
%suffix (* -t)
pl_abs_attr_adj-lex-rule.

sg_erg_attr_adj-suffix :=
%suffix (* -p)
sg_erg_attr_adj-lex-rule.

pl_erg_attr_adj-suffix :=
%suffix (* -t)
pl_erg_attr_adj-lex-rule.

sg_ins_attr_adj-suffix :=
%suffix (* -mik)
sg_ins_attr_adj-lex-rule.

pl_ins_attr_adj-suffix :=
%suffix (* -nik)
pl_ins_attr_adj-lex-rule.

sg_incorp_real_v_attr_adj-suffix :=
%suffix (* -mik)
sg_incorp_real_v_attr_adj-lex-rule.

pl_incorp_real_v_attr_adj-suffix :=
%suffix (* -nik)
pl_incorp_real_v_attr_adj-lex-rule.

pl_incorp_npred_attr_adj-suffix :=
%suffix (* -t)
pl_incorp_npred_attr_adj-lex-rule.

;;;mod [TODO: choices]
sg_loc_attr_adj-suffix :=
%suffix (* -mi)
sg_loc_attr_adj-lex-rule.

pl_loc_attr_adj-suffix :=
%suffix (* -ni)
pl_loc_attr_adj-lex-rule.
;;;endmod
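As a rough illustration of which of these irules can account for the suffix in "15-nik", the sketch below just checks each %suffix pattern against the end of the word. It is a simplification, assuming (* -x) means "append -x to any stem" and ignoring letter sets, rule ordering and multi-rule chains; the dictionary simply transcribes the rules listed above, and candidate_analyses is a hypothetical helper.

```python
from typing import List, Tuple

# Suffixes of the %suffix irules above; (* -x) is read as "append -x".
IRULE_SUFFIXES = {
    "pl_abs_attr_adj-suffix": "-t",
    "sg_erg_attr_adj-suffix": "-p",
    "pl_erg_attr_adj-suffix": "-t",
    "sg_ins_attr_adj-suffix": "-mik",
    "pl_ins_attr_adj-suffix": "-nik",
    "sg_incorp_real_v_attr_adj-suffix": "-mik",
    "pl_incorp_real_v_attr_adj-suffix": "-nik",
    "pl_incorp_npred_attr_adj-suffix": "-t",
    "sg_loc_attr_adj-suffix": "-mi",
    "pl_loc_attr_adj-suffix": "-ni",
}

def candidate_analyses(form: str) -> List[Tuple[str, str]]:
    """Return (rule, stem) pairs for every irule whose suffix ends `form`,
    keeping only analyses that leave a non-empty stem."""
    return [
        (rule, form[: -len(suffix)])
        for rule, suffix in IRULE_SUFFIXES.items()
        if form.endswith(suffix) and len(form) > len(suffix)
    ]

for rule, stem in candidate_analyses("15-nik"):
    print(rule, "stem:", stem)
```

For "15-nik" only the two -nik rules, pl_ins_attr_adj-suffix and pl_incorp_real_v_attr_adj-suffix, match, each leaving the stem "15".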

And the following lrules:

sg_abs_attr_adj-lex := sg_abs_attr_adj-lex-rule.

sg_incorp_npred_attr_adj-lex := sg_incorp_npred_attr_adj-lex-rule.

Now I try to parse “15-nik”. I can see from the LKB parse chart that the generic lexical entry with CARG “15” can pass pl_ins_attr_adj-suffix (-nik) and cannot pass, e.g., sg_erg_attr_adj-suffix (-p), which is expected. Surprisingly, however, it can also pass the lrule sg_abs_attr_adj-lex and then directly enter a unary phrase rule without being killed. sg_abs_attr_adj-lex is not the daughter of any further lexical rules (neither irules nor lrules).

What should I do to rule out lrule applications that do not respect the original form with its inflection? I cannot find a good example in the ERG to start from: numerals with a trailing s (like “1990s”) are captured whole as plur_ne, and token mapping classifies invented nouns such as “404s” in “there are two 404s” as a singular generic_proper_ne that goes through n_sg_ilr, not as an unknown word with plural inflection.

The normal mechanism for forcing lexemes to undergo an inflectional rule is the attribute INFLECTED, which gets the value infl-satisfied after the lexeme undergoes an inflectional rule. So each of your derivational rules needs to constrain the value of INFLECTED appropriately. If your lrules do not add affixes and should apply after the inflectional rules, they should inherit from the type non-affix-bearing, which constrains INFLECTED to be infl-satisfied; but these lrules also need to constrain DTR.INFLECTED to be infl-satisfied (which forces the lexeme to undergo an inflectional rule first). If, on the other hand, these lrules should apply before the inflectional rules, they should not inherit from non-affix-bearing, but they still need to constrain INFLECTED appropriately on both DTR and mother, so that the INFLECTED value of the lrule's output still fails to unify with infl-satisfied. The output is then forced to undergo an inflectional rule before it can participate in syntactic phrases.
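The INFLECTED mechanism described above can be sketched as a toy Python model in which a single boolean stands in for INFLECTED being infl-satisfied (in a real grammar INFLECTED may be a structured value, so this is a simplification). The function names are hypothetical, some_lrule is a made-up lrule name, and unary_phrase is a placeholder for whatever unary phrase rule is involved. The model shows the two options: an lrule that applies before inflection leaves the flag unsatisfied, so its output cannot feed a phrase rule until an irule applies, while an lrule that applies after inflection (the non-affix-bearing case) requires its daughter to be satisfied already.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass(frozen=True)
class Edge:
    rules: Tuple[str, ...]   # derivation so far, innermost first
    inflected: bool          # True stands for INFLECTED infl-satisfied

def lexeme(name: str) -> Edge:
    # A lexical entry starts out not infl-satisfied.
    return Edge((name,), inflected=False)

def irule(name: str, dtr: Edge) -> Edge:
    # An inflectional rule makes its mother infl-satisfied.
    return Edge(dtr.rules + (name,), inflected=True)

def lrule_before_infl(name: str, dtr: Edge) -> Optional[Edge]:
    # Applies before inflection: DTR and mother both stay unsatisfied.
    if dtr.inflected:
        return None
    return Edge(dtr.rules + (name,), inflected=False)

def lrule_after_infl(name: str, dtr: Edge) -> Optional[Edge]:
    # non-affix-bearing style: DTR and mother must both be infl-satisfied.
    if not dtr.inflected:
        return None
    return Edge(dtr.rules + (name,), inflected=True)

def phrase_rule(name: str, dtr: Optional[Edge]) -> Optional[Edge]:
    # Syntax only accepts daughters whose INFLECTED is infl-satisfied.
    if dtr is None or not dtr.inflected:
        return None
    return Edge(dtr.rules + (name,), inflected=True)

le = lexeme("generic_card_ne")
print(phrase_rule("unary_phrase", lrule_before_infl("some_lrule", le)))  # None
print(phrase_rule("unary_phrase", irule("pl_ins_attr_adj-suffix", le)) is not None)  # True
print(lrule_after_infl("some_lrule", le))  # None
```

In this model the only way for the bare lexeme to reach a phrase rule is through an irule, either directly or after a before-inflection lrule.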

Hi Dan, thanks for your reply, but I think the problem might lie somewhere else. The problem I’ve described never happens with inflected forms of my native lexical entries of the same lexical type: the lrules in question do not apply to them, as expected, but they do unexpectedly get applied to the generic LEs, which is the source of my confusion.

For example, this is an LE in lexicon.tdl with the same lexical type:

atausiq := numeral-adj-lex &
  [ STEM < "atausiq" >,
    SYNSEM.LKEYS.KEYREL.CARG "1" ].

When I try “atausiq-nik”, sg_abs_attr_adj-lex never shows up in the chart. But for “15-nik”, which should get the type generic_card_ne (a subtype of numeral-adj-lex without any modification regarding INFLECTED), it does.

I remember having similar issues with the interaction of inflectional rules and the ERG generic lexical entries when I first introduced them into the grammar. In the end I settled on making the generic entries always be fully inflected. This has the disadvantage of not allowing any lexeme-to-lexeme rules to apply to these generic entries, but it ensures that I don’t see the kind of unwanted interactions you’re reporting. I still think you could look more closely at the value of INFLECTED at each step of the derivation of the lexical edge going into the parse chart. You probably already know, but there is a graphical tool in LUI for viewing the token chart to inspect the sequence of application of chart-mapping rules. You can invoke it by starting LUI and then giving it the command
:break all
before entering a word or phrase to parse. To advance step-wise through the rule applications, type “q” to close the current display window and bring up the next. You can also type
:break
followed by the name of a rule, and then enter your word or phrase; token mapping will then proceed from the beginning and stop just after applying that rule.

Hope this helps.

Hi Dan,

Thanks for the reply. In fact I am not sure how to start LUI from the command line. It might be related to ACE, but my only previous experience with LUI was probably using (lui-initialize) in the LKB, and I couldn’t find the equivalent of :break there (or maybe it is documented somewhere on the LUI documentation page and I missed it). Could you provide more details so I can figure it out?

Those colon commands to LUI are specific to ACE. In LKB, after (lui-initialize) you can get similar results to a subset of ACE’s colon commands. Some of these are listed at https://delph-in.github.io/docs/tools/AceLui/. For example, :c in ACE corresponds to Parse → Show parse chart in the LKB.

On that AceLui page there’s no mention of :break (I didn’t know about it), so thanks Dan for explaining it. In the LKB, you can get a trace of chart mapping rule applications by setting the variable *cm-debug* to a non-nil value (see my LKB Update 2022 talk slides). However, this trace shows different things to the ACE/LUI token chart debugging facility, and won’t give you values of features in chart edges.

Hi @johnca, thanks for the info. The LKB started to show trace output after I typed (setf *cm-debug* 3) and parsed the sentence “15-nik kiak-vuq”:

I’m not sure how to interpret the information, but to me:

  • pl_ins-suffix is an inflecting LR that happens to share the same form “-nik”. According to the position class constraints it should not attach to a numeral, and indeed it does not show up in the parse chart.

  • pl_ins_attr_adj-suffix and pl_incorp_real_v_attr_adj-suffix are two inflecting LRs that should appear in the parse chart (although the syntax will eventually reject one of them), so I expect them to appear here.

  • The line Adding edge 9 for lexical entry generic_card_ne 15-nik looks like the reason why I unexpectedly get sg_abs_attr_adj-lex in the parse chart: it seems to treat the whole of “15-nik” as the STEM of an LE, which was not my intention, as only “15” should belong to the LE. Is my way of writing card_ne_2_tmr incorrect? Another possibility is that “15-nik” also passes the following default class tmr:

    default_class_tmr := one_one_tmt &
    [ +INPUT < [ +FORM #form, +TRAIT #trait, +CLASS no_class,
                 +PRED #pred, +CARG #carg ] >,
      +OUTPUT < [ +FORM #form, +TRAIT #trait, +CLASS non_ne,
                  +PRED #pred, +CARG #carg ] >,
      +CONTEXT < > ].

But I remember this rule was required for all the other non-generic words to appear in the parse chart at all, and I remember that the ERG has similar tmrs.
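Whether “15-nik” can also go through default_class_tmr depends on rule order. In chart mapping, rules are applied in sequence and the +INPUT tokens of a rule that fires are replaced by its +OUTPUT, so once card_ne_2_tmr has turned the token's +CLASS into card_ne, the no_class requirement of default_class_tmr can no longer be met. The toy Python model below illustrates this under those assumptions; it is not the LKB's implementation, the Token class and run function are hypothetical, and a single pass per rule is enough here only because each rule fires on no_class tokens and never outputs one.

```python
import re
from dataclasses import dataclass, replace
from typing import Callable, List, Optional

@dataclass(frozen=True)
class Token:
    form: str                   # +FORM
    cls: str = "no_class"       # +CLASS
    carg: Optional[str] = None  # +CARG

Rule = Callable[[Token], Optional[Token]]

def card_ne_2_tmr(tok: Token) -> Optional[Token]:
    # Fires only on no_class tokens whose FORM matches the pattern.
    if tok.cls != "no_class":
        return None
    m = re.match(r"^([0-9]+)(\-[a-w]+)?$", tok.form)
    return replace(tok, cls="card_ne", carg=m.group(1)) if m else None

def default_class_tmr(tok: Token) -> Optional[Token]:
    # Fires on any token that is still no_class.
    return replace(tok, cls="non_ne") if tok.cls == "no_class" else None

def run(rules: List[Rule], chart: List[Token]) -> List[Token]:
    for rule in rules:  # rules are tried one after another, in order
        new_chart = []
        for tok in chart:
            out = rule(tok)
            # A one-to-one rule replaces its input token with its output.
            new_chart.append(tok if out is None else out)
        chart = new_chart
    return chart

for t in run([card_ne_2_tmr, default_class_tmr],
             [Token("15-nik"), Token("kiak-vuq")]):
    print(t.form, t.cls, t.carg)
```

With the order reversed, default_class_tmr would claim “15-nik” first and card_ne_2_tmr would never see it, so checking where default_class_tmr sits relative to the NE rules is one way to test the second possibility.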

As for ACE, I am still exploring the proper way to enable token mapping. It seems that something optional in the LKB script, such as token-postags-path, is required in the ACE config, but for the moment I am not expecting to have POS information available from the morphological analyzer.