Debugging SRG generic lexical entry updates

(See also Handling UNKnown words with ACE+SRG for more general discussion).

I am trying to work from an example (using jacy) to introduce token mapping and properly working generic lexical entries into the SRG.

It’s a bit complex because there are many files and features involved, and they are handled slightly differently in e.g. ERG and jacy, so I may have mixed up some instructions and advice from various sources.

After I made the modifications to the grammar described below, I started getting the following error when compiling the grammar with ACE. The error looks completely unrelated to generic entries, so I suspect I messed something up in the feature geometry:

ace -g ace/config.tdl -G srg-generics.dat
reading configuration       from `ace/config.tdl'
grammar version             SRG (1007)
reading grammar             from `ace/../srg.tdl'
reading lexical-filtering-rulefrom `ace/../lfr.tdl'
reading types               from `ace/../fundamentals.tdl'
reading types               from `ace/../hdtypes.tdl'
reading types               from `ace/../letypes.tdl'
reading types               from `ace/../irtypes.tdl'
reading types               from `ace/../lrtypes.tdl'
reading types               from `ace/../srtypes.tdl'
reading types               from `ace/../tmt.tdl'
reading token-mapping-rule  from `ace/../tmr/prelude.tdl'
reading token-mapping-rule  from `ace/../tmr/pos.tdl'
reading token-mapping-rule  from `ace/../tmr/finis.tdl'
reading lexical entries     from `ace/../lexicon.tdl'
reading generic-lex-entry   from `ace/../generics.tdl'
reading rules               from `ace/../srules.tdl'
reading lexical rules       from `ace/../lrules.tdl'
reading lexical rules       from `ace/../inflr.tdl'
reading instance            from `ace/../labels.tdl'
reading instance            from `ace/../roots.tdl'
checking for glbs...        0.17 sec
processing constraints...   well-formedness: no type is compatible with features and type of `clitic-synsem'
	while processing `n_-_pr-impers-se_lex'
ace: type.c:562: constrain_glbs_and_wellformedness: Assertion `wellform_result == 0' failed.
Aborted (core dumped)

Looking at n_-_pr-impers-se_lex, it’s this:

n_-_pr-impers-se_lex := clitic-lex & 
  [ SYNSEM clitic-synsem & 
           [ LOCAL [ AGR.PRONTYPE impers,
                     CAT.HEAD.CASE none ] ] ].

Clitic types:

; --- Clitics
; - depend phonologically on the verb and can't appear alone; can't appear in a coordination nor be 
; selected by identity (e.g. *Juan lo y la trajo; *Juan la lavó y regaló); may produce phonological 
; processes in the verb (e.g. sentad+os => sentaos); when appear in a sequence, this can't be interrupted 
; (e.g. *lo puede darme)

clitic-lex := norm-zero-arg &
  [ SYNSEM [ PUNCT [ LPUNCT no_punct,
                     RPUNCT no_punct ],
             LOCAL [ COORD -,
                     COORD-STRAT zero,
                     CAT.HEAD noun ] ] ].

; clitic-synsem := lex-synsem &
clitic-synsem := non-canonical &
  [ LOCAL [ AGR #index,
            CAT [ MC na,
                  HEAD noun &
                      [ PRD non-prd,
                        MOD < >,
                        KEYS [ KEY pron_rel,
                               ALT2KEY #alt2key ] ],
	          VAL [ SUBJ < >,
                        COMPS < >,
                        SPEC < >,
                        SPR < >,
                        CLTS < > ] ] ,
            CONT [ HOOK [ LTOP #nhand,
                          INDEX #index,
                          XARG #nhand ],
                   RELS <! [ PRED pron_rel,
                             ARG0 #index,
                             LBL #nhand ], 
                           [ PRED #alt2key & pronoun_q_rel,
                             ARG0 #index,
                             RSTR #phand ] !>,
                   HCONS <! qeq & [ HARG #phand,
                                    LARG #nhand ] !> ] ],
    NON-LOCAL [ SLASH 0-dlist,
                REL 0-dlist,
                QUE 0-dlist ] ]. 

What should I be looking for? Again, I doubt the clitic type needs to be changed; most likely something much higher up in the hierarchy? At the level of non-canonical, maybe?

The modifications I made to the grammar:

  1. Added tmt.tdl from jacy
  2. From the folder called tmr in jacy, I added the files: prelude.tdl, pos.tdl, and finis.tdl
  3. In srg.tdl, I added:
:begin :instance :status lexical-filtering-rule.
:include "lfr.tdl".
:end :instance.

:include "tmt.tdl".

:begin :instance :status generic-lex-entry.
:include "generics.tdl".
:end :instance.
:begin :instance :status token-mapping-rule.
   :include "tmr/prelude".
   :include "tmr/pos".
   ;:include "tmr/pos-ipa".
   :include "tmr/finis".
:end :instance.
  4. In ace/config.tdl:
token-mapping := enabled.

lexicon-tokens-path := TOKENS +LIST.
lexicon-last-token-path := TOKENS +LAST.
token-type      := token.
token-form-path     := +FORM.       ; [required] string for lexical lookup
token-id-path       := +ID.         ; [optional] list of external ids
token-from-path     := +FROM.       ; [optional] surface start position
token-to-path       := +TO.         ; [optional] surface end position
token-postags-path  := +POS +TAGS.  ; [optional] list of POS tags
token-posprobs-path := +POS +PRBS.  ; [optional] list of POS probabilities
lattice-mapping-input-path := +INPUT.
lattice-mapping-output-path := +OUTPUT.
lattice-mapping-context-path := +CONTEXT.
lattice-mapping-position-path := +POSITION.
  5. Added a file called lfr.tdl, with just one entry in it for now:
generic_non_ne+native_lfr := lexical_filtering_rule &
 [ +CONTEXT < [ TRAITS native_token_list ] >,
   +INPUT   < [ TRAITS generic_token_list ] >,
   +OUTPUT < >,
   +POSITION "I1@C1" ].
  6. In generics.tdl, commented out everything except, for now:
 v_np_ge := v_np*_le &
[ STEM < "_generic_v_rel" >,
  TOKENS.+LIST generic_token_list & 
                < [ +POS.+TAGS "vmip3p0" ], 
                    +PRED #pred >,
  7. In fundamentals.tdl:
 sign-min := avm &
  [ STEM orthog ].

orthog := cons &
  [ FROM string,
    TO string ].  

  8. In lexicon.tdl, for now, just modified one entry:
dormir_v := v_-_le & 
  [ STEM < "dormir" >, 
    SYNSEM.LKEYS.KEYREL.PRED "_dormir_v_rel",
    TRAITS native_token_list ].

Which of the changes above could be causing the problem that shows up during compilation at the n_-_pr-impers-se_lex entry?

Or, regardless of how the problem shows up, can someone spot anything I am doing wrong in the modifications above?

Woodley says:

It’s hard to tell exactly what’s going on from that message, but based on the rest of your message, I wonder where the TRAITS feature is introduced?

One debugging strategy might be to add the changes one step at a time. I think tmt.tdl has most of the exciting feature geometry in it. Can you add just that file to your original SRG (without actually enabling token handling or writing any rules or referencing the new geometry in your core definitions) and still compile successfully?
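
As a rough sketch of that first step, srg.tdl could keep only the new type file and leave the other additions commented out. The file names are the ones listed in the modifications above; where exactly the tmt.tdl include sits among the existing type includes is an assumption. Token mapping would also stay switched off in ace/config.tdl, and the TRAITS line would stay out of lexicon.tdl, until this much compiles.

```tdl
;; srg.tdl, bisection step 1 (a sketch): load only the new type file.
;; Assumption: this line sits with the other type-file includes.
:include "tmt.tdl".

;; Left disabled until step 1 compiles, then re-enabled one block at a time:
; :begin :instance :status lexical-filtering-rule.
; :include "lfr.tdl".
; :end :instance.

; :begin :instance :status generic-lex-entry.
; :include "generics.tdl".
; :end :instance.

; :begin :instance :status token-mapping-rule.
;    :include "tmr/prelude".
;    :include "tmr/pos".
;    :include "tmr/finis".
; :end :instance.
```

If this compiles, each commented block can be restored in turn, recompiling after every change, so that the first failing compile points at the last block restored.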

We met today with Emily, Dan, Francis, and Alexandre, and in the end narrowed it down to exactly tmt.tdl. We don’t yet know which part of it, but we’ll find out :).


The problem is in this portion of tmt.tdl:

lex-item :+
; This links the CFROM and CTO to the position in the orthography.
[ TRAITS #traits,
  STEM [ FROM #from,
         TO #to ],
  SYNSEM.LKEYS.KEYREL [ CFROM #from,
                        CTO #to ],
  SYNSEM.LKEYS.ALTKEYREL [ CFROM #from,
                           CTO #to ],
  TOKENS tokens &
         [ +LIST #traits &
                 [ FIRST.+FROM #from ],
           +LAST.+TO #to ] ].

If I remove this part, the grammar compiles.

Specifically, the problem is here:

  SYNSEM.LKEYS.KEYREL [ CFROM #from,
                        CTO #to ],
  SYNSEM.LKEYS.ALTKEYREL [ CFROM #from,
                           CTO #to ],

Both these sets of constraints (the KEYREL and the ALTKEYREL) need to be removed in order for the grammar to compile.
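
For reference, with both sets of constraints taken out, the addendum in tmt.tdl would look roughly like this. It is a sketch of the trimmed version, keeping only the lines quoted above that do not mention CFROM or CTO:

```tdl
lex-item :+
; KEYREL/ALTKEYREL CFROM and CTO constraints removed for the SRG for now.
[ TRAITS #traits,
  STEM [ FROM #from,
         TO #to ],
  TOKENS tokens &
         [ +LIST #traits &
                 [ FIRST.+FROM #from ],
           +LAST.+TO #to ] ].
```

The #from and #to reentrancies still tie STEM to the token positions, so only the link from the relations to character positions is dropped.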

Are the features CFROM and CTO declared anywhere?


The features CFROM and CTO record in a predication the starting and ending character positions for the corresponding token in the sentence being parsed. You might want to add this often useful information into the SRG at some point, though it takes a little effort to make sure that every leaf lexical type assigns these values correctly. So for now it may be more convenient to delete these constraints in tmt.tdl for the SRG.


They are:

relation :+
  [ CFROM *top*,
    CTO *top*  ].

Okay, thanks, Dan!