Creating generic entries for the SRG

Many thanks to my colleagues for all their help with setting up the SRG generic machinery so far. (See also Handling UNKnown words with ACE+SRG for general discussion.)

I now have a grammar that includes token mapping additions (copied from jacy), a file with definitions for generic entries, and some definitions for lexical filtering rules. For the most part, these are copied wholesale from jacy, although I did have to remove a couple of things from tmt.tdl.

If I do not include the generics, the grammar parses sentences that do not contain unknown words, as expected. Yay!

If I include a generic entry as follows (note the commented-out TOKENS constraints):

v_np_ge := v_np*_le &
[ STEM < "v_np*_le" >,
  ;TOKENS.+LIST generic_token_list & 
  ;              < [ +POS.+TAGS  < "vmip3p0" > ] >,
  SYNSEM.LKEYS.KEYREL [ PRED "_generic_v_rel" ] ].

…then I get parses for sentences with and without unknown words (progress!)

Ultimately, though, I need the TOKENS constraints in order to do lexical filtering. If I uncomment the TOKENS constraints above, I again get the “Lexical Gap” error from ACE, i.e. it cannot find an entry for the unknown word. (Sentences without unknown words still parse.)
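To make the effect of the TOKENS constraint concrete, here is a deliberately simplified Python model (a sketch, not how ACE works internally): a generic entry without the constraint can cover any unknown token, while one with the constraint only covers tokens whose tag string matches exactly. The classes `Token` and `GenericEntry`, the function `lexical_gap`, and the example tag `ncms000` are hypothetical, and checking only the first tag is a simplification of unifying the `+POS.+TAGS` list.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Token:
    """Hypothetical, simplified input token: surface form plus POS tags."""
    form: str
    tags: List[str] = field(default_factory=list)


@dataclass
class GenericEntry:
    """Simplified stand-in for a generic lexical entry such as v_np_ge."""
    name: str
    # None models the commented-out TOKENS constraint: no tag requirement.
    required_tag: Optional[str] = None

    def licenses(self, token: Token) -> bool:
        if self.required_tag is None:
            # Without a TOKENS constraint the generic can cover any unknown token.
            return True
        # With the constraint, the tag string must match exactly (string
        # comparison is case-sensitive). Simplification: only the first tag
        # is checked, standing in for unification of the +TAGS list.
        return bool(token.tags) and token.tags[0] == self.required_tag


def lexical_gap(token: Token, generics: List[GenericEntry]) -> bool:
    """True if no generic entry can cover the unknown token."""
    return not any(g.licenses(token) for g in generics)


unconstrained = GenericEntry("v_np_ge")
constrained = GenericEntry("v_np_ge", required_tag="vmip3p0")
token = Token("xyz", ["ncms000"])  # hypothetical token with a non-verbal tag

print(lexical_gap(token, [unconstrained]))  # False: the generic fires
print(lexical_gap(token, [constrained]))    # True: "Lexical Gap"
```

In this model, adding the constraint turns the generic from a catch-all into a filter, which is exactly why a tag that fails to match surfaces as a lexical gap rather than as a bad parse.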

TOKENS are defined as follows:

lex-item :+
; This links the CFROM and CTO to the position in the orthography.
; OZ 15-Nov-2022: I am taking out CTO and CFROM constraints for now
; because they break compilation.
[ TRAITS #traits,
  STEM [ FROM #from,
	 TO #to ],
	;	        CTO #to	],
	;	           CTO #to	],
  TOKENS tokens &
	 [ +LIST #traits &
		 [ FIRST.+FROM #from ], 
	   +LAST.+TO #to ] ].

The problem in this case (solved by @sweaglesw) was a mismatch in capitalization between the POS tags in the grammar and those in the YY input. LUI was hiding this difference: in the end, the unification failure pointed to the POS feature but showed identical values on both sides, from which it was possible to guess that the underlying data differed in capitalization.
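Since LUI displays the two values identically, a quick check outside the grammar can make such mismatches visible. The Python sketch below assumes the POS tags have already been collected into plain lists of strings (one from the generic entries' `+POS.+TAGS` constraints, one from the YY input); extracting them from the TDL files and the YY input is left out, and the function name `case_only_mismatches` and the example tag values are hypothetical. It reports input tags that match a grammar tag only when case is ignored.

```python
from typing import Dict, Iterable, List, Set, Tuple


def case_only_mismatches(
    grammar_tags: Iterable[str], input_tags: Iterable[str]
) -> List[Tuple[str, str]]:
    """Return (grammar_tag, input_tag) pairs that differ only in capitalization."""
    grammar: Set[str] = set(grammar_tags)
    # Index grammar tags by their lower-cased form.
    by_lower: Dict[str, Set[str]] = {}
    for tag in grammar:
        by_lower.setdefault(tag.lower(), set()).add(tag)

    mismatches: List[Tuple[str, str]] = []
    for tag in input_tags:
        if tag in grammar:
            continue  # exact match: no capitalization problem
        # Same letters, different case: the string values will not unify.
        for candidate in sorted(by_lower.get(tag.lower(), set())):
            mismatches.append((candidate, tag))
    return mismatches


grammar_tags = ["vmip3p0"]            # e.g. from the generic entry's TOKENS constraint
input_tags = ["VMIP3P0", "ncms000"]   # hypothetical tags read from the YY input
print(case_only_mismatches(grammar_tags, input_tags))  # [('vmip3p0', 'VMIP3P0')]
```

Tags with no case-insensitive counterpart (like `ncms000` here) are simply ignored, so the output lists only the suspicious near-matches that would otherwise look identical in LUI.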