Missing freeling tags (or Updating SRG lexicon for latest version of freeling?)

Thanks to Luis @lmc, Francis @bond, and @Dan, we can now run ACE on SRG, as documented here.

The dependency for morphological analysis there is Freeling, and we are currently experimenting with using the latest version. There are certainly some mismatches between the grammar lexicon and this latest version of the morphological analyzer, so I expect I will need to update the lexicon in some ways.

Luis wrote a script that maps the output of Freeling to yy-mode, which can then be passed to ACE. The script can be found here.

I started using the script on some Spanish sentences beyond the sample sentence, and I am running into an error (which I think comes from ACE):

ERROR: yy-mode input requested non-existant rule PR0CN00

For now, I cannot figure out what requires updating with respect to this tag that ACE thinks is missing.

The input sentence here is “Un lugar que no me gusta es mi antiguo instituto”, which I would expect to be covered by the SRG; the “missing” entry is for “que”, which is the complementizer:

Un lugar que no me gusta es mi antiguo instituto.
Un uno DI0MS0 0.99698
lugar lugar NCMS000 1
que que PR0CN00 0.550139     <<<<< THIS
no no RN 0.999297
me me PP1CS00 0.755196
gusta gustar VMIP3S0 0.99569
es ser VSIP3S0 1
mi mi DP1CSS 0.999325
antiguo antiguo AQ0MS00 0.994169
instituto instituto NCMS000 1

(I note the low probability for que; could that cause an issue somehow? The output is there, though, so I doubt it.)

The mapping script builds a yy-mode string using the same tag that comes from freeling. For example, if the input to the script is El perro duerme, then the output will be:

'(1, 0, 1, <0:2>, 1, "el" "El", 0, "DA0MS0", "DA0MS0" 1) (2, 1, 2, <3:8>, 1, "perro" "perro", 0, "NCMS000", "NCMS000" 1) (3, 2, 3, <9:15>, 1, "dormir" "duerme", 0, "VMIP3S0", "VMIP3S0" 0.989241) '
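To make the format concrete, here is a minimal Python sketch that rebuilds the string above from Freeling-style (form, lemma, tag, probability) tuples. It is not Luis's script: the function name `freeling_to_yy` is hypothetical, and the field layout is inferred only from the sample output in this post. Note that the tag is copied unchanged into the lexical rule slot, which is why ACE looks for a rule with exactly that name.

```python
def freeling_to_yy(sentence, analyses):
    """Build a yy-mode string from (form, lemma, tag, prob) tuples.

    Assumption: one analysis per token, in sentence order, and the
    field order matches the sample output shown in this thread.
    """
    tokens = []
    cursor = 0
    for i, (form, lemma, tag, prob) in enumerate(analyses):
        # Character span of the surface form in the original sentence.
        start = sentence.index(form, cursor)
        end = start + len(form)
        cursor = end
        # The Freeling tag is used twice: as the requested lexical
        # rule name and as the POS tag with its probability.
        tokens.append(
            f'({i + 1}, {i}, {i + 1}, <{start}:{end}>, 1, '
            f'"{lemma}" "{form}", 0, "{tag}", "{tag}" {prob})'
        )
    return " ".join(tokens)


example = freeling_to_yy(
    "El perro duerme",
    [
        ("El", "el", "DA0MS0", "1"),
        ("perro", "perro", "NCMS000", "1"),
        ("duerme", "dormir", "VMIP3S0", "0.989241"),
    ],
)
print(example)
```

Probabilities are kept as strings so they are passed through exactly as Freeling printed them.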

That string is then successfully parsed by ACE, which makes me think the SRG has all those tags somewhere. But I can’t figure out where: searching for them doesn’t yield anything.

I am missing something fundamental about how this works. Could someone help?

One of the fields in YY mode is a (list of?) lexical rules that the grammar must apply to the token before it is licensed to appear as input to a syntactic rule. It sounds like that field is being employed here, and the rule requested — named PR0CN00 — does not exist. Try looking in the files that define lexical / orthographemic rules.

I don’t know Spanish, but I wouldn’t expect an Indo-European complementizer to be subject to any orthographemic rules. Possibly the script needs a case to handle tags that don’t correspond to any rule requirements? Just a guess.

The lexical rule name should be PR0CN000; it seems to have got truncated. PP1CS00 and AQ0MS0 have also been similarly truncated.

I can successfully parse that sentence using the freelingSPPP binary in the logon distribution and LKB-FOS (having fixed a call to an external XML parsing library that seems to have changed its interface). Those lexical rules are a required part of the eventual analysis (see the screenshot below, in which I’ve just clicked the preterminal N node above “que”).

[Screenshot 2022-08-19 at 10.26.50]

So the way to go is probably to update the rule names in the SRG's inflr.tdl to match the tags produced by the newest version of Freeling.

For example, in order for the lexical rule to be found, I need to change aq0ms0 to aq0ms00 below:

; -- qualitatives
; bonito
aq0ms0 := 
%suffix (aq0ms0 aq0ms0) 

My question is: should I change it in all three occurrences? That is not needed for the sentence to parse, but it seems more consistent.

I would think that changing it everywhere to match the newest version of Freeling would be best. Otherwise it may be difficult to notice, with the naked eye, subtle differences like two or three zeros in a tag, which would make troubleshooting harder in the future.
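Since these differences are hard to spot by eye, a small script can list them before editing. The following is only a sketch: the helper names `rule_names` and `missing_tags` are hypothetical, the regular expression assumes each rule in inflr.tdl starts a line with its name followed by `:=` (as in the aq0ms0 snippet above), and comparing names case-insensitively is an assumption based on uppercase tags having parsed against lowercase rule names in this thread. The `pr0cn000` entry in the sample text is made up for illustration, following the rule name mentioned earlier.

```python
import re


def rule_names(tdl_text):
    """Collect names defined as 'name :=' at the start of a line, lowercased."""
    pattern = re.compile(r"^([\w+-]+)\s*:=", re.MULTILINE)
    return {m.group(1).lower() for m in pattern.finditer(tdl_text)}


def missing_tags(tags, rules):
    """For each tag without a same-named rule, list rules that differ
    from it only in the number of trailing zeros."""
    report = {}
    for tag in tags:
        key = tag.lower()
        if key in rules:
            continue
        stem = key.rstrip("0")
        report[tag] = sorted(r for r in rules if r.rstrip("0") == stem)
    return report


# Illustrative TDL fragment; in practice, read the text of inflr.tdl.
sample_tdl = """; -- qualitatives
; bonito
aq0ms0 :=
%suffix (aq0ms0 aq0ms0)

pr0cn000 :=
%suffix (pr0cn000 pr0cn000)
"""

report = missing_tags(["AQ0MS00", "PR0CN00", "NCMS000"], rule_names(sample_tdl))
for tag, candidates in report.items():
    print(tag, "->", candidates or "no near match")
```

A tag that comes back with an empty candidate list has no rule at all in the text that was scanned, rather than a zero-padding mismatch.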