Numbers in SRG

UPDATE: Numbers have nothing to do with it, but I am not going to edit the entire post. Please scroll to the last posts for the current state of affairs.

I am trying to understand how numbers are supposed to work in the SRG.

In the old grammar, here’s what I get:

More specifically, for tres:

With the new grammar, here’s what Freeling 4.0 is giving me:

(1, 0, 1, <0:4>, 1, "3" "tres", 0, "z", "z" 1)
(2, 1, 2, <5:11>, 1, "perro" "perros", 0, "ncmp000", "ncmp000" 1)
(3, 2, 3, <12:18>, 1, "ladrar" "ladran", 0, "vmip3p0", "vmip3p0" 1)
(4, 3, 4, <19:20>, 1, "." ".", 0, "fp", "fp" 1)

I do of course have the corresponding “inflectional rule”, analogous to the other ones, in the appropriate files, all of which are loaded:

; -- numbers
z := 
%suffix (z z)

ne_ilr :=  infl-ltow-rule & 

When trying to parse the above sentence with ACE, I get:

NOTE: lexemes do not span position 0 `3'!

If I try to replace 3 with tres in the YY input, it’s the same result:

NOTE: lexemes do not span position 0 `tres'!

Now, in the old grammar, there is a file which is apparently used to override some of Freeling’s output, and some of the content of this file is related to numbers:

##Rearrangements to SPPP output fields
## Rule form is:
##     form lemma tag  =>  stem rule_id form
##  On the left hand side:
##    "form", "lemma", and "tag" are regular expressions.
##    "*" may be used to mean "anything".
##    For "form" and "lemma" complete match will be checked.
##    For "tag" prefix match will be used.
##    Symbol "!" preceding the regexp negates it.
##  On the right hand side:
##    "stem" may be "F" (form), "L" (lemma), "T" (tag), or any lowercase literal.
##    "rule_id" may be "F" (form), "L" (lemma), or "T" (tag).
##    "form" may be any combination of "F", "L", and "T". form/lemma/tag will be 
##           concatenated in the given order, separated by "#".
##  Rules are applied in order, until a match is found, thus, a last default
##  rule "* * *" is needed.

*             *  !(Z|W|NP|AO)  =>  L  T  F   ## stem=lemma for everything except numbers, dates, NPs and AOs.
(un|una|uno)  *  Z             =>  F  T  FL  ## lemma="un/o/a" for "un/o/a" with tag Z (they had lemma="1")
*             *  *             =>  T  T  FL  ## stem=tag for the rest (numbers!="un/o/a", dates, NPs, AOs)
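To check my reading of the rule format, here is a rough Python sketch of how I understand these three rules to apply. It is not the actual SPPP code. The names `RULES`, `_matches` and `rewrite` are made up for illustration, and matching tags case-insensitively and emitting them in lowercase is my own assumption, made only so that the output lines up with the lowercase tags in the YY input above.

```python
import re

# The three rules quoted above, as
# (form, lemma, tag, stem, rule_id, form_spec).
RULES = [
    ("*", "*", "!(Z|W|NP|AO)", "L", "T", "F"),
    ("(un|una|uno)", "*", "Z", "F", "T", "FL"),
    ("*", "*", "*", "T", "T", "FL"),
]


def _matches(pattern, value, prefix=False):
    """Match one left-hand-side field against a value."""
    negate = pattern.startswith("!")
    if negate:
        pattern = pattern[1:]
    if pattern == "*":
        hit = True
    else:
        # Assumption: case-insensitive, since the rules say "Z" but
        # the YY input above carries "z".
        regex = re.compile(pattern, re.IGNORECASE)
        # Tags use a prefix match; form and lemma need a complete match.
        hit = bool(regex.match(value) if prefix else regex.fullmatch(value))
    return hit != negate


def rewrite(form, lemma, tag, rules=RULES):
    """Return (stem, rule_id, form) from the first rule that matches."""
    fields = {"F": form, "L": lemma, "T": tag.lower()}
    for f_pat, l_pat, t_pat, stem, rule_id, form_spec in rules:
        if (_matches(f_pat, form)
                and _matches(l_pat, lemma)
                and _matches(t_pat, tag, prefix=True)):
            # "F", "L", "T" select a field; anything else is a literal stem.
            new_stem = fields.get(stem, stem)
            new_form = "#".join(fields[code] for code in form_spec)
            return new_stem, fields[rule_id], new_form
    raise ValueError("no rule matched; a final '* * *' rule is needed")


print(rewrite("tres", "3", "z"))
print(rewrite("perros", "perro", "ncmp000"))
```

Under this reading, only the last rule matches tres, so its stem becomes the tag rather than the lemma “3”, and the form becomes tres#3. Ordinary words like perros fall under the first rule and keep their lemma as the stem.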

So it looks like the tag Z should perhaps trigger some spelling change, but I don’t understand what exactly. I even tried replacing 3 with tres#3, based on what I see in the reference structure, but that did not help. Also, Z is present in the reference parse (the rule ne_ilr is actually labeled Z in the reference tree).

So I am not sure what is going on. Does anyone have an idea about how to investigate? I tried using break-after but nothing shows up, no token structures of any kind, in this case.

Many thanks in advance.

Well, I think this might just be a case of a missing generic lexical type. And I think Freeling’s “3” needs to be replaced with the tag “Z”. I’ll investigate.

Update: Yes, this particular issue is solved by replacing Freeling’s output (“3”) with “z” in the input.
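In case it is useful to anyone, here is a minimal sketch of that replacement as a post-processing step on the YY string. The regular expression only covers tokens shaped like the ones in this post, the field names in the comment are my own reading of the layout, and `fix_number_stems` is a made-up name, not part of Freeling or ACE.

```python
import re

# One YY token as it appears in this thread (field names are my reading):
# (id, start, end, <from:to>, path, "stem" "surface", ipos, "rule", "tag" prob)
TOKEN = re.compile(
    r'\((?P<head>\d+, \d+, \d+, <\d+:\d+>, \d+, )'
    r'"(?P<stem>[^"]*)" "(?P<surface>[^"]*)"'
    r'(?P<mid>, \d+, )"(?P<rule>[^"]*)"'
    r'(?P<tail>, "[^"]*" [\d.]+)\)'
)


def fix_number_stems(yy, number_rule="z"):
    """Replace the stem with the tag for tokens whose rule is number_rule."""
    def repl(m):
        # Only number tokens are touched; every other stem is kept as-is.
        stem = number_rule if m["rule"] == number_rule else m["stem"]
        return '({}"{}" "{}"{}"{}"{})'.format(
            m["head"], stem, m["surface"], m["mid"], m["rule"], m["tail"])
    return TOKEN.sub(repl, yy)


yy = ('(1, 0, 1, <0:4>, 1, "3" "tres", 0, "z", "z" 1) '
      '(2, 1, 2, <5:11>, 1, "perro" "perros", 0, "ncmp000", "ncmp000" 1)')
print(fix_number_stems(yy))
```

With the two tokens above, only the first one changes: its stem “3” becomes “z”, and the perros token passes through untouched.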