UPDATE: Numbers have nothing to do with it, but I am not going to edit the entire post. Please scroll to the last posts for the current state of affairs.
I am trying to understand how numbers are supposed to work in the SRG.
In the old grammar, here’s what I get:
More specifically, for tres:
With the new grammar, here’s what Freeling 4.0 is giving me:
(1, 0, 1, <0:4>, 1, "3" "tres", 0, "z", "z" 1) (2, 1, 2, <5:11>, 1, "perro" "perros", 0, "ncmp000", "ncmp000" 1) (3, 2, 3, <12:18>, 1, "ladrar" "ladran", 0, "vmip3p0", "vmip3p0" 1) (4, 3, 4, <19:20>, 1, "." ".", 0, "fp", "fp" 1)
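For anyone wanting to inspect the fields of a YY line like the one above, here is a minimal Python sketch that splits it into tokens. The field names (`lookup_form`, `surface`, `lrule`, `pos`, and so on) are my own labels, based on my reading of the YY layout; they are not taken from ACE or Freeling. The regular expression only covers tokens shaped exactly like the ones in this post (one path, one rule, one tag).

```python
import re

# ASSUMPTION: field names are illustrative labels, not official ones.
# The pattern only handles tokens shaped like those in this post.
TOKEN_RE = re.compile(
    r'\(\s*(?P<id>\d+),\s*(?P<start>\d+),\s*(?P<end>\d+),\s*'
    r'<(?P<char_from>\d+):(?P<char_to>\d+)>,\s*(?P<path>\d+),\s*'
    r'"(?P<lookup_form>[^"]*)"\s+"(?P<surface>[^"]*)",\s*(?P<ipos>\d+),\s*'
    r'"(?P<lrule>[^"]*)",\s*"(?P<pos>[^"]*)"\s+(?P<prob>[\d.]+)\s*\)'
)

INT_FIELDS = ("id", "start", "end", "char_from", "char_to", "path", "ipos")


def parse_yy(line):
    """Return one dict per YY token found in the line."""
    tokens = []
    for match in TOKEN_RE.finditer(line):
        token = match.groupdict()
        for key in INT_FIELDS:
            token[key] = int(token[key])
        token["prob"] = float(token["prob"])
        tokens.append(token)
    return tokens


YY_LINE = (
    '(1, 0, 1, <0:4>, 1, "3" "tres", 0, "z", "z" 1) '
    '(2, 1, 2, <5:11>, 1, "perro" "perros", 0, "ncmp000", "ncmp000" 1) '
    '(3, 2, 3, <12:18>, 1, "ladrar" "ladran", 0, "vmip3p0", "vmip3p0" 1) '
    '(4, 3, 4, <19:20>, 1, "." ".", 0, "fp", "fp" 1)'
)

for tok in parse_yy(YY_LINE):
    print(tok["id"], tok["lookup_form"], tok["surface"], tok["lrule"])
```

Read this way, the first token carries "3" as the first string and "tres" as the second, with "z" as both the rule and the tag.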
I do, of course, have the “inflectional rule”, similar to the other ones, in the appropriate files, all of which are loaded:
; -- numbers
z :=
%suffix (z z)
ne_ilr.
ne_ilr := infl-ltow-rule &
[ SYNSEM.LOCAL.CAT.HEAD head ].
When trying to parse the above sentence with ACE, I get:
NOTE: lexemes do not span position 0 `3'!
If I replace 3 with tres in the YY input, the result is the same:
NOTE: lexemes do not span position 0 `tres'!
Now, in the old grammar, there is a file that is apparently used to overwrite some of Freeling’s output, and some of its content is related to numbers:
##Rearrangements to SPPP output fields
## Rule form is:
## form lemma tag => stem rule_id form
##
## On the left hand side:
## "form", "lemma", and "tag" are regular expressions.
## "*" may be used to mean "anything".
## For "form" and "lemma" complete match will be checked.
## For "tag" prefix match will be used.
## Symbol "!" preceding the regexp negates it.
##
## On the right hand side:
## "stem" may be "F" (form), "L" (lemma), "T" (tag), or any lowercase literal.
## "rule_id" may be "F" (form), "L" (lemma), or "T" (tag).
## "form" may be any combination of "F", "L", and "T". form/lemma/tag will be
## concatenated in the given order, separated by "#".
##
## Rules are applied in order, until a match is found, thus, a last default
## rule "* * *" is needed.
<Output>
* * !(Z|W|NP|AO) => L T F ## stem=lemma for everything except numbers, dates, NPs and AOs.
(un|una|uno) * Z => F T FL ## lemma="un/o/a" for "un/o/a" with tag Z (they had lemma="1")
* * * => T T FL ## stem=tag for the rest (numbers!="un/o/a", dates, NPs, AOs)
</Output>
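To check my reading of these rules, here is a minimal Python sketch of how I understand them to apply to a (form, lemma, tag) triple. It follows only the comments in the file quoted above and is not the actual SPPP code. Two things are assumptions on my part: the function names (`rearrange`, `field_matches`, `pick`) are made up, and tags are matched case-insensitively, since the rules use uppercase `Z` while the Freeling 4.0 output above has lowercase `z`.

```python
import re

# Rule table transcribed from the <Output> section quoted above:
# (form pattern, lemma pattern, tag pattern, stem, rule_id, form)
RULES = [
    ("*", "*", "!(Z|W|NP|AO)", "L", "T", "F"),
    ("(un|una|uno)", "*", "Z", "F", "T", "FL"),
    ("*", "*", "*", "T", "T", "FL"),
]


def field_matches(pattern, value, is_tag=False):
    """Complete match for form/lemma, prefix match for tag; '!' negates."""
    negate = pattern.startswith("!")
    if negate:
        pattern = pattern[1:]
    if pattern == "*":
        hit = True
    elif is_tag:
        # ASSUMPTION: tags compared case-insensitively (Z vs z).
        hit = re.match(pattern, value, re.IGNORECASE) is not None
    else:
        hit = re.fullmatch(pattern, value) is not None
    return hit != negate


def pick(spec, fields):
    """'F', 'L' or 'T' select a field; anything else is a literal."""
    return fields.get(spec, spec)


def rearrange(form, lemma, tag):
    """Apply the first matching rule; return (stem, rule_id, form)."""
    fields = {"F": form, "L": lemma, "T": tag}
    for f_pat, l_pat, t_pat, stem, rule_id, out_form in RULES:
        if (field_matches(f_pat, form)
                and field_matches(l_pat, lemma)
                and field_matches(t_pat, tag, is_tag=True)):
            new_form = "#".join(fields[code] for code in out_form)
            return pick(stem, fields), pick(rule_id, fields), new_form
    raise ValueError("no rule matched; a final '* * *' rule is expected")


print(rearrange("tres", "3", "z"))
print(rearrange("perros", "perro", "ncmp000"))
```

Under these assumptions, the `tres` token falls through to the last rule and comes out with stem `z`, rule `z`, and form `tres#3`, while `perros` is handled by the first rule and keeps its lemma as the stem.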
So it looks like the tag Z should perhaps trigger some spelling change, but I don’t understand what exactly. I even tried replacing 3 with tres#3, based on what I see in the reference structure, but that did not help. Also, Z is present in the reference parse (the rule ne_ilr is actually labeled Z in the reference tree).
So I am not sure what is going on. Does anyone have an idea of how to investigate? I tried using break-after, but nothing shows up in this case: no token structures of any kind.
Many thanks in advance.