We have a bit of a mysterious situation with some (many!) items in the updated SRG treebanks, which we are trying to reconcile with the old version after reparsing them with an updated version of the grammar. The new grammar uses the new version of Freeling, which has different tag names for many things, so most trees have to be looked at again.
Some situations seem easy for fftb, and a couple of clicks lead you to what it considers a gold tree. But here’s a type of situation which we don’t quite understand (by “we” I mean myself and @Dan, who looked at it with me in the last GE meeting). This situation occurs often, and it often involves the pronoun clitic tags, which are lexical rules as far as the grammar is concerned. These lexical rules often come as a chain, e.g.:
(3, 2, 3, <8:16>, 1, "dormir" "dormirse", 0, "vmn0000" "+pp3cn00", "vmn0000" "+pp3cn00" 1.00000000)
I suspect there is something about the treatment of these lexical rules that might be a problem, though I can’t be sure. In any case, here’s an example of the problem and the mystery:
For the sentence below, it is in principle possible to find, in the new parse forest, the tree which fftb considers gold (based on the old treebank). I have succeeded at that once, and here’s the proof pic:
However, whenever I try to do this again with a copy of that treebank, I can never arrive at that tree again (so, the mystery is being able to find it once).
Whatever we tried in a session with Dan resulted in the following error:
In the above screenshot, what I did was turn off all the gold discriminants and then simply pick one of the options for dormirse as the first discriminant. It is the one that seems to happily appear in the first screenshot, but regardless, every choice for dormirse leads to the error.
@Dan said the error was a bit surprising to him because he doesn’t often see fftb complain about semantics in this way. To me what is also surprising is the mention of the hd_advnp-pp_c rule which I do not see anywhere in the gold tree.
@sweaglesw do you have any comments on this? What is it that we are observing and what could be causing it? I suspect that the new grammar is broken in some way due to lexical rule name updates, or perhaps it is something about the YY input and the way I specified the rule chain there? But why was I able to find the tree once?
Here’s the full input just in case:
(1, 0, 1, <0:4>, 1, "para" "para", 0, "sp", "sp" 0.99983359) (2, 1, 2, <5:7>, 1, "no" "no", 0, "rn", "rn" 0.99929653) (3, 2, 3, <8:16>, 1, "dormir" "dormirse", 0, "vmn0000" "+pp3cn00", "vmn0000" "+pp3cn00" 1.00000000) (4, 3, 4, <17:24>, 1, "empezar" "empieza", 0, "vmip3s0", "vmip3s0" 0.99450549) (5, 4, 5, <25:26>, 1, "a" "a", 0, "sp", "sp" 0.99877540) (6, 5, 6, <27:32>, 1, "tocar" "tocar", 0, "vmn0000", "vmn0000" 1.00000000) (7, 6, 7, <33:35>, 1, "de" "de", 0, "sp", "sp" 0.99996148) (8, 7, 8, <36:39>, 1, "pie" "pie", 0, "ncms000", "ncms000" 1.00000000) (9, 8, 9, <39:40>, 1, "." ".", 0, "fp", "fp" 1.00000000)
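In case it helps anyone look at the rule chains without reading the raw line, here is a minimal Python sketch that splits YY tokens like the ones above into fields. The field layout (id, start, end, character span, path, form and surface, ipos, lexical rule names, then tags with probabilities) is my assumption based on how these tokens look, and `parse_yy` and `describe` are hypothetical helper names, not part of ACE or fftb. The sample holds only tokens 2 and 3 from the input above.

```python
import re

# Atoms of a YY line: quoted strings, <from:to> spans, punctuation, bare numbers.
ATOM = re.compile(r'''
    "(?:[^"\\]|\\.)*"
  | <\d+:\d+>
  | [(),]
  | [^\s(),"<>]+
''', re.VERBOSE)

def parse_yy(line):
    """Split a YY line into tokens; each token is a list of comma-separated fields."""
    tokens, fields, current = [], None, None
    for atom in ATOM.findall(line):
        if atom == "(":
            fields, current = [], []
        elif fields is None:
            continue  # ignore anything outside parentheses
        elif atom == ",":
            fields.append(current)
            current = []
        elif atom == ")":
            fields.append(current)
            tokens.append(fields)
            fields = current = None
        else:
            current.append(atom)
    return tokens

def unquote(s):
    # Naive: strips the surrounding quotes and does not process escapes.
    return s[1:-1]

def describe(fields):
    """Name the fields of one token (assumed layout, see the note above)."""
    has_link = fields[3][0].startswith("<")
    base = 4 if has_link else 3
    forms, lrules = fields[base + 1], fields[base + 3]
    pos = fields[base + 4] if len(fields) > base + 4 else []
    return {
        "id": int(fields[0][0]),
        "form": unquote(forms[0]),
        "surface": unquote(forms[-1]),
        "lrules": [unquote(x) for x in lrules],
        "tags": [unquote(x) for x in pos if x.startswith('"')],
        "probs": [float(x) for x in pos if not x.startswith('"')],
    }

SAMPLE = ('(2, 1, 2, <5:7>, 1, "no" "no", 0, "rn", "rn" 0.99929653) '
          '(3, 2, 3, <8:16>, 1, "dormir" "dormirse", 0, '
          '"vmn0000" "+pp3cn00", "vmn0000" "+pp3cn00" 1.00000000)')

for tok in map(describe, parse_yy(SAMPLE)):
    print(tok["id"], tok["surface"], tok["lrules"], tok["tags"], tok["probs"])
```

Under this reading, the dormirse token carries two lexical rule names and two tags but only one probability, whereas the other tokens have one of each. I don't know whether that matters to the parser or to fftb, but it is the only token in the input shaped that way.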