Unification failure in the RELS list during treebanking (and a bit of mystery)

We have a somewhat mysterious situation with some (many!) items in the updated SRG treebanks, which we are trying to reconcile with the old version after reparsing them with an updated version of the grammar. The new grammar uses the new version of Freeling, which has different tag names for many things, so most trees have to be looked at again.

Some situations seem easy for fftb and a couple clicks lead you to what it considers a gold tree. But here’s a type of situation which we don’t quite understand (by “we” I mean myself and @Dan who looked at it with me in the last GE meeting). This situation occurs often, and it often includes the pronoun clitic tags which are lexical rules as far as the grammar is concerned. These lexical rules often come as a chain, e.g.:

(3, 2, 3, <8:16>, 1, "dormir" "dormirse", 0, "vmn0000" "+pp3cn00", "vmn0000" "+pp3cn00" 1.00000000)

I suspect there is something about the treatment of these lexical rules that might be a problem, though I can’t be sure. In any case, here’s an example of the problem and the mystery:

For the sentence below, it is in principle possible to find the tree which fftb considers gold (based on the old treebank) in the new parse forest. I succeeded at that once, and here’s the proof pic:

However, whenever I try to do this again with a copy of that treebank, I can never arrive at that tree again (so, the mystery is being able to find it once).

Whatever we tried in a session with Dan resulted in the following error:

In the above screenshot, I turned off all the gold discriminants and then simply picked one of the options for dormirse as the first discriminant. It is the one that seems to happily appear in the first screenshot, but regardless, every choice for dormirse leads to the error.

@Dan said the error was a bit surprising to him because he doesn’t often see fftb complain about semantics in this way. To me what is also surprising is the mention of the hd_advnp-pp_c rule which I do not see anywhere in the gold tree.

@sweaglesw do you have any comments on this? What is it that we are observing and what could be causing it? I suspect that the new grammar is broken in some way due to lexical rule name updates, or perhaps it is something about the YY input and the way I specified the rule chain there? But why was I able to find the tree once?

Here’s the full input just in case:

(1, 0, 1, <0:4>, 1, "para" "para", 0, "sp", "sp" 0.99983359) (2, 1, 2, <5:7>, 1, "no" "no", 0, "rn", "rn" 0.99929653) (3, 2, 3, <8:16>, 1, "dormir" "dormirse", 0, "vmn0000" "+pp3cn00", "vmn0000" "+pp3cn00" 1.00000000) (4, 3, 4, <17:24>, 1, "empezar" "empieza", 0, "vmip3s0", "vmip3s0" 0.99450549) (5, 4, 5, <25:26>, 1, "a" "a", 0, "sp", "sp" 0.99877540) (6, 5, 6, <27:32>, 1, "tocar" "tocar", 0, "vmn0000", "vmn0000" 1.00000000) (7, 6, 7, <33:35>, 1, "de" "de", 0, "sp", "sp" 0.99996148) (8, 7, 8, <36:39>, 1, "pie" "pie", 0, "ncms000", "ncms000" 1.00000000) (9, 8, 9, <39:40>, 1, "." ".", 0, "fp", "fp" 1.00000000)
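In case it helps with checking how the rule chain is specified, here is a minimal Python sketch that splits YY input like the above into tokens and lists those carrying more than one lexical rule. The field positions are an assumption read off this particular input (id, start, end, span, path, form and surface, ipos, rule chain, tags), and the helper names `split_fields` and `parse_yy` are made up for illustration. It also assumes no parentheses occur inside the quoted strings.

```python
import re

def split_fields(body):
    """Split a token body on commas that are not inside double quotes."""
    fields, current, in_quotes = [], [], False
    for ch in body:
        if ch == '"':
            in_quotes = not in_quotes
        if ch == ',' and not in_quotes:
            fields.append(''.join(current).strip())
            current = []
        else:
            current.append(ch)
    fields.append(''.join(current).strip())
    return fields

def parse_yy(text):
    """Parse YY tokens, assuming the field order seen in the input above."""
    tokens = []
    for body in re.findall(r'\(([^()]*)\)', text):
        f = split_fields(body)
        strings = re.findall(r'"([^"]*)"', f[5])  # form, then surface
        tokens.append({
            "id": int(f[0]),
            "start": int(f[1]),
            "end": int(f[2]),
            "span": f[3],
            "form": strings[0],
            "surface": strings[-1],
            "lrules": re.findall(r'"([^"]*)"', f[7]),  # lexical rule chain
        })
    return tokens

sample = ('(2, 1, 2, <5:7>, 1, "no" "no", 0, "rn", "rn" 0.99929653) '
          '(3, 2, 3, <8:16>, 1, "dormir" "dormirse", 0, "vmn0000" "+pp3cn00", '
          '"vmn0000" "+pp3cn00" 1.00000000)')

# Show only the tokens whose rule field holds a chain of several rules.
for tok in parse_yy(sample):
    if len(tok["lrules"]) > 1:
        print(tok["surface"], tok["lrules"])
```

This only inspects the input; it says nothing about how the grammar or the parser interprets the chain.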

From what I can infer, the choice you made for “dormirse” implied a choice for the neighboring word “empieza”, which is where the action actually happened. It seems the implied choice was to use hd_advnp-pp_c, which is a bit of an unusual looking rule in (my outdated copy of) the SRG, in that it reaches directly into its daughter’s RELS list and makes an assertion about its first element.

My guess is that that rule is not one that you actually wanted to use here. The fact that it was implied by your other choice suggests that perhaps the desired tree is no longer available, due to some change in the grammar. Have you tried reconstructing that derivation manually? It can be done somewhat painfully using interactive unification, or much more easily using the recons tool which is part of my libtsdb package (and included in the binary acetools release – which apparently is a few versions out of date, oops). You would do something like this:

recons -g srg.dat -v -i 10809 gold-profile

where 10809 is the item id for the sentence you are working on. That should give you some diagnostic information about whether the saved derivation is still valid. If it fails for mundane reasons like changed rule names, you can manually edit the rule names in the derivation and try again – either by editing the result relation in a copy of the gold profile or by pasting in the derivation on stdin and giving recons the -t flag instead of a profile name and item id.
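If many rule names changed, the manual renaming step could be scripted. Below is a rough Python sketch, assuming the derivation is a parenthesized string in which rule names appear as bare symbols and token strings are double-quoted. The function `rename_rules` and the rule names in the example are hypothetical, not taken from the SRG.

```python
import re

def rename_rules(derivation, mapping):
    """Replace bare symbols that exactly match an old rule name.

    Quoted strings, parentheses and whitespace are passed through
    unchanged, so a surface form that happens to look like a rule
    name is not touched.
    """
    parts = re.findall(r'"(?:[^"\\]|\\.)*"|[()]|\s+|[^\s()"]+|"', derivation)
    return ''.join(mapping.get(p, p) for p in parts)

# Hypothetical old-to-new rule name mapping and derivation fragment.
mapping = {"old_rule_c": "new_rule_c"}
derivation = '(1 old_rule_c 0.0 0 2 (2 other-rule 0 0 1 ("old_rule_c" 3)))'

print(rename_rules(derivation, mapping))
```

Matching whole symbols rather than substrings matters here because rule names contain hyphens and underscores, and one name can easily be a prefix of another.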
