I have a unification surprise. Always fun!
I have a reference grammar (the old SRG, using Freeling 3.0 as the morphological analyzer) in which the sentence "Pitágoras quería ladrar." ("Pitágoras wanted to bark.") parses.
I also have the new grammar, which is the same except that it relies on Freeling 4.0 (and its updated tag set). This new grammar does not parse the same sentence when it is passed to it in YY format.
Below you see the reference parse on the left and the actual parse chart on the right. Note that I have to use the LKB with the reference grammar and ACE with the new grammar. At the moment I have no other choice: I cannot load the new grammar into the LKB (due to the lack of an interface with the new Freeling), and vice versa (due to challenges with installing the old Freeling on newer systems).
- As far as I can tell, the rule missing from the chart is the second rule above quería, which is called V_VP_INF-SC_DLR. It is a lexical rule that makes quería require an infinitival FORM on its clausal complement.
The said rule is present in the new grammar, as expected (I did not change anything in the grammar apart from the inflectional rules associated with Freeling tags):
```
v_vp_inf-sc_dlr := v_trans_cp_prop-or-ques_dlr.
```
The rule is loaded in the grammar; it is possible to invoke it in interactive debugging, and the node that is below it in the reference parse (which is found in the actual chart just fine) can be fed into the rule with no failure.
- The rule above the missing rule, VMII4S0, is a bit bizarre (note the suffix orthography), but again, it is the same in both grammars and, most importantly, it is found in the actual chart:
```
%suffix (vmii13s vmii13s)
[ SYNSEM.LOCAL [ CAT.HEAD.AUX -,
                 AGR.PNG.PN 1or3sg ] ].
```
This is not even one of the affected Freeling tags; the tag did not change between 3.0 and 4.0.
There is one difference involving this rule between the Freeling interfaces I use for the two grammars. Freeling actually returns a different tag for quería, and the Freeling interface for the old grammar is simply hardcoded to always replace the Freeling tag for quería with this tag, VMII4S0. However, I manually replaced the tag in the input string, so I don't see how this can matter. Again, the node is in the chart. What is not in the chart is the second node, V_VP_INF-SC_DLR.
Another difference between the two grammars is the handling of unknown words. Note how in the second screenshot, the node I use as the "daughter" in interactive debugging has a type ending in _native_le. That's new.
Does anyone see how I should debug this further? I've always found the part about lexical rules on the wiki a bit confusing… But I think in this case, this has to do with lexical rules, right?.. Maybe the issue is with what counts as an inflecting rule and what does not? None of these add anything to the orthography, because Freeling has already analyzed the orthography. Yet, all of these types (like VMII4S0) live in
Coming in with some thoughts in case this is still an open question:
It seems to me that it might indeed have to do with the way morphological processing is handled. That is: the affixes are stripped, and that provides a set of possible recipes (the rules that corresponded) for creating the words, so maybe that limits which rules are attempted in creating the chart?
The %suffix rule is very odd. I am curious whether there are other rules with orthographemic "effects" like that which are working as intended? YY mode has a facility where the name of a rule that needs to apply can be specified. ACE ignores it if you only use -y, but if you additionally use --yy-rules it tries to honor it, and also ignores the actual content of the spelling changes. Are you using --yy-rules? If not, maybe try it. It's hard to see what would be blocking the application of the (non-spelling-changing) derivational rule, though.
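To make the YY facility concrete, here is a minimal sketch of how one token that requests a specific inflectional rule might be serialized. It assumes the token layout from the PET YY input documentation, (id, start, end, <from:to>, path, "form" "surface", ipos, "lrule", "tag" prob), with a single rule slot. The helper name yy_token, the stem querer, the character offsets, and the idea that the SRG interface puts the lower-cased Freeling tag into the rule slot are all illustrative assumptions, not a description of the actual SRG/Freeling interface.

```python
def yy_token(tid, start, end, cfrom, cto, form, surface, lrule, tags):
    """Serialize one YY token (hypothetical helper, not part of ACE).

    Assumed field order, following the PET YY input format:
    (id, start, end, <from:to>, path, "form" "surface", ipos, "lrule", "tag" prob ...)
    A missing rule is written as "null", i.e. no rule is requested.
    """
    rule_field = f'"{lrule}"' if lrule else '"null"'
    pos_field = " ".join(f'"{tag}" {prob}' for tag, prob in tags)
    # path is fixed at 1 and ipos at 0, as in a simple single-path lattice.
    return (f'({tid}, {start}, {end}, <{cfrom}:{cto}>, 1, '
            f'"{form}" "{surface}", 0, {rule_field}, {pos_field})')


# "Pitágoras quería ladrar.": quería spans characters 10-16 (assumed offsets).
print(yy_token(2, 1, 2, 10, 16, "querer", "quería", "vmii4s0", [("VMII4S0", 1.0)]))
```

With plain -y, ACE would ignore the "vmii4s0" field; with -y --yy-rules it would try to apply that rule, as described above.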
Thanks, @sweaglesw and @ebender !
Certainly still an open question.
Certainly using --yy-rules! Always have been.
Do you mean such that the content of the %suffix differs from the rule name?..
Generally, every token in any sentence will go through an “inflectional” rule of this sort (though in most cases, the rule’s name is the same as the “%suffix”), and some sentences are parsing just fine.
There is something weird about this one (and probably about lots of others that I simply haven't looked at), and it is tempting to think that the issue is with the weird %suffix rule; however, that rule is in the chart, and the one that isn't in the chart is the syntactic lexical rule…
FWIW, changing the definition of the %suffix rule so that the "stem" matches the rule's name doesn't help… Same chart, as far as I can tell.
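On the earlier question of whether other rules have orthographemic "effects" like VMII4S0's, one quick way to list candidates is to scan the TDL for %suffix rules whose patterns differ from the rule name. This is a rough regex heuristic, not a TDL parser: it assumes each rule is written as name := followed directly by %suffix and its (match replace) pairs, and the sample rules below are hypothetical.

```python
import re

# "name :=" followed by %suffix and one or more "(match replace)" pairs.
SUFFIX_RULE = re.compile(
    r"^([\w+-]+)\s*:=\s*%suffix\s+((?:\([^()]*\)\s*)+)", re.MULTILINE
)
# One "(match replace)" pair; neither side may contain whitespace or parens.
PAIR = re.compile(r"\(\s*([^\s()]*)\s+([^\s()]*)\s*\)")


def mismatched_suffix_rules(tdl_text):
    """Return (rule, pairs) for every %suffix rule with a pattern that is not
    simply the rule name (compared case-insensitively)."""
    found = []
    for match in SUFFIX_RULE.finditer(tdl_text):
        name = match.group(1)
        pairs = PAIR.findall(match.group(2))
        if any(a.lower() != name.lower() or b.lower() != name.lower()
               for a, b in pairs):
            found.append((name, pairs))
    return found


# Hypothetical sample: the first rule mirrors the odd VMII4S0 definition.
SAMPLE = """
vmii4s0 :=
%suffix (vmii13s vmii13s)
[ SYNSEM.LOCAL.CAT.HEAD.AUX - ].

vmip3s0 :=
%suffix (vmip3s0 vmip3s0)
[ SYNSEM.LOCAL.CAT.HEAD.AUX - ].
"""

print(mismatched_suffix_rules(SAMPLE))
```

Running it over the grammar's inflectional-rule file would flag rules like the VMII4S0 one, though, as the thread shows, such a mismatch turned out not to be the cause here.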
It is curious that the sentences that do not parse (from the MRS test suite) seem to be the ones involving clausal complements in the infinitival form. So, somehow, what the failures have in common is the lexical rule denoted above as V_VP_INF-SC_DLR.
Passing -vv to ACE and sifting through the log, you can see that the edge you are looking for actually does get built. What’s happening is that it is then subsequently being removed by the lexical filtering rules. Unfortunately, that fact is not reflected in the logs – but it is why you don’t see it in the chart and the higher up syntax edges you want aren’t created.
The lexical filtering rule for native+generic is deleting it because the V_VP_INF-SC_DLR rule is leaving its TRAITS value underspecified instead of copying it up from the daughter. It looks like this is normally done on lex-item, but for whatever reason this rule skips lex-rule and inherits from basic-lex-rule, hence the trouble.
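Why an underspecified TRAITS gets an edge deleted by a filter aimed at generic entries follows from the filter matching by unification. Here is a toy Python model of that effect, assuming (1) a small hierarchy in which native_trait and generic_trait are incompatible subtypes of a common trait type, and (2) a filter that deletes any edge compatible with generic_trait when a native edge covers the same span. The type names, the lexical entry names, and the filter's exact condition are hypothetical stand-ins, not the SRG's actual definitions or ACE's implementation.

```python
# Toy type hierarchy for TRAITS values, child -> parent (hypothetical names).
PARENT = {
    "*top*": None,
    "trait": "*top*",
    "native_trait": "trait",
    "generic_trait": "trait",
}


def supertypes(t):
    """Return t together with all of its ancestors up to *top*."""
    chain = []
    while t is not None:
        chain.append(t)
        t = PARENT[t]
    return chain


def unify(a, b):
    """Unify two atomic types in a tree-shaped hierarchy: the more specific
    one wins if they are comparable, otherwise unification fails (None)."""
    if a in supertypes(b):
        return b
    if b in supertypes(a):
        return a
    return None


def lexical_filter(edges):
    """Remove every edge whose TRAITS unifies with generic_trait when some
    edge over the same span is native (a toy native+generic filter)."""
    native_spans = {(s, e) for _, s, e, traits in edges if traits == "native_trait"}
    return [
        (name, s, e, traits)
        for name, s, e, traits in edges
        if not ((s, e) in native_spans and unify(traits, "generic_trait"))
    ]


edges = [
    ("querer_native_le", 1, 2, "native_trait"),
    ("vmii4s0", 1, 2, "native_trait"),    # TRAITS copied from the daughter
    ("v_vp_inf-sc_dlr", 1, 2, "*top*"),   # TRAITS left underspecified
    ("generic_verb_le", 1, 2, "generic_trait"),
]
print([name for name, *_ in lexical_filter(edges)])
```

Because *top* unifies with generic_trait, the derived edge looks "possibly generic" to the filter and is dropped, even though its native daughter survives. Giving the rule's output the daughter's TRAITS (native_trait in this toy) makes unification with the generic pattern fail, so the edge is kept.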
I guess the thing to take home is that this is a new type of unification surprise – an edge that actually got built but was subsequently removed by lexical filtering.
Thank you so much, @sweaglesw , for finding the problem and for explaining how to find such problems generally!
Mystery solved! Olga, do you have a minute to add a note about this one to the unification surprise page?