InfoStr regression tests: running them with rtest

Some of the regression tests currently fail if the new rtest.py system is used.

In all the previous cases, it seems rtest was discovering something that the old system was missing, so in most cases the gold profiles simply had to be updated.

I just wanted to check that this is also the case here.

Gold:

[Screenshot: MRS for the item in the gold profile]

Current, for the same item:

[Screenshot: MRSes for the same item in the current profile]

So the old system isn’t looking at ICONS, though it is a bit puzzling to me that there are two MRSes in current?

Two MRSes in current suggests that the current grammar (= grammar generated by the current customization system) is finding ambiguity. Can you test that example by hand to confirm?

It’s also suspicious that one of those two has an ICONS list and the other doesn’t.

@olzama yes, I noticed this and mentioned it in the email thread to matrix-dev (the message sent around Aug 6). Here are the tests that were failing:

  • infostr-foc-affix-after-noun
  • infostr-foc-affix-obj
  • infostr-foc-affix-obj-verb
  • infostr-foc-affix-subj
  • infostr-foc-affix-subj-obj
  • infostr-foc-affix-subj-obj-hier
  • infostr-foc-affix-subj-obj-verb
  • infostr-foc-affix-subj-verb
  • infostr-foc-sov-prev

I think the old comparison function does not examine ICONS.

@ebender The missing ICONS is actually a PyDelphin thing (consider that the indented, colored MRSs you’re seeing are not the MRSs directly out of the grammar, but those read and written by PyDelphin, which is also why they show TOP and not LTOP). It was a feature that PyDelphin does not print ICONS if the list is empty, because I didn’t want to start printing ICONS lists for grammars that didn’t implement ICONS at all. Perhaps this feature is now a bug, or at least it needs to be refined to distinguish empty-list from no-list.
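To make the empty-list versus no-list distinction concrete, here is a minimal sketch. It is not PyDelphin's actual code or API: `format_icons` and the tuple representation are hypothetical names chosen for illustration. It assumes a missing ICONS list is represented as `None` and an empty one as `[]`.

```python
from typing import List, Optional, Tuple

# Hypothetical representation: each individual constraint is
# (left variable, relation, right variable). None means the grammar
# has no ICONS list at all; [] means the list exists but is empty.
IconsList = Optional[List[Tuple[str, str, str]]]


def format_icons(icons: IconsList) -> str:
    """Return the ICONS portion of a serialized MRS (illustrative only)."""
    if icons is None:
        # No ICONS list: print nothing, so grammars without ICONS
        # keep their current output.
        return ""
    if not icons:
        # ICONS list present but empty: print it explicitly.
        return "ICONS: < >"
    body = " ".join(f"{left} {rel} {right}" for left, rel, right in icons)
    return f"ICONS: < {body} >"
```

With this split, an MRS from a grammar that implements ICONS but produced no constraints would serialize differently from one produced by a grammar with no ICONS at all.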

Well the issue is that the test passes with the old system and fails with rtest…

Right; but why are there two MRSes in the current profile? Is that actual ambiguity, do you think? I think I did not fully understand your explanation about ICONS.

So I checked the gold profile to confirm there is indeed only one result there for this item (confirmed).

So it looks like:

(1) The old regression test system failed to notice the ambiguity that was introduced at some point.
(2) Either the tests (which Michael lists above) should be updated or the bug should be found and fixed in the customization system.

@sanghoun Can you still tell whether there should or should not be ambiguity of this sort:

The NP->NP rule is a “narrowly focused phrase”, and the top S is a special NF-phrase as well.

Note that the MRS looks the same; however, there is a difference in ICONS (see the first post) as well as in the shape of the tree.

@olzama yes, the ambiguity is what it is. Apart from such differences, differences in ICONS used to be ignored, but with rtest.py they no longer are.

@sanghoun, could you perhaps comment on this issue? I am trying to figure out whether or not this ambiguity (the two trees, one focus-neutral and one focused) is expected for the PN CN TV sentence in the infostr-foc-sov-prev test. If it is expected, I will just update the gold profiles in such cases.

My apologies for this late reply. These days are the biggest holiday season in Korea.

As I remember, “PN CN TV” in the test set is ambiguous with respect to information structure and therefore the set generates two readings.

In the previous version at that time, only one result per item could be compared (maybe because of the comparison module in incr_tsdb). So the previous version selected only one reading and ignored the others when it came to the MRS comparison. Now, using rtest.py, we can fix this incomplete way of comparing.
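The difference between the two comparison strategies can be sketched as follows. This is only an illustration under stated assumptions, not the code of the old system or of rtest.py: the function names, the profile layout (item id mapped to a list of MRS strings), and the use of plain string equality are all hypothetical simplifications. A real comparison would test structural equivalence of the MRSs.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical profile layout: item id -> one MRS string per reading.
Profile = Dict[str, List[str]]


def compare_first_only(gold: Profile, current: Profile) -> List[str]:
    """Return items whose first readings differ (the incomplete comparison)."""
    return [i for i in gold if gold[i][:1] != current.get(i, [])[:1]]


def compare_all_readings(gold: Profile, current: Profile) -> List[str]:
    """Return items whose full multisets of readings differ."""
    return [i for i in gold if Counter(gold[i]) != Counter(current.get(i, []))]


gold = {"item-1": ["mrs-neutral"]}
current = {"item-1": ["mrs-neutral", "mrs-focused"]}

print(compare_first_only(gold, current))    # [] : the extra reading goes unnoticed
print(compare_all_readings(gold, current))  # ['item-1'] : the ambiguity is flagged
```

Under these assumptions, an item that gained a second reading passes the first comparison and fails the second, which matches the behavior described for infostr-foc-sov-prev.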

Let me think about whether I remember this correctly. If I have any more ideas, I will leave further comments.

Sanghoun


Thank you @sanghoun ! That’s what I thought, that the gold should probably be updated in such cases.

So in this one test, infostr-foc-sov-prev, the gold needed to be updated.

In the affix ones, however, it seems that at some point we lost the ICONS, so that is a true regression. I opened a ticket (https://lemur.ling.washington.edu/trac/matrix/ticket/935), but I am not sure I will investigate this myself right now, since it has been broken for a while and it is unclear which change led to it. To clarify, these tests are failing in the trunk.

(Couldn’t help investigating a little bit):

So suppose we are testing that an affix is adding focus information.

Does the following look right then:

v1-lex-rule-super := add-only-no-ccont-rule & infl-lex-rule &
  [ DTR verb-lex ].

r1-lex-rule := v1-lex-rule-super &
  [ SYNSEM.LOCAL [ CAT.VAL.SUBJ.FIRST.LOCAL.CONT.HOOK.ICONS-KEY contrast-focus,
                   CONT.HOOK.ICONS-KEY semantic-focus ] ].

If it’s an add-only-no-ccont rule, perhaps it won’t add anything to ICONS (no CCONT, and it is a same-non-local rule which I think means if it started with an ICONS-empty daughter, it will have an ICONS-empty mother?)

Or am I misunderstanding the lexical rules hierarchy and the above looks correct?