Ill-formed MRS in some regression tests

Some of the MRSs which had been added to the repository as part of regression tests are invalid. For example, the following MRS has a dangling qeq relation h0-h1:

If I understand right, the fact that the h1 is only present in the HCONS and is not present anywhere else, makes this MRS ill-formed?

If so, then the question is: should such tests be removed/fixed or left alone?

We often add tests which aren’t fully correct, e.g. with some coverage or overgeneration issues, or something like that, for example. But I suppose this is different?

(Context: leaving such tests alone requires changing the mrs_compare() function in pydelphin; fixing such tests may or may not be straightforward).

Hi Olga (sorry some of my responses are redundant from the thread, but not everyone here is on matrix-dev),

Yes, it doesn’t scope. The “lo” handle of every qeq should be a label shared by one or more EPs. Also note that the INDEX e2 is not the intrinsic variable of any EP.

I vote for fixing the Matrix so it doesn’t produce bad MRSs, then updating the gold profiles. However I agree with others (from the matrix-dev thread) that the regression tests should just PASS if the grammar produces the same MRSs as in the gold, even if ill-formed.

Yes, different. Coverage/overgeneration can be expected, but I don’t think we should ever produce ill-formed MRSs. These are due to bugs in the grammar instead of just missing analyses.

I think you mean the delphin.commands.compare() function, which is a profile-aware wrapper around delphin.mrs.compare_bags. This is defined to return a triple (test-only, shared, gold-only) where “shared” are MRSs that are isomorphic to each other and “test-only” and “gold-only” are the remainders on either side, but currently the code assumes that the graphs are connected. So the change to PyDelphin that you are referring to needs to happen anyway, but for the Matrix we can choose what to do with these cases. I agree that fixing those in the Matrix may be challenging.

1 Like

I think that rtest could just check if the MRS is well-formed before the function is called, and output ERROR if it is not well-formed. Alternatively, rtest could refuse to add an ill-formed MRS as part of a regression test.

As for fixing the tests, I could do that as part of my summer RA-ship but we’ll need to discuss what the priority of this is. Alternatively, I could remove all such tests and archive them somewhere and file bugs. One day someone interested in the libraries in question could fix them.

You know better of course! But perhaps the change is rather to not call the function on invalid input or throw an error? I mean, it’s worth changing the function only if we actually expect to be passing it such input, isn’t it? For now it sounds like it should not be happening? It is fine for the code to assume a certain type of input; it’s responsibility of the caller to call correctly, which means rtest, I think, in this case.

Resolution after a meeting with @ebender: MRS like this are possible in regression tests.

This means that we should probably have rtest issue a warning when a test like this is added but make sure, one way or another, such tests pass so long as there is no change.

2 Likes

I think I’ve resolved the problem with PyDelphin. The neg-mod-mod test (and perhaps others) now pass. However, I find that overall more tests failed than before (33 instead of 26). Either I have more work to do with PyDelphin, or the previous bug was hiding some actual problems in the tests. It’s late here but I’ll investigate more tomorrow.

1 Like

Update for those following here: a bug (now fixed) in PyDelphin prevented backtracking in some cases where it started to find a bad isomorphism mapping. Now there are just 14 regression test failures.

2 Likes