Hi Olga (sorry some of my responses are redundant from the thread, but not everyone here is on matrix-dev),
Yes, it doesn’t scope. The “lo” handle of every qeq should be a label shared by one or more EPs. Also note that the INDEX e2 is not the intrinsic variable of any EP.
I vote for fixing the Matrix so it doesn’t produce bad MRSs, then updating the gold profiles. However I agree with others (from the matrix-dev thread) that the regression tests should just PASS if the grammar produces the same MRSs as in the gold, even if ill-formed.
Yes, different. Coverage/overgeneration can be expected, but I don’t think we should ever produce ill-formed MRSs. These are due to bugs in the grammar instead of just missing analyses.
I think you mean the delphin.commands.compare() function, which is a profile-aware wrapper around delphin.mrs.compare_bags. This is defined to return a triple (test-only, shared, gold-only) where “shared” are MRSs that are isomorphic to each other and “test-only” and “gold-only” are the remainders on either side, but currently the code assumes that the graphs are connected. So the change to PyDelphin that you are referring to needs to happen anyway, but for the Matrix we can choose what to do with these cases. I agree that fixing those in the Matrix may be challenging.
I think that rtest could just check if the MRS is well-formed before the function is called, and output ERROR if it is not well-formed. Alternatively, rtest could refuse to add an ill-formed MRS as part of a regression test.
As for fixing the tests, I could do that as part of my summer RA-ship but we’ll need to discuss what the priority of this is. Alternatively, I could remove all such tests and archive them somewhere and file bugs. One day someone interested in the libraries in question could fix them.
You know better of course! But perhaps the change is rather to not call the function on invalid input or throw an error? I mean, it’s worth changing the function only if we actually expect to be passing it such input, isn’t it? For now it sounds like it should not be happening? It is fine for the code to assume a certain type of input; it’s responsibility of the caller to call correctly, which means rtest, I think, in this case.
I think I’ve resolved the problem with PyDelphin. The neg-mod-mod test (and perhaps others) now pass. However, I find that overall more tests failed than before (33 instead of 26). Either I have more work to do with PyDelphin, or the previous bug was hiding some actual problems in the tests. It’s late here but I’ll investigate more tomorrow.