Matrix regression testing - running all tests failing

Hey folks,

When I tried running all of the Grammar Matrix regression tests on trunk, it failed. All of the grammars compiled but then all of the tests failed with a seemingly generic TSDB error that I couldn’t track down.

I seem to recall this happening before and it having to do with the file limit, but increasing the file limit to 100k didn’t seem to fix the problem. I also tried increasing the RAM and disk space available on my VirtualBox VM to 12GB and 60GB respectively. Is anyone else able to replicate this problem?

But then when I ran n-1 “random tests” they mostly passed (except the info-str ones), so it seems to have to do specifically with running all tests.


Hi TJ,

Can you please give a bit more info, specifically what command you used ( or regression-test)?

With I am getting some failures now. Here’s the summary:

******** SUMMARY *************
Passed 450/478 tests;
Failed  20/478 tests;
Errors   8/478 tests.

I checked out the errors and see this:

2 tests have no gold profile checked in:

  • adj-yes-no-cop-aux-inv
  • adj-yes-no-cop-inv

These should be easy enough to fix, assuming the creator of these tests has the gold profiles on their computer and they pass. Hint: --list -v lists all the tests and checks whether each one has all of its files. Run that and look for red "no"s.

6 tests have carriage returns (\r) in the i-input field of the item file:

  • clausalmods-basque
  • clausalmods-german
  • clausalmods-lavukaleve
  • clausalmods-madi
  • clausalmods-moseten
  • clausalmods-uranina

Since Python reads files with “universal newlines” by default, these \r characters count as newlines, which effectively breaks each item record into two, so it looks like there are not enough fields in a record. TSDB’s specification only provides an escape for \n and not \r, so part of this can be fixed by PyDelphin not using Python’s universal newlines when reading these files (I filed an issue to fix it), but the \r characters also should have been stripped out when the skeleton was created. I suspect the user was on Windows, which uses the \r\n sequence for newlines, and the script that created the skeleton only stripped off the \n, leaving the \r in place. If --add or --mkskel had been used, then PyDelphin would have handled the creation of the skeleton and universal newlines should have helped here. Perhaps they were created with some other method?
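To make the newline issue concrete, here is a minimal sketch in plain Python (not PyDelphin's actual code). The three-column records and the sample sentences are made up for illustration; real item files have more fields. It shows how the default universal-newlines mode splits a record at a stray \r, while opening with newline="\n" keeps the record whole so the \r can be stripped afterwards.

```python
import os
import tempfile

# Hypothetical two-record file with @-separated fields; the first record
# has a stray \r inside its input field (simplified, not the real schema).
raw = b"1@the dog\r barks@1\n2@the cat sleeps@1\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "item")
    with open(path, "wb") as f:
        f.write(raw)

    # Default text mode: universal newlines, so a lone \r also ends a line.
    with open(path, encoding="utf-8") as f:
        universal = [line.rstrip("\n") for line in f]

    # newline="\n": only \n ends a line, and \r is left in the data.
    with open(path, encoding="utf-8", newline="\n") as f:
        strict = [line.rstrip("\n") for line in f]

# Strip the leftover \r characters, as skeleton creation arguably should.
cleaned = [record.replace("\r", "") for record in strict]

print(len(universal), "lines with universal newlines")
print(len(strict), "lines with newline='\\n'")
print(cleaned)
```

In the universal-newlines case the first record is cut in two, and each half has too few fields, which matches the symptom described above.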

The 20 failures mean that the MRSs differ, so they require someone to inspect the gold and current output.

However none of these problems sound like the one you were having, @trimblet. I tried running the regression-test command and I’m getting errors for all, as you say. The tsdb.$date.log files say:

read_tuple(): non-integer `' as `i-wf' in `item' (1).

Any column with an :integer datatype must have -1 if it is not set, but the i-wf field is obligatory and must have a value of 0, 1, or 2. This is funny, though, because I do see proper i-wf values in both the gold and current profiles (for at least one test I checked). The old script does some direct editing of files, and for some reason it tries to pipe in input sentences to art. Basically, I don’t recommend using the old regression testing code at all because it is hacky, outdated, and brittle.
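If someone wants to hunt for the record behind a read_tuple() complaint like the one above, a rough check along these lines might help. It is only a sketch: find_non_integer is a hypothetical helper, the sample records use a simplified three-column layout, and the real position of i-wf depends on the profile's relations file, so the column index is passed in rather than assumed.

```python
def find_non_integer(lines, index):
    """Return (line_number, value) pairs where field `index` is not an integer."""
    bad = []
    for lineno, line in enumerate(lines, start=1):
        fields = line.rstrip("\n").split("@")
        # A record that is too short is treated as having an empty value.
        value = fields[index] if index < len(fields) else ""
        try:
            int(value)
        except ValueError:
            bad.append((lineno, value))
    return bad


# Hypothetical records: id @ input @ well-formedness flag.
sample = [
    "1@the dog barks@1\n",
    "2@the cat sleeps@\n",   # empty value where an integer is expected
    "3@dog the@0\n",
]
print(find_non_integer(sample, 2))
```

For a real profile, the lines would come from the item file (opened with newline="\n") and the index from the field order in the relations file.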


Hi @goodmami, thanks for the info!

I was using the traditional system ( r). With rtest, everything passes except for the information structure tests already mentioned.

I guess this leads me back to my question here of basically “when can we officially deprecate the old system?” One benefit, from my point of view, is that with rtest I believe the whole development cycle can now be done on macOS (including the stricter verification with LKB-FOS).

And thanks for noticing the problems with the adj-yes-no tests… I just added them a couple weeks ago, and I think I used rtest. I will give it a go to add the profiles and verify with that command.

What was the error message in the logs of the info-str tests?

Hmm I seem to have missed that thread. In any case, both systems have issues, but I think rtest is easier to maintain. Your points about portability and the logon burden are valid, too.

If we could get the Matrix hosted on UW’s GitLab or on GitHub, we could also set up continuous integration (CI) where the tests are run whenever commits are pushed. I suspect rtest would be easier to use in this situation simply because of the time and space required to configure the logon distribution on a virtual machine. On a related note, my experimental Jacy repository has an action (see here) that compiles the grammar with ACE when changes are pushed. Something like this could be adapted to run


I have encountered this issue (with itsdb-errors) before, and in my case re-installing everything helped…

I have been using exclusively rtest in my branch, where I updated gold profiles for a bunch of tests. I now have 6 failures (all info structure) and 8 errors (all clausal mods), and I just ignore those for the time being.
