Grammar development, test suites and profiles

For testing our Portuguese grammar, we are using the sentences from https://github.com/delph-in/docs/wiki/MatrixMrsTestSuitePt (with some revisions). We manually added some examples in the Test Sentences form of the Matrix customization system. Later, @leoalenc created some more sentences directly in a local file, called my-test_sentences.txt, to avoid the trouble of adding them in the web form just to get them saved back into the txt file during grammar generation from the Matrix.

I am now looking for a more efficient way to deal with the test sets. We want to use not only the MRS test suite but also the CSLI test suite. The page https://github.com/delph-in/docs/wiki/MatrixDoc_TestSentences is incomplete. I suppose that profiles are the right way to go. Still, I am looking for advice on how to use profiles during grammar development and, in particular, on whether there is any special support for profiles in the Matrix or any particular way to work with profiles in a Matrix-based grammar.

@Dan, any suggestions on how to work with profiles instead of text files during grammar development?

If I understand correctly, you want to know (a) how to create test suites from text files, and (b) how to work with profiles during grammar development?

For the first, you can use Woodley’s mkprof tool (part of art), or PyDelphin’s delphin mkprof command, with almost the same syntax:

$ mkprof -i sentences.txt -r Relations destination

PyDelphin’s version just has a few more options, but otherwise works the same.
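
For example, the PyDelphin equivalent of the command above should be something like:

$ delphin mkprof -i sentences.txt -r Relations destination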

For the second question, there’s nothing special about the Matrix with regard to the use of [incr tsdb()] profiles, at least as compared to other implemented DELPH-IN HPSG grammars. If you want to know about specific methodologies, like regression testing, grammar profiling, treebanking, etc., then we can offer some suggestions, but let us know what your needs are, first.

I am quite familiar with (a), using both art and PyDelphin. My question is more related to (b).

If you want to know about specific methodologies, like regression testing, grammar profiling, treebanking, etc., then we can offer some suggestions, but let us know what your needs are, first.

Yes, I want to know about methodologies. During the development cycle of the grammar, we:

  1. fill the form
  2. generate the grammar
  3. test it
  4. manually update some tdl files
  5. go to 3 and eventually to 1 again (forever! ;-))

For the testing, we are using the LKB to run a regression test from a text file with one sentence per line, with some negative examples marked with *. But now we don’t want to have only a single file, and instead of text files, I suspect we can use profiles, right?
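
For reference, the text file looks something like this (illustrative sentences, with * marking a negative example):

os cães dormem
*os cães dorme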

At which stage does regression testing turn into grammar profiling and/or treebanking?

Yes, I assumed you were familiar with art and PyDelphin, but if that’s the case I’m confused by your statement:

we are using the LKB to run a regression test from a text file with one sentence per line, with some negative examples marked with *

Once you’ve created the test suite with mkprof, I don’t see why you need the text files anymore, unless you want to recreate the test suite with new sentences. You can run regression tests from test suites instead of text files, although I’m not sure how to do that in the LKB.

You are right, sorry for the confusion. My sentence was misleading. I guess I need to refine my questions.

For now, in the lkb/script produced by the Matrix, I see *last-parses* with all the sentences introduced in the Test Sentences form. But is this the only use of the test sentences by the Matrix system?

If we move the sentences to profiles, what is the general workflow? I guess using [incr tsdb()] connected to the LKB would be very convenient, right? Is there any alternative? Does it make sense to keep regression-test profiles separate from the profiles used for grammar profiling and/or treebanking? I assume treebanking comes later, when we eventually need to train a parse ranking model.

As I recall, *last-parses* is just used for auto-filling the drop-down menu of sentences to parse within the LKB interface.

When you configure a grammar with the Matrix, it also uses those sentences to create a skeleton under tsdb/skeletons/matrix/.

$ python3 matrix.py c web/sample-choices/mini-english .
$ cat eng/tsdb/skeletons/matrix/item 
1@unknown@unknown@none@1@S@the cat chases the dog@@@@1@5@@Grammar Matrix Customization System@16-sep-2021 14:24:45
2@unknown@unknown@none@1@S@the dogs sleep@@@@1@3@@Grammar Matrix Customization System@16-sep-2021 14:24:45

You can use those as a start for grammar profiling:

$ mkdir eng/tsdb/current/  eng/tsdb/gold  # where to put new, gold profiles
$ delphin mkprof -s eng/tsdb/skeletons/matrix/ eng/tsdb/current/matrix
    9746 bytes	relations
     222 bytes	item
[...]
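$ # compile the grammar image with ACE (ace/config.tdl is generated along with the grammar)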
$ ace -g eng/ace/config.tdl -G eng.dat
[...]
$ delphin process -g eng.dat eng/tsdb/current/matrix/
Processing |################################| 2/2
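
Assuming the profile looks good, creating the gold copy can be as simple as (cp is just for illustration; use whatever vetting process you prefer):

$ cp -r eng/tsdb/current/matrix eng/tsdb/gold/matrix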

Once you have, through [incr tsdb()] or whatever you prefer, confirmed the profile is good and copied it to the gold directory, you can compare later profiles for basic regression testing (here I just copied the same profile to the gold directory, as above):

$ delphin compare eng/tsdb/current/matrix/ eng/tsdb/gold/matrix/
1	<0,1,0>
2	<0,1,0>

This says that each item had 1 parse that was shared by both the current and gold profiles, i.e., they have the same bags of MRSs per item. If any differ, you might see something like this:

$ delphin compare eng/tsdb/current/matrix/ eng/tsdb/gold/matrix/
1	<0,1,0>
2	<1,0,1>

The last line here shows there’s 1 parse in the current profile not matched by a parse in the gold profile, and vice versa. That is, they each got a parse, but the MRSs are different.

This PyDelphin-based solution is good for scripting and basic tests. If you want to do serious grammar profiling I’d use [incr tsdb()] and either [incr tsdb()] or FFTB for treebanking.
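
For instance, a quick scripted regression pass over every current profile (assuming current/ and gold/ hold parallel subdirectories, as above) could look like:

$ for p in eng/tsdb/current/*/; do
>   delphin compare "$p" "eng/tsdb/gold/$(basename "$p")"
> done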

In my workflow (and the one I recommend to students in 567), I create the profiles separately from the customization system, and then use [incr tsdb()] to test different versions of the grammar. That is, the development of the testsuites is not directly linked to the development of the grammar (either through customization or eventually through manual extension). The testsuites are, of course, both aspirational (what we hope the grammar will cover) and documentary (distinctions we got working and want to make sure not to lose).

The “test sentences” functionality in the customization system is (as you spotted) just there to populate the parse history in the LKB parse dialogue. Generally speaking, however, once you get going, it is far faster/more convenient to have [incr tsdb()] running and double-click sentences from profiles (Browse | Test Items).

Thank you @ebender, that is the kind of feedback I was looking for! For [incr tsdb()], you are using it connected to the LKB, right? And you are using LkbFos, right? I was about to change my Docker image to remove all support for the LOGON stuff, but I guess I first need to understand how to install [incr tsdb()] without the LOGON directory.

Thank you very much @goodmami! That was useful! Now I have a much better understanding of how the Matrix uses the test sentences. So tsdb/ is the standard place to put the profiles, together with the grammar (similar to what @Dan does with the ERG). The current vs gold subdirectories make sense too, thank you.
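
If I understood correctly, the resulting layout is roughly:

eng/tsdb/
  skeletons/matrix/   (skeleton from the customization system)
  current/matrix/     (freshly processed profiles)
  gold/matrix/        (vetted profiles to compare against)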

Next, I need to read the [incr tsdb()] manual to make it work with LkbFos without the need for the LOGON directory.

For 567, we have a VirtualBox image with LkbFos and [incr tsdb()] (and ace). You can see that here, in case it’s helpful:

https://wiki.ling.washington.edu/bin/view.cgi/Main/KnoppixLKB