Combining ACE's -R option with pydelphin's select

See: (1) Running ACE with pydelphin wrapper with -R option - #2 by goodmami and (2) Reparsing and updating a treebank keeping previous decisions - #16 by sweaglesw

I need to (1) suppress the results output (the -R option) when running ACE, for efficiency, and (2) use the i-tokens field instead of i-input as the parser input, while preserving i-input for treebanking.

From the linked responses above, it would appear that what I want is not possible off the shelf? According to @Dan, I am losing a lot of performance by generating the result files in the treebanks. I’d like to avoid that, but I see no obvious way around having YY input in the i-tokens field, and for that it looks like I have to use pydelphin?

Any ideas? Which seems the more promising route to a solution: modifying ACE or modifying pydelphin? Or getting rid of YY input after all (not an option at the moment, but maybe in the future)?

To paraphrase, you want both (1) and (2), and ACE et al. will handle only (1) while PyDelphin handles only (2), right?

I have also missed some context. Why do you want to use -R? Do you just want to know whether or not an item will parse? Or is there some non-result data you want to store in a profile?

To paraphrase, you want both (1) and (2), and ACE et al. will handle only (1) while PyDelphin handles only (2), right?

That’s right.

Why do you want to use -R?

Because what I need is to update treebanked profiles. Dan is telling me that generating the result file is expensive and unnecessary in such a case.

Ok, that much makes sense. I’m not clear on what you hope to get out of running ACE with -R, though. This is what I see:

$ ace -R -g ~/delphin/erg-2018.dat <<< "The dog barks."
NOTE: 2 readings, added 691 / 108 edges to chart (37 fully instantiated, 56 actives used, 24 passives used)	RAM: 1535k
NOTE: parsed 1 / 1 sentences, avg 1535k, time 0.00692s
$ echo "$?"
0
$ ace -R -g ~/delphin/erg-2018.dat <<< "barks dog the."
NOTE: 0 readings, added 617 / 36 edges to chart (22 fully instantiated, 22 actives used, 9 passives used)	RAM: 1238k
NOTE: parsed 0 / 1 sentences, avg 1238k, time 0.00559s
$ echo "$?"
255

With a successful parse, you get some NOTE messages on stderr, and the exit code is 0. With a parse failure you get similar messages on stderr and the exit code is 255. Since PyDelphin does not parse those stderr messages (it used to, but this led to hard-to-debug issues with buffers filling up), the only signal you’re getting out of ACE is the exit code to signal if the sentence parsed or not. If that is sufficient for you, I could consider allowing the -R option in PyDelphin’s ace package when --tsdb-stdout is not used (see below).

And as mentioned in Running ACE with pydelphin wrapper with -R option - #2 by goodmami, when --tsdb-stdout is used, ACE seems to ignore the -R option. If you want ACE to print out everything but the :results(...) content in this case, you might put in a condition for if (!inhibit_results) ... around here: https://github.com/delph-in/ace/blob/19576aff0f7c74e6ff904405e2ca21f2c9afe8ff/itsdb.c#L699-L701. Then if Woodley agreed to such a change, I could consider changing PyDelphin to allow -R along with --tsdb-stdout.
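If the exit code really is the only signal you need, checking it from Python is straightforward even without PyDelphin. The following is a minimal sketch, not anything PyDelphin provides: `parses` is a hypothetical helper, and it assumes one sentence per ACE invocation (as in the transcript above), which reloads the grammar every time and so would be slow for a whole profile.

```python
import subprocess
import sys


def parses(sentence, command):
    """Return True if `command` exits with status 0 when fed `sentence`.

    `command` is an argv list. The thread's example would be something like
    ["ace", "-R", "-g", "/path/to/erg-2018.dat"] (the path is illustrative).
    stdout and stderr are discarded, so the NOTE messages are never read
    and cannot fill up a pipe buffer.
    """
    proc = subprocess.run(
        command,
        input=sentence + "\n",
        text=True,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    # In the transcript above, ACE exited with 0 on a successful parse
    # and with 255 on a parse failure.
    return proc.returncode == 0


if __name__ == "__main__":
    # Stand-in command so the sketch runs without ACE installed:
    # it reads stdin and exits with 255, like a failed parse.
    fake_failure = [sys.executable, "-c",
                    "import sys; sys.stdin.read(); sys.exit(255)"]
    print(parses("barks dog the.", fake_failure))
```

Whether the exit code stays meaningful when several sentences are piped into a single ACE process is not something the transcript shows, so this sketch does not rely on it.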

Your github link doesn’t work for me Mike, but guessing based on the URL your suggested change would be fine by me.

To Olga’s purpose, it’s true that unpacking all results would be unnecessarily expensive (both in terms of computation and in terms of disk space), but unpacking just one result probably isn’t that painful a cost? I would expect the -1 option to work fine in combination with the other flags you are using, and it might be a simpler solution. I don’t think FFTB will have any trouble working with a profile that has some results recorded alongside the edges.

@Dan is of course correct that not unpacking any results at all is the most efficient way, but given that treebanking is by nature a process limited by human effort, is it particularly critical to save 10% of the upfront computation time?

Thank you for your responses, @goodmami and @sweaglesw !

Maybe you are right that it is not worth it in the end. Let’s see: what I do is the following:

delphin process --options="-y --yy-rules -1" -g ~/delphin/SRG/grammar/srg/ace/srg.dat --full-forest --select i-tokens "$profile"

I am already using the -1 flag, as you see. But also, of course, the --full-forest option, for treebanking. Maybe -1 doesn’t really work in combination with --full-forest? The parsing is very slow and the result files are large. But it is good to know that getting rid of the result content would only save me about 10%, I guess? I suppose that would not be critical at all.

Sorry, that project is the GitHub import of the ACE SVN repository and is still set to private mode. I’ve given the ‘sweaglesw’ user admin access so the link (among other things) should now work for you, although it doesn’t help others much. I forget what we were waiting on to make it public… maybe a better plan for keeping it in sync with the SVN source. In any case, I’m glad you were able to work out the filename and line numbers from just the URL.

When you use --full-forest, I would think that the result file doesn’t get populated at all, and instead the packed forest goes into the edge file. For example:

$ delphin process -g ../erg-2018.dat tmp3  # normal mode
Processing |################################| 3/3
NOTE: parsed 3 / 3 sentences, avg 1907k, time 0.01763s
$ ls -lh tmp3/edge tmp3/result
-rw-r--r-- 1 goodmami goodmami   0 Jul 11 09:16 tmp3/edge
-rw-r--r-- 1 goodmami goodmami 13K Jul 11 09:16 tmp3/result
$ delphin process -g ../erg-2018.dat --full-forest tmp3
Processing |################################| 3/3
NOTE: parsed 0 / 3 sentences, avg 1849k, time 0.01504s
$ ls -lh tmp3/edge tmp3/result
-rw-r--r-- 1 goodmami goodmami 20K Jul 11 09:16 tmp3/edge
-rw-r--r-- 1 goodmami goodmami   0 Jul 11 09:16 tmp3/result

However, it seems that if you use -1 or -nX, it will populate both:

$ delphin process -g ../erg-2018.dat --full-forest -o"-1" tmp3
Processing |################################| 3/3
NOTE: parsed 3 / 3 sentences, avg 1898k, time 0.01739s
$ ls -lh tmp3/edge tmp3/result
-rw-r--r-- 1 goodmami goodmami 20K Jul 11 09:28 tmp3/edge
-rw-r--r-- 1 goodmami goodmami 10K Jul 11 09:28 tmp3/result
$ delphin process -g ../erg-2018.dat --full-forest -o"-n5" tmp3 
Processing |################################| 3/3
NOTE: parsed 3 / 3 sentences, avg 1907k, time 0.01849s
$ ls -lh tmp3/edge tmp3/result
-rw-r--r-- 1 goodmami goodmami 20K Jul 11 09:29 tmp3/edge
-rw-r--r-- 1 goodmami goodmami 13K Jul 11 09:29 tmp3/result

So maybe just leave off the -1 or -n options?
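To check which tables a given combination of options populates without reading `ls` output by eye, a small script can report the sizes. This is a minimal sketch under some assumptions: `table_sizes` is a hypothetical helper (not part of PyDelphin), and it expects uncompressed table files named `edge` and `result` directly inside the profile directory, as in the listings above.

```python
import os


def table_sizes(profile_dir, tables=("edge", "result")):
    """Map each table name to its file size in bytes.

    A table whose file does not exist is reported as None, which keeps
    "missing" distinct from "present but empty" (size 0).
    """
    sizes = {}
    for name in tables:
        path = os.path.join(profile_dir, name)
        sizes[name] = os.path.getsize(path) if os.path.exists(path) else None
    return sizes


if __name__ == "__main__":
    # "tmp3" is the profile directory used in the transcripts above.
    for name, size in table_sizes("tmp3").items():
        print(name, size)
```

Going by the listings above, an empty `result` next to a non-empty `edge` is what a --full-forest run without -1 or -nX produced.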

Aha! Leaving off the -1 does help! Thanks, @goodmami !

I added it because without it, ACE reports a wrong number after it finishes (e.g. 0/65 sentences where in fact it parsed all 65). But that’s a small price to pay :). I don’t have to rely on that number for coverage I think.
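For anyone scripting this, the invocation from earlier in the thread with -1 dropped could be assembled as below. This is only a sketch: `build_process_command` is a hypothetical helper, and it uses only the `delphin process` flags that already appear in this thread (--options, -g, --full-forest, --select).

```python
import os
import shlex


def build_process_command(grammar, profile, ace_options=(),
                          select=None, full_forest=False):
    """Build an argv list for `delphin process`.

    `ace_options` are passed through to ACE via --options as one string,
    mirroring --options="-y --yy-rules" on the command line.
    """
    cmd = ["delphin", "process"]
    if ace_options:
        cmd.append("--options=" + " ".join(ace_options))
    # Expand "~" here because no shell is involved when running an argv list.
    cmd += ["-g", os.path.expanduser(grammar)]
    if full_forest:
        cmd.append("--full-forest")
    if select:
        cmd += ["--select", select]
    cmd.append(profile)
    return cmd


if __name__ == "__main__":
    cmd = build_process_command(
        "~/delphin/SRG/grammar/srg/ace/srg.dat",
        "my-profile",                     # placeholder profile path
        ace_options=["-y", "--yy-rules"],  # note: no -1
        select="i-tokens",
        full_forest=True,
    )
    print(shlex.join(cmd))  # pass `cmd` to subprocess.run to execute it
```

Leaving -1 out of `ace_options` is the whole point here: per the transcripts above, the full-forest run without -1 or -nX left the result file empty.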

Which reminds me of this question I asked a while ago. I wonder if people have been answering it via email and their replies were silently dropped? ;(