I need to (1) suppress “results” -R when running ACE, for efficiency and (2) use the i-tokens field instead of i-input while preserving i-input for treebanking.
From the linked responses above it would appear what I want is not possible off the shelf? According to @Dan I am losing a lot in performance by generating the results files in the treebanks. I’d like to not do that but there is no obvious way for me to get out of the YY-input in the i-tokens field situation, and for that it looks like I have to use pydelphin?
Any ideas? Which seems the more promising avenue to a solution: modifying ACE or modifying PyDelphin? Or getting rid of YY input after all… (Not an option at the moment, but maybe in the future.)
To paraphrase, you want both (1) and (2), and ACE et al. will handle only (1) while PyDelphin handles only (2), right?
I have also missed some context. Why do you want to use -R? Do you just want to know whether or not an item will parse? Or is there some non-result data you want to store in a profile?
With a successful parse, you get some NOTE messages on stderr, and the exit code is 0. With a parse failure you get similar messages on stderr and the exit code is 255. Since PyDelphin does not parse those stderr messages (it used to, but this led to hard-to-debug issues with buffers filling up), the only signal you’re getting out of ACE is the exit code to signal if the sentence parsed or not. If that is sufficient for you, I could consider allowing the -R option in PyDelphin’s ace package when --tsdb-stdout is not used (see below).
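If the exit code alone is enough, a minimal sketch of reading it from Python without going through PyDelphin might look like the following. It assumes one sentence per ACE invocation (which reloads the grammar every time, so it is slow) and relies only on the 0 / 255 exit codes described above. The helper name `parsed_ok` and the grammar path in the usage note are hypothetical.

```python
import subprocess
import sys

# Exit codes as described in the post above (not re-verified here).
ACE_EXIT_PARSED = 0
ACE_EXIT_FAILED = 255


def parsed_ok(cmd, sentence):
    """Run `cmd`, feed one sentence on stdin, and map the exit code to a bool.

    `cmd` is the full command line as a list. stdout and stderr are
    discarded, so no pipe buffer can fill up and block the child process.
    """
    proc = subprocess.run(
        cmd,
        input=sentence + "\n",
        text=True,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    if proc.returncode == ACE_EXIT_PARSED:
        return True
    if proc.returncode == ACE_EXIT_FAILED:
        return False
    # Anything else is unexpected; surface it instead of guessing.
    raise RuntimeError(f"unexpected exit code {proc.returncode}")
```

Usage would be something like `parsed_ok(["ace", "-g", "grammar.dat", "-R"], "The dog barks.")`, where `grammar.dat` stands in for your compiled grammar image.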
Your GitHub link doesn’t work for me, Mike, but guessing from the URL, your suggested change would be fine by me.
To Olga’s purpose, it’s true that unpacking all results would be unnecessarily expensive (both in terms of computation and in terms of disk space), but unpacking just one result probably isn’t that painful a cost? I would expect the -1 option to work fine in combination with the other flags you are using, and it might be a simpler solution. I don’t think FFTB will have any trouble working with a profile that has some results recorded alongside the edges.
@dan is of course correct that not unpacking any results at all is the most efficient way, but given that treebanking is by nature a process limited by human effort, is it particularly critical to save 10% of the upfront computation time?
I am already using the -1 flag, as you see. But also, of course, the --full-forest option, for treebanking. Maybe -1 doesn’t really work in combination with --full-forest? The parsing is very slow and the results files are large. But it is good to know that getting rid of the result content would only save me about 10%, I guess? I suppose that would not be critical at all.
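To see how much of a profile's disk footprint actually comes from results rather than edges, a rough sketch like this can help. It assumes the profile is a directory whose relation files are named `result` and `edge`, possibly gzipped as `result.gz` / `edge.gz`. The function names are made up for illustration, and for gzipped files the numbers are compressed sizes.

```python
from pathlib import Path


def relation_size(profile_dir, relation):
    """Return the on-disk size in bytes of one relation file.

    Counts both the plain file and a gzipped variant if present;
    returns 0 when the relation is missing from the profile.
    """
    profile = Path(profile_dir)
    total = 0
    for name in (relation, relation + ".gz"):
        path = profile / name
        if path.is_file():
            total += path.stat().st_size
    return total


def result_share(profile_dir):
    """Fraction of the combined result + edge bytes taken by result."""
    result = relation_size(profile_dir, "result")
    edge = relation_size(profile_dir, "edge")
    total = result + edge
    return result / total if total else 0.0
```

Running `result_share("path/to/profile")` before and after changing the ACE flags gives a quick sense of whether the result file is where the space is going.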
Sorry, that project is the GitHub import of the ACE SVN repository and is still set to private mode. I’ve given the ‘sweaglesw’ user admin access so the link (among other things) should now work for you, although it doesn’t help others much. I forget what we were waiting on to make it public… maybe a better plan for keeping it in sync with the SVN source. In any case, I’m glad you were able to work out the filename and line numbers from just the URL.
When you use --full-forest, I would think that the result file doesn’t get populated at all, and instead the packed forest goes into the edge file. For example:
Aha! Leaving off the -1 does help! Thanks, @goodmami !
I added it because without it, ACE reports a wrong number after it finishes (e.g. 0/65 sentences where in fact it parsed all 65). But that’s a small price to pay :). I don’t have to rely on that number for coverage I think.
Which reminds me of this question I asked a while ago. I wonder if people have been answering it via email and their replies were silently dropped? ;(