Segmentation fault when parsing with ACE 0.9.32

I’m using the latest version of ACE to compile a grammar and parse test items, but I get a segmentation fault:

-bash-4.2$ ./ace-0.9.32/ace -g grammars/abz/abz1/ace/abz1_0932.dat -Tf
Segmentation fault

When I compile the same output from the matrix with ACE 0.9.31 I’m able to run the above command and get readings for input test sentences, but for some reason 0.9.32 won’t let me parse individual items.

In addition to this, when I try to get parses for a whole test suite using art with ACE 0.9.31 I get something like this:
[Screenshot, image not preserved: Screen Shot 2021-01-04 at 3.28.00 PM]

But using the same data and ACE 0.9.32 I get this:
[Screenshot, image not preserved: Screen Shot 2021-01-04 at 3.29.44 PM]

The fact that I wasn’t getting readings is why I was trying to test single sentences with ACE. @sweaglesw Do you have any idea why this might be happening?

Unfortunately I am not in a position to debug this currently and due to extenuating circumstances I don’t know exactly when I will be. It certainly looks like I introduced a new bug in 0.9.32. I will look into it when I can but it may be some time; sorry!

The 0.9.32 release included some “improvements” with the side effect that loading a grammar with no maxent model (these don’t exist in my regression tests…) caused a segfault at startup. I just posted 0.9.33, which contains (only) the fix for this bug. Please let me know whether this resolves the problem you are seeing!

Thank you, Woodley!

Hi @sweaglesw, in http://sweaglesw.org/linguistics/ace/ I didn’t see the macOS binary, any special reason? I will try to compile from the source later; I’m just checking whether something blocked you from compiling on macOS.

Maybe @trimblet would like to update the brew formula too? I created an issue at https://github.com/dantiston/homebrew-delphin/issues/4

Hi! Thank you so much. It’s getting further than the previous segmentation fault, but now there’s a new error, though this could very well be an issue with the way I’m running the command. First I tried running ACE on its own on the command line:

./ace-0.9.33/ace -g grammars/yaq/ace/yaq.dat -Tf
Juan nooka
ERROR: DEADLY SIGNAL! sent = `Juan nooka'

Then I tried using ART (which is what I use most of the time anyway to run my full profiles) and something similar happened:

art-0.1.9/art -a 'ace-0.9.33/ace -g grammars/yaq/ace/yaq.dat' output_dir/profiles/yaq
art...
reading results for                0	0 results
reading results for               10	0 results
reading results for               20	0 results
reading results for               30	0 results
reading results for               40	0 results
reading results for               50	0 results
reading results for               60	
out of sync; arbiter sent 'ERROR: DEADLY SIGNAL! sent = `huevena-kai !'' when expecting a SENT or SKIP header.
failed to read result for 60

Hmm, well clearly there are more issues. Can you provide the grammar in question so I can reproduce the crash?

I sent an email with a tar.gz file of the grammar that I used for the above tests.

@sweaglesw, Slight update on this issue…

I’m using ACE 0.9.34 and it’s working for most of my grammars in that it is much faster than ACE 0.9.31 and there are no errors or anything. However, for some grammars it still takes an exceptionally long time to get results for a sentence. I started a run on Thursday Jan 21st, and it started “reading results” for the first sentence in my test suite for a particular grammar, and it has not progressed past that point since Thursday.

I can’t tell if it’s just the nature of this grammar or if there is something going on with ACE. How can I go about diagnosing this?

For now though, I’m using the --timeout option to stop it from hanging too long. I just find it odd since grammars for this language were providing results in a reasonable time before some of the newer matrix changes. But I don’t see why ACE 0.9.34 would be totally fine with half of the grammars and not the other half.

Impossible to say without looking into it, really. Have you tried running that input directly into ace without using art, and if so is there continual output if you add -vvv? Does the memory footprint grow over time? I’m happy to look at it too if you want to share the grammar and input that give unsatisfactory results.
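One way to act on these suggestions is to feed a single sentence to the parser from a small script with a hard wall-clock limit, so that a spinning item cannot hold up a run for days. The sketch below is only an illustration: `run_one_item` is a made-up helper, not part of ACE or art, and it relies on Python's `subprocess` timeout rather than on ACE's own `--timeout` option.

```python
import subprocess


def run_one_item(cmd, sentence, limit_seconds):
    """Feed one sentence to `cmd` on stdin and give up after `limit_seconds`.

    Returns (finished, output); `finished` is False when the limit was hit.
    `run_one_item` is a hypothetical helper, not part of ACE or art.
    """
    try:
        done = subprocess.run(
            cmd,
            input=sentence + "\n",
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # keep diagnostics together with results
            text=True,
            timeout=limit_seconds,
        )
    except subprocess.TimeoutExpired as exc:
        # subprocess.run kills the child before re-raising the exception.
        partial = exc.output or ""
        if isinstance(partial, bytes):
            partial = partial.decode(errors="replace")
        return False, partial
    return True, done.stdout
```

Assuming placeholder paths, a call might look like `run_one_item(["path/to/ace", "-g", "path/to/grammar.dat", "-Tf", "-vvv"], "some sentence", 60)`. If the call returns `False` with a long stream of verbose output, that would suggest the parser is still producing work rather than being blocked; memory growth would still need to be watched separately, for example with `top`.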

Sorry for the late response, the school servers are partially down, but I got the grammar and input off of them. I’ll send an email with those included if you want to look into it, but no rush or anything as the server trouble is somewhat halting my work in this area anyway.

The erk grammar you provided seems to have some rules in it that freely recurse on their own output. With just the input “wak”, I found n25-bottom-coord, n24-left-coord and n24-bottom-coord all spinning. They did not spin as quickly as I expected, and it seems to slow down dramatically the more edges are created, so there is probably something inefficient going on in some part of the parser that doesn’t get much exercise in more ordinary usage, but I suspect the best solution is to adjust the grammar to block that behavior.

Interesting — all of those rules are actually lexical rules, but they inherit from inflecting-lex-rule rather than infl-lex-rule (like the rest of the irules). Their instances are defined in irules.tdl, as one would expect, and they have the specification [ NEEDS-AFFIX + ] inherited from inflecting-lex-rule. Which of these properties does ace use to determine which rules are inflecting rules?

It’s been a while since thinking about that, but as far as I can tell ACE only draws a distinction between lexical rules with spelling changes and lexical rules without spelling changes, and does not pay any attention to the value of NEEDS-AFFIX or similar (at least not beyond whatever unification constraints the grammar puts on those features).

But looking at the grammar in question more closely, I see that the rule types are instantiated twice: once as orthographemic rules in irules.tdl (with names n24-bottom, n24-left, n25-bottom) and once as syntax rules in rules.tdl (with names n24-bottom-coord, n24-left-coord, n25-bottom-coord). It’s these latter rules that are spinning, not the orthographemic variants. Perhaps these versions need to be tightened (or removed)?
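A check for this kind of double instantiation can be roughed out in a few lines. The following is a sketch under several assumptions, not a real TDL parser: it assumes instances are written as `name := type.`, optionally with `%prefix`/`%suffix` lines between the `:=` and the type as in Matrix-style irules.tdl, it only looks at the first supertype, and it does not handle block comments or `&` conjunctions. The function names are invented for this example.

```python
import re

# Start of a TDL instance definition: `name := [%suffix ... lines] first-type`.
# Deliberately rough; see the assumptions listed in the text above.
DEFINITION = re.compile(
    r"^\s*([\w*+-]+)\s*:=\s*(?:%(?:prefix|suffix)[^\n]*\n\s*)*([\w*+-]+)",
    re.MULTILINE,
)


def instantiated_types(tdl_text):
    """Map each (first) supertype to the instance names built from it."""
    by_type = {}
    for name, supertype in DEFINITION.findall(tdl_text):
        by_type.setdefault(supertype, []).append(name)
    return by_type


def doubly_instantiated(irules_text, rules_text):
    """Return {type: (irule instances, syntax-rule instances)} for shared types."""
    irules = instantiated_types(irules_text)
    rules = instantiated_types(rules_text)
    shared = sorted(irules.keys() & rules.keys())
    return {t: (irules[t], rules[t]) for t in shared}
```

Calling `doubly_instantiated(open("irules.tdl").read(), open("rules.tdl").read())` on a grammar directory would list every type that has instances in both files, which is the pattern described above.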

Thanks for catching that, Woodley! Indeed, they should only be instantiated as irules. I wonder if this bug had managed to stay hidden until now because of something about the way the grammar was being loaded (in both the LKB and ace), with the irules instances overwriting the rules instances previously. Anyway, now we know what to fix!

I am not able to follow all of the discussion about lexical rules vs. inflecting rules, but from a computer science perspective this kind of situation is very interesting. Sorry for jumping into the thread. From the @sweaglesw comment, I gather that the freely recursing rules are a problem in the grammar and ACE is not expected to detect this kind of mistake by the grammar developer, right? But we do have the possible problem with some non-consistent ways grammars are loaded by the systems, right? Is it related to the discussion promoted by @goodmami in the last Summit (http://moin.delph-in.net/wiki/VirtualSharedConfigs)?

From the @sweaglesw comment, I gather that the freely recursing rules are a problem in the grammar and ACE is not expected to detect this kind of mistake by the grammar developer, right?

ACE attempts to detect situations like this after the rule has spun several times, and I’m not sure why that message did not show up for this case. Generally ACE can only flag it as a clear problem if the rule recurses freely and produces an identical result. It’s possible that in this situation the result is different in some subtle way that stops ACE from detecting the free recursion automatically but doesn’t actually matter combinatorially. For example, the computation history in some append list might be different.
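The difference between the two cases can be illustrated with a toy model. This is not ACE's implementation, which is not shown in this thread; it is a sketch in which a unary rule is just a Python function from an edge to a new edge, and spinning is flagged only when an output is identical to an edge already seen. The rule functions and the `(category, history)` edge layout are invented for the example.

```python
def apply_until_stuck(rule, edge, max_depth=50):
    """Apply a unary `rule` to its own output repeatedly.

    Returns ("cycle", n) if step n reproduces an edge seen before,
    ("stopped", n) if the rule no longer applies (returns None) at step n,
    and ("depth-limit", max_depth) if neither happens.
    Toy model only; not how ACE is actually implemented.
    """
    seen = {edge}
    for step in range(1, max_depth + 1):
        edge = rule(edge)
        if edge is None:
            return "stopped", step
        if edge in seen:
            return "cycle", step
        seen.add(edge)
    return "depth-limit", max_depth


# Edges are (category, history) tuples; both rules are hypothetical.
def identical_output_rule(edge):
    category, history = edge
    return (category, history)  # exactly the same result every time


def growing_history_rule(edge):
    category, history = edge
    return (category, history + ("coord",))  # differs only in bookkeeping
```

With a start edge of `("n", ())`, the first rule is caught as a cycle on its first application, while the second never repeats an edge and is only stopped by the depth cap, even though the extra history makes no combinatorial difference.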

But we do have the possible problem with some non-consistent ways grammars are loaded by the systems, right? Is it related to the discussion promoted by @goodmami in the last Summit (http://moin.delph-in.net/wiki/VirtualSharedConfigs)?

That’s unclear to me. Much of this thread was about temporary buglets in short-lived ACE releases which are now resolved. The most recent issue I would expect to be present in LKB also, so if it is not then you may be right that there is a platform difference lurking here. Has it been confirmed that those duplicates of the rules didn’t show up in LKB? Maybe the fact that they were [NEEDS-AFFIX +] alerted LKB to the fact that something funny was going on?

Just checked, and the LKB does load this grammar with the rules as both syntax & irules, so maybe the [ NEEDS-AFFIX + ] specification is enough to keep them from applying as syntax rules in the LKB.

I’m a bit surprised that we didn’t see this problem previously with using these grammars with ace. @ecconrad can you see if the grammars in Kristen’s repro repo had this property? (Or check what the version of the Matrix she was working from was doing?)

And yes, this is definitely a grammar bug (and in fact a grammar customization bug, which Liz has now fixed) 🙂

In the LKB, if a rule satisfies the function spelling-change-rule-p then it’s not treated as a syntax rule. Many grammars define this function (in user-fns.lsp) to test [ NEEDS-AFFIX + ] or similar.