Debugging SKIPPED sentences using ERG and ACE


#1

I tried to parse the sentence:

We present a new hypothesis for the Jurassic plate-tectonic evolution of the Gulf of Mexico basin and discuss how this evolution influenced Jurassic salt tectonics.

But I got a SKIP: ...

My question is how to debug the -v output from ACE (or, better, use another tool) to understand what prevents the grammar from analyzing this sentence. I suspect the perfect answer would be to suggest that I attend the course http://courses.washington.edu/ling566/, right? :wink: But maybe someone can point me directly to the best reference for learning how to deal with this particular situation.

I imagine that the LKB provides better support for this situation (so I probably need to study Copestake, Ann, Implementing Typed Feature Structure Grammars, 2001). Am I right?

Finally, I noted that the failure in the analysis is not necessarily related to unknown lexical units. The sentence below, with 2 unknown words, was parsed correctly:

In our hypothesis, Callovian salt was deposited in pre-existing crustal depressions on hyperextended continental and transitional crust.


#2

If you call ACE with -l, there is an interactive tool and you can look
at the chart. However, the chart is generally massive.

One effective way to find errors is to simplify the sentence, e.g. try
to parse just “We present a new hypothesis for the Jurassic
plate-tectonic evolution of the Gulf of Mexico basin.”, cutting it
down further until it parses, then add more back until it fails. I
believe this is what was done for the road testing paper.
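
The simplify-then-grow loop can be scripted. Below is a minimal sketch, assuming you write the simplified variants by hand (shortest first) and supply your own `parses` function that returns True when the parser yields at least one analysis. Both `first_failure` and `parses` are names I made up for illustration. With PyDelphin, I believe `delphin.ace.parse` could sit behind `parses`, but I have not wired that up here, so the example uses an arbitrary stand-in.

```python
from typing import Callable, Optional, Sequence, Tuple


def first_failure(
    candidates: Sequence[str],
    parses: Callable[[str], bool],
) -> Tuple[Optional[str], Optional[str]]:
    """Walk hand-written variants from shortest to longest.

    Returns (last variant that parsed, first variant that failed).
    Either element is None if no such variant was seen.
    """
    last_ok = None
    for sentence in candidates:
        if not parses(sentence):
            return last_ok, sentence
        last_ok = sentence
    return last_ok, None


# Hand-written variants of the problem sentence, shortest first.
variants = [
    "We present a new hypothesis.",
    "We present a new hypothesis for the Jurassic plate-tectonic "
    "evolution of the Gulf of Mexico basin.",
    "We present a new hypothesis for the Jurassic plate-tectonic "
    "evolution of the Gulf of Mexico basin and discuss how this "
    "evolution influenced Jurassic salt tectonics.",
]


def stand_in_parses(sentence: str) -> bool:
    # ASSUMPTION: arbitrary stand-in so the sketch runs without ACE.
    # Replace with a real call to your parser.
    return len(sentence.split()) < 20


last_ok, first_bad = first_failure(variants, stand_in_parses)
```

The material that separates `last_ok` from `first_bad` is where to look next in the chart.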


#3

I just tried parsing this in the LKB+lui (using ERG 1214), and the chart is indeed massive, so you can’t really make much sense of it right away. One thing it does do is show you a substring when you hover over a constituent (I don’t know, maybe that’s lui and not the LKB, in which case you would get that with ACE as well).

I looked at a couple of trees in the chart; some look like this:

[screenshot of a parse tree from the chart]

You can look at the chart parse as follows: Parse -> Show parse chart -> then CTRL+right-click on licensing rule names in the chart and select “parse tree” (if you are using a Mac you might encounter a UI issue).

Now, at least one problem with this particular tree is that it does not correctly analyze “a new hypothesis for the Jurassic plate-tectonic”. Even if the top S did project to the root, you probably wouldn’t want a tree like this (it would be incorrect from the linguistic point of view). I haven’t looked at all of the candidate trees, but it wouldn’t surprise me if the problem were in “Jurassic plate-tectonic” for many of them. It looks like the ERG just cannot quite place those words as a single modifier.

However, when I replace “Jurassic plate-tectonic” with “green”, I still do not get a parse, even though I still get an S spanning all the words in the sentence, and this time the tree looks good to me. But, for some reason, the top S node (licensed by a rule called CL-CL_CRD-IM_C) still does not project to the root?


#4

To generalize a little bit over what I did above.

There are two types of issues: something is not parsed and something is parsed incorrectly. For now, let’s suppose we are dealing only with the first issue.

Suppose we want to parse: “We present a new hypothesis for the green evolution of the Gulf of Mexico basin and discuss how this evolution influenced Jurassic salt tectonics.”

We hit “parse sentence” but we don’t get a parse. Now we examine the parse chart. In addition to what Francis suggests above (try smaller sentences), you can examine the top of the chart and see what the largest substring (span) is that the grammar can actually license. In our case, it seems that some combination of rules can span the entire sentence; however, none of the top nodes projects to the root (there is no rule to say that’s allowed).

In another situation, you might find that the grammar actually cannot find a way to license something closer to the leaf nodes of the chart. At any rate, you examine the chart and try to find which constituent is not licensed (in the imaginary example that I suggested: the root).
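
To make the two situations concrete, here is a minimal sketch of that check over a toy chart. It assumes the chart has been flattened into (start, end, label) edges and that root status can be read off a set of labels. Both are simplifications of mine: the real root conditions in the ERG are feature structures, and `Edge` and `diagnose` are hypothetical names, not part of the LKB or ACE.

```python
from typing import List, NamedTuple, Set, Tuple


class Edge(NamedTuple):
    start: int  # index of the first token covered
    end: int    # index one past the last token covered
    label: str  # display label of the constituent, e.g. "NP" or "S/PP"


def diagnose(
    edges: List[Edge], n_tokens: int, roots: Set[str]
) -> Tuple[str, List[Edge]]:
    """Classify a toy chart and return the edges worth inspecting.

    "parsed":  some edge spans the whole input and has a root label
    "no_root": edges span the whole input, but none has a root label
    "partial": nothing spans the input; the widest edges are returned
    "empty":   the chart has no edges at all
    """
    if not edges:
        return "empty", []
    spanning = [e for e in edges if e.start == 0 and e.end == n_tokens]
    rooted = [e for e in spanning if e.label in roots]
    if rooted:
        return "parsed", rooted
    if spanning:
        return "no_root", spanning
    width = max(e.end - e.start for e in edges)
    return "partial", [e for e in edges if e.end - e.start == width]


# Toy chart: the input is covered, but only by a slashed S.
toy_chart = [Edge(0, 2, "NP"), Edge(2, 5, "VP"), Edge(0, 5, "S/PP")]
status, suspects = diagnose(toy_chart, n_tokens=5, roots={"S"})
```

A "no_root" result corresponds to the first situation above (full span, nothing projects to the root); "partial" corresponds to the second, where the gap sits lower in the chart.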


#5

You know what, I tried to be careful but I think I missed something. Looks like I do not have the word “tectonics” spanned in any of the trees! That would be one simple reason for it not to parse. But I will leave my posts here because they do demonstrate how to debug a little.


#6

Now, going back to modifying the original example, I still don’t have a parse here, although this time it does look to me like the possible S node is good and spans the entire input:

[screenshot of the parse tree for the modified sentence]


#7

The node at the top there is S/PP: it’s got a PP gap inside of it (i.e. a non-empty SLASH value) and so won’t unify with any initial symbol. The PP gap seems to originate at VP/PP over VP (in both conjuncts), and the question is why the unslashed VPs don’t form a conjoined VP that can then head the S.
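
As a toy illustration of why S/PP is rejected, here is a sketch that reads the gap off a display label such as "S/PP" and treats a node as a possible root only when nothing follows the slash. This is an assumption-laden simplification: in the grammar the check is unification against the initial symbols, not string matching, and `split_gap` and `can_be_root` are hypothetical helpers.

```python
from typing import Optional, Tuple


def split_gap(label: str) -> Tuple[str, Optional[str]]:
    """Split a display label like "S/PP" into (category, gap)."""
    category, sep, gap = label.partition("/")
    return category, (gap if sep else None)


def can_be_root(label: str, root_category: str = "S") -> bool:
    """Toy root test: right category and no unfilled gap (empty SLASH)."""
    category, gap = split_gap(label)
    return category == root_category and gap is None


# Labels of the kind seen at the top of the tree discussed above.
top_labels = ["S/PP", "S"]
root_candidates = [label for label in top_labels if can_be_root(label)]
```

Scanning the top of the chart for labels with a slash is a quick way to spot this kind of failure before opening individual trees.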