Debugging SKIPPED sentences using ERG and ACE


#1

I tried to parse the sentence:

We present a new hypothesis for the Jurassic plate-tectonic evolution of the Gulf of Mexico basin and discuss how this evolution influenced Jurassic salt tectonics.

But I got a SKIP: ...

My question is how to read the -v output from ACE (or, better, which other tool to use) to understand what prevents the grammar from analyzing this sentence. I suspect the perfect answer would be to suggest that I attend the course http://courses.washington.edu/ling566/, right? :wink: But maybe someone can point me directly to the best reference for learning how to deal with this particular situation.

I imagine that the LKB provides better support for this situation (so I probably need to study Copestake, Ann, Implementing Typed Feature Structure Grammars, 2001). Am I right?

Finally, I noticed that the failure of the analysis is not necessarily related to unknown lexical units. The sentence below, with two unknown words, was parsed correctly:

In our hypothesis, Callovian salt was deposited in pre-existing crustal depressions on hyperextended continental and transitional crust.


#2

If you call ACE with -l, there is an interactive tool and you can look
at the chart. However, the chart is generally massive.

One effective way to find errors is to simplify the sentence, e.g. try
to parse just “We present a new hypothesis for the Jurassic
plate-tectonic evolution of the Gulf of Mexico basin.” until it
parses, then add on more until it fails. I believe this is what was
done for the road testing paper.
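
The simplify-then-grow strategy can be sketched in a few lines of Python. This is only an illustration: `first_failure` and `stub_parses` are hypothetical names, and the stub stands in for a real parser call (for example, a wrapper that runs ACE and checks whether the output is a SKIP). The stub's rule is arbitrary and says nothing about what the ERG actually accepts.

```python
from typing import Callable, List, Optional


def first_failure(candidates: List[str],
                  parses: Callable[[str], bool]) -> Optional[str]:
    """Return the first candidate (shortest first) that does not parse, or None."""
    for sentence in candidates:
        if not parses(sentence):
            return sentence
    return None


def stub_parses(sentence: str) -> bool:
    """Stand-in for a real parser call; NOT how ACE or the ERG decides anything.

    It simply pretends that any sentence containing "tectonics" fails.
    """
    words = sentence.lower().rstrip(".").split()
    return "tectonics" not in words


# Hand-written variants, ordered from simplest to the full sentence.
candidates = [
    "We present a new hypothesis.",
    "We present a new hypothesis for the evolution of the basin.",
    "We present a new hypothesis for the evolution of the basin "
    "and discuss how this evolution influenced Jurassic salt tectonics.",
]

culprit = first_failure(candidates, stub_parses)
print(culprit)  # the first variant the (stub) parser rejects
```

The candidates are written by hand rather than generated as word-by-word prefixes, because an arbitrary prefix of a sentence is usually not a grammatical sentence itself and would fail for uninteresting reasons.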


#3

I just tried parsing this in the LKB+lui (using ERG 1214), and the chart is indeed massive, so you can’t really make much sense of it right away, but one thing it does is show you a substring when you hover over a constituent (I don’t know, maybe that’s lui and not LKB, in which case you would get that with ACE as well).

I looked at a couple of trees in the chart, some look like this:

[Screenshot of a parse tree from the chart; image not preserved]

You can look at the chart parse as follows: Parse -> Show parse chart -> then CTRL+right-click on licensing rule names in the chart and select “parse tree” (if you are using a Mac you might encounter a UI issue).

Now, at least one problem with this particular tree is that it cannot correctly parse “a new hypothesis for the Jurassic plate-tectonic”. Even if the top S did project to the root, you probably wouldn’t want a tree like this (at least it would be incorrect from the linguistic point of view). I haven’t looked at all the candidate trees, but it wouldn’t surprise me if the problem were in “Jurassic plate-tectonic” for many of them… Looks like the ERG just cannot quite place those things as a single modifier.

However, when I replace “Jurassic plate-tectonic” with “green”, I still do not get a parse, even though I get an S spanning all the words in the sentence, and this time the tree looks good to me. But, for some reason, the top S node (licensed by a rule called CL-CL_CRD-IM_C) still does not project to the root?


#4

To generalize a little bit over what I did above.

There are two types of issues: something is not parsed and something is parsed incorrectly. For now, let’s suppose we are dealing only with the first issue.

Suppose we want to parse: “We present a new hypothesis for the green evolution of the Gulf of Mexico basin and discuss how this evolution influenced Jurassic salt tectonics.”

We hit “parse sentence” but we don’t get a parse. Now we examine the parse chart. In addition to what Francis suggests above (try smaller sentences), what you can do is examine the top of the chart and see what is the largest substring (span) that the grammar actually can license. In our case, it seems like it can span the entire sentence by some combination of rules, however none of the top nodes project to the root (there is no rule to say that’s allowed).

In another situation, you might find that the grammar actually cannot find a way to license something closer to the leaf nodes of the chart. At any rate, you examine the chart and try to find which constituent is not licensed (in the imaginary example that I suggested: the root).
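
As a rough sketch of the two checks described above (which span is the widest one licensed, and which tokens are not covered at all), here is a small Python example. The chart representation is an assumption made for illustration: edges are plain `(start, end, label)` tuples with an exclusive end, which is not the data structure the LKB or ACE actually uses, and the labels are made up.

```python
from typing import List, Tuple

Edge = Tuple[int, int, str]  # (start, end, label); end is exclusive


def widest_edges(edges: List[Edge]) -> List[Edge]:
    """Return the edges covering the largest number of tokens."""
    if not edges:
        return []
    best = max(end - start for start, end, _ in edges)
    return [e for e in edges if e[1] - e[0] == best]


def uncovered_tokens(tokens: List[str], edges: List[Edge]) -> List[str]:
    """Return tokens that no edge in the chart spans."""
    covered = set()
    for start, end, _ in edges:
        covered.update(range(start, end))
    return [tok for i, tok in enumerate(tokens) if i not in covered]


# Toy chart: the third token has no edge at all (hypothetical labels).
tokens = ["the", "cat", "purrs"]
edges = [(0, 1, "det"), (1, 2, "noun"), (0, 2, "NP")]

print(widest_edges(edges))              # [(0, 2, 'NP')]
print(uncovered_tokens(tokens, edges))  # ['purrs']
```

If the widest edge covers every token, the question becomes why it does not unify with a root; if some token is uncovered, the problem is lower down, most likely in the lexicon.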


#5

You know what, I tried to be careful but I think I missed something. Looks like I do not have the word “tectonics” spanned in any of the trees! That would be one simple reason for it not to parse. But I will leave my posts here because they do demonstrate how to debug a little.


#6

Now, going back to modifying the original example, I still don’t have a parse here, although this time it does look to me like the possible S node is good and spans the entire input:

[Screenshot of the parse tree for the modified sentence; image not preserved]


#7

The node at the top there is S/PP — it’s got a PP gap inside of it (i.e. a non-empty SLASH value) and so won’t unify with any initial symbol. The PP gap seems to initiate at VP/PP over VP (in both conjuncts) and the question is why don’t the unslashed VPs form a conjoined VP that can then head the S?


#8

Regarding the problem of no parses coming up for this sentence: from what I can see, the 1214 ERG returns analyses and the 2018 (trunk) ERG does not. As Francis suggested, the easiest way to isolate a problem like this is to simplify the sentence in question until you can tell what portion of it is causing trouble. In the case of the 2018 grammar, the phrase “the Gulf of Mexico basin” is the problem; for instance, the following fails to parse:

The Gulf of Mexico basin arose.

The 1214 ERG gives an analysis of this NP, but it is a crummy one: it involves a dubious gerund form of the verb “to base” (i.e. basin) compounded with Mexico (to base something in Mexico?), in relation to a gulf. The 2018 ERG dropped the lexical entry base_v2 that was used in this construction, apparently on the intuition that “to base” always takes a locative complement in addition to a direct object. That makes it incompatible with the gerund rule in this instance.


#9

To first answer Woodley’s most recent question, the phrase “the Gulf of Mexico basin” fails to parse because of a still incomplete account in the ERG of complex proper names. In general, English does not like post-modified nominals to be pre-noun modifiers (“*the friend of Bill mother”, “*your chair near me leg”). But some nouns such as “university” can combine with a following PP and have the resulting nominal serve as a pre-noun modifier, as in “the University of Nevada campus”. The ERG 2018 only has about ten nouns that belong to this lexical class, including “bank”, “board”, and “state”, but clearly there are quite a few more. I don’t yet know how to predict which nouns belong to this class, nor do I know of a resource that would give them to me, so help on either front would be welcome. Clearly, “gulf” should be in this class.

There is also a second reason for the failure to parse the original sentence that started this thread. The sentence uses the noun “tectonics”, but unfortunately ERG 2018 only has the adjective “tectonic”, which means that the lemma is not unknown, so we don’t use the POS-tag-based machinery to propose a lexical entry for the plural noun “tectonics”. You can perhaps see why we’re reluctant to always propose POS-tag-based entries for every word in every sentence, but this means we have to count on the manually produced lexicon to be exhaustive about the full range of lexical entries that have the same lemma. Thus it is simply a bug in ERG 2018 that it is lacking the noun entry, presumably always a plural, for “tectonics”.

Now to the more general question about how to debug when there is no parse for a sentence, or no correct parse. In addition to the good advice others have offered, to gradually simplify the sentence until you can isolate which phrase the grammar rejects, you can also use the interactive unifier, available in both the LKB and ACE. I’ll illustrate for ACE, since the LKB already has some usable documentation in Ann’s book on how to do this. For ACE, view the parse chart for the failed parse, and if there is an edge in the topmost cell that looks promising to you, find out why the grammar rejects it as a good parse by seeing why it does not unify with one of the root symbols, usually ‘root_informal’. Do this by bringing up the feature structure for the root as follows, in the same terminal window where you asked to view the parse chart:

    :i root_informal

Then right-click on the edge in the chart that you think should be a good spanning analysis for the sentence, and drag it onto the outermost node in the root feature structure. ACE will then show you which feature value(s) fail to unify (for example the SLASH value for an edge whose parse tree shows the top node as S/PP).
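
To make the idea of a unification failure concrete, here is a toy Python sketch. It is not how ACE or the LKB implements unification: feature structures are modelled as nested dicts with atomic string values, there is no type hierarchy and no reentrancy, and the feature paths and values (`SYNSEM.NONLOC.SLASH`, `"empty"`, `"pp-gap"`) are simplified stand-ins chosen for illustration rather than the ERG's actual geometry.

```python
from typing import Any, Dict, List, Tuple

FS = Dict[str, Any]  # nested dicts; leaves are atomic strings


def unification_failures(a: FS, b: FS, path: Tuple[str, ...] = ()) -> List[str]:
    """Return the feature paths at which two toy feature structures clash."""
    failures: List[str] = []
    for feat in sorted(a.keys() & b.keys()):
        va, vb = a[feat], b[feat]
        here = path + (feat,)
        if isinstance(va, dict) and isinstance(vb, dict):
            # Both values are complex: compare them feature by feature.
            failures.extend(unification_failures(va, vb, here))
        elif va != vb:
            # Atomic values (or atom vs. complex) that differ cannot unify.
            failures.append(".".join(here))
    return failures


# Hypothetical root condition: the SLASH value must be empty.
root = {"SYNSEM": {"LOCAL": {"CAT": {"HEAD": "verb"}},
                   "NONLOC": {"SLASH": "empty"}}}
# Hypothetical spanning edge of category S/PP: it still carries a PP gap.
edge = {"SYNSEM": {"LOCAL": {"CAT": {"HEAD": "verb"}},
                   "NONLOC": {"SLASH": "pp-gap"}}}

print(unification_failures(root, edge))  # ['SYNSEM.NONLOC.SLASH']
```

The useful part of the real tool is the same as in this sketch: it names the path where the clash occurs, which tells you where in the tree to look next.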

One additional useful tool in ACE for isolating sources of grammar errors is the menu item “Filter others” that you’ll see when you right-click on an edge in the parse chart. If you select that menu choice, ACE will hide all of the edges that don’t use this edge. So if you’re sure that the right parse should make use of a lexical edge that you see, use the filter mechanism to hide everything that is irrelevant, and if you do this a few times, you can often quickly see in the reduced chart where the parser has failed to propose an edge for some span of the sentence.
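
The effect of such a filter can be approximated with a short Python sketch. The chart encoding is an assumption: each edge is an integer id mapped to the ids of its daughters, and the sketch keeps the selected edge plus every edge built directly or indirectly on top of it. Whether ACE's actual filter also keeps the daughters of the selected edge is not something this sketch tries to reproduce.

```python
from typing import Dict, List, Set


def edges_using(selected: int, daughters: Dict[int, List[int]]) -> Set[int]:
    """Return the selected edge and every edge that uses it, transitively."""
    keep = {selected}
    changed = True
    while changed:
        changed = False
        for edge, kids in daughters.items():
            if edge not in keep and any(k in keep for k in kids):
                keep.add(edge)
                changed = True
    return keep


# Toy chart: edges 1-3 are lexical; 4, 5 and 6 are phrasal (hypothetical ids).
daughters = {1: [], 2: [], 3: [], 4: [1, 2], 5: [2, 3], 6: [4, 3]}

print(sorted(edges_using(1, daughters)))  # [1, 4, 6]
```

Applying the filter to an edge you trust and then looking for spans with nothing left above them is a quick way to see where the parser stopped building.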

If you’re ambitious, when the problem seems to be a lexical gap (as it was twice for the original sentence), you might try adding the missing entries to the ‘lexicon.tdl’ file, recompiling (ACE) or reloading (LKB) the grammar, and parsing the sentence again. For our sentence, the missing two entries look like this (I know they look a bit obscure, but they are just variants of existing entries of the same classes, so you can mostly copy and edit):

    tectonics_n1 := n_-_c-pl_le &
     [ ORTH < "tectonics" >,
       SYNSEM [ LKEYS.KEYREL.PRED "_tectonics_n_1_rel",
                LOCAL.AGR.PNG png-irreg,
                PHON.ONSET con ] ].

    gulf_n1 := n_pp_c-of-lhc_le &
     [ ORTH < "gulf" >,
       SYNSEM [ LKEYS.KEYREL.PRED "_gulf_n_of_rel",
                PHON.ONSET con ] ].

With these two entries added, the top-ranked parse for the original sentence now looks okay, confirming our hypotheses about what was wrong with the grammar this time.

Dan