Definitions of specific relations in a profile

I’m calculating accuracy over a treebanked profile. One thing I’d like to know is how many items I accepted a parse for, out of the number of test items for which a parse was available.

I generated the profiles using the -f option, and the only results in the profile are the trees I selected while treebanking. I want to count not only the test items I accepted (and therefore have a result for) but also the ones I rejected even though a parse was available. I’m trying to figure out if I can get this from the parse file. The values below seem promising, but I don’t understand what the terms tedges, eedges, ledges, sedges, and redges mean. Can someone define them (these come from the relations file, under parse)?

tedges :integer # type-0 entries in (visible) forest
eedges :integer # type-1 entries in (visible) forest
ledges :integer # type-2 entries in (visible) forest
sedges :integer # type-3 entries in (visible) forest
redges :integer # type-4 entries in (visible) forest

From what I can tell, a sentence that didn’t have any parses available when treebanking has a value of 0 for type-1 through type-4, and a non-zero value if parses were available, whether I accepted or rejected them. Type-0 is zero if the grammar didn’t have lexical coverage over the sentence and non-zero if it did. So I’m guessing tedges has something to do with lexical parsing, and the other edges would be expected if a valid tree was found. Based on this, I think I can check for a non-zero value in type-1 (or one of the others) to know if the grammar could have parsed the sentence. Can someone confirm or reject that intuition?

Offhand I’m not sure whether tedges is a reliable indicator of lexical coverage, but maybe? To your main question: yes, the existence of syntactic edges in the recorded chart means that parsing, before unpacking, reached the root condition at least once. Unfortunately, packing means there can be cases where it looks like a tree would exist, but when you replay the derivation without the loss of information incurred by packing, it turns out to be inconsistent. This is especially true when packing under generalization is used, which is the default in ace. I think if you disable generalization packing by passing --disable-generalization to ace, then the inexactness will only affect the number of readings, not whether there are any at all, but I’m not 100% sure about that. The safest way to get an exact count of parseable sentences is just to parse in 1-best mode with a large RAM limit, rather than to record a full forest. In that scenario, you would just count the items where readings > 0.

Best, Woodley
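
Both checks discussed in this thread (a non-zero eedges value in a full-forest profile, or readings > 0 in a 1-best profile) come down to counting rows of the parse file where one field is greater than zero. Below is a minimal sketch of that count, not an official tool. It assumes the profile directory contains an uncompressed parse file with one '@'-separated row per item and a relations file listing the parse fields in column order. The helper names field_names and count_nonzero are made up for illustration.

```python
import os
import re


def field_names(relations_path, table):
    """Return the column names declared for `table` in a relations file."""
    names = []
    in_table = False
    with open(relations_path, encoding="utf-8") as f:
        for line in f:
            stripped = line.strip()
            if not stripped or stripped.startswith("#"):
                continue
            # A table header is a bare name followed by a colon, e.g. "parse:".
            if re.fullmatch(r"[\w-]+:", stripped):
                in_table = (stripped[:-1] == table)
            elif in_table:
                # A field line looks like "eedges :integer # comment".
                names.append(stripped.split()[0])
    return names


def count_nonzero(profile_dir, field, table="parse"):
    """Return (rows where `field` > 0, total rows) for one table of a profile.

    Assumption: rows are '@'-separated and columns follow the order given
    in the relations file. Empty or negative values count as "not positive".
    """
    names = field_names(os.path.join(profile_dir, "relations"), table)
    col = names.index(field)  # raises ValueError if the field is not declared
    total = 0
    nonzero = 0
    with open(os.path.join(profile_dir, table), encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:
                continue
            values = line.split("@")
            total += 1
            if int(values[col] or 0) > 0:
                nonzero += 1
    return nonzero, total
```

Calling count_nonzero(profile_dir, "eedges") on the full-forest profile and count_nonzero(profile_dir, "readings") on a 1-best profile of the same items would show how far the two estimates diverge. If the parse file is stored compressed, it would need to be decompressed first.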

Thanks, Woodley. I had thought about just parsing the profile without the -f option and seeing which items had results (I think that’s comparable to what you described), but then I thought that maybe it was better/more reliable to make sure all of the counts I’m using come from the same profile (in case, e.g., a RAM limit had some effect). But this makes sense. Thanks!