Here’s a comparison of two grammars with respect to a test suite, in [incr tsdb()]:
As you can see, the “new” grammar has strictly more ambiguity.
From the compare → coverage view, it appears that the “gold” grammar has on average 43.92 analyses per sentence, while the “new” one has 43.95:
However, when I look at the “analyze → coverage” view for each grammar separately, I see the following for the “gold” one (yes, I checked multiple times; this is the “gold” one):
The “new” one:
Not only are the numbers different, but the number of “distinct analyses” for the “new” grammar is also lower than for the “gold” grammar, even though the “new” grammar has strictly more ambiguity.
Am I forgetting something fundamental about how to look at these?