Ambiguity mystery in tsdb++

Here’s a comparison of two grammars with respect to a test suite, in [incr tsdb()]:

(Screenshot of the comparison view)

As you can see, the “new” grammar has strictly more ambiguity.

From the compare → coverage view, it appears that the “gold” grammar has on average 43.92 analyses per sentence, while the “new” one has 43.95:

Right?

However, when I look at the “analyze → coverage” view separately, I see, for the “gold” one (yes, I checked multiple times, this is the “gold” one):

The “new” one:

Not only are the numbers different, but the number of “distinct analyses” for the “new” grammar is lower than the one for the “gold” grammar?

Am I forgetting something fundamental about how to look at these?

At a guess, is one of those views looking at all items, while the other is looking only at positive (grammatical) items?
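To illustrate why that would matter, here is a minimal Python sketch with made-up toy items (not data from either profile, and not how [incr tsdb()] itself computes anything). The `average_analyses` helper and the `(is_grammatical, analyses)` layout are hypothetical; the point is only that the same per-item counts give different averages depending on which items go into the denominator.

```python
from statistics import mean

# Hypothetical toy items: (is_grammatical, number_of_analyses).
# These are illustrative values, not taken from any real test suite.
items = [(True, 4), (True, 10), (False, 0), (False, 2)]

def average_analyses(items, positive_only=False):
    """Average analyses per item, optionally restricted to grammatical items."""
    counts = [n for grammatical, n in items if grammatical or not positive_only]
    return mean(counts)

avg_all = average_analyses(items)                      # averages over all four items
avg_pos = average_analyses(items, positive_only=True)  # averages over the two grammatical items

print(avg_all, avg_pos)
```

If the two views really do differ in this way, the averages would disagree even though nothing in the underlying profiles has changed.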