How do I reliably obtain the accuracy for old, let’s call them “frozen”, treebanked profiles that cannot be reparsed using ACE? I don’t want to change anything in them, and I cannot reparse or update them; I simply want to know how many trees in them were marked as accepted.
I am currently trying different methods and getting different results. All methods yield the same number of items in the profile but differ when it comes to counting accepted items.
- First, I count the lines in the preference file and divide that number by the number of lines in the item file:
This would make you think that there are 155 accepted items in the profile.
- Now I open a profile in fftb and manually count the non-accepted items (the yellow and the red ones).
I am not giving you the full list but believe me, I counted 10 times, and there are 27 non-accepted items in that list, which would mean there are 181-27=154 accepted items. Not 155.
Spoiler: there is an item that appears as “accepted” in fftb which does not appear in the preference file; the sentence does not look Spanish (has a typo or is in a different language).
On the other hand, in the preference file, some item IDs appear twice:
11072@5@1
11072@5@0
So, this would account for the difference between what I see in the preference file and what I see if I open the profile in fftb, but what does it mean?
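To make the comparison above less manual, here is a minimal Python sketch of the check I have in mind: it compares the raw line count of the preference file with the number of distinct IDs in it and lists any ID that occurs more than once. It assumes, as I did above, that the first @-separated field of each preference line (e.g. 11072 in 11072@5@1) identifies the item; I have not verified that against the profile's relations file. The function names and the sample data are made up for illustration.

```python
from collections import Counter


def first_fields(lines, sep="@"):
    """Return the first sep-delimited field of every non-empty line."""
    return [line.strip().split(sep, 1)[0] for line in lines if line.strip()]


def preference_summary(preference_lines, item_lines):
    """Compare raw line counts with distinct-ID counts.

    Assumption: the first field of a preference line identifies the item,
    so two lines such as 11072@5@1 and 11072@5@0 refer to the same item.
    """
    pref_ids = first_fields(preference_lines)
    n_items = len(first_fields(item_lines))
    counts = Counter(pref_ids)
    return {
        "items": n_items,
        "preference_lines": len(pref_ids),
        "distinct_ids": len(counts),
        # IDs listed more than once in the preference file
        "duplicates": sorted(i for i, n in counts.items() if n > 1),
        "ratio_by_lines": len(pref_ids) / n_items if n_items else None,
        "ratio_by_distinct_ids": len(counts) / n_items if n_items else None,
    }


# Toy data, not taken from a real profile.
toy_preference = ["11072@5@1", "11072@5@0", "11080@5@0"]
toy_item = ["11072@first", "11080@second", "11090@third", "11100@fourth"]
print(preference_summary(toy_preference, toy_item))
```

On a real profile one would pass the open file objects of the preference and item files instead of the toy lists (if the files are compressed, they would need to be opened with gzip first). This only tells me how far the line count and the distinct-ID count diverge; it does not tell me which of the two is the "right" number of accepted items.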
- Finally, I tried relying on [incr tsdb()] in the first place but I don’t fully understand how to do it there either. I thought what I should do is: select the TSQL condition t-active=1, like this:
And then I would see the number of accepted items, which I can then divide by the total number of items to get the treebanked coverage (accuracy), e.g. 153/181:
Perhaps [incr tsdb()] knows how to not count the doubly-listed item twice and knows how to exclude the foreign item (because it is marked as such elsewhere in the database?), so perhaps I should trust this 153 number.
However:
If I load another profile from the same corpus, with the same TSQL query, I see a number of “results” which is larger than the number of total items (519 “out of” 388):
So, I don’t trust what I am doing here. Maybe the profiles are broken somehow, but then I can’t trust what I see in [incr tsdb()] anyway.
Any comments on the correct procedure? What is the correct way to query a profile for its accuracy? (Again, assume we can’t reparse it with ACE to create an updated version.)