How do I reliably obtain the accuracy for old, let’s call them “frozen”, treebanked profiles that cannot be reparsed with ACE? I don’t want to change anything in them, and I cannot reparse or update them; I simply want to know how many trees in them were marked as accepted.
I am currently trying different methods and getting different results. All methods yield the same number of items in the profile but differ when it comes to counting accepted items.
- First, I count the lines in the preference file and divide that number by the number of lines in the
This would make you think that there are 155 accepted items in the profile.
- Now I open a profile in fftb and manually count the non-accepted items (the yellow and the red ones).
I am not giving you the full list but believe me, I counted 10 times, and there are 27 non-accepted items in that list, which would mean there are 181-27=154 accepted items. Not 155.
Spoiler: there is an item that appears as “accepted” in fftb but does not appear in the preference file; the sentence does not look Spanish (it has a typo or is in a different language).
On the other hand, in the preference file, some item IDs appear twice:
So, this would account for the difference between what I see in the preference file and what I see when I open the profile in fftb, but what does it mean?
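To make the duplicate check less manual, here is a minimal sketch that counts how often each ID occurs in a profile file. It assumes the file is plain text with one record per line and `@` as the field separator, and that the first field is the ID of interest; whether that first field corresponds directly to the item ID shown in fftb is an assumption that should be checked against the profile's relations file. The function names are hypothetical.

```python
from collections import Counter


def first_field_counts(lines, sep="@"):
    """Count how often each value of the first field occurs.

    Assumption: one record per line, fields separated by `sep`,
    and the first field is the ID we care about.
    """
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        counts[line.split(sep, 1)[0]] += 1
    return counts


def summarize(lines, sep="@"):
    """Return the raw line count, the distinct-ID count, and repeated IDs."""
    counts = first_field_counts(lines, sep)
    duplicates = sorted(key for key, n in counts.items() if n > 1)
    return {
        "lines": sum(counts.values()),
        "distinct": len(counts),
        "duplicates": duplicates,
    }
```

Passing an open file handle for the preference file to `summarize` would show whether the raw line count and the number of distinct IDs differ, and which IDs are listed more than once.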
- Finally, I tried relying on [incr tsdb()] in the first place, but I don’t fully understand how to do it there either. I thought I should select the TSQL condition t-active=1, like this:
And then I would see the number of accepted items, which I can then divide by the total number of items to get the treebanked coverage (accuracy), e.g. 153/181:
Perhaps [incr tsdb()] knows how to not count the doubly-listed item twice and knows how to exclude the foreign item (because it is marked as such elsewhere in the database?), so perhaps I should trust this 153 number.
If I load another profile from the same corpus, with the same TSQL query, I see a number of “results” which is larger than the number of total items (519 “out of” 388):
So, I don’t trust what I am doing here. Maybe the profiles are broken somehow, but then I can’t trust what I see in [incr tsdb()] anyway.
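One way to guard against impossible ratios such as 519 out of 388 is to compute coverage over sets of IDs rather than over raw row counts. The sketch below is only an illustration under stated assumptions: it takes two lists of IDs that have already been extracted from the profile by some other means, collapses repeats, and reports any "accepted" ID that is not among the profile's items. The function name is hypothetical, and this is not necessarily how [incr tsdb()] or fftb count.

```python
def treebanked_coverage(accepted_ids, item_ids):
    """Return (coverage, unknown_ids).

    coverage    = distinct accepted IDs that are also items / distinct items
    unknown_ids = accepted IDs that do not occur among the items

    Assumption: both arguments are iterables of IDs of the same type.
    """
    items = set(item_ids)
    if not items:
        raise ValueError("the profile has no items")
    accepted = set(accepted_ids)       # repeated IDs are counted once
    unknown = sorted(accepted - items)  # IDs with no matching item
    coverage = len(accepted & items) / len(items)
    return coverage, unknown
```

Because repeated IDs are collapsed and unmatched IDs are set aside, the result can never exceed 1.0, and a non-empty list of unknown IDs points at records that need a closer look.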
Any comments on the correct procedure? What is the correct way to query a profile for its accuracy? (Again, assume we can’t reparse it with ACE to create an updated version.)