A question from a UW treebanker: given a treebanked profile, what is the quickest way to get summary statistics (most importantly, the number of items with an accepted tree)?
Edit: Sorry, I just saw the ‘fftb’ tag on this question. I don’t think the following will work for packed profiles, but maybe you’ll find a use for these commands in other contexts.
— original reply below —
With PyDelphin, this gets you the i-ids of items that had at least one parse:
$ delphin select 'i-id where readings > 0' profile
You can use wc -l to count them:
$ # successfully parsed items
$ delphin select 'i-id where readings > 0' tsdb/gold/mrs | wc -l
130
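If you want the parsed / no-parse / error split in one go instead of one select per category, something like the following sketch might work. It assumes that selecting just the readings column (e.g. `delphin select 'readings' PROFILE`) prints one value per line, and it uses the same interpretation as the commands in this reply: readings > 0 is parsed, 0 is no parse, and -1 is an error. The `summarize_readings` function is a hypothetical helper, not part of PyDelphin.

```shell
# Hypothetical helper (not part of PyDelphin): tally "readings" values,
# one per line on stdin. Interpretation follows this reply:
#   > 0 = parsed, 0 = no parse, negative (-1 above) = error.
summarize_readings() {
  awk '
    $1 > 0  { parsed++ }
    $1 == 0 { noparse++ }
    $1 < 0  { error++ }
    END {
      # uninitialized awk counters print as 0 with %d
      total = parsed + noparse + error
      printf "parsed\t%d\nno-parse\t%d\nerror\t%d\ntotal\t%d\n", parsed, noparse, error, total
    }'
}
```

Usage (assuming the column-only select behaves as described): `delphin select 'readings' tsdb/gold/mrs | summarize_readings`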
These may also be useful:
$ # no parse but no error (e.g., ungrammatical)
$ delphin select 'i-id where readings = 0' tsdb/gold/mrs | wc -l
4
$ # no parse with some error
$ delphin select 'i-id where readings = -1' tsdb/gold/mrs | wc -l
1
$ # get the i-id and error field
$ delphin select 'i-id error where readings = -1' tsdb/gold/mrs
222@no lexicon entries for: "幾つか"
$ # get the number of parses per item
$ delphin select 'i-id from result' tsdb/gold/mrs | sort -n | uniq -c | awk '{print $2 "\t" $1}'
11 1
21 1
31 1
41 1
[...]
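The last command lists one line per item. To summarize it further (how many items have 1 result, 2 results, and so on), the i-id list can be counted twice. This is a minimal sketch; `results_histogram` is a hypothetical helper, and it assumes the input is the one-i-id-per-result output of `delphin select 'i-id from result'`. Items with no result rows never appear in that output, so they are not counted.

```shell
# Hypothetical helper (not part of PyDelphin): read one i-id per result
# on stdin and print "<results per item>\t<number of items>".
results_histogram() {
  sort -n | uniq -c |         # "<count> <i-id>" for each item
    awk '{ print $1 }' |      # keep only the per-item result count
    sort -n | uniq -c |       # "<items> <count>" for each distinct count
    awk '{ print $2 "\t" $1 }'
}
```

Usage: `delphin select 'i-id from result' tsdb/gold/mrs | results_histogram`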