Context: @kphowell and I are working on treebanking profiles based on inferred grammars (from the AGGREGATION project), using fftb. These grammars are very noisy and have low coverage (~10%, give or take). fftb seems set up to expect nearly complete coverage over profiles: if you click on an item that has no results, you get a 404 and then have to use the browser's back button and keep track of which item you just looked at, since it doesn't change color to "visited". So, we're trying to use PyDelphin to downselect the profiles so that they contain only the examples the grammar can parse, before treebanking.
@goodmami has provided the following steps to do so:
```shell
# 1. Parse the original profile with the grammar (stores results per item)
delphin process -g grm.dat original-profile/
# 2. Copy only the items that parsed into a new profile
delphin mkprof --full --where 'readings > 0' --source original-profile/ new-profile/
# 3. Re-parse the downselected profile in full-forest mode for fftb
delphin process -g grm.dat --full-forest new-profile/
```
Questions (for @sweaglesw especially):
- When I ran the first step, I noticed that for some sentences ACE reported an error saying it ran out of RAM while unpacking. I understand this to mean that some coverage is lost in the process of storing the old-fashioned (non-full-forest) profile. Is this correct?
- Is there a way to get summary statistics (e.g. coverage) out of fftb profiles?
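For a classic (non-full-forest) profile, one option is to compute coverage directly with PyDelphin from the `readings` field of the profile's `parse` table. A minimal sketch (the profile path is a placeholder, and I'm not sure the `readings` field is populated in full-forest profiles, which is part of the question above):

```python
def coverage(readings_counts):
    """Fraction of items with at least one reading (a simple coverage metric)."""
    counts = [r for r in readings_counts]
    if not counts:
        return 0.0
    parsed = sum(1 for r in counts if r is not None and r > 0)
    return parsed / len(counts)

# With PyDelphin installed, readings counts can be pulled from the profile,
# e.g. (hypothetical path 'new-profile/'):
#   from delphin import itsdb
#   ts = itsdb.TestSuite('new-profile/')
#   print(coverage(row['readings'] for row in ts['parse']))
```

This reports the same kind of number as `delphin select 'parse:readings'` piped through a quick count, but it doesn't answer whether fftb itself can emit such statistics.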