Why would I sometimes hit this error but not always:
File "/home/olga/delphin/parsing_with_supertagging/venv/lib/python3.8/site-packages/delphin/itsdb.py", line 891, in process
_add_row(self, tablename, data, buffer_size)
File "/home/olga/delphin/parsing_with_supertagging/venv/lib/python3.8/site-packages/delphin/itsdb.py", line 918, in _add_row
ts.commit()
File "/home/olga/delphin/parsing_with_supertagging/venv/lib/python3.8/site-packages/delphin/itsdb.py", line 793, in commit
tsdb.write(
File "/home/olga/delphin/parsing_with_supertagging/venv/lib/python3.8/site-packages/delphin/tsdb.py", line 845, in write
raise NotImplementedError('cannot append to a gzipped file')
NotImplementedError: cannot append to a gzipped file
when running:
```python
with ace.ACEParser(grammar, cmdargs=cmdargs, executable=ace_exec, stderr=errf) as parser:
    ts.process(parser)
```
`cmdargs` can be `-1` or `-1 --ubertagging=000.1`. With some profiles the process finishes, but with others I get the error above, even though all of the ERG tsdb profiles have the same format (`item.gz`, etc.).
When you parse a profile, the item(.gz) file will be read but not written to. One possibility for the error is that some of the profiles you are working with have been compressed to save space and may need to be uncompressed before processing.
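For example, assuming a profile directory whose tables are all gzipped, something like `gunzip my-profile/*.gz` (where `my-profile` is a placeholder path) would decompress the tables in place.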
What @Dan said is partially true for PyDelphin. The .gz files are compressed, but PyDelphin is happy to read and write them transparently (i.e., usually you don’t need to know or care whether the file was gz-compressed). One exception is when PyDelphin is appending to a file on disk rather than writing the whole file anew, because the gzip compression would work better on the whole file than compressing it piecemeal.
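For instance, reading a profile looks the same either way; in this hypothetical snippet, `my-profile` is a placeholder path whose tables may be stored as `item` or `item.gz`:

```python
from delphin import itsdb

# PyDelphin finds the table whether it is stored as 'item' or 'item.gz'
ts = itsdb.TestSuite('my-profile')
print(len(ts['item']))  # number of items, independent of compression
```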
The problem

There are two questions:

1. Why is it gz-compressing the files?
2. Why is it appending to the files?
RE (1), if you aren't passing the `-z` or `--gzip` options, it won't compress the results, but the exception can still be raised if the profile has already-compressed files. In the snippet below, `gzip` is the flag that determines whether the results will be compressed, and `use_gz` is a flag indicating whether an existing file is gzipped:
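The following is a paraphrased sketch of that check in `delphin/tsdb.py`, not the verbatim source; the `check_append` wrapper and the `profile_dir`, `name`, and `append` names are illustrative, and details may differ across PyDelphin versions:

```python
from pathlib import Path

def check_append(profile_dir: Path, name: str, gzip: bool, append: bool) -> None:
    # Illustrative wrapper around the logic described above; the real
    # check lives inside tsdb.write() and may differ by version.
    tx_path = Path(profile_dir, name)                       # e.g. .../item
    gz_path = tx_path.with_suffix(tx_path.suffix + '.gz')   # e.g. .../item.gz
    # use_gz: only the gzipped variant of the table exists on disk
    use_gz = gz_path.is_file() and not tx_path.is_file()
    if (gzip or use_gz) and append:
        raise NotImplementedError('cannot append to a gzipped file')
```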
RE (2), if you run `process()` on a profile with more than 1000 items (the default buffer size), PyDelphin will try to append the results in batches as it goes. This was done to help it deal with very large profiles.
Feel free to file a bug report on GitHub, since I think PyDelphin is not behaving correctly in this case.
Workarounds
Try processing a version of the profile that does not have compressed files. You can create one with the `mkprof` command:
```
delphin mkprof --refresh my-profile   # add --gzip to compress again
```
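If you want to keep the compressed original untouched, writing an uncompressed copy to a new directory should also work, e.g. `delphin mkprof --source my-profile my-profile-plain` (paths are placeholders; check `delphin mkprof --help` for the exact option names in your version).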
If you’re using the Python API instead of the command line, you can also try increasing the buffer size to sidestep the issue:
```python
...
ts.process(parser, buffer_size=N)
```
where `N` is greater than the number of items in your profile.
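Putting it together, a minimal sketch (the grammar image and profile path are placeholders, and the `-1` option follows the command line from the question):

```python
from delphin import ace, itsdb

ts = itsdb.TestSuite('my-profile')   # placeholder profile path
n_items = len(ts['item'])

with ace.ACEParser('erg.dat', cmdargs=['-1']) as parser:  # placeholder grammar
    # A buffer larger than the item count means one write at the end
    # instead of appending batches to (possibly gzipped) tables.
    ts.process(parser, buffer_size=n_items + 1)
```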