Result files over 16GB?

Speaking of large result files (--full-forest flag for ACE and comparing treebanks in tsdb++ - #6 by olzama): I noticed that ACE (run via the PyDelphin wrapper) gets “killed” once the result file reaches 16GB. Is that expected?

As much as possible, PyDelphin does not keep the entire profile in memory during processing. It uses Python generators to iterate over rows and flushes results to disk after processing 1000 items (by default), but it probably depends on how you’re using it. Are you using testsuite.process(...) in a Python script? And what is the memory capacity of the system you’re running on?

@arademaker, you’ve also processed large profiles using PyDelphin. Did you encounter this issue?

No @goodmami, I discovered early on that working with relatively small profiles is the better way to go. I split the glosses and examples of WordNet into profiles with at most 2000 sentences.
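
For anyone wanting to try the same splitting approach, here is a minimal sketch in plain Python. The `chunked` helper is hypothetical (it is not part of PyDelphin), and the 2000 limit is just the figure mentioned above. Creating an actual profile from each chunk is left out.

```python
from itertools import islice


def chunked(sentences, size=2000):
    """Yield successive lists of at most `size` sentences.

    Hypothetical helper for illustration; not a PyDelphin function.
    """
    it = iter(sentences)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk
```

Each yielded list could then be written out as its own small profile, so that no single run has to cope with the whole corpus.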

Sadly, my problem is due to enormous ambiguity, not number of sentences…

@arademaker ok, thanks for explaining

@olzama Hmm, you might try reducing the buffer size so it writes to disk more frequently and stores fewer results in memory (see delphin.itsdb.TestSuite.process()):

ts.process(..., buffer_size=100)  # =10, etc.

But I’m not sure this will help much.
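
To illustrate what the buffer size controls, here is a rough sketch of the general pattern in plain Python. This is not PyDelphin's actual implementation: `process_buffered` and its `write` callback are made-up names for illustration only.

```python
def process_buffered(rows, write, buffer_size=1000):
    """Consume `rows` lazily, handing batches to `write` as the buffer fills.

    Returns the largest number of rows held in memory at any one time.
    Hypothetical sketch; not PyDelphin code.
    """
    buffer = []
    peak = 0
    for row in rows:
        buffer.append(row)
        peak = max(peak, len(buffer))
        if len(buffer) >= buffer_size:
            write(buffer)   # flush the full batch (e.g. append to a file)
            buffer = []     # start a fresh buffer so the old one can be freed
    if buffer:
        write(buffer)       # flush whatever is left at the end
    return peak
```

Note that a buffer like this only bounds the number of buffered items, not the size of each one, so if a single item is huge because of ambiguity, a smaller buffer may not change much.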

But you are working with full-forest, right? Then it’s not clear how the ambiguity is a problem… Do you mean during profile preparation?

Yes, I mean without the --full-forest. I want to assess the ambiguity and my attempts to reduce it. Maybe I just shouldn’t do it with longer sentences and only measure that on shorter ones…