One important thing to know is that itsdb.TestSuite objects are like open SQL database connections. The data is persisted on disk, but changes are held in memory until you commit them. So if you use itsdb.Table.update() to change rows, don’t forget to call TestSuite.commit() when you’re done. Here is an example:
$ cat update.py
import sys
from delphin import itsdb
from delphin import repp
tokenizer = repp.REPP() # default tokenizer, as an example
ts = itsdb.TestSuite(sys.argv[1])
item = ts['item']
for i, row in enumerate(item):
    tokens = tokenizer.tokenize(row['i-input'])
    # the update() function only changes data in-memory
    item.update(i, {'i-tokens': str(tokens)})  # cast to str for YY format
ts.commit() # commit to write to disk
$ cat tmp/item # before updating
1@@@@1@@The cat meows.@@@@1@3@@@
2@@@@1@@The dog barks.@@@@1@3@@@
$ python update.py tmp
$ cat tmp/item # after updating
1@@@@1@@The cat meows.@(0, 0, 1, <0:3>, 1, "The", 0, "null") (1, 1, 2, <4:7>, 1, "cat", 0, "null") (2, 2, 3, <8:14>, 1, "meows.", 0, "null")@@@1@3@@@
2@@@@1@@The dog barks.@(0, 0, 1, <0:3>, 1, "The", 0, "null") (1, 1, 2, <4:7>, 1, "dog", 0, "null") (2, 2, 3, <8:14>, 1, "barks.", 0, "null")@@@1@3@@@
You can also use the low-level delphin.tsdb.write() function with a list of records you’ve created, but this requires a different workflow. The above is probably more user-friendly.
The update() method is on the itsdb.Table instance, not on the individual rows, so it requires a row index. This is basically just a design limitation: the Row objects are meant to be immutable to save memory and reduce complexity. The index passed to update() shouldn’t be a constant like 0 but a variable, such as the i returned from enumerate(item) above.
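To illustrate why the index is needed, here is a toy model of the pattern. It is not PyDelphin’s implementation; ToyRow and ToyTable are hypothetical stand-ins, and the sketch only assumes that rows are immutable while the table owns the list of rows:

```python
from typing import NamedTuple


class ToyRow(NamedTuple):
    """Hypothetical immutable row; its fields cannot be reassigned."""
    i_id: int
    i_input: str
    i_tokens: str = ''


class ToyTable:
    """Hypothetical table that owns its rows and replaces them by index."""

    def __init__(self, rows):
        self._rows = list(rows)

    def __iter__(self):
        return iter(self._rows)

    def __getitem__(self, index):
        return self._rows[index]

    def update(self, index, data):
        # Build a new row with the changed fields and swap it into place.
        self._rows[index] = self._rows[index]._replace(**data)


table = ToyTable([ToyRow(1, 'The cat meows.'), ToyRow(2, 'The dog barks.')])
for i, row in enumerate(table):
    # The row cannot change itself, so the table is told which slot to replace.
    table.update(i, {'i_tokens': '|'.join(row.i_input.split())})
```

Because a row cannot modify itself, the only way to change it is to ask the table to replace the row at a given position, which is why the loop carries i along.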
I offered this as an alternative workflow that generates the tokens dynamically during processing. For this, you would not update the profiles with i-tokens; you would instead define a delphin.interface.Processor subclass that performs your preprocessing and use the original profiles as inputs. Since you have to write custom Python code for this, you cannot use the delphin process command and must instead write your own script that uses the Python API to do the processing. For example:
from delphin import ace, itsdb
# define your custom Processor subclass...
ts = itsdb.TestSuite(ts_path)
with ace.ACEParser(grm, cmdargs=['-y']) as _cpu:
    cpu = PreprocessorWrapper(_cpu, ...)
    ts.process(cpu)  # must run while the ACE process is still open
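PreprocessorWrapper is left undefined above. A minimal sketch of what it might look like follows, assuming the processor interface consists of a task attribute and a process_item(datum, keys=None) method; that is my reading of delphin.interface.Processor and is worth checking against your installed version. The preprocess argument and the EchoCPU stand-in are hypothetical and only there so the sketch runs without a grammar:

```python
try:
    from delphin.interface import Processor
except ImportError:
    # Fallback so the sketch runs without PyDelphin installed.
    class Processor:
        task = None

        def process_item(self, datum, keys=None):
            raise NotImplementedError


class PreprocessorWrapper(Processor):
    """Preprocess each input, then hand it to the wrapped processor."""

    def __init__(self, cpu, preprocess):
        self.cpu = cpu
        self.task = cpu.task  # report the same task as the wrapped processor
        self.preprocess = preprocess  # hypothetical callable: str -> str

    def process_item(self, datum, keys=None):
        return self.cpu.process_item(self.preprocess(datum), keys=keys)


class EchoCPU:
    """Hypothetical stand-in for a real processor such as ace.ACEParser."""
    task = 'parse'

    def process_item(self, datum, keys=None):
        return {'input': datum, 'keys': keys}


cpu = PreprocessorWrapper(EchoCPU(), preprocess=str.lower)
result = cpu.process_item('The cat meows.', keys={'i-id': 1})
```

With a real tokenizer, preprocess could be something like lambda s: str(tokenizer.tokenize(s)), matching the str(tokens) conversion in the update script.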
Sorry that there is not one obvious way to do it. You have choices: either preprocess your profile as a first step and use delphin process --select ... or write your own script to process the profile with dynamic preprocessing.