PyDelphin "make_skeleton"

This is an offshoot of this topic. At this point I’m converting Xigt data, and I came across this error:

AttributeError: module 'delphin.itsdb' has no attribute 'make_skeleton'

This comes from these lines in the itsdb.py exporter:

itsdb.make_skeleton(
    outpath,
    config['relations'],
    export_corpus(xc, config)
)

I assume newer versions of PyDelphin changed this and it needs to be called differently, but I’m not sure what it’s doing in the first place, so I don’t know how to update it. Any help appreciated!

Yes, you’re correct that this function has been removed in recent versions of PyDelphin. There are several ways to accomplish this now.

The most straightforward way is to use the delphin mkprof command at the command line (docs), assuming your sentences are in a text file, one per line:

$ cat foo.txt
This is a sentence.
*Not is sentence this.
$ delphin mkprof --skeleton \
                 -i foo.txt \
                 -r matrix-core/tsdb/skeletons/Relations \
                 foo-profile
    9746 bytes	relations
      78 bytes	item

You can also use that command from the Python API (docs):

>>> from delphin.commands import mkprof
>>> mkprof('foo-profile',
...        source='foo.txt',
...        schema='matrix-core/tsdb/skeletons/Relations',
...        skeleton=True)
    9746 bytes	relations
      78 bytes	item

For the above, you’ll need to write your sentences to a file first, as with foo.txt above.

Finally, you can also create the skeleton manually with the delphin.tsdb module (docs), but I don’t recommend this, as it requires you to fill in all necessary columns of the item file in addition to i-input (i-wf, i-length, and maybe others):

>>> from delphin import tsdb
>>> schema = tsdb.read_schema('matrix-core/tsdb/skeletons/Relations')
>>> sents = ['This is a sentence.', '*Not is sentence this.']
>>> items = [tsdb.make_record({'i-id': i,
...                            'i-input': sent[1:] if sent.startswith('*') else sent,
...                            'i-wf': 0 if sent.startswith('*') else 1,
...                            'i-length': len(sent.split())},
...                           schema['item'])
...          for i, sent in enumerate(sents, 1)]
>>> tsdb.initialize_database('foo-profile', schema, files=False)
>>> tsdb.write('foo-profile', 'item', items, schema['item'])