Finally getting back to this again…
I found the script (thanks, @olzama, that it exists!). I have a suggestion for how to possibly change it, and I’d like input on whether the change is a good idea or not.
As it is now, the script produces 3 files:
- pos_tags.txt – set of POS tags retrieved from the Xigt POS tier
- glosses.txt – set of glosses retrieved from the Xigt gloss tier
- unknown_features.txt – set of glosses not found in the existing Feature Dictionary
Since the unknown_features.txt file is a superset of glosses.txt and consists mostly of root translations, plus a handful of potential new feature glosses, I was thinking that glosses.txt could include only the known features. The user could then look through unknown_features.txt and add any glosses that actually are features to both the Feature Dictionary and glosses.txt.
This way the user has to do (slightly) less work and the glosses.txt file won’t include root translations by default anymore.
If I do this… what is the protocol for glosses that appear with periods? It seems that as of now, they show up in glosses.txt as is (e.g. “to.be” or “m.sg”), but in unknown_features.txt they are split up. I assume they are split for unknown_features.txt because they should appear separately when inserted into the Feature Dictionary (?), but maybe glosses.txt needs them in their original form?
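To make the question concrete, here is a rough sketch of the behavior I have in mind. It is not the actual script: `partition_glosses` and `known_features` are made-up names, the Feature Dictionary is stood in for by a plain set, and I am assuming a gloss counts as "known" only if every period-separated piece is in the dictionary. Known glosses keep their original form; unknown pieces are split, as they seem to be now.

```python
def partition_glosses(glosses, known_features):
    """Sort gloss strings into known glosses and unknown pieces.

    glosses: iterable of gloss strings as they appear on the gloss tier.
    known_features: set of feature names (hypothetical stand-in for the
        Feature Dictionary).
    """
    known, unknown = set(), set()
    for gloss in glosses:
        parts = gloss.split(".")
        if all(part in known_features for part in parts):
            # Assumption: keep the original, unsplit form for glosses.txt.
            known.add(gloss)
        else:
            # Only the pieces missing from the dictionary are reported.
            unknown.update(p for p in parts if p not in known_features)
    return known, unknown


# Toy data, invented for illustration only.
demo_known, demo_unknown = partition_glosses(
    ["m.sg", "to.be", "dog", "pst"],
    {"m", "sg", "pst"},
)
```

With this toy input, "m.sg" and "pst" would go to glosses.txt unchanged, while "to", "be", and "dog" would land in unknown_features.txt. Whether a partly known gloss should be handled differently is exactly the part I am unsure about.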