POS and LE tags in ACE parses

Do the POS tags in the ACE-parsed ERG treebanks come from a model trained a while back with the TnT tagger?

I found this in the docs, but no details:

Batch Parsing
With sensible limits and using tnt for POS tagging (which enables unknown word processing).

I just need to know the provenance of the tags: whether they are gold or automatically predicted (I am assuming the latter).

I am talking about things like this (PRP, VBP, etc.), stored in particular in pydelphin’s Response objects:

(1, 0, 1, <0:1>, 1, "I", 0, "null", "PRP" 1.0)
(2, 1, 2, <2:7>, 1, "agree", 0, "null", "VBP" 1.0)
(3, 2, 3, <8:12>, 1, "with", 0, "null", "IN" 1.0)
(4, 3, 4, <13:17>, 1, "most", 0, "null", "RBS" 1.0)
(5, 4, 5, <18:20>, 1, "of", 0, "null", "IN" 1.0)
(6, 5, 6, <21:24>, 1, "the", 0, "null", "DT" 1.0)
(7, 6, 7, <25:31>, 1, "things", 0, "null", "NNS" 1.0)
(8, 7, 8, <32:36>, 1, "that", 0, "null", "IN" 1.0)
(9, 8, 9, <37:41>, 1, "your", 0, "null", "PRP$" 1.0)
(10, 9, 10, <42:48>, 1, "father", 0, "null", "NN" 1.0)
(11, 10, 11, <49:52>, 1, "was", 0, "null", "VBD" 1.0)
(12, 11, 12, <53:59>, 1, "saying", 0, "null", "VBG" 1.0)
(13, 12, 13, <59:60>, 1, ".", 0, "null", "." 1.0)
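For anyone who wants to pull the forms and tags out of a token string like the one above, here is a minimal standalone sketch in Python. It assumes every token has exactly the shape shown in this thread (one quoted "null" rule field followed by zero or more tag/probability pairs). The names `YYToken` and `parse_yy` are made up for this example and are not pydelphin's API; pydelphin has its own token-handling code, which is probably the better choice in practice.

```python
import re
from typing import List, NamedTuple, Tuple


class YYToken(NamedTuple):
    """One token from the string (hypothetical helper type for this sketch)."""
    token_id: int
    start: int      # start vertex
    end: int        # end vertex
    char_from: int  # character offset where the token begins
    char_to: int    # character offset where the token ends
    form: str
    tags: List[Tuple[str, float]]  # (POS tag, probability) pairs


# Assumes the layout seen in the thread:
# (id, start, end, <from:to>, path, "form", ipos, "lrule", "TAG" prob ...)
TOKEN_RE = re.compile(
    r'\(\s*(\d+),\s*(\d+),\s*(\d+),\s*<(\d+):(\d+)>,\s*\d+,\s*'
    r'"((?:[^"\\]|\\.)*)",\s*\d+,\s*"[^"]*"'
    r'((?:,?\s*"[^"]*"\s+[0-9.]+)*)\s*\)'
)
TAG_RE = re.compile(r'"([^"]*)"\s+([0-9.]+)')


def parse_yy(s: str) -> List[YYToken]:
    """Extract tokens and their POS tags from a token string."""
    tokens = []
    for m in TOKEN_RE.finditer(s):
        tid, start, end, cfrom, cto = (int(g) for g in m.group(1, 2, 3, 4, 5))
        tags = [(tag, float(prob)) for tag, prob in TAG_RE.findall(m.group(7))]
        tokens.append(YYToken(tid, start, end, cfrom, cto, m.group(6), tags))
    return tokens


sample = (
    '(1, 0, 1, <0:1>, 1, "I", 0, "null", "PRP" 1.0) '
    '(2, 1, 2, <2:7>, 1, "agree", 0, "null", "VBP" 1.0)'
)
for tok in parse_yy(sample):
    print(tok.form, tok.tags)
```

The regular expression is deliberately strict, so a token that deviates from the assumed layout is skipped rather than half-parsed.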

The tags are automatically predicted, not individually verified, though it is hard to think of a situation where the gold parse is correct while one of the tags would be incorrect. The tagger used is not TnT, but the one internal to ACE. (We would use TnT when parsing with PET, but not with ACE.)

Thank you, @Dan!

@Dan, is the same true for the lexical types? Was any automatic prediction involved for those in the treebanks?

Where can we find instructions for training ACE’s POS tagger on a new language and for setting up the rules in the grammar that trigger the generic lexical entries?

Should I be able to use these features in the Portuguese grammar under development with @leoalenc?


We used the generic entries with an external POS tagger using yy-mode for Zhong. I think the internal tagger has the POS tags hard-coded somewhere in ACE, so there would have to be some work to make it more flexible.
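To make the yy-mode route a bit more concrete, here is a rough Python sketch that turns the output of an external tagger into a token string with the same layout as the one quoted earlier in this thread. The function name `to_yy` and the (form, tag) input format are assumptions for illustration; the tagger itself is not shown, and the fixed values (path 1, ipos 0, "null" rule, probability 1.0) are simply copied from that example rather than taken from any specification.

```python
from typing import List, Tuple


def to_yy(sentence: str, tagged: List[Tuple[str, str]]) -> str:
    """Build a YY-style token string from (form, tag) pairs.

    Assumes each form occurs in `sentence` in the given order, so that
    character offsets can be found by searching left to right.
    """
    parts = []
    pos = 0
    for i, (form, tag) in enumerate(tagged):
        cfrom = sentence.index(form, pos)  # raises ValueError if not found
        cto = cfrom + len(form)
        pos = cto
        # Escape backslashes and double quotes inside the quoted form.
        escaped = form.replace('\\', '\\\\').replace('"', '\\"')
        parts.append(
            f'({i + 1}, {i}, {i + 1}, <{cfrom}:{cto}>, 1, '
            f'"{escaped}", 0, "null", "{tag}" 1.0)'
        )
    return ' '.join(parts)


# Tags here come from a hypothetical external tagger.
print(to_yy("I agree", [("I", "PRP"), ("agree", "VBP")]))
```

The tag names only matter insofar as the grammar's generic lexical entries match against them, so the tagset of the external tagger and the grammar have to agree.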

If you use ACE’s call-out to TnT, then you should be able to train a TnT model and have ACE use TnT with that. ACE’s built-in tagger is pretty anglocentric. If somebody wanted to improve it and make it more customizable, I would be happy to include those patches.

Once you have a tagger, ACE will make the tags available as part of the token feature structures, at a path configurable in config.tdl. You can manipulate those in token mapping rules and, more importantly, match against them in your generic lexical entries. I’m not sure whether there is any specific documentation walking you through how to do all that, though.