Here’s a naive question. I’m a computational social scientist interested in using automated methods to analyze misinformation online. I have in mind using semantic parsing to split sentences into more basic elements of meaning and to understand the relationships between those elements. I’ve looked into PropBank, AMR, and now MRS / ERS. There’s a bit of a learning curve with MRS / ERS so I’m wondering if someone has any thoughts on whether MRS / ERS might or might not be useful for my purposes and how it stacks up relative to AMR?
AMR seems more complete in identifying semantic arguments than PropBank, but I have a number of reservations: It drops case and other syntax; it has only an English language version (OK for now, but perhaps not ideal in the longer run); I’m not sure the roughly 60K sentences in AMR’s dataset will be enough. I have in mind training a deep learning model to parse sentences using AMR’s data and then using the model on arbitrary web text. Given the number of possible verbs / verb senses and their often fairly unique argument structure, I am concerned that a model trained on AMR’s annotated dataset will not generalize well to arbitrary text.
I know that MRS is available for the 2008 Wikipedia, though I gather only some fraction of that is ‘gold’ annotated (still, the deep learning model could learn from less than gold and then be trained on gold).
Any thoughts on whether I’m barking up the right tree in considering ERS / MRS?
By ‘PropBank’ I actually had in mind OntoNotes 5.0.
Hi Peter and welcome!
It sounds like you’re looking for semantic representations that make predicate argument structure explicit and possibly also something to do with word-sense disambiguation. Is that right? ERS is very good for the former, but only does coarse-grained word-senses (those that correspond to morphosyntactic differences). All of these representations should drop case and other syntax: those are only the scaffolding for getting to the predicate-argument structures and other semantic features.
I believe you are right that there is more data available annotated with ERSes than with AMRs. Also, ERS representations tend to be more detailed with respect to predicate-argument structure (though less so when it comes to word senses). For ERS, there is also the option of using the ERG directly (rather than training a parser to produce them).
You might find the following resources helpful:
A paper arguing for the ERS approach:
Bender, Emily M., Dan Flickinger, Stephan Oepen, Woodley Packard and Ann Copestake. 2015. Layers of Interpretation: On Grammar and Compositionality. In Proceedings of the 11th International Conference on Computational Semantics (IWCS 2015), London. pp.239-249.
A tutorial on semantic parsing with the ERG: http://moin.delph-in.net/wiki/ErsTutorial
Documentation of the ERS representations: http://moin.delph-in.net/wiki/ErgSemantics/
Finally, I think it might be worthwhile to take a sample of sentences in the domain you are interested in, try parsing them with both existing AMR & ERS parsers, and see which seems to capture the information you’re after better.
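One way to make that side-by-side comparison concrete is sketched below. It assumes you hand-write, for each sample sentence, a checklist of the (predicate, participant) pairs you care about, and that you separately normalize each parser's output into the same form. The names `coverage`, `rank_parsers`, `parser_a`, and `parser_b` are made up for illustration, and the sets shown are placeholders, not real AMR or ERS output.

```python
from typing import Dict, List, Set, Tuple

# A (predicate, participant) pair, e.g. ("claim", "senator").
Pair = Tuple[str, str]


def coverage(wanted: Set[Pair], found: Set[Pair]) -> float:
    """Fraction of the wanted pairs present in a parser's normalized output."""
    if not wanted:
        return 1.0  # nothing was asked for, so nothing is missing
    return len(wanted & found) / len(wanted)


def rank_parsers(wanted: Set[Pair],
                 outputs: Dict[str, Set[Pair]]) -> List[Tuple[str, float]]:
    """Return (parser name, coverage) tuples, best coverage first."""
    scores = [(name, coverage(wanted, found)) for name, found in outputs.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical hand-written checklist for the sentence
# "The senator claimed that the vaccine causes illness."
wanted = {
    ("claim", "senator"),
    ("claim", "cause"),
    ("cause", "vaccine"),
    ("cause", "illness"),
}

# Placeholder sets standing in for normalized parser output.
# These are NOT real results from any AMR or ERS parser.
outputs = {
    "parser_a": {("claim", "senator"), ("cause", "vaccine"), ("cause", "illness")},
    "parser_b": {("claim", "senator"), ("claim", "cause"),
                 ("cause", "vaccine"), ("cause", "illness")},
}

if __name__ == "__main__":
    for name, score in rank_parsers(wanted, outputs):
        print(f"{name}: {score:.2f}")
```

The normalization step (mapping each framework's role labels and predicate names onto your own checklist) is where most of the judgment lies, so it is deliberately left out of this sketch.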
There is some prior work on producing MRS via neural nets, and probably some more opportunity in expanding it. See e.g. Buys and Blunsom, 2017.
(Also, following )
Thanks ebender, that’s very helpful!
I think my main interest is in obtaining the predicate argument structure. I can always run a separate step for word sense disambiguation if necessary. OntoNotes only gets senses for verbs and doesn’t use the full set of senses from WordNet, just what coders were able to distinguish with 0.9 accuracy, which might not be very different from what ERS distinguishes.
What I have been doing to compare these various approaches is downloading some ‘gold’ annotations from OntoNotes and AMR, and then comparing how they handle non-core prepositional phrases, hypotheticals, passive voice, and so forth. There’s some overlap in the corpora, so I can directly compare them.
For ERS, I haven’t yet found annotations I can download and look at. For some of the smaller corpora, it seems that all I get are lists of sentences. So I’m guessing there are no downloadable annotations and the usual procedure is to run an ERS rule-based parser to see what ERS gives? I don’t suppose there’s some web-based demo I can feed a few sentences to? But no worries, I’m now motivated to find an ERS parser and get it to work.
Thanks for all the resources!
Thanks trimblet! That paper should provide some insights on what’s possible. I’m not wed to the idea of running neural nets–just whatever does a good job pulling out semantic structure from the text I’m working on. But from what I’ve seen even ERS has a distinction between automatically annotated text and ‘gold’ text, which I imagine is labeled by people. If so, there may be room for neural networks to improve on the automatic annotation–if that annotation is done by rule-based parsers.
There used to be a site where one could search the ERS annotated treebanks, but it is currently down. @arademaker do you have any plans for resurrecting the WeSearch functionality?
But in general, yes, you can get ERS annotated treebanks, where the parse selection has been done by hand (or WikiWoods, where it’s been done automatically), both in the original MRS format and also in some more distilled variants. Some details here:
Maybe the easiest thing for exploration is to use one of the web demos that connects to a parser using the ERG:
This one usually works too, but seems to be down at the moment. @goodmami … is this pointing to a UW machine?
Oh, perfect, I can use these to find out what I need to know. Thanks again!
The demo is working again. Thanks folks!
Yes, I do, and I plan to finish the new WSI interface in the next 2-3 months.
I forgot to mention that there is a running instance of the old implementation: