"Most likely" parses and resolved MRS Trees

If I ask ACE to parse a phrase that has multiple possible parses, it returns them sorted by “probability”, and I have found that ordering matches (at least my) expectations of what was meant very well.

My first question is: how does it come up with this probability?

My real challenge is that I want the same kind of probability for resolved trees. For example, the “most probable” (and coincidentally only) parse of the phrase:

Where are the safe and the book?

has 17 different valid resolved MRS trees. One of them is this:

                   ┌_safe_n_1:x13
 _the_q:x13,h14,h15┤
                   │                  ┌_book_n_of:x18,i23
                   └_the_q:x18,h20,h21┤
                                      │                ┌place_n:x4
                                      └which_q:x4,h7,h8┤
                                                       │                 ┌_and_c:x3,x13,x18
                                                       └udef_q:x3,h10,h11┤
                                                                         └loc_nonsp:e2,x3,x4

As far as I can tell, this means “tell me the places where the safe and the book both are.” So if the book is in France and the safe is in Germany, the answer might be “The book and the safe are in Europe.”

But I’m pretty sure a much more probable interpretation is this one:

                   ┌_book_n_of:x18,i23
 _the_q:x18,h20,h21┤
                   │                  ┌_safe_n_1:x13
                   └_the_q:x13,h14,h15┤
                                      │                 ┌_and_c:x3,x13,x18
                                      └udef_q:x3,h10,h11┤
                                                        │                ┌place_n:x4
                                                        └which_q:x4,h7,h8┤
                                                                         └loc_nonsp:e2,x3,x4

I believe this means “tell me the location of each of the safe and the book independently.” If the book is in France and the safe is in Germany, the answer might be “The book is in France and the safe is in Germany.”
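To make the difference between the two readings concrete, here is a toy sketch in Python. It does not interpret MRS at all: the world model (LOCATION, CONTAINED_IN) and the function names are hypothetical, and each function simply encodes one of the two paraphrases given above for the France/Germany example.

```python
# Hypothetical toy world (assumption, for illustration only): where each
# object is, and which regions (including itself) contain each place.
LOCATION = {"book": "France", "safe": "Germany"}
CONTAINED_IN = {
    "France": ["France", "Europe"],
    "Germany": ["Germany", "Europe"],
}

def places_of(obj):
    """All places, including enclosing regions, where obj is located."""
    return set(CONTAINED_IN[LOCATION[obj]])

def collective_reading(objs):
    """First tree, as paraphrased above: places where all objects are."""
    common = None
    for obj in objs:
        common = places_of(obj) if common is None else common & places_of(obj)
    return common or set()

def distributive_reading(objs):
    """Second tree, as paraphrased above: each object's own location."""
    return {obj: LOCATION[obj] for obj in objs}

print(collective_reading(["safe", "book"]))    # {'Europe'}
print(distributive_reading(["safe", "book"]))  # {'safe': 'Germany', 'book': 'France'}
```

Both functions answer “where?” over the same world; only the grouping of the conjoined objects differs, which is the contrast the two scopings are meant to capture.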

Does anyone have tips, research pointers, or anything else that would help rank resolved trees by probability?

I can answer the first of your questions, about how the parse ranking is done. Over time, annotators have constructed a treebank of about 50,000 sentences, where for each sentence the best analysis has been identified from among the candidates produced by the grammar/parser. These annotations are then used to train a maximum entropy model, which is applied at run time to rank the analyses of novel sentences from most to least likely. If you have a source copy of the ERG, you’ll find those annotations in the subdirectory tsdb/gold (if you care), and the model is in the file redwoods.mem in the top-level directory.
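As a rough illustration of what applying such a model involves (this is not ACE’s code and not the format of redwoods.mem), a conditional maximum entropy model scores each candidate analysis of a sentence with a weighted sum of its features, then normalizes over all candidates for that sentence. The feature names and weights below are made up.

```python
import math

def maxent_probs(candidates, weights):
    """Conditional log-linear (maximum entropy) model:
    P(analysis | sentence) = exp(w . f(analysis)) / sum over all candidates."""
    scores = [sum(weights.get(name, 0.0) * value for name, value in feats.items())
              for feats in candidates]
    top = max(scores)  # subtract the max score for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def rank(candidates, weights):
    """Candidate indices ordered from most to least likely."""
    probs = maxent_probs(candidates, weights)
    return sorted(range(len(candidates)), key=lambda i: probs[i], reverse=True)

# Hypothetical features and weights, for illustration only.
weights = {"feature_a": 0.8, "feature_b": -0.5, "feature_c": 0.3}
candidates = [
    {"feature_b": 1, "feature_c": 1},  # score -0.2
    {"feature_a": 2, "feature_c": 1},  # score 1.9
]
print(rank(candidates, weights))  # [1, 0]
```

Because the normalization is per sentence, only the relative scores of a sentence’s own candidates matter for the ranking.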

I can’t advise you on how to automatically rank the fully-resolved analyses.

Yeah, I was afraid of that. The only approaches I came up with involved hand-labeling lots and lots of data and then training some kind of model on those labels. Hopefully someone has done that for resolved trees somewhere…
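For what it’s worth, the hand-labeling approach could reuse the same kind of model as parse ranking: describe each resolved tree with features and fit a maximum entropy model to the trees annotators preferred. Below is a minimal sketch assuming hypothetical pairwise “outscopes” features (e.g. udef_q>which_q); it is not an existing DELPH-IN tool, and real features and data would need much more thought.

```python
import math
from collections import defaultdict

def probs(candidates, weights):
    """Normalized maxent probabilities over one sentence's resolved trees."""
    scores = [sum(weights[f] * v for f, v in c.items()) for c in candidates]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def train(items, epochs=100, lr=0.1, l2=0.01):
    """items: list of (candidates, gold_index); each candidate is a feature dict.
    Gradient ascent on conditional log-likelihood with an L2 penalty."""
    weights = defaultdict(float)
    for _ in range(epochs):
        grad = defaultdict(float)
        for candidates, gold in items:
            p = probs(candidates, weights)
            # Observed features of the hand-labeled tree...
            for f, v in candidates[gold].items():
                grad[f] += v
            # ...minus the features the current model expects.
            for pi, c in zip(p, candidates):
                for f, v in c.items():
                    grad[f] -= pi * v
        for f in set(grad) | set(weights):
            weights[f] += lr * (grad[f] - l2 * weights[f])
    return weights

# Hypothetical training item: two resolved trees described only by which
# quantifier outscopes which; index 1 is the hand-labeled preferred tree.
items = [([{"which_q>udef_q": 1}, {"udef_q>which_q": 1}], 1)]
w = train(items)
print(w["udef_q>which_q"] > w["which_q>udef_q"])  # True
```

The trained weights can then rank the resolved trees of a new sentence with the same probs function.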