New Dev Docs for: Understanding the MRS Format

I’ve put together a developer-focused writeup of “Understanding the MRS Format”. I’ve included lots of information from this forum, such as:

And many others. I’ve tried to capture “everything I wish I had known”, again, from a developer perspective.

I’d love any comments on it. My plan is to put it in the how-to section of the main documentation by mid next month, if there are no objections.

Enjoy!

The only thing that calls my attention is the use of some nonstandard terminology for the domain. I am afraid that it can introduce more confusion for people.

Of course, I understand that your goal is to educate developers. In contrast, many documents from DELPH-IN are written by academic researchers for other academic researchers and students.

Anyway, I would vote for making the distinct nature of each document clear in the documents themselves.


Thanks @arademaker! Can you give me an example of non-standard terminology? In the two docs I just posted, in particular, I’ve tried very hard not to introduce any new terminology. I’ve certainly used analogies, such as describing a “scopal argument” as analogous to a “lambda function”, but only as analogies.

Help me ferret it out!

I realized the doc has moved and I can’t update the original post. It is here now.

@arademaker any specifics you can give here? I’d love to fix terminology if I’m misusing it.

I’ve updated the disclaimer at the top to try to address your concern:

This section is designed to give application developers an overview of the Minimal Recursion Semantics format, which is the primary artifact used by DELPH-IN to represent the meaning of a phrase. For a deeper dive into MRS, or one with a more academic or linguistic approach, explore Minimal Recursion Semantics: An Introduction.

I can try to read this in more depth later, but also it would be good to distinguish more between the ERG and ACE. While it’s true that the particular processor has idiosyncrasies that lead to particular MRS and/or trees and other parts of the representation, in an ideal world (I think) all of the processors would work exactly the same and the grammars would be the only variable. You should get the same MRS for the same utterances and version of the ERG (or any grammar) between ACE and the LKB or Agree or PET.

Each MRS document also has multiple interpretations. Using constraints that are included as part of the MRS, a set of trees (called well-formed trees) can be built from the flat list of predications in a given MRS. These well-formed trees define all the alternative meanings of that particular MRS.

I know the focus on scope-resolved representations has come up before and I suppose we haven’t come to a conclusion on it. I suspect the use of the term “tree” to refer to these is one of the concerns @arademaker has. In linguistics, while the tree data structure comes up in several fields and areas, it is very associated with syntax trees, which the ERG produces. In DELPH-IN land, we usually refer to trees as derivations (though I’m not sure why). So it is a bit confusing to say an utterance gets a reading, which is a pairing of (1) a tree/derivation and (2) an MRS, and then additionally say that an MRS can be expanded into trees.
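To make the “well-formed trees” idea concrete, here is a quick hand-worked sketch in plain Python (not output from any DELPH-IN tool) of the two scope-resolved readings licensed by a flat MRS for “every dog chases a cat”; the predicate names are illustrative rather than verbatim ERG output:

```python
def show(tree, depth=0):
    """Print a scope-resolved reading as an indented tree.

    Each quantifier node is a tuple: (quantifier, restriction, body).
    """
    if isinstance(tree, tuple):
        quantifier, restriction, body = tree
        print("  " * depth + quantifier)
        show(restriction, depth + 1)
        show(body, depth + 1)
    else:
        print("  " * depth + tree)

# Reading 1: _every_q outscopes _a_q (for each dog there is some, possibly
# different, cat it chases).  _every_q's BODY hole is plugged by _a_q's label.
show(("_every_q", "_dog_n_1", ("_a_q", "_cat_n_1", "_chase_v_1")))

print()

# Reading 2: _a_q outscopes _every_q (one particular cat is chased by every
# dog).  _a_q's BODY hole is plugged by _every_q's label.
show(("_a_q", "_cat_n_1", ("_every_q", "_dog_n_1", "_chase_v_1")))
```

Both trees satisfy the MRS’s qeq constraints, since a qeq only requires the constrained label to sit below its hole with nothing but quantifiers in between; the two readings differ only in which quantifier’s BODY hole ends up plugged by the other quantifier’s label.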

A DELPH-IN parser like ACE will usually generate more than one MRS document representing the various high-level interpretations of a phrase. Each one contains a list of predicate-logic-like predications and not a tree like you’ll see in many natural language systems. That’s because it is underspecified. Even though the parser has already done one level of interpretation on the phrase, there are still (usually) multiple ways to interpret that.

Technically, the predications are stored as a bag, not a list. Also, I don’t think the reason the predications are stored as a bag is underspecification, but rather that no information is carried by the ordering of the predications and/or the arcs between them.

While I’m not particularly opposed to this interpretation of “underspecified” (though others might be), I think that usually an entire MRS isn’t referred to as underspecified. Instead, various components are underspecified. But it’s probably not a big deal if clarified in a footnote or introductory note or something.
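If it helps, here is roughly what that flat bag of predications looks like from a developer’s point of view. This is a minimal sketch assuming pyDelphin is installed (pip install pydelphin) and uses its simplemrs codec and MRS/EP attributes (rels, predicate, label, args); the SimpleMRS string is hand-simplified (no character spans or variable properties), not verbatim ERG output:

```python
from delphin.codecs import simplemrs

# Hand-simplified MRS for "every dog chases a cat" (illustrative only).
mrs_text = """
[ LTOP: h0 INDEX: e2
  RELS: < [ _every_q LBL: h4 ARG0: x3 RSTR: h5 BODY: h6 ]
          [ _dog_n_1 LBL: h7 ARG0: x3 ]
          [ _chase_v_1 LBL: h1 ARG0: e2 ARG1: x3 ARG2: x8 ]
          [ _a_q LBL: h9 ARG0: x8 RSTR: h10 BODY: h11 ]
          [ _cat_n_1 LBL: h12 ARG0: x8 ] >
  HCONS: < h0 qeq h1 h5 qeq h7 h10 qeq h12 > ]
"""

m = simplemrs.decode(mrs_text)

# The predications come back as a flat, unordered collection -- not a tree.
for ep in m.rels:
    print(ep.predicate, ep.label, dict(ep.args))
```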

  • Whether it was actually seen in the text (starts with _) or added abstractly by the system (no initial _)

Probably better to say “by the grammar,” but that also doesn’t really capture what grammatical predicates are for, I think.
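For what it’s worth, here is a tiny illustration of that surface-vs-abstract split, using predicate names roughly like what the ERG produces for “dogs bark” (the bare plural picks up an abstract quantifier; the names are illustrative, not verbatim output):

```python
# Surface predicates start with "_" and come from a word in the text;
# abstract predicates have no leading "_" and are introduced by the grammar.
predicates = ["udef_q", "_dog_n_1", "_bark_v_1"]

surface = [p for p in predicates if p.startswith("_")]
abstract = [p for p in predicates if not p.startswith("_")]

print("surface: ", surface)   # ['_dog_n_1', '_bark_v_1']
print("abstract:", abstract)  # ['udef_q']
```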

If you pick variable values such that the MRS is true for a given world, then you have understood the meaning of the MRS in that world.

I think this is a misstatement of how truth-conditional semantics works. I think that, traditionally, the sentence “means” its set of truth conditions, as opposed to any single condition that happens to be true. For instance, if Bob says “cats walk” and I then see a cat walking, it’s not the case that I have thereby understood what Bob meant.

LBL:

It’s unclear why you’re including the colon just for LBL; is this a typo?

  • PT: ?

Looks like PT means “prontype” and is for distinguishing different kinds of pronouns like reflexives, etc. See its definition.
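Here is a hedged sketch of where PT shows up in practice, again assuming pyDelphin’s simplemrs codec and its variables attribute; the MRS for “she sleeps” is hand-simplified, and the exact predicate names, properties, and values may differ from current ERG output:

```python
from delphin.codecs import simplemrs

# Hand-simplified MRS for "she sleeps" (illustrative only).  PT is a
# property of the pronoun's instance variable; values like "std"
# (ordinary pronoun), "refl" (reflexive), and "zero" (dropped pronoun)
# distinguish different kinds of pronouns.
mrs_text = """
[ LTOP: h0 INDEX: e2
  RELS: < [ pron LBL: h4 ARG0: x3 [ x PERS: 3 NUM: sg GEND: f PT: std ] ]
          [ pronoun_q LBL: h5 ARG0: x3 RSTR: h6 BODY: h7 ]
          [ _sleep_v_1 LBL: h1 ARG0: e2 ARG1: x3 ] >
  HCONS: < h0 qeq h1 h6 qeq h4 > ]
"""

m = simplemrs.decode(mrs_text)
print(m.variables["x3"])  # e.g. {'PERS': '3', 'NUM': 'sg', 'GEND': 'f', 'PT': 'std'}
```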

This indicates that the verb go is the “main point of the phrase”. This is called the “syntactic head” in linguistics.

I don’t think “syntactic head” is accurate here. Usually the ARG0 of the verb or other main predicate is the INDEX, but (1) this isn’t the syntactic head (which, especially in DELPH-IN and HPSG, typically refers to the head of a phrase, not an utterance), and (2) I believe there are cases where the INDEX is not the ARG0 of the main verb/etc. (in a different way than the copula example you provide).
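If it’s useful, here is one way to see what INDEX points at for a simple example, again assuming pyDelphin’s API (index, rels, args); as noted above, INDEX usually, but not always, coincides with the ARG0 of the main verb’s predication:

```python
from delphin.codecs import simplemrs

# Hand-simplified MRS for "dogs bark" (illustrative only).
m = simplemrs.decode("""
[ LTOP: h0 INDEX: e2
  RELS: < [ udef_q LBL: h4 ARG0: x3 RSTR: h5 BODY: h6 ]
          [ _dog_n_1 LBL: h7 ARG0: x3 ]
          [ _bark_v_1 LBL: h1 ARG0: e2 ARG1: x3 ] >
  HCONS: < h0 qeq h1 h5 qeq h7 > ]
""")

# Find the predication whose ARG0 is the INDEX -- here, the main verb.
print(m.index)  # e2
print([ep.predicate for ep in m.rels if ep.args.get("ARG0") == m.index])
# ['_bark_v_1']
```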