Expanding the pydelphin isomorphism check

I wrote a subclass of PyDelphin’s MRS called SEMENT that adds ltop so I can work with them compositionally.

I want to use is_isomorphic to check whether two SEMENTs I create are isomorphic. As it stands now, the function CAN take the SEMENTs, but since LTOP is just something I added, the isomorphism check ignores it (I think).

What would be involved in having the isomorphism check also consider the LTOP? I feel like it should be doable (maybe as a separate function that checks for isomorphism between SEMENTs specifically), because in theory the LTOP would just be another part of the graph, but looking at the code I can’t immediately see how to make this change.

The MRS class already has a top attribute. In an MRS for a full sentence, this is the GTOP that is qeq to the graph’s topmost label, but in principle it could also serve as an LTOP handle during MRS construction. Does this not work for you?

Well, yes, but the isomorphism check ignores the existing top as well. I didn’t include it previously because it wasn’t necessary: the h0 qeq hX constraint on HCONS was enough to identify the top. But if the top is only an EP’s label, I can see why it might need to be included explicitly.

Before checking isomorphism, PyDelphin transforms the MRS into what it calls an “isograph”, which is a simplified form of the MRS graph of the form g[source][target] = edge_label. You can modify the _make_mrs_isograph() function after these lines:

by inserting something like:

    g["*top*"][x.top] = "TOP"

or if you want to use the code as-is, you could insert an ad hoc qeq like this:

from copy import deepcopy

from delphin.mrs import MRS, HCons, is_isomorphic

def is_sement_isomorphic(s1: MRS, s2: MRS) -> bool:
    s1_ = deepcopy(s1)  # don't modify the original
    s2_ = deepcopy(s2)
    s1_.hcons.append(HCons.qeq("*top*", s1_.top))
    s2_.hcons.append(HCons.qeq("*top*", s2_.top))
    return is_isomorphic(s1_, s2_)

I haven’t tested the above, but the idea is that the explicit qeq will be picked up by the _make_mrs_isograph() function.

Thanks!
I suppose just using the existing TOP feature could work. Is it the case that, after composition, the LTOP of the final structure will always end up being the GTOP anyway? If so, I can adjust mine to just use TOP as LTOP.

Another slight caveat that I didn’t mention above, because I wanted to deal with LTOP first, is that my SEMENT object has two more attributes, eqs and holes. eqs is a list of tuples of variables that are identified with one another. Every time a hole is plugged by an argument, another eq is added. Once I’m finished with composition, I turn these into sets of equalities, then pick one variable to represent each set and overwrite where necessary.

So for example, if after I’m done with composition I have [(x1, x2), (x2, x3)] for my eqs list, I turn that into the set {x1, x2, x3}, then search through the structure and change any instance of x2 or x3 to x1.

To be honest I’m not sure if this is the best way to do it, but I figured it was easier to just keep adding the eqs until the end and then do one overwrite operation instead of having to search through the eqs list every time to see if a variable can be added to an existing set.
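The merging step described above can be sketched roughly like this (a hypothetical helper, not part of PyDelphin; it picks the alphabetically smallest variable as the representative, but any choice would do):

```python
def merge_eqs(eqs):
    """Merge pairwise equalities into disjoint sets and build a
    substitution map from each variable to its set's representative."""
    groups = []  # disjoint sets of equated variables
    for a, b in eqs:
        # find any existing sets that touch this pair and fold them in
        hits = [g for g in groups if a in g or b in g]
        merged = {a, b}.union(*hits)
        groups = [g for g in groups if g not in hits]
        groups.append(merged)
    # pick one member of each set (here: the smallest) as representative
    return {v: min(g) for g in groups for v in g}

subst = merge_eqs([('x1', 'x2'), ('x2', 'x3')])
# subst maps x1, x2, and x3 all to 'x1'
```

With the substitution map in hand, the overwrite is a single pass over the structure, replacing each variable v with subst.get(v, v).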

But as far as the isomorphism check goes, I think dealing with the structure in that form would be a massive headache, so if I want to check whether two SEMENTs are isomorphic, would you suggest I do the eq overwrite before checking? That’s what seems best to me, but I’d like some input.

Then lastly, for holes, I don’t even know where to begin, or whether it matters for isomorphism checking at all. As it stands now, the holes attribute looks something like this for a SEMENT: {'ARG1': 'x1', 'ARG2': 'x2'}. Maybe this would be better to check in an ad hoc way, like “do they each have the same keys in the holes dict?” … then somehow get back the isographs and see if the variable mapping is correct for each hole.
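For what it’s worth, the role-label half of that ad hoc check is trivial; this sketch assumes a holes dict shaped like the one above (the variable-mapping half would still need the isographs):

```python
def same_hole_roles(holes1, holes2):
    # compare only the role labels (dict keys), not the variables
    return set(holes1) == set(holes2)

same_hole_roles({'ARG1': 'x1', 'ARG2': 'x2'},
                {'ARG2': 'y9', 'ARG1': 'y8'})  # True: same role labels
```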

I believe so. From Copestake et al. 2005 (Intro to MRS paper), section 4.3 (emphasis added):

In order to state the rules of combination involving handles, we will assume that MRSs have an extra attribute, the local top, or ltop, which will be the topmost label in an MRS which is not the label of a floating EP (i.e., not the label of a quantifier). […]. In an MRS corresponding to a complete utterance, the top handle will be qeq to the local top. We will assume that when an MRS is composed, the top handle for each phrase in the composition is the same: that is, the top handle we introduced in §4.1 is a ‘global’ top handle gtop, in contrast to the local top for the phrase, which is the label of the topmost EP in that phrase which is not a floating EP.

I’m not sure where in the grammar the QEQ is introduced, though.

Not sure about holes, but the eqs list sounds like HCONS. You could create a PyDelphin HCons object with the lheq relation instead of maintaining a separate list, if that interests you.

While we’re on the subject, you might find delphin.scope.conjoin() useful?

Yes, if you consider a structure with the eqs and a resolved one to be equivalent. The isomorphism check will consider lheq HCons for the “isograph” mentioned above and it will appear connected, but it will still have a different shape to one with resolved handles.

I’m afraid I can’t say what is appropriate here. If the roles and holes/variables appear in EP.args then I think they will be part of the isograph and thus the isomorphism comparisons.

@goodmami Hi again, reviving this thread, I implemented what you put above in this reply.

But I’m getting the following error every time it tries to do the deep copy:
TypeError: HCons.__new__() missing 2 required positional arguments: 'relation' and 'lo'

The code for deepcopy is kind of inscrutable to me, so I don’t really know what could be causing this… the HCons object on the hcons list for the SEMENT seems to have everything it needs:

Do you have any ideas about what could be causing this error?

Update: I tried the same thing using an MRS object, as opposed to my SEMENT object, and I got the same error, so it seems something about the MRS class doesn’t get along well with deepcopy.

Yes, I think I see the problem. The HCons class is, for some reason, a subclass of tuple, which allows you to subscript and unpack hcons objects:

>>> from delphin.codecs import simplemrs
>>> m = simplemrs.decode('[TOP: h0 INDEX: e2 RELS: < [ _rain_v_1 LBL: h1 ARG0: e2 [ e TENSE: pres ] ] > HCONS: < h0 qeq h1 > ]')
>>> hi, rel, lo = m.hcons[0]
>>> hi == m.hcons[0][0] == m.hcons[0].hi
True

The problem is that the tuple parent class determines the behavior during copy.deepcopy() (via tuple.__getnewargs__(), which deepcopy uses to reconstruct the object), and tuples have a different instantiation signature than HCons: HCons takes 3 positional arguments, but tuple takes a single iterable argument:

>>> from delphin import mrs
>>> mrs.HCons('h0', 'qeq', 'h1')  # 3 positional arguments
<HCons object (h0 qeq h1) at 125837875857984>
>>> tuple(['h0', 'qeq', 'h1'])  # single iterable argument
('h0', 'qeq', 'h1')

Giving HCons a single iterable argument (or tuples multiple arguments) results in an error:

>>> mrs.HCons(['h0', 'qeq', 'h1'])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: HCons.__new__() missing 2 required positional arguments: 'relation' and 'lo'
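The same failure can be reproduced without PyDelphin using a minimal tuple subclass (a hypothetical stand-in for HCons, not the real class): deepcopy reconstructs the object via tuple.__getnewargs__(), which supplies a single iterable, so a three-argument __new__ receives only one argument.

```python
from copy import deepcopy

class Constraint(tuple):
    """Hypothetical stand-in for HCons: a tuple subclass whose
    __new__ takes three positional arguments."""
    def __new__(cls, hi, relation, lo):
        return super().__new__(cls, (hi, relation, lo))

try:
    # tuple.__getnewargs__ hands __new__ a single iterable argument
    deepcopy(Constraint('h0', 'qeq', 'h1'))
except TypeError as err:
    print(err)  # ... missing 2 required positional arguments: 'relation' and 'lo'
```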

To fix the issue, we could do one of the following:

  1. Implement __deepcopy__() on the HCons (or _Constraint in PyDelphin) class
  2. Make HCons no longer be a subclass of tuple
  3. Find another way to copy the MRS, such as a serialization round-trip

For (3) above, you can try replacing the deepcopy lines with these:

    from delphin.codecs import simplemrs

    s1_ = simplemrs.decode(simplemrs.encode(s1))
    s2_ = simplemrs.decode(simplemrs.encode(s2))

It’s a bit inefficient, but it should give the intended result.

For now I’ve elected to just make a new SEMENT object and give it all the information from the old one, plus the new qeq.

But since you’ve mentioned it, how complicated would it be for me to extend the decode functionality for my SEMENT objects?

I went ahead and implemented versions of encode, but decode seems more complicated. For the most part, everything should be the same as decoding an MRS string, with the exception that a SEMENT has SLOTS and EQs:

    [ TOP: h0
      INDEX: e1
      RELS: < [ _give_v_1 LBL: h0 ARG0: e1 ARG1: i2 ARG2: u3 ARG3: i4 ]
              [ _a_q LBL: h10 ARG0: x11 RSTR: h12 BODY: h13 ]
              [ _cookie_n_1 LBL: h6 ARG0: x7 ] >
      HCONS: < h12 qeq h6 >
      EQS: < (x11,x7), (x7,i2) >
      SLOTS: < ARG2: u3, ARG3: i4 > ]

So really I’m just looking to write whatever I need to decode the EQs and the SLOTS, then reuse the rest, but it seems a little more complicated than encode.

Maybe not too hard, but I’d suggest some changes to what you’re currently encoding. SimpleMRS does not use parentheses or commas, so if you keep those, you’d need to add new token types to the parser. Getting the regular expressions correct and matched in the right order can sometimes be tricky.

Here’s how you might follow SimpleMRS conventions and reuse existing token types:

    EQS: < x11 x7 x7 i2 >
    SLOTS: < ARG2: u3 ARG3: i4 > ]

The parser would expect the items in EQS and SLOTS to appear in pairs, so it wouldn’t get confused. A human might, though. So, if that bothers you, you could add a relation, as with HCONS:

    EQS: < x11 lheq x7 x7 lheq i2 >

(Note: since lheq (label–handle equality) is already a valid HCONS relation, you could just put those on the HCONS list and you wouldn’t need a separate EQS list at all. You’d need to change how your code models them and separates them from qeqs, though.)

Then you add some code to parse the tokens following this pattern:

E.g.,

elif feature == "EQS":
    lexer.expect_type(LANGLE)
    while lexer.peek()[0] == SYMBOL:
        leq = _decode_variable(lexer, variables)
        # parse the relation here if you use one
        req = _decode_variable(lexer, variables)
        eqs.append((leq, req))
    lexer.expect_type(RANGLE)

I missed that the variables on EQS were individuals and not handles, and thus wouldn’t be appropriate on HCONS. ICONS, however, should be fine.

What is an lheq relationship?

Also, all EQs are introduced as pairs, but there is a step in the process where I merge them into sets to make the overwriting easier. Do you have any ideas for representing that?

As in, initially, it might be:

EQs: < x11 lheq x7 x7 lheq i2 >

But after the rewrite step it becomes a set of three, maybe something like this?

EQs: < x11 lheq x7 lheq i2 >

But I guess that might not be legitimate, because I assume an lheq relationship only has two parts.

From the RmrsFaq wiki:

Q What are the elements in hcons: qeq, lheq, and outscopes.

A They are:

  • qeq - equality modulo quantifiers. A qeq constraint always relates a hole to a label. The intuition is that if a handle argument h is qeq to some label l, either that argument must be directly filled by l (i.e., h=l), or else one or more quantifiers ‘float in’ between h and l.
  • lheq - label-handle equality. The label and handle are constrained to be equated. Explicit equalities are used in various papers - the option of working with them rather than making the variables the same is useful theoretically and sometimes practically.
  • outscopes - directly or indirectly takes scope over. This is the relationship used by most work on underspecification, mentioned as an option in the MRS paper. Two possible versions: a) hole outscopes label - like qeq with no restriction to quantifiers; b) label1 outscopes label2 - label1 must dominate label2 in all scope-resolved solutions. Not currently supported by the LKB scoping code in either case.

You don’t really see anything but qeq in the outputs of DELPH-IN grammars, but it’s not the only constraint type allowed on HCONS.

Not really. If you’re defining how the EQS list is parsed, you can choose a notation that makes sense for you. But if the variables on EQs are not labels/handles, then I wouldn’t call the constraint lheq (since the lh stands for label-handle), but rather just eq or something.

And to provide another alternative… if writing encoders and decoders for a SimpleMRS variant doesn’t sound like fun, you could use something like MrsJSON. The serialization and deserialization would be trivial since it’s just JSON. You’d just need to add some lines to the codec’s to_dict() and from_dict() functions.
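For instance, a rough sketch of what those added lines might do, working on plain dicts (the 'eqs' and 'slots' keys and the helper names here are assumptions for illustration, not part of PyDelphin's mrsjson codec):

```python
# Hypothetical helpers layering SEMENT fields onto the dict a JSON
# codec produces/consumes.
def sement_to_dict(mrs_dict, eqs, slots):
    d = dict(mrs_dict)                       # the codec's ordinary MRS dict
    d['eqs'] = [list(pair) for pair in eqs]  # e.g. [['x11', 'x7'], ['x7', 'i2']]
    d['slots'] = dict(slots)                 # e.g. {'ARG2': 'u3', 'ARG3': 'i4'}
    return d

def sement_from_dict(d):
    eqs = [tuple(pair) for pair in d.get('eqs', [])]
    slots = dict(d.get('slots', {}))
    # everything else is the plain MRS part, decoded as usual
    mrs_dict = {k: v for k, v in d.items() if k not in ('eqs', 'slots')}
    return mrs_dict, eqs, slots
```

Since JSON handles the tokenizing for you, the round trip is just these two dict transformations plus the codec’s existing MRS handling.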