Unexpected structure for pydelphin mrs-compare

We are running into another issue when trying to use pydelphin mrs-compare on treebanks parsed with the SRG. Previously, the problem seemed to be underspecified CARG values. This one looks a bit different, and so far I cannot figure it out. Does anyone see what the issue is?

The MRS:

[ LTOP: h0 INDEX: e2 [ e SF: prop TENSE: untensed MOOD: indicative ]
  RELS: <
  [ _después_de_p_rel<-1:-1> LBL: h1 ARG0: i4 ARG1: e2 ARG2: x5 [ x GEND: m ] ]
  [ generic_entity_rel<-1:-1> LBL: h6 ARG0: x5 ]
  [ _este_q_rel<-1:-1> LBL: h7 ARG0: x5 RSTR: h8 BODY: h9 ]
  [ pron_rel<-1:-1> LBL: h10 ARG0: x3 [ x GEND: f ] ]
  [ pronoun_q_rel<-1:-1> LBL: h11 ARG0: x3 RSTR: h12 BODY: h13 ]
  [ _be_v_id_rel<-1:-1> LBL: h1 ARG0: e2 ARG1: x3 ARG2: x14 ]
  [ _el_q_rel<-1:-1> LBL: h15 ARG0: x14 RSTR: h16 BODY: h17 ]
  [ "_estrella_n_rel"<-1:-1> LBL: h18 ARG0: x14 ]
  [ _de_p_rel<-1:-1> LBL: h18 ARG0: i19 ARG1: x14 ARG2: x20 ]
  [ art_indef_q_rel<-1:-1> LBL: h21 ARG0: x20 RSTR: h22 BODY: h23 ]
  [ "_programa_n_rel"<-1:-1> LBL: h24 ARG0: x20 ]
  [ _en_p_rel<-1:-1> LBL: h24 ARG0: i25 ARG1: x20 ARG2: x26 [ x GEND: f ] ]
  [ appos_rel<-1:-1> LBL: h27 ARG0: i28 ARG1: x26 ARG2: x29 ]
  [ _el_q_rel<-1:-1> LBL: h30 ARG0: x26 RSTR: h31 BODY: h32 ]
  [ "_canal_n_rel"<-1:-1> LBL: h33 ARG0: x26 ]
  [ named_rel<-1:-1> LBL: h34 CARG: "disney" ARG0: x29 ARG1: u36 ] >
  HCONS: < h0 qeq h1 h8 qeq h6 h12 qeq h10 h16 qeq h18 h22 qeq h24 h31 qeq h33 > ]

In the debugger, it looks like an LBL may be missing somewhere, but I don't see where (as far as I can tell, all the LBLs are in place in the MRS above).

What I also notice, though, is that the PRED value may somehow be unexpected and get broken down incorrectly. The lexical entry is:

después_de_p-cp-vm := p_cp_vm_native_le &
  [ STEM < "después_de" >,
    SYNSEM.LKEYS.KEYREL.PRED _después_de_p_rel ].

…because I seem to be seeing it in two pieces (the second one appears as e_p_rel below in the second screenshot):


It would appear that, for some reason (maybe invalid formatting on the SRG side?), pydelphin split the PRED value into two pieces and thinks that the second part of the PRED is the LBL?

Indeed, if I change the underscore in the relation name to a dash, the problem goes away. Should I in principle change all of them? There are lots and lots of such relation names in the SRG… If they violate the formalism, they should be changed, I guess. For each such item, the change needs to be made in both fundamentals.tdl and lexicon.tdl.

Possibly relevant:


Aha; looks like a + is recommended.


The + is recommended for predicates with spaces! In your case, it looks like the problem is the accent in the predicate name, right?

[ LTOP: h0 INDEX: e2 [ e SF: prop TENSE: untensed MOOD: indicative ]
  RELS: <
  [ _después_de_p_rel<-1:-1>	LBL: h1 ARG0: i4 ARG1: e2 ARG2: x5 [ x GEND: m ] ]
  [ generic_entity_rel<-1:-1>	LBL: h6 ARG0: x5 ]
  [ _este_q_rel<-1:-1>		LBL: h7 ARG0: x5 RSTR: h8 BODY: h9 ]
  [ pron_rel<-1:-1>		LBL: h10 ARG0: x3 [ x GEND: f ] ]
  [ pronoun_q_rel<-1:-1>	LBL: h11 ARG0: x3 RSTR: h12 BODY: h13 ]
  [ _be_v_id_rel<-1:-1>		LBL: h1 ARG0: e2 ARG1: x3 ARG2: x14 ]
  [ _el_q_rel<-1:-1>		LBL: h15 ARG0: x14 RSTR: h16 BODY: h17 ]
  [ "_estrella_n_rel"<-1:-1>	LBL: h18 ARG0: x14 ]
  [ _de_p_rel<-1:-1>		LBL: h18 ARG0: i19 ARG1: x14 ARG2: x20 ]
  [ art_indef_q_rel<-1:-1>	LBL: h21 ARG0: x20 RSTR: h22 BODY: h23 ]
  [ "_programa_n_rel"<-1:-1>	LBL: h24 ARG0: x20 ]
  [ _en_p_rel<-1:-1>		LBL: h24 ARG0: i25 ARG1: x20 ARG2: x26 [ x GEND: f ] ]
  [ appos_rel<-1:-1>		LBL: h27 ARG0: i28 ARG1: x26 ARG2: x29 ]
  [ _el_q_rel<-1:-1>		LBL: h30 ARG0: x26 RSTR: h31 BODY: h32 ]
  [ "_canal_n_rel"<-1:-1>	LBL: h33 ARG0: x26 ]
  [ named_rel<-1:-1>		LBL: h34 CARG: "disney" ARG0: x29 ARG1: u36 ] >
  HCONS: < h0 qeq h1 h8 qeq h6 h12 qeq h10 h16 qeq h18 h22 qeq h24 h31 qeq h33 > ]

That is, _después_de_p_rel. By the way, it is interesting to have the SRG available now to test the toolsets. For years the ERG has been the only grammar in use, so errors like these show where we need to review the specifications and make sure the tools are grammar-independent.

I think it’s the underscore between después and de, not the accent. It sounds like it should be replaced with a +. There are a gazillion such rels in the SRG…

I think it’s probably fine to expect grammars to conform to the standard, though; if + is the standard, then the SRG should respect it. Otherwise, how are the tools going to parse it?


I see, this is related to a discussion I had with @goodmami some time ago.

Indeed, PyDelphin expects the “convention” for predicate names. At first this sounded strange to me: I would expect PyDelphin to respect the specification, not the convention. But @goodmami has good arguments for following the convention.

See parsing MRS · Issue #371 · delph-in/pydelphin · GitHub and pydelphin/delphin/codecs/simplemrs.py at main · delph-in/pydelphin · GitHub. But the specification (MrsRFC · delph-in/docs Wiki · GitHub) says:

Pred         := StringPred | TypePred
StringPred   := QuotedString
TypePred     := /_?([^_\s]+_)*(_rel)?/

I am a little bit more flexible in my implementation:

Correct: it is recommended to use + in place of spaces in predicates. This is a convention, not a format specification. The use of underscores is prescriptive, however. Surface predicates have a fixed upper limit of 3 underscores (ignoring any _rel suffix): _(lemma)_(pos)_(sense). From the section called Surface Predicates in the wiki page that @ebender linked (emphasis added):

Surface predicates always have three fields: lemma, pos, and sense. The sense field is occasionally unspecified (e.g., _and_c).

The lemma field of a surface pred may be just about anything that does not contain underscores or spaces.

The above is a requirement for PyDelphin to parse the internal structure of the predicates. ACE seems to be fine with non-conforming predicates; I did not check the LKB.
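To illustrate why the extra underscore matters, here is a minimal sketch of a strict reading of the three-field convention quoted above. It is not PyDelphin's implementation; split_surface_pred is a hypothetical helper written only for this example. Read this way, _después_de_p_rel is not rejected but quietly misread as lemma después, POS de, sense p, whereas _después+de_p_rel splits as intended.

```python
from typing import Optional, Tuple


def split_surface_pred(pred: str) -> Tuple[str, str, Optional[str]]:
    """Hypothetical helper (not PyDelphin's API): split a surface predicate
    into (lemma, pos, sense) under a strict reading of _lemma_pos[_sense][_rel]."""
    core = pred[:-4] if pred.endswith("_rel") else pred  # drop optional _rel
    if not core.startswith("_"):
        raise ValueError(f"not a surface predicate: {pred!r}")
    fields = core[1:].split("_")
    # Exactly two or three non-empty fields: lemma, pos, and an optional sense.
    if len(fields) not in (2, 3) or not all(fields):
        raise ValueError(f"expected _lemma_pos[_sense]: {pred!r}")
    if any(ch.isspace() for ch in fields[0]):
        raise ValueError(f"lemma contains whitespace: {pred!r}")
    sense = fields[2] if len(fields) == 3 else None
    return fields[0], fields[1], sense


# The underscore inside the lemma shifts every field one place to the right:
print(split_surface_pred("_después_de_p_rel"))   # lemma 'después', pos 'de', sense 'p'
print(split_surface_pred("_después+de_p_rel"))   # lemma 'después+de', pos 'p', no sense
```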

I recommend reviewing your predicates containing more than 3 underscores (or more than 2 before the POS field, in case the sense is optional) and changing those as appropriate.
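For a grammar with many such predicates, a pass like the following could propose + spellings for multi-field lemmas. It is a hypothetical sketch, not part of PyDelphin or the SRG tooling, and it rests on two assumptions: that predicates appear in the TDL as bare tokens starting with _ and ending in _rel (quoted string predicates such as "_estrella_n_rel" are skipped), and that the POS field is one of the tags in POS_TAGS. Anything it cannot classify is left untouched for manual review.

```python
import re
from typing import List, Optional, Tuple

# Assumption: the POS tags used in surface predicates; adjust for your grammar.
POS_TAGS = {"n", "v", "a", "q", "x", "p", "c", "u"}

# Assumption: bare predicate tokens start with "_" and end in "_rel".
# The lookbehind skips quoted preds and underscores inside other identifiers.
PRED_TOKEN = re.compile(r'(?<![\w+\-"])_[^\s\],"<>]+_rel\b')


def suggest_fix(pred: str) -> Optional[str]:
    """Return pred with '+' joining a multi-field lemma, or None if no change."""
    suffix = "_rel" if pred.endswith("_rel") else ""
    core = pred[: len(pred) - len(suffix)]
    if not core.startswith("_"):
        return None  # abstract predicate, e.g. generic_entity_rel
    fields = core[1:].split("_")
    if len(fields) >= 2 and fields[-1] in POS_TAGS:
        pos_index = len(fields) - 1  # no sense field, e.g. _después_de_p
    elif len(fields) >= 3 and fields[-2] in POS_TAGS:
        pos_index = len(fields) - 2  # sense field present, e.g. _dog_n_1
    else:
        return None  # POS field not found: leave for manual review
    if pos_index == 1:
        return None  # single-field lemma: already conforming
    lemma = "+".join(fields[:pos_index])
    return "_" + "_".join([lemma] + fields[pos_index:]) + suffix


def fix_tdl(text: str) -> Tuple[str, List[Tuple[str, str]]]:
    """Rewrite non-conforming predicates in TDL text; also return (old, new) pairs."""
    changes: List[Tuple[str, str]] = []

    def replace(match: "re.Match[str]") -> str:
        old = match.group(0)
        new = suggest_fix(old)
        if new is None:
            return old
        changes.append((old, new))
        return new

    return PRED_TOKEN.sub(replace, text), changes


entry = '''después_de_p-cp-vm := p_cp_vm_native_le &
  [ STEM < "después_de" >,
    SYNSEM.LKEYS.KEYREL.PRED _después_de_p_rel ].'''
new_entry, changes = fix_tdl(entry)
# changes == [('_después_de_p_rel', '_después+de_p_rel')]; STEM is untouched
```

Because the POS set is closed, the sketch can tell _después_de_p (two-field lemma, no sense) from _dog_n_1 (lemma dog, sense 1). A sense that happens to equal a POS letter would be misread, so the reported changes are worth checking before committing them.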
