Treebanking: checking across decisions, preferences, results and relations

In an old SRG treebank, suppose I have an item, la ventana se abrió ('The window opened.'), with item ID 31.

I am not sure I exported the gold trees correctly (using a tsdb Export script), so I am trying to check.

Am I doing it correctly? Here’s what I am doing:

  1. I look in the preference file for item ID 31. I think there is only one line per ID in that file:
31@1@0
  2. Now looking at the relations file:
preference:
   parse-id :integer :key
   t-version :integer
   result-id :integer

Do I understand correctly that this says the gold tree is result number 0 in the result file?

There is a rather complex analysis under that result number, but given that the clitic se is involved, perhaps that analysis is what is intended. In any case, for now I just want to know whether I am interpreting the relations correctly.

31@0@-1@-1@-1@-1@-1@-1@-1@-1@(5954 flr-hd_nwh_c 1.364140 0 5 (5945 sp-hd_c -0.986210 0 2 (5943 da0fs0 0.000000 0 1 (6 el_d 0.000000 0 1 ("el" 1 "token [ +FORM \\"el\\" +FROM \\"0\\" +TO \\"2\\" +ID diff-list [ LIST cons [ FIRST \\"1\\" REST list ] LAST list ] +POS pos [ +TAGS cons [ FIRST \\"da0fs0\\" REST null ] +PRBS cons [ FIRST \\"0.989260\\" REST null ] ] +CLASS token_class +TRAIT token_trait +PRED predsort +CARG string ]"))) (5944 ncfs000 0.000000 1 2 (7 ventana_n 0.000000 1 2 ("ventana" 2 "token [ +FORM \\"ventana\\" +FROM \\"3\\" +TO \\"10\\" +ID diff-list [ LIST cons [ FIRST \\"2\\" REST list ] LAST list ] +POS pos [ +TAGS cons [ FIRST \\"ncfs000\\" REST null ] +PRBS cons [ FIRST \\"1.000000\\" REST null ] ] +CLASS token_class +TRAIT token_trait +PRED predsort +CARG string ]")))) (5953 hd_optsb_c 2.484488 2 5 (5952 ct-hd_c 2.143565 2 5 (5946 p00cn00 0.000000 2 3 (10 se_pr-i 0.000000 2 3 ("se" 3 "token [ +FORM \\"se\\" +FROM \\"11\\" +TO \\"13\\" +ID diff-list [ LIST cons [ FIRST \\"3\\" REST list ] LAST list ] +POS pos [ +TAGS cons [ FIRST \\"p00cn00\\" REST null ] +PRBS cons [ FIRST \\"0.494509\\" REST null ] ] +CLASS token_class +TRAIT token_trait +PRED predsort +CARG string ]"))) (5951 hd_xcmp-v_c 2.143565 3 5 (5950 hd-pt_c 2.152085 3 5 (5948 vmis3s0 0.833495 3 4 (5947 v_impers-cl_dlr 0.000000 3 4 (13 abrir_v-np_rfx 0.000000 3 4 ("abrir" 4 "token [ +FORM \\"abrir\\" +FROM \\"14\\" +TO \\"19\\" +ID diff-list [ LIST cons [ FIRST \\"4\\" REST list ] LAST list ] +POS pos [ +TAGS cons [ FIRST \\"vmis3s0\\" REST null ] +PRBS cons [ FIRST \\"1.000000\\" REST null ] ] +CLASS token_class +TRAIT token_trait +PRED predsort +CARG string ]")))) (5949 fp 0.000000 4 5 (18 fstop_pt 0.000000 4 5 ("." 
5 "token [ +FORM \\".\\" +FROM \\"20\\" +TO \\"21\\" +ID diff-list [ LIST cons [ FIRST \\"5\\" REST list ] LAST list ] +POS pos [ +TAGS cons [ FIRST \\"fp\\" REST null ] +PRBS cons [ FIRST \\"1.000000\\" REST null ] ] +CLASS token_class +TRAIT token_trait +PRED predsort +CARG string ]"))))))))@@("?" ("?" ("?" ("?" ("el"))) ("?" ("?" ("ventana")))) ("?" ("?" ("?" ("?" ("se"))) ("?" ("?" ("?" ("?" ("?" ("abrir")))) ("?" ("?" ("."))))))))@[ LTOP: h0 INDEX: event2 [ event SORT: semsort E.TENSE: ppast E.ASPECT: aspect E.MOOD: ind SF: iforce ] RELS: < [ focus_d_rel<-1:-1> LBL: handle1 [ handle SORT: semsort ] WLINK: list CFROM: *top* CTO: *top* ARG0: event7 [ event SORT: semsort E.TENSE: basic_tense E.ASPECT: aspect E.MOOD: mood SF: iforce ] ARG1: event2 ARG2: ref-ind8 [ ref-ind SORT: cnc PNG.PN: 3per PNG.GEN: gender PRONTYPE: not_pron DEF: bool DIVISIBLE: bool ] ]  [ _el_q_rel<-1:-1> LBL: handle9 [ handle SORT: semsort ] WLINK: list CFROM: *top* CTO: *top* ARG0: ref-ind8 RSTR: handle13 [ handle SORT: semsort ] BODY: handle14 [ handle SORT: semsort ] ]  [ "_ventana_n_rel"<-1:-1> LBL: handle15 [ handle SORT: semsort ] WLINK: list CFROM: *top* CTO: *top* ARG0: ref-ind8 ]  [ pron_rel<-1:-1> LBL: handle19 [ handle SORT: semsort ] WLINK: list CFROM: *top* CTO: *top* ARG0: ref-ind23 [ ref-ind SORT: semsort PNG.PN: 3per PNG.GEN: gender PRONTYPE: impers DEF: bool DIVISIBLE: bool ] ]  [ pronoun_q_rel<-1:-1> LBL: handle24 [ handle SORT: semsort ] WLINK: list CFROM: *top* CTO: *top* ARG0: ref-ind23 RSTR: handle28 [ handle SORT: semsort ] BODY: handle29 [ handle SORT: semsort ] ]  [ "_abrir_v_rel"<-1:-1> LBL: handle30 [ handle SORT: semsort ] WLINK: list CFROM: *top* CTO: *top* ARG0: event2 ARG1: index34 [ index SORT: semsort PNG.PN: 3sg PNG.GEN: gender PRONTYPE: prontype DEF: + ] ARG2: ref-ind8 ] > HCONS: < h0 qeq handle1 handle13 qeq handle15 handle28 qeq handle19 > ]@((:ascore . 1.36414) (:probability . 0.46655))

The above seems to be in agreement with how fftb reads the treebanked profile:

[Screenshot: FFTB view of the treebanked profile]

…but in disagreement with what I managed to export using the tsdb Export script, so I am trying to make sure I know which is correct.

Anyone? :)

The main question is how to read the preference file, and whether that is the file I should first be looking at when trying to identify the gold tree. What’s t-version? Do I want that or the result-id (in the preference file), or a different file altogether?

It appears that the FFTB tool does not record the ‘rank’ of the parse in the third field of (each line of) the preference file, presumably because in general it would be hard to assign that number from the packed-forest ‘edge’ relation. This is different behavior from that of the old treebanking tool, where we recorded for each item the top N parses which were ranked according to the maxent model, so the selected tree had a rank which was stored in that third field in the preference relation. Even if you had the gold rank for an item, it won’t be a reliable identifier of the tree you want in a newly parsed run, since the grammar may well have changed, producing a different number of trees. So you’ll need to identify the gold tree by comparing the stored derivation, but FFTB will tell you whether or not you’ve matched that gold tree for each item.
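Comparing stored derivations by hand is tedious because edge ids and scores differ from run to run. Here is a minimal Python sketch of one way to compare them, assuming each derivation node starts with `(id label score start end`, as in the result line quoted earlier in this thread. The names `NODE_HEADER`, `derivation_signature` and `same_tree` are made up for illustration and are not part of [incr tsdb()] or FFTB. It ignores ids and scores and keeps only labels and spans, so it will still report a mismatch if rule or lexical entry names were changed in the grammar.

```python
import re

# Assumed node header: "(<edge id> <label> <score> <start> <end>".
# Token leaves such as ("el" 1 "token [...]") do not match this pattern.
NODE_HEADER = re.compile(
    r"\((\d+) (\S+) (-?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?) (\d+) (\d+)"
)


def derivation_signature(derivation):
    """Return the preorder list of (label, start, end) for a derivation string.

    Edge ids and scores are dropped, since they are not stable across runs.
    """
    return [
        (m.group(2), int(m.group(4)), int(m.group(5)))
        for m in NODE_HEADER.finditer(derivation)
    ]


def same_tree(stored, candidate):
    """True if two derivations agree on every label and span."""
    return derivation_signature(stored) == derivation_signature(candidate)
```

With the stored derivation for an item in one string and a candidate from a new parse in another, `same_tree(stored, candidate)` gives a quick yes/no, and diffing the two signatures shows the first node where they diverge.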

That’s right. This use case is for situations where I cannot arrive at a gold tree at all with the current grammar, so what I need is to understand exactly what the gold tree used to look like. Sorry, the specific example above was confusing in that sense.

The question then is: how can I use the original tsdb files (not the reparsed ones) to find the one gold tree that was previously selected from the previous results? Then I can look at that tree and try to understand why the new grammar doesn’t give it to me. So, at that point, I am using neither FFTB nor the updated forest. I thought I could rely on the [incr tsdb()] Export tool, but, given what I see above, I am no longer sure that I can (or that I used it correctly).

So, to rephrase: in the old profile, which has not been reparsed, does the preference file tell me that I should look at result #0 in the result file for the previously selected gold tree?

31@1@0
preference:
   parse-id :integer :key
   t-version :integer
   result-id :integer

If your preference file contains any lines where the last field is something other than 0, then yes, that third number should identify the parse. I’m pretty sure Montse treebanked using the classic tool, not FFTB, so the preference file should be informative for you.
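That lookup can be sketched in Python as below. This is only an illustration under a few assumptions: fields in tsdb files are separated by `@`; the preference fields are in the order given in the relations file quoted above; and the derivation is the eleventh field of a result row, which I counted from the sample result line in this thread. Note that the key is parse-id, which as far as I know is linked to the item id through the parse relation, so it is worth checking that the two coincide in your profile. All function names here are hypothetical.

```python
def parse_rows(lines):
    """Split non-empty tsdb lines into lists of '@'-separated fields."""
    return [line.rstrip("\n").split("@") for line in lines if line.strip()]


def gold_result_ids(preference_lines):
    """Map each parse-id to the result-ids recorded in the preference relation.

    Assumed field order: parse-id, t-version, result-id.
    """
    gold = {}
    for row in parse_rows(preference_lines):
        if len(row) >= 3:
            gold.setdefault(row[0], []).append(row[2])
    return gold


def ranks_look_informative(gold):
    """True if at least one recorded result-id is something other than 0."""
    return any(rid != "0" for rids in gold.values() for rid in rids)


def find_result(result_lines, parse_id, result_id):
    """Return the result row whose first two fields match, or None."""
    for row in parse_rows(result_lines):
        if len(row) >= 2 and row[0] == str(parse_id) and row[1] == str(result_id):
            return row
    return None


# Assumption: index of the derivation field, counted from the sample line.
DERIVATION_FIELD = 10
```

Both functions accept any iterable of lines, so an open file object for the preference or result file can be passed directly (decompress first if the files are gzipped). If `ranks_look_informative` returns False for the whole profile, the third field probably cannot be trusted as a rank.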

As for viewing those gold trees, you should be able to do so by loading the old grammar into the LKB plus [incr tsdb()] and then using the [incr tsdb()] Browse–Results on the relevant profile to get a list of gold items. Double-click on the red integer (presumably 1) in ‘derivation’ for an item, and then double-click on the derivation line that pops up, to get a recreated parse tree for that item. To see why that item does not successfully parse with your new grammar, you can load that grammar and then repeat the steps above. Instead of bringing up a parse tree window, the LKB should report why the attempted recreation of the tree failed.

Thanks, @Dan! Part of the problem is that I cannot work with the new grammar in the LKB, nor can I load the old grammar with ACE, so I have to go back and forth between ACE and the LKB. But this helps! I think many of the items are not actually treebanked (though some are), and that is probably what confused me and led me to think I couldn’t easily get access to the gold tree there…