Different exact MRS match results with pydelphin Testsuite process and Parser interact

I have two functions that use pydelphin to parse items in a tsdb profile. The goal is then to compare the resulting MRSs with the gold (from the ERG tsdb/gold).

I am observing different exact-match counts from the two functions on the same profile. Could someone help me spot a bug or offer a possible explanation?

Method 1, with Testsuite process:

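# Imports assumed by all snippets in this post (PyDelphin v1.x)
from delphin import ace, itsdb, mrs
from delphin.codecs import simplemrs

def run_ace_on_ts(tsuite, grammar, ace_exec, cmdargs, output_path):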
    ts = itsdb.TestSuite(tsuite)
    with open(output_path + '/ace_err.txt', 'w') as errf:
        with ace.ACEParser(grammar, cmdargs=cmdargs, executable=ace_exec, stderr=errf) as parser:
            ts.process(parser)
    id2mrs = {}
    for i,res in enumerate(ts['result']):
        id = ts['item'][i]['i-id']
        id2mrs[id] = simplemrs.decode(res['mrs'])
    return id2mrs, len(ts['item'])

This gives me, for a particular experiment:

Parsed 11/25 sentences
2 same, 23 different, 0.08% exact match, 7.843531713485718 sec/sen

Method 2, with Parser interact, for each item separately:

def run_ace(tsuite, grammar, ace_exec, cmdargs, output_path, id2gold_mrs):
    ts = itsdb.TestSuite(tsuite)
    id2mrs = {}
    items = list(ts['item'])
    responses = []
    no_result = []
    with open(output_path + '/ace_err.txt', 'w') as errf:
        with ace.ACEParser(grammar, cmdargs=cmdargs, executable=ace_exec, stderr=errf) as parser:
            for item in items:
                response = parser.interact(item['i-input'])
                if len(response['results']) == 0:
                    no_result.append(item['i-input'])
                    print('*** No parse. ***')
                else:
                    responses.append(response)
                    id = item['i-id']
                    id2mrs[id] = simplemrs.decode(response['results'][0]['mrs'])
                    if id in id2gold_mrs:
                        if not mrs.is_isomorphic(id2gold_mrs[id], id2mrs[id]):
                            print('*** Different MRS ***')
                        else:
                            print('*** Same MRS ***')
    print("Parsed {}/{} sentences".format(len(responses), len(items)))
    return id2mrs, len(items)

This way, I get not 2 but 5 exact matches, somehow (note the warning).

Parsed 11/25 sentences
/home/olga/delphin/parsing_with_supertagging/venv/lib/python3.8/site-packages/delphin/dmrs/_operations.py:81: DMRSWarning: unusable TOP: h0
  warnings.warn(f'unusable TOP: {top_var}', dmrs.DMRSWarning)
5 same, 20 different, 0.2% exact match, 7.885234594345093 sec/sen

The gold MRS are the same for sure in both cases (they come from the same code and the same location).

Here’s what I do to count matches:

def compare_results(gold, experimental):
    same = []
    not_same = []
    new = []
    in_both = set(gold.keys()).intersection(set(experimental.keys()))
    only_in_gold = set(gold.keys()).difference(set(experimental.keys()))
    only_in_experimental = set(experimental.keys()).difference(set(gold.keys()))
    for id in in_both:
        if mrs.is_isomorphic(gold[id], experimental[id]):
            same.append(id)
        else:
            not_same.append(id)
    for id in only_in_gold:
        not_same.append(id)
    for id in only_in_experimental:
        new.append(id)
    return same, not_same, new

In method 1, the loop that pairs each row of ts['result'] with ts['item'][i] by position looks incorrect.

The result file only has rows for successful parses. Once an item fails to parse, any successfully parsed items after that will have the wrong i-id. Furthermore, if you’re storing more than one result per item, the i-id can become misaligned in the other direction. I suggest you iterate over items using TestSuite.processed_items():

    for response in ts.processed_items():
        result = response.result(0)  # only the first result
        id2mrs[response["i-id"]] = result.mrs()
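
To make the misalignment concrete, here is a minimal sketch in plain Python with no PyDelphin involved. The rows are made-up stand-ins for the item, parse, and result tables (hypothetical data; the field names i-id, parse-id, result-id, and mrs follow the usual [incr tsdb()] schema), and the two helper functions are illustrative names only, not part of any library. It also assumes that result-id 0 marks the first result of a parse.

```python
# Hypothetical stand-ins for profile tables; real rows come from the TestSuite.
items = [
    {"i-id": 10, "i-input": "sentence A"},
    {"i-id": 20, "i-input": "sentence B"},  # pretend this one fails to parse
    {"i-id": 30, "i-input": "sentence C"},
]
parses = [
    {"parse-id": 1, "i-id": 10},
    {"parse-id": 2, "i-id": 20},
    {"parse-id": 3, "i-id": 30},
]
# Only successful parses contribute result rows, so item 20 has none.
results = [
    {"parse-id": 1, "result-id": 0, "mrs": "MRS for A"},
    {"parse-id": 3, "result-id": 0, "mrs": "MRS for C"},
]


def pair_by_position(items, results):
    """Buggy pairing: assumes the i-th result belongs to the i-th item."""
    return {items[i]["i-id"]: res["mrs"] for i, res in enumerate(results)}


def pair_by_key(parses, results):
    """Keyed pairing: follow parse-id from each result back to its i-id."""
    parse_to_item = {p["parse-id"]: p["i-id"] for p in parses}
    paired = {}
    for res in results:
        # Assumption: result-id 0 is the first result for a parse.
        if res["result-id"] == 0:
            paired[parse_to_item[res["parse-id"]]] = res["mrs"]
    return paired


by_position = pair_by_position(items, results)
by_key = pair_by_key(parses, results)
print(by_position)  # item 20 wrongly receives the MRS for sentence C
print(by_key)       # item 30 correctly receives the MRS for sentence C
```

With positional pairing, the MRS for sentence C is filed under i-id 20 and is then compared against the wrong gold MRS, which is enough to change the exact-match count.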

Method 2 seems ok, even if it misses out on some of PyDelphin’s conveniences. Unlike what I said in the other thread, I don’t think you necessarily need to use TestSuite.process_item() here. You are only recording the i-id, which could be the same for many results, but then you are only looking at the first result, so it works out.

I would be very surprised if that is the warning message you get for the code above. This is a warning you get when converting from MRS to DMRS, but I don’t see you doing any conversion.

Thanks, @goodmami. Indeed, iterating with ts.processed_items() after calling ts.process() fixes the problem.

(The conversion warning is an editing artifact; I was doing conversion but removed it from the code I pasted here…)
