'Peg' object has no attribute 'format': parsing with ACE via pyDelphin


#1

I am trying to process a testsuite with ACE, via pydelphin.

I created a toy testsuite with mkprof:

```
delphin.sh mkprof --input toy.txt --relation Relations output/tsdb/
```

I took the Relations file from logon/lingo/lkb/src/tsdb/skeletons/english/. (I don’t know if that is right.)
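Since `mkprof` needs the Relations file, it may help to see what that file holds: the schema of the profile, listing each relation (one file per relation) and its fields. Below is a minimal sketch of a reader for an abridged, made-up excerpt in that style. It assumes a relation header is an unindented `name:` line, each field is an indented `name :type` line optionally flagged `:key`, and `#` starts a comment. `read_relations` and `SAMPLE` are hypothetical, not PyDelphin's own parser.

```python
from typing import Dict, List, Tuple

# Hypothetical, abridged excerpt written in the style of a Relations file.
SAMPLE = """\
item:
  i-id :integer :key        # unique item identifier
  i-input :string           # the sentence itself
  i-wf :integer             # well-formedness judgment

parse:
  parse-id :integer :key
  i-id :integer :key
  readings :integer
"""

Field = Tuple[str, str, bool]  # (field name, datatype, is part of the key)


def read_relations(text: str) -> Dict[str, List[Field]]:
    """Map each relation name to its fields, in file order."""
    schema: Dict[str, List[Field]] = {}
    current = None
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].rstrip()  # assumption: '#' starts a comment
        if not line.strip():
            continue  # skip blank (or comment-only) lines
        if not line[0].isspace() and line.endswith(":"):
            current = line[:-1]  # unindented header such as "item:"
            schema[current] = []
        elif current is not None:
            name, *flags = line.split()  # e.g. ["i-id", ":integer", ":key"]
            datatype = flags[0].lstrip(":") if flags else "string"
            schema[current].append((name, datatype, ":key" in flags))
    return schema


schema = read_relations(SAMPLE)
print([name for name, _, _ in schema["item"]])  # ['i-id', 'i-input', 'i-wf']
```

The real skeleton Relations files define many more relations and fields; this only shows the shape of the information `mkprof` reads.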

Then:

```python
from delphin import itsdb

# open the profile created by mkprof
ts = itsdb.TestSuite('output/tsdb/')
```

Then I do:

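```python
from delphin.interfaces import ace

# start ACE with the compiled grammar and parse every item in the profile
with ace.AceParser('grammar.dat', executable='ace-0.9.30/ace') as cpu:
    ts.process(cpu)
```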

Something is wrong, however (probably with the profile?), as I get the following:

Upon inspecting:

Is it obvious to anyone where I am making the wrong step?


#2

I believe I was not importing the PyDelphin module correctly, and Python found something else called SExpr :). This particular issue seems to be resolved :).


#3

Yes, it sounds like you have more than one version of PyDelphin installed. The format() method was added to the S-expression parser delphin.util.SExpr in v0.7.1, and that version also stopped using the Peg parsing functionality and introduced the TestSuite.process() method. I would run pip uninstall pydelphin (maybe also with pip3) and make sure you’re using a virtualenv to manage your project’s dependencies. Hopefully that will keep things clear and reproducible.
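When more than one copy of a package may be on the path, a quick way to see which copy Python actually picks up is to ask the import system directly. A minimal sketch using only the standard library (Python 3.8+ for `importlib.metadata`); the helper names are hypothetical, and `pydelphin` is assumed to be the distribution name pip knows the package by.

```python
import importlib.util
from importlib import metadata
from typing import List, Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Version pip recorded for a distribution, or None if none is installed."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def import_locations(module_name: str) -> List[str]:
    """Where Python would load a top-level module from (empty if not importable)."""
    spec = importlib.util.find_spec(module_name)
    if spec is None:
        return []
    if spec.submodule_search_locations is not None:
        # packages (including namespace packages) may span several directories
        return list(spec.submodule_search_locations)
    return [spec.origin] if spec.origin else []


print("pydelphin version:", installed_version("pydelphin"))
print("delphin loaded from:", import_locations("delphin"))
```

If the printed location is not inside your virtualenv's `site-packages`, or a local file or directory is shadowing the package (as with the stray SExpr above), that is the copy to remove or rename.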


#4

Hi Olga and Michael,

I am also collecting a corpus of ~10k sentences to run the ERG over, in order to identify its coverage and the reasons for missing analyses. I am curious to understand what you mean by a testsuite, and what the benefits would be of using one instead of manually running the grammar over a text file with one sentence per line and inspecting the results in the output file.


#5

Hi Alexandre,

In this context, a testsuite means a list of sentences, grammatical or ungrammatical, together with a set of files in a particular format and a particular directory structure, which make it possible to store and explore the results of parsing in [incr tsdb()].
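To make "files in a particular format" concrete: as far as I understand the [incr tsdb()] format, each relation is a plain-text file with one record per line and fields joined by `@`, with backslash, newline, and `@` escaped as `\\`, `\n`, and `\s`. The minimal sketch below turns one-sentence-per-line input into such records. The two-field schema, the numbering from 1, and the helper names are hypothetical; the real `item` relation has many more fields, and this is not what `mkprof` does internally.

```python
from typing import Dict, List, Sequence


def escape(value: str) -> str:
    """Escape a field value: backslash first, then newline and '@'."""
    return value.replace("\\", "\\\\").replace("\n", "\\n").replace("@", "\\s")


def encode_row(fields: Sequence[str], row: Dict[str, object]) -> str:
    """Join one record's values in schema order with '@'; missing values are empty."""
    return "@".join(escape(str(row.get(name, ""))) for name in fields)


def items_from_lines(lines: List[str]) -> List[Dict[str, object]]:
    """Hypothetical: number the non-blank lines as items, one sentence per line."""
    rows: List[Dict[str, object]] = []
    for line in lines:
        sentence = line.strip()
        if sentence:
            rows.append({"i-id": len(rows) + 1, "i-input": sentence})
    return rows


fields = ["i-id", "i-input"]  # hypothetical subset of the item schema
for row in items_from_lines(["Kim sleeps.", "", "Email me @ home."]):
    print(encode_row(fields, row))
# 1@Kim sleeps.
# 2@Email me \s home.
```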

I really like Michael’s docs on the matter.

In short, the benefits of using [incr tsdb()] include comparing two testsuites and, more generally, a number of convenient ways of examining the results, which is much better than doing that manually. Everything is stored, so you can look at it again the next day and pick up where you left off. There is also a treebanking tool.

The downside is that the software can be a bit challenging to use sometimes. But it is possible :). Michael has some utilities in PyDelphin that let you compare testsuites as well.
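As an illustration of what comparing two testsuites buys you, here is a minimal sketch that classifies items by how their reading counts changed between two runs (say, before and after a grammar change). The input dicts mapping `i-id` to reading counts are hypothetical, and `compare_coverage` is an illustrative helper, not the PyDelphin comparison utility mentioned above.

```python
from typing import Dict, List


def compare_coverage(old: Dict[int, int], new: Dict[int, int]) -> Dict[str, List[int]]:
    """Classify item ids by how their reading counts changed between two runs."""
    report: Dict[str, List[int]] = {"gained": [], "lost": [], "changed": []}
    for i_id in sorted(set(old) | set(new)):
        before, after = old.get(i_id, 0), new.get(i_id, 0)  # absent means no parse
        if before == 0 and after > 0:
            report["gained"].append(i_id)
        elif before > 0 and after == 0:
            report["lost"].append(i_id)
        elif before != after:
            report["changed"].append(i_id)
    return report


old_run = {1: 2, 2: 0, 3: 5}  # hypothetical readings per item, first run
new_run = {1: 2, 2: 1, 3: 3, 4: 0}  # hypothetical readings per item, second run
print(compare_coverage(old_run, new_run))
# {'gained': [2], 'lost': [], 'changed': [3]}
```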


Testsuites and grammar profiling tools
#6

Thanks, Olga. I want to add that an input item can yield more than one output, or zero outputs, and in a flat text file it is easy to lose track of which outputs correspond to which inputs. The test suites help manage that, as well as store performance-related information, such as the time and memory required for a parse, estimates of the size of the search space for a parse, reasons for parse failures, reranker scores for multiple analyses, etc. These additional details can be quite useful, depending on the task, especially (but not solely) if the task is grammar development and evaluation (hence the terminology of “test” and “profile”).
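To make the point about zero or multiple outputs concrete, here is a minimal sketch of how the relations keep outputs tied to inputs. It assumes, as in the standard schema as I understand it, that a `parse` row points to its item via `i-id` and each `result` row points to its parse via `parse-id`. The rows are made-up plain dicts, not read through PyDelphin, and `outputs_per_item` is a hypothetical helper.

```python
from typing import Dict, List

# Hypothetical rows from the item, parse, and result relations.
items = [{"i-id": 1, "i-input": "Kim sleeps."},
         {"i-id": 2, "i-input": "Sleeps Kim the."},
         {"i-id": 3, "i-input": "I saw her duck."}]
parses = [{"parse-id": 1, "i-id": 1}, {"parse-id": 2, "i-id": 2}, {"parse-id": 3, "i-id": 3}]
results = [{"parse-id": 1, "result-id": 0},
           {"parse-id": 3, "result-id": 0},
           {"parse-id": 3, "result-id": 1}]


def outputs_per_item(items: List[dict], parses: List[dict], results: List[dict]) -> Dict[int, List[int]]:
    """Map every i-id to the result ids produced for it (possibly none)."""
    parse_to_item = {p["parse-id"]: p["i-id"] for p in parses}
    outputs: Dict[int, List[int]] = {item["i-id"]: [] for item in items}  # zero outputs stay visible
    for r in results:
        outputs[parse_to_item[r["parse-id"]]].append(r["result-id"])
    return outputs


print(outputs_per_item(items, parses, results))
# {1: [0], 2: [], 3: [0, 1]}
```

Because every item starts with an empty list, an unparsed sentence shows up explicitly rather than silently disappearing, which is exactly what is hard to see in a flat output file.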


#7

Thank you both. I am reading the docs. Many things to learn! My task is mainly grammar evaluation. Given a set of sentences, I want to determine why a sentence was not parsed, and which sentences get more readings and why. I suspect some ambiguities are caused by compound terms not being recognized as such.
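For an evaluation task like this one, a minimal sketch of two first summaries: coverage with the list of unparsed items, and the most ambiguous items. It assumes you already have a hypothetical mapping from `i-id` to a non-negative number of readings (in a profile, that count would come from the `readings` field of the `parse` relation). `summarize` is an illustrative helper, not part of PyDelphin or [incr tsdb()], and the counts below are made up.

```python
from typing import Dict, List, Tuple


def summarize(readings: Dict[int, int], top: int = 3) -> Tuple[float, List[int], List[Tuple[int, int]]]:
    """Return (coverage, unparsed item ids, most ambiguous (i-id, readings) pairs)."""
    if not readings:
        return 0.0, [], []
    unparsed = sorted(i for i, n in readings.items() if n == 0)
    coverage = 1 - len(unparsed) / len(readings)
    # most readings first; ties broken by item id for a stable order
    ranked = sorted(((i, n) for i, n in readings.items() if n > 0), key=lambda p: (-p[1], p[0]))
    return coverage, unparsed, ranked[:top]


counts = {1: 1, 2: 0, 3: 12, 4: 4, 5: 0}  # hypothetical readings per item
cov, missing, ambiguous = summarize(counts)
print(f"coverage {cov:.0%}; unparsed {missing}; most ambiguous {ambiguous}")
# coverage 60%; unparsed [2, 5]; most ambiguous [(3, 12), (4, 4), (1, 1)]
```

The unparsed ids are the ones to inspect for failure reasons, and the top of the ambiguity ranking is a natural place to check the compound hypothesis.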

BTW, sorry for changing the focus of the thread!


#8

@arademaker I created a new topic for this discussion. Let’s continue there.