Searching ERG treebanks


#1

Dear all,

In the context of a project with @kphowell, I’d like to be able to dig through the treebanks to find examples of particular verbs in particular lexical types. For example, the ERG has an entry of type v_np*_le for acknowledge, which seems surprising to me, but I’m sure it’s there for a reason. Do we have any interfaces that facilitate this currently? Is our best bet just to grep the thinned profiles?

Thanks,
Emily


#2

I believe the derivations in the treebanks use the UDF format, which does not include lexical types. What you want is the UDX format. I’m not sure if there’s an easy way for ACE or the LKB to read in a UDF derivation, reconstruct the full analysis (assuming this step is required), and output the richer UDX format.

Perhaps easier is to compile a list of lexical entries with the type in question, e.g., by grepping lexicon.tdl for the type you are interested in (or scripting it with PyDelphin), and then to grep the treebanks for these lexical entries rather than the lexical types.

edit: also note that zgrep will work better than grep because the result files in the treebanks are gzipped
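
A rough sketch of that two-step approach, in case it helps. It assumes each entry in lexicon.tdl begins with a line of the form `acknowledge_v1 := v_np*_le &` and that each profile keeps its derivations in a gzipped file named `result.gz`. The function names `entries_of_type` and `profiles_with_entries` are made up for illustration.

```shell
# Print the IDs of lexical entries defined directly with the given type.
# Assumption: the definition starts on one line as "<entry-id> := <type> &";
# entries written differently (e.g. "<entry-id> := <type>.") are missed.
entries_of_type() {
  # $1: lexical type, compared as a literal string (the * is not a wildcard)
  # $2: path to lexicon.tdl
  awk -v t="$1" '$2 == ":=" && $3 == t { print $1 }' "$2"
}

# Read entry IDs on stdin and print every result.gz below a directory
# that mentions at least one of them as a whole word.
profiles_with_entries() {
  # $1: directory containing the profiles
  ids=$(mktemp)
  cat > "$ids"
  find "$1" -name 'result.gz' | while IFS= read -r f; do
    # -F: fixed strings, -w: whole words, so acknowledge_v1 != acknowledge_v10
    if gzip -dc "$f" | grep -q -w -F -f "$ids"; then
      printf '%s\n' "$f"
    fi
  done
  rm -f "$ids"
}

# Example (paths are placeholders):
#   entries_of_type 'v_np*_le' lexicon.tdl | profiles_with_entries tsdb/gold
```

This only tells you which profiles contain a hit; you would still need to open the matching profile to see the sentences themselves.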


#3

Ltd will find things like this.


#4

Sorry, what is Ltd?


#5

Ltd is the Linguistic Type Database. You can access the instantiation for the ERG here:

http://compling.hss.ntu.edu.sg/ltdb/cgi/ERG_1214/ltypes.cgi

I was hoping that might have my answer, but I don’t see how to get examples for a particular verb in a particular lex type. If I search acknowledge in the lemma field, I get a list of types from which I can click on v_np*_le, but then the examples there are random other verbs. Is there a way to drill down further and find acknowledge as v_np*_le specifically?

Thanks!


#6

I just remembered that TSQL will also work here, as long as you know the lexical entry ID (e.g., acknowledge_v1). PyDelphin supports TSQL, but the LOGON implementation is written in C and is faster. Try this:

$ "$LOGONROOT"/bin/tsdb -query 'select i-id i-input where derivation ~ "acknowledge_v1"' -home path/to/profile

You can only specify one profile at a time, though. The following loop searches all profiles under ~/grammars/erg-1214/tsdb/gold (adjust as necessary):

$ for d in $( find ~/grammars/erg-1214/tsdb/gold -name relations )
> do
>   p=$( dirname "$d" )
>   "$LOGONROOT"/bin/tsdb -home "$p" -query "
>       select i-id i-input where derivation ~ \"acknowledge_v1\"
>       report \"$( basename "$p" ) %s %s\""
> done
wsj10a 21000012 Kemper officials declined to identify the firms but acknowledged a long-simmering dispute with four securities firms and said the list of brokers it won't do business with may be lengthened in the months ahead.
ws08 10480330 Many linguists would agree that these divisions overlap considerably, and the independent significance of each of these areas is not universally acknowledged.
wsj13d 21375016 In an interview, Mr. Joseph says his dinner discussion with the Prudential executives acknowledged problems for junk.
wsj07a 20719043 A Frankfurt exchange official, acknowledging the brokers' anxieties, says the market still feels it "functioned OK during this crash."
wsj19b 21929039 In a statement that was as close as East Germany gets to practicing "glasnost", Otto Reinhold, an East German party theorist, actually acknowledged the reunification dilemma.
wsj09b 20956009 It's also the prime showcase for a country whose world dominance in the industry is increasingly acknowledged, and therein lies the draw.
wsj16c 21634062 Mr. Peters acknowledges that and says it's not unlike the situation he and Mr. Guber are in with Warner.
wsj01a 20102023 The company acknowledges some problems.
ws13 10781630 Due to this influence and for other sociohistorical reasons, a standardized form of the language ([[Standard Spanish]]) is widely acknowledged for use in literature, academic contexts and the media.
wsj12d 21280017 Apparently acknowledging weaker U.S. sales systemwide, McDonald's vowed "to use our size and muscle to do all that is necessary to build the brand."
ws06 10360270 Its issues and errors were last acknowledged by errata published in 2001.
ecpr 2417010 please acknowledge.
wsj13e 21394062 Federal Express officials acknowledge mistakes in their drive overseas but say it will pay off eventually.
rtc001 30002026 The boy acknowledged having lied.
wsj06c 20669008 That will involve a substantial increase in overseas manufacturing capacity, he acknowledged, but didn't provide specific details.
ecoc 2065208 I will not acknowledge receipt of my order 10250.
wsj14c 21449024 Bond Corp. said the acknowledged losses mean net asset backing is in the red to the tune of 53 Australian cents a share, vs. positive asset backing of A$1.92 a share a year ago.
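
For repeated searches, the loop above could be wrapped in a function along these lines. This is only a sketch: `search_profiles` and the `TSDB` override variable are names invented here, and it assumes the same `tsdb` flags and query as above.

```shell
# Run the TSQL query from above over every profile below a directory.
# TSDB is a hypothetical override for the location of the tsdb binary;
# by default the LOGON one under $LOGONROOT is used.
search_profiles() {
  # $1: lexical entry ID, e.g. acknowledge_v1
  # $2: root directory of the profiles
  find "$2" -name relations | while IFS= read -r rel; do
    p=$(dirname "$rel")
    # stdin is redirected so the command cannot consume the list from find
    "${TSDB:-$LOGONROOT/bin/tsdb}" -home "$p" -query "
        select i-id i-input where derivation ~ \"$1\"
        report \"$(basename "$p") %s %s\"" < /dev/null
  done
}

# Example:
#   search_profiles acknowledge_v1 ~/grammars/erg-1214/tsdb/gold
```

Reading the profile paths line by line, rather than word-splitting the output of find, keeps the loop working when a path contains spaces.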

#7

Thanks, @goodmami!


#8

Sorry for my short reply earlier, I was in transit.

Currently, as you noted, ltdb does not allow you to specify a lexical item. A new version is very nearly ready (I am now at Lingo North working on the final release), and it will allow you to search for a specific lexical item.


#9

Excellent news!