How should we analyse the word 'etc'?

There is a long discussion in the UD documentation repository: "inconsistent analysis of etc" (Issue #820, UniversalDependencies/docs on GitHub). I am curious about the HPSG point of view.

I found many entries in the ERG lexicon.tdl: and_etc_conj, and_etc_conj_2 … and_etc_conj_6, etc_conj, etc_conj_2 … etc_conj_6, etc_conj_nbar, etc_conj_nbar_2 … etc_conj_nbar_5.
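For anyone who wants to reproduce that list, here is a minimal sketch of pulling entry identifiers out of TDL text. It assumes only that each definition starts at the beginning of a line as `name :=`; the `SAMPLE` string and its type names are hypothetical placeholders, not actual ERG content, and `find_entries` is a helper name made up for this post.

```python
import re

# Matches the identifier that opens a TDL definition, e.g. `name := type & [...]`.
# Assumption: definitions start in column 0; continuation lines are indented.
ENTRY_RE = re.compile(r"^([A-Za-z0-9_+\-./']+)\s*:=", re.MULTILINE)


def find_entries(tdl_text, pattern):
    """Return identifiers of TDL definitions whose name matches `pattern`."""
    name_re = re.compile(pattern)
    return [name for name in ENTRY_RE.findall(tdl_text) if name_re.search(name)]


# Hypothetical stand-in for a few lexicon entries; the types are placeholders.
SAMPLE = """\
and_etc_conj := some_type_le &
 [ ORTH < "and", "etc" > ].
etc_conj := some_type_le &
 [ ORTH < "etc" > ].
etc_conj_nbar := other_type_le &
 [ ORTH < "etc" > ].
fetch_v1 := verb_type_le &
 [ ORTH < "fetch" > ].
"""

# Require `etc` as a whole underscore-delimited segment, so `fetch_v1` is skipped.
matches = find_entries(SAMPLE, r"(^|_)etc(_|$)")
print(matches)
```

To run it against the real lexicon, read the file into a string first and pass that to `find_entries`; the path to lexicon.tdl depends on your ERG checkout.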

Maybe the big difference with a corpus-based approach like UD is precisely that its developers want to converge on a single analysis instead of embracing ambiguity and vagueness?

I’m definitely no expert here, but after reading the UD thread, I feel like these are their open questions:

  • one word or two words
  • what POS tag
  • what dependency label

I generally agree with Manning’s points in the thread. I think UD’s goals are a bit different from DELPH-IN’s, so, for example, the possible conclusion of calling it a noun would seem particularly odd in an MRS.

With regard to the three points, I think the DELPH-IN pipelines are much more eager to split words, but @Dan has chosen not to do that for the ERG. I agree with Manning that it doesn’t make sense to do it synchronically. The conj POS tag in the ERG seems reasonable for the same reasons being argued in the thread. And it seems the rule preferred by the ranker is a conjunction rule, again in line with the UD folks.

To your point about one analysis vs ambiguity: I suspect it is more about fineness/precision than embracing ambiguity. As mentioned in the thread, they want to find a consistent tag basically regardless of context, which is definitely not the philosophy in DELPH-IN grammars AFAIU.