Parsing punctuation marks with SRG (using ACE)

One issue that arises when using ACE with the SRG is the handling of punctuation marks.

Currently, if I try to parse:

(1, 0, 1, <0:2>, 1, "el" "el", 0, "DA0MS0", "DA0MS0" 1) (2, 1, 2, <3:8>, 1, "perro" "perro", 0, "NCMS000", "NCMS000" 1) (3, 2, 3, <9:15>, 1, "dormir" "duerme", 0, "VMIP3S0", "VMIP3S0" 0.989241) (4, 3, 4, <16:17>, 1, "." ".", 0, "Fp", "Fp" 1)

I get:

NOTE: lexemes do not span position 3 `.'!

This looks different from the errors I get when a lexical rule isn’t found, so I am not sure whether it is an issue with the FreeLing morphological analyzer, although I think Luis @lmc thought that it was related. Does anyone have a good idea about what is going on?

The full stop “lexical rule” looks like this in the SRG:

; -- full-stop .
fp := 
%suffix (fp fp)

but I don’t think capitalization is meaningful. In fact, I am sure it is not: (1) I tried changing Fp to fp in the input and it had no effect, and (2) all tags come out of FreeLing in all caps, and the ones that match the lowercase versions in inflr.tdl usually work; for example, VMIP3S0 maps successfully to vmip3s0.

When running this sentence with LKB-FOS, the following lexical entry is picked up for the final “.” and the parse is successful:

fstop_pt := pt_-_fstop_le & 
  [ STEM < "\." > ].

Note the superfluous backslash inside the double quotes. Perhaps the backslash is confusing ACE?

This was indeed the case, thanks a lot, John.

I am wondering now whether removing that backslash might break anything. It doesn’t break anything I am doing at the moment (parsing sentences with ACE).

Removing the backslash in "\." shouldn’t break any of the DELPH-IN processing systems. I suggest you also remove the backslash in the SRG entry containing "\'". A backslash in a string is only necessary if you want the string to contain a double quote character or a backslash itself (so, for example, it’s needed in "\"" and "\\/").
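
For reference, here is a sketch of what the full-stop entry would look like after the change, assuming nothing else in the entry needs to be touched. Only the STEM string differs from the entry quoted earlier in the thread; the comments are mine and are not part of the SRG source.

```tdl
; Full-stop entry with the superfluous backslash removed.
; Previously: [ STEM < "\." > ]
fstop_pt := pt_-_fstop_le &
  [ STEM < "." > ].
```

The same edit would apply to the entry whose string is "\'": drop the backslash and leave the rest of that entry as it is.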
