ACE parser Error for Japanese

Hi,

I wanted to parse the following Japanese sentence in the ACE parser.

ええ、いいですよ。いっしょにいきましょう。

While parsing with “JACY.dat”, I get the following error.

え、いいですよ。いっしょにいきましょう。
NOTE: lexemes do not span position 0 `え、いいですよ'!
NOTE: post reduction gap
SKIP: え、いいですよ。いっしょにいきましょう。
NOTE: ignoring `え、いいですよ。いっしょにいきましょう。'

I am able to parse other Japanese sentences correctly without any error.
For example, the following sentence gives a correct parse without any error.

 彼 は 大声 で 助け を 求め た 。
NOTE: 1 readings, added 591 / 193 edges to chart (73 fully instantiated, 69 actives used, 42 passives used)    RAM: 3020k
SENT: 彼 は 大声 で 助け を 求め た 。
[ LTOP: h0
INDEX: e2 [ e TENSE: past MOOD: indicative PROG: - PERF: - ASPECT: default_aspect PASS: - SF: prop ]
RELS: < [ pron_rel<0:1> LBL: h4 ARG0: x3 [ x PERS: 3 NUM: sg GEND: m PRONTYPE: std_pron ] ]
 [ def_q_rel<-1:-1> LBL: h5 ARG0: x3 RSTR: h6 BODY: h7 ]
 [ _wa_d_rel<2:3> LBL: h4 ARG0: e8 [ e TENSE: untensed MOOD: indicative PROG: - PERF: - ASPECT: default_aspect PASS: - SF: prop ] ARG1: e9 [ e TENSE: tense MOOD: indicative PROG: - PERF: - ASPECT: default_aspect PASS: - SF: prop ] ARG2: x3 ]
 [ udef_q_rel<4:6> LBL: h10 ARG0: x11 [ x PERS: 3 ] RSTR: h12 BODY: h13 ]
 [ "_oogoe_n_1_rel"<4:6> LBL: h14 ARG0: x11 ]
 [ "_de_p_rel"<7:8> LBL: h1 ARG0: e15 [ e TENSE: untensed MOOD: indicative PROG: - PERF: - ASPECT: default_aspect PASS: - SF: prop ] ARG1: e2 ARG2: x11 ]
 [ udef_q_rel<9:11> LBL: h16 ARG0: x17 [ x PERS: 3 ] RSTR: h18 BODY: h19 ]
 [ "_tasuke_n_1_rel"<9:11> LBL: h20 ARG0: x17 ]
 [ "_motomeru_v_1_rel"<14:16> LBL: h1 ARG0: e2 ARG1: x3 ARG2: x17 ] >
HCONS: < h0 qeq h1 h6 qeq h4 h12 qeq h14 h18 qeq h20 > ]
-----------------------------

Can you please help in debugging the issue?

Thanks,
Sriram

Very generally, the error means that the parser doesn’t know what to do with the token え、いいですよ. It may be that the token is not in the lexicon, or the problem may lie in morphological processing (I don’t know to what degree that applies to Jacy). It can also be a tokenization issue (something is treated as a token when it is not), or a missing generic lexical entry for the corresponding POS tag. I don’t know much about Jacy and I don’t know Japanese, so I cannot say more.

Jacy expects input tokenized with single-byte whitespace, corresponding roughly to the output of ChaSen. That’s true in this example:

彼 は 大声 で 助け を 求め た 。

So you need to figure out where to put the whitespace in your other example. At a guess, I’d try:

ええ 、いい です よ 。
いっしょに いき まし ょう 。

Note that those are two separate sentences, so they should be parsed separately! Also, I think there needs to be whitespace separating the punctuation marks from the words.
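The sentence-splitting step (done before handing each sentence to a segmenter like ChaSen and then to ACE) can be sketched like this; the helper name is mine, not part of Jacy or ACE:

```python
# Sketch only: split raw Japanese text into sentences on the full stop
# "。" so that each sentence can be segmented and parsed separately.
def split_sentences(text: str) -> list[str]:
    # re-attach the full stop to each non-empty piece
    return [s + "。" for s in text.split("。") if s]

raw = "ええ、いいですよ。いっしょにいきましょう。"
for sentence in split_sentences(raw):
    print(sentence)
# → ええ、いいですよ。
# → いっしょにいきましょう。
```

Each resulting sentence would still need whitespace segmentation (e.g. via ChaSen) before Jacy can parse it.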
