English to Japanese Translation: config.tdl

Hi,

I am following the delph-in/JaEn GitHub repository (Japanese↔English transfer grammar for machine translation) to do English-to-Japanese machine translation using MRS, Jacy, and the ERG.

The EnJa section says:

There has not been any development on EnJa for a while, so it currently does not have an ACE config file. It may work with the LKB, but we have not tested this. We include the EnJa files so that we may update them in the future.

I need some guidance on creating this ACE config file.

Thanks,
Sriram

Hi Sriram,

I would start by copying the jaen/ace/ subdirectory to enja/ace then start modifying the paths as appropriate. It probably requires more than that to get it working (e.g., updating predicate names that have changed in the ERG and Jacy since EnJa was last working), so I’d run one or both of the following commands to attempt compiling a grammar with ACE and react to the error messages it produces:

ace -g enja/ace/config.tdl -G enja.dat
ace -g enja/ace/config-core.tdl -G enja-core.dat

Also you might look at the commit history to see what @bond and I changed ~5 years ago when we updated JaEn. I’d think that EnJa would require similar changes.

Good luck!

Hello goodmami sir,
We don’t have the enja.dat file in the jaenmaster folder, so how can we perform that operation without enja.dat?

@soumyapokale that command illustrates compiling a grammar with ACE. The -G option is the output file. The enja.dat file is created if that operation is successful.
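If you are scripting the build, that success check can be automated. The sketch below is a minimal example, not part of JaEn or ACE: `compile_grammar` is a hypothetical helper name, and it assumes the `ace` executable is on your PATH. It runs the same `ace -g CONFIG -G OUTPUT` command and treats the build as successful only if ACE exits cleanly and the output file exists.

```python
import os
import subprocess


def compile_grammar(config, output, ace="ace"):
    """Run `ace -g CONFIG -G OUTPUT` and report whether OUTPUT was written.

    Returns (succeeded, messages). `messages` holds whatever ACE printed,
    or an explanation if the ace executable could not be started.
    """
    try:
        proc = subprocess.run(
            [ace, "-g", config, "-G", output],
            capture_output=True,
            text=True,
        )
    except FileNotFoundError:
        # The executable itself was not found (e.g., ACE not installed)
        return False, f"could not run {ace!r}; is ACE on your PATH?"
    # The .dat file is only created when compilation succeeds
    succeeded = proc.returncode == 0 and os.path.exists(output)
    return succeeded, proc.stdout + proc.stderr
```

For example, `compile_grammar('enja/ace/config-core.tdl', 'enja-core.dat')` returns the messages ACE printed, which are what you would react to while fixing the copied config files. A stale .dat left over from an earlier successful build would also pass the existence check, so you may want to delete it before rebuilding.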


Hello sir, we don’t have a config.tdl file for EnJa. Do we have to use the same config.tdl for Jacy and EnJa?

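Here’s what I said above:

"I would start by copying the jaen/ace/ subdirectory to enja/ace then start modifying the paths as appropriate."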

Sorry if that wasn’t clear. The config.tdl file is contained within that subdirectory. The subdirectory (and the file) do not exist for EnJa, so you need to create it. I’m suggesting that you copy the JaEn one instead of creating one from scratch. After copying it you’ll need to modify paths and things to get it to work.

Hello sir,
I have a question: do we have to create new versions of files such as mwe.selected.mtr, lex-auto-jaen.mwe.mrs-tab.mtr, and lex-auto-jaen.mwe.phr-tab.mtr?

Those are automatically generated transfer rule files created from, e.g., bilingual dictionaries or SMT-like transfer rule extraction. The config-core.tdl of JaEn is an alternative to config.tdl that only includes the hand-built parts. Unless you have code to automatically produce EnJa transfer rules, you might start with the “core” grammar.

Thanks sir, I'm getting another error while running this:

from delphin import ace
import cutlet
grm = '/home/soumya/majorprojectjapen/jacy/ace/jacy.dat'
conf = '/home/soumya/majorprojectjapen/jacy/ace/config.tdl'

jgrm = '/home/soumya/grammars/jaen.dat'
egrm = '/home/soumya/grammars/erg.dat'
enja = '/home/soumya/grammars/enjacore.dat'

j_response = ace.parse('{}'.format(egrm), 'Abrams barked. ')
je_response = ace.transfer('{}'.format(enja), j_response.result(0)['mrs'])
e_response = ace.generate('{}'.format(grm), je_response.result(0)['mrs'])

print(e_response.result(0)['surface'])

I'm getting the following error:

soumya@soumya-VirtualBox:~/FlaskApp$ /bin/python3 /home/soumya/grammars/mtsoumya.py
NOTE: parsed 1 / 1 sentences, avg 997k, time 0.02101s
NOTE: 1 transfer results        RAM: 20k
NOTE: transfered 1 / 1 sentences, avg 20k, time 0.00543s
NOTE: EP '"en:proper_q"' is unknown in the semantic index
NOTE: EP '"en:named"' is unknown in the semantic index
NOTE: EP '"en:_bark_v_1"' is unknown in the semantic index
WARNING: EP '"en:proper_q"' is not covered
WARNING: EP '"en:named"' is not covered
WARNING: EP '"en:_bark_v_1"' is not covered
NOTE: 45 passive, 106 active edges in final generation chart; built 46 passives total. [0 results]
NOTE: generated 0 / 1 sentences, avg 1070k, time 0.01433s
NOTE: transfer did 20 successful unifies and 0 failed ones
Traceback (most recent call last):
  File "/home/soumya/grammars/mtsoumya.py", line 14, in <module>
    print(e_response.result(0)['surface'])
  File "/home/soumya/.local/lib/python3.10/site-packages/delphin/interface.py", line 222, in result
    return self._result_cls(self.get('results', [])[i])
IndexError: list index out of range

Can you guide us, goodmami sir?

Some things I noticed:

Shouldn’t you be using an enja.dat that you’ve compiled?

I would rename those to e_response, ej_response, and j_response if you’re doing English-to-Japanese translation. Otherwise they are confusingly named. Also it seems like you’re getting the first parse and first transfer results (...result(0)), which assumes that there was a valid parse and a valid transfer. This is fine for testing, but you may want to put some guards around these calls to check for the lack of any results.
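A minimal sketch of such a guard, assuming PyDelphin's response objects are dict-like (the `self.get('results', [])` line in the traceback suggests they are). `first_result` is a hypothetical helper name; instead of an opaque IndexError, it fails with a message naming the step that produced nothing.

```python
def first_result(response, step):
    """Return the first result of an ACE response or raise a descriptive error.

    `response` is assumed to be dict-like with a 'results' list, as with the
    responses returned by PyDelphin's ace.parse, ace.transfer and ace.generate.
    """
    results = response.get("results", [])
    if not results:
        # Name the failing step (parse, transfer, generation) in the error
        raise RuntimeError(f"{step} produced no results")
    # A plain dict; indexing it with 'mrs' or 'surface' works as with .result(0)
    return results[0]
```

With the renaming above, the last step might become `j_response = first_result(ace.generate(grm, ej_mrs), 'generation')`, so an empty generation reports "generation produced no results" rather than crashing on `['surface']`.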

Finally, the error messages like NOTE: EP '"en:_bark_v_1"' is unknown in the semantic index are saying that predicates like en:_bark_v_1 are not in the Japanese grammar. It seems like transfer failed to map this to, e.g., ja:_hoeru_v_1, but also there should be a step that removes the ja: prefix, which is only used in transfer. @bond might be able to assist here.
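To see which predicates transfer left untouched, it can help to list the predicates in the transfer output that still carry the en: prefix. This is a diagnostic sketch, not part of JaEn: `untransferred` is a hypothetical helper that only inspects predicate strings. Because predicates can appear quoted (as in '"en:_bark_v_1"' above), it strips surrounding double quotes before checking.

```python
def untransferred(predicates, prefix="en:"):
    """Return the predicates that still start with the source-language prefix.

    Surrounding double quotes (e.g. '"en:named"') are stripped before the
    check, and the input order is preserved.
    """
    found = []
    for pred in predicates:
        bare = pred.strip('"')
        if bare.startswith(prefix):
            found.append(bare)
    return found
```

Assuming PyDelphin 1.x, the predicate strings could be taken from the transfer result's MRS, e.g. `untransferred(ep.predicate for ep in ej_response.result(0).mrs().rels)`. For this run it should list the same predicates that the NOTE lines report as unknown, which are the ones a transfer rule still needs to map.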

Thanks @goodmami for the help.
Can you please clarify whether the ‘config.tdl’ file you mentioned is the one in the copied ‘jaen/ace’ directory or one from an old ‘enja/ace’ directory that was working earlier?
In the latter case, where can we get that earlier ‘config.tdl’ file?

Thanks in advance,
Sriram

Copied from jaen/ace/, as mentioned above. I don’t think anyone has ever used ACE for EnJa. When EnJa was developed it was used with the LKB. I do not know how to set up the LKB for transfer.