Installing extended JaEn

While trying to use the extended grammar, I ran:

python2  /home/soumya/jaenproject/JaEn-master/utils/select-rule.py  . /home/soumya/jaenproject/JaEn-master/jacy/tsdb/skeletons/tanaka/tc-000/item

I am getting import errors:
ImportError: No module named MeCab
ModuleNotFoundError: No module named 'delphin.mrs.components'

I get these errors even though I downloaded the latest versions of MeCab and PyDelphin.
Also, when I try to install MeCab following the README, I get an error:

(jaenproject) soumya@soumya-VirtualBox:~$ sudo apt-get install mecab-ipadic-utf8 python-mecab
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
E: Unable to locate package python-mecab

Can you please guide me on how to run it?

That script is out of date. PyDelphin no longer works with Python 2. Also, it looks like you have activated a virtual environment (jaenproject), but sudo apt-get install python-mecab installs into the system environment, not into the virtual environment. Instead, try a regular pip install of the relevant packages.

As for the script being out of date, you’ll need to either update the script to work with the modern tooling (preferred), or install older versions of the packages.
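
A quick way to confirm which environment a script actually runs in is to ask the interpreter itself. The following is a minimal sketch using only the standard library. The function name environment_report is made up for illustration, and the virtual-environment check only covers venv/virtualenv-style environments (a conda environment would be reported as False). As far as I know the MeCab bindings are published on PyPI as mecab-python3, but treat that package name as an assumption and check it before installing.

```python
import importlib.util
import sys


def environment_report(modules=("MeCab", "delphin")):
    """Describe the running interpreter and which top-level modules it can import.

    `environment_report` is a hypothetical helper, not part of JaEn or PyDelphin.
    """
    return {
        # Version of the interpreter executing this code
        "python": sys.version.split()[0],
        # True inside a venv/virtualenv environment; conda environments are not detected
        "in_virtualenv": sys.prefix != sys.base_prefix,
        # find_spec() returns None when a top-level module cannot be located
        "importable": {
            name: importlib.util.find_spec(name) is not None for name in modules
        },
    }


if __name__ == "__main__":
    print(environment_report())
```

Running this with the same python3 command used for the script shows whether pip installed the packages somewhere that interpreter can see.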

What can I use instead of from mrs.components import Pred?

It looks like the Pred class was only used to extract the lemma from a predicate string. You can use delphin.predicate.split() and get the first item in the tuple it returns to get the same thing.
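
As a rough sketch of that replacement (the helper name lemma_of is hypothetical, and the import sits inside the function only so that the snippet loads even where PyDelphin is not installed):

```python
def lemma_of(pred_string):
    """Return the lemma part of a predicate string such as '_zuzou_n_1_rel'."""
    # Imported lazily so that defining this helper does not itself require PyDelphin.
    from delphin import predicate

    # split() returns a tuple whose first item is the lemma.
    return predicate.split(str(pred_string))[0]
```

This would stand in for the lemma lookup that the old script did through the Pred class, assuming the value passed in is a plain predicate string at that point.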

Hello sir,

I am getting another error while running the Python file:

soumya@soumya-VirtualBox:~/grammars$ python3 /home/soumya/grammars/utils/select-rule3.py . /home/soumya/grammars/jacy/tsdb/skeletons/tanaka/tc-000/item
./jacy/lexicon.tdl
<_io.TextIOWrapper name='<stderr>' mode='w' encoding='utf-8'> Reading Jacy lexicon
Traceback (most recent call last):
  File "/home/soumya/grammars/utils/select-rule3.py", line 119, in <module>
    for entry in tdl._parse(open(pjoin(jacydir, 'lexicon.tdl'))):
NameError: name 'tdl' is not defined

Here is the script I am running (select-rule.py):

#-*- coding: utf-8 -*-

# Script for selecting transfer rules relevant to batch files
# from the automatically derived transfer rules. The script reads the
# file(s) given as argument(s) and selects the transfer rules that could
# apply to the text in the batch file(s). To run the script, give the
# following command:
#
# $ python select-rule.py [OPTION...] WORKDIR ITEMFILE [ITEMFILE...]
#
# $ python select-rule.py ../ ~/logon/dfki/jacy/tsdb/skeletons/tanaka/tc-000/item ~/logon/dfki/jacy/tsdb/skeletons/tanaka/tc-001/item ~/logon/dfki/jacy/tsdb/skeletons/tanaka/tc-002/item
#
# WORKDIR is the working directory where the output files will be
# written to, and also where the default directories for data and Jacy
# are read from. You can create custom data sets by making a new working
# directory and copying or symlinking the relevant data to it.
#
# To install MeCab, try
#
# sudo apt-get install python-yaml
# sudo apt-get install mecab-ipadic-utf8 python-mecab
#
# pyDelphin must be installed or importable (see https://github.com/delph-in/pydelphin); try
#
# pip install pydelphin

import sys
import os; pjoin = os.path.join
from glob import glob
import re
import argparse

#import MeCab
#from delphin import tdl
#from delphin import predicate
from collections import deque  # remove when pyDelphin issue #81 is resolved


# various constants (unlikely to change often)
JACY_ORTH_FEAT = 'STEM'
JACY_PRED_FEAT = 'SYNSEM.LKEYS.KEYREL.PRED'
ERG_ORTH_FEAT = 'ORTH'
ERG_PRED_FEAT = 'SYNSEM.LKEYS.KEYREL.PRED'

DEFAULT_PROB = 0.09

parser = argparse.ArgumentParser()

# parser.add_argument('logonroot', help='root path of the LOGON distribution')
parser.add_argument(
    'WORKDIR', metavar='DIR',
    help='working directory (e.g. for output files)'
)

parser.add_argument('items', nargs='+', help='[incr tsdb()] item files')
parser.add_argument('--threshold', type=float, default=0.1)
# parser.add_argument('--division', type=int, default=1)
parser.add_argument(
    '--data', metavar='DIR',
    help='data (mtr) file directory [default: $WORKDIR/data]'
)
parser.add_argument(
    '--jacy', metavar='DIR',
    help='Jacy grammar directory [default: $WORKDIR/jacy]')
# parser.add_argument(
#     '--erg', metavar='DIR',
#     help='ERG grammar directory [default: $WORKDIR/erg]'
# )

# parser.add_argument()

args = parser.parse_args()

workdir = args.WORKDIR
if not os.path.isdir(workdir):
    if os.path.isfile(workdir):
        sys.exit(
            'Working directory path is not a directory: {}'.format(workdir)
        )
    os.makedirs(workdir)

datadir = args.data if args.data else pjoin(workdir, 'data')
if not os.path.isdir(datadir):
    sys.exit('Data directory not found: {}'.format(datadir))

jacydir = args.jacy if args.jacy else pjoin(workdir, 'jacy')
if not os.path.isdir(jacydir):
    sys.exit('Jacy directory not found: {}'.format(jacydir))

# ergdir = args.erg if args.erg else pjoin(workdir, 'erg')
# if not os.path.isdir(datadir):
#     sys.exit('ERG directory not found: {}'.format(datadir))

threshold = args.threshold
# division = args.division

#mecab = MeCab.Tagger('-Owakati')

# Reading Edict

# MWG 2017-03-15 : removed as it seems unused
# unfulltrans = set([])
# for line in open(pjoin(datadir, 'edict.ja-en.txt')):
#     items = line.rstrip().split('\t')
#     ens = items[1].split()
#     if len(ens) == 2:
#         unfulltrans.add(items[0] + '\t' + ens[0])
#         unfulltrans.add(items[0] + '\t' + ens[1])


# Reading Jacy

jrel = {}
id2rel = {}
# rel2lem = {}
nv = pjoin(jacydir, 'lexicon.tdl')
print(nv)
print(sys.stderr, "Reading Jacy lexicon")
for entry in tdl._parse(open(pjoin(jacydir, 'lexicon.tdl'))):
    identifier = entry.identifier
    orths = [o.strip('"') for o in entry[JACY_ORTH_FEAT].values()]

Here I am getting the error while executing the line for entry in tdl._parse(open(pjoin(jacydir, 'lexicon.tdl'))):

Hello sir,
I am getting an error when I run:

print(sys.stderr, "Reading Jacy lexicon")
for entry in tdl.parse(open(pjoin(jacydir, 'lexicon.tdl'))):

On this line I am getting an error:

AttributeError: module 'delphin.tdl' has no attribute 'parse'. Did you mean: '_parse'

Can you guide me, sir?

I am writing select3.py as an updated version of jaen-master/utils/select-rule.py.

Did you try to uncomment the line:

#from delphin import tdl

??

Did you read the page below?

https://pydelphin.readthedocs.io/en/latest/api/delphin.tdl.html

I have never used that module, but it seems the function for parsing TDL files is iterparse. That page has a nice example of how to use it.

Yes, I did uncomment it.

I will try it.

Hello sir,

  1. Here the iterparse function is giving me three values (event, object, lineno), which are ('LineComment', ';;', 1).

  2. But I need to use it in this code:

for entry in tdl.iterparse(pjoin(jacydir, 'lexicon.tdl')):
    
    identifier = entry.identifier
    orths = [o.strip('"') for o in entry[JACY_ORTH_FEAT].values()]
    rel = entry.get(JACY_PRED_FEAT, default='')
    # rel2lem[rel] = ' '.join(orths)
    id2rel[identifier] = rel
    if rel:
        if isinstance(rel, tdl.TdlDefinition):
            rel = rel.supertypes[0]
        lemma = Pred.string_or_grammar_pred(rel).lemma
        for orth in orths:
            jrel[orth] = jrel.get(orth, []) + [lemma]


  3. My problem is that entry.identifier gives me an error here:
    identifier = entry.identifier
    AttributeError: 'tuple' object has no attribute 'identifier'

  4. The old select-rule file used the parse function, and its variables are different too.

  5. Also, it is hard to understand the meaning of the variables used in the select-rule.py file in JaEn-master.

Yes, that is by design. The documentation for delphin.tdl.iterparse() says this:

Parse the TDL file at path and iteratively yield parse events.

Parse events are (event, object, lineno) tuples, where event is a string (“TypeDefinition”, “TypeAddendum”, “LexicalRuleDefinition”, “LetterSet”, “WildCard”, “BeginEnvironment”, “EndEnvironment”, “FileInclude”, “LineComment”, or “BlockComment”), object is the interpreted TDL object, and lineno is the line number where the entity began in path.

delphin.tdl.iterparse() yields tuples, not TDL objects, unlike the old tdl.parse() function. So you need to look for the appropriate event and then get the object, as shown in the example code in the documentation. Something like this:

for event, entry, lineno in tdl.iterparse(pjoin(jacydir, 'lexicon.tdl')):
    if event != "TypeDefinition":
        continue
    identifier = entry.identifier
    ...

There are more parts of the code using old features of PyDelphin that are no longer around, so you’ll need to consult the documentation as you work.

Hello sir,
As you directed, I used the iterparse function, but now the objects we get look like this, for example:

<TypeDefinition object 'zutto-adv' at 140290645815408>
<TypeDefinition object 'zutto-nmod' at 140290645816032>
<TypeDefinition object 'zuutai_1' at 140290645816656>
<TypeDefinition object 'zuuzuushii_1' at 140290645817280>
<TypeDefinition object 'zuwaigani_1' at 140290645817904>
<TypeDefinition object 'zuzou_n1' at 140290645818528>

But in the select-rule script, we have the following code:

rel = obj.get(JACY_PRED_FEAT, default='')
where JACY_ORTH_FEAT = 'STEM'.

I am unable to find STEM in the TDL object, since the TDL object looks like <TypeDefinition object 'zuzou_n1' at 140290645818528>.

That <TypeDefinition object '...' at ...> is just the representation of the in-memory object. If you want to see what the TDL looks like, use tdl.format() (or just look in the original TDL file):

>>> from delphin import tdl
>>> entries = {obj.identifier: obj for event, obj, _ in tdl.iterparse('~/delphin/jacy/lexicon.tdl') if event == 'TypeDefinition'}
>>> obj = entries['zuzou_n1']
>>> print(tdl.format(obj))
zuzou_n1 := ordinary-nohon-n-lex &
  [ STEM < "図像" >,
    SYNSEM.LKEYS.KEYREL.PRED "_zuzou_n_1_rel",
    TRAITS native_token_list ].

If you want to get the value of a feature, the TypeDefinition object no longer has a get() method. Instead, it is on the Conjunction object which you can get from obj.conjunction:

>>> obj.get('STEM')  # does not work
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'TypeDefinition' object has no attribute 'get'
>>> obj.conjunction.get('STEM')  # use this instead
<ConsList object at 140447975437248>
>>> obj.conjunction.get('SYNSEM.LKEYS.KEYREL.PRED')  # path to predicate
<String object (_zuzou_n_1_rel) at 140447740392080>

A ConsList object is just a special type of AVM, and you can use the feature names to get the values of the list:

>>> stems = obj.conjunction.get('STEM')
>>> stems.features()
[('FIRST', <String object (図像) at 140447740391968>), ('REST', None)]
>>> stems.get('FIRST')
<String object (図像) at 140447740391968>

But it may be easier to use the ConsList.values() method, which traverses the features to assemble the values as a Python list:

>>> stems.values()
[<String object (図像) at 140447740391968>]

These String objects are low-level TDL types and are actually a subclass of Python’s native str type, so you can use them like strings, but if you want or need a basic string you can cast one:

>>> str(stems.values()[0])
'図像'
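
Putting these pieces together, the lexicon-reading loop from the old script might look roughly like this. It is a sketch, not the updated select-rule.py: the function name read_jacy_lexicon is made up, the import is lazy only so the snippet loads without PyDelphin, and PRED values that are not string-like (for example a conjunction) are simply skipped rather than handled.

```python
JACY_ORTH_FEAT = 'STEM'
JACY_PRED_FEAT = 'SYNSEM.LKEYS.KEYREL.PRED'


def read_jacy_lexicon(lexicon_path):
    """Return (jrel, id2rel) built from the TypeDefinition entries of a TDL file.

    Hypothetical helper for illustration; not part of JaEn or PyDelphin.
    """
    # Imported lazily so that defining this function does not require PyDelphin.
    from delphin import predicate, tdl

    jrel = {}    # orthography -> list of predicate lemmas
    id2rel = {}  # lexical entry identifier -> predicate string ('' if none)
    for event, entry, _lineno in tdl.iterparse(lexicon_path):
        if event != 'TypeDefinition':
            continue  # skip comments and other non-entry events
        stems = entry.conjunction.get(JACY_ORTH_FEAT)
        # ConsList.values() collects the list items; str() gives plain strings
        if isinstance(stems, tdl.ConsList):
            orths = [str(o) for o in stems.values()]
        else:
            orths = []
        rel = entry.conjunction.get(JACY_PRED_FEAT)
        # Assumption: only string-like PRED values are handled in this sketch
        rel = str(rel) if isinstance(rel, str) else ''
        id2rel[entry.identifier] = rel
        if rel:
            lemma = predicate.split(rel)[0]
            for orth in orths:
                jrel.setdefault(orth, []).append(lemma)
    return jrel, id2rel
```

In the script this would be called as read_jacy_lexicon(pjoin(jacydir, 'lexicon.tdl')).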

Hello sir,
Thanks a lot, sir, it worked. I have one more issue now:

node = mecab.parseToNode(text)

I have imported MeCab, but it is still throwing an error:

soumya@soumya-VirtualBox:~/grammars$ python3  /home/soumya/grammars/utils/sample.py  . /home/soumya/grammars/jacy/tsdb/skeletons/tanaka/tc-000/item
<_io.TextIOWrapper name='<stderr>' mode='w' encoding='utf-8'> Reading Jacy lexicon
<_io.TextIOWrapper name='<stderr>' mode='w' encoding='utf-8'> Reading Jacy fullform
<_io.TextIOWrapper name='<stderr>' mode='w' encoding='utf-8'> Reading MTR files
<_io.TextIOWrapper name='<stderr>' mode='w' encoding='utf-8'> Select rules
<_io.TextIOWrapper name='<stderr>' mode='w' encoding='utf-8'> .. reading /home/soumya/grammars/jacy/tsdb/skeletons/tanaka/tc-000/item
Traceback (most recent call last):
  File "/home/soumya/grammars/utils/sample.py", line 298, in <module>
    node = mecab.parseToNode(text)
NameError: name 'mecab' is not defined. Did you mean: 'MeCab'?

Thank you!

It looks like you did not instantiate the MeCab tagger, or at least did not assign it to the name mecab. In the current version of select-rule.py, this happens at line 96:

mecab = MeCab.Tagger('-Ochasen')

This must happen before the mecab.parseToNode(text) expression executes.

Hello sir, it worked, but now I am getting this error:

soumya@soumya-VirtualBox:~/grammars$ python3  /home/soumya/grammars/utils/sample.py  . /home/soumya/grammars/jacy/tsdb/skeletons/tanaka/tc-000/item
<_io.TextIOWrapper name='<stderr>' mode='w' encoding='utf-8'> Reading Jacy lexicon
<_io.TextIOWrapper name='<stderr>' mode='w' encoding='utf-8'> Reading Jacy fullform
<_io.TextIOWrapper name='<stderr>' mode='w' encoding='utf-8'> Reading MTR files
<_io.TextIOWrapper name='<stderr>' mode='w' encoding='utf-8'> Select rules
<_io.TextIOWrapper name='<stderr>' mode='w' encoding='utf-8'> .. reading /home/soumya/grammars/jacy/tsdb/skeletons/tanaka/tc-000/item
Traceback (most recent call last):
  File "/home/soumya/grammars/utils/sample.py", line 304, in <module>
    lemma = node.feature.split(",")[6]
IndexError: list index out of range

while running the following code:

node = mecab.parseToNode(text)
# print(format(node))
while node:
    word = node.surface

    relations.add(word)
    lemma = node.feature.split(",")[6]

Thank you!

IndexError means that you tried to get a list item with an index that is out of bounds. In this case, you tried to get the item at index 6, but the feature list has fewer than 7 items. MeCab is not software I wrote, so I’m not sure why that might be the case. Sometimes printing the list (i.e., print(node.feature.split(","))) before trying to subscript it can be informative. Good luck!
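
If some nodes really do carry fewer fields, one defensive option is to check the length before subscripting. The sketch below is plain Python and does not call MeCab. lemma_from_feature is a hypothetical helper, and both the fallback to the surface form and the treatment of an empty or "*" field as missing are assumptions about what the script would want, not something the original code does.

```python
def lemma_from_feature(feature, surface, index=6):
    """Return the comma-separated field at `index`, or `surface` if it is missing.

    Hypothetical helper; the default index 6 is the position the script reads.
    """
    fields = feature.split(",")
    # Assumption: an absent, empty, or "*" field means no lemma is available
    if index < len(fields) and fields[index] not in ("", "*"):
        return fields[index]
    return surface
```

In the loop this would replace node.feature.split(",")[6] with lemma_from_feature(node.feature, node.surface).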