Make a custom lemmatizer dictionary from a txt file

1 year ago

#293040

WDarmtt

So I have the following txt file (lem_es.txt):

1 primer
1 primera
1 primeras
1 primero
1 primeros
(...)
comer coma
comer comais

The file starts with some numbers and then continues with all the verb conjugations.

I am trying to build a lemmatizer dictionary. I would like the words in the second column to be the dictionary keys, and the words/numbers in the first column to be the dictionary values.

I am having problems because the code I am using seems to keep only unique values, so the number 1 appears only once and the rest of the mapping gets mixed up.

The output I need:

{'primer': '1', 'primera': '1', 'primeras': '1', 'primero': '1', 'primeros': '1', 'coma': 'comer', 'comais': 'comer', ...}

This is the code I am using, followed by the output I get:

python

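import codecs

lemma_dict = {}  # renamed from `dict`, which shadows the built-in
with codecs.open("lem_es.txt", "r", "utf-8-sig") as filehandle:
    for line in filehandle:
        parts = line.split()
        if len(parts) >= 2:  # skip blank or malformed lines
            lemma_dict[parts[-1]] = parts[0]  # form -> lemma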

#Output:
{'primer': '1',
 'primera': 'primero',
 'primeras': 'primero',
 'primero': '1',
 'primeros': 'primero'...

Thank you all!
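
One possible explanation for the output above, offered only as a sketch: dictionary keys are unique, so if the same word appears in the second column on more than one line, each later assignment overwrites the earlier value. The output `'primera': 'primero'` would be consistent with the elided part of the file containing a line such as `primero primera`, but that is an assumption, since the full file is not shown. Under that assumption, `dict.setdefault` keeps the first lemma seen for each form. The function name `build_lemma_dict` and the sample lines are hypothetical.

```python
def build_lemma_dict(lines):
    """Map each inflected form (last column) to its lemma (first column).

    The first lemma seen for a form is kept, so a later line that reuses
    the same form cannot overwrite it.
    """
    lemmas = {}
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines such as "(...)"
        lemma, form = parts[0], parts[-1]
        # setdefault only stores the value if the key is not present yet
        lemmas.setdefault(form, lemma)
    return lemmas


# Hypothetical sample: the "primero primera" line is an assumption about
# what the elided part of lem_es.txt might contain.
sample = [
    "1 primer",
    "1 primera",
    "primero primera",
    "comer coma",
]
print(build_lemma_dict(sample))
# prints {'primer': '1', 'primera': '1', 'coma': 'comer'}
```

To use this with the real file, the open file object can be passed directly, for example `with open("lem_es.txt", encoding="utf-8-sig") as fh: lemma_dict = build_lemma_dict(fh)`. If the later entry should win instead, plain assignment (`lemmas[form] = lemma`) gives that behavior.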

python-3.x

for-loop

utf-8

nlp

lemmatization
