1 year ago
#293040
WDarmtt
Make a custom lemmatizer dictionary from txt file
So I have the following txt file (lem_es.txt):
1 primer
1 primera
1 primeras
1 primero
1 primeros
(...)
comer coma
comer comais
The txt file starts with some numbers but then goes on with all the verb conjugations.
I am trying to make a lemmatizer dictionary. I would like the words from the second column as the dictionary key, and the words/numbers from the first column as the dictionary values.
I am having some problems because the approach I am using seems to keep only unique values, so the number 1 appears only once and the rest of the mapping gets mixed up.
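For reference, a plain Python dict only requires its keys to be unique; the same value can sit under many keys. The small sketch below uses made-up pairs rather than the real lem_es.txt (the second "primera" pair is a hypothetical duplicate) to show that a repeated value is fine, while assigning to a key that already exists replaces its earlier value:

```python
# Illustrative pairs only; the ("primero", "primera") pair is a hypothetical duplicate.
pairs = [("1", "primer"), ("1", "primera"), ("primero", "primera")]

demo = {}
for lemma, form in pairs:
    demo[form] = lemma  # assigning to an existing key replaces the earlier value

print(demo)
```

If the file does contain the same word twice in the second column, this overwrite behaviour, rather than any uniqueness rule on values, would be a plausible cause of the mix-up.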
The output I need:
{'primer': '1', 'primera': '1', 'primeras': '1', 'primero': '1', 'primeros': '1', 'coma': 'comer', 'comais': 'comer', ...}
And this is what I am trying, followed by the output I get:
python
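import codecs

lemmas = {}  # renamed from "dict" to avoid shadowing the built-in type
with codecs.open("lem_es.txt", "r", "utf-8-sig") as filehandle:
    for line in filehandle:
        parts = line.split()
        # Key: inflected form (last column); value: lemma or number (first column).
        lemmas[parts[-1]] = parts[0]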
#Output:
{'primer': '1',
'primera': 'primero',
'primeras': 'primero',
'primero': '1',
'primeros': 'primero'...
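One possible reading of this output, assuming the part of lem_es.txt elided by (...) contains later lines such as "primero primera", is that those later lines overwrite the earlier "1 primera" entries. Under that assumption, a minimal sketch that keeps the first value seen for each word could look like the following. The function name build_lemma_dict and the sample lines are hypothetical, and whether the first or the last value should win is a design choice, not something the file dictates.

```python
import io

def build_lemma_dict(lines):
    """Map each word in the last column to the first value seen in the first column."""
    lemmas = {}
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or single-column lines
        lemma, form = parts[0], parts[-1]
        lemmas.setdefault(form, lemma)  # keep the first value; ignore later duplicates
    return lemmas

# Hypothetical sample standing in for lem_es.txt; the "primero primera" line is assumed.
sample = io.StringIO("1 primer\n1 primera\nprimero primera\ncomer coma\ncomer comais\n")
result = build_lemma_dict(sample)
print(result)
```

With the real file, the same function could be called on a handle opened with open("lem_es.txt", encoding="utf-8-sig"). Replacing setdefault with a plain assignment would instead keep the last value seen for each word.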
Thank you all!
python-3.x
for-loop
utf-8
nlp
lemmatization
0 Answers