1 year ago
#320779
Laz22434
Export fastText vectors (Korean) from fastText to spaCy (UnicodeDecodeError)
Hi everyone, I downloaded the Korean fastText model from FastText Korean Model and tried to export it to spaCy using this code:
#!/usr/bin/env python
# coding: utf8
from __future__ import unicode_literals
import plac
import numpy

import spacy
from spacy.language import Language


@plac.annotations()
def main():
    nlp = spacy.blank('ko')
    with open("ko.vec", 'rb') as file_:
        header = file_.readline()
        nr_row, nr_dim = header.split()
        nlp.vocab.reset_vectors(width=int(nr_dim))
        count = 0
        for line in file_:
            count += 1
            line = line.rstrip().decode("utf-8")
            pieces = line.rsplit(' ', int(nr_dim))
            word = pieces[0]
            print("{} - {}".format(count, word))
            vector = numpy.asarray([float(v) for v in pieces[1:]], dtype='f')
            nlp.vocab.set_vector(word, vector)  # add the vectors to the vocab
    nlp.to_disk("/models/new_nlp/")


if __name__ == '__main__':
    plac.call(main)
I got this code from this answered question on Stack Overflow: Export FastText from fasttext to spacy
But after executing the code, I got this error at the end:
Traceback (most recent call last):
  File "C:\Users\User\fasttexttospacy\fasttexttospacy.py", line 31, in <module>
    plac.call(main)
  File "C:\Users\User\anaconda3\envs\fasttexttospacy\lib\site-packages\plac_core.py", line 436, in call
    cmd, result = parser.consume(arglist)
  File "C:\Users\User\anaconda3\envs\fasttexttospacy\lib\site-packages\plac_core.py", line 287, in consume
    return cmd, self.func(*(args + varargs + extraopts), **kwargs)
  File "C:\Users\User\fasttexttospacy\fasttexttospacy.py", line 21, in main
    line = line.rstrip().decode("utf-8")
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xed in position 0: invalid continuation byte
I don't really understand where the problem is. Can somebody please explain why I get this error? It's not clear to me.
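The message says that some line in ko.vec starts with the byte 0xed, which opens a three-byte UTF-8 sequence, but the byte after it is not a valid continuation byte. That would suggest the file contains at least one entry that is not valid UTF-8, although the traceback alone can't confirm which line or why. Below is a minimal sketch for narrowing it down, assuming the usual .vec layout (a "rows dims" header, then one word plus its values per line). The helper name load_vec_rows is made up for illustration and is not part of spaCy or fastText. It collects lines that fail to decode or parse instead of crashing on the first one.

```python
import os
import tempfile


def load_vec_rows(path):
    """Parse a fastText-style .vec file and return (rows, bad_lines).

    rows      -> list of (word, [float, ...]) for lines that parsed cleanly
    bad_lines -> list of (line_number, raw_bytes) for lines that are not
                 valid UTF-8 or do not have the expected number of fields
    Hypothetical helper for diagnosis only.
    """
    rows, bad_lines = [], []
    with open(path, "rb") as file_:
        header = file_.readline()
        nr_row, nr_dim = (int(x) for x in header.split())
        # The header is line 1, so the data starts at line 2.
        for line_number, raw in enumerate(file_, start=2):
            try:
                line = raw.rstrip().decode("utf-8")
                pieces = line.rsplit(" ", nr_dim)
                if len(pieces) != nr_dim + 1:
                    raise ValueError("unexpected number of fields")
                vector = [float(v) for v in pieces[1:]]
            except ValueError:
                # UnicodeDecodeError is a subclass of ValueError,
                # so undecodable lines end up here as well.
                bad_lines.append((line_number, raw))
                continue
            rows.append((pieces[0], vector))
    return rows, bad_lines


# Tiny demonstration with a made-up file: one good line, one bad line.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.vec")
with open(demo_path, "wb") as f:
    f.write(b"2 2\n")
    f.write("한 0.1 0.2\n".encode("utf-8"))
    f.write(b"\xed 0.3 0.4\n")  # 0xed followed by a space: invalid UTF-8

rows, bad_lines = load_vec_rows(demo_path)
print(len(rows), "usable rows;", len(bad_lines), "problem lines")
for line_number, raw in bad_lines:
    print(line_number, raw[:40])
```

Running load_vec_rows("ko.vec") on the real file should show which lines are affected. If only a handful are, skipping them before calling nlp.vocab.set_vector may be acceptable. If many are, the download may be damaged or the file may not be UTF-8 at all.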
python
python-3.x
spacy
fasttext
spacy-3
0 Answers