1 year ago

#367633

HulaLula

error: 'ascii' codec can't decode byte 0xd8 (...): ordinal not in range(...) What am I doing wrong?

What is wrong with this Python code?

with open('poetry.txt', 'w', encoding="utf-8") as outfile:
    for fname in poem_txt_list:
        with open(fname) as infile:
            outfile.write(infile.read())

it gives me the error:

---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
<ipython-input-51-dc1d05741086> in <module>
      2     for fname in poem_txt_list:
      3         with open(fname) as infile:
----> 4             outfile.write(infile.read())

/opt/anaconda3/envs/MLinPractice/lib/python3.6/encodings/ascii.py in decode(self, input, final)
     24 class IncrementalDecoder(codecs.IncrementalDecoder):
     25     def decode(self, input, final=False):
---> 26         return codecs.ascii_decode(input, self.errors)[0]
     27 
     28 class StreamWriter(Codec,codecs.StreamWriter):

UnicodeDecodeError: 'ascii' codec can't decode byte 0xd8 in position 0: ordinal not in range(128)
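
The last frame of the traceback is in `encodings/ascii.py`, which suggests the failure happens while reading, not writing: the inner `open(fname)` has no `encoding` argument, so Python 3.6 falls back to the locale's preferred encoding, apparently ASCII in this environment. A minimal sketch of the loop with the encoding given on both sides follows. It assumes the input files are UTF-8; `concat_text_files` and the temporary demo file are hypothetical names, not part of the original repository.

```python
import os
import tempfile


def concat_text_files(paths, out_path, encoding="utf-8"):
    """Concatenate text files, decoding and encoding explicitly."""
    with open(out_path, "w", encoding=encoding) as outfile:
        for fname in paths:
            # Without encoding=..., open() uses the locale default,
            # which appears to be ASCII in the traceback above.
            with open(fname, encoding=encoding) as infile:
                outfile.write(infile.read())


# Demo input (assumption: UTF-8 text whose first byte is 0xd8).
tmp_dir = tempfile.mkdtemp()
src = os.path.join(tmp_dir, "poem.txt")
with open(src, "w", encoding="utf-8") as f:
    f.write("\u0633\u0644\u0627\u0645\n")

out = os.path.join(tmp_dir, "poetry.txt")
concat_text_files([src], out)
```

If the files turn out not to be UTF-8, the same call would need a different `encoding` value; the byte 0xd8 alone does not prove which encoding was used.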

I think I am doing something fundamentally wrong. The original code from the GitHub repository, which I am trying to adapt, is the following:

#get list of text files in data
poem_txt_list = glob.glob('data/*.txt')

with open('raw_corpus.txt', 'w') as outfile:
    for fname in poem_txt_list:
        with open(fname) as infile:
            outfile.write(infile.read())

data_dir = 'raw_corpus.txt'
text = helper.load_data(data_dir)

I have one file ("poetry.txt", UTF-8 encoded) which I want to use in the code instead of "raw_corpus.txt". I think it's the right spot for my file to go, but I am not entirely sure. I also don't have a list of data files; I only have one txt file with all the poetry as one long text. Do I even need the first line?
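
With a single file, the glob and the concatenation loop are probably unnecessary: their only job in the original code appears to be merging many files into `raw_corpus.txt`. A minimal sketch of reading one UTF-8 file directly is below. `load_text` is a hypothetical stand-in, since the repository's `helper.load_data` is not shown and may or may not pass an encoding itself; the demo file is also made up for illustration.

```python
import os
import tempfile
from pathlib import Path


def load_text(path, encoding="utf-8"):
    # Hypothetical stand-in for the repository's loader, which is not shown.
    return Path(path).read_text(encoding=encoding)


# Demo with a temporary stand-in for poetry.txt.
demo_path = os.path.join(tempfile.mkdtemp(), "poetry.txt")
Path(demo_path).write_text("\u0633\u0637\u0631 \u0627\u0648\u0644\n", encoding="utf-8")

text = load_text(demo_path)
print(len(text))
```

If `helper.load_data` opens the file without an explicit encoding, pointing `data_dir` at `poetry.txt` could raise the same `UnicodeDecodeError`, in which case the encoding would have to be set inside that helper.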

python

unicode

encoding

ascii

decoding

0 Answers
