UnicodeDecodeError quick question

patrick.waldo · Dec 4, 2008

Hi Everyone,

I am using Python 2.4 to convert an Excel spreadsheet to a
pipe-delimited text file, and some of the cells contain non-ASCII
characters. I solved this problem in a very unintuitive way and I
wanted to ask why. If I do,

csvfile.write(cell.encode("utf-8"))

I get a UnicodeDecodeError. However if I do,

c = unicode(cell.encode("utf-8"),"utf-8")
csvfile.write(c)

it works. Why should I have to encode the cell to utf-8 and then turn
it back into unicode in order to write it to a text file? Is there a
more intuitive way to get around these bothersome unicode errors?

Thanks for any advice,
Patrick

Code:

# -*- coding: utf-8 -*-
import xlrd,codecs,os

xls_file = "/home/pwaldo2/work/docpool_plone/2008-12-4/EU-2008-12-4.xls"
book = xlrd.open_workbook(xls_file)
bibliography_sheet = book.sheet_by_index(0)

csv = os.path.split(xls_file)[0] + '/' + os.path.split(xls_file)[1][:-4] + '.csv'
csvfile = codecs.open(csv,'w',encoding='utf-8')

rowcount = 0
data = []
while rowcount<bibliography_sheet.nrows:
    data.append(bibliography_sheet.row_values(rowcount, start_colx=0,end_colx=None))
    rowcount+=1
for row in data:
    for cell in row:
        #csvfile.write(cell.encode("utf-8")) This causes the UnicodeDecodeError
        c = unicode(cell.encode("utf-8"),"utf-8")
        csvfile.write(c)
        csvfile.write('|')
    csvfile.write('\r\n')
csvfile.close()

Tim Golden · Dec 4, 2008

The short answer is that you're writing to a file
you've opened with the codecs module. Any write to
this file expects unicode data and will automatically
encode it to the encoding you specified. You're trying
to send it utf-8-encoded data -- i.e. a string of bytes,
*not* unicode -- and it presumably tries to decode it
to a unicode object before encoding it as utf-8 like
you asked it to. Without looking at the implementation,
it probably just does unicode(x) on what you've passed
in, which will use the default ascii codec and fail in
the way you saw.
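
A minimal sketch of that failure, assuming the cell holds text with at least one non-ASCII character (the sample value is made up). It spells the ascii decode out explicitly instead of relying on Python 2's implicit conversion, so it should behave the same on Python 2.6+ and Python 3. On 2.4 the `except ... as` line would need the older `except UnicodeDecodeError, exc` spelling.

```python
# -*- coding: utf-8 -*-
# Hypothetical cell value: unicode text containing one non-ASCII character.
cell = u"caf\xe9"

# Encoding produces a byte string; the e-acute becomes the two bytes C3 A9.
encoded = cell.encode("utf-8")

# Decoding those bytes with the ascii codec is what fails, because
# 0xC3 is outside the 7-bit ASCII range.
error = None
try:
    encoded.decode("ascii")
except UnicodeDecodeError as exc:
    error = exc

print(repr(encoded))
print(error)
```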

(Honestly, that was the short answer).

To solve it, assuming cell is already unicode, just pass
it unadulterated to csvfile.write.
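
As a rough sketch of that advice, the loop can hand each cell straight to the codecs-opened file. The helper name `write_rows`, the sample rows and the temporary path are all made up for illustration. The conversion of non-text cells is also an assumption: it is there because a spreadsheet row may contain numbers as well as text.

```python
# -*- coding: utf-8 -*-
import codecs
import os
import tempfile

text_type = type(u"")  # unicode on Python 2, str on Python 3


def write_rows(path, rows):
    """Write rows as pipe-delimited text; the file object does the encoding."""
    out = codecs.open(path, "w", encoding="utf-8")
    try:
        for row in rows:
            for cell in row:
                # Assumption: non-text cells (e.g. floats) are simply
                # converted to text; text cells are passed through as-is.
                if not isinstance(cell, text_type):
                    cell = text_type(cell)
                out.write(cell)
                out.write(u"|")
            out.write(u"\r\n")
    finally:
        out.close()


# Made-up sample data standing in for the spreadsheet rows.
rows = [[u"Caf\xe9", 3.0], [u"M\xfcnchen", u""]]
path = os.path.join(tempfile.mkdtemp(), "demo.csv")
write_rows(path, rows)

# Read the file back as raw bytes to see what actually landed on disk.
f = open(path, "rb")
raw = f.read()
f.close()
```

No call to encode appears anywhere in the loop; the only place an encoding is named is the codecs.open call.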

The reason the other thing works is that you're in
control of the -- unnecessary -- unicode conversion, and
you're telling Python which encoding to use for decoding
and encoding.
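
To illustrate why that conversion is unnecessary, here is a small check with a made-up cell value. It uses `.decode("utf-8")`, which should be equivalent to the `unicode(..., "utf-8")` call in the original post but also runs on Python 3.

```python
# -*- coding: utf-8 -*-
# Hypothetical cell value with non-ASCII characters.
cell = u"na\xefve r\xe9sum\xe9"

# Encode to utf-8 bytes, then decode with the same codec named explicitly.
round_trip = cell.encode("utf-8").decode("utf-8")

# The result is the same unicode text we started with, so the two steps
# cancel out and the cell could have been written directly.
same = (round_trip == cell)
print(same)
```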

TJG
