Unhelpful traceback

John Nagle

Here's a traceback that's not helping:

Traceback (most recent call last):
  File "InfoCompaniesHouse.py", line 255, in <module>
    main()
  File "InfoCompaniesHouse.py", line 251, in main
    loader.dofile(infile)  # load this file
  File "InfoCompaniesHouse.py", line 213, in dofile
    self.dofilezip(infilename)  # do ZIP file
  File "InfoCompaniesHouse.py", line 198, in dofilezip
    self.dofilecsv(infile, infd)  # as a CSV file
  File "InfoCompaniesHouse.py", line 182, in dofilecsv
    for fields in reader :  # read entire CSV file
UnicodeEncodeError: 'ascii' codec can't encode character u'\xa3' in position 14: ordinal not in range(128)

This is weird, because "for fields in reader" isn't directly
doing a decode. That's further down somewhere, and the backtrace
didn't tell me where.

The program is converting some .CSV files that come packaged in .ZIP
files. The files are big, so rather than expanding them, they're
read directly from the ZIP files and processed through the ZIP
and CSV modules.

Here's the code that's causing the error above:

    decoder = codecs.getreader('utf-8')
    with decoder(infdraw,errors="replace") as infd :
        with codecs.open(outfilename, encoding='utf-8', mode='w') as outfd :
            headerline = infd.readline()
            self.doheaderline(headerline)
            reader = csv.reader(infd, delimiter=',', quotechar='"')
            for fields in reader :
                pass

Normally, the "pass" is a call to something that
uses the data, but for test purposes, I put a "pass" in there. It still
fails. With that "pass", nothing is ever written to the
output file, and no "encoding" should be taking place.

"infdraw" is a stream from the zip module, created like this:

    with inzip.open(zipelt.filename,"r") as infd :
        self.dofilecsv(infile, infd)

This works for data records that are pure ASCII, but as soon as some
non-ASCII character comes through, it fails.

Where is the error being generated? I'm not seeing any place
where there's a conversion to ASCII. Not even a print.

John Nagle
 
Andrew Berg

This is weird, because "for fields in reader" isn't directly
doing a decode. That's further down somewhere, and the backtrace
didn't tell me where.

Looking at the csv module docs, the reader object iterates over the
csvfile argument (which can be any iterator). I think that, in the case
of a file object, it's not decoded until iteration.
I've never used the csv module before though, so I could be wrong.
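
A minimal sketch of that lazy behaviour, written for Python 3 so it can be run as-is (the thread itself is about 2.7). The helper `logging_lines` and the sample rows are made up for illustration; the point is only that `csv.reader` pulls a line from its source when the loop asks for a row, which is consistent with the traceback stopping at the `for` statement.

```python
import csv

def logging_lines(lines, seen):
    """Yield lines one at a time, recording each one as it is handed out."""
    for line in lines:
        seen.append(line)
        yield line

seen = []
source = ["name,amount\n", "ACME LTD,100\n"]  # hypothetical sample data

# Creating the reader does not pull anything from the source yet.
reader = csv.reader(logging_lines(source, seen))
pulled_at_creation = len(seen)

# Asking for the first row is what pulls the first line.
first_row = next(reader)
pulled_after_first_row = len(seen)

print(pulled_at_creation, first_row, pulled_after_first_row)
```

Any conversion the reader applies to a line therefore happens inside the iteration step, in the module's C code, which does not show up as a separate frame in a Python traceback.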
 
Chris Rebert

Here's a traceback that's not helping:
UnicodeEncodeError: 'ascii' codec can't encode character u'\xa3' in
position 14: ordinal not in range(128)
The program is converting some .CSV files that come packaged in .ZIP
files. The files are big, so rather than expanding them, they're
read directly from the ZIP files and processed through the ZIP
and CSV modules.
This works for data records that are pure ASCII, but as soon as some
non-ASCII character comes through, it fails.

I'd recommend using the `unicodecsv` package, which, unlike the std
lib `csv` module, is properly Unicode-compatible:
https://pypi.python.org/pypi/unicodecsv

Cheers,
Chris
 
Dave Angel

Here's a traceback that's not helping:

A bit more context would be helpful. Starting with Python version.

"infdraw" is a stream from the zip module, created like this:

    with inzip.open(zipelt.filename,"r") as infd :

You probably need a 'rb' rather than 'r', since the file is not ASCII.

If that isn't enough, then please give the whole context, such as where
zipelt and filename came from. And don't forget to specify Python
version. Version 3.x treats nonbinary files very differently than 2.x.
 
John Nagle

A bit more context would be helpful. Starting with Python version.

Sorry, Python 2.7.
If that isn't enough, then please give the whole context, such as where
zipelt and filename came from. And don't forget to specify Python
version. Version 3.x treats nonbinary files very differently than 2.x.

Here it is.

John Nagle


    def dofilecsv(self, infilename, infdraw) :
        """
        Loader for Companies House company data, with files already open.
        """
        self.logger.info('Converting "%s"' % (infilename, ))  # log
        (pathpart, filepart) = os.path.split(infilename)  # split off file part to construct outputfile
        (outfile, ext) = os.path.splitext(filepart)  # remove extension
        outfile += ".sql"  # add SQL suffix
        outfilename = os.path.abspath(os.path.join(self.options.destdir, outfile))
        # ***NEED TO ENSURE UNIQUE OUTFILENAME EVEN IF DUPLICATED IN ZIP FILES***
        decoder = codecs.getreader('utf-8')  # UTF-8 reader
        with decoder(infdraw,errors="replace") as infd :
            with codecs.open(outfilename, encoding='utf-8', mode='w') as outfd :
                headerline = infd.readline()  # read header line
                self.doheaderline(headerline)  # process header line
                reader = csv.reader(infd, delimiter=',', quotechar='"')  # CSV file
                for fields in reader :  # read entire CSV file
                    self.doline(outfd, fields)  # copy fields
        self.logstats(infilename)  # log statistics of this file

    def dofilezip(self, infilename) :
        """
        Do a ZIP file containing CSV files.
        """
        try :
            inzip = zipfile.ZipFile(infilename, "r", allowZip64=True)  # try to open
            zipdir = inzip.infolist()  # get objects in file
            for zipelt in zipdir :  # for all objects in file
                self.logger.debug('ZIP file "%s" contains "%s".' % (infilename, zipelt.filename))
                (infile, ext) = os.path.splitext(zipelt.filename)  # remove extension
                if ext.lower() == ".csv" :  # if a CSV file
                    with inzip.open(zipelt.filename,"r") as infd :  # do this file
                        self.dofilecsv(infile, infd)  # as a CSV file
                else :
                    self.logger.error('Non-CSV file in ZIP file: "%s"' % (zipelt.filename,))
                    self.errorcount += 1  # tally

        except zipfile.BadZipfile as message :  # if trouble
            self.logger.error('Bad ZIP file: "%s"' % (infilename,))  # note trouble
            self.errorcount += 1  # tally

    def dofile(self, infilename) :
        """
        Loader for Companies House company data
        """
        (sink, ext) = os.path.splitext(infilename)  # get extension
        if ext == ".zip" :  # if .ZIP file
            self.dofilezip(infilename)  # do ZIP file
        elif ext == ".csv" :
            self.logger.info('Converting "%s"' % (infilename,))  # log
            with open(infilename, "rb") as infd :
                self.dofilecsv(infilename, infd)  # do
            self.logstats(infilename)  # log statistics of this file
        else :
            self.logger.error('File of unexpected type (not .csv or .zip): %s ' % (infilename,))
            self.errorcount += 1
 
John Nagle

Dave Angel wrote (replying to John Nagle's post of 03/07/2013 01:33 AM):

You probably need a 'rb' rather than 'r', since the file is not ASCII.

No, the ZIP module gives you back the bytes you
put in. "rb" is not accepted there:

  File "InfoCompaniesHouse.py", line 197, in dofilezip
    with inzip.open(zipelt.filename,"rb") as infd :  # do this file
  File "C:\python27\lib\zipfile.py", line 872, in open
    raise RuntimeError, 'open() requires mode "r", "U", or "rU"'
RuntimeError: open() requires mode "r", "U", or "rU"

"b" for files is about end of line handling (CR LF -> LF), anyway.

John Nagle
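
A small sketch of the point about the ZIP module, written for Python 3 so it runs as-is. It builds a throwaway archive in memory (the member name and contents are invented), reads the member back with mode "r", and checks that the result is the original bytes with the CR LF untouched. It also tries mode "rb"; the exception type is caught loosely because 2.7 raises RuntimeError, as in the traceback above, while newer Python 3 releases raise ValueError as far as I know.

```python
import io
import zipfile

# Build a small ZIP archive in memory with one hypothetical CSV member.
original = u"ACME LTD,\xa3100\r\n".encode("utf-8")
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("sample.csv", original)

with zipfile.ZipFile(buffer, "r") as archive:
    # Mode "r" hands back the stored bytes; no newline translation, no decoding.
    with archive.open("sample.csv", "r") as member:
        data = member.read()

    # Mode "rb" is not accepted by ZipFile.open().
    try:
        archive.open("sample.csv", "rb")
        rb_rejected = False
    except (RuntimeError, ValueError):
        rb_rejected = True

print(repr(data), rb_rejected)
```

So whatever decoding is needed has to be layered on top of the stream that `ZipFile.open()` returns.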
 
John Nagle

Sorry, Python 2.7.

The trouble comes from here:

decoder = codecs.getreader('utf-8') # UTF-8 reader
with decoder(infdraw,errors="replace") as infd :

It's not the CSV module that's blowing up. If I just feed the
raw unconverted bytes from the ZIP module into the CSV module,
the CSV module runs without complaint.

I've tried 'utf-8', 'ascii', and 'windows-1252' as codecs.
They all blow up. 'errors="replace"' doesn't help.

John Nagle
 
Ian Kelly

The trouble comes from here:

decoder = codecs.getreader('utf-8') # UTF-8 reader
with decoder(infdraw,errors="replace") as infd :

It's not the CSV module that's blowing up. If I just feed the
raw unconverted bytes from the ZIP module into the CSV module,
the CSV module runs without complaint.

I've tried 'utf-8', 'ascii', and 'windows-1252' as codecs.
They all blow up. 'errors="replace"' doesn't help.

I believe that the csv module is expecting string data, not unicode.
Since it receives unicode as a result of your decoder step, it tries
to convert it to a string using str(), which implicitly tries to
encode the data using the ascii codec, hence the error that you're
seeing.
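
A minimal sketch of the failure described above. On Python 2, handing a unicode object to code that wants a byte string triggers an implicit encode with the default 'ascii' codec. The snippet below makes that encode explicit so it behaves the same on Python 2 and 3; the sample field and the helper name `implicit_ascii_encode` are invented for illustration and are not part of the csv module.

```python
# A hypothetical CSV field containing a pound sign (U+00A3).
field = u"Share capital \xa3100"

def implicit_ascii_encode(value):
    """Stand-in for the conversion Python 2 applies when a unicode
    object is used where a byte string is expected."""
    return value.encode("ascii")

try:
    implicit_ascii_encode(field)
    error_message = None
except UnicodeEncodeError as exc:
    error_message = str(exc)

print(error_message)

# Pure ASCII text goes through the same step without complaint,
# which matches the "works until a non-ASCII character shows up" symptom.
ascii_result = implicit_ascii_encode(u"ACME LTD")
```

This would also explain why errors="replace" on the decoder makes no difference: the decode step succeeds, and the exception comes from the later encode.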
 
Dave Angel

raise RuntimeError, 'open() requires mode "r", "U", or "rU"'
RuntimeError: open() requires mode "r", "U", or "rU"

"b" for files is about end of line handling (CR LF -> LF), anyway.

Only for Python 2. Since originally you didn't specify, I took my best
shot. If you omit the 'b' when opening a binary file in Python 3, you'd get
problems similar to yours. Text files will be converted to Unicode.

That's one of the reasons that specifying the full environment is important.
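
For comparison, a hedged sketch of how the same "read CSV straight out of a ZIP" job might look on Python 3, where the csv module works on text. This is not the original poster's code: the function name `read_csv_rows_from_zip`, the member name and the sample data are assumptions made for the example, and the archive is built in memory so the snippet is self-contained.

```python
import csv
import io
import zipfile

def read_csv_rows_from_zip(zip_bytes, member_name, encoding="utf-8"):
    """Return all rows of one CSV member without extracting it to disk."""
    rows = []
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        with archive.open(member_name, "r") as raw:
            # Decode the byte stream as text; newline="" lets csv handle line ends.
            text = io.TextIOWrapper(raw, encoding=encoding,
                                    errors="replace", newline="")
            for fields in csv.reader(text, delimiter=",", quotechar='"'):
                rows.append(fields)
    return rows

# Build a tiny archive in memory (hypothetical data with a pound sign).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("sample.csv",
                     u"name,amount\nACME LTD,\xa3100\n".encode("utf-8"))

rows = read_csv_rows_from_zip(buffer.getvalue(), "sample.csv")
print(rows)
```

On 2.7 the equivalent would be to give the csv module the raw bytes and decode each field afterwards, which is roughly what the `unicodecsv` package mentioned earlier in the thread is for.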
 
