Suggestions for workaround in CSV bug


Simmons, Stephen

Hi,

I've come across a bug in the csv module where csv.reader() raises an
exception if a field contains '\r'. The example code and output
below show a test case where csv.reader() cannot read back rows
written by csv.writer().

I believe this is a known bug and may have been fixed for Python 2.5.
However, I'm after suggestions for workarounds for Python 2.4.2.

This is part of a project where I'm storing large tables from
mainframe systems as CSVs for subsequent data cleansing and
post-processing. Some tables have 300 columns and tens of millions
of rows. The mainframe data fields are poorly documented, so at the
time of writing the CSV I don't know whether a '\r' is part of a
binary field, and so must be retained, or is a random byte in an
uninitialised field, and so can safely be deleted. Therefore I'd
prefer to make only minimal changes, to avoid corrupting the data.

Any suggestions for how to proceed are most welcome!

Thanks in advance,

Stephen Simmons


#======================================================
# Bug in Python 2.4.2's csv module
# Stephen Simmons, mail at stevesimmons.com, 24 Jan 2006

import csv

s = [ ['a'], ['\r'], ['b'] ]
name = 'c://temp//test2.csv'

print 'Writing CSV file containing %s' % repr(s)
f = file(name, 'wb')
csv.writer(f).writerows(s)
f.close()

print 'CSV file is %s' % repr(file(name, 'rb').read())

print 'Now reading back as CSV...'
for r in csv.reader(file(name, 'rb')):
    print 'Read row containing %s' % repr(r)


# Output is
"""In [29]: run csv_error.py
Writing CSV file containing [['a'], ['\r'], ['b']]
Contents of the CSV file are 'a\r\n"\r"\r\nb\r\n'
Now reading back as CSV...
Read row containing ['a']
---------------------------------------------------------------------------
_csv.Error Traceback (most recent call last)


c:\temp\csv_error.py
14 print 'CSV file is %s' % repr(file(name, 'rb').read())
15
16 print 'Now reading back as CSV...'
---> 17 for r in csv.reader(file(name, 'rb')):
18 print 'Read row containing %s' % repr(r)

Error: newline inside string
WARNING: Failure executing file: <csv_error.py>

"""
 

Michael Ströder

> I've come across a bug in CSV where the csv.reader() raises an
> exception if the input line contains '\r'. Example code and output
> below shows a test case where csv.reader() cannot read an array
> written by csv.writer().
>
> Error: newline inside string
> WARNING: Failure executing file: <csv_error.py>

Did you play with the csv.Dialect setting lineterminator='\n'?

csv.reader(file(name, 'rb'), lineterminator='\n')

See also: http://docs.python.org/lib/csv-fmt-params.html

Ciao, Michael.
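
Another possible workaround, offered only as a sketch and not tested on Python 2.4.2: escape each '\r' reversibly before writing and undo the escaping after reading. No byte is discarded, so the decision about whether a given '\r' is meaningful can be made later. The helper names escape_cr and unescape_cr are hypothetical, and using a backslash as the escape character is an assumption; the backslash itself has to be escaped too, so that the mapping stays reversible. The example below is written for current Python 3 and uses an in-memory buffer instead of a file.

```python
import csv
import io
import re


def escape_cr(field):
    """Replace '\\' with '\\\\' and '\r' with the two characters '\\' 'r'."""
    return field.replace('\\', '\\\\').replace('\r', '\\r')


# Matches one escape sequence: a backslash followed by a backslash or 'r'.
_ESCAPED = re.compile(r'\\([\\r])')


def unescape_cr(field):
    """Invert escape_cr by scanning escape sequences left to right."""
    return _ESCAPED.sub(
        lambda m: '\r' if m.group(1) == 'r' else '\\', field)


# Same rows as the failing test case, plus fields with literal backslashes.
rows = [['a'], ['\r'], ['b'], ['back\\slash', 'x\\ry']]

# newline='' keeps the writer's line terminators untranslated.
buf = io.StringIO(newline='')
csv.writer(buf).writerows([[escape_cr(v) for v in row] for row in rows])

buf.seek(0)
restored = [[unescape_cr(v) for v in row] for row in csv.reader(buf)]
assert restored == rows
```

The two helpers use only str.replace and re.sub, so they should carry over to Python 2.4 with the file handling from the original script, though that has not been verified here. The cost is one extra pass over every field, which may matter for tables with tens of millions of rows.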
 
