Python-2.3b1 bugs on Windows2000 with: the new csv module, stringreplace, and the re module

Daniel Ortmann

These problems only happen on Windows. On Linux everything works fine.
Has anyone else run into these bugs? Any suggestions?

Where do I find out the proper bug reporting process?


Problem #1:

While using the csv module's DictWriter on MSDOS (a.k.a. Windows2000),
the output files get newlines like \x0d\x0d\x0a instead of \x0d\x0a.

csvwriter = csv.DictWriter( file( out1filename, 'w' ), infieldnames, extrasaction='ignore' )
csvwriter.writerow( dict( zip( infieldnames, infieldnames ) ) )

Problem #2:

While trying to fix up the first problem I ran into another problem.
The following string replace code works until right around the boundary
at 2^7 * 1024, i.e. near 131072 (around line 1224), and then inserts a
bunch of \x00's in the string!

Before the \x00's, all of the \x0d's were correctly replaced. After the
\x00's, NONE of them were replaced.

content = file( fname, 'rb' ).read().replace( '\x0d', '' )
file( fname, 'wb' ).write( content )
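One way to narrow this down is to do the same replacement with every file handle closed explicitly, then check the result for \x00 bytes and leftover \x0d bytes. The following is only a sketch in modern Python 3 syntax (not 2.3b1); the helper names strip_cr_in_place and report and the sample data are made up for illustration, and it is a diagnostic, not a known fix for the behavior described above.

```python
import os
import tempfile

def strip_cr_in_place(fname):
    """Remove every \\x0d byte from fname, closing each handle explicitly."""
    with open(fname, "rb") as src:
        content = src.read().replace(b"\x0d", b"")
    with open(fname, "wb") as dst:
        dst.write(content)

def report(fname):
    """Return (offset of the first \\x00 or -1, number of \\x0d bytes left)."""
    with open(fname, "rb") as f:
        data = f.read()
    return data.find(b"\x00"), data.count(b"\x0d")

# Hypothetical sample: larger than 2**17 bytes so the 131072 boundary is crossed.
fd, sample = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"field1,field2\x0d\x0d\x0a" * 20000)

strip_cr_in_place(sample)
result = report(sample)
os.remove(sample)
```

If report shows a \x00 offset other than -1 or a non-zero \x0d count on the real file, that output would be useful to attach to a bug report.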

Problem #3:

The same problem also happens with the re module.

content = re.sub( '\x0d', '', file( fname, 'rb' ).read() )
file( fname, 'wb' ).write( content )
 
Skip Montanaro

Daniel> Problem #1:

Daniel> While using the csv module's DictWriter on MSDOS
Daniel> (a.k.a. Windows2000), the output files get newlines like
Daniel> \x0d\x0d\x0a instead of \x0d\x0a.

Daniel> csvwriter = csv.DictWriter( file( out1filename, 'w' ), infieldnames, extrasaction='ignore' )
Daniel> csvwriter.writerow( dict( zip( infieldnames, infieldnames ) ) )

CSV files are not really plain text files. The line terminator string is an
explicit property of the file. For example, you might want to write a CSV
file on a Windows 2000 machine which you intend to read on a Mac OS9 system
(where the line terminator is just \r). You need to open CSV files with the
'b' flag. This should work for you:

csvwriter = csv.DictWriter( file( out1filename, 'wb' ), infieldnames,
extrasaction='ignore' )
csvwriter.writerow( dict( zip( infieldnames, infieldnames ) ) )
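To illustrate where the extra \x0d comes from: the csv writer emits its own \r\n terminator, and a text-mode file on Windows then turns the \n into \r\n a second time. The sketch below is written for modern Python 3 rather than 2.3b1, and it assumes that io.TextIOWrapper with newline='\r\n' is a fair stand-in for Windows text mode. The helper name write_row and the field names are hypothetical.

```python
import csv
import io

def write_row(newline):
    """Write one CSV row through a text layer using the given newline mode."""
    raw = io.BytesIO()
    text = io.TextIOWrapper(raw, encoding="ascii", newline=newline)
    writer = csv.DictWriter(text, ["a", "b"], extrasaction="ignore")
    # The key "c" is not a field name, so extrasaction='ignore' drops it.
    writer.writerow({"a": "1", "b": "2", "c": "dropped"})
    text.flush()
    return raw.getvalue()

# newline='\r\n' translates each '\n' on write, like Windows text mode.
translated = write_row("\r\n")
# newline='' disables translation, so the csv terminator passes through as-is.
untranslated = write_row("")

print(repr(translated))
print(repr(untranslated))
```

In Python 3 the csv documentation recommends opening the file with newline='', which plays the same role as the 'wb' flag here.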

Skip
 
Daniel Ortmann

Daniel> While using the csv module's DictWriter on MSDOS
Daniel> (a.k.a. Windows2000), the output files get newlines like
Daniel> \x0d\x0d\x0a instead of \x0d\x0a.

Daniel> csvwriter = csv.DictWriter( file( out1filename, 'w' ), infieldnames, extrasaction='ignore' )
Daniel> csvwriter.writerow( dict( zip( infieldnames, infieldnames ) ) )

Skip> CSV files are not really plain text files. The line terminator
Skip> string is an explicit property of the file. For example, you
Skip> might want to write a CSV file on a Windows 2000 machine which you
Skip> intend to read on a Mac OS9 system (where the line terminator is
Skip> just \r). You need to open CSV files with the 'b' flag. This
Skip> should work for you:

Skip> csvwriter = csv.DictWriter( file( out1filename, 'wb' ), infieldnames, extrasaction='ignore' )
Skip> csvwriter.writerow( dict( zip( infieldnames, infieldnames ) ) )

Ok, that is the same work around that I used. Perhaps the documentation
should say something about using binary mode?

Or perhaps the DictWriter constructor should open the file in binary
mode if given a string rather than a file object?

How do we avoid people stumbling as I did?
 
Skip Montanaro

Daniel> Perhaps the documentation should say something about using
Daniel> binary mode?

Good point. I'll fix the docs.

Daniel> Or perhaps the DictWriter constructor should open the file in
Daniel> binary mode if given a string rather than a file object?

Nah, too much overloading going on.

Skip
 
