codecs.open on Win32 -- converting my newlines to CR+LF


Ryan McGuire

I've got a UTF-8 encoded text file from Linux with standard newlines
("\n").

I'm reading this file on Win32 with Python 2.6:

codecs.open("whatever.txt","r","utf-8").read()

Inexplicably, all the newlines ("\n") are replaced with CR+LF ("\r\n")
... Why?
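Before blaming the decoder, it can help to check what is actually stored on disk. The sketch below uses a hypothetical helper, count_line_endings (not from this thread), that reads the raw bytes in binary mode, so neither newline translation nor decoding can interfere, and counts each line-ending style.

```python
import os
import tempfile


def count_line_endings(path):
    # Hypothetical helper, not part of the original thread.
    # Binary mode returns the bytes exactly as stored, with no newline
    # translation and no decoding.
    with open(path, "rb") as f:
        data = f.read()
    crlf = data.count(b"\r\n")
    return {
        "crlf": crlf,
        "lf": data.count(b"\n") - crlf,  # LF not preceded by CR
        "cr": data.count(b"\r") - crlf,  # CR not followed by LF
    }


if __name__ == "__main__":
    # Simulate the Linux file from the question: UTF-8 text, LF endings.
    fd, path = tempfile.mkstemp(suffix=".txt")
    with os.fdopen(fd, "wb") as f:
        f.write("caf\u00e9\nline two\n".encode("utf-8"))
    print(count_line_endings(path))  # {'crlf': 0, 'lf': 2, 'cr': 0}
    os.remove(path)
```

If the counts show only "lf", the "\r" characters are not in the file itself and are being introduced somewhere later in the program.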

As a workaround I'm having to do this:

open("whatever.txt","r").read().decode("utf-8")

which appropriately does not alter my newlines.
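The workaround rests on one idea: get the raw bytes first, then decode them. The exact line above is Python 2 only; on Python 3, text-mode open() already returns str, which has no .decode(). A minimal sketch that also runs on Python 3 therefore reads with "rb", which additionally guarantees that no newline translation happens on any platform. The helper name is hypothetical.

```python
import os
import tempfile


def read_utf8_bytes_then_decode(path):
    # Hypothetical helper: read raw bytes in binary mode, then decode.
    # Binary mode never translates line endings, so the decoded text keeps
    # exactly the newlines stored in the file.
    with open(path, "rb") as f:
        return f.read().decode("utf-8")


if __name__ == "__main__":
    fd, path = tempfile.mkstemp(suffix=".txt")
    with os.fdopen(fd, "wb") as f:
        f.write("caf\u00e9\nline two\n".encode("utf-8"))
    text = read_utf8_bytes_then_decode(path)
    print("\r" in text)  # False
    os.remove(path)
```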

What really confuses me, though, is the Python documentation for
codecs.open:

"Files are always opened in binary mode, even if no binary mode was
specified. This is done to avoid data loss due to encodings using
8-bit values. This means that no automatic conversion of '\n' is done on
reading and writing."

The way I read that, codecs.open should not touch my newlines. What am
I doing wrong? Is this a bug in Python, or in the docs, or both?
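One way to test the documentation's claim on a given interpreter is to look at the file object that codecs.open() wraps. It returns a codecs.StreamReaderWriter whose "stream" attribute is the underlying file, and that file's "mode" shows how it was really opened. A small sketch with a hypothetical helper name follows. On current Python 3 releases, codecs.open() appends "b" when an encoding is given and the mode lacks it, so the result for "r" is "rb"; whether the 2.6 build in the question behaves the same is exactly what running it there would check.

```python
import codecs
import os
import tempfile


def underlying_mode(path, mode, encoding="utf-8"):
    # Hypothetical helper. codecs.open() returns a StreamReaderWriter;
    # its .stream attribute is the plain file object it wraps, and that
    # object's .mode reports how the file was actually opened.
    f = codecs.open(path, mode, encoding)
    try:
        return f.stream.mode
    finally:
        f.close()


if __name__ == "__main__":
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    print(underlying_mode(path, "r"))  # rb (on current Python 3 releases)
    os.remove(path)
```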
 

Philip Semanchuk

Try using "rb" instead of "r" for the mode in the call to open().
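Applied to the codecs.open() call from the question, the suggestion might look like the sketch below, assuming UTF-8 input; the helper name is hypothetical. codecs.open() still decodes through the named codec, so the result is text, but with "rb" the wrapped file is explicitly in binary mode.

```python
import codecs
import os
import tempfile


def read_with_codecs_binary(path, encoding="utf-8"):
    # Hypothetical helper. Passing "rb" explicitly means the wrapped file is
    # opened in binary mode, so line endings reach the decoder unchanged.
    with codecs.open(path, "rb", encoding) as f:
        return f.read()


if __name__ == "__main__":
    fd, path = tempfile.mkstemp(suffix=".txt")
    with os.fdopen(fd, "wb") as f:
        f.write("caf\u00e9\nline two\n".encode("utf-8"))
    print("\r\n" in read_with_codecs_binary(path))  # False
    os.remove(path)
```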

HTH
Philip
 

Ryan McGuire

That does indeed fix the problem, thanks! Still seems like the docs
are wrong though.
 

Chris Rebert

Yeah, the need to specify "b" does seem rather incongruous:

codecs.open(filename, mode[, encoding[, errors[, buffering]]])
[...]
Note: Files are always opened in binary mode, even if no binary
mode was specified. This is done to avoid data loss due to encodings
using 8-bit values. This means that no automatic conversion of b'\n'
is done on reading and writing.
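An alternative not raised in this thread: io.open() (available since Python 2.6, and the same function as the built-in open() on Python 3) accepts a "newline" argument. With newline="", lines are still recognised on "\n", "\r" or "\r\n", but line endings are returned untranslated. A minimal sketch with a hypothetical helper name:

```python
import io
import os
import tempfile


def read_text_keep_newlines(path, encoding="utf-8"):
    # Hypothetical helper. newline="" disables newline translation on input:
    # "\r\n", "\r" and "\n" all come back exactly as stored in the file.
    with io.open(path, "r", encoding=encoding, newline="") as f:
        return f.read()


if __name__ == "__main__":
    fd, path = tempfile.mkstemp(suffix=".txt")
    with os.fdopen(fd, "wb") as f:
        f.write(b"one\r\ntwo\n")
    print(repr(read_text_keep_newlines(path)))       # 'one\r\ntwo\n'
    with io.open(path, "r", encoding="utf-8") as f:  # default newline=None
        print(repr(f.read()))                        # 'one\ntwo\n'
    os.remove(path)
```

The second print is only for comparison: with the default newline=None, every line ending is translated to "\n" on read.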

Perhaps file a bug? http://bugs.python.org/

Cheers,
Chris
 

Chris Rebert

Ah, I see you already did: http://bugs.python.org/issue6788

- Chris