codecs.open on Win32 -- converting my newlines to CR+LF

Ryan McGuire · Aug 27, 2009

I've got a UTF-8 encoded text file from Linux with standard newlines
("\n").

I'm reading this file on Win32 with Python 2.6:

codecs.open("whatever.txt","r","utf-8").read()

Inexplicably, all the newlines ("\n") are replaced with CR+LF
("\r\n") ... Why?

As a workaround I'm having to do this:

open("whatever.txt","r").read().decode("utf-8")

which appropriately does not alter my newlines.
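
A minimal sketch of a related workaround, assuming the goal is to get the text back with its line endings exactly as stored: read the raw bytes in binary mode and decode them explicitly. The helper name `read_text_exact` and the temporary sample file are hypothetical, used only for illustration.

```python
import os
import tempfile


def read_text_exact(path, encoding="utf-8"):
    """Return the decoded contents of path without newline translation."""
    # Binary mode hands back the bytes exactly as they are stored on disk,
    # so no platform-specific line-ending conversion can take place.
    with open(path, "rb") as f:
        raw = f.read()
    # Decoding is a separate, explicit step and does not touch "\n".
    return raw.decode(encoding)


# Hypothetical sample file with bare LF line endings, written in binary mode.
fd, demo_path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as out:
    out.write(b"alpha\nbeta\n")

decoded = read_text_exact(demo_path)
os.remove(demo_path)
```

Because the file is opened with "rb" rather than "r", the result should not depend on the platform's text-mode behaviour.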

What really gets me confused though is the Python docs for
codecs.open:

"Files are always opened in binary mode, even if no binary mode was
specified. This is done to avoid data loss due to encodings using
8-bit values. This means that no automatic conversion of '\n' is done on
reading and writing."

The way I read that, codecs.open should not touch my newlines. What am
I doing wrong? Is this a bug in Python, or in the docs, or both?

Philip Semanchuk · Aug 27, 2009

I've got a UTF-8 encoded text file from Linux with standard newlines
("\n").

I'm reading this file on Win32 with Python 2.6:

codecs.open("whatever.txt","r","utf-8").read()

Inexplicably, all the newlines ("\n") are replaced with CR+LF
("\r\n") ... Why?

Try using "rb" instead of "r" for the mode in the call to open().

HTH
Philip
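
The suggestion above can be sketched as follows, assuming a UTF-8 file with bare "\n" line endings. The helper name `read_utf8_untranslated` and the temporary sample file are hypothetical; only `codecs.open` and the "rb" mode come from the thread.

```python
import codecs
import os
import tempfile


def read_utf8_untranslated(path):
    """Read a UTF-8 file via codecs.open, asking for binary mode explicitly."""
    # "rb" instead of "r": the underlying file is opened in binary mode,
    # while the codec layer still decodes the bytes to text.
    f = codecs.open(path, "rb", "utf-8")
    try:
        return f.read()
    finally:
        f.close()


# Hypothetical sample file with bare LF line endings, written in binary mode
# so that nothing is translated when the file is created.
fd, sample_path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as out:
    out.write(u"first line\nsecond line\n".encode("utf-8"))

text = read_utf8_untranslated(sample_path)
os.remove(sample_path)
```

If the explicit "b" behaves as described in the thread, `text` should contain only "\n" line endings, on Win32 as well as on Linux.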

Ryan McGuire · Aug 27, 2009

Try using "rb" instead of "r" for the mode in the call to open().

HTH
Philip

That does indeed fix the problem, thanks! Still seems like the docs
are wrong though.

Chris Rebert · Aug 27, 2009

That does indeed fix the problem, thanks! Still seems like the docs
are wrong though.

Yeah, the need to specify "b" does seem rather incongruous:

codecs.open(filename, mode[, encoding[, errors[, buffering]]])
[...]
Note: Files are always opened in binary mode, even if no binary
mode was specified. This is done to avoid data loss due to encodings
using 8-bit values. This means that no automatic conversion of b'\n'
is done on reading and writing.

File a bug perhaps?: http://bugs.python.org/

Cheers,
Chris

Chris Rebert · Aug 27, 2009

That does indeed fix the problem, thanks! Still seems like the docs
are wrong though.

Yeah, the need to specify "b" does seem rather incongruous:

codecs.open(filename, mode[, encoding[, errors[, buffering]]])
[...]
Note: Files are always opened in binary mode, even if no binary
mode was specified. This is done to avoid data loss due to encodings
using 8-bit values. This means that no automatic conversion of b'\n'
is done on reading and writing.

File a bug perhaps?: http://bugs.python.org/

Ah, I see you already did: http://bugs.python.org/issue6788

- Chris
