codecs.open on Win32 -- converting my newlines to CR+LF

Discussion in 'Python' started by Ryan McGuire, Aug 27, 2009.

  1. Ryan McGuire

    Ryan McGuire Guest

    I've got a UTF-8 encoded text file from Linux with standard newlines
    ("\n").

    I'm reading this file on Win32 with Python 2.6:

    codecs.open("whatever.txt","r","utf-8").read()

    Inexplicably, all the newlines ("\n") are replaced with CR+LF
    ("\r\n") ... Why?

    As a workaround I'm having to do this:

    open("whatever.txt","r").read().decode("utf-8")

    which appropriately does not alter my newlines.
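
A minimal sketch of the same decode-after-read idea, written for Python 3 rather than the Python 2.6 used in this thread. The helper name `read_utf8_preserving_newlines` and the temporary sample file are hypothetical, introduced only for illustration. Unlike the workaround above, it opens the file with "rb", so no text-mode newline handling can take place on any platform before the bytes are decoded.

```python
import os
import tempfile


def read_utf8_preserving_newlines(path):
    """Read raw bytes in binary mode, then decode them as UTF-8.

    Binary mode performs no newline translation, so the returned
    string contains exactly the line endings stored on disk.
    """
    with open(path, "rb") as f:
        return f.read().decode("utf-8")


# Hypothetical sample file with Unix newlines and one non-ASCII character.
fd, sample_path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as f:
    f.write(u"first line\nsecond line \u00e9\n".encode("utf-8"))

text = read_utf8_preserving_newlines(sample_path)
os.remove(sample_path)

# repr() makes any "\r" characters visible if they are present.
print(repr(text))
```

Printing the `repr()` of the result is a quick way to confirm whether any "\r" characters are actually present in what was read.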

    What really gets me confused though is the Python docs for
    codecs.open:

    "Files are always opened in binary mode, even if no binary mode was
    specified. This is done to avoid data loss due to encodings using
    8-bit values. This means that no automatic conversion of '\n' is done
    on reading and writing."

    The way I read that, codecs.open should not touch my newlines. What am
    I doing wrong? Is this a bug in Python, or in the docs, or both?
    Ryan McGuire, Aug 27, 2009
    #1

  2. Philip Semanchuk

    On Aug 26, 2009, at 10:52 PM, Ryan McGuire wrote:

    > I've got a UTF-8 encoded text file from Linux with standard newlines
    > ("\n").
    >
    > I'm reading this file on Win32 with Python 2.6:
    >
    > codecs.open("whatever.txt","r","utf-8").read()
    >
    > Inexplicably, all the newlines ("\n") are replaced with CR+LF
    > ("\r\n") ... Why?


    Try using "rb" instead of "r" for the mode in the call to open().
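
A short sketch of this suggestion, assuming Python 3 rather than the Python 2.6 from the original post. The temporary file and the variable names are hypothetical. It passes "rb" to `codecs.open` explicitly together with the encoding, so the file is read in binary mode and the decoded text keeps the newlines that are on disk.

```python
import codecs
import os
import tempfile

# Hypothetical sample file containing only "\n" line endings.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as f:
    f.write(b"alpha\nbeta\n")

# Explicit binary mode plus an encoding: bytes are read untranslated
# and decoded to text by the UTF-8 codec.
with codecs.open(path, "rb", "utf-8") as f:
    decoded = f.read()

os.remove(path)
print(repr(decoded))
```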

    HTH
    Philip
    Philip Semanchuk, Aug 27, 2009
    #2

  3. Ryan McGuire

    Ryan McGuire Guest

    On Aug 26, 11:04 pm, Philip Semanchuk <> wrote:
    > Try using "rb" instead of "r" for the mode in the call to open().
    >
    > HTH
    > Philip


    That does indeed fix the problem, thanks! Still seems like the docs
    are wrong though.
    Ryan McGuire, Aug 27, 2009
    #3
  4. Chris Rebert

    Chris Rebert Guest

    On Wed, Aug 26, 2009 at 8:40 PM, Ryan McGuire<> wrote:
    > On Aug 26, 11:04 pm, Philip Semanchuk <> wrote:
    >> Try using "rb" instead of "r" for the mode in the call to open().
    >>
    >> HTH
    >> Philip

    >
    > That does indeed fix the problem, thanks! Still seems like the docs
    > are wrong though.


    Yeah, the need to specify "b" does seem rather incongruous:

    codecs.open(filename, mode[, encoding[, errors[, buffering]]])
    [...]
    Note: Files are always opened in binary mode, even if no binary
    mode was specified. This is done to avoid data loss due to encodings
    using 8-bit values. This means that no automatic conversion of b'\n'
    is done on reading and writing.
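
For comparison, a sketch of how newline handling can be controlled explicitly with `io.open`, which is not mentioned in this thread and is offered here only as an assumption about what a reader might try instead. Its `newline` parameter decides whether line endings are translated on reading. The sample file and variable names are hypothetical, and the code targets Python 3.

```python
import io
import os
import tempfile

# Hypothetical sample file mixing "\n" and "\r\n" line endings.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as f:
    f.write(b"one\ntwo\r\nthree\n")

# newline="" keeps the line endings exactly as they are stored.
with io.open(path, "r", encoding="utf-8", newline="") as f:
    untranslated = f.read()

# The default (newline=None) translates "\r\n" and "\r" to "\n".
with io.open(path, "r", encoding="utf-8") as f:
    translated = f.read()

os.remove(path)
print(repr(untranslated))
print(repr(translated))
```

Comparing the two results shows whether any translation is happening in the reading layer or whether the "\r" characters were already in the file.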

    File a bug perhaps?: http://bugs.python.org/

    Cheers,
    Chris
    --
    http://blog.rebertia.com
    Chris Rebert, Aug 27, 2009
    #4
  5. Chris Rebert

    Chris Rebert Guest

    On Wed, Aug 26, 2009 at 11:06 PM, Chris Rebert<> wrote:
    > On Wed, Aug 26, 2009 at 8:40 PM, Ryan McGuire<> wrote:
    >> On Aug 26, 11:04 pm, Philip Semanchuk <> wrote:
    >>> Try using "rb" instead of "r" for the mode in the call to open().
    >>>
    >>> HTH
    >>> Philip

    >>
    >> That does indeed fix the problem, thanks! Still seems like the docs
    >> are wrong though.

    >
    > Yeah, the need to specify "b" does seem rather incongruous:
    >
    > codecs.open(filename, mode[, encoding[, errors[, buffering]]])
    >    [...]
    >    Note: Files are always opened in binary mode, even if no binary
    > mode was specified. This is done to avoid data loss due to encodings
    > using 8-bit values. This means that no automatic conversion of b'\n'
    > is done on reading and writing.
    >
    > File a bug perhaps?: http://bugs.python.org/


    Ah, I see you already did: http://bugs.python.org/issue6788

    - Chris
    Chris Rebert, Aug 27, 2009
    #5