codecs.open() doesn't handle platform-specific line terminator

Discussion in 'Python' started by John Machin, May 10, 2011.

    According to the 3.2 docs
    (http://docs.python.org/py3k/library/codecs.html#codecs.open),

    """Files are always opened in binary mode, even if no binary mode was
    specified. This is done to avoid data loss due to encodings using 8-bit
    values. This means that no automatic conversion of b'\n' is done on
    reading and writing."""

    The first point is that one would NOT expect "conversion of b'\n'" anyway.
    One expects '\n' -> os.linesep.encode(the_encoding) on writing and vice
    versa on reading, i.e. translation of the text before encoding, not of
    the encoded bytes.
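To make the distinction concrete, here is a minimal sketch contrasting translation of the encoded bytes with translation of the text before encoding. The helper names are hypothetical, and the Windows terminator "\r\n" is hard-coded as an assumption so the result is the same on any platform.

```python
def translate_bytes_naively(text, encoding, linesep="\r\n"):
    # Encode first, then replace every b'\n' byte: this ignores code-unit
    # boundaries in multi-byte encodings such as UTF-16.
    return text.encode(encoding).replace(b"\n", linesep.encode("ascii"))


def translate_text_then_encode(text, encoding, linesep="\r\n"):
    # Replace the '\n' character first, then encode the result.
    return text.replace("\n", linesep).encode(encoding)


naive = translate_bytes_naively("a\n", "utf-16-le")
proper = translate_text_then_encode("a\n", "utf-16-le")

print(naive)   # odd number of bytes, no longer valid UTF-16LE
print(proper)  # each character of 'a\r\n' encoded as two bytes

# U+0A0A encodes to two 0x0A bytes in UTF-16LE, so byte-level
# replacement silently turns it into different characters.
corrupted = translate_bytes_naively("\u0a0a", "utf-16-le").decode("utf-16-le")
print(ascii(corrupted))
```

Byte-level replacement of this kind is one plausible reading of the "data loss" the docs warn about; the docs themselves do not spell it out.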

    The second point is that there is no such restriction with the built-in
    open(), which appears to work as expected, doing (e.g. Windows, UTF-16LE)
    '\n' -> b'\r\x00\n\x00' when writing and vice versa on reading, and not
    striking out when thrown curve balls like '\u0a0a'.
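The behaviour described for the built-in open() can be reproduced on any platform with a sketch like the one below. It passes newline="\r\n" explicitly when writing to stand in for the Windows default (an assumption made for portability), and the function name is hypothetical.

```python
import os
import tempfile


def roundtrip_with_builtin_open(text, encoding="utf-16-le"):
    """Write text with Windows-style newline translation, then return
    (raw bytes on disk, text read back with universal newlines)."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        # newline="\r\n" makes '\n' become '\r\n' before encoding,
        # which is what text mode does by default on Windows.
        with open(path, "w", encoding=encoding, newline="\r\n") as f:
            f.write(text)
        with open(path, "rb") as f:
            raw = f.read()
        # Default newline=None: '\r\n' is translated back to '\n' after decoding.
        with open(path, "r", encoding=encoding) as f:
            decoded = f.read()
    finally:
        os.remove(path)
    return raw, decoded


raw, decoded = roundtrip_with_builtin_open("a\n\u0a0a\n")
print(raw)
print(ascii(decoded))
```

The character '\u0a0a' is written as two 0x0A bytes and comes back unchanged, because the translation operates on characters rather than on encoded bytes.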

    Why is codecs.open() different? What does "encodings using 8-bit values"
    mean? What data loss?
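For comparison, a minimal sketch of what codecs.open() does with the same kind of data. The helper name is hypothetical. Warnings are silenced inside the helper on the assumption that some Python versions warn when codecs.open() is called; that does not change the result.

```python
import codecs
import os
import tempfile
import warnings


def codecs_roundtrip(text, raw_input, encoding="utf-16-le"):
    """Return (bytes codecs.open() writes for text,
    str codecs.open() reads back from raw_input)."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with warnings.catch_warnings():
            warnings.simplefilter("ignore")  # ignore any warnings from codecs.open()
            with codecs.open(path, "w", encoding=encoding) as f:
                f.write(text)
            with open(path, "rb") as f:
                written = f.read()
            # Overwrite the file with bytes as a Windows text file would hold them.
            with open(path, "wb") as f:
                f.write(raw_input)
            with codecs.open(path, "r", encoding=encoding) as f:
                read_back = f.read()
    finally:
        os.remove(path)
    return written, read_back


written, read_back = codecs_roundtrip("a\n", b"a\x00\r\x00\n\x00")
print(written)           # '\n' is encoded as-is, with no '\r' added
print(ascii(read_back))  # the '\r' is still present after decoding
```

Since the underlying file is opened in binary mode, the result is the same on every platform: no terminator translation in either direction.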
     
    John Machin, May 10, 2011