codecs.open() doesn't handle platform-specific line terminator

John Machin

According to the 3.2 docs
(http://docs.python.org/py3k/library/codecs.html#codecs.open),

"""Files are always opened in binary mode, even if no binary mode was
specified. This is done to avoid data loss due to encodings using 8-bit
values. This means that no automatic conversion of b'\n' is done on
reading and writing."""

The first point is that one would NOT expect "conversion of b'\n'" anyway.
One expects '\n' -> os.linesep.encode(the_encoding) on writing and vice versa
on reading.
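
To make that expectation concrete, here is a minimal sketch. The helper names encode_with_linesep and decode_with_linesep are hypothetical, invented for illustration, and the linesep argument is passed explicitly as '\r\n' so the result does not depend on the platform running it:

```python
import os

def encode_with_linesep(text, encoding, linesep=os.linesep):
    # Translate at the str level first, then encode the whole string.
    return text.replace('\n', linesep).encode(encoding)

def decode_with_linesep(data, encoding, linesep=os.linesep):
    # Decode first, then map the platform terminator back to '\n'.
    return data.decode(encoding).replace(linesep, '\n')

# Simulate Windows explicitly (linesep='\r\n') with UTF-16LE.
raw = encode_with_linesep('a\nb', 'utf-16-le', linesep='\r\n')
restored = decode_with_linesep(raw, 'utf-16-le', linesep='\r\n')
print(raw)
print(restored == 'a\nb')
```

The point of the sketch is only the ordering: the line terminator is handled as text, so the encoding never sees a bare b'\n' that needs "converting".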

The second point is that there is no such restriction with the built-in
open(), which appears to work as expected, doing (e.g. Windows, UTF-16LE)
'\n' -> b'\r\x00\n\x00' when writing and vice versa on reading, and not
tripping up on awkward input like '\u0a0a', which encodes in UTF-16LE to
b'\n\n' without containing any line terminator.
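
A rough way to compare the two functions side by side, assuming Python 3 and using newline='\r\n' with the built-in open() to imitate the Windows default on any platform (the file names and the sample text are made up for the comparison):

```python
import codecs
import os
import tempfile

text = 'x\n\u0a0a\n'

with tempfile.TemporaryDirectory() as tmp:
    path_builtin = os.path.join(tmp, 'builtin.txt')
    path_codecs = os.path.join(tmp, 'codecs.txt')

    # newline='\r\n' forces Windows-style translation of '\n' on write.
    with open(path_builtin, 'w', encoding='utf-16-le', newline='\r\n') as f:
        f.write(text)
    with open(path_builtin, 'rb') as f:
        builtin_bytes = f.read()

    # codecs.open() uses a binary file underneath, so '\n' is not translated.
    with codecs.open(path_codecs, 'w', encoding='utf-16-le') as f:
        f.write(text)
    with open(path_codecs, 'rb') as f:
        codecs_bytes = f.read()

    # Reading back in text mode (universal newlines) restores the text.
    with open(path_builtin, 'r', encoding='utf-16-le', newline=None) as f:
        round_trip = f.read()

print(builtin_bytes)
print(codecs_bytes)
print(round_trip == text)
```

In the built-in open() output each '\n' becomes b'\r\x00\n\x00' while the b'\n\n' that encodes '\u0a0a' is left alone; the codecs.open() output contains no b'\r' at all.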

Why is codecs.open() different? What does "encodings using 8-bit values"
mean? What data loss?
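
One possible reading of the "data loss" remark, offered as a guess rather than anything the docs state: if a file were opened in a text mode that rewrites b'\n' as b'\r\n' at the byte level, after encoding, multi-byte encodings would be damaged. A small sketch of that hypothetical byte-level translation:

```python
original = 'a\n\u0a0a'
encoded = original.encode('utf-16-le')

# Hypothetical byte-level translation, applied after encoding
# (this is NOT what the built-in open() does in Python 3).
mangled = encoded.replace(b'\n', b'\r\n')

# The byte count is now odd, so strict UTF-16 decoding cannot succeed;
# errors='replace' is used here only to show that the text has changed.
damaged = mangled.decode('utf-16-le', errors='replace')

print(len(encoded), len(mangled))
print(damaged == original)
```

If that reading is right, it explains avoiding byte-level translation, but not why codecs.open() skips the text-level translation that the built-in open() performs.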
 
