John Machin
According to the 3.2 docs
(http://docs.python.org/py3k/library/codecs.html#codecs.open),
"""Files are always opened in binary mode, even if no binary mode was
specified. This is done to avoid data loss due to encodings using 8-bit
values. This means that no automatic conversion of b'\n' is done on
reading and writing."""
The first point is that one would NOT expect "conversion of b'\n'" anyway.
One expects '\n' -> os.linesep.encode(the_encoding) on writing and vice versa
on reading.
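To make the first point concrete, here is a minimal sketch. The helper name expected_newline_bytes and the variable names are made up for illustration, and the Windows line separator is passed in explicitly so the result does not depend on the platform running it. It contrasts the translation one would expect with a naive byte-level replacement of b'\n':

```python
import os


def expected_newline_bytes(encoding, linesep=os.linesep):
    # Hypothetical helper: the bytes one would expect a text-mode
    # write of "\n" to produce for the given encoding.
    return linesep.encode(encoding)


# Expected translation, done at the text level and then encoded.
windows_utf16 = expected_newline_bytes("utf-16-le", "\r\n")
posix_utf8 = expected_newline_bytes("utf-8", "\n")

# Naive translation, done on the already-encoded bytes.
naive = "a\n".encode("utf-16-le").replace(b"\n", b"\r\n")
correct = "a\r\n".encode("utf-16-le")

print(windows_utf16)  # b'\r\x00\n\x00'
print(posix_utf8)     # b'\n'
print(naive)          # b'a\x00\r\n\x00'
print(correct)        # b'a\x00\r\x00\n\x00'
```

The naive result is not valid UTF-16LE for 'a\r\n', which is why nobody would want "conversion of b'\n'" in the first place.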
The second point is that there is no such restriction with the built-in
open(), which appears to work as expected, doing (e.g. Windows, UTF-16LE)
'\n' -> b'\r\x00\n\x00' when writing and vice versa on reading, and not
striking out when thrown curve balls like '\u0a0a' (which UTF-16 encodes
as b'\n\n').
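A rough way to check the second point on any platform is sketched below. The function names roundtrip_builtin and write_codecs are invented for this example, newline='\r\n' is passed to the built-in open() to imitate the Windows behaviour, and the DeprecationWarning filter is only there on the assumption that newer Python versions warn about codecs.open():

```python
import codecs
import os
import tempfile
import warnings

TEXT = "a\n\u0a0ab\n"  # includes the '\u0a0a' curve ball


def roundtrip_builtin(text, encoding="utf-16-le"):
    # Write with the built-in open(), forcing Windows-style newlines,
    # then return the raw bytes on disk and the text read back.
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, "w", encoding=encoding, newline="\r\n") as f:
            f.write(text)
        with open(path, "rb") as f:
            raw = f.read()
        with open(path, "r", encoding=encoding) as f:
            decoded = f.read()
    finally:
        os.remove(path)
    return raw, decoded


def write_codecs(text, encoding="utf-16-le"):
    # Write with codecs.open(), which opens the file in binary mode,
    # then return the raw bytes on disk.
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with warnings.catch_warnings():
            # Assumption: codecs.open() may warn on newer Pythons.
            warnings.simplefilter("ignore", DeprecationWarning)
            with codecs.open(path, "w", encoding=encoding) as f:
                f.write(text)
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.remove(path)


raw_builtin, decoded = roundtrip_builtin(TEXT)
raw_codecs = write_codecs(TEXT)
print(raw_builtin)
print(decoded == TEXT)
print(raw_codecs)
```

With the built-in open(), each '\n' should come out as b'\r\x00\n\x00', the b'\n\n' bytes of '\u0a0a' should be left alone, and reading should restore the original text. With codecs.open(), the '\n' should be written untranslated.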
Why is codecs.open() different? What does "encodings using 8-bit values"
mean? What data loss?