pekka niiranen
Hi there,
I have two files "my.utf8" and "my.utf16" which
both contain a BOM and two "a" characters.
Contents of "my.utf8" in HEX:
EFBBBF6161
Contents of "my.utf16" in HEX:
FEFF6161
For some reason Python 2.4 keeps the BOM in the decoded text for UTF-8
but not for UTF-16. See below:
>>> import codecs
>>> fh = codecs.open("my.utf8", "rb", "utf8")
>>> fh.readlines()
[u'\ufeffaa']    # BOM is decoded, why?
>>> fh.close()
>>> fh = codecs.open("my.utf16", "rb", "utf16")
>>> fh.readlines()
[u'\u6161']      # No BOM here
>>> fh.close()
Is there a trick to read a UTF-8 encoded file so that the BOM does not
end up in the decoded text?
-pekka-
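
For reference, here is a minimal sketch that reproduces the behaviour above in memory and shows two possible ways around it. It is written for current Python 3 rather than the Python 2.4 used in the session, so it is an illustration and not a drop-in fix. The "utf-8-sig" codec is part of the standard library from Python 2.5 onwards, so it would not be available on 2.4 itself. The helper strip_bom is a hypothetical name introduced only for this example.

```python
# Bytes taken from the hex dumps in the post.
utf8_bytes = bytes.fromhex("EFBBBF6161")
utf16_bytes = bytes.fromhex("FEFF6161")

# The plain "utf-8" codec treats the BOM as an ordinary character (U+FEFF),
# so it shows up at the start of the decoded text.
with_bom = utf8_bytes.decode("utf-8")

# The "utf-8-sig" codec (Python 2.5 and later) drops a leading BOM if present.
without_bom = utf8_bytes.decode("utf-8-sig")


def strip_bom(text):
    """Hypothetical helper: remove one leading U+FEFF, if there is one."""
    if text.startswith("\ufeff"):
        return text[1:]
    return text


# The "utf-16" codec uses the BOM to pick the byte order and then discards it.
# The remaining bytes 61 61 form a single 16-bit code unit, U+6161.
utf16_text = utf16_bytes.decode("utf-16")

print(repr(with_bom), repr(without_bom), repr(strip_bom(with_bom)), repr(utf16_text))
```

On an interpreter without "utf-8-sig", the manual approach in strip_bom (adapted to unicode literals) is probably the simplest option. Note also that the UTF-16 file holds only one character after its BOM, because the two bytes 61 61 are read as a single code unit.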