Achim Domma
Hi,
I read some text from a utf-8 encoded text file like this:
text = codecs.open('example.txt','r','utf8').read()
If I pass this text to a COM object, I can see that the BOM, which marks
the file as UTF-8, is still at the start of the string. Simply removing the
first character of the string is not OK, because the BOM is optional. So I
tried something like this:
if text.startswith(codecs.BOM_UTF8):
    print "found BOM"
but then I get the following error:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xef in position 0: ordinal not in range(128)
What's the right way to remove the BOM from the string?
regards,
Achim
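A possible way to approach this, offered as a sketch rather than a definitive answer: the error most likely comes from comparing a decoded (unicode) string against codecs.BOM_UTF8, which is a byte string, so Python 2 tries to decode those bytes as ASCII and fails on 0xef. Comparing against the decoded BOM character (U+FEFF) avoids that. The names BOM, strip_bom and read_utf8 below are made up for illustration, and read_utf8 assumes a Python version that ships the 'utf-8-sig' codec (2.5 or later).

```python
import codecs
import io

# Decoded form of the UTF-8 BOM: the single character U+FEFF.
BOM = codecs.BOM_UTF8.decode('utf-8')


def strip_bom(text):
    """Return text without a leading U+FEFF, if one is present."""
    if text.startswith(BOM):
        return text[len(BOM):]
    return text


def read_utf8(path):
    """Read a UTF-8 file, letting the codec skip an optional BOM."""
    # 'utf-8-sig' decodes UTF-8 and drops a leading BOM if there is one.
    with io.open(path, 'r', encoding='utf-8-sig') as f:
        return f.read()
```

With this, strip_bom(text) could be applied to the string that was already read, or read_utf8('example.txt') could replace the codecs.open call so the BOM never reaches the COM object.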