Jason Diamond
Hi.
Is it possible to decode a UTF-8 (with or without a BOM), UTF-16 (BE or
LE with a BOM), or UTF-32 (BE or LE with a BOM) byte stream without
knowing what encoding the stream is in?
I know how to use the codecs module to get StreamReader classes that can
decode a specific encoding, but I have to know what that encoding is
beforehand.
If I read up to four bytes from the byte stream, I can figure out what
encoding the stream is in but that has problems for UTF-8 streams
without BOMs--I would have just eaten one or more bytes that might need
to be decoded by the StreamReader. I could seek back to the beginning of
the stream but what if the file-like object I was reading from didn't
support seeking?
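To make the four-byte idea concrete, here is a rough sketch of what I have in mind. It is not a definitive implementation. The names sniff and decode_stream are made up for illustration. It assumes that a stream without a BOM is UTF-8, and that read(4) returns four bytes unless the stream is shorter than that. Instead of seeking back, the bytes already read are handed to an incremental decoder from the codecs module.

```python
import codecs
import io

# UTF-32 is tested before UTF-16 because the UTF-32-LE BOM (FF FE 00 00)
# begins with the UTF-16-LE BOM (FF FE). A UTF-16-LE stream whose first
# character is U+0000 would be misdetected; this sketch ignores that case.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff(head):
    """Return (encoding, bom_length) for the first bytes of a stream."""
    for bom, name in _BOMS:
        if head.startswith(bom):
            return name, len(bom)
    # Assumption: no recognizable BOM means UTF-8.
    return "utf-8", 0


def decode_stream(stream, chunk_size=4096):
    """Yield decoded text from a binary stream without ever seeking."""
    # Assumption: read(4) only returns fewer than 4 bytes at end of stream.
    head = stream.read(4)
    encoding, bom_len = sniff(head)
    decoder = codecs.getincrementaldecoder(encoding)()
    # Feed the sniffed bytes (minus the BOM) to the decoder so nothing is lost.
    text = decoder.decode(head[bom_len:])
    if text:
        yield text
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        text = decoder.decode(chunk)
        if text:
            yield text
    # Flush the decoder; raises if the stream ended mid-character.
    text = decoder.decode(b"", final=True)
    if text:
        yield text
```

Usage would be something like "".join(decode_stream(f)) for any file-like object f opened in binary mode. The incremental decoder buffers partial multi-byte sequences, so it does not matter where the four sniffed bytes happen to split a character.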
Thanks.
-- Jason