Chris
My SAX parser is choking on UTF-8 encoded files (a "document root element is
missing" error). The problem is three bytes that appear at the beginning of
each file:
0xEF 0xBB 0xBF
If I delete the bytes, the problem goes away.
I'm accessing the file by using a FileInputStream and then wrapping it in a
SAX InputSource. My guess is that the InputSource is converting bytes to
chars using the platform's default encoding, rather than UTF-8.
Is there any existing InputSource class or Reader class that will
automatically detect UTF-8 and decode the bytes to chars correctly? Or do I
have to write my own Reader class to do it?
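
For reference, the bytes 0xEF 0xBB 0xBF are the UTF-8 encoding of U+FEFF, the byte order mark (BOM). One possible workaround, assuming the parser in use does not skip the BOM by itself, is to peek at the first three bytes, drop them if they match, and tell the InputSource explicitly that the stream is UTF-8. The class name BomSkipper and its two helper methods are made up for this sketch; only PushbackInputStream and InputSource come from the standard libraries.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of SAX or the JDK.
public class BomSkipper {

    /** Returns a stream positioned just after a leading UTF-8 BOM, if one is present. */
    static InputStream skipUtf8Bom(InputStream in) throws IOException {
        PushbackInputStream pin = new PushbackInputStream(in, 3);
        byte[] head = new byte[3];

        // read() may return fewer bytes than requested, so loop until 3 or EOF.
        int n = 0;
        while (n < 3) {
            int r = pin.read(head, n, 3 - n);
            if (r < 0) {
                break;
            }
            n += r;
        }

        boolean bom = n == 3
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF;

        // No BOM: push back whatever was read so the caller sees the full stream.
        if (!bom && n > 0) {
            pin.unread(head, 0, n);
        }
        return pin;
    }

    /** Wraps a byte stream in an InputSource, declaring UTF-8 explicitly (an assumption about the file). */
    static InputSource utf8Source(InputStream raw) throws IOException {
        InputSource src = new InputSource(skipUtf8Bom(raw));
        src.setEncoding("UTF-8");
        return src;
    }

    public static void main(String[] args) throws IOException {
        // In-memory stand-in for a file that starts with a BOM.
        byte[] doc = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, '<', 'a', '/', '>'};
        InputStream in = skipUtf8Bom(new ByteArrayInputStream(doc));
        System.out.println((char) in.read()); // first byte after the BOM
    }
}
```

With a real file, the FileInputStream would be passed to utf8Source(...) and the result handed to the parser. Forcing UTF-8 this way is only appropriate when the files are known to be UTF-8; for mixed encodings, leaving setEncoding out and letting the parser read the XML declaration is the safer choice.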