Problem using local xhtml DTD when parsing file with DocumentBuilder


Ryan McFall

Hi:

I've got some XHTML documents in which I use the classes in
javax.xml.xpath to find certain tags. These documents contain a DOCTYPE
declaration for XHTML with a public identifier. Since my application
needs to work without a network connection, I've downloaded the DTD
and its associated entity files and made them available to my application
as resources. I then set an EntityResolver on the DocumentBuilder that I
create via DocumentBuilderFactory.newInstance(). Here's the relevant
code from the resolveEntity method:

URL url = getClass().getResource(identifierMap.get(publicId));
return new InputSource(url.toString());

When I run the application, I get the following message from the
parser:
com.sun.org.apache.xerces.internal.impl.io.MalformedByteSequenceException:
Invalid byte 1 of 1-byte UTF-8 sequence.

After browsing around a bit, I tried:

URL url = getClass().getResource(identifierMap.get(publicId));
FileReader reader = new FileReader(new File(url.toURI()));
return new InputSource(reader);

but this had the same problem.

I downloaded the files from the W3C site, both by using FireFox and by
using wget. In both cases I get the same behavior.

I don't know much about character encodings, so I'm at a loss as to
what to try next. Any suggestions would be greatly appreciated.

Ryan
 

Lew

Ryan said:
> Hi:
>
> I've got some XHTML documents that I'm using the classes in
> java.xml.xpath to find certain tags. These documents contain a DTD
> [...]
> I get the following message from the parser:
> com.sun.org.apache.xerces.internal.impl.io.MalformedByteSequenceException:
> Invalid byte 1 of 1-byte UTF-8 sequence.

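Ideally, all XML documents should be in UTF-8 encoding. Apparently the DTD or
your XML file isn't. A document that isn't stored as UTF-8 (or UTF-16) must
say so in its XML declaration, e.g. <?xml version="1.0" encoding="ISO-8859-1"?>;
otherwise the parser assumes UTF-8.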
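To see why the parser reports a malformed UTF-8 sequence, here is a minimal Java sketch. The class name EncodingDeclarationDemo is hypothetical, and ISO-8859-1 is only an assumption about how a mis-saved file might be encoded. It feeds the same Latin-1 bytes to a DocumentBuilder twice: without an XML declaration the parser falls back to UTF-8 and rejects the byte 0xE9, while with encoding="ISO-8859-1" declared it parses cleanly. The exact error text can differ from the one quoted above, because it depends on which bytes follow the bad one.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class EncodingDeclarationDemo {

    // Parses raw bytes the way the parser sees a file on disk: no charset hint
    // is given, so it relies on the XML declaration, or on UTF-8 when there is
    // no declaration (and no byte-order mark).
    public static Document parse(byte[] bytes) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new ByteArrayInputStream(bytes));
    }

    // Returns true if the bytes parse without an exception.
    public static boolean parses(byte[] bytes) {
        try {
            parse(bytes);
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // "café" encoded as ISO-8859-1: the é becomes the single byte 0xE9,
        // which is not a valid UTF-8 sequence on its own.
        String body = "<p>caf\u00e9</p>";

        byte[] noDecl = body.getBytes(StandardCharsets.ISO_8859_1);
        byte[] withDecl = ("<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>" + body)
                .getBytes(StandardCharsets.ISO_8859_1);

        System.out.println("without declaration: " + parses(noDecl));   // false
        System.out.println("with declaration:    " + parses(withDecl)); // true
    }
}
```

For an external DTD or entity file, the equivalent is a text declaration such as <?xml encoding="ISO-8859-1"?> on its first line.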

> After browsing around a bit, I tried:
>
> url = getClass().getResource (identifierMap.get(publicId));
> FileReader reader = new FileReader (new File (url.toURI()));
> return new InputSource (reader);
>
> but this had the same problem.

Have you considered using
<http://java.sun.com/javase/6/docs/a...ava.io.InputStream, java.nio.charset.Charset)>?

This will let you specify the document encoding to match how it's stored.
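A minimal sketch of that suggestion applied to the resolver from the original post, assuming the DTD and entity files sit on the classpath and are all stored in one known charset. CharsetAwareResolver and its decoded helper are hypothetical names; identifierMap plays the same role as in Ryan's code. The FileReader(File) constructor used above decodes with the platform's default charset, whereas InputStreamReader(InputStream, Charset) decodes with whatever charset you pass in.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.charset.Charset;
import java.util.Map;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

// Hypothetical resolver: maps public identifiers to classpath resources and
// decodes each resource with an explicitly chosen charset.
public class CharsetAwareResolver implements EntityResolver {
    private final Map<String, String> identifierMap; // publicId -> resource path
    private final Charset charset;                   // how the files are actually stored

    public CharsetAwareResolver(Map<String, String> identifierMap, Charset charset) {
        this.identifierMap = identifierMap;
        this.charset = charset;
    }

    // Wraps a byte stream in a Reader that decodes with the given charset,
    // so the parser never has to guess the encoding.
    public static InputSource decoded(InputStream in, Charset charset, String systemId) {
        InputSource source = new InputSource(new InputStreamReader(in, charset));
        // The system ID lets the parser resolve relative references (e.g. the
        // .ent files pulled in by the XHTML DTD) against the resource's location.
        source.setSystemId(systemId);
        return source;
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) throws IOException {
        String resource = (publicId == null) ? null : identifierMap.get(publicId);
        URL url = (resource == null) ? null : getClass().getResource(resource);
        if (url == null) {
            return null; // unknown identifier or missing resource: default resolution
        }
        return decoded(url.openStream(), charset, url.toString());
    }
}
```

It would be registered with builder.setEntityResolver(new CharsetAwareResolver(identifierMap, StandardCharsets.ISO_8859_1)), substituting whatever charset the files really use. If you would rather hand the parser bytes, calling InputSource.setEncoding(String) on a byte-stream InputSource is an alternative.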
 

Ryan McFall

Pardon my stupidity: the XML file was saved by someone else, apparently
in an encoding other than UTF-8. Re-saving it as UTF-8 solved my
problem.
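The re-save can also be scripted. A minimal Java sketch, assuming you know (or can reasonably guess) the encoding the file was actually saved in; ResaveAsUtf8 is a hypothetical helper, and windows-1252 in the usage comment is only an example. It decodes the bytes with that charset and writes the same characters back as UTF-8.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class ResaveAsUtf8 {

    // Re-encodes a file in place: decodes it with the charset it was actually
    // saved in, then writes the same characters back out as UTF-8.
    public static void resaveAsUtf8(Path file, Charset actualEncoding) throws IOException {
        byte[] original = Files.readAllBytes(file);
        String text = new String(original, actualEncoding);
        Files.write(file, text.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical usage: path and source encoding come from the command line,
        // e.g. java ResaveAsUtf8 page.xhtml windows-1252
        resaveAsUtf8(Paths.get(args[0]), Charset.forName(args[1]));
    }
}
```

If the file's XML declaration names the old encoding, change it to encoding="UTF-8" (or drop the encoding attribute) after re-saving; otherwise the parser will decode the new bytes incorrectly.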

Ryan
 
