Ryan McFall
Hi:
I've got some XHTML documents in which I'm using the classes in
javax.xml.xpath to find certain tags. These documents contain a DTD
declaration for XHTML, with a public identifier. Since my application
needs to work without a network connection, I've downloaded the DTD
and associated entities and made them available to my application as
resources. I then set an EntityResolver on the document builder that I
get from DocumentBuilderFactory.newInstance(). Here's the relevant
code from the resolveEntity method:
url = getClass().getResource (identifierMap.get(publicId));
return new InputSource (url.toString());
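For context, a minimal self-contained sketch of this kind of resolver might look like the following. The class name LocalDtdResolver, the helper newOfflineBuilder, and the contents of the map are hypothetical placeholders, not the actual application code. The sketch assumes the DTD files sit on the classpath, and it hands the parser a byte stream (rather than a URL string or a Reader) so that the parser can detect the encoding itself.

```java
import java.io.IOException;
import java.net.URL;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

public class LocalDtdResolver implements EntityResolver {
    // Hypothetical mapping: public identifier -> classpath resource name.
    private final Map<String, String> identifierMap;

    public LocalDtdResolver(Map<String, String> identifierMap) {
        // HashMap tolerates a null key lookup, which matters because
        // publicId may be null for entities declared with only a system ID.
        this.identifierMap = new HashMap<>(identifierMap);
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId)
            throws IOException {
        String resource = identifierMap.get(publicId);
        if (resource == null) {
            return null; // not ours: let the parser use its default lookup
        }
        URL url = getClass().getResource(resource);
        if (url == null) {
            return null; // mapped, but the resource is not on the classpath
        }
        // Byte stream, so the parser reads the encoding from the entity itself.
        InputSource source = new InputSource(url.openStream());
        source.setPublicId(publicId);
        // The system ID lets relative references inside the DTD resolve
        // against the local copy.
        source.setSystemId(url.toString());
        return source;
    }

    // Builds a DocumentBuilder that consults the local copies first.
    public static DocumentBuilder newOfflineBuilder(Map<String, String> map)
            throws ParserConfigurationException {
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        builder.setEntityResolver(new LocalDtdResolver(map));
        return builder;
    }
}
```

Returning null for an unmapped identifier falls back to the parser's default resolution, which would attempt a network fetch; an application that must stay offline may prefer to throw instead.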
When I run the application, I get the following message from the
parser:
com.sun.org.apache.xerces.internal.impl.io.MalformedByteSequenceException:
Invalid byte 1 of 1-byte UTF-8 sequence.
After browsing around a bit, I tried:
url = getClass().getResource (identifierMap.get(publicId));
FileReader reader = new FileReader (new File (url.toURI()));
return new InputSource (reader);
but this had the same problem.
I downloaded the files from the W3C site, both by using Firefox and by
using wget. In both cases I get the same behavior.
I don't know much about character encodings, so I'm at a loss as to
what to try next. Any suggestions would be greatly appreciated.
Ryan