Problem using local xhtml DTD when parsing file with DocumentBuilder

Discussion in 'Java' started by Ryan McFall, Jun 13, 2007.

  1. Ryan McFall

    Ryan McFall Guest

    Hi:

    I've got some XHTML documents in which I'm using the classes in
    javax.xml.xpath to find certain tags. These documents contain a DTD
    declaration for XHTML, with a public identifier. Since my application
    needs to work without a network connection, I've downloaded the DTD
    and associated entities and made them available to my application as
    resources. I then set an EntityResolver on the document builder that I
    get via DocumentBuilderFactory.newInstance(). Here's the relevant
    code from the resolveEntity method:

    url = getClass().getResource (identifierMap.get(publicId));
    return new InputSource (url.toString());

    When I run the application, I get the following message from the
    parser:
    com.sun.org.apache.xerces.internal.impl.io.MalformedByteSequenceException:
    Invalid byte 1 of 1-byte UTF-8 sequence.

    After browsing around a bit, I tried:

    url = getClass().getResource (identifierMap.get(publicId));
    FileReader reader = new FileReader (new File (url.toURI()));
    return new InputSource (reader);

    but this had the same problem.
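
A note on the second attempt: a FileReader decodes bytes with the default charset before the parser sees them, so the parser cannot detect the encoding itself. A resolver can instead hand over a byte stream. The following is only a minimal sketch of that idea; the class name LocalDtdResolver, the resource path /dtd/xhtml1-strict.dtd, and the contents of the map are assumptions for illustration, not part of the original code.

```java
import java.io.InputStream;
import java.util.HashMap;
import java.util.Map;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

public class LocalDtdResolver implements EntityResolver {
    // Hypothetical mapping from public identifier to classpath resource.
    private final Map<String, String> identifierMap = new HashMap<>();

    public LocalDtdResolver() {
        // Assumed resource location; adjust to wherever the DTD is bundled.
        identifierMap.put("-//W3C//DTD XHTML 1.0 Strict//EN",
                          "/dtd/xhtml1-strict.dtd");
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        String resource = identifierMap.get(publicId);
        if (resource == null) {
            return null; // unknown identifier: let the parser use its default
        }
        InputStream in = getClass().getResourceAsStream(resource);
        if (in == null) {
            return null; // resource not bundled: fall back to the default
        }
        // A byte stream leaves encoding detection to the parser.
        InputSource source = new InputSource(in);
        source.setPublicId(publicId);
        // The system id lets relative references inside the DTD resolve.
        source.setSystemId(getClass().getResource(resource).toString());
        return source;
    }
}
```

It would be installed with builder.setEntityResolver(new LocalDtdResolver()) before calling parse.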

    I downloaded the files from the W3C site, both by using Firefox and by
    using wget. In both cases I get the same behavior.

    I don't know much about character encodings, so I'm at a loss as to
    what to try next. Any suggestions would be greatly appreciated.

    Ryan
     
    Ryan McFall, Jun 13, 2007
    #1

  2. Lew

    Lew Guest

    Ideally, all XML documents should be encoded in UTF-8, which a parser
    assumes by default when no encoding is declared. Apparently the DTD or
    your XML file isn't. When a document uses another encoding, its XML
    declaration should specify it.
    Have you considered using
    <http://java.sun.com/javase/6/docs/a...ava.io.InputStream, java.nio.charset.Charset)>
    ?

    This will let you specify the document encoding to match how it's stored.
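
A minimal sketch of that suggestion, assuming the truncated link above points at a Reader built from an InputStream and a Charset (java.io.InputStreamReader has such a constructor). The class name ExplicitEncodingParse and the choice of ISO-8859-1 are assumptions for illustration; the thread does not say which encoding the file actually used.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class ExplicitEncodingParse {
    // Decode the bytes with a caller-supplied charset, then parse characters.
    static Document parse(InputStream in, Charset charset) throws Exception {
        Reader reader = new InputStreamReader(in, charset);
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // With a character stream, the parser does no byte decoding itself.
        return builder.parse(new InputSource(reader));
    }

    public static void main(String[] args) throws Exception {
        // Sample document stored as ISO-8859-1 with no encoding declaration.
        byte[] latin1 = "<p>caf\u00e9</p>".getBytes(StandardCharsets.ISO_8859_1);
        Document doc = parse(new ByteArrayInputStream(latin1),
                             StandardCharsets.ISO_8859_1);
        System.out.println(doc.getDocumentElement().getTextContent());
    }
}
```

This only helps when the real encoding is known; passing the wrong charset yields garbled text rather than an exception.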
     
    Lew, Jun 13, 2007
    #2

  3. Ryan McFall

    Ryan McFall Guest

    Pardon my stupidity: the XML file was saved by someone else, and
    apparently it was saved as something other than UTF-8. Re-saving it
    as UTF-8 solved my problem.
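
The same re-save can be done in code rather than in an editor. This is only a sketch: the class name ReencodeToUtf8 is made up here, and the source charset must be supplied by the caller, since the thread does not say what the file was originally saved as.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class ReencodeToUtf8 {
    // Read the file in its original charset and write it back out as UTF-8.
    static void reencode(Path source, Charset sourceCharset, Path target)
            throws IOException {
        String text = new String(Files.readAllBytes(source), sourceCharset);
        Files.write(target, text.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        // Usage: java ReencodeToUtf8 <input> <source-charset> <output>
        reencode(Paths.get(args[0]), Charset.forName(args[1]),
                 Paths.get(args[2]));
    }
}
```

If the file carries an XML declaration naming the old encoding, that declaration has to be updated as well, or the parser will still try to decode it the old way.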

    Ryan
     
    Ryan McFall, Jun 13, 2007
    #3
