Problem using local xhtml DTD when parsing file with DocumentBuilder

Discussion in 'Java' started by Ryan McFall, Jun 13, 2007.

  1. Ryan McFall

    Ryan McFall Guest


    I've got some XHTML documents in which I'm using the classes in
    javax.xml.xpath to find certain tags. These documents contain a DTD
    declaration for XHTML, with a public identifier. Since my application
    needs to work without a network connection, I've downloaded the DTD
    and associated entities and made them available to my application as
    resources. I then set an EntityResolver on the document builder that I
    get from DocumentBuilderFactory.newInstance(). Here's the relevant
    code from the resolveEntity method:

    url = getClass().getResource (identifierMap.get(publicId));
    return new InputSource (url.toString());

    When I run the application, I get the following error message:

    Invalid byte 1 of 1-byte UTF-8 sequence.

    After browsing around a bit, I tried:

    url = getClass().getResource (identifierMap.get(publicId));
    FileReader reader = new FileReader (new File (url.toURI()));
    return new InputSource (reader);

    but this had the same problem.

    I downloaded the files from the W3C site, both with Firefox and with
    wget. In both cases I get the same behavior.

    I don't know much about character encodings, so I'm at a loss as to
    what to try next. Any suggestions would be greatly appreciated.
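
    For reference, the resolver setup described above can be sketched roughly
    as follows. This is only an illustration under assumptions: the class name
    LocalDtdResolver and the resource paths under /dtd/ are hypothetical, and
    the map holds the public identifiers of XHTML 1.0 Strict and its three
    entity sets. Handing the parser a byte stream (rather than a Reader) leaves
    encoding detection to the parser.

```java
import java.io.InputStream;
import java.util.HashMap;
import java.util.Map;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

// Sketch: map known public identifiers to DTD copies bundled as resources.
class LocalDtdResolver implements EntityResolver {
    // Resource paths are hypothetical; adjust to where the files really live.
    private final Map<String, String> identifierMap = new HashMap<>();

    LocalDtdResolver() {
        identifierMap.put("-//W3C//DTD XHTML 1.0 Strict//EN", "/dtd/xhtml1-strict.dtd");
        identifierMap.put("-//W3C//ENTITIES Latin 1 for XHTML//EN", "/dtd/xhtml-lat1.ent");
        identifierMap.put("-//W3C//ENTITIES Symbols for XHTML//EN", "/dtd/xhtml-symbol.ent");
        identifierMap.put("-//W3C//ENTITIES Special for XHTML//EN", "/dtd/xhtml-special.ent");
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        String resource = identifierMap.get(publicId);
        if (resource == null) {
            return null; // unknown entity: the parser falls back to the system identifier
        }
        InputStream in = getClass().getResourceAsStream(resource);
        if (in == null) {
            return null; // resource not bundled: same fallback as above
        }
        // A byte stream lets the parser work out the encoding on its own.
        InputSource source = new InputSource(in);
        source.setPublicId(publicId);
        source.setSystemId(systemId);
        return source;
    }
}
```

    It would be installed with builder.setEntityResolver(new LocalDtdResolver()).
    Note that returning null sends the parser back to the original system
    identifier, which may mean a network fetch.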

    Ryan McFall, Jun 13, 2007

  2. Lew

    Lew Guest

    Ideally, all XML documents should be in UTF-8 encoding. Apparently the DTD or
    your XML file isn't. When they aren't, the XML declaration should specify the
    encoding. Have you considered using
    <, java.nio.charset.Charset)>

    This will let you specify the document encoding to match how it's stored.
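
    A minimal sketch of that idea, assuming the truncated link above refers to
    the InputStreamReader(InputStream, Charset) constructor (the link target is
    not recoverable, so this is a guess). The class name ExplicitCharsetSource
    is hypothetical, and the caller has to know which charset the file was
    really saved in.

```java
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import org.xml.sax.InputSource;

// Sketch: decode the bytes ourselves with a charset we choose.
class ExplicitCharsetSource {
    static InputSource open(InputStream bytes, Charset charset) {
        // The Reader performs the byte-to-character decoding.
        Reader reader = new InputStreamReader(bytes, charset);
        // Given a character stream, the parser reads it directly and
        // disregards any encoding named in the XML declaration.
        return new InputSource(reader);
    }
}
```

    For a file saved as Latin-1, for example, the call might look like
    builder.parse(ExplicitCharsetSource.open(in, StandardCharsets.ISO_8859_1)).
    By contrast, a FileReader built from a File alone decodes with the platform
    default charset, which may not match the file.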
    Lew, Jun 13, 2007

  3. Ryan McFall

    Ryan McFall Guest

    Pardon my stupidity - the XML file was saved by someone else, and
    apparently it was saved as something other than UTF-8. Re-saving it
    into UTF-8 solved my problem.
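
    For anyone hitting the same error, here is a rough sketch of how one might
    check whether a file's bytes are valid UTF-8 and re-encode them if not. The
    class and method names are hypothetical, and the source charset passed to
    toUtf8 is something the caller must already know; the code does not guess it.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Sketch: strict UTF-8 validation plus a simple re-encode helper.
class Utf8Check {
    // Returns true only if every byte sequence decodes cleanly as UTF-8.
    static boolean isValidUtf8(byte[] data) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    // Decodes with the known source charset, then encodes as UTF-8.
    static byte[] toUtf8(byte[] data, Charset source) {
        return new String(data, source).getBytes(StandardCharsets.UTF_8);
    }
}
```

    The file contents could be loaded with Files.readAllBytes before calling
    isValidUtf8. If the XML declaration names the old encoding, it needs to be
    updated (or removed) after re-encoding, otherwise the parser will decode
    the new bytes with the wrong charset.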

    Ryan McFall, Jun 13, 2007
