parsing XML to DOM, validating against a local DTD, using Xerces under JAXP?


bugbear

Subject pretty much says it all. I'd like to
parse XML (duh!) using Xerces (because it's fast,
reliable, and comprehensive, and supports lots
of features).

I'd like to conform to standards as much as possible,
so I'd like to call Xerces under the JAXP API.

I'd like to validate the XML against a DTD, so that
errors are flagged up to the user, and I can transcribe
the DOM to my own data structures without (too much)
checking.

And I'd like to check against a local DTD so that parsing
XML doesn't cause Xerces to try to download a DTD over
the 'net for each file (sloooow).

I'm working with j2sdk 1.4.2 and Xerces 2.5.0.

This should be easy, right?
It isn't. As far as I can tell, the DocumentBuilder API
http://java.sun.com/j2se/1.4.2/docs/api/javax/xml/parsers/DocumentBuilder.html
doesn't support any way of setting features. You have to do it with the
setAttribute() method on the DocumentBuilderFactory.
http://java.sun.com/j2se/1.4.2/docs...derFactory.html#setAttribute(java.lang.String,
java.lang.Object)
But...
the Xerces documentation says this API "cannot be relied on"
http://xml.apache.org/xerces2-j/features.html

I could set features reliably IFF I use a SAX parser (same Xerces page).
My desire to use a local copy of the DTD can also be addressed via
Xerces/SAX.

By defining an EntityResolver and associating it with the DefaultHandler
passed into the SAXParser parse(...) method, I can persuade the DOCTYPE
to be resolved off to a local resource, as per this helpful page:
http://doctypechanger.sourceforge.net/#doc.Other

(javadoc URLs
http://java.sun.com/j2se/1.4.2/docs/api/org/xml/sax/EntityResolver.html
http://java.sun.com/j2se/1.4.2/docs/api/org/xml/sax/helpers/DefaultHandler.html
http://java.sun.com/j2se/1.4.2/docs/api/javax/xml/parsers/SAXParser.html#parse(java.io.File,
org.xml.sax.helpers.DefaultHandler))
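
For reference, the SAX route described above might look roughly like this. It is only a sketch: the class name, the inline DTD string (standing in for a local DTD file) and the "note" element are made up for illustration. It relies on DefaultHandler already implementing EntityResolver and ErrorHandler, so the handler passed to parse(...) also receives the DTD lookups and the validation errors.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class LocalDtdSaxSketch {
    // Hypothetical DTD text; in practice this would be read from a local file.
    static final String LOCAL_DTD = "<!ELEMENT note (#PCDATA)>";

    static class Handler extends DefaultHandler {
        final List<String> errors = new ArrayList<String>();

        @Override
        public InputSource resolveEntity(String publicId, String systemId) {
            // Answer every external entity request (here: the DOCTYPE's
            // system identifier) from the local copy, so nothing is fetched.
            return new InputSource(new StringReader(LOCAL_DTD));
        }

        @Override
        public void error(SAXParseException e) {
            // Validation errors arrive here; collect them instead of aborting.
            errors.add(e.getMessage());
        }
    }

    // Parses the XML with DTD validation on and returns the validation errors.
    static List<String> validate(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(true);
        SAXParser parser = factory.newSAXParser();
        Handler handler = new Handler();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler.errors;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\"?>"
                + "<!DOCTYPE note SYSTEM \"http://example.invalid/note.dtd\">"
                + "<note>hello</note>";
        System.out.println("validation errors: " + validate(xml));
    }
}
```

The system identifier in the sample document is deliberately unreachable; the resolver hands back the local DTD before the parser would try to open the URL.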

But I'd rather use DOM; is it *really* this hard
to get Xerces to co-exist with JAXP?

One way around this is to define my own DOM
parser by wrapping a SAX parser, and using a SAX ContentHandler that
simply builds up a Document object; I'm sure someone has done this
already...
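
A rough sketch of that wrapping idea, with one substitution: instead of a hand-written ContentHandler, it lets a JAXP identity Transformer copy the SAX events into a DOMResult. The class name, the "note" element and the DTD-as-a-string parameter are hypothetical, and I'm assuming the transformer leaves the reader's EntityResolver and ErrorHandler alone.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.sax.SAXSource;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class SaxToDomSketch {
    // Parses xml with a validating SAX reader and returns the resulting DOM.
    // localDtd is the DTD text (hypothetical stand-in for a local DTD file).
    static Document parse(String xml, final String localDtd) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.setValidating(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();

        DefaultHandler strict = new DefaultHandler() {
            @Override
            public InputSource resolveEntity(String publicId, String systemId) {
                // Serve the DTD from the local copy instead of the network.
                return new InputSource(new StringReader(localDtd));
            }

            @Override
            public void error(SAXParseException e) throws SAXParseException {
                // Treat validation errors as fatal.
                throw e;
            }
        };
        reader.setEntityResolver(strict);
        reader.setErrorHandler(strict);

        // A Transformer created without a stylesheet performs an identity copy.
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        DOMResult result = new DOMResult();
        identity.transform(
                new SAXSource(reader, new InputSource(new StringReader(xml))),
                result);
        return (Document) result.getNode();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\"?>"
                + "<!DOCTYPE note SYSTEM \"http://example.invalid/note.dtd\">"
                + "<note>hello</note>";
        Document doc = parse(xml, "<!ELEMENT note (#PCDATA)>");
        System.out.println(doc.getDocumentElement().getNodeName());
    }
}
```

This stays within JAXP, but it builds the DOM through the transformer rather than through Xerces' own DOM parser, so it is a workaround rather than the "direct" route.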

But I'd like to just use Xerces "direct".

So - is there a way?

All hints, tips and links gratefully listened to.

BugBear
 