XML parsing with Java

vk02720 · Dec 8, 2008

What is the standard/safe/optimal way of parsing XML from a Java
program? I have to use JDK 1.4 to begin with, but in a few months the
code should compile and work with 1.5 as well.
What confuses me: 1.4 includes an XML parser, but is it not Xerces?
How can I use Xerces with 1.4? Should I just add the xerces jar file
to my project and use the JAXP API? In some previous projects I
looked for xerces jars - some have xerces.jar and some
xercesImpl.jar. What is the difference? Where can I download a recent,
correct one to use with 1.4/JAXP?

After I build my program, is there any way I can know which parser is
being used? Anything I can call in my program to print some info about
the parser?

I am currently required, at least:
- to validate my input XML against a schema (in a separate xsd file)
- to be able to use the basic DOM and SAX APIs.
- to be able to use XPath.

Any insights/advice appreciated.

TIA

Lew · Dec 8, 2008

What is the standard/safe/optimal way of parsing XML from a Java
program? I have to use JDK 1.4 to begin with, but in a few months

Java 1.4 has been completely retired for a few weeks now, and obsolescent for
quite some time.

the code should compile and work with 1.5 as well.
What confuses me: 1.4 includes an XML parser, but is it not Xerces?

It is Xerces.

How can I use Xerces with 1.4? Should I just add the xerces jar file

Just use the libraries that come with Java.

to my project and use the JAXP API? In some previous projects I
looked for xerces jars - some have xerces.jar and some
xercesImpl.jar. What is the difference? Where can I download a recent,
correct one to use with 1.4/JAXP?

Just use the libraries that come with Java.

After I build my program, is there any way I can know which parser is
being used? Anything I can call in my program to print some info about
the parser?

What does knowing which parser you're using tell you? How is that knowledge
going to serve you?

I did a cursory review of the Sun Javadocs and didn't see any dynamic means to
identify the parser, although the Javadocs themselves tell us that Java uses
the org.xml.sax libraries for SAX, and the org.w3c.dom libraries for DOM
parsing. A similarly cursory review of the saxproject.org docs referenced
from Sun's Javadocs didn't find me what you're asking for either. I guess
you'll need to do some searching with our friend Google and its cousins for this.

However, it is likely that knowing that information will not tell you anything
that matters.

Arne Vajhøj · Dec 8, 2008

What is the standard/safe/optimal way of parsing XML from a Java
program? I have to use JDK 1.4 to begin with, but in a few months the
code should compile and work with 1.5 as well.
What confuses me: 1.4 includes an XML parser, but is it not Xerces?

Java contains an XML parser that follows the JAXP standard.

If you use only the JAXP standard, then it should not matter.

Implementation-wise, Java 1.4 used Crimson, and Java 1.5 and newer
use Xerces.

How can I use Xerces with 1.4? Should I just add the xerces jar file
to my project and use the JAXP API? In some previous projects I
looked for xerces jars - some have xerces.jar and some
xercesImpl.jar. What is the difference? Where can I download a recent,
correct one to use with 1.4/JAXP?

Get Xerces in the classpath and use:

System.setProperty("javax.xml.parsers.DocumentBuilderFactory",
"org.apache.xerces.jaxp.DocumentBuilderFactoryImpl");

After I build my program, is there any way I can know which parser is
being used? Anything I can call in my program to print some info about
the parser?

Sure - just print the concrete class of some object with:
somexmlobj.getClass().getName()
and you can see from the package name what it is.
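
A minimal sketch of that suggestion, applied to the JAXP factories and the parser objects they create. The class name ParserInfo and its method names are made up for illustration; the sketch uses only JAXP calls that already existed in 1.4.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;

// Hypothetical helper: reports which JAXP implementation classes are in use.
class ParserInfo {

    // Concrete class behind DocumentBuilderFactory.newInstance().
    static String domFactoryClass() {
        return DocumentBuilderFactory.newInstance().getClass().getName();
    }

    // Concrete class behind SAXParserFactory.newInstance().
    static String saxFactoryClass() {
        return SAXParserFactory.newInstance().getClass().getName();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("DOM factory: " + domFactoryClass());
        System.out.println("SAX factory: " + saxFactoryClass());
        // The parser objects themselves can be inspected the same way.
        System.out.println("DOM builder: "
                + DocumentBuilderFactory.newInstance().newDocumentBuilder().getClass().getName());
        System.out.println("SAX reader:  "
                + SAXParserFactory.newInstance().newSAXParser().getXMLReader().getClass().getName());
    }
}
```

A package name containing "crimson" or "xerces" then tells you which implementation was picked up.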

I am currently required, at least:
- to validate my input XML against a schema (in a separate xsd file)

Easy. Both with DOM and SAX.
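
For readers on Java 5 or later, here is a minimal sketch of schema validation using the javax.xml.validation package (this package does not exist in 1.4). The class name SchemaCheck, the tiny schema, and the sample documents are assumptions for illustration only; the XML and XSD are held in strings to keep the sketch self-contained, whereas a real program would more likely pass new StreamSource(new File(...)).

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

// Hypothetical helper: validates an XML string against an XSD string.
class SchemaCheck {

    // Returns null when the document is valid, otherwise a short error text.
    static String validate(String xml, String xsd) {
        try {
            SchemaFactory factory =
                    SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(xsd)));
            Validator validator = schema.newValidator();
            // With no ErrorHandler set, the first validation error is thrown.
            validator.validate(new StreamSource(new StringReader(xml)));
            return null;
        } catch (SAXException e) {
            return "invalid: " + e.getMessage();
        } catch (IOException e) {
            return "unreadable: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Illustrative schema: a single <item> element holding an integer.
        String xsd = "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
                + "<xs:element name='item' type='xs:int'/>"
                + "</xs:schema>";
        System.out.println(validate("<item>42</item>", xsd));
        System.out.println(validate("<item>abc</item>", xsd));
    }
}
```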

- to be able to use the basic DOM and SAX APIs.
Yep.
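
To make the difference between the two APIs concrete, here is a small sketch that counts the elements of the same document once with SAX (event callbacks) and once with DOM (an in-memory tree). The class and method names are made up for illustration; only standard JAXP calls are used.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class contrasting the SAX and DOM styles.
class DomSaxCount {

    // SAX: the parser calls back for every start tag; nothing is kept in memory.
    static class CountingHandler extends DefaultHandler {
        int elements;

        public void startElement(String uri, String localName,
                                 String qName, Attributes attributes) {
            elements++;
        }
    }

    static int countWithSax(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        CountingHandler handler = new CountingHandler();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler.elements;
    }

    // DOM: the whole document is parsed into a tree that can then be queried.
    static int countWithDom(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        return doc.getElementsByTagName("*").getLength();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<a><b/><b/><c/></a>";
        System.out.println("SAX count: " + countWithSax(xml));
        System.out.println("DOM count: " + countWithDom(xml));
    }
}
```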

- to be able to use XPath.

Requires DOM.
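
A minimal sketch of XPath over a DOM tree, assuming Java 5 or later, where the javax.xml.xpath package is part of the JDK. The class name XPathDemo and the sample name/value document are invented for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper: parses a string into a DOM and evaluates one XPath.
class XPathDemo {

    // Returns the string value of the first node matched by the expression.
    static String evaluate(String xml, String expression) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        return xpath.evaluate(expression, doc);
    }

    public static void main(String[] args) throws Exception {
        String xml = "<config>"
                + "<entry name='host'>example.org</entry>"
                + "<entry name='port'>8080</entry>"
                + "</config>";
        System.out.println(evaluate(xml, "/config/entry[@name='port']"));
    }
}
```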

Arne

Arne Vajhøj · Dec 8, 2008

Lew said:
It is Xerces.

No. 1.4 used Crimson. It is Xerces from 1.5 and newer.

Arne

Arne Vajhøj · Dec 8, 2008

Spud said:
I'd consider using StAX instead. It's built into JDK 1.6 and yields much
cleaner code.

StAX is a good alternative to SAX. But the Java 1.6 implementation does
not support validation (at least I get an exception when I set the
property).
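
For comparison with SAX, here is a minimal StAX sketch, assuming Java 6 or later, where javax.xml.stream ships with the JDK. With StAX the program pulls events from the reader instead of receiving callbacks. The class name StaxDemo is made up for illustration, and no validation property is set.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical demo class: counts elements with a given local name.
class StaxDemo {

    static int countElements(String xml, String localName) throws XMLStreamException {
        XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        int count = 0;
        try {
            // The loop asks for the next event; the parser never calls back.
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT
                        && localName.equals(reader.getLocalName())) {
                    count++;
                }
            }
        } finally {
            reader.close();
        }
        return count;
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(countElements("<a><b/><b/><c/></a>", "b"));
    }
}
```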

Arne

Lew · Dec 8, 2008

Arne said:
No. 1.4 used Crimson. It is Xerces from 1.5 and newer.

Oh, I stand corrected. Thanks.

Arne Vajhøj · Dec 8, 2008

Lew said:
Oh, I stand corrected. Thanks.

They probably should have chosen Xerces already for
1.4, because Xerces was already better than Crimson
at the time, but Crimson was said to be slightly
faster *and* Crimson was donated to Apache by Sun
while Xerces was donated to Apache by IBM (XML4J).

Arne

vk02720 · Dec 8, 2008

They probably should have chosen Xerces already for
1.4, because Xerces was already better than Crimson
at the time, but Crimson was said to be slightly
faster *and* Crimson was donated to Apache by Sun
while Xerces was donated to Apache by IBM (XML4J).

Arne

Thanks.
Java 1.4 does use Crimson by default. There is an option to print some
debug info using -Djaxp.debug=1 which shows how it selects the
FactoryImpl.
This is what gets printed if the xerces jar is not included:
JAXP: loaded from fallback value: org.apache.crimson.jaxp.SAXParserFactoryImpl

1.4 with the xerces jar included uses Xerces:
JAXP: found META-INF/services/javax.xml.parsers.SAXParserFactory
JAXP: loaded from services: org.apache.xerces.jaxp.SAXParserFactoryImpl

1.5:
JAXP: find factoryId =javax.xml.parsers.SAXParserFactory
JAXP: loaded from fallback value: com.sun.org.apache.xerces.internal.jaxp.SAXParserFactoryImpl

1.5 with the xerces jar (not really necessary, I guess):
JAXP: find factoryId =javax.xml.parsers.SAXParserFactory
JAXP: found jar resource=META-INF/services/javax.xml.parsers.SAXParserFactory using ClassLoader: sun.misc.Launcher$AppClassLoader@e39a3e

1.4 without the xerces jar could work for most purposes. However, the
differences in capabilities do begin to show. For example, enabling schema
validation like this did not work with Crimson:
factory.setFeature("http://apache.org/xml/features/validation/schema", true);
The error was:
org.xml.sax.SAXNotRecognizedException: Feature: http://apache.org/xml/features/validation/schema
at org.apache.crimson.parser.XMLReaderImpl.setFeature(Unknown Source)
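
One defensive way to cope with this kind of difference is to probe a feature before relying on it, so the program can report clearly that the active parser lacks it. The sketch below is only an illustration: the class name FeatureProbe and the method name supports are made up, and the feature URI is the one from the error above.

```java
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;

// Hypothetical helper: checks whether a SAX factory accepts a feature.
class FeatureProbe {

    static final String SCHEMA_FEATURE =
            "http://apache.org/xml/features/validation/schema";

    // Tries to switch the feature on; returns false if the parser rejects it.
    // Note that on success the feature stays enabled on the given factory.
    static boolean supports(SAXParserFactory factory, String feature) {
        try {
            factory.setFeature(feature, true);
            return true;
        } catch (SAXNotRecognizedException e) {
            return false; // the parser does not know this feature at all
        } catch (SAXNotSupportedException e) {
            return false; // known feature, but this value is not supported
        } catch (ParserConfigurationException e) {
            return false; // the factory could not be configured this way
        }
    }

    public static void main(String[] args) {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        System.out.println(factory.getClass().getName()
                + " supports schema feature: " + supports(factory, SCHEMA_FEATURE));
    }
}
```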

Does Crimson not support schema validation?

Arne Vajhøj · Dec 8, 2008

Does Crimson not support schema validation?

I don't know. It may not. Crimson is from the age
of the DTD!

Arne

vk02720 · Dec 9, 2008

I don't know. It may not. Crimson is from the age
of the DTD!

Arne

Well, in that case Java 1.4 without adding xerces would have that
limitation. 1.5 has the validation API and Xerces as the default, so no issues
there.
Anything new in 1.6?

Also, how about dom4j or JDOM - do a lot of people use them? Are any of these
candidates for becoming a "standard" or making their way into the JDK
someday?

Peter D. · Dec 9, 2008

Anyone ever use JAXB? I think it's fantastic.

http://java.sun.com/developer/technicalArticles/WebServices/jaxb/

Arne Vajhøj · Dec 10, 2008

Well, in that case Java 1.4 without adding xerces would have that
limitation. 1.5 has the validation API and Xerces as the default, so no issues
there.
Anything new in 1.6?

The StAX and JAXB APIs were added.

Also, how about dom4j or JDOM - do a lot of people use them? Are any of these
candidates for becoming a "standard" or making their way into the JDK
someday?

I don't think they will ever be added to Java, since they are oriented
more toward user-friendliness than toward standards.

I have used JDOM a few times. It is simply easier to use than the
standard W3C DOM.

But the advantage of W3C DOM is that you can code the same way
in Java, C#, C, JS, VBS, etc.

I know that dom4j also is popular with some projects, but I have not
used it myself.

Arne

Arne Vajhøj · Dec 10, 2008

Peter said:
Anyone ever use JAXB? I think it's fantastic.

http://java.sun.com/developer/technicalArticles/WebServices/jaxb/

JAXB is good.

But it is probably not a fit for the original poster's problem (at least not
as described).

Arne

vk02720 · Dec 11, 2008

JAXB is good.

But it is probably not a fit for the original poster's problem (at least not
as described).

Arne

True. I was trying to look at a more basic, bare-bones XML/XPath API,
although binding frameworks like JAXB are a good option if you can use
them. Unfortunately, one of the systems I am interfacing with has a lot
of name/value-pair type info (in XML), and they don't commit to
publishing the XSD, which I believe is a must for JAXB-type frameworks.

Mike Davis · Dec 12, 2008

Lew said:
Java 1.4 has been completely retired for a few weeks now, and
obsolescent for quite some time.

Ha! That may be true, but I am now working on a project where that is
the only version of the language allowed. We found this out after
writing a few thousand lines with generics, enums, and a few other 1.5
features.

--mad

Arne Vajhøj · Dec 12, 2008

Mike said:
Ha! That may be true, but I am now working on a project where that is
the only version of the language allowed. We found this out after
writing a few thousand lines with generics, enums, and a few other 1.5
features.

If I were to guess at the Java version usage distribution I would say:

1.2.2 - 5%
1.3.1 - 15%
1.4.2 - 25%
1.5.0 - 35%
1.6.0 - 20%

(please ignore the fact that it is really impossible to quantify
usage in a meaningful way)

Arne

Lew · Dec 13, 2008

At my day job half the Java infrastructure is just coming in to Java 1.4, the
other half to 1.5. The problem is widespread.

If I were to guess at the Java version usage distribution I would say:

1.2.2 - 5%
1.3.1 - 15%
1.4.2 - 25%
1.5.0 - 35%
1.6.0 - 20%

(please ignore the fact that it is really impossible to quantify
usage in a meaningful way)

I would guess that the usage is higher for 1.4 than your guess, and Java 6 is
much lower. But I'm in the position of trying to guess the shape of an
elephant knowing only the feel of its ears.

If I had the ears of the decision makers where I work, I'd suggest to them
that the risk of continuing with Java 1.4, with its insufficient concurrent
memory model and slower performance than modern versions, exceeds that of the
conversion to Java 5, especially in our environment which involves multiple
nodes with multiple processors running multiple JVMs with various forms of
communication between them processing high peak volumes of information per
unit of time under tight time constraints and rigorous availability requirements.

Some similarly high-demand production Java code I've seen runs about three
times faster under Java 5 and the associated Java EE (J2EE) servers than it
did with older platforms. Not just CPU-bound code, but all sorts of different
stuff involving messages and files and databases and the like. Obviously Java
by itself is only a piece of that improvement - the app-server vendors were
busy improving their stuff, too.

The fear of upgrade that I've witnessed was based on considerations of product
reliability on a new platform, cost of code conversions (rooting out misuse of
the 'enum' keyword and the like), and operations costs associated with
migration to and maintenance of the new enterprise platform. Decision makers
seemed utterly unimpressed with claims of performance improvement; only risk
mattered.

Lately I have been meditating on the balance of risks between those that arise
from conversion and those that arise from the failure to convert to Java 5 or
later. I posit that risk comparison will carry more meaning to decision
makers than benefit comparison.

John B. Matthews · Dec 13, 2008

Arne Vajhøj said:
If I were to guess at the Java version usage distribution I would say:

1.2.2 - 5%
1.3.1 - 15%
1.4.2 - 25%
1.5.0 - 35%
1.6.0 - 20%

(please ignore the fact that it is really impossible to quantify
usage in a meaningful way)

Google - millions of hits:

java 1.1 - 22.8
java 1.2 - 16.1
java 1.3 - 12.0
java 1.4 - 12.6
java 1.5 - 38.1
java 1.6 - 10.2
java 1.7 - 5.2

Bimodal!?

Lew · Dec 13, 2008

Google - millions of hits:

Is it your intention to claim that "Google - millions of hits" is a meaningful
metric of Java platform usage?

java 1.1 - 22.8
java 1.2 - 16.1
java 1.3 - 12.0
java 1.4 - 12.6
java 1.5 - 38.1
java 1.6 - 10.2
java 1.7 - 5.2

Bimodal!?

The number of hits for Java 1.7 clearly doesn't reflect usage, since Java 7
isn't even fully defined yet and is therefore not yet in use at all.

Your hit counts ignored the new version numbering scheme whereby the two most
recent versions are "Java 5" and "Java 6".

Hit counts represent how many documents exist for a particular search term
set, but one has to show how that correlates to usage, if it even does.

Hits are cumulative: the longer something is around, the more documents there
could be that pertain to it. That could explain the high count for Java 1.1.
The high count for 1.5 might reflect hits on newsgroups, which are multiply
republished on a host of hosts, but who knows, really? Maybe the types of
hits for 1.5 are those more likely to be duplicated on multiple nodes,
inflating the hit count. Maybe it was more contemporaneous with heavy Web use
than earlier versions. Maybe it is in wider use than other versions. Who
knows? I can't tell from these hit count numbers.

John B. Matthews · Dec 13, 2008

Google - millions of hits:

Is it your intention to claim that "Google - millions of hits" is a
meaningful metric of Java platform usage?

Egad, no! I should have reiterated Arne's caveat. OTOH, the result is
not entirely unexpected and parallels Arne's (considerable) experience.

The number of hits for Java 1.7 clearly doesn't reflect usage, since
Java 7 isn't even fully defined yet and is therefore not yet in use
at all.

Your hit counts ignored the new version numbering scheme whereby the
two most recent versions are "Java 5" and "Java 6".

Yes, this is confounding; I was sticking with the developer version
numbers:

Hit counts represent how many documents exist for a particular search
term set, but one has to show how that correlates to usage, if it
even does.

Hits are cumulative: the longer something is around, the more
documents there could be that pertain to it. That could explain the
high count for Java 1.1. The high count for 1.5 might reflect hits on
newsgroups, which are multiply republished on a host of hosts, but
who knows, really? Maybe the types of hits for 1.5 are those more
likely to be duplicated on multiple nodes, inflating the hit count.
Maybe it was more contemporaneous with heavy Web use than earlier
versions. Maybe it is in wider use than other versions. Who knows?
I can't tell from these hit count numbers.

Indeed, such numbers are almost meaningless, yet strangely fascinating.
Cf <http://www.google.com/intl/en/press/zeitgeist2008/index.html>

Here's a very rough measure of features/version from skimming the Java
1.5 API documentation (J2SE 5.0):

<code>
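#!/bin/bash
# Count the "Since 1.x" tags in the local J2SE 5.0 API docs.
# The C-style for loop is a bash feature, so this needs bash rather than plain sh.
DIR=/Developer/Documentation/Java/docs
ECHO=/bin/echo
for ((i=0; i<=6; i++)) ; do
    ${ECHO} -n "Since 1.${i}: "
    grep -R "<DD>1.${i}" "$DIR"/* | wc -l
done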
</code>

<console>
$ ./since.sh
Since 1.0: 26
Since 1.1: 89
Since 1.2: 965
Since 1.3: 550
Since 1.4: 1384
Since 1.5: 1321
Since 1.6: 0
</console>
