How to speed up XML parsing

Juergen Weber

Hi,

I tried to parse an XML file using Xerces 2.5.

The file is about 10 KB with about 300 end elements. I used SAX parsing
with empty callbacks.

In a loop parsing the file 10 times, the first parse took 844 ms and the
last one 94 ms (on an old 600 MHz machine with JDK 1.4.2).
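A minimal sketch of the benchmark described above, assuming a modern JDK (Java 8+, unlike the JDK 1.4.2 in the post) and the standard JAXP `SAXParserFactory`, which may or may not resolve to the same Xerces version. The original 10K file is not available, so the class `ParseTimer` and its `buildDocument` helper are hypothetical: they generate a synthetic document of 300 short elements instead.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.helpers.DefaultHandler;

public class ParseTimer {
    // Hypothetical stand-in for the ~10K input file: a <root> element
    // with `elements` short <item> children (about 8.6 KB for 300 items).
    static byte[] buildDocument(int elements) {
        StringBuilder sb = new StringBuilder("<?xml version=\"1.0\"?>\n<root>\n");
        for (int i = 0; i < elements; i++) {
            sb.append("  <item id=\"").append(i).append("\">text</item>\n");
        }
        sb.append("</root>\n");
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    // Parses the document `runs` times with a handler whose callbacks all do
    // nothing, returning the wall-clock time of each run in milliseconds.
    static long[] timeParses(byte[] xml, int runs) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        DefaultHandler emptyHandler = new DefaultHandler();
        long[] millis = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            SAXParser parser = factory.newSAXParser();
            parser.parse(new ByteArrayInputStream(xml), emptyHandler);
            millis[i] = (System.nanoTime() - start) / 1_000_000;
        }
        return millis;
    }

    public static void main(String[] args) throws Exception {
        long[] times = timeParses(buildDocument(300), 10);
        for (int i = 0; i < times.length; i++) {
            System.out.println("parse " + (i + 1) + ": " + times[i] + " ms");
        }
    }
}
```

Because the document is held in memory, disk I/O is excluded here, so any gap between the first and last runs comes from class loading and JIT warm-up rather than file caching.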

This seems very slow to me. I wonder whether there are efforts to speed up
XML parsing in Java, e.g. using a fast C parser via JNI.

Juergen
 
Frank Plassmeier

Hello Juergen,

you may want to check JADE (http://jade.dautelle.com/), which contains a
real-time parser that its developer claims is much faster than others. I
haven't tried it myself, but from my experience with JADE I see no reason
to doubt the claim.

Kind regards,
Frank Plassmeier
 
Harald Hein

Juergen Weber said:
This seems very slow to me. I wonder whether there are efforts to
speed up XML parsing in Java, e.g. using a fast C parser via JNI.

There are smaller, faster, but less sophisticated Java XML parsers
available. http://www.xml.com/pub/rg/46 has a short list, and there are
surely more.
 
John C. Bollinger

Juergen said:
I tried to parse an XML file using Xerces 2.5.

The file is about 10 KB with about 300 end elements. I used SAX parsing
with empty callbacks.

In a loop parsing the file 10 times, the first parse took 844 ms and the
last one 94 ms (on an old 600 MHz machine with JDK 1.4.2).

The difference probably reflects the combined effects of the file being
cached and of HotSpot compiling the parser code.
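The two effects can be separated in a benchmark. A minimal sketch, assuming a modern JDK: read the file into memory once so disk access and OS caching fall outside the timed region, run untimed warm-up parses so HotSpot can compile the hot paths, then average over many parses. The class name `WarmParseBenchmark` and the iteration counts (200 warm-up, 1000 measured) are arbitrary illustrative choices, not tuned values.

```java
import java.io.ByteArrayInputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.helpers.DefaultHandler;

public class WarmParseBenchmark {
    // Runs `warmup` untimed parses so class loading and HotSpot compilation
    // happen first, then returns the mean time per parse, in microseconds,
    // over `measured` further parses. One parser is reused via reset().
    static double averageMicros(byte[] xml, int warmup, int measured) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        DefaultHandler emptyHandler = new DefaultHandler();
        for (int i = 0; i < warmup; i++) {
            parser.reset();
            parser.parse(new ByteArrayInputStream(xml), emptyHandler);
        }
        long start = System.nanoTime();
        for (int i = 0; i < measured; i++) {
            parser.reset();
            parser.parse(new ByteArrayInputStream(xml), emptyHandler);
        }
        return (System.nanoTime() - start) / 1_000.0 / measured;
    }

    public static void main(String[] args) throws Exception {
        // Read the file once, outside the timed region, so disk access and
        // OS file caching do not affect the measurement.
        byte[] xml = Files.readAllBytes(Paths.get(args[0]));
        System.out.printf("mean parse time: %.1f us%n", averageMicros(xml, 200, 1000));
    }
}
```

Reusing one `SAXParser` also removes parser construction from the per-parse cost, which the original loop may have included.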
This seems very slow to me. I wonder whether there are efforts to speed up
XML parsing in Java, e.g. using a fast C parser via JNI.

< 0.1 second for a 10K file with 300 elements, on a slow computer, is
slow? It seems fairly good to me, but it is rarely useful to make
judgements based on raw speed numbers. First ask whether this is a
bottleneck in your code and then also whether there is any advantage to
be gained by making it faster. (If you're parsing for display in a GUI,
for instance, the user is unlikely to see any difference between 100 ms
and 10 ms.) Finally ask whether this kind of optimization is the most
fruitful place for you to spend your time.

If the answer to all of the above questions is yes, then there are several
avenues you could try. One would be to reorganize the application to hide
the parse time by (1) performing the parse in parallel with something else,
(2) pre-parsing the file, or (3) avoiding the parse altogether wherever
possible.
Another avenue would be to try to use a faster native parser via JNI, as
you suggest (but make sure your native parser _is_ faster before you
expend much effort here, and then test the JNI overhead early on). A
third avenue would be to restructure or shrink the XML that needs to be
parsed. There are probably other possibilities as well.
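A minimal sketch of the first reorganization avenue (parsing in parallel) combined with the third (avoiding repeated parses), assuming a modern JDK and `java.util.concurrent`. The parse is submitted to a background thread so the caller can continue with other work, and the resulting `Future` is cached per key so the same input is parsed only once. `BackgroundParser`, the key string, and the element-count result are hypothetical stand-ins for whatever the application actually builds from the XML; a real cache key would also need to change when the file does (for example, by including its modification time).

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class BackgroundParser {
    private final ExecutorService executor = Executors.newSingleThreadExecutor();
    // Caches one Future per key, so a given input is parsed at most once.
    private final ConcurrentHashMap<String, Future<Integer>> cache = new ConcurrentHashMap<>();

    // Starts the parse on the worker thread (or returns the cached Future)
    // without blocking the caller.
    Future<Integer> parseAsync(String key, byte[] xml) {
        return cache.computeIfAbsent(key, k -> executor.submit(() -> countElements(xml)));
    }

    // Hypothetical parse result: the number of elements in the document.
    static int countElements(byte[] xml) throws Exception {
        int[] count = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                count[0]++;
            }
        };
        SAXParserFactory.newInstance().newSAXParser().parse(new ByteArrayInputStream(xml), handler);
        return count[0];
    }

    void shutdown() {
        executor.shutdown();
    }

    public static void main(String[] args) throws Exception {
        BackgroundParser bp = new BackgroundParser();
        byte[] xml = "<root><a/><b/></root>".getBytes(StandardCharsets.UTF_8);
        Future<Integer> pending = bp.parseAsync("config.xml", xml);
        // Other start-up work could run here while the parse proceeds.
        System.out.println("elements: " + pending.get()); // prints "elements: 3"
        bp.shutdown();
    }
}
```

Caching the `Future` rather than the finished result means two callers asking for the same key at nearly the same time share a single parse instead of starting two.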


Good luck,

John Bollinger