Juergen said:
I tried to parse an XML file using Xerces 2.5.
The file is about 10K with about 300 end elements. I used SAX parsing
with empty callbacks.
In a loop of 10 parses, the first parse needed 844 ms and the last
94 ms (on an old 600 MHz machine with JDK 1.4.2).
The difference probably shows the dual effects of caching the file and
hotspot compilation.
This seems very slow to me. I wonder if there are efforts to speed up
XML parsing in Java, e.g. using a fast C parser via JNI.
Less than 0.1 second for a 10K file with 300 elements, on a slow
computer, is slow? It seems fairly good to me, but it is rarely useful to make
judgements based on raw speed numbers. First ask whether this is a
bottleneck in your code and then also whether there is any advantage to
be gained by making it faster. (If you're parsing for display in a GUI,
for instance, the user is unlikely to see any difference between 100 ms
and 10 ms.) Finally ask whether this kind of optimization is the most
fruitful place for you to spend your time.
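To answer the bottleneck question with numbers, a minimal timing sketch along the lines of the test described above might look like the following. It is only an illustration: the class name ParseTiming and the synthetic 300-element document are made up here, it uses whatever SAX parser the JAXP factory returns rather than Xerces 2.5 specifically, and it assumes a modern JDK (System.nanoTime and StandardCharsets do not exist in 1.4.2).

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.helpers.DefaultHandler;

public class ParseTiming {
    // Builds a small synthetic document; a stand-in for the real 10K file.
    static byte[] sampleDocument(int elements) {
        StringBuilder sb = new StringBuilder("<root>");
        for (int i = 0; i < elements; i++) {
            sb.append("<item id=\"").append(i).append("\">text</item>");
        }
        sb.append("</root>");
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    // Parses the same bytes `runs` times with empty callbacks and
    // returns the elapsed time of each run in nanoseconds.
    static long[] timeParses(byte[] xml, int runs) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        DefaultHandler handler = new DefaultHandler(); // all callbacks are no-ops
        long[] nanos = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            parser.parse(new ByteArrayInputStream(xml), handler);
            nanos[i] = System.nanoTime() - start;
        }
        return nanos;
    }

    public static void main(String[] args) throws Exception {
        long[] nanos = timeParses(sampleDocument(300), 10);
        for (int i = 0; i < nanos.length; i++) {
            System.out.println("run " + (i + 1) + ": " + nanos[i] / 1_000_000.0 + " ms");
        }
    }
}
```

Parsing from memory takes file caching out of the picture, so the gap between the first and later runs mostly reflects class loading and JIT warm-up. Compare the later runs against the time your application spends elsewhere before deciding the parse is worth optimizing.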
If the answer to all of the above questions is yes, then there are
several potential avenues for improving the speed. One avenue would be to
reorganize the application to hide the parse time by either (1)
performing the parse in parallel with something else, (2) pre-parsing
the file, or (3) avoiding the parse altogether wherever possible.
Another avenue would be to try to use a faster native parser via JNI, as
you suggest (but make sure your native parser _is_ faster before you
expend much effort here, and then test the JNI overhead early on). A
third avenue would be to restructure or shrink the XML that needs to be
parsed. There are probably other possibilities as well.
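As a rough illustration of option (1), hiding the parse behind other work, the parse can be handed to a background thread and its result collected later. This is a sketch under assumptions, not a recommendation for any particular application: the names BackgroundParse, CountingHandler and parseAsync are hypothetical, the handler merely counts start tags, and it needs Java 8 or later for the lambda.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class BackgroundParse {
    // Counts start tags; a stand-in for the application's real handler.
    static class CountingHandler extends DefaultHandler {
        int elements;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            elements++;
        }
    }

    // Submits the parse to the executor and returns a Future for the element count.
    static Future<Integer> parseAsync(ExecutorService executor, byte[] xml) {
        return executor.submit(() -> {
            CountingHandler handler = new CountingHandler();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new ByteArrayInputStream(xml), handler);
            return handler.elements;
        });
    }

    public static void main(String[] args) throws Exception {
        byte[] xml = "<root><a/><b/><c/></root>".getBytes(StandardCharsets.UTF_8);
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            Future<Integer> pending = parseAsync(executor, xml);
            // Other start-up work would run here while the parse proceeds.
            System.out.println("elements: " + pending.get()); // prints "elements: 4"
        } finally {
            executor.shutdown();
        }
    }
}
```

This does not make the parse itself any faster; it only helps when the caller has something useful to do before it needs the result.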
Good luck,
John Bollinger