Xerces 2.7 vs 1.6 performance problem

Bryan

Hi,

We have an application that we just upgraded to xerces-c-2_7_0-win32.
This same application used to use xerces-c1_6_0-win32.

We didn't change any code in our app; only the xerces libs and
dlls that it uses were swapped.

We are loading large (>20 MB) xml files using DOM (we should use SAX,
I know). In 1.6 we can parse the file and pull out the data in about
10 seconds. With 2.7, the same parsing takes more than 10 _minutes_!

I have been scouring the net looking for info, but I am not an xml
expert, nor am I particularly familiar with xerces.

Can anyone offer any suggestions as to where I might look, or clues as to
what might be going on? I found some info on deferred node expansion,
but I really don't know if this can explain this difference.

Thanks,
Bryan
 
Joseph Kesselman

Have you tried asking on Xerces' own mailing list? That's where you're
most likely to find folks who have current understanding of the
internals of the parser and where possible bottlenecks might be. (My own
best guess is that you're having a swapping problem, but it's been years
since I looked at the Xerces-C code so I really can't advise you.)
 
Bryan

Joseph said:
Have you tried asking on Xerces' own mailing list? That's where you're
most likely to find folks who have current understanding of the
internals of the parser and where possible bottlenecks might be. (My own
best guess is that you're having a swapping problem, but it's been years
since I looked at the Xerces-C code so I really can't advise you.)

Didn't try the mailing list yet- hate those things, you get spammed with
a load of emails and they are a pain to subscribe to.

But I think I will have no choice but to give it a go soon...
 
Joe Kesselman

Bryan said:
Didn't try the mailing list yet- hate those things, you get spammed with
a load of emails and they are a pain to subscribe to.

Apache's mailing lists are almost completely spam-free, in my
experience. If you need expertise specifically about Apache code, they
really are the best place to find it.
 
Boris Kolpackov

Hi,

Bryan said:
Can anyone offer any suggestions as to where I might look, or clues as to
what might be going on?

It is hard to say what exactly is causing this without seeing the
code. My guess is that in order to support requirements of future
DOM versions (e.g., DOM level 3), the implementation has changed
and become less efficient. Here is a blog post about two DOM API
functions that can slow things down significantly:

http://www.codesynthesis.com/~boris/blog/2006/11/28/xerces-c-dom-potholes/


Also, the Xerces-C++ mailing list is a better place for this kind of
question.


hth,
-boris
 
Joseph Kesselman

Boris said:
It is hard to say what exactly is causing this without seeing the
code. My guess is that in order to support requirements of future
DOM versions (e.g., DOM level 3), the implementation has changed
and become less efficient.

If you can supply samples to the Xerces developers, I'm sure they'll be
interested in investigating what has changed and improving it if they can.

Apropos of
http://www.codesynthesis.com/~boris/blog/2006/11/28/xerces-c-dom-potholes/
... For years, I've been telling people that the semantics of NodeLists,
specifically the "live view" behavior, are a set of bugs and performance
disasters waiting to happen. The DOM Level 2 Traversal chapter provides
alternatives that can be implemented much more efficiently... or, as
suggested on the website, you can switch to explicit traversal (walking
first-child and next-sibling links yourself instead of indexing into a
NodeList).
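
To make the NodeList point concrete, here is a minimal sketch of why an
indexed loop over a live list can be so much slower than explicit
traversal. It uses a toy node structure, not the Xerces-C++ classes:
Node, Document, LiveChildList, buildParent, visitByIndex and
visitBySibling are all hypothetical names invented for this example. It
also assumes a list that keeps no cache and re-walks the sibling chain on
every call; whether a particular Xerces-C++ release behaves this way is
something to check against its source. In Xerces-C++ itself the
corresponding calls would be DOMNodeList::getLength() and item() versus
DOMNode::getFirstChild() and getNextSibling().

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Toy DOM node (hypothetical, not a Xerces-C++ type): children are
// linked through firstChild / nextSibling pointers.
struct Node {
    std::string name;
    Node* firstChild = nullptr;
    Node* nextSibling = nullptr;
};

// Owns every node so the raw pointers above stay valid.
struct Document {
    std::vector<std::unique_ptr<Node>> storage;

    Node* create(const std::string& name) {
        storage.emplace_back(new Node());
        storage.back()->name = name;
        return storage.back().get();
    }
};

// Builds one parent with childCount children chained as siblings.
inline Node* buildParent(Document& doc, std::size_t childCount) {
    Node* parent = doc.create("parent");
    Node* last = nullptr;
    for (std::size_t i = 0; i < childCount; ++i) {
        Node* child = doc.create("child");
        if (last != nullptr) {
            last->nextSibling = child;
        } else {
            parent->firstChild = child;
        }
        last = child;
    }
    return parent;
}

// Toy "live" child list (assumption: no caching). Every call walks the
// sibling chain from the first child, and *steps counts each node hop.
struct LiveChildList {
    const Node* parent;
    std::size_t* steps;

    std::size_t length() const {
        std::size_t count = 0;
        for (const Node* n = parent->firstChild; n != nullptr; n = n->nextSibling) {
            ++count;
            ++*steps;
        }
        return count;
    }

    const Node* item(std::size_t index) const {
        const Node* n = parent->firstChild;
        for (std::size_t i = 0; n != nullptr && i < index; ++i) {
            n = n->nextSibling;
            ++*steps;
        }
        return n;
    }
};

// Naive indexed loop: length() is re-evaluated on every iteration and
// item(i) restarts from the first child, so the hop count grows
// roughly with the square of the number of children.
inline std::size_t visitByIndex(const Node& parent, std::size_t& steps) {
    LiveChildList list{&parent, &steps};
    std::size_t visited = 0;
    for (std::size_t i = 0; i < list.length(); ++i) {
        if (list.item(i) != nullptr) {
            ++visited;
        }
    }
    return visited;
}

// Explicit traversal: one hop per child, so the cost is linear.
inline std::size_t visitBySibling(const Node& parent, std::size_t& steps) {
    std::size_t visited = 0;
    for (const Node* n = parent.firstChild; n != nullptr; n = n->nextSibling) {
        ++visited;
        ++steps;
    }
    return visited;
}
```

Calling both functions on the same parent and comparing the two step
counters shows the gap: both visit every child, but the indexed loop
does far more pointer hops, and the gap widens quickly as the child
count grows. On a 20 MB document with very wide elements, that kind of
difference could plausibly turn seconds into minutes, though only
profiling the real application can confirm it.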
 
