Large XML Document Processing

Albert Leibbrandt

Hi

Just want to check which XML parser you guys have found to be the
quickest. I have XML documents with 250 000 records or more, and
processing these documents is taking way too long. The validation is
the main problem. Any module names would help a lot; non-validating
would be fine too.

Thanks
Albert
 
Rene Pijlman

Albert Leibbrandt:
Just want to check which XML parser you guys have found to be the
quickest. I have XML documents with 250 000 records or more, and
processing these documents is taking way too long.

What type of parser are you using? DOM, minidom or SAX? SAX is fastest,
but somewhat more work for the programmer. Minidom is a nice compromise
when processing many identical records.
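
To give an idea of the extra work SAX asks of the programmer, here is a
minimal sketch using the standard library's xml.sax. The element name
"record" and the helper names RecordCounter and count_records are made up
for illustration, since the actual document structure was not posted.

```python
import io
import xml.sax


class RecordCounter(xml.sax.ContentHandler):
    """Counts <record> elements as they stream past; no tree is built."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def startElement(self, name, attrs):
        # "record" is an assumed element name, not taken from the real docs.
        if name == "record":
            self.count += 1


def count_records(source):
    """Parse a filename or binary file object and return the record count."""
    handler = RecordCounter()
    xml.sax.parse(source, handler)
    return handler.count


sample = io.BytesIO(b"<records><record id='1'/><record id='2'/></records>")
print(count_records(sample))
```

The handler only sees one event at a time, so any state you need (current
record, current field) has to be tracked by hand in the handler object.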
 
uche.ogbuji

Albert said:
Hi

Just want to check which XML parser you guys have found to be the
quickest. I have XML documents with 250 000 records or more, and
processing these documents is taking way too long. The validation is
the main problem. Any module names would help a lot; non-validating
would be fine too.

It would help us help you if you posted samples of the target docs.
XML processing strategy often depends on the structure of the XML, just
as relational query optimization strategy often depends on the schema.
In general SAX or iterative tree-callback methods will give you the
best speed. Fredrik already mentioned ElementTree's iterparse.
Amara's pushbind and pushdom and 4Suite's Saxlette (which has some neat
callback features) are other options.

http://uche.ogbuji.net/tech/4suite/amara/
http://4suite.org/docs/CoreManual.xml#saxlette
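
As a rough illustration of the iterative approach, here is a sketch with
ElementTree's iterparse from the standard library. It assumes the records
are direct children of the root element and are named "record"; both are
guesses, and iter_records is a hypothetical helper name.

```python
import io
import xml.etree.ElementTree as ET


def iter_records(source, tag="record"):
    """Yield (attributes, text) for each record, freeing memory as it goes."""
    context = ET.iterparse(source, events=("start", "end"))
    # The first event is the "start" of the root element; keep a handle on it.
    _, root = next(context)
    for event, elem in context:
        if event == "end" and elem.tag == tag:
            yield dict(elem.attrib), (elem.text or "").strip()
            # Drop already-processed children so the tree does not grow.
            # Assumes records sit directly under the root.
            root.clear()


sample = io.BytesIO(
    b"<records><record id='1'>a</record><record id='2'>b</record></records>"
)
for attrs, text in iter_records(sample):
    print(attrs, text)
```

Only one record is held in memory at a time, which is usually what matters
with a few hundred thousand records. Note that iterparse does not validate.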
 
