Parsing for Performance

Paul · Apr 22, 2005

I have users who want to search 6 different large, flat XML documents.

I can only fit 3 of these documents into memory at one time.

So I continually have to swap XML documents in and out of memory.

Is it best to use DOM or SAX? Or maybe something else?

SAX seems like the technology of choice for large XML files
because there is no need to load the whole document into memory. But
under load, wouldn't numerous concurrent searches on a big XML file
cause a hard disk bottleneck?

DOM would give really quick search times, but since the
different XML files need to keep swapping in and out of memory, surely
constantly re-parsing the files into memory hammers the hard disk just
as much as SAX does?
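A minimal sketch of the swapping scheme described above, assuming Python and its standard `xml.dom.minidom` as the DOM parser. The `DomCache` class, the capacity of 3, and the tag/text search are hypothetical illustrations. It counts parses to show the cost in question: every cache miss re-reads and re-parses a whole file from disk.

```python
# Hypothetical sketch: keep at most 3 parsed DOM trees in memory (LRU order)
# and re-parse a file from disk whenever it is requested but not cached.
from collections import OrderedDict
from xml.dom import minidom

MAX_IN_MEMORY = 3  # hypothetical: only 3 parsed documents fit at once


class DomCache:
    """LRU cache of parsed DOM trees; evicts the least recently used one."""

    def __init__(self, capacity=MAX_IN_MEMORY):
        self.capacity = capacity
        self.trees = OrderedDict()  # path -> parsed Document
        self.parse_count = 0        # how many times a file was (re)parsed

    def get(self, path):
        if path in self.trees:
            self.trees.move_to_end(path)  # mark as most recently used
            return self.trees[path]
        # Cache miss: the whole file is read and parsed again.
        doc = minidom.parse(path)
        self.parse_count += 1
        self.trees[path] = doc
        if len(self.trees) > self.capacity:
            _, evicted = self.trees.popitem(last=False)  # oldest entry
            evicted.unlink()  # break DOM reference cycles so memory is freed
        return doc

    def search(self, path, tag, text):
        """Return True if some <tag> element's direct text equals `text`."""
        doc = self.get(path)
        for el in doc.getElementsByTagName(tag):
            value = "".join(n.data for n in el.childNodes
                            if n.nodeType == n.TEXT_NODE)
            if value == text:
                return True
        return False
```

With 6 files, a cache of 3, and queries that cycle through all 6 files in turn, every lookup is a miss, so every search pays for a full parse. This is exactly the disk and CPU cost the question worries about.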

So presumably SAX is the lesser of two evils?

Or is there some other technique that would be better? (Leave out regular
databases and native XML databases: I know these would be faster, but
we need a quick fix.)

William Park · Apr 22, 2005

If you want to extract some data and throw away the rest, then a top-down,
streaming XML parser is a good choice. E.g., practically every scripting
language has an interface to the Expat XML parser (www.libexpat.org). Heck,
even Awk and the Bash shell have it.
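A minimal sketch of "extract some data and throw away the rest" using Expat through Python's standard `xml.parsers.expat` binding. The record layout (`<record><id>…</id><name>…</name></record>`) and the `find_names` helper are hypothetical; only the fields of the record currently being read are held in memory.

```python
# Hypothetical sketch: stream a large XML file through Expat and keep only
# the <name> of each <record> whose <id> matches; everything else is dropped.
import xml.parsers.expat


def find_names(stream, wanted_id):
    """Return the <name> of every <record> whose <id> equals wanted_id.
    `stream` must be a binary file-like object (e.g. open(path, "rb"))."""
    matches = []
    current = {}   # fields of the <record> currently being read
    field = None   # "id" or "name" while inside one of those elements
    buf = []       # character data collected for that field

    def start(tag, attrs):
        nonlocal field
        if tag == "record":
            current.clear()          # start a fresh record
        elif tag in ("id", "name"):
            field = tag
            buf.clear()

    def end(tag):
        nonlocal field
        if tag == field:
            current[field] = "".join(buf)
            field = None
        elif tag == "record":
            if current.get("id") == wanted_id:
                matches.append(current.get("name"))
            # Nothing else about this record is kept.

    def chars(data):
        if field is not None:
            buf.append(data)         # text may arrive in several pieces

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = chars
    parser.ParseFile(stream)   # pulls the input in chunks; no tree is built
    return matches
```

Usage would look like `with open("big.xml", "rb") as f: names = find_names(f, "42")`, where the file name and id are placeholders. Memory use stays roughly constant no matter how large the file is.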

ajm · Apr 25, 2005

Well ....

As far as DOM vs. SAX is concerned, the former has a large
(sometimes very, very large) memory footprint, which might be a
problem for you. SAX, on the other hand, generally does
not (and concurrency might not matter, depending on your
implementation; e.g., a sensible SAX-based implementation might
perform deep searches only when necessary).
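One way a SAX search can avoid doing more work than necessary is to stop as soon as it has its answer. A minimal sketch, assuming Python's standard `xml.sax` API; the element names, the `FirstMatch` handler, and the `FoundIt` exception are hypothetical. Raising an exception from a handler to abort parsing is a common idiom, not a dedicated SAX feature. Input is still read in buffered chunks, but records after the match are never parsed.

```python
# Hypothetical sketch: a SAX handler that aborts parsing at the first
# <id> matching the wanted value, so later records are never parsed.
import xml.sax


class FoundIt(Exception):
    """Raised from the handler to stop the parser early."""


class FirstMatch(xml.sax.ContentHandler):
    def __init__(self, wanted_id):
        super().__init__()
        self.wanted_id = wanted_id
        self.in_id = False
        self.text = []
        self.records_seen = 0   # records fully scanned before stopping

    def startElement(self, name, attrs):
        if name == "id":
            self.in_id = True
            self.text = []

    def characters(self, content):
        if self.in_id:
            self.text.append(content)

    def endElement(self, name):
        if name == "id":
            self.in_id = False
            if "".join(self.text) == self.wanted_id:
                raise FoundIt()  # answer known: stop parsing here
        elif name == "record":
            self.records_seen += 1


def contains_id(source, wanted_id):
    """Return (found, records_fully_scanned). `source` is a path or file object."""
    handler = FirstMatch(wanted_id)
    try:
        xml.sax.parse(source, handler)
    except FoundIt:
        return True, handler.records_seen
    return False, handler.records_seen
```

A miss still costs a full scan, so the saving depends on how often queries hit early in the file.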

The rest, as they say, is implementation detail (and
likely depends on your choice of language etc.). I
recommend you profile your results and take your
time: your "quick fix" might be nothing of the sort once
you have figured out the total cost of your solution.
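A rough starting point for profiling, assuming Python: it compares wall time and peak traced memory of a DOM parse (`xml.dom.minidom`) against a SAX parse (`xml.sax`) of the same document. The generated document and its size are hypothetical stand-ins for real data, and `tracemalloc` only sees memory allocated through Python's allocators, so treat the numbers as relative, not absolute.

```python
# Hypothetical sketch: measure time and peak traced memory of DOM vs. SAX
# parsing of the same (generated) document.
import time
import tracemalloc
import xml.dom.minidom
import xml.sax


def make_doc(n_records):
    """Build a hypothetical flat document with n_records <record> elements."""
    rows = "".join(f"<record><id>{i}</id><name>item{i}</name></record>"
                   for i in range(n_records))
    return f"<records>{rows}</records>".encode("utf-8")


def measure(parse, data):
    """Run parse(data); return (seconds, peak bytes traced during the call)."""
    tracemalloc.start()
    t0 = time.perf_counter()
    parse(data)
    elapsed = time.perf_counter() - t0
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return elapsed, peak


def dom_parse(data):
    doc = xml.dom.minidom.parseString(data)  # builds the whole tree
    doc.unlink()


def sax_parse(data):
    # The base ContentHandler ignores every event: this measures the parse only.
    xml.sax.parseString(data, xml.sax.ContentHandler())


if __name__ == "__main__":
    data = make_doc(20_000)
    for label, fn in (("DOM", dom_parse), ("SAX", sax_parse)):
        secs, peak = measure(fn, data)
        print(f"{label}: {secs:.2f} s, peak {peak / 1e6:.1f} MB")
```

Running the same measurement on your own files, with your real search logic inside the handlers, gives a far better basis for the decision than a synthetic document.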

hth,
ajm.

SAX Parser for creating (not parsing) XML document	8	Jul 21, 2008
Why is SAX faster than DOM?	4	Jun 3, 2012
How to scale and/or "object orient" SAX parsing for big files?	7	Jan 14, 2008
memory considerations when parsing XML file	2	Jan 31, 2008
XML parsing: SAX/expat & yield	2	Aug 4, 2010
Parsing multiple XML trees?	3	Dec 15, 2005
Help with SAX parsing	3	Aug 29, 2005
parsing xml (xmpp) with ruby	3	Sep 27, 2008
