Parsing for Performance

Paul

I have users who need to search six different large, flat XML documents.

I can only fit three of these documents into memory at one time,

so I continually have to swap XML documents in and out of memory.

Is it best to use DOM or SAX, or maybe something else?

SAX seems like the technology of choice for large XML files
because there is no need to hold the whole document in memory. But under
load, would numerous concurrent searches on a big XML file not turn the
hard disk into a bottleneck?

Using DOM would give really quick search times, but since the
different XML files need to keep swapping in and out of memory, surely
constantly re-parsing the files into memory hammers the hard disk just as
much as SAX does?

So presumably SAX is the least bad option?

Or is there some other technique that would be better? (Leave aside
conventional databases and native XML databases. I know these would be
faster, but we need a quick fix.)
 
William Park

If you want to extract some data and throw away the rest, then a top-down
(streaming) XML parser is a good choice. E.g., practically every scripting
language has an interface to the Expat XML parser (www.libexpat.org). Heck,
even Awk and the Bash shell have one.
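
As a rough illustration of the "extract some data and throw away the rest" approach, here is a minimal sketch using Python's standard-library binding to Expat (`xml.parsers.expat`). The element name `item`, the attribute `id`, the helper `find_records`, and the sample document are all made up for the example; nothing about Paul's actual files is known. It assumes a flat document, so matching elements are never nested inside each other.

```python
import xml.parsers.expat


def find_records(xml_bytes, tag, attr, wanted):
    """Stream through xml_bytes and collect the text of every <tag>
    element whose attribute `attr` equals `wanted`.
    Only the text of matching elements is kept; everything else is
    discarded as soon as the parser has passed it."""
    matches = []
    state = {"capturing": False, "buffer": []}

    def start(name, attrs):
        # attrs is a plain dict of attribute name -> value
        if name == tag and attrs.get(attr) == wanted:
            state["capturing"] = True
            state["buffer"] = []

    def chars(data):
        # Expat may deliver one text node in several chunks
        if state["capturing"]:
            state["buffer"].append(data)

    def end(name):
        if name == tag and state["capturing"]:
            matches.append("".join(state["buffer"]).strip())
            state["capturing"] = False

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    parser.CharacterDataHandler = chars
    parser.EndElementHandler = end
    parser.Parse(xml_bytes, True)  # True marks this as the final chunk
    return matches


# Hypothetical flat document, for illustration only
SAMPLE = b"""<catalog>
  <item id="a1">first</item>
  <item id="b2">second</item>
  <item id="a1">third</item>
</catalog>"""

print(find_records(SAMPLE, "item", "id", "a1"))
```

For a file on disk, `parser.ParseFile(f)` with a file opened in binary mode reads the input in chunks instead of requiring the whole document as one bytes object.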
 
ajm

Well ...

As far as DOM vs. SAX is concerned, the former has a large
(sometimes very, very large) memory footprint, which might be a
problem for you. SAX, on the other hand, generally does
not (and concurrency might not matter, depending on your
implementation; e.g., a sensible SAX-based implementation might
perform deep searches only when necessary).

The rest, as they say, is implementation detail ;) (and
likely depends on your choice of language, among other things). I
recommend you profile your results and take your
time (your "quick fix" might be nothing of the sort once
you have figured the total cost of your solution ;)
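
One way a SAX-based search can avoid doing more work than necessary is to abandon the parse as soon as the wanted record turns up. The sketch below uses Python's standard `xml.sax` module; the names `FoundIt`, `FirstMatchHandler`, and `find_first`, along with the `item`/`id` vocabulary, are hypothetical and only for illustration. It assumes a flat document and a search that needs just the first hit.

```python
import io
import xml.sax


class FoundIt(Exception):
    """Raised from the handler to abandon the parse once a match is complete."""


class FirstMatchHandler(xml.sax.ContentHandler):
    def __init__(self, tag, attr, wanted):
        super().__init__()
        self.tag, self.attr, self.wanted = tag, attr, wanted
        self.capturing = False
        self.parts = []
        self.seen = 0  # number of start tags visited so far

    def startElement(self, name, attrs):
        self.seen += 1
        if name == self.tag and attrs.get(self.attr) == self.wanted:
            self.capturing = True

    def characters(self, content):
        # Text can arrive in several pieces, so accumulate it
        if self.capturing:
            self.parts.append(content)

    def endElement(self, name):
        if self.capturing and name == self.tag:
            # Stop here; the rest of the document is never processed
            raise FoundIt("".join(self.parts).strip())


def find_first(source, tag, attr, wanted):
    """Return (text, elements_visited); text is None if nothing matched.
    `source` may be a filename or a binary file-like object."""
    handler = FirstMatchHandler(tag, attr, wanted)
    try:
        xml.sax.parse(source, handler)
    except FoundIt as hit:
        return hit.args[0], handler.seen
    return None, handler.seen


# Hypothetical flat document, for illustration only
DOC = (b'<catalog><item id="a1">first</item>'
       b'<item id="b2">second</item>'
       b'<item id="c3">third</item></catalog>')

text, visited = find_first(io.BytesIO(DOC), "item", "id", "b2")
print(text, visited)
```

Memory use stays roughly constant regardless of file size, since only the text of the matching element is held. Whether this beats keeping a DOM resident depends on how often each file is searched, which is where profiling comes in.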

hth,
ajm.
 
