Parsing for Performance

Discussion in 'XML' started by Paul, Apr 22, 2005.

  1. Paul

    Paul Guest

    I have users who want to search 6 different large, flat XML documents.

    I can only fit 3 of these documents into memory at one time.

    So I continually have to swap XML documents in and out of memory.

    Is it best to use DOM or SAX? Or maybe something else?

    Using SAX seems like the technology of choice for large XML files,
    because there is no need to load the whole document into memory. But
    under load, wouldn't numerous concurrent searches over a big XML file
    turn into a hard-disk bottleneck?

    Using DOM would give really quick search times, but since the
    different XML files need to keep swapping in and out of memory,
    surely constantly re-parsing the files into memory hammers the disk
    just as much as SAX does?

    So presumably SAX is the best of a bad lot?

    Or is there some other technique that would be better? (Discount
    normal databases and native XML databases; I know these would be
    faster, but we need a quick fix.)
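
    Something like this is what I have in mind for the SAX route (a rough
    sketch in Python's xml.sax; 'big.xml', the 'record' element, and the
    search term are just placeholders for our real data):

        import xml.sax

        class SearchHandler(xml.sax.ContentHandler):
            # Accumulate character data per record and count matches.
            def __init__(self, term):
                super().__init__()
                self.term = term
                self.text = []
                self.matches = 0

            def startElement(self, name, attrs):
                if name == 'record':        # placeholder element name
                    self.text = []

            def characters(self, content):
                self.text.append(content)

            def endElement(self, name):
                if name == 'record' and self.term in ''.join(self.text):
                    self.matches += 1

        handler = SearchHandler('needle')
        xml.sax.parse('big.xml', handler)   # streams the file; no full tree in memory
        print(handler.matches, 'matching records')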
     
    Paul, Apr 22, 2005
    #1

  2. William Park

    William Park Guest

    Paul <> wrote:
    > I have users who want to search 6 different large, flat XML documents.
    > [...]
    > Is it best to use DOM or SAX? Or maybe something else?
    > [...]


    If you want to extract some data and throw away the rest, then a
    top-down XML parser is a good choice. E.g. practically every
    scripting language has an interface to the Expat XML parser
    (www.libexpat.org). Heck, even Awk and the Bash shell have one.
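
    For instance, a minimal sketch using Python's binding to Expat
    (xml.parsers.expat); the file name is a placeholder:

        import xml.parsers.expat

        def start_element(name, attrs):
            # Called once per opening tag as the bytes stream past.
            print('start:', name, attrs)

        def char_data(data):
            print('text :', data.strip())

        parser = xml.parsers.expat.ParserCreate()
        parser.StartElementHandler = start_element
        parser.CharacterDataHandler = char_data
        with open('big.xml', 'rb') as f:
            parser.ParseFile(f)   # top-down: keep what you want, discard the rest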

    --
    William Park <>, Toronto, Canada
    Slackware Linux -- because it works.
     
    William Park, Apr 22, 2005
    #2

  3. ajm

    ajm Guest

    t'ja ...

    As far as DOM vs. SAX is concerned, the former has a large (sometimes
    very, very large) memory footprint, which might be a problem for you.
    SAX, on the other hand, generally does not (and concurrency might not
    matter, depending on your implementation; e.g., a sensible SAX-based
    implementation might perform deep searches only when necessary).

    The rest, as they say, is implementation detail ;) (and likely
    depends on your choice of language etc.). I recommend you profile
    your results and take your time (your "quick fix" might be nothing of
    the sort once you have figured out the total cost of your solution ;)
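
    As a starting point for profiling, a crude timing harness (a Python
    sketch; 'big.xml' is a placeholder, and real figures will depend on
    your parser, your data, and your disk):

        import time
        import xml.dom.minidom
        import xml.sax

        FILE = 'big.xml'    # placeholder path

        t0 = time.perf_counter()
        xml.dom.minidom.parse(FILE)             # builds the whole tree in memory
        print('DOM parse: %.2fs' % (time.perf_counter() - t0))

        t0 = time.perf_counter()
        xml.sax.parse(FILE, xml.sax.ContentHandler())   # streams, events discarded
        print('SAX pass : %.2fs' % (time.perf_counter() - t0))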

    hth,
    ajm.


    William Park <> wrote in message news:<85eb5$42691a04$d1b71688$>...
    > Paul <> wrote:
    > > I have users who want to search 6 different large, flat XML documents.
    > > [...]
     
    ajm, Apr 25, 2005
    #3
