Why is SAX faster than DOM?

Discussion in 'XML' started by Ramon F Herrera, Jun 3, 2012.

  1. When I first started learning XML, after giving a cursory look to SAX,
    I decided that I need it like I need a hole in the head, and have
    been using DOM since. The event-based architecture scared me.

    My initial impression was that SAX is good for huge files (ie, those
    that do not fit in RAM).

    Am I correct?

    The reason I ask is because I would like to speed up my application
    and was wondering whether what it needs is some SAX appeal. :)

    -Ramon
    Ramon F Herrera, Jun 3, 2012
    #1

  2. * Ramon F Herrera wrote in comp.text.xml:
    >When I first started learning XML, after giving a cursory look to SAX,
    >I decided that I need it like I need a hole in the head, and have
    >been using DOM since. The event-based architecture scared me.
    >
    >My initial impression was that SAX is good for huge files (ie, those
    >that do not fit in RAM).
    >
    >Am I correct?
    >
    >The reason I ask is because I would like to speed up my application
    >and was wondering whether what it needs is some SAX appeal. :)


    Usually when you have very large files they are composed of relatively
    small records. Wikipedia, for instance, offers database dumps in XML form
    that contain every revision of every article, and even when compressed
    they tend to be many GB in size. But the data for each article, or for
    each revision, tends to be very small. Processing everything with a
    SAX-style interface would require you to code up a lot of logic to
    maintain information about the document structure, so people have come
    up with combinations of "SAX" and "DOM" style interfaces; "Reader"
    interfaces are one example. With a typical "Reader" interface you might
    navigate to an article or a revision in a Wikipedia dump, and then read
    all of it into some structure where you can access the subtree in a
    DOM-like way.
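
    As a rough illustration of that hybrid style, here is a minimal Python
    sketch using the standard library's xml.etree.ElementTree.iterparse. It
    streams through the input and hands you each finished record as a small
    tree. The <dump>/<page>/<revision> layout and the iter_pages helper are
    made up for the example; they are not the real Wikipedia dump schema.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical, tiny stand-in for a large dump; the element names are made up.
SAMPLE = b"""<dump>
  <page><title>SAX</title><revision><text>event based</text></revision></page>
  <page><title>DOM</title><revision><text>tree based</text></revision></page>
</dump>"""


def iter_pages(source):
    """Yield (title, revision_count) for each <page>, one record at a time."""
    for event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "page":
            # The subtree for this record is complete: use tree-style access.
            title = elem.findtext("title")
            revisions = len(elem.findall("revision"))
            yield title, revisions
            # Discard the record's children so finished records do not pile up.
            elem.clear()


pages = list(iter_pages(io.BytesIO(SAMPLE)))
print(pages)
```

    The cleared <page> elements stay attached to the root as empty shells,
    so for truly huge inputs you may also want to drop them from the parent.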

    That might be what you are looking for, but since you worry about speed
    rather than memory, it would help to have more details about what your
    markup looks like and what you do with it. Creating a DOM from a SAX
    stream does not normally come at a great cost; it's mainly memory
    allocation and data copying in proportion to the input size.
    --
    Björn Höhrmann · mailto: · http://bjoern.hoehrmann.de
    Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
    25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/
    Bjoern Hoehrmann, Jun 4, 2012
    #2

  3. On 6/3/2012 6:48 PM, Ramon F Herrera wrote:
    > My initial impression was that SAX is good for huge files (ie, those
    > that do not fit in RAM).


    That's one of the things SAX is good for. It's also good for situations
    where you want to load the data into custom data structures tuned for the
    needs of your application, when you wouldn't want to build a DOM first
    just to recopy its contents into another representation.
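
    A minimal sketch of that idea in Python, using the standard xml.sax
    module: the handler fills a plain dict as events arrive and never builds
    a tree. The <inventory> markup and the InventoryHandler class are
    invented for illustration only.

```python
import xml.sax

# Hypothetical input; the element and attribute names are assumptions.
SAMPLE = b"""<inventory>
  <item sku="A1"><name>widget</name><qty>3</qty></item>
  <item sku="B2"><name>gadget</name><qty>5</qty></item>
</inventory>"""


class InventoryHandler(xml.sax.ContentHandler):
    """Collect {sku: qty} directly, without building a tree first."""

    def __init__(self):
        super().__init__()
        self.quantities = {}
        self._sku = None       # sku of the <item> currently open
        self._in_qty = False   # True while inside a <qty> element
        self._chunks = []      # text may arrive in several pieces

    def startElement(self, name, attrs):
        if name == "item":
            self._sku = attrs.get("sku")
        elif name == "qty":
            self._in_qty = True
            self._chunks = []

    def characters(self, content):
        if self._in_qty:
            self._chunks.append(content)

    def endElement(self, name):
        if name == "qty":
            self.quantities[self._sku] = int("".join(self._chunks))
            self._in_qty = False
        elif name == "item":
            self._sku = None


handler = InventoryHandler()
xml.sax.parseString(SAMPLE, handler)
print(handler.quantities)
```

    Everything outside <qty> (here, the <name> text) is dropped as soon as
    it is seen, which is where the memory and copying savings come from.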

    > The reason I ask is because I would like to speed up my application
    > and was wondering whether what it needs is some SAX appeal. :)


    In other words, your question isn't "why" but "whether and when". Which
    makes more sense.

    *IF* you don't need random access to the data (can process it as
    it comes in), are able to easily filter out unneeded data (discarding it
    immediately rather than processing it), and/or can and will create data
    structures tuned for your specific application's needs -- and if you're
    careful about your coding -- moving to SAX ***MAY*** help you.

    Or it may be completely irrelevant, if that isn't where your application
    is spending most of its time. Remember that infinite speedup of
    something that accounts for 1% of your total runtime is only a 1%
    speedup of the application. I ***STRONGLY*** recommend you get your
    hands on some performance profiling tools, establish how much of your
    application's time is actually being spent in parsing and DOM
    construction and DOM navigation vs. other tasks, and only then decide
    whether this is where you want to invest your effort.
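
    The 1% arithmetic above is Amdahl's law, and the measurement step can be
    tried with Python's built-in cProfile. The sketch below shows both. The
    parse_step, other_work and run functions are hypothetical stand-ins for
    a real application, and the workload sizes are arbitrary.

```python
import cProfile
import io
import math
import pstats
import xml.dom.minidom


def overall_speedup(fraction, factor):
    """Amdahl's law: whole-program speedup when `fraction` of the runtime
    is made `factor` times faster."""
    return 1.0 / ((1.0 - fraction) + fraction / factor)


def parse_step(xml_text):
    # Stand-in for the application's DOM parsing.
    return xml.dom.minidom.parseString(xml_text)


def other_work(n):
    # Stand-in for everything the application does besides parsing.
    return sum(i * i for i in range(n))


def run():
    doc = "<r>" + "<x>1</x>" * 2000 + "</r>"
    parse_step(doc)
    other_work(200_000)


# Even an infinitely fast parser barely helps if parsing is 1% of the runtime.
print(overall_speedup(0.01, math.inf))
# If parsing were 60% of the runtime, a 3x faster parser would matter a lot.
print(overall_speedup(0.60, 3.0))

# Measure where the time actually goes before deciding what to rewrite.
profiler = cProfile.Profile()
profiler.enable()
run()
profiler.disable()
report = io.StringIO()
pstats.Stats(profiler, stream=report).sort_stats("cumulative").print_stats(8)
print(report.getvalue())
```

    Compare the cumulative time of parse_step against run in the report; the
    ratio is the `fraction` to plug into overall_speedup.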

    --
    Joe Kesselman,
    http://www.love-song-productions.com/people/keshlam/index.html

    {} ASCII Ribbon Campaign | "may'ron DaroQbe'chugh vaj bIrIQbej" --
    /\ Stamp out HTML mail! | "Put down the squeezebox & nobody gets hurt."
    Joe Kesselman, Jun 4, 2012
    #3
  4. japisoft wrote:

    You can't compare SAX and DOM directly. SAX works at the parsing level,
    whereas DOM is for manipulating an XML document. A DOM is usually built
    on top of a SAX parser. You can use the DOM, or bypass it and write your
    own SAX code. However, writing your own SAX handler is much more complex,
    and the final result could be much slower than plain DOM usage.

    Best regards,

    A.Brillant
    EditiX XML Editor - http://www.editix.com


    "Ramon F Herrera" wrote in the newsgroup message:
    ...


    When I first started learning XML, after giving a cursory look to SAX,
    I decided that I need it like I need a hole in the head, and have
    been using DOM since. The event-based architecture scared me.

    My initial impression was that SAX is good for huge files (ie, those
    that do not fit in RAM).

    Am I correct?

    The reason I ask is because I would like to speed up my application
    and was wondering whether what it needs is some SAX appeal. :)

    -Ramon
    japisoft, Jun 7, 2012
    #4
  5. On 6/7/2012 11:52 AM, japisoft wrote:
    > You can't compare SAX and DOM directly. SAX works at the parsing level,
    > whereas DOM is for manipulating an XML document. A DOM is usually built
    > on top of a SAX parser. You can use the DOM, or bypass it and write your
    > own SAX code. However, writing your own SAX handler is much more
    > complex, and the final result could be much slower than plain DOM usage.


    Very true. (Though some DOM parsers/loaders bypass SAX for greater
    efficiency; I believe Xerces actually uses lower-level events to drive
    its DOM construction.)

    SAX does require that you manage all the state information, which may or
    may not include building something like the DOM for part or all of the
    document. How fast or slow that will be depends entirely on the problem
    at hand and how good your code is.
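
    To make "managing the state" concrete, here is a small Python sketch of
    a SAX handler that keeps a stack of open element names and builds a
    tree only for the parts it cares about, via ElementTree's TreeBuilder.
    The <log>/<entry> markup and the EntryCollector class are assumptions
    made up for the example.

```python
import xml.sax
import xml.etree.ElementTree as ET

# Hypothetical input; only <entry> elements directly under <log> are wanted.
SAMPLE = b"""<log>
  <meta><host>alpha</host></meta>
  <entry id="1"><msg>started</msg></entry>
  <meta><host>beta</host></meta>
  <entry id="2"><msg>stopped</msg></entry>
</log>"""


class EntryCollector(xml.sax.ContentHandler):
    """Track the element path by hand; build subtrees for /log/entry only."""

    def __init__(self):
        super().__init__()
        self.path = []        # stack of currently open element names
        self.entries = []     # finished <entry> subtrees
        self._builder = None  # active only while inside a wanted <entry>

    def startElement(self, name, attrs):
        self.path.append(name)
        if self._builder is None and self.path == ["log", "entry"]:
            self._builder = ET.TreeBuilder()
        if self._builder is not None:
            self._builder.start(name, dict(attrs.items()))

    def characters(self, content):
        if self._builder is not None:
            self._builder.data(content)

    def endElement(self, name):
        if self._builder is not None:
            self._builder.end(name)
            if len(self.path) == 2:
                # Closing the <entry> itself: hand over the finished subtree.
                self.entries.append(self._builder.close())
                self._builder = None
        self.path.pop()


collector = EntryCollector()
xml.sax.parseString(SAMPLE, collector)
summary = [(e.get("id"), e.findtext("msg")) for e in collector.entries]
print(summary)
```

    The <meta> blocks never reach the builder, so only the wanted records
    cost any tree-building work.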

    If you've got time, doing it all via SAX may be worth trying. But it
    isn't always going to be a magic bullet.

    As I said in my other post, the first thing to do is to find out whether
    this is even a significant part of your application's processing time.

    --
    Joe Kesselman,
    http://www.love-song-productions.com/people/keshlam/index.html

    {} ASCII Ribbon Campaign | "may'ron DaroQbe'chugh vaj bIrIQbej" --
    /\ Stamp out HTML mail! | "Put down the squeezebox & nobody gets hurt."
    Joe Kesselman, Jun 7, 2012
    #5
