Slowness of SAX

Discussion in 'Java' started by Sigfried, Nov 12, 2008.

  1. Sigfried

    Sigfried Guest

    Hi, using a Java profiler, I've realized that SAX is consuming too much
    time:
    - endElement + startElement 40 %
    - *.read 7 %
    - a few <= 1%

    So SAX takes about 50% of the time!

    Do you know of a faster XML API?
    Sigfried, Nov 12, 2008
    #1

  2. Sigfried

    Roedy Green Guest

    On Wed, 12 Nov 2008 09:16:44 +0100, Sigfried <>
    wrote, quoted or indirectly quoted someone who said :

    >Do you know faster XML API ?


    XML/SAX is inherently a high-overhead format, best used for small
    files. Consider converting your file to something else, e.g. a
    DataInputStream or serialised stream, so you pay the overhead only
    once.

    See http://mindprod.com/jgloss/xml.html
    for alternative processing techniques.

    --
    Roedy Green Canadian Mind Products
    http://mindprod.com
    Your old road is
    Rapidly agin'.
    Please get out of the new one
    If you can't lend your hand
    For the times they are a-changin'.
    Roedy Green, Nov 12, 2008
    #2

  3. Sigfried

    Lew Guest

    bugbear wrote:
    > Sigfried wrote:
    >> Hi, using a java profiler, i've [sic] realized that SAX is consuming too
    >> much time:
    >> - endElement + startElement 40 %
    >> - *.read 7 %
    >> - a few <= 1%
    >>
    >> So SAX take about 50 % of the time !!

    >
    > If all you're doing is parsing, what would you expect?


    Indeed.

    > Give us more context.


    I found SAX to be extremely fast, arguably the (possibly tied for) fastest XML
    parsing in Java. Back in 1999 we were able to parse a million rather large
    documents in about three hours over a 10MB/s Ethernet connection using Java
    1.2 on the hardware extant in those days using SAX, and it was very
    parsimonious of memory. Parsers and JVMs (and hardware) have improved
    considerably since then.

    As bugbear points out, 50% of the time parsing is quite reasonable if at least
    50% of the work to do is parsing, and if three-quarters of the work is parsing
    you're money ahead.

    --
    Lew
    Lew, Nov 12, 2008
    #3
  4. Sigfried

    Sigfried Guest

    bugbear wrote:
    > Sigfried wrote:
    >> Hi, using a java profiler, i've realized that SAX is consuming too
    >> much time:
    >> - endElement + startElement 40 %
    >> - *.read 7 %
    >> - a few <= 1%
    >>
    >> So SAX take about 50 % of the time !!

    >
    > If all you're doing is parsing, what would you expect?
    >
    > Give us more context.


    I've tried the JDK 1.6 StAX implementation, which is 10% faster, but the
    DTD is ignored... So I guess StAX speed is about the same as SAX. I was
    hoping to get XML parsing down to 30%.
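
    For readers who have not used StAX, a minimal sketch of a pull-style
    parse with the JDK's javax.xml.stream API follows. The class name
    StaxCount and the method countElements are hypothetical, made up for
    this illustration; this is not the code that was profiled above.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

/** Hypothetical helper: counts start tags using the StAX cursor API. */
final class StaxCount {
    static int countElements(String xml) throws Exception {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
        int count = 0;
        try {
            // The application pulls events; the parser does not call back.
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    count++;
                }
            }
        } finally {
            reader.close();
        }
        return count;
    }
}
```

    Whether this is faster than SAX for a given document depends on the
    parser implementation and on the work done per element, so it is
    worth measuring both on real input.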
    Sigfried, Nov 12, 2008
    #4
  5. Sigfried

    Tom Anderson Guest

    On Wed, 12 Nov 2008, Sigfried wrote:

    > Hi, using a java profiler, i've realized that SAX is consuming too much time:
    > - endElement + startElement 40 %
    > - *.read 7 %
    > - a few <= 1%
    >
    > So SAX take about 50 % of the time !!


    Which startElement and endElement methods are these? I assume not the ones
    in the ContentHandler, right?

    > Do you know faster XML API ?


    http://www.itu.int/rec/T-REC-X.891-200505-I/en
    http://java.sun.com/developer/technicalArticles/xml/fastinfoset/

    Although that's probably not what you meant.

    But seriously, XML isn't fast. Never has been, never will be. If you need
    fast, don't use XML. Fast XML parsing is like semi racing: even if you
    win, you're still retarded.

    tom

    --
    Safety not guaranteed. I have only done this once before.
    Tom Anderson, Nov 12, 2008
    #5
  6. Sigfried

    Tom Anderson Guest

    On Wed, 12 Nov 2008, Tom Anderson wrote:

    > On Wed, 12 Nov 2008, Sigfried wrote:
    >
    >> Hi, using a java profiler, i've realized that SAX is consuming too much
    >> time:
    >> - endElement + startElement 40 %
    >> - *.read 7 %
    >> - a few <= 1%
    >>
    >> So SAX take about 50 % of the time !!
    >>
    >> Do you know faster XML API ?

    >
    > http://www.itu.int/rec/T-REC-X.891-200505-I/en
    > http://java.sun.com/developer/technicalArticles/xml/fastinfoset/
    >
    > Although that's probably not what you meant.
    >
    > But seriously, XML isn't fast. Never has been, never will be. If you need
    > fast, don't use XML. Fast XML parsing is like semi racing: even if you win,
    > you're still retarded.


    Although you could try this:

    http://piccolo.sourceforge.net/

    tom

    --
    Safety not guaranteed. I have only done this once before.
    Tom Anderson, Nov 12, 2008
    #6
  7. Sigfried

    Arne Vajhøj Guest

    Roedy Green wrote:
    > On Wed, 12 Nov 2008 09:16:44 +0100, Sigfried <>
    > wrote, quoted or indirectly quoted someone who said :
    >> Do you know faster XML API ?

    >
    > XML/SAX is inherently a high-overhead format, best used for small
    > files.


    No - SAX is the XML parser for huge files.

    For small files DOM and XPath are much easier.

    Arne
    Arne Vajhøj, Nov 13, 2008
    #7
  8. Sigfried

    Arne Vajhøj Guest

    Sigfried wrote:
    > Hi, using a java profiler, i've realized that SAX is consuming too much
    > time:
    > - endElement + startElement 40 %
    > - *.read 7 %
    > - a few <= 1%
    >
    > So SAX take about 50 % of the time !!
    >
    > Do you know faster XML API ?


    SAX is usually the fastest XML parser.

    And I cannot see why you are surprised that the XML parser
    uses most of the CPU time when doing XML parsing.

    Arne
    Arne Vajhøj, Nov 13, 2008
    #8
  9. Sigfried

    Daniel Pitts Guest

    Sigfried wrote:
    > Hi, using a java profiler, i've realized that SAX is consuming too much
    > time:
    > - endElement + startElement 40 %
    > - *.read 7 %
    > - a few <= 1%
    >
    > So SAX take about 50 % of the time !!
    >
    > Do you know faster XML API ?

    SAX uses callbacks. startElement/endElement probably call some code
    that processes the result. It is *that* code which is taking up CPU
    time; you should see what is under that part of the call stack.
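
    One rough way to check this without a profiler is to time the
    handler's own callbacks separately from the whole parse. The sketch
    below assumes a hypothetical handler class TimedHandler; the field
    and method names are invented for illustration and the "work" inside
    the callback is a placeholder.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical handler that records time spent inside its own callback. */
final class TimedHandler extends DefaultHandler {
    long nanosInCallbacks = 0; // time spent in application code
    long totalNanos = 0;       // time for the whole parse, callbacks included
    int elements = 0;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        long start = System.nanoTime();
        elements++;
        // Application work would go here, e.g. Double.parseDouble on attribute values.
        nanosInCallbacks += System.nanoTime() - start;
    }

    /** Parses the XML text and returns the handler so the counters can be inspected. */
    static TimedHandler parse(String xml) throws Exception {
        TimedHandler handler = new TimedHandler();
        long start = System.nanoTime();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        handler.totalNanos = System.nanoTime() - start;
        return handler;
    }
}
```

    If nanosInCallbacks is a large share of totalNanos, the cost is in
    the application code under startElement/endElement rather than in the
    parser itself.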

    --
    Daniel Pitts' Tech Blog: <http://virtualinfinity.net/wordpress/>
    Daniel Pitts, Nov 13, 2008
    #9
  10. Sigfried

    Lew Guest

    Arne Vajhøj wrote:
    > Roedy Green wrote:
    >> On Wed, 12 Nov 2008 09:16:44 +0100, Sigfried <>
    >> wrote, quoted or indirectly quoted someone who said :
    >>> Do you know faster XML API ?

    >>
    >> XML/SAX is inherently a high-overhead format, best used for small
    >> files.

    >
    > No - SAX is the XML parser for huge files.
    >
    > For small files DOM and XPath is much easier.


    Quite so. The advantage of SAX over DOM is that it is quite fast, very easy
    on memory requirements and suitable for single-pass processing of XML
    documents. Its disadvantage is that it does not keep an in-memory
    representation of the XML document for repeated processing.

    --
    Lew
    Lew, Nov 13, 2008
    #10
  11. Lew wrote:
    > Arne Vajhøj wrote:
    >> Roedy Green wrote:
    >>> On Wed, 12 Nov 2008 09:16:44 +0100, Sigfried <>
    >>> wrote, quoted or indirectly quoted someone who said :
    >>>> Do you know faster XML API ?
    >>>
    >>> XML/SAX is inherently a high-overhead format, best used for small
    >>> files.

    >>
    >> No - SAX is the XML parser for huge files.
    >>
    >> For small files DOM and XPath is much easier.

    >
    > Quite so. The advantage of SAX over DOM is that it is quite fast, very
    > easy on memory requirements and suitable for single-pass processing of
    > XML documents. Its disadvantage is that it does not keep an in-memory
    > representation of the XML document for repeated processing.


    Plus compared to XPath you need to write a lot of code to do some
    advanced searching.

    Arne
    Arne Vajhøj, Nov 13, 2008
    #11
  12. Lew wrote:
    > Arne Vajhøj wrote:
    >> Roedy Green wrote:
    >>> On Wed, 12 Nov 2008 09:16:44 +0100, Sigfried
    >>> <>
    >>> wrote, quoted or indirectly quoted someone who said :
    >>>> Do you know faster XML API ?
    >>>
    >>> XML/SAX is inherently a high-overhead format, best used for small
    >>> files.

    >>
    >> No - SAX is the XML parser for huge files.
    >>
    >> For small files DOM and XPath is much easier.

    >
    > Quite so. The advantage of SAX over DOM is that it is quite fast,
    > very easy on memory requirements and suitable for single-pass
    > processing of XML documents. Its disadvantage is that it does not
    > keep an in-memory representation of the XML document for repeated
    > processing.


    However, if you want to create an in-memory representation of a subset
    of a huge document, SAX is the way to build it. In fact, making SAX
    callbacks create a DOM (optionally filtering out part of the
    document's content) is a pretty trivial exercise.
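
    A minimal sketch of that exercise, assuming a hypothetical class
    FilteringDomBuilder that drops every element with a given name (and
    its subtree) and copies the rest into a DOM. It ignores namespaces,
    comments and processing instructions, so it illustrates the idea
    rather than being a complete converter.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical SAX handler that builds a DOM while skipping one element name. */
final class FilteringDomBuilder extends DefaultHandler {
    private final Document doc;
    private final String skippedName;
    private Node current;      // node that receives the next child
    private int skipDepth = 0; // > 0 while inside a skipped subtree

    FilteringDomBuilder(Document doc, String skippedName) {
        this.doc = doc;
        this.skippedName = skippedName;
        this.current = doc;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (skipDepth > 0 || qName.equals(skippedName)) {
            skipDepth++;
            return;
        }
        Element element = doc.createElement(qName);
        for (int i = 0; i < atts.getLength(); i++) {
            element.setAttribute(atts.getQName(i), atts.getValue(i));
        }
        current.appendChild(element);
        current = element;
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (skipDepth > 0) {
            skipDepth--;
            return;
        }
        current = current.getParentNode();
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (skipDepth == 0 && current != doc) {
            current.appendChild(doc.createTextNode(new String(ch, start, length)));
        }
    }

    /** Parses xml and returns a DOM without any element named skippedName. */
    static Document build(String xml, String skippedName) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), new FilteringDomBuilder(doc, skippedName));
        return doc;
    }
}
```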
    Mike Schilling, Nov 13, 2008
    #12
  13. Sigfried

    Lew Guest

    Arne Vajhøj wrote:
    > Lew wrote:
    >> Arne Vajhøj wrote:
    >>> Roedy Green wrote:
    >>>> On Wed, 12 Nov 2008 09:16:44 +0100, Sigfried <>
    >>>> wrote, quoted or indirectly quoted someone who said :
    >>>>> Do you know faster XML API ?
    >>>>
    >>>> XML/SAX is inherently a high-overhead format, best used for small
    >>>> files.
    >>>
    >>> No - SAX is the XML parser for huge files.
    >>>
    >>> For small files DOM and XPath is much easier.

    >>
    >> Quite so. The advantage of SAX over DOM is that it is quite fast,
    >> very easy on memory requirements and suitable for single-pass
    >> processing of XML documents. Its disadvantage is that it does not
    >> keep an in-memory representation of the XML document for repeated
    >> processing.

    >
    > Plus compared to XPath you need to write a lot of code to do some
    > advanced searching.


    That isn't the point of SAX. SAX lets you import XML-encoded information
    directly into an in-memory structure - that being the "lot" of code you need
    to write but not really necessarily all that much. Once you have your object
    model built, there shouldn't be a need for "advanced searching", you just
    directly use the objects that you built.
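
    As an illustration of that approach, here is a sketch that reads
    elements straight into plain objects. The element name "point", its
    "x" and "y" attributes, and the PointLoader class are all assumptions
    invented for this example; nothing in the thread describes the actual
    document layout.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical handler that turns <point x=".." y=".."/> elements into objects. */
final class PointLoader extends DefaultHandler {
    /** Simple value object; the shape is assumed for the example. */
    static final class Point {
        final double x;
        final double y;

        Point(double x, double y) {
            this.x = x;
            this.y = y;
        }
    }

    private final List<Point> points = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if ("point".equals(qName)) {
            // No tree is kept; each element becomes an object immediately.
            points.add(new Point(
                    Double.parseDouble(atts.getValue("x")),
                    Double.parseDouble(atts.getValue("y"))));
        }
    }

    /** Parses xml in a single pass and returns the objects that were built. */
    static List<Point> load(String xml) throws Exception {
        PointLoader handler = new PointLoader();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.points;
    }
}
```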

    If there is a need for advanced searching, then perhaps SAX is the wrong choice.

    --
    Lew
    Lew, Nov 13, 2008
    #13
  14. Lew wrote:
    > Arne Vajhøj wrote:
    >> Lew wrote:
    >>> Arne Vajhøj wrote:
    >>>> Roedy Green wrote:
    >>>>> On Wed, 12 Nov 2008 09:16:44 +0100, Sigfried <>
    >>>>> wrote, quoted or indirectly quoted someone who said :
    >>>>>> Do you know faster XML API ?
    >>>>>
    >>>>> XML/SAX is inherently a high-overhead format, best used for small
    >>>>> files.
    >>>>
    >>>> No - SAX is the XML parser for huge files.
    >>>>
    >>>> For small files DOM and XPath is much easier.
    >>>
    >>> Quite so. The advantage of SAX over DOM is that it is quite fast,
    >>> very easy on memory requirements and suitable for single-pass
    >>> processing of XML documents. Its disadvantage is that it does not
    >>> keep an in-memory representation of the XML document for repeated
    >>> processing.

    >>
    >> Plus compared to XPath you need to write a lot of code to do some
    >> advanced searching.

    >
    > That isn't the point of SAX. SAX lets you import XML-encoded
    > information directly into an in-memory structure - that being the "lot"
    > of code you need to write but not really necessarily all that much.
    > Once you have your object model built, there shouldn't be a need for
    > "advanced searching", you just directly use the objects that you built.
    >
    > If there is a need for advanced searching, then perhaps SAX is the wrong
    > choice.


    The last is my point.

    Doing //sometag/someothertag[athirdtag/@someattr='foobar']/afourthtag/text()
    in SAX would require a lot more code than just a selectSingleNode
    call.
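
    For comparison, a sketch of the XPath side using the JDK's
    javax.xml.xpath API, where the rough counterpart of a
    selectSingleNode call is XPath.evaluate. The class name XPathLookup
    and the method find are hypothetical names for this example.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

/** Hypothetical helper: evaluates an XPath expression and returns its string value. */
final class XPathLookup {
    static String find(String xml, String expression) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        // The document is parsed into an in-memory tree behind this call.
        return xpath.evaluate(expression, new InputSource(new StringReader(xml)));
    }
}
```

    Calling find with the expression above on a matching document returns
    the text of the first afourthtag whose sibling athirdtag carries
    someattr='foobar'. Doing the same in a SAX handler means tracking the
    element path and the attribute match by hand.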

    Arne
    Arne Vajhøj, Nov 13, 2008
    #14
  15. Sigfried

    Lew Guest

    Lew wrote:
    >> If there is a need for advanced searching, then perhaps SAX is the
    >> wrong choice.


    Arne Vajhøj wrote:
    > The last is my point.
    >
    > Doing
    > //sometag/someothertag[athirdtag/@someattr='foobar']/afourthtag/text()
    > in SAX would require a lot more code than just a selectSingleNode
    > call.


    But that wouldn't even be SAX - it's an entirely different universe. I know
    that's your point, but it leaves me confused. If you use SAX, there wouldn't
    even be a need to search - everything would already be right where you could
    find it. The whole question of searching would never even come up.

    That is one of the advantages of SAX over DOM. With DOM, you have this huge
    memory structure that you have to search with XPath expressions that are hard
    to figure out and run really slowly. With SAX you read things right into an
    object model where you don't have to look for things, and you can access them
    directly. Searching is irrelevant.

    --
    Lew
    Lew, Nov 13, 2008
    #15
  16. Sigfried

    Sigfried Guest

    Tom Anderson wrote:
    > On Wed, 12 Nov 2008, Sigfried wrote:
    >
    >> Hi, using a java profiler, i've realized that SAX is consuming too
    >> much time:
    >> - endElement + startElement 40 %
    >> - *.read 7 %
    >> - a few <= 1%
    >>
    >> So SAX take about 50 % of the time !!

    >
    > Which startElement and endElement methods are these? I assume not the
    > ones in the ContentHandler, right?
    >
    >> Do you know faster XML API ?

    >
    > http://www.itu.int/rec/T-REC-X.891-200505-I/en
    > http://java.sun.com/developer/technicalArticles/xml/fastinfoset/
    >
    > Although that's probably not what you meant.


    Your articles did convince me to use a binary format instead of a text
    format. But Fast Infoset is still close to XML. Since my XML is mostly
    Double.toString / parseDouble, I guess using Java serialization would be
    a better (and bigger) step.
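
    A minimal sketch of what such a binary round trip could look like.
    Note the assumption: it uses DataOutputStream/DataInputStream (as
    suggested earlier in the thread) rather than ObjectOutputStream, and
    the DoubleCodec class with its length-prefixed layout is made up for
    illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

/** Hypothetical codec: an int count followed by that many 8-byte doubles. */
final class DoubleCodec {
    static byte[] write(double[] values) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(values.length);
            for (double value : values) {
                out.writeDouble(value); // raw IEEE 754 bits, no text conversion
            }
        }
        return bytes.toByteArray();
    }

    static double[] read(byte[] data) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            double[] values = new double[in.readInt()];
            for (int i = 0; i < values.length; i++) {
                values[i] = in.readDouble();
            }
            return values;
        }
    }
}
```

    This avoids Double.toString and parseDouble entirely, at the price of
    a format that is no longer human-readable or self-describing.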


    > But seriously, XML isn't fast. Never has been, never will be. If you
    > need fast, don't use XML. Fast XML parsing is like semi racing: even if
    > you win, you're still retarded.


    lol, I knew that one as being about arguing on the internet.
    Sigfried, Nov 13, 2008
    #16
  17. Lew wrote:
    > Lew wrote:
    >>> If there is a need for advanced searching, then perhaps SAX is the
    >>> wrong choice.

    >
    > Arne Vajhøj wrote:
    >> The last is my point.
    >>
    >> Doing
    >> //sometag/someothertag[athirdtag/@someattr='foobar']/afourthtag/text()
    >> in SAX would require a lot more code than just a selectSingleNode
    >> call.

    >
    > But that wouldn't even be SAX - it's an entirely different universe. I
    > know that's your point, but it leaves me confused. If you use SAX,
    > there wouldn't even be a need to search - everything would already be
    > right where you could find it. The whole question of searching would
    > never even come up.
    >
    > That is one of the advantages of SAX over DOM. With DOM, you have this
    > huge memory structure that you have to search with XPath expressions
    > that are hard to figure out and run really slowly. With SAX you read
    > things right into an object model where you don't have to look for
    > things, and you can access them directly. Searching is irrelevant.


    Not necessarily.

    You can use SAX to just pick a small subset of the XML as well.

    And then you need to code that "pick".

    Arne
    Arne Vajhøj, Nov 16, 2008
    #17
  18. On 16.11.2008 04:55, Arne Vajhøj wrote:
    > Lew wrote:
    >> Lew wrote:
    >>>> If there is a need for advanced searching, then perhaps SAX is the
    >>>> wrong choice.

    >>
    >> Arne Vajhøj wrote:
    >>> The last is my point.
    >>>
    >>> Doing
    >>> //sometag/someothertag[athirdtag/@someattr='foobar']/afourthtag/text()
    >>> in SAX would require a lot more code than just a selectSingleNode
    >>> call.

    >>
    >> But that wouldn't even be SAX - it's an entirely different universe.
    >> I know that's your point, but it leaves me confused. If you use SAX,
    >> there wouldn't even be a need to search - everything would already be
    >> right where you could find it. The whole question of searching would
    >> never even come up.
    >>
    >> That is one of the advantages of SAX over DOM. With DOM, you have
    >> this huge memory structure that you have to search with XPath
    >> expressions that are hard to figure out and run really slowly. With
    >> SAX you read things right into an object model where you don't have to
    >> look for things, and you can access them directly. Searching is
    >> irrelevant.

    >
    > Not necessarily.
    >
    > You can use can use SAX to just pick a small subset of the XML as well.
    >
    > And have a need to code that "pick".


    I fully agree with Lew: if you have to do XPath like searching on your
    subset you picked the completely wrong data structure for your SAX
    processing.

    If you meant that the subset picking should be done with XPath then you
    have a generic mechanism for which DOM is probably a better choice. If
    your searching requirements are not as broad you can easily create your
    own simplified searching with SAX - and it's still more efficient for
    this than DOM.

    robert
    Robert Klemme, Nov 17, 2008
    #18
  19. Sigfried

    Arne Vajhøj Guest

    Robert Klemme wrote:
    > If you meant that the subset picking should be done with XPath then you
    > have a generic mechanism for which DOM is probably a better choice.


    That was approx. my point.

    Arne
    Arne Vajhøj, Nov 19, 2008
    #19