Element name length & performance implications

Discussion in 'XML' started by Tom Kerigan, Oct 25, 2005.

  1. Tom Kerigan

    Tom Kerigan Guest

    I know that longer element names increase the size of an XML document,
    ultimately resulting in a larger amount of data at parse-time. Is there
    anything else, specifically related to an element name and its length,
    that can impact the performance of an XML parser?

    The bulk of our XML parsing uses the latest and greatest version of
    Apache Xerces.
     
    Tom Kerigan, Oct 25, 2005
    #1

  2. Tom Kerigan wrote:

    > ultimately resulting in a larger amount of data at parse-time. Is there
    > anything else, specifically related to an element name and its length,
    > that can impact the performance of an XML parser?


    The number of attributes of an element may have
    an influence. I have seen a parser (I think it
    was xmllint) which seemed to have runtime O(n^2)
    where n=number of attributes. This became unbearable
    in some unlikely situations (more than 1000 attributes).
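
A note on how quadratic behaviour like this can arise: XML requires attribute names to be unique within a start tag, so a parser has to check each new attribute against the ones already seen. I don't know how the parser mentioned above actually does this. The sketch below is only an assumption about one plausible cause: a naive check that scans a list, compared with a set-based check. The function names are made up for illustration.

```python
def has_duplicate_quadratic(names):
    """Compare each attribute name against all earlier ones: O(n^2) comparisons."""
    comparisons = 0
    for i, name in enumerate(names):
        for earlier in names[:i]:
            comparisons += 1
            if name == earlier:
                return True, comparisons
    return False, comparisons


def has_duplicate_hashed(names):
    """Track seen names in a set: roughly O(n) on average."""
    seen = set()
    for name in names:
        if name in seen:
            return True
        seen.add(name)
    return False


# 1000 distinct attribute names, as in the "unlikely situation" above.
names = [f"attr{i}" for i in range(1000)]
print(has_duplicate_quadratic(names))  # (False, 499500): n*(n-1)/2 comparisons
print(has_duplicate_hashed(names))     # False
```

With 1000 attributes the naive version already does about half a million string comparisons per element, which would fit the slowdown described, though again this is a guess at the mechanism.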
     
    Jürgen Kahrs, Oct 25, 2005
    #2

  3. Peter Flynn

    Peter Flynn Guest

    Jürgen Kahrs wrote:

    > Tom Kerigan wrote:
    >
    >> I know that longer element names increase the size of an XML document,
    >> ultimately resulting in a larger amount of data at parse-time.


    I'm not sure that that in itself would significantly affect the performance,
    as reading bytes (which is all it's doing at that stage) is a relatively
    low-level occupation. If the lexer is tokenising element type names and
    storing them in some array-like data structure, big names will affect I/O
    but not much else. But I'm happy to be proved wrong on that.
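
One way to settle this is to measure it. The sketch below is a rough harness, not a proper benchmark: it builds flat documents whose element names differ only in length and times a parse of each. It uses Python's bundled expat bindings rather than Xerces, so the numbers say nothing direct about Xerces. The helper names `make_doc` and `parse_and_count` are made up for illustration.

```python
import time
import xml.parsers.expat


def make_doc(name_len, count):
    """Build a flat document with `count` children whose names are `name_len` chars long."""
    name = "e" * name_len
    body = "".join(f"<{name}>x</{name}>" for _ in range(count))
    return f"<root>{body}</root>".encode("utf-8")


def parse_and_count(data):
    """Parse with expat; return (number of start tags seen, seconds elapsed)."""
    seen = 0

    def on_start(name, attrs):
        nonlocal seen
        seen += 1

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = on_start
    start = time.perf_counter()
    parser.Parse(data, True)  # True marks this as the final chunk
    return seen, time.perf_counter() - start


if __name__ == "__main__":
    for name_len in (5, 50, 500):
        data = make_doc(name_len, 5000)
        elements, seconds = parse_and_count(data)
        print(f"name length {name_len}: {len(data)} bytes, "
              f"{elements} elements, {seconds:.4f} s")
```

Comparing seconds per byte across the three runs gives a first idea of whether long names cost more than the extra bytes they add. Timings will vary by machine and should be repeated before drawing conclusions.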

    >> Is there
    >> anything else, specifically related to an element name and its length,
    >> that can impact the performance of an XML parser?


    Depth can have an effect, especially in mixed content. I have relatively
    small documents (4-5 MB) which are marked up very densely in TEI, with
    deeply-nested structures such as variant readings of a manuscript or
    linguistic (part-of-speech) markup in mixed content such that the character
    data can be 15-20 levels below the root element. Nevertheless, onsgmls
    rips through these in 5-8 seconds on a Dell 4150 running FC4/KDE/Emacs.
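
To check how deep the character data in a given document really sits, a small SAX handler is enough. This is a minimal sketch using Python's standard `xml.sax` module, not onsgmls or Xerces. The class name `DepthHandler`, the function `max_text_depth`, and the sample markup are all made up for illustration. Depth here counts open elements, so text directly inside the root is at depth 1.

```python
import xml.sax


class DepthHandler(xml.sax.ContentHandler):
    """Track element nesting and record the deepest level holding non-blank text."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.max_text_depth = 0

    def startElement(self, name, attrs):
        self.depth += 1

    def endElement(self, name):
        self.depth -= 1

    def characters(self, content):
        # Ignore whitespace-only runs used purely for indentation.
        if content.strip():
            self.max_text_depth = max(self.max_text_depth, self.depth)


def max_text_depth(xml_bytes):
    """Return the deepest nesting level at which character data occurs."""
    handler = DepthHandler()
    xml.sax.parseString(xml_bytes, handler)
    return handler.max_text_depth


# Hypothetical mixed-content sample, loosely in the style of TEI markup.
sample = b"<text><p>see <app><rdg>foo</rdg></app></p></text>"
print(max_text_depth(sample))  # 4
```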

    I have seen some truly ludicrous examples of data-oriented e-commerce XML
    with element type names machine-generated from concatenated
    database-table-field-relation[-field-relation]*-value names which ran to
    400-500 characters, but the files were very small (40-50 KB) so I'm not
    sure what effect the names had on the parser (apart from the initial I/O).

    > The number of attributes of an element may have
    > an influence. I have seen a parser (I think it
    > was xmllint) which seemed to have runtime O(n^2)
    > where n=number of attributes. This became unbearable
    > in some unlikely situations (more than 1000 attributes).


    Number of attributes could probably affect it, but anyone who "designs"
    a document type with elements bearing 1000 attributes deserves all they
    get, IMHO.

    ///Peter
    --
    XML FAQ: http://xml.silmaril.ie/
     
    Peter Flynn, Oct 25, 2005
    #3
