Parsing XML files with SAX

Discussion in 'XML' started by mike henkins, Jul 23, 2005.

  1. mike henkins

    mike henkins Guest

    Hi,

    I've been looking through the various XML parser APIs available and I have
    decided to use the SAX parser. Probably not the best of choices, but I think
    it can do the job. What is the best way to parse an XML file using the SAX
    parser? I have seen examples where they store each element in Java bean
    classes. I am not sure this is a good approach for my XML file, which looks
    like this:

    <parent>
      <node1>
        <child1>AAA</child1>
        <grandchild1>BBB</grandchild1>
        <grandchild2>
          <anything>CCC</anything>
        </grandchild2>
        <child2>DDD</child2>
        <child3>DDD</child3>
      </node1>
      <node2>
        <child1>AAA</child1>
        <grandchild1>BBB</grandchild1>
        <grandchild2>
          <anything>CCC</anything>
        </grandchild2>
        <child2>DDD</child2>
        <child3>DDD</child3>
      </node2>
    </parent>

    I have to get the value of the "anything" element in node1, node2, etc.,
    store the value of child3 in a database, and so on.

    Does anyone have any experience or advice regarding the fastest way to do
    that using SAX (or any other parser)?

    Thanks !
     
    mike henkins, Jul 23, 2005
    #1

  2. William Park

    William Park Guest

    Try
    http://home.eol.ca/~parkw/index.html#expat
    which is a shell interface to the Expat XML parser.

    --
    William Park <>, Toronto, Canada
    ThinFlash: Linux thin-client on USB key (flash) drive
    http://home.eol.ca/~parkw/thinflash.html
    BashDiff: Super Bash shell
    http://freshmeat.net/projects/bashdiff/
     
    William Park, Jul 23, 2005
    #2

  3. Mukul Gandhi

    Mukul Gandhi Guest

    I would personally prefer DOM parsing in this case. DOM gives us a neat
    object-oriented way to read elements and attributes, as well as to
    modify the document tree.

    With the SAX approach, I'll have to set up the whole parser callback
    infrastructure in my application just to read a single element node,
    which somehow doesn't appeal to me!

    Another advantage of DOM is that I can easily store element and
    attribute properties in Java beans or in other kinds of container
    objects. With SAX I'd find that difficult to do.

    I would prefer SAX if I had to read the whole (or nearly the whole)
    document serially in one pass.
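
    To make the DOM approach concrete, here is a minimal sketch, assuming the
    sample document from the first post (with well-formed closing tags). The
    DomSketch and NodeRecord names are hypothetical, chosen only for this
    illustration; the parser calls are from the standard javax.xml.parsers
    and org.w3c.dom packages.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DomSketch {

    /** Hypothetical bean: one record per <node1>, <node2>, ... element. */
    static class NodeRecord {
        final String name;
        final String anything;
        final String child3;

        NodeRecord(String name, String anything, String child3) {
            this.name = name;
            this.anything = anything;
            this.child3 = child3;
        }
    }

    /** Text of the first descendant called tag, or null if there is none. */
    static String firstText(Element parent, String tag) {
        NodeList hits = parent.getElementsByTagName(tag);
        return hits.getLength() == 0 ? null : hits.item(0).getTextContent().trim();
    }

    /** Builds the whole tree in memory, then walks the children of the root. */
    static List<NodeRecord> read(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        List<NodeRecord> records = new ArrayList<>();
        NodeList children = doc.getDocumentElement().getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node n = children.item(i);
            if (n.getNodeType() != Node.ELEMENT_NODE) {
                continue; // skip whitespace text nodes between elements
            }
            Element e = (Element) n;
            records.add(new NodeRecord(e.getTagName(),
                    firstText(e, "anything"), firstText(e, "child3")));
        }
        return records;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<parent>"
                + "<node1><grandchild2><anything>CCC</anything></grandchild2>"
                + "<child3>DDD</child3></node1>"
                + "<node2><grandchild2><anything>CCC</anything></grandchild2>"
                + "<child3>DDD</child3></node2>"
                + "</parent>";
        for (NodeRecord r : read(xml)) {
            System.out.println(r.name + ": anything=" + r.anything + ", child3=" + r.child3);
        }
    }
}
```

    The trade-off is that the whole document is held in memory before any
    value is read, which is convenient for small files but costly for large
    ones.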

    Regards,
    Mukul
     
    Mukul Gandhi, Jul 24, 2005
    #3
  4. Jürgen Kahrs

    Mukul Gandhi wrote:

    > With the SAX approach, I'll have to set up the whole parser callback
    > infrastructure in my application just to read a single element node,
    > which somehow doesn't appeal to me!


    Is it really so complicated to set up "the whole parser callback
    infrastructure"? Even in Java this should not take much more code
    than a comparable DOM solution.
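
    As a rough illustration of how little callback code is involved, here is
    a minimal sketch, assuming the element names from the first post. The
    SaxSketch and Collector names are hypothetical; DefaultHandler and
    SAXParserFactory are the standard Java SAX classes.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SaxSketch {

    /** Collects the text of every element with the wanted name, in document order. */
    static class Collector extends DefaultHandler {
        private final String wanted;
        private final List<String> values = new ArrayList<>();
        private StringBuilder text; // non-null only while inside a wanted element

        Collector(String wanted) {
            this.wanted = wanted;
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            if (qName.equals(wanted)) {
                text = new StringBuilder();
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // The parser may deliver one text node in several chunks, so append.
            if (text != null) {
                text.append(ch, start, length);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (text != null && qName.equals(wanted)) {
                values.add(text.toString().trim());
                text = null;
            }
        }

        List<String> values() {
            return values;
        }
    }

    /** Streams through the document once; nothing but the wanted text is kept. */
    static List<String> collect(String xml, String elementName) throws Exception {
        Collector handler = new Collector(elementName);
        SAXParserFactory.newInstance()
                .newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.values();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<parent>"
                + "<node1><grandchild2><anything>CCC</anything></grandchild2>"
                + "<child3>DDD</child3></node1>"
                + "<node2><grandchild2><anything>CCC</anything></grandchild2>"
                + "<child3>DDD</child3></node2>"
                + "</parent>";
        System.out.println(collect(xml, "anything")); // prints [CCC, CCC]
        System.out.println(collect(xml, "child3"));   // prints [DDD, DDD]
    }
}
```

    In a real application the endElement callback could write each child3
    value to the database directly instead of collecting it in a list.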

    Besides Java, there are scripting languages based upon
    the SAX approach. In these languages, reading a single element node
    can be done with a one-line script. The larger your file is,
    the greater the speed advantage of a SAX-based script.
     
    Jürgen Kahrs, Jul 24, 2005
    #4
  5. Mukul Gandhi

    Mukul Gandhi Guest

    Thanks for explaining more about SAX. Which scripting languages have SAX
    bindings? Can you please provide some references?

    Regards,
    Mukul
     
    Mukul Gandhi, Jul 24, 2005
    #5
  6. Jürgen Kahrs

    Mukul Gandhi wrote:

    > Thanks for telling more about SAX. Which scripting languages have SAX
    > bindings? Can you please provide some references?


    GNU Awk and bash have XML extensions which are not
    yet merged into the official source code:

    http://home.vrweb.de/~juergen.kahrs/gawk/XML/
    http://home.eol.ca/~parkw/index.html#expat

    Perl is probably the scripting language that has
    the longest tradition of supporting XML files.
    Python, Ruby etc. also have some kind of XML
    support. Recently, there has been an ECMA proposal (E4X)
    for extending JavaScript with functions for processing
    XML data. Use Google to find out more.
     
    Jürgen Kahrs, Jul 24, 2005
    #6
  7. Mukul Gandhi

    Mukul Gandhi Guest

    Seems you have done good work with gawk XML. Very nice.

    Regards,
    Mukul

    Jürgen Kahrs wrote:
    > GNU Awk and bash have XML extensions which are not
    > yet merged into the official source code:
    >
    > http://home.vrweb.de/~juergen.kahrs/gawk/XML/
    > http://home.eol.ca/~parkw/index.html#expat
    >
    > Perl is probably the script language that has
    > the longest tradition of supporting XML files.
    > Python, Ruby etc. also have some kind of XML
    > support. Recently, there has been an ECMA proposal
    > for extending JavaScript with functions for processing
    > XML data. Use Google to find out more.
     
    Mukul Gandhi, Jul 25, 2005
    #7
  8. Jimmy zhang

    Jimmy zhang Guest

    http://vtd-xml.sf.net
     
    Jimmy zhang, Jul 26, 2005
    #8