parsing XML with 'expat'

Discussion in 'XML' started by Bjoern Hoehrmann, Aug 20, 2007.

  1. * Roman Mashak wrote in comp.text.xml:
    >I hope this is the right group to ask. I need to parse, in C, XML of the
    >following structure:
    >
    ><BERTEST>
    > <NODE1>
    > <FREQ>666000000</FREQ>
    > <POWER>-82</POWER>
    > </NODE1>
    > <NODE1>
    > <FREQ>484000000</FREQ>
    > <POWER>-80</POWER>
    > </NODE2>
    ></BERTEST>
    >
    >So I took the 'expat' library to do that (I've never dealt with XML before
    >though), and tried to customize the example they ship with the library
    >(outline.c). What I can't quite understand is:
    >1) Can my XML really be called XML, or is it somehow invalid?
    >According to the Wikipedia page on XML, a valid document should look like
    >this:
    >
    ><name attribute="value">content</name>
    >
    >while mine is a bit different


    Your second <NODE1> should probably be <NODE2> (otherwise the start- and
    end-tags do not match up), but other than that it certainly is XML. You
    are free to choose (when designing a new XML format) whether you use an
    attribute or element to encode some information.

    >2) If my XML document is correct anyway, how can I parse it with expat?
    >What I need is to extract the values of the FREQ and POWER tags whenever
    >they occur (i.e. 666000000 for FREQ or -82 for POWER in the above example).
    >
    >So, I think I need to register a callback function for start tags and try to
    >do what I want in there. But how can I get the values of the tags, and which
    >'expat' functions should I use? Or is there another, simpler way?


    Expat reports the text through the character data callback (registered
    with XML_SetCharacterDataHandler). You have to set up a handler for it and
    accumulate the text reported to it, then process the text, e.g. in the
    end element handler. There is no direct way to get to the text when using
    Expat.
    --
    Björn Höhrmann · http://bjoern.hoehrmann.de
    Weinh. Str. 22 · Telefon: +49(0)621/4309674 · http://www.bjoernsworld.de
    68309 Mannheim · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/
     
    Bjoern Hoehrmann, Aug 20, 2007
    #1

  2. Roman Mashak wrote:

    > 2) If my XML document is correct anyway, how can I parse it with expat?
    > What I need is to extract the values of the FREQ and POWER tags whenever
    > they occur (i.e. 666000000 for FREQ or -82 for POWER in the above example).


    What do you do with the extracted text?
    Do you put it into a text file for further
    processing? Then you can use any scripting
    language that is easier to use than Expat
    at the C level.

    For example, you can try this one:

    http://home.vrweb.de/~juergen.kahrs/gawk/XML/xmlgawk.html#Printing-an-outline-of-an-XML-file

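    @load xml
    # Runs once for every start tag
    XMLSTARTELEM {
      # Indent the element name according to its nesting depth
      printf("%*s%s", 2*XMLDEPTH-2, "", XMLSTARTELEM)
      # The fields $1..$NF are used here as the attribute names
      for (i=1; i<=NF; i++)
        printf(" %s='%s'", $i, XMLATTR[$i])
      print ""
    }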

    This script does exactly what the outline.c example
    from Expat does.
     
    Jürgen Kahrs, Aug 20, 2007
    #2

  3. Roman Mashak wrote:

    Hello,

    I hope this is the right group to ask. I need to parse, in C, XML of the
    following structure:

    <BERTEST>
    <NODE1>
    <FREQ>666000000</FREQ>
    <POWER>-82</POWER>
    </NODE1>
    <NODE1>
    <FREQ>484000000</FREQ>
    <POWER>-80</POWER>
    </NODE2>
    </BERTEST>

    So I took the 'expat' library to do that (I've never dealt with XML before
    though), and tried to customize the example they ship with the library
    (outline.c). What I can't quite understand is:
    1) Can my XML really be called XML, or is it somehow invalid?
    According to the Wikipedia page on XML, a valid document should look like
    this:

    <name attribute="value">content</name>

    while mine is a bit different

    2) If my XML document is correct anyway, how can I parse it with expat?
    What I need is to extract the values of the FREQ and POWER tags whenever
    they occur (i.e. 666000000 for FREQ or -82 for POWER in the above example).

    So, I think I need to register a callback function for start tags and try to
    do what I want in there. But how can I get the values of the tags, and which
    'expat' functions should I use? Or is there another, simpler way?

    Thanks in advance

    --
    Best regards, Roman
     
    Roman Mashak, Aug 20, 2007
    #3
