C++ XML processor class?

Discussion in 'XML' started by Pep, May 18, 2007.

  1. Pep

    Pep Guest

    Hi, does anyone know of a C++ class capable of parsing an XML stream into
    elements?

    I have tried using the Xerces classes, but unfortunately they require
    a lot of complex processing to isolate the elements, their attributes
    and their content, which I do not want to do.

    I want a class that will parse the XML stream and then allow me to
    iterate the elements recursively, similar to this

    void iterateElements(element)
    {
        for (element.attributes)
        {
            attributePair = element.nextAttribute();
            // do some processing on the attribute pair
        }

        elementPair = element.getContentPair();
        // do some processing on the element content

        for (element.elements)
        {
            iterateElements(element.nextElement()); // recursively call this function
        }
    }

    So I would get a key/data pair for each element and for each element
    attribute.

    Here's hoping :)
     
    Pep, May 18, 2007
    #1

  2. Pep

    Pavel Lepin Guest

    Pep <> wrote in
    <>:
    > I have tried using the xerces class but unfortunately this
    > requires me to do a lot of complex processing to isolate
    > the elements and their attributes and content which I do
    > not want to do.


    Do you imply xerces-c++ doesn't have a DOM parser? I can
    hardly believe that... Hmm, of course it does:

    http://xml.apache.org/xerces-c/apiDocs/classDOMBuilder.html

    If your problem is that you find the DOM API cumbersome, I would
    seriously recommend getting over it. Modules / components /
    class libs for parsing XML using something less elephantine
    than DOM certainly do exist (perl5's XML::Simple comes to
    mind... rather forcefully, in fact), certainly do have
    their uses, but also certainly have a big
    problem--generally, you cannot predict when you are going
    to run into one of their inherent limitations so that your
    project comes to a screeching halt at the worst possible
    moment.

    If your problem is that you need a streaming parser for
    whatever reason, I believe SAX is the only practical
    choice. I've no hands-on experience with SAX parsers, but
    from what I've heard using:

    http://xml.apache.org/xerces-c/apiDocs/classSAXParser.html

    ...should be straightforward enough.
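
To make the callback (push) style concrete, here is a minimal sketch in plain standard C++. It does not use Xerces-C++ at all: `Handler`, `pushParse`, `PathCollector` and `collect` are hypothetical names invented for illustration, and the toy parser only understands plain start tags, end tags and text (no attributes, comments or entities). The point is only the control flow: the parser drives, and your code reacts to events.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical handler interface, loosely modelled on the SAX callback style.
// It is NOT the Xerces-C++ API; all names are made up for this sketch.
struct Handler {
    virtual ~Handler() = default;
    virtual void startElement(const std::string& name) = 0;
    virtual void characters(const std::string& text) = 0;
    virtual void endElement(const std::string& name) = 0;
};

// Toy push parser for a tiny XML subset: plain start tags, end tags and text.
// No attributes, comments, entities or real error handling.
void pushParse(const std::string& xml, Handler& h) {
    std::size_t pos = 0;
    while (pos < xml.size()) {
        if (xml[pos] == '<') {
            std::size_t close = xml.find('>', pos);
            if (close == std::string::npos) break;  // malformed input: stop
            std::string tag = xml.substr(pos + 1, close - pos - 1);
            if (!tag.empty() && tag[0] == '/')
                h.endElement(tag.substr(1));
            else
                h.startElement(tag);
            pos = close + 1;
        } else {
            std::size_t next = xml.find('<', pos);
            if (next == std::string::npos) next = xml.size();
            h.characters(xml.substr(pos, next - pos));
            pos = next;
        }
    }
}

// Example handler: records "path=text" for every non-blank text run.
// The handler has to keep its own stack, because the parser keeps no tree.
struct PathCollector : Handler {
    std::vector<std::string> stack;
    std::vector<std::string> found;

    void startElement(const std::string& name) override { stack.push_back(name); }
    void endElement(const std::string&) override {
        if (!stack.empty()) stack.pop_back();
    }
    void characters(const std::string& text) override {
        if (text.find_first_not_of(" \t\r\n") == std::string::npos) return;
        std::string path;
        for (const std::string& s : stack) path += "/" + s;
        found.push_back(path + "=" + text);
    }
};

// Convenience wrapper used for the demonstration.
std::vector<std::string> collect(const std::string& xml) {
    PathCollector c;
    pushParse(xml, c);
    return c.found;
}
```

With this sketch, `collect("<Cat><CatName>Models</CatName></Cat>")` yields a single entry, `/Cat/CatName=Models`. Note how the handler must track context itself; that bookkeeping is the usual cost of the streaming approach.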

    > elementPair = element.getContentPair();
    > // do some processing on the element content


    Define 'element content'. string(.)? That, generally
    speaking, is a bit broken. text()? That's not too good
    either. *? Then you don't need all that nonsense with
    processing 'next element' recursively.

    > for (element.elements)
    > {
    > iterateElements(element.nextElement()); //
    > recursively call this function


    Define 'next element'. following-sibling::*[1]?
    following::*[1]? (Hint: in this case you lose important
    information about the document.)

    > So I would get a key/data pair for each element and for
    > each element attribute.


    'Key/data pair' in element context sounds fishy to me since
    you seem to imply--correct me if I'm wrong--that 'data'
    would be primitive, and not a tree (which it is in
    practice).

    --
    Pavel Lepin
     
    Pavel Lepin, May 18, 2007
    #2

  3. Pep

    Guest

    On 18 May, 10:17, Pep <> wrote:
    > Hi anyone know of a C++ class capable of parsing a XML stream in to
    > elements?
    >
    > I have tried using the xerces class but unfortunately this requires me
    > to do a lot of complex processing to isolate the elements and their
    > attributes and content which I do not want to do.
    >
    > I want a class that will parse the XML stream and then allow me to
    > iterate the elements recursively, similar to this
    >
    > void iterateElements(element)
    > {
    >
    > for (element.attributes)
    > {
    > attributePair = element.nextAttribute();
    > // do some processing on the attribute pair
    > }
    >
    > elementPair = element.getContentPair();
    > // do some processing on the element content
    >
    > for (element.elements)
    > {
    > iterateElements(element.nextElement()); // recursively call
    > this function
    > }
    >
    > }
    >
    > So I would get a key/data pair for each element and for each element
    > attribute.
    >
    > Here's hoping :)


    It looks like you're looking for a pull-parser.

    The Microsoft XmlLite C++ parser
    (http://msdn2.microsoft.com/en-us/library/ms752838.aspx) is such a
    parser, although it's only available as a DLL and hence it may not be
    appropriate for you. I don't think it supports validation against a
    schema, but I could be wrong.

    libxml2 (http://xmlsoft.org/) also has such a parser, but written in
    C. This has source code available (I think under MIT license, but
    you'd best check if you're interested). I believe this can validate
    against a schema if needed.

    StAX (as opposed to SAX) is a specification that defines a
    pull-parser. But I'm not sure how well implementations conform to the
    definition. However, searching for something like "C++ StAX" might
    yield additional results.
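
For readers unfamiliar with the term, the sketch below illustrates what "pull" means, using only the C++ standard library. It is not the XmlLite, libxml2 or StAX API: `EventType`, `Event`, `PullReader` and `outline` are hypothetical names, and the reader only handles plain start tags, end tags and text. The distinguishing feature is that the caller asks for the next event in its own loop, rather than the parser calling back into the application.

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical event types for a toy pull reader (not any real library's API).
enum class EventType { StartElement, EndElement, Text, EndOfInput };

struct Event {
    EventType type;
    std::string value;  // element name, or text content for Text events
};

// Toy pull reader for a tiny XML subset: plain start/end tags and text.
// No attributes, comments, entities or validation.
class PullReader {
public:
    explicit PullReader(std::string xml) : xml_(std::move(xml)) {}

    // The caller requests the next event; nothing is parsed until it does.
    Event next() {
        if (pos_ >= xml_.size()) return {EventType::EndOfInput, ""};
        if (xml_[pos_] == '<') {
            std::size_t close = xml_.find('>', pos_);
            if (close == std::string::npos) {  // malformed input: give up
                pos_ = xml_.size();
                return {EventType::EndOfInput, ""};
            }
            std::string tag = xml_.substr(pos_ + 1, close - pos_ - 1);
            pos_ = close + 1;
            if (!tag.empty() && tag[0] == '/')
                return {EventType::EndElement, tag.substr(1)};
            return {EventType::StartElement, tag};
        }
        std::size_t lt = xml_.find('<', pos_);
        if (lt == std::string::npos) lt = xml_.size();
        std::string text = xml_.substr(pos_, lt - pos_);
        pos_ = lt;
        return {EventType::Text, text};
    }

private:
    std::string xml_;
    std::size_t pos_ = 0;
};

// Caller-driven iteration: builds an indented outline of element names.
std::string outline(const std::string& xml) {
    PullReader reader(xml);
    std::string out;
    int depth = 0;
    for (Event e = reader.next(); e.type != EventType::EndOfInput; e = reader.next()) {
        if (e.type == EventType::StartElement) {
            out += std::string(2 * depth, ' ') + e.value + "\n";
            ++depth;
        } else if (e.type == EventType::EndElement) {
            if (depth > 0) --depth;
        }
    }
    return out;
}
```

Because the loop lives in the caller, it is easy to stop early, skip a subtree, or hand the reader to a helper function for one particular element, which is roughly the shape of the recursive iteration asked for in the original post.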

    HTH,

    Pete.
    =============================================
    Pete Cordell
    Tech-Know-Ware Ltd
    for XML Schema to C++ data binding visit
    http://www.tech-know-ware.com/lmx/
    http://www.codalogic.com/lmx/
    =============================================
     
    , May 18, 2007
    #3
  4. Pep

    Pep Guest

    Pavel Lepin wrote:
    > Pep <> wrote in
    > <>:
    > > I have tried using the xerces class but unfortunately this
    > > requires me to do a lot of complex processing to isolate
    > > the elements and their attributes and content which I do
    > > not want to do.

    >
    > Do you imply xerces-c++ doesn't have a DOM parser? I can
    > hardly believe that... Hmm, of course it does:
    >
    > http://xml.apache.org/xerces-c/apiDocs/classDOMBuilder.html
    >
    > If your problem is that you find DOM API cumbersome, I would
    > seriously recommend getting over it. Modules / components /
    > class libs for parsing XML using something less elephantine
    > than DOM certainly do exist (perl5's XML::Simple comes to
    > mind... rather forcefully, in fact), certainly do have
    > their uses, but also certainly have a big
    > problem--generally, you cannot predict when you are going
    > to run into one of their inherent limitations so that your
    > project comes to a screeching halt at the worst possible
    > moment.
    >
    > If your problem is that you need a streaming parser for
    > whatever reason, I believe SAX is the only practical
    > choice. I've no hands-on experience with SAX parsers, but
    > from what I've heard using:
    >
    > http://xml.apache.org/xerces-c/apiDocs/classSAXParser.html
    >
    > ...should be straightforward enough.
    >
    > > elementPair = element.getContentPair();
    > > // do some processing on the element content

    >
    > Define 'element content'. string(.)? That's, generally
    > speaking, is a bit broken. text()? That's not too good
    > either. *? Then you don't need all that nonsense with
    > processing 'next element' recursively.
    >
    > > for (element.elements)
    > > {
    > > iterateElements(element.nextElement()); //
    > > recursively call this function

    >
    > Define 'next element'. following-sibling::*[1]?
    > following::*[1]? (Hint: in this case you lose important
    > information about the document.)
    >
    > > So I would get a key/data pair for each element and for
    > > each element attribute.

    >
    > 'Key/data pair' in element context sounds fishy to me since
    > you seem to imply--correct me if I'm wrong--that 'data'
    > would be primitive, and not a tree (which it is in
    > practice).
    >
    > --
    > Pavel Lepin


    Erm, I think you miss the point here.

    No, I'm not implying or suggesting that Xerces does not have a DOM
    parser; rather, I don't see an easy way of traversing a tree with it,
    and I admit this may well be my inexperience with the library.

    As for you ripping apart what is obviously pseudo code supplied by me
    to illustrate the simple task I want to perform, I don't get your
    point. Irrespective of whether the data is in a tree format or not,
    XML does indeed have data in the form of key pairs, and it is simply
    the key pairs I want to deal with, not the whole tree structure.

    As it happens, I have now looked at the libxml2 library and found I
    can quickly traverse the tree in a less complex manner than I had to
    follow with the Xerces library, though this is probably because the
    documentation is slightly better.

    So in using the libxml2 class I can quickly get to the data that I
    want which is in a crude key/pair format i.e.

    <Cat ID="1" >
    <CatName>Models</CatName>
    </Cat>

    Which crudely gives key pair ID:1 from the <Cat> element and
    text:Models from the <CatName> element. Admittedly I have to do a
    little processing in order to derive the key/pair data entities I want
    but I get the end result.
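
As a rough illustration of that "little processing", here is a sketch in standard C++ of a recursive walk that flattens a small element tree into key/value pairs. It deliberately does not call libxml2: `Element`, `Pair`, `collectPairs` and `sampleCat` are hypothetical names, and the tree is built by hand as a stand-in for whatever a real parser would return. Using "text" as the key for element content is likewise just an assumption taken from the example above.

```cpp
#include <string>
#include <utility>
#include <vector>

using Pair = std::pair<std::string, std::string>;

// Hypothetical in-memory element; a stand-in for a node handed back by a parser.
struct Element {
    std::string name;
    std::vector<Pair> attributes;   // attribute name/value pairs
    std::string text;               // direct text content, if any
    std::vector<Element> children;  // child elements in document order
};

// Depth-first walk: attribute pairs first, then the element's own text,
// then each child element in document order.
void collectPairs(const Element& e, std::vector<Pair>& out) {
    for (const Pair& a : e.attributes) out.push_back(a);
    if (!e.text.empty()) out.emplace_back("text", e.text);
    for (const Element& child : e.children) collectPairs(child, out);
}

// Builds <Cat ID="1"><CatName>Models</CatName></Cat> by hand.
Element sampleCat() {
    Element name;
    name.name = "CatName";
    name.text = "Models";

    Element cat;
    cat.name = "Cat";
    cat.attributes.emplace_back("ID", "1");
    cat.children.push_back(name);
    return cat;
}
```

Running `collectPairs(sampleCat(), pairs)` fills `pairs` with `ID`/`1` followed by `text`/`Models`. The flattening throws away which element each pair came from, so it only suits documents whose structure is known in advance.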

    So, like I said, I don't see your point in trying to analyse someone's
    pseudo code with the attempt to imply the notation of key/pair as
    being "fishy"?

    Still thanks anyway ;)
     
    Pep, May 18, 2007
    #4
  5. Pep

    Pep Guest

    wrote:
    > On 18 May, 10:17, Pep <> wrote:
    > > Hi anyone know of a C++ class capable of parsing a XML stream in to
    > > elements?
    > >
    > > I have tried using the xerces class but unfortunately this requires me
    > > to do a lot of complex processing to isolate the elements and their
    > > attributes and content which I do not want to do.
    > >
    > > I want a class that will parse the XML stream and then allow me to
    > > iterate the elements recursively, similar to this
    > >
    > > void iterateElements(element)
    > > {
    > >
    > > for (element.attributes)
    > > {
    > > attributePair = element.nextAttribute();
    > > // do some processing on the attribute pair
    > > }
    > >
    > > elementPair = element.getContentPair();
    > > // do some processing on the element content
    > >
    > > for (element.elements)
    > > {
    > > iterateElements(element.nextElement()); // recursively call
    > > this function
    > > }
    > >
    > > }
    > >
    > > So I would get a key/data pair for each element and for each element
    > > attribute.
    > >
    > > Here's hoping :)

    >
    > It looks like you're looking for a pull-parser.
    >
    > The Microsoft XML-lite C++ parser (http://msdn2.microsoft.com/en-us/
    > library/ms752838.aspx) is such a parser, although it's only available
    > as a DLL and hence it may not be appropriate for you. I don't think
    > it supports validation against a schema, but I could be wrong.
    >
    > libxml2 (http://xmlsoft.org/) also has such a parser, but written in
    > C. This has source code available (I think under MIT license, but
    > you'd best check if you're interested). I believe this can validate
    > against a schema if needed.
    >
    > StAX (as opposed to SAX) is a specification that defines a pull-
    > parser. But I'm not sure how well implementations conform to the
    > definition. However, searching for something like "C++ StAX" might
    > yield additional results.
    >
    > HTH,
    >
    > Pete.
    > =============================================
    > Pete Cordell
    > Tech-Know-Ware Ltd
    > for XML Schema to C++ data binding visit
    > http://www.tech-know-ware.com/lmx/
    > http://www.codalogic.com/lmx/
    > =============================================


    Hey thanks Pete, a pull-parser is definitely what I want although I
    was not aware of the correct terminology here.

    Since my OP I have looked at libxml2 and adopted its use, which is
    great as it is C compliant and therefore C++ compliant by default and,
    although I did not mention the architecture requirement, is *nix
    compatible, so it ticks all the boxes.

    So now I am trundling through the documentation and sample program to
    quickly develop the tool I need.

    Thanks again,
    Pep.
     
    Pep, May 18, 2007
    #5
  6. Pep wrote:

    > So I would get a key/data pair for each element and for each element
    > attribute.


    Did you consider a scripting language?
    You said you wanted to simply pull one element
    after the other and also look at the attributes.

    http://home.vrweb.de/~juergen.kahrs/gawk/XML/xmlgawk.html#Printing-an-outline-of-an-XML-file

    This script reads one element after the other and
    simply prints an outline:

    @load xml
    XMLSTARTELEM {
      printf("%*s%s", 2*XMLDEPTH-2, "", XMLSTARTELEM)
      for (i=1; i<=NF; i++)
        printf(" %s='%s'", $i, XMLATTR[$i])
      print ""
    }

    That's all.
     
    Jürgen Kahrs, May 18, 2007
    #6
  7. Pep

    Pavel Lepin Guest

    Pep <> wrote in
    <>:
    > Pavel Lepin wrote:
    >> Pep <> wrote in
    >> <>:
    >> > I have tried using the xerces class but unfortunately
    >> > this requires me to do a lot of complex processing to
    >> > isolate the elements and their attributes and content
    >> > which I do not want to do.

    >>
    >> Do you imply xerces-c++ doesn't have a DOM parser? I can
    >> hardly believe that... Hmm, of course it does:
    >>
    >> If your problem is that you find DOM API cumbersome, I
    >> would seriously recommend getting over it.
    >>
    >> If your problem is that you need a streaming parser for
    >> whatever reason, I believe SAX is the only practical
    >> choice.

    >
    > Erm, I think you miss the point here.


    That's what I thought, because I couldn't really see what
    your problem was...

    > No I'm not implying or suggesting that xerces does not
    > have a dom parser, rather I don't see a easy way of
    > traversing a tree with it and I admit this may well be my
    > inexperience with the library.


    ...on the other hand, maybe not. Is there any specific
    problem you're having with DOM tree traversal as
    implemented in xerces-c++? As I said, DOM might *seem* a
    bit cumbersome, and, well, I suppose it *is* a bit on the
    cumbersome side, but can you be a bit more specific on what
    gives you trouble with traversing the tree?

    > As for you ripping apart what is obviously pseudo code
    > supplied by me to illustrate the simple task I want to
    > perform, I don't get your point.


    My point wasn't really anything about your pseudo-code, but
    rather that I perceive a problem with your way of thinking
    about XML processing. Naturally, I might be mistaken, my
    opinion being based solely on the code and comments you
    posted...

    > Irrespective of whether the data is in a tree format or
    > not, xml does indeed have data in the form of key pairs
    > and it is simply the key pairs I want to deal with not the
    > whole tree structure.


    There's no 'whether'. Any XML document represents a tree.
    You could, indeed, say that nodes are 'key-data' pairs, but
    only if you fully understand that in case of element
    nodes 'data' is always a list of nodes. Now that I think
    about it, there are no explicit keys, so you couldn't even
    say that.

    Okay, I guess I just might be on the wrong level of
    abstraction here and that causes misunderstanding. If
    you're talking about documents similar to:

    <document>
    <data key="foo">bar</data>
    <data key="baz">quux</data>
    <etc/>
    </document>

    ...then my point would be that you probably don't need
    actual traversal anymore as soon as you reach one of
    the 'data' elements. getAttributeNS() and getTextContent()
    should do anyway, since you would know the semantics of
    data elements.
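
A small sketch of that idea in standard C++ follows. It is not Xerces-C++ code: `Element`, `attribute`, `textContent`, `readData` and `sampleDocument` are hypothetical names, and the two member functions only play the roles of DOM's getAttributeNS() and getTextContent() without matching their real signatures. The tree is built by hand in place of a parsed document.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical minimal element type, standing in for a DOM element node.
struct Element {
    std::string name;
    std::map<std::string, std::string> attributes;
    std::string text;               // direct text content, if any
    std::vector<Element> children;  // child elements in document order

    // Returns the attribute value, or an empty string when it is absent.
    std::string attribute(const std::string& key) const {
        auto it = attributes.find(key);
        return it == attributes.end() ? std::string() : it->second;
    }

    // Own text followed by the text of all descendants.
    // Simplified: ignores how text and child elements interleave.
    std::string textContent() const {
        std::string result = text;
        for (const Element& c : children) result += c.textContent();
        return result;
    }
};

// Reads every <data key="...">...</data> child of 'doc' into a map.
// Other children (such as <etc/>) are ignored; no deeper traversal is needed.
std::map<std::string, std::string> readData(const Element& doc) {
    std::map<std::string, std::string> result;
    for (const Element& child : doc.children)
        if (child.name == "data")
            result[child.attribute("key")] = child.textContent();
    return result;
}

// Builds the sample <document> shown above by hand.
Element sampleDocument() {
    Element a; a.name = "data"; a.attributes["key"] = "foo"; a.text = "bar";
    Element b; b.name = "data"; b.attributes["key"] = "baz"; b.text = "quux";
    Element etc; etc.name = "etc";

    Element doc;
    doc.name = "document";
    doc.children = {a, b, etc};
    return doc;
}
```

Here `readData(sampleDocument())` produces a map with `foo` mapped to `bar` and `baz` mapped to `quux`. Once the meaning of a `data` element is known, one loop over the children is enough and no generic recursion is required.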

    > As it happens I have now looked at the libxml2 class and
    > found i can quickly traverse the tree in a less complex
    > manner than I had to follow with the xerces library,
    > though this is probably because the documentation is
    > slightly better.


    Whatever works for you. libxml2 is certainly workable, and I
    don't believe there are any significant limitations. There
    are just two points against it I think: it doesn't
    implement the W3C DOM API (although I think there was an
    adapter of sorts, developed separately from libxml2 itself)
    and it's written in C (but that's probably irrelevant in
    your case).

    > So in using the libxml2 class I can quickly get to the
    > data that I want which is in a crude key/pair format i.e.
    >
    > <Cat ID="1" >
    > <CatName>Models</CatName>
    > </Cat>


    Oh yeah, I thought I was missing something. Wrong level of
    abstraction. I thought you were perceiving nodes themselves
    as key-value pairs.

    > Which crudely gives key pair ID:1 from the <Cat> element
    > and text:Models from the <CatName> element. Admittedly I
    > have to do a little processing in order to derive the
    > key/pair data entities I want but I get the end result.


    Well, it would work the same way with xerces-c++. I suppose
    libxml2 is a bit more light-weight, but in my eyes that is
    offset by it being non-standard. YMMV.

    > So like i said, I don't see your point in trying to
    > analyse someones pseudo code with the attempt to imply the
    > notation of key/pair as being "fishy"?


    If you *represent* key-value pairs in XML that is perfectly
    okay I suppose. What I was objecting to was perceiving
    nodes as key-value pairs. Just a bit of misunderstanding,
    as I said.

    --
    Pavel Lepin
     
    Pavel Lepin, May 18, 2007
    #7
  8. Hi,

    Pep <> writes:

    > So in using the libxml2 class I can quickly get to the data that I
    > want which is in a crude key/pair format i.e.
    >
    > <Cat ID="1" >
    > <CatName>Models</CatName>
    > </Cat>


    If all you need is to get the data stored in XML then a data
    binding approach may be an easy solution. In short you will
    have C++ classes generated that model your XML and which you
    can use to get to the data in a more convenient way:

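    // Sketch of a generated class (declarations only)
    class Cat
    {
    public:
      int ID () const;
      string CatName () const;
    };

    // cat () is assumed to be a parsing function returning a populated Cat
    Cat c = cat ("cat.xml");

    cout << c.ID () << " " << c.CatName () << endl;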


    The following article provides a quick intro to XML data binding in
    C++:

    http://www.artima.com/cppsource/xml_data_binding.html


    hth,
    -boris
    --
    Boris Kolpackov
    Code Synthesis Tools CC
    http://www.codesynthesis.com
    Open-Source, Cross-Platform C++ XML Data Binding
     
    Boris Kolpackov, May 18, 2007
    #8
  9. Pep

    Pep Guest

    Jürgen Kahrs wrote:
    > Pep wrote:
    >
    > > So I would get a key/data pair for each element and for each element
    > > attribute.

    >
    > Did you consider a scripting language ?
    > You said you wanted to simply pull one element
    > after the other and also look at the attributes.
    >
    > http://home.vrweb.de/~juergen.kahrs/gawk/XML/xmlgawk.html#Printing-an-outline-of-an-XML-file
    >
    > This script reads one element after the other and
    > simply prints an outline:
    >
    > @load xml
    > XMLSTARTELEM {
    > printf("%*s%s", 2*XMLDEPTH-2, "", XMLSTARTELEM)
    > for (i=1; i<=NF; i++)
    > printf(" %s='%s'", $i, XMLATTR[$i])
    > print ""
    > }
    >
    > That's all.


    Thanks, it looks interesting, but unfortunately I have to do this as
    part of a C++ library, so scripting is not an option for me.
     
    Pep, May 21, 2007
    #9
  10. Pep

    Pep Guest

    Boris Kolpackov wrote:
    > Hi,
    >
    > Pep <> writes:
    >
    > > So in using the libxml2 class I can quickly get to the data that I
    > > want which is in a crude key/pair format i.e.
    > >
    > > <Cat ID="1" >
    > > <CatName>Models</CatName>
    > > </Cat>

    >
    > If all you need is to get the data stored in XML then a data
    > binding approach may be an easy solution. In short you will
    > have C++ classes generated that model your XML and which you
    > can use to get to the data in a more convenient way:
    >
    > class Cat
    > {
    > int ID () const;
    > string CatName () const;
    > };
    >
    > Cat c = cat ("cat.xml");
    >
    > cout << c.ID () << " " << c.CatName () << endl;
    >
    >
    > The following article provide a quick intro to XML data binding in
    > C++:
    >
    > http://www.artima.com/cppsource/xml_data_binding.html
    >
    >
    > hth,
    > -boris
    > --
    > Boris Kolpackov
    > Code Synthesis Tools CC
    > http://www.codesynthesis.com
    > Open-Source, Cross-Platform C++ XML Data Binding


    Thanks Boris. I have found a solution to my problem using libxml2 but,
    as always, I am now interested in XML as I have to use it, so I
    will look into the URI you posted.
     
    Pep, May 21, 2007
    #10