c++ XML processor class?

Pep · May 18, 2007

Hi, does anyone know of a C++ class capable of parsing an XML stream into
elements?

I have tried using the xerces class but unfortunately this requires me
to do a lot of complex processing to isolate the elements and their
attributes and content which I do not want to do.

I want a class that will parse the XML stream and then allow me to
iterate the elements recursively, similar to this

void iterateElements(element)
{
    for (element.attributes)
    {
        attributePair = element.nextAttribute();
        // do some processing on the attribute pair
    }

    elementPair = element.getContentPair();
    // do some processing on the element content

    for (element.elements)
    {
        iterateElements(element.nextElement()); // recursively call this function
    }
}

So I would get a key/data pair for each element and for each element
attribute.

Here's hoping

Pavel Lepin · May 18, 2007

Pep said:
I have tried using the xerces class but unfortunately this
requires me to do a lot of complex processing to isolate
the elements and their attributes and content which I do
not want to do.

Do you imply xerces-c++ doesn't have a DOM parser? I can
hardly believe that... Hmm, of course it does:

http://xml.apache.org/xerces-c/apiDocs/classDOMBuilder.html

If your problem is that you find DOM API cumbersome, I would
seriously recommend getting over it. Modules / components /
class libs for parsing XML using something less elephantine
than DOM certainly do exist (perl5's XML::Simple comes to
mind... rather forcefully, in fact), certainly do have
their uses, but also certainly have a big
problem--generally, you cannot predict when you are going
to run into one of their inherent limitations so that your
project comes to a screeching halt at the worst possible
moment.

If your problem is that you need a streaming parser for
whatever reason, I believe SAX is the only practical
choice. I've no hands-on experience with SAX parsers, but
from what I've heard using:

http://xml.apache.org/xerces-c/apiDocs/classSAXParser.html

...should be straightforward enough.

elementPair = element.getContentPair();
// do some processing on the element content

Define 'element content'. string(.)? That, generally
speaking, is a bit broken. text()? That's not too good
either. *? Then you don't need all that nonsense with
processing 'next element' recursively.

for (element.elements)
{
    iterateElements(element.nextElement()); // recursively call this function

Define 'next element'. following-sibling::*[1]?
following::*[1]? (Hint: in this case you lose important
information about the document.)

So I would get a key/data pair for each element and for
each element attribute.

'Key/data pair' in element context sounds fishy to me since
you seem to imply--correct me if I'm wrong--that 'data'
would be primitive, and not a tree (which it is in
practice).

usenet · May 18, 2007

It looks like you're looking for a pull-parser.

The Microsoft XML-lite C++ parser
(http://msdn2.microsoft.com/en-us/library/ms752838.aspx) is such a
parser, although it's only available as a DLL and hence it may not be
appropriate for you. I don't think it supports validation against a
schema, but I could be wrong.

libxml2 (http://xmlsoft.org/) also has such a parser, but written in
C. This has source code available (I think under MIT license, but
you'd best check if you're interested). I believe this can validate
against a schema if needed.

StAX (as opposed to SAX) is a specification that defines a
pull-parser. But I'm not sure how well implementations conform to the
definition. However, searching for something like "C++ StAX" might
yield additional results.
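
To make the pull model a bit more concrete, here is a minimal sketch. It is a toy, not a real XML parser: ToyPullReader, Event and count_start_elements are made-up names for illustration and are not part of XML-lite, libxml2 or any StAX implementation. It only understands plain start tags, end tags and text (no attributes, comments, entities or error recovery). The point is only that the caller asks for the next event when it wants one, instead of the parser calling back into the application.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical toy reader for illustration only. It is NOT a conforming XML
// parser: no attributes, comments, entities, CDATA or error reporting.
enum class EventType { StartElement, EndElement, Text, EndOfInput };

struct Event {
    EventType type;
    std::string value;  // element name, or the text itself for Text events
};

class ToyPullReader {
public:
    explicit ToyPullReader(std::string input) : in_(std::move(input)) {}

    // The caller "pulls" the next event; nothing is scanned until asked for.
    Event next() {
        assert(pos_ <= in_.size());
        if (pos_ >= in_.size()) return {EventType::EndOfInput, ""};
        if (in_[pos_] == '<') {
            std::size_t close = in_.find('>', pos_);
            if (close == std::string::npos) {  // malformed input: just stop
                pos_ = in_.size();
                return {EventType::EndOfInput, ""};
            }
            std::string tag = in_.substr(pos_ + 1, close - pos_ - 1);
            pos_ = close + 1;
            if (!tag.empty() && tag[0] == '/')
                return {EventType::EndElement, tag.substr(1)};
            return {EventType::StartElement, tag};
        }
        // Everything up to the next '<' is reported as one text event.
        std::size_t lt = in_.find('<', pos_);
        if (lt == std::string::npos) lt = in_.size();
        std::string text = in_.substr(pos_, lt - pos_);
        pos_ = lt;
        return {EventType::Text, text};
    }

private:
    std::string in_;
    std::size_t pos_ = 0;
};

// Example driver: the application owns the loop and decides when to stop.
int count_start_elements(const std::string& xml) {
    ToyPullReader reader(xml);
    int count = 0;
    for (Event e = reader.next(); e.type != EventType::EndOfInput; e = reader.next())
        if (e.type == EventType::StartElement) ++count;
    return count;
}
```

With a SAX-style parser the same counting would live in a callback that the parser invokes; here the loop stays in the caller, which is usually what makes recursive-descent style processing easier to write.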

HTH,

Pete.
=============================================
Pete Cordell
Tech-Know-Ware Ltd
for XML Schema to C++ data binding visit
http://www.tech-know-ware.com/lmx/
http://www.codalogic.com/lmx/
=============================================

Pep · May 18, 2007

Erm, I think you miss the point here.

No, I'm not implying or suggesting that xerces does not have a DOM
parser; rather, I don't see an easy way of traversing a tree with it, and
I admit this may well be my inexperience with the library.

As for you ripping apart what is obviously pseudo code supplied by me
to illustrate the simple task I want to perform, I don't get your
point. Irrespective of whether the data is in a tree format or not,
XML does indeed have data in the form of key pairs, and it is simply
the key pairs I want to deal with, not the whole tree structure.

As it happens, I have now looked at the libxml2 class and found I can
quickly traverse the tree in a less complex manner than I had to
follow with the xerces library, though this is probably because the
documentation is slightly better.

So in using the libxml2 class I can quickly get to the data that I
want which is in a crude key/pair format i.e.

<Cat ID="1">
  <CatName>Models</CatName>
</Cat>

Which crudely gives key pair ID:1 from the <Cat> element and
text:Models from the <CatName> element. Admittedly I have to do a
little processing in order to derive the key/pair data entities I want
but I get the end result.
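
As a rough illustration of that "little processing", here is a minimal sketch that assumes the parser has already produced a small in-memory tree. The Element struct, the KeyValue alias and collectPairs are hypothetical names invented for this example; they are not libxml2's API. Using the element name as the key for text content is also just one possible choice.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

using KeyValue = std::pair<std::string, std::string>;

// Hypothetical in-memory element, standing in for whatever node type the
// parsing library provides. This is not libxml2's xmlNode.
struct Element {
    std::string name;
    std::vector<KeyValue> attributes;
    std::string text;               // direct text content, empty if none
    std::vector<Element> children;  // nested elements
};

// Walk the tree depth-first and append one pair per attribute, plus one
// name/text pair for every element that carries text.
void collectPairs(const Element& e, std::vector<KeyValue>& out) {
    assert(!e.name.empty());  // every element is expected to have a name
    for (const auto& attr : e.attributes) out.push_back(attr);
    if (!e.text.empty()) out.emplace_back(e.name, e.text);
    for (const auto& child : e.children) collectPairs(child, out);
}
```

For the <Cat> fragment above this would yield the pairs ID/1 and CatName/Models. Note that the flat list no longer records which element an attribute came from, so it only suits documents where that does not matter.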

So like I said, I don't see your point in trying to analyse someone's
pseudo code with the attempt to imply the notion of key/pair as
being "fishy"?

Still thanks anyway

Pep · May 18, 2007

Hey, thanks Pete. A pull-parser is definitely what I want, although I
was not aware of the correct terminology here.

Since my OP I have looked at libxml2 and adopted its use. Which is
great, as it is C compliant and therefore C++ compliant by default, and
although I did not mention the architecture requirement, it is *nix
compatible, so it ticks all the boxes.

So now I am trundling through the documentation and sample program to
quickly develop the tool I need.

Thanks again,
Pep.

Jürgen Kahrs · May 18, 2007

Pep said:
So I would get a key/data pair for each element and for each element
attribute.

Did you consider a scripting language?
You said you wanted to simply pull one element
after the other and also look at the attributes.

http://home.vrweb.de/~juergen.kahrs/gawk/XML/xmlgawk.html#Printing-an-outline-of-an-XML-file

This script reads one element after the other and
simply prints an outline:

@load xml
XMLSTARTELEM {
  printf("%*s%s", 2*XMLDEPTH-2, "", XMLSTARTELEM)
  for (i=1; i<=NF; i++)
    printf(" %s='%s'", $i, XMLATTR[$i])
  print ""
}

That's all.

Pavel Lepin · May 18, 2007

Pep said:
Erm, I think you miss the point here.

That's what I thought, because I couldn't really see what
your problem was...

No I'm not implying or suggesting that xerces does not
have a DOM parser, rather I don't see an easy way of
traversing a tree with it and I admit this may well be my
inexperience with the library.

...on the other hand, maybe not. Is there any specific
problem you're having with DOM tree traversal as
implemented in xerces-c++? As I said, DOM might *seem* a
bit cumbersome, and, well, I suppose it *is* a bit on the
cumbersome side, but can you be a bit more specific on what
gives you trouble with traversing the tree?

As for you ripping apart what is obviously pseudo code
supplied by me to illustrate the simple task I want to
perform, I don't get your point.

My point wasn't really anything about your pseudo-code, but
rather that I perceive a problem with your way of thinking
about XML processing. Naturally, I might be mistaken, my
opinion being based solely on the code and comments you
posted...

Irrespective of whether the data is in a tree format or
not, xml does indeed have data in the form of key pairs
and it is simply the key pairs I want to deal with not the
whole tree structure.

There's no 'whether'. Any XML document represents a tree.
You could, indeed, say that nodes are 'key-data' pairs, but
only if you fully understand that in case of element
nodes 'data' is always a list of nodes. Now that I think
about it, there are no explicit keys, so you couldn't even
say that.

Okay, I guess I just might be on the wrong level of
abstraction here and that causes misunderstanding. If
you're talking about documents similar to:

<document>
<data key="foo">bar</data>
<data key="baz">quux</data>
<etc/>
</document>

...then my point would be that you probably don't need
actual traversal anymore as soon as you reach one of
the 'data' elements. getAttributeNS() and getTextContent()
should do anyway, since you would know the semantics of
data elements.
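
To illustrate why an element's 'data' is a list of nodes rather than a single string, here is a minimal sketch. Node, makeText, makeElement and textContent are hypothetical names for this example only and are not the Xerces-C++ classes. The function mimics what DOM's text content means for an element: the concatenation of all descendant text in document order.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical node type for illustration; not the Xerces-C++ DOMNode class.
struct Node {
    bool isText;
    std::string value;           // text for text nodes, tag name for elements
    std::vector<Node> children;  // always empty for text nodes
};

Node makeText(std::string s) {
    return Node{true, std::move(s), {}};
}

Node makeElement(std::string name, std::vector<Node> kids) {
    return Node{false, std::move(name), std::move(kids)};
}

// Concatenate all descendant text in document order. For a leaf element
// such as <data key="foo">bar</data> this is simply "bar"; for mixed
// content the markup structure is flattened away.
std::string textContent(const Node& n) {
    if (n.isText) {
        assert(n.children.empty());
        return n.value;
    }
    std::string result;
    for (const Node& child : n.children) result += textContent(child);
    return result;
}
```

For leaf elements like the 'data' elements above, the flattening loses nothing, which is why no further traversal is needed once such an element is reached.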

As it happens I have now looked at the libxml2 class and
found I can quickly traverse the tree in a less complex
manner than I had to follow with the xerces library,
though this is probably because the documentation is
slightly better.

Whatever works for you. libxml2 is certainly workable, and I
don't believe there are any significant limitations. There
are just two points against it, I think: it doesn't
implement the W3C DOM API (although I think there was an
adapter of sorts, developed separately from libxml2 itself)
and it's written in C (but that's probably irrelevant in
your case).

So in using the libxml2 class I can quickly get to the
data that I want which is in a crude key/pair format i.e.

<Cat ID="1" >
<CatName>Models</CatName>
</Cat>

Oh yeah, I thought I was missing something. Wrong level of
abstraction. I thought you were perceiving nodes themselves
as key-value pairs.

Which crudely gives key pair ID:1 from the <Cat> element
and text:Models from the <CatName> element. Admittedly I
have to do a little processing in order to derive the
key/pair data entities I want but I get the end result.

Well, it would work the same way with xerces-c++. I suppose
libxml2 is a bit more light-weight, but in my eyes that is
offset by it being non-standard. YMMV.

So like I said, I don't see your point in trying to
analyse someone's pseudo code with the attempt to imply the
notion of key/pair as being "fishy"?

If you *represent* key-value pairs in XML that is perfectly
okay I suppose. What I was objecting to was perceiving
nodes as key-value pairs. Just a bit of misunderstanding,
as I said.

Boris Kolpackov · May 18, 2007

Hi,

Pep said:
So in using the libxml2 class I can quickly get to the data that I
want which is in a crude key/pair format i.e.

<Cat ID="1" >
<CatName>Models</CatName>
</Cat>

If all you need is to get the data stored in XML then a data
binding approach may be an easy solution. In short you will
have C++ classes generated that model your XML and which you
can use to get to the data in a more convenient way:

class Cat
{
public:
  int ID () const;
  string CatName () const;
};

Cat c = cat ("cat.xml");

cout << c.ID () << " " << c.CatName () << endl;

The following article provides a quick intro to XML data binding in
C++:

http://www.artima.com/cppsource/xml_data_binding.html
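
For a feel of what such a class might look like when filled in, here is a hand-written sketch. It is not the output of any real data binding tool; the Cat implementation and the catFromFields helper are assumptions made for illustration, and the helper takes already-extracted key/value data instead of parsing XML itself.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hand-written stand-in for a class that a data binding tool might generate
// from a schema describing <Cat ID="..."><CatName>...</CatName></Cat>.
class Cat {
public:
    Cat(int id, std::string name) : id_(id), name_(std::move(name)) {}

    int ID() const { return id_; }
    const std::string& CatName() const { return name_; }

private:
    int id_;
    std::string name_;
};

// Hypothetical loader: builds a Cat from key/value data that some parser
// has already extracted. A real binding would parse and validate the XML.
Cat catFromFields(const std::map<std::string, std::string>& fields) {
    assert(fields.count("ID") == 1 && fields.count("CatName") == 1);
    return Cat(std::stoi(fields.at("ID")), fields.at("CatName"));
}
```

The attraction is that the conversion from strings to typed values (here the int ID) happens once, in one place, and the rest of the application works with ordinary C++ objects.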

hth,
-boris

Pep · May 21, 2007

Thanks, it looks interesting, but unfortunately I have to do this as
part of a C++ library, so scripting is not an option for me.

Pep · May 21, 2007

Thanks Boris. I have found a solution to my problem using libxml2, but
as always, now that I have to use XML I am interested in it, so I
will look into the URL you posted.
