parsing XML with 'expat'

Bjoern Hoehrmann · Aug 20, 2007

* Roman Mashak wrote in comp.text.xml:

I hope this might be the right group to ask. I need to parse out in C
language the XML of the following structure:

<BERTEST>
<NODE1>
<FREQ>666000000</FREQ>
<POWER>-82</POWER>
</NODE1>
<NODE1>
<FREQ>484000000</FREQ>
<POWER>-80</POWER>
</NODE2>
</BERTEST>

So I took the 'expat' library to do that (I've never dealt with XML before
though), and tried to customize the example they ship with the library
(outline.c). What I can't quite understand is:
1) Can my XML really be called XML, or is it somehow invalid?
According to the Wikipedia page on XML, a valid document should look like
this:

<name attribute="value">content</name>

while mine is a bit different

Your second <NODE1> should probably be <NODE2> (otherwise the start- and
end-tags do not match up), but other than that it certainly is XML. You
are free to choose (when designing a new XML format) whether you use an
attribute or element to encode some information.

2) If my XML document is correct, how can I parse it with expat?
What I need is, upon occurrences of the FREQ and POWER tags, to extract their
values (i.e. 666000000 for FREQ or -82 for POWER in the above example).

So, I think I need to register a callback function for start tags and try to
do what I want in there. But how can I get the values of the tags, and which
'expat' functions should I use? Or is there another, simpler way?

Expat reports the text through the character data callback. You have to
set up a handler for it (with XML_SetCharacterDataHandler) and accumulate
the text reported to it, because the text of one element may arrive in
several calls; then process the text e.g. in the end-element handler.
There is no direct way to get to the text when using Expat.

Jürgen Kahrs · Aug 20, 2007

Roman said:
2) If my XML document is correct, how can I parse it with expat?
What I need is, upon occurrences of the FREQ and POWER tags, to extract their
values (i.e. 666000000 for FREQ or -82 for POWER in the above example).

What do you do with the extracted text? Do you put it into a text file
for further processing? Then you can use any scripting language that is
easier to use than Expat at the C level.

For example, you can try this one:

http://home.vrweb.de/~juergen.kahrs/gawk/XML/xmlgawk.html#Printing-an-outline-of-an-XML-file

@load xml
XMLSTARTELEM {
    printf("%*s%s", 2*XMLDEPTH-2, "", XMLSTARTELEM)
    for (i=1; i<=NF; i++)
        printf(" %s='%s'", $i, XMLATTR[$i])
    print ""
}

This script does exactly what the outline.c example
from Expat does.

Roman Mashak · Aug 20, 2007

Hello,

I hope this might be the right group to ask. I need to parse out in C
language the XML of the following structure:

<BERTEST>
<NODE1>
<FREQ>666000000</FREQ>
<POWER>-82</POWER>
</NODE1>
<NODE1>
<FREQ>484000000</FREQ>
<POWER>-80</POWER>
</NODE2>
</BERTEST>

So I took the 'expat' library to do that (I've never dealt with XML before
though), and tried to customize the example they ship with the library
(outline.c). What I can't quite understand is:
1) Can my XML really be called XML, or is it somehow invalid?
According to the Wikipedia page on XML, a valid document should look like
this:

<name attribute="value">content</name>

while mine is a bit different

2) If my XML document is correct, how can I parse it with expat?
What I need is, upon occurrences of the FREQ and POWER tags, to extract their
values (i.e. 666000000 for FREQ or -82 for POWER in the above example).

So, I think I need to register a callback function for start tags and try to
do what I want in there. But how can I get the values of the tags, and which
'expat' functions should I use? Or is there another, simpler way?

Thanks in advance
