xml4c child nodes

marfi95 · Oct 12, 2006

I'm trying to iterate through a list of child nodes. It seems that to
get the text value of a node, you have to call
node->getFirstChild()->getNodeValue(). There is also a
hasChildNodes method, but if I use that, it counts the "text" nodes
as well, which I don't want to include.

If this is my XML:

<A>
  <B></B>
  <C></C>
</A>

and I have a node for B, I thought getNextSibling would return C, but it
didn't. It returned #text.

confused.

Joe Kesselman · Oct 12, 2006

marfi95 said:
<A>
  <B></B>
  <C></C>
</A>

if I have a node for B, I thought getNextSibling would return C, but it
didn't. it returned #text.

If you'd stopped to look at the value of that text node, you'd have
answered your own question -- it's the whitespace (newline and
indentation) between the B's end-tag and the start-tag for C.

XML doesn't know whether that whitespace text is meaningful or not, so
XML APIs will deliver it. Your app needs to deal with that appropriately.
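
To see that whitespace node directly, here is a minimal sketch. It uses Python's xml.dom.minidom rather than xml4c, on the assumption that the equivalent DOM calls (getNextSibling, getNodeName, getNodeValue) behave the same way there. The sample document, with a newline and two spaces of indentation between B and C, is also an assumption.

```python
from xml.dom import minidom, Node

# Assumed sample: B and C are siblings separated by a newline and indentation.
XML = "<A>\n  <B></B>\n  <C></C>\n</A>"

doc = minidom.parseString(XML)
b = doc.getElementsByTagName("B")[0]

sibling = b.nextSibling
print(sibling.nodeName)                    # '#text', not 'C'
print(repr(sibling.nodeValue))             # '\n  ' (the newline plus indentation)
print(sibling.nodeType == Node.TEXT_NODE)  # True
```

Printing the value with repr() makes the newline and spaces visible, which is usually the quickest way to confirm what an unexpected #text node contains.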

Magnus Henriksson · Oct 12, 2006

Joe said:
XML doesn't know whether that whitespace text is meaningful or not, so
XML APIs will deliver it. Your app needs to deal with that appropriately.

Some XML APIs may report such whitespace as "ignorable". This is
whitespace between elements where the DTD does not allow PCDATA. This
assumes that there is a DTD.

But they are still nodes in the infoset.

// Magnus

Martin Honnen · Oct 12, 2006

marfi95 said:
I'm trying to iterate through a list of child nodes. It seems like to
get the text value of the node, you have to do a
node->getFirstChild()->getNodeValue. This being said, there is a
hasChildNodes method, but if I use that, it includes the "text" nodes
also, which I don't want to include.

if this is my xml:

<A>
  <B></B>
  <C></C>
</A>

if I have a node for B, I thought getNextSibling would return C, but it
didn't. it returned #text.

Then keep calling getNextSibling and check nodeType (or getNodeType(),
depending on the API) until you find an element node (node type 1).
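
A minimal sketch of that loop, written against Python's xml.dom.minidom rather than xml4c. The helper name next_element_sibling and the sample document are made up for illustration; with xml4c the same loop would presumably use getNextSibling() and getNodeType().

```python
from xml.dom import minidom, Node

def next_element_sibling(node):
    """Return the next sibling that is an element, or None if there is none."""
    sibling = node.nextSibling
    # Skip text, comment and other non-element nodes.
    while sibling is not None and sibling.nodeType != Node.ELEMENT_NODE:
        sibling = sibling.nextSibling
    return sibling

# Assumed sample document with whitespace between the elements.
doc = minidom.parseString("<A>\n  <B></B>\n  <C></C>\n</A>")
b = doc.getElementsByTagName("B")[0]
c = next_element_sibling(b)
print(c.tagName)                # C
print(next_element_sibling(c))  # None: only whitespace follows C
```

Node.ELEMENT_NODE is the constant 1 mentioned above, so comparing against the named constant and against 1 is equivalent.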

Joe Kesselman · Oct 12, 2006

Magnus said:
Some XML APIs may report such whitespace as "ignorable". This is
whitespace between elements where the DTD does not allow PCDATA. This
assumes that there is a DTD.

Good point. *If* there is a DTD or Schema available which provides that
information, some tools can be asked to suppress whitespace that appears
where only elements were expected. That's getting beyond straight
parsing into preliminary processing/filtering, since as Magnus says it
involves delivering a modified infoset.

Since that feature is not always supported by the API -- or may be
supported in theory but not actually implemented on all parsers -- you
need to exercise a bit of care in relying on it. I've generally
preferred not to do so, for that reason and because sometimes users want
the whitespace preserved even when it isn't "meaningful" to the document.
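
One way to handle this in the application instead of the parser is to leave the tree untouched and skip whitespace-only text nodes while reading it. The sketch below uses Python's xml.dom.minidom, not xml4c; the helper names and the sample document are hypothetical. Without a DTD it cannot tell meaningful whitespace from layout whitespace, so it is only appropriate where whitespace-only text is known not to matter.

```python
from xml.dom import minidom, Node

XML_WHITESPACE = " \t\r\n"  # the four characters XML treats as whitespace

def is_whitespace_only_text(node):
    """True for text nodes that contain nothing but XML whitespace."""
    return (node.nodeType == Node.TEXT_NODE
            and node.data.strip(XML_WHITESPACE) == "")

def significant_children(parent):
    """Children of parent, minus whitespace-only text nodes (tree unchanged)."""
    return [n for n in parent.childNodes if not is_whitespace_only_text(n)]

# Assumed sample: layout whitespace around B and C.
doc = minidom.parseString("<A>\n  <B>Data</B>\n  <C>Data</C>\n</A>")
a = doc.documentElement
print(len(a.childNodes))                               # 5: text, B, text, C, text
print([n.nodeName for n in significant_children(a)])   # ['B', 'C']
```

Because the DOM itself is not modified, code that does want the whitespace can still read it from the same tree.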

marfi95 · Oct 12, 2006

Joe said:
Good point. *If* there is a DTD or Schema available which provides that
information, some tools can be asked to suppress whitespace that appears
where only elements were expected. That's getting beyond straight
parsing into preliminary processing/filtering, since as Magnus says it
involves delivering a modified infoset.

Since that feature is not always supported by the API -- or may be
supported in theory but not actually implemented on all parsers -- you
need to exercise a bit of care in relying on it. I've generally
preferred not to do so, for that reason and because sometimes users want
the whitespace preserved even when it isn't "meaningful" to the document.

Thanks for the replies. But going back to my original XML example:

<A>
  <B>Data</B>
  <C>Data</C>
</A>

How can I determine whether A has children? Calling hasChildNodes seems
worthless to me, since it will always have the text node underneath it.
I guess I have to write my own version that doesn't look at the
text nodes?

TIA.

Martin Honnen · Oct 12, 2006

marfi95 said:
But going back to my original XML example.

<A>
  <B>Data</B>
  <C>Data</C>
</A>

How can I determine if A has children ? calling hasChildNodes seems
worthless to me since it will always have the text node underneath it.

Why? If you have, e.g.,
<A/>
or
<A />
or
<A></A>
then the element is really empty and hasChildNodes is false.
If you are looking for element child nodes only, then you can check
getElementsByTagName("*").length (which counts all descendant elements,
not just direct children) or use XPath if your API supports that (e.g.
selectNodes("*").length, which counts child elements only).
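
Both checks can be tried out with a short sketch. It uses Python's xml.dom.minidom rather than xml4c, and the helper root_of and the sample documents are invented for illustration.

```python
from xml.dom import minidom

def root_of(xml_text):
    """Parse a small document and return its root element."""
    return minidom.parseString(xml_text).documentElement

# All three spellings of an empty element have no child nodes at all.
for xml_text in ("<A/>", "<A />", "<A></A>"):
    print(xml_text, root_of(xml_text).hasChildNodes())   # False each time

# An element holding only text has a child node, but no element descendants.
b = root_of("<B>Data</B>")
print(b.hasChildNodes())                    # True (the text node)
print(b.getElementsByTagName("*").length)   # 0

# getElementsByTagName("*") counts descendants at every depth.
a = root_of("<A><B><C/></B></A>")
print(a.getElementsByTagName("*").length)   # 2 (B and the nested C)
```

The last case is why a non-zero count only says that some element exists below the node, not how many direct children it has.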

marfi95 · Oct 12, 2006

Martin said:
Why, if you have e.g.
<A/>
or
<A />
or
<A></A>
then the element is really empty and hasChildNodes is false.
If you are looking for element child nodes only then you can use the
getElementsByTagName("*").length check (reports all descendant elements)
or use XPath if your API supports that (e.g. selectNodes("*").length,
reports all child elements).

I was incorrect in my question. I meant to ask about B. I got a
DOM_Node for B and then checked hasChildNodes, and it returned true even
though there are no "real" child nodes. I didn't realize you could pass
"*" to getElementsByTagName, so I can use that instead of the
hasChildNodes call. Thanks for the help.

marfi95 · Oct 12, 2006

marfi95 said:
I was incorrect in my question. I meant to ask about B. I got a
DOM_Node for B and then checked hasChildNodes, and it returned true even
though there are no "real" child nodes. I didn't realize you could pass
"*" to getElementsByTagName, so I can use that instead of the
hasChildNodes call. Thanks for the help.

Sorry to bother you again, but can someone please explain the difference
between a DOM_Node and a DOM_Element? Is a DOM_Element just a "type"
of DOM_Node?

What I did was call getElementsByTagName on the DOM_Document to get a
NodeList. Then, for each of those nodes, I was going to check
getElementsByTagName("*").length to determine whether that node has any
child elements, but I can't, because getElementsByTagName is a member of
DOM_Element, not DOM_Node. What is the correct way of doing this,
please? I'm new to DOM, as you can see.

Martin Honnen · Oct 12, 2006

But can someone please explain the difference
between a DOM_Node and a DOM_Element. Is a DOM_Element just a "type"
of DOM_Node ?

Yes, node is usually an abstract base class (or interface) that is
extended by several concrete subclasses (or interfaces), e.g. for
document, element, attribute, text, processing instruction, CDATA
section, and comment nodes.

What I did was call getElementsByTagName on the DOM_Document to get a
NodeList. Then, for each of those nodes, I was going to check
getElementsByTagName("*").length to determine whether that node has any
child elements, but I can't, because getElementsByTagName is a member of
DOM_Element, not DOM_Node. What is the correct way of doing this,
please? I'm new to DOM, as you can see.

You need to cast that DOM_Node that you have to a DOM_Element. With Java
you would simply do e.g.
Element el = (Element)node;
Can't help with exact xml4c syntax.
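
The node/element relationship can be illustrated with Python's xml.dom.minidom (not xml4c). Python is dynamically typed, so no cast is written; the nodeType check below plays the role of the test you would make before casting in Java or C++. The helper name element_descendant_count and the sample document are hypothetical.

```python
from xml.dom import minidom, Node

doc = minidom.parseString("<A><B>Data</B></A>")
b = doc.getElementsByTagName("B")[0]
text = b.firstChild

# Both objects are nodes, but only one of them is an element.
print(isinstance(b, minidom.Node), isinstance(b, minidom.Element))        # True True
print(isinstance(text, minidom.Node), isinstance(text, minidom.Element))  # True False

def element_descendant_count(node):
    """Count descendant elements; non-element nodes are treated as having none."""
    if node.nodeType != Node.ELEMENT_NODE:
        return 0  # e.g. text nodes have no getElementsByTagName method
    return node.getElementsByTagName("*").length

print(element_descendant_count(doc.documentElement))  # 1 (just B)
print(element_descendant_count(text))                 # 0
```

Nodes that come back from getElementsByTagName are always elements, so in that particular case the check always succeeds and the cast is safe.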

marfi95 · Oct 12, 2006

Martin said:
Yes, node is usually an abstract base class (or interface) that is
extended by several concrete subclasses (or interfaces), e.g. for
document, element, attribute, text, processing instruction, CDATA
section, and comment nodes.

You need to cast that DOM_Node that you have to a DOM_Element. With Java
you would simply do e.g.
Element el = (Element)node;
Can't help with exact xml4c syntax.

Thanks. I got it.

marfi95 · Oct 12, 2006

Thanks. I got it.

Since getElementsByTagName("*") returns all descendant element nodes, is
there an easy way to get only the next level of elements?

i.e.
<A>
  <B>
    <C>
    </C>
  </B>
  <B>
    <C>
    </C>
  </B>
</A>

I would only want the NodeList returned to contain the B element
nodes (and not the C's).

Thanks.

Joseph Kesselman · Oct 12, 2006

Since getElementsByTagName("*") returns all element nodes, is there an
easy way to only get the next level of elements.

Simplest: getFirstChild followed by repeated getNextSibling, ignoring
those which aren't elements.

Overkill: Use one of the mechanisms in the DOM Level 2 Traversal
feature, setting its filters to show you only the nodes you're
interested in.
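
A minimal sketch of the simple approach, using Python's xml.dom.minidom rather than xml4c. The helper name child_elements is made up, and the sample document assumes the layout described in the question: B elements directly under A, each wrapping a C.

```python
from xml.dom import minidom, Node

def child_elements(parent):
    """Return only the direct element children of parent, in document order."""
    children = []
    node = parent.firstChild
    while node is not None:
        if node.nodeType == Node.ELEMENT_NODE:
            children.append(node)  # whitespace text nodes are skipped
        node = node.nextSibling
    return children

XML = """<A>
  <B>
    <C></C>
  </B>
  <B>
    <C></C>
  </B>
</A>"""

a = minidom.parseString(XML).documentElement
print([el.tagName for el in child_elements(a)])   # ['B', 'B']
print(a.getElementsByTagName("*").length)         # 4: both B's and both C's
```

The walk touches each direct child once and never descends into the B subtrees, so it stays cheap even when the nested content is large.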

marfi95 · Oct 12, 2006

Joseph said:
Simplest: getFirstChild followed by repeated getNextSibling, ignoring
those which aren't elements.

Overkill: Use one of the mechanisms in the DOM Level 2 Traversal
feature, setting its filters to show you only the nodes you're
interested in.

Thanks. I've heard that using DOM can be very memory intensive because
of the tree it builds, and that it might not be the best approach for
"large" XML documents. Does anyone have numbers on what counts as large,
and at what point DOM stops being the appropriate method?

The XML we're talking about here could be around 30-40K, with about
1000 simultaneous users. Each thread would have its own parser
instance, which, based on what I've been reading, shouldn't be an
issue. But I'm a little concerned about what I'm reading regarding
memory usage.

Any ideas what kind of sizes we're talking about here?

Joseph Kesselman · Oct 12, 2006

See http://www.w3.org/DOM/faq.html#SAXandDOM

The DOM is only an API; if you want to talk about memory usage, you need
to discuss specific implementations, especially since the storage behind
that API doesn't necessarily use the same structure as that presented to
the user.
