How do I get the value out of a DOM Element

kj7ny

I have been able to get xml.dom.minidom.parse('somefile.xml') and then
dom.getElementsByTagName('LLobjectID') to work to the point where I
get something like: [<DOM Element: LLobjectID at 0x13cba08>] which I
can get down to <DOM Element: LLobjectID at 0x13cba08> but then I
can't find any way to just get the value out from the thing!

.toxml() returns something like:
u'<LLobjectID><![CDATA[1871203]]></LLobjectID>'.

How do I just get the 1871203 out of the DOM Element?

Thanks,
 

Stefan Behnel

It contains a CDATA section node that holds the text (AFAIR), so you
have to walk through the children to get what you want.
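To see what "walking through the children" turns up, here is a minimal sketch using an inline sample document in place of somefile.xml. The `SAMPLE` string and the `describe_children` helper are made up for illustration. It assumes minidom's default parser settings, under which the CDATA section shows up as a direct child node whose `data` attribute carries the text.

```python
from xml.dom import minidom, Node

# Hypothetical stand-in for the contents of somefile.xml.
SAMPLE = "<doc><LLobjectID><![CDATA[1871203]]></LLobjectID></doc>"

def describe_children(element):
    """Return (nodeType, data) pairs for the text-like children of element."""
    pairs = []
    for child in element.childNodes:
        # Plain text and CDATA sections both expose their content via .data
        if child.nodeType in (Node.TEXT_NODE, Node.CDATA_SECTION_NODE):
            pairs.append((child.nodeType, child.data))
    return pairs

dom = minidom.parseString(SAMPLE)
element = dom.getElementsByTagName("LLobjectID")[0]
children = describe_children(element)
print(children)
```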

Alternatively, try an XML API that makes it easy to handle XML, like
ElementTree (part of the stdlib since Python 2.5) or lxml, both of which
have compatible APIs. The code would look like this:

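import xml.etree.ElementTree as etree  # or: from lxml import etree

tree = etree.parse("some_file.xml")
node = tree.find(".//LLobjectID")  # first matching element, or None
print(node.text)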
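For anyone who wants to try this without a file on disk, here is a small self-contained sketch of the same idea with the stdlib ElementTree. The `SAMPLE` string is an assumed stand-in for the real file. ElementTree reports CDATA content as ordinary text, so `.text` should hold the value directly.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the contents of the real XML file.
SAMPLE = "<doc><LLobjectID><![CDATA[1871203]]></LLobjectID></doc>"

root = ET.fromstring(SAMPLE)

# find() returns the first matching element or None, so guard before using .text
node = root.find(".//LLobjectID")
object_id = node.text if node is not None else None
print(object_id)

# findtext() does the lookup and the .text access in one call
same_id = root.findtext(".//LLobjectID")
```

If the tag can occur more than once, `root.findall(".//LLobjectID")` returns all matches as a list.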

Stefan
 

Paul Boddie

DOM Level 3 provides the textContent property:

http://www.w3.org/TR/DOM-Level-3-Core/core.html#Node3-textContent

You'll find this in libxml2dom and possibly some other packages such
as pxdom. For the above case with minidom specifically (at least with
versions I've used), you need to iterate over the childNodes of the
element, obtaining the nodeValue for each node and joining them
together. Something like this might do it:

"".join([n.nodeValue for n in element.childNodes])

It's not pretty, but encapsulating stuff like this is what functions
are good for.
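One way to wrap that up in a function is sketched below. The name `get_text` is invented here, and the recursion into child elements is an addition that goes beyond the one-liner: it roughly approximates what textContent would return, and it avoids joining a `None` nodeValue when a child is itself an element.

```python
from xml.dom import minidom, Node

def get_text(node):
    """Concatenate text and CDATA content found anywhere under node."""
    parts = []
    for child in node.childNodes:
        if child.nodeType in (Node.TEXT_NODE, Node.CDATA_SECTION_NODE):
            parts.append(child.data)
        elif child.nodeType == Node.ELEMENT_NODE:
            # Descend into nested elements; comments and other nodes are skipped
            parts.append(get_text(child))
    return "".join(parts)

# Hypothetical sample input, standing in for somefile.xml.
dom = minidom.parseString(
    "<doc><LLobjectID><![CDATA[1871203]]></LLobjectID>"
    "<mixed>a<b>b</b>c</mixed></doc>"
)
object_id = get_text(dom.getElementsByTagName("LLobjectID")[0])
mixed_text = get_text(dom.getElementsByTagName("mixed")[0])
print(object_id)
```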

Paul
 

Stefan Behnel

kj7ny said:
Forgot to mention I'm using Python 2.4.3.

You can install both lxml and ET on Python 2.4 (and 2.3). It's just that ET
went into the stdlib from 2.5 on.
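A common way to cope with the different install locations is an import fallback, sketched here. The order of preference (lxml first, then the stdlib module, then the standalone `elementtree` package) is just one possible choice, and the sample string is made up for illustration.

```python
try:
    from lxml import etree  # third-party, if installed
except ImportError:
    try:
        import xml.etree.ElementTree as etree  # stdlib, Python 2.5 and later
    except ImportError:
        import elementtree.ElementTree as etree  # standalone ElementTree package

# Hypothetical sample input; whichever module was imported, the calls are the same.
root = etree.fromstring("<doc><LLobjectID><![CDATA[1871203]]></LLobjectID></doc>")
object_id = root.findtext(".//LLobjectID")
print(object_id)
```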

Stefan
 
