ElementTree - Howto access text within XML tag element...

cmalmqui

Hi,

I am writing a small XML parser and am currently stuck, as I am not
able to get the whole element name in ElementTree.

Please see the below example where "print root[0][0]" returns
"<Element 'Activity' at 018A3938>"

Is there a way to get hold of the "Running" string in the tag using
elementTree?

<Activities>
    <Activity Sport="Running">
      <Id>2009-07-10T14:48:00Z</Id>
      <Lap StartTime="2009-07-10T14:48:00Z">
      .........

For those of you who know how to work with XML, I have another
question:
I am currently "hardcoding" my XML parser using brackets. Is this a
good approach, or should I build it using a "search on tag" approach?

Thank you for any answers!
 
Ned Deily

I am writing on a small XML parser and are currently stuck as I am not
able to get the whole element name in ElementTree.

Please see the below example where "print root[0][0]" returns
"<Element 'Activity' at 018A3938>"

Is there a way to get hold of the "Running" string in the tag using
elementTree?

<Activities>
<Activity Sport="Running">
<Id>2009-07-10T14:48:00Z</Id>
<Lap StartTime="2009-07-10T14:48:00Z">
.........

"Running" is the value of the "Sport" attribute of the "Activity"
element. The documentation for the Element interface lists several ways
to access element attributes; in your example,
>>> elem = root[0][0]
>>> elem.get("Sport")
'Running'
>>> elem.attrib
{'Sport': 'Running'}
>>> elem.items()
[('Sport', 'Running')]

See http://docs.python.org/library/xml.etree.elementtree.html
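
The three attribute accessors mentioned above can be tried on a small inline document. This is only a sketch: the sample XML is a hypothetical, namespace-free stand-in modelled on the snippet in the question, and it uses Python 3's xml.etree.ElementTree rather than the cElementTree module available in 2009.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample modelled on the snippet in the question (no namespace).
SAMPLE = """
<Root>
  <Activities>
    <Activity Sport="Running">
      <Id>2009-07-10T14:48:00Z</Id>
      <Lap StartTime="2009-07-10T14:48:00Z"/>
    </Activity>
  </Activities>
</Root>
"""

root = ET.fromstring(SAMPLE)
activity = root[0][0]          # first child of the first child of the root

# get() returns one attribute value, or a default if it is absent.
sport = activity.get("Sport")
missing = activity.get("Device", "unknown")

# attrib is a dict-like mapping of all attributes on the element.
attributes = dict(activity.attrib)

# items() yields (name, value) pairs.
pairs = list(activity.items())

print(activity.tag, sport, missing, attributes, pairs)
```

Printing the element itself only shows its repr, which is why "print root[0][0]" displayed "<Element 'Activity' at ...>" instead of the attribute value.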
 
Diez B. Roggisch

cmalmqui said:
Hi,

I am writing on a small XML parser and are currently stuck as I am not
able to get the whole element name in ElementTree.

Please see the below example where "print root[0][0]" returns
"<Element 'Activity' at 018A3938>"

Is there a way to get hold of the "Running" string in the tag using
elementTree?

<Activities>
<Activity Sport="Running">
<Id>2009-07-10T14:48:00Z</Id>
<Lap StartTime="2009-07-10T14:48:00Z">
.........

For those of you that know how to program XML I have another
question:
I am currently "hardcoding" my XML parser using brackets, is this a
good approach or should I build it using a "search on tag" approach.

What do you mean by that - hardcoding by brackets?

Diez
 
cmalmqui

cmalmqui said:
I am writing on a small XML parser and are currently stuck as I am not
able to get the whole element name in ElementTree.
Please see the below example where "print root[0][0]" returns
"<Element 'Activity' at 018A3938>"
Is there a way to get hold of the "Running" string in the tag using
elementTree?
<Activities>
    <Activity Sport="Running">
      <Id>2009-07-10T14:48:00Z</Id>
      <Lap StartTime="2009-07-10T14:48:00Z">
      .........
For those of you that know how to program XML I have another
question:
I am currently "hardcoding" my XML parser using brackets, is this a
good approach or should I build it using a "search on tag" approach.

What do you mean by that - hardcoding by brackets?

Diez

Indeed, my current approach has been to hardcode the XML parser using
brackets. Is there a more elegant way?

I am parsing a Garmin XML file from a handheld GPS, and as you can see
in the script below, I am hardcoding each node:

import xml.etree.cElementTree as etree

def gettext(elem):
    # Concatenate the text of an element and of all its descendants.
    text = elem.text or ""
    for e in elem:
        text += gettext(e)
        if e.tail:
            text += e.tail
    return text

tree = etree.parse('10_07_2009 16_48_00_history.tcx')
root = tree.getroot()

elem = root[0][0]

# ID Tag
print "type of exercise : " + elem.get("Sport")
print "exercise starttime : " + gettext(elem[0])

# iterate over all laps
for i in range(1, len(elem)-1):

    # LAP TAG
    print "\nlap number : " + str(i)
    print "lap start time : " + str(elem[i].get("StartTime"))
    print "lap duration (s) : " + gettext(elem[i][0])
    print "lap length (m) : " + gettext(elem[i][1])
    print "max speed (km/h) : " + gettext(elem[i][2])
    print "number of calories : " + gettext(elem[i][3])
    print "average heartbeat : " + gettext(elem[i][4][0])
    print "max heartbeat : " + gettext(elem[i][5][0])
    print "number of records : " + str(len(elem[i][8])-1)
    for j in range(1, len(elem[i][8])-1):
        time = gettext(elem[i][8][j][0]) #time
        lat = gettext(elem[i][8][j][1][0]) #lat
        lon = gettext(elem[i][8][j][1][1]) #lon
        alt = gettext(elem[i][8][j][2]) #alt
        dist = gettext(elem[i][8][j][3]) #distance from start
        bpm = gettext(elem[i][8][j][4][0]) #beats per minute
        #print time + " " + lat + " " + lon + " " + alt + " " + dist + " " + bpm

print "\nReceiver Info : " + gettext(elem[len(elem)-1][0])
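
As a side note on the gettext helper in the script above: current Python versions offer Element.itertext(), which walks the same text and tail strings in document order. The sketch below compares the two on a made-up element; the sample markup is purely illustrative and not taken from the Garmin file.

```python
import xml.etree.ElementTree as ET

def gettext(elem):
    """Recursive helper as posted in the script above."""
    text = elem.text or ""
    for e in elem:
        text += gettext(e)
        if e.tail:
            text += e.tail
    return text

# Hypothetical element with mixed text and child elements.
elem = ET.fromstring("<p>lap <b>one</b> of <i>two</i>.</p>")

recursive = gettext(elem)
joined = "".join(elem.itertext())   # built-in equivalent

print(recursive)
print(joined)
```

For leaf elements such as Id, which hold only text, elem.text alone would likely be enough.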
 
cmalmqui

 cmalmqui said:
I am writing on a small XML parser and are currently stuck as I am not
able to get the whole element name in ElementTree.
Please see the below example where "print root[0][0]" returns
"<Element 'Activity' at 018A3938>"
Is there a way to get hold of the "Running" string in the tag using
elementTree?
<Activities>
    <Activity Sport="Running">
      <Id>2009-07-10T14:48:00Z</Id>
      <Lap StartTime="2009-07-10T14:48:00Z">
      .........

"Running" is the value of the "Sport" attribute of the "Activity"
element.  The documentation for the Element interface lists several ways
to access element attributes; in your example,
>>> elem = root[0][0]
>>> elem.get("Sport")
'Running'
>>> elem.attrib
{'Sport': 'Running'}
>>> elem.items()
[('Sport', 'Running')]

See http://docs.python.org/library/xml.etree.elementtree.html

Excellent!
Thanks!
The XML magic is getting there slowly...
 
Diez B. Roggisch

cmalmqui said:
Hi,
I am writing on a small XML parser and are currently stuck as I am not
able to get the whole element name in ElementTree.
Please see the below example where "print root[0][0]" returns
"<Element 'Activity' at 018A3938>"
Is there a way to get hold of the "Running" string in the tag using
elementTree?
<Activities>
<Activity Sport="Running">
<Id>2009-07-10T14:48:00Z</Id>
<Lap StartTime="2009-07-10T14:48:00Z">
.........
For those of you that know how to program XML I have another
question:
I am currently "hardcoding" my XML parser using brackets, is this a
good approach or should I build it using a "search on tag" approach.
What do you mean by that - hardcoding by brackets?

Diez

Indeed, my current approach has been to hardcode the XML parser using
brackets. Is there a more elegant way?

I am parsing a garmin xml file from a handheld GPS and as you can see
in the below script, I am hardcoding each node:

As you don't give an actual example of what the XML looks like, it's hard
to tell. But assuming the tag names are not generic, I'd certainly go
for

root.find("tagname")

instead. That's much clearer, and you don't rely on an actual order of
elements.

Diez
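
To illustrate the find-by-tag suggestion, here is a minimal sketch that contrasts positional indexing with lookup by name. It rests on assumptions not confirmed in the thread: that the file is a Garmin TCX document whose root declares the default namespace shown in NS (check the xmlns attribute of your own file), and that the element names follow that schema. The sample values are made up.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI for Garmin TCX files; verify against your file's root.
NS = {"tcx": "http://www.garmin.com/xmlschemas/TrainingCenterDatabase/v2"}

# Hypothetical, heavily shortened sample document.
SAMPLE = """
<TrainingCenterDatabase
    xmlns="http://www.garmin.com/xmlschemas/TrainingCenterDatabase/v2">
  <Activities>
    <Activity Sport="Running">
      <Id>2009-07-10T14:48:00Z</Id>
      <Lap StartTime="2009-07-10T14:48:00Z">
        <TotalTimeSeconds>600.0</TotalTimeSeconds>
        <DistanceMeters>2000.0</DistanceMeters>
      </Lap>
    </Activity>
  </Activities>
</TrainingCenterDatabase>
"""

root = ET.fromstring(SAMPLE)

# Positional access: breaks silently if the child order ever changes.
by_index = root[0][0]

# Lookup by tag name: independent of the order of sibling elements.
by_name = root.find("tcx:Activities/tcx:Activity", NS)

# findtext() returns the text of the first matching child.
activity_id = by_name.findtext("tcx:Id", namespaces=NS)

# Without the namespace prefix nothing matches in a namespaced document.
unqualified = root.find("Activities")

print(by_name.get("Sport"), activity_id, unqualified)
```

If the document had no namespace, plain names such as root.find("Activities/Activity") would be sufficient.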
 
Stefan Behnel

cmalmqui said:
tree = etree.parse('10_07_2009 16_48_00_history.tcx')
root = tree.getroot()

elem = root[0][0]

# iterate over all laps
for i in range(1, len(elem)-1):

Note that you can iterate over elements as in

for lap_element in elem:
    # ...

Then use

record = lap_element.find("recordtagname")

to find things inside the subtree. You can also use XPath-like expressions
such as

all_interesting_elements = lap_element.findall("sometag/somechild//somedescendant")

Stefan
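
The iteration and path-based lookups described above could look roughly like this for the lap data. This is a sketch under assumptions: the namespace URI and the element names (Lap, TotalTimeSeconds, Track, Trackpoint, HeartRateBpm, Value) are taken to follow the Garmin TCX schema, and all values in the sample are invented.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI for Garmin TCX files; verify against your file's root.
NS = {"tcx": "http://www.garmin.com/xmlschemas/TrainingCenterDatabase/v2"}

# Hypothetical sample with two laps; real files carry many more fields.
SAMPLE = """
<TrainingCenterDatabase
    xmlns="http://www.garmin.com/xmlschemas/TrainingCenterDatabase/v2">
  <Activities>
    <Activity Sport="Running">
      <Id>2009-07-10T14:48:00Z</Id>
      <Lap StartTime="2009-07-10T14:48:00Z">
        <TotalTimeSeconds>300.0</TotalTimeSeconds>
        <Track>
          <Trackpoint><HeartRateBpm><Value>120</Value></HeartRateBpm></Trackpoint>
          <Trackpoint><HeartRateBpm><Value>125</Value></HeartRateBpm></Trackpoint>
        </Track>
      </Lap>
      <Lap StartTime="2009-07-10T14:53:00Z">
        <TotalTimeSeconds>280.5</TotalTimeSeconds>
        <Track>
          <Trackpoint><HeartRateBpm><Value>130</Value></HeartRateBpm></Trackpoint>
        </Track>
      </Lap>
    </Activity>
  </Activities>
</TrainingCenterDatabase>
"""

root = ET.fromstring(SAMPLE)
activity = root.find("tcx:Activities/tcx:Activity", NS)

summary = []
# findall() on a tag name returns only the Lap children, so Id is skipped
# without any index arithmetic.
for number, lap in enumerate(activity.findall("tcx:Lap", NS), start=1):
    duration = float(lap.findtext("tcx:TotalTimeSeconds", namespaces=NS))
    # ".//" searches descendants at any depth below the lap element.
    beats = [int(value.text)
             for value in lap.findall(".//tcx:HeartRateBpm/tcx:Value", NS)]
    summary.append((number, lap.get("StartTime"), duration, beats))

for row in summary:
    print(row)
```

Compared with nested indices like elem[8][j][4][0], the paths name what is being read, and they keep working if optional elements appear or move.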
 
