problems with xml parsing (python 3.3)

jannidis · Oct 27, 2012

Hello all,

I am new to Python and have a problem with the behaviour of the xml parser. Assume we have this xml document:

<?xml version="1.0" encoding="UTF-8"?>
<bibliography>
<entry>
Title of the first book.
</entry>
<entry>
<coauthored/>
Title of the second book.
</entry>
</bibliography>

If I now check the text of all 'entry' nodes, the text for the node that contains the empty element isn't shown:

import xml.etree.ElementTree as ET
tree = ET.ElementTree(file='test.xml')
root = tree.getroot()
resultSet = root.findall(".//entry")
for r in resultSet:
    print(r.text)

jannidis · Oct 27, 2012

To my understanding, the empty element is a child of entry, as is the text node.
Is there anything I am doing wrong here? Any help is appreciated,

Fotis

MRAB · Oct 27, 2012

It _is_ shown, it's just that it's all whitespace:
print(ascii(r.text))

'\n Title of the first book.\n '
'\n '
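
A minimal, self-contained sketch of the same check, assuming the document is inlined as a string instead of read from test.xml (the indentation inside the string is a guess at the original file's layout, and the name XML is only illustrative):

```python
import xml.etree.ElementTree as ET

# Assumption: the sample document, inlined so the snippet runs without test.xml.
XML = """<bibliography>
  <entry>
    Title of the first book.
  </entry>
  <entry>
    <coauthored/>
    Title of the second book.
  </entry>
</bibliography>"""

root = ET.fromstring(XML)

# .text holds only the character data that precedes the first child element.
texts = [entry.text for entry in root.findall(".//entry")]

for t in texts:
    # ascii() makes the newlines and spaces visible instead of printing them raw.
    print(ascii(t))
```

For the second entry, the only character data before <coauthored/> is the line break and indentation, so its .text is non-empty but consists of whitespace only.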

Dieter Maurer · Oct 28, 2012

I do not know "xml.etree", but the reportedly quite compatible
"lxml.etree" handles text nodes quite differently from "DOM":
they are *not* considered children of the parent element.
Instead, they are attached as the attributes "text" and "tail":
"text" on the containing element (if the first DOM node is a text node),
and "tail" on the preceding element otherwise.

Your code snippet suggests that "xml.etree" behaves identically in
this respect. In this case, you would find "Title of the second book"
as the "tail" attribute of the element "coauthored".
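
A short sketch of how that could be checked with the standard library, assuming "xml.etree" does follow the text/tail model described above. The inlined document and the variable names are illustrative, not taken from the original script:

```python
import xml.etree.ElementTree as ET

# Assumption: the sample document, inlined so the snippet runs on its own.
XML = """<bibliography>
  <entry>
    Title of the first book.
  </entry>
  <entry>
    <coauthored/>
    Title of the second book.
  </entry>
</bibliography>"""

root = ET.fromstring(XML)
second = root.findall(".//entry")[1]

# Text that follows a child element is stored on that child as .tail.
coauthored = second.find("coauthored")
tail_text = coauthored.tail.strip()
print(tail_text)

# itertext() walks the element and its descendants in document order,
# yielding every .text and .tail, so it collects all character data.
full_text = "".join(second.itertext()).strip()
print(full_text)
```

Joining the result of itertext() is one way to get all the character data of an entry regardless of where child elements occur inside it.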

jannidis · Oct 29, 2012

Thanks a lot for your answer. As I am looking for a tool to teach the use of XML in programming, it is a pity that this module implements a very idiosyncratic view of XML data, but DOM and SAX are out there too, so I will look at them.

jannidis · Oct 30, 2012

If someone comes across this posting with the same problem, the best answer seems to be:
avoid Python's xml.etree.ElementTree and use this library instead:
http://lxml.de/
It works as expected and supports XPath much better.
