problems with xml parsing (python 3.3)

jannidis

Hello all,

I am new to Python and have a problem with the behaviour of the xml parser. Assume we have this xml document:

<?xml version="1.0" encoding="UTF-8"?>
<bibliography>
<entry>
Title of the first book.
</entry>
<entry>
<coauthored/>
Title of the second book.
</entry>
</bibliography>


If I now check the text of all 'entry' nodes, the text for the node with the empty element isn't shown:



import xml.etree.ElementTree as ET
tree = ET.ElementTree(file='test.xml')
root = tree.getroot()
resultSet = root.findall(".//entry")
for r in resultSet:
    print(r.text)
 
jannidis

To my understanding, the empty element is a child of entry, as is the text node.
Is there anything I am doing wrong here? Any help is appreciated.

Fotis
 
MRAB

It _is_ shown, it's just that it's all whitespace:
print(ascii(r.text))


'\n Title of the first book.\n '
'\n '
 
Dieter Maurer


I do not know about "xml.etree", but the (reportedly) quite compatible
"lxml.etree" handles text nodes quite differently from "DOM": they are
*not* considered children of the parent element. Instead, they are
attached as the attributes "text" and "tail": text that comes before the
first child element goes into the "text" attribute of the containing
element, and text that follows a child element goes into the "tail"
attribute of that preceding element.

Your code snippet suggests that "xml.etree" behaves identically in
this respect. In this case, you would find "Title of the second book"
as the "tail" attribute of the element "coauthored".
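A minimal sketch of the point above, using the standard library's "xml.etree.ElementTree". The document from the original post is inlined as a string here instead of being read from test.xml, and the indentation inside it is my assumption. It shows where the second title ends up and one way to collect all the text of an entry with itertext():

```python
import xml.etree.ElementTree as ET

# The document from the original post, inlined as a string
# (assumption: the thread reads it from test.xml instead).
XML = """<bibliography>
  <entry>
    Title of the first book.
  </entry>
  <entry>
    <coauthored/>
    Title of the second book.
  </entry>
</bibliography>"""

root = ET.fromstring(XML)

titles = []
for entry in root.findall(".//entry"):
    # itertext() yields the element's own text, then the text and tail
    # of every descendant, in document order.
    titles.append("".join(entry.itertext()).strip())

print(titles)

second = root.findall(".//entry")[1]
coauthored = second.find("coauthored")
print(ascii(second.text))      # only whitespace before <coauthored/>
print(ascii(coauthored.tail))  # the title follows <coauthored/>
```

Joining itertext() is convenient when you only want the character data of an element and do not care which child it was attached to.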
 
jannidis

On Sunday, 28 October 2012 at 03:27:14 UTC+1, (e-mail address removed) wrote:

Thanks a lot for your answer. As I am looking for a tool to teach using XML in programming, it is a pity that this module implements a very idiosyncratic view of XML data, but DOM and SAX are out there too, so I will look at them.
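For comparison, here is a rough sketch of the DOM view using the standard library's "xml.dom.minidom". Again the document is inlined as a string, with indentation that is my assumption. In this model the text really is a child node of entry, next to the empty element:

```python
from xml.dom import Node, minidom

# Same document as in the original post, inlined as a string (assumption).
XML = """<bibliography>
  <entry>
    Title of the first book.
  </entry>
  <entry>
    <coauthored/>
    Title of the second book.
  </entry>
</bibliography>"""

doc = minidom.parseString(XML)
second = doc.getElementsByTagName("entry")[1]

# In DOM, text nodes and element nodes are siblings under the parent.
kinds = [child.nodeType for child in second.childNodes]

# Collect the character data of the direct text-node children only.
text = "".join(
    child.data for child in second.childNodes
    if child.nodeType == Node.TEXT_NODE
).strip()

print(kinds == [Node.TEXT_NODE, Node.ELEMENT_NODE, Node.TEXT_NODE])
print(text)
```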
 
jannidis

If someone comes across this posting with the same problem, the best answer seems to be:
avoid Python's xml.etree.ElementTree and use this library instead:
http://lxml.de/
It works as expected and supports XPath much better.
 
