Damjan
Attached is the smallest test case that shows that ElementTree returns a string object if the text in the tree is only ASCII, but returns a unicode object otherwise.
This would make sense if the string object and the unicode object were interchangeable, but they are not. One example: the translate method is completely different.
I've tested with cElementTree (1.0.2) too; it has the same behaviour.
Any suggestions?
Do I need to check the output of ElementTree every time, or is there some hidden switch to change this behaviour?
from elementtree import ElementTree

# UTF-8 encoded document: <p1> is pure ASCII, <p2> contains Cyrillic text.
xml = """\
<?xml version="1.0" encoding="UTF-8"?>
<root>
<p1> ascii </p1>
<p2> \xd0\xba\xd0\xb8\xd1\x80\xd0\xb8\xd0\xbb\xd0\xb8\xd1\x86\xd0\xb0 </p2>
</root>
"""

tree = ElementTree.fromstring(xml)
p1, p2 = tree.getchildren()
# p1.text comes back as a str object, p2.text as a unicode object.
print "type(p1.text):", type(p1.text)
print "type(p2.text):", type(p2.text)
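For illustration, the "check the output every time" option could be wrapped in a small helper, so the check lives in one place. This is only a sketch: `to_text` is a hypothetical name, not part of ElementTree, and it assumes that any byte string coming back from the tree is pure ASCII (which is the only case in which the test case above gets a string object).

```python
def to_text(value, encoding="ascii"):
    """Return value as a unicode string (hypothetical helper, not an ElementTree API).

    Byte strings are decoded with `encoding`; unicode strings and None
    are passed through unchanged.
    """
    if value is None:
        # Elements without text have .text set to None.
        return None
    if isinstance(value, bytes):
        # Assumption: byte strings returned by the tree are ASCII-only.
        return value.decode(encoding)
    return value


# After normalising, the dict-based unicode translate works for both cases.
mapping = {ord(u"a"): u"A"}
ascii_result = to_text(b" ascii ").translate(mapping)
cyrillic_result = to_text(u"\u043a\u0438").translate(mapping)
```

With this, `to_text(p1.text)` and `to_text(p2.text)` would both yield unicode objects, at the cost of one extra call per access.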