Problems with XML parsing (Python 3.3)

Discussion in 'Python' started by jannidis@gmail.com, Oct 28, 2012.

  1. Guest

    Hello all,

    I am new to Python and have a problem with the behaviour of the XML parser. Assume we have this XML document:

    <?xml version="1.0" encoding="UTF-8"?>
    <bibliography>
    <entry>
    Title of the first book.
    </entry>
    <entry>
    <coauthored/>
    Title of the second book.
    </entry>
    </bibliography>


    If I now check the text of all 'entry' nodes, the text of the entry that contains the empty element isn't shown:



    import xml.etree.ElementTree as ET
    tree = ET.ElementTree(file='test.xml')
    root = tree.getroot()
    resultSet = root.findall(".//entry")
    for r in resultSet:
        print(r.text)
     
    , Oct 28, 2012
    #1

  2. Guest

    To my understanding, the empty element is a child of entry, as is the text node.
    Is there anything I am doing wrong here? Any help is appreciated,

    Fotis
     
    , Oct 28, 2012
    #2

  3. MRAB Guest

    On 2012-10-28 02:27, wrote:
    > If I now check for the text of all 'entry' nodes, the text for the node with the empty element isn't shown

    It _is_ shown; it's just that it's all whitespace:

    >>> for r in resultSet:
    ...     print(ascii(r.text))
    ...
    '\n Title of the first book.\n '
    '\n '
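    If the goal is the complete title of each entry regardless of any child elements, one option is the standard `Element.itertext()`, which yields the element's own text plus the text and tail of its descendants in document order. A minimal sketch, assuming surrounding whitespace should simply be stripped; it parses the example document from a string instead of `test.xml`, and `entry_title` is a hypothetical helper name:

```python
import xml.etree.ElementTree as ET

# The document from the first post, embedded as a string so the sketch is self-contained.
XML = """<bibliography>
<entry>
Title of the first book.
</entry>
<entry>
<coauthored/>
Title of the second book.
</entry>
</bibliography>"""

root = ET.fromstring(XML)


def entry_title(entry):
    """Hypothetical helper: join all text inside `entry` (its .text plus the
    .text/.tail of every descendant) and strip the surrounding whitespace."""
    return "".join(entry.itertext()).strip()


titles = [entry_title(e) for e in root.iter("entry")]
print(titles)  # ['Title of the first book.', 'Title of the second book.']
```

    Note that `itertext()` does not include the element's own tail, so text following an `</entry>` is not mixed into that entry's title.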
     
    MRAB, Oct 28, 2012
    #3
  4. Dieter Maurer Guest

    writes:

    > If I now check for the text of all 'entry' nodes, the text for the node with the empty element isn't shown


    I do not know about "xml.etree", but the (reportedly) largely compatible
    "lxml.etree" handles text nodes quite differently from DOM: they are
    *not* considered children of the parent element. Instead, they are
    attached as attributes: text that comes before the first child element
    becomes the "text" attribute of the containing element, and text that
    follows a child element becomes the "tail" attribute of that preceding
    element.

    Your code snippet suggests that "xml.etree" behaves the same way in
    this respect. If so, you would find "Title of the second book"
    in the "tail" attribute of the "coauthored" element.
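    For what it's worth, the standard library's xml.etree.ElementTree does use the same text/tail model. A minimal sketch showing where the second title ends up; it parses the example document from a string rather than `test.xml`, and the variable names are illustrative only:

```python
import xml.etree.ElementTree as ET

# The document from the first post, embedded as a string so the sketch is self-contained.
XML = """<bibliography>
<entry>
Title of the first book.
</entry>
<entry>
<coauthored/>
Title of the second book.
</entry>
</bibliography>"""

root = ET.fromstring(XML)
second = root.findall(".//entry")[1]

# Only elements count as children; text is not part of this list.
print([child.tag for child in second])   # ['coauthored']

# Text before the first child element is stored in the parent's .text ...
print(repr(second.text))                 # '\n'

# ... and text after a child element is stored in that child's .tail.
coauthored = second.find("coauthored")
print(repr(coauthored.tail))             # '\nTitle of the second book.\n'
```

    So `second.text` holds only the whitespace before `<coauthored/>`, which is why the loop in the first post printed what looked like an empty line for that entry.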
     
    Dieter Maurer, Oct 28, 2012
    #4
  5. Guest

    On Sunday, 28 October 2012 03:27:14 UTC+1, wrote:
    > If I now check for the text of all 'entry' nodes, the text for the node with the empty element isn't shown


    Thanks a lot for your answer. As I am looking for a tool for teaching XML processing in programming, it is a pity that this module implements a very idiosyncratic view of XML data, but DOM and SAX are out there too, so I will look at them.
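    For comparison, a minimal sketch of the DOM view using the standard library's xml.dom.minidom, where text really does appear as nodes in `childNodes` next to the element nodes. It parses the example document from a string, and the `children` list is just an illustrative way to show the node sequence:

```python
from xml.dom import minidom

# The document from the first post, embedded as a string so the sketch is self-contained.
XML = """<bibliography>
<entry>
Title of the first book.
</entry>
<entry>
<coauthored/>
Title of the second book.
</entry>
</bibliography>"""

doc = minidom.parseString(XML)
second = doc.getElementsByTagName("entry")[1]

# In the DOM, text is a node in childNodes, alongside element nodes.
children = []
for node in second.childNodes:
    if node.nodeType == node.TEXT_NODE:
        children.append(("text", node.data))
    elif node.nodeType == node.ELEMENT_NODE:
        children.append(("element", node.tagName))

for kind, value in children:
    print(kind, repr(value))
```

    For the second entry this should list a text node holding the leading newline, the `coauthored` element, and a text node containing the title.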
     
    , Oct 29, 2012
    #5
  6. Guest

    If someone comes across this posting with the same problem, the best answer seems to be:
    avoid Python's xml.etree.ElementTree and use this library instead:
    http://lxml.de/
    It works as expected and supports XPath much better.
     
    , Oct 30, 2012
    #6

Similar Threads
  1. Per Magnus L?vold
    Replies:
    0
    Views:
    1,422
    Per Magnus L?vold
    Nov 15, 2004
  2. John Smith
    Replies:
    3
    Views:
    2,011
    Roedy Green
    Sep 27, 2005
  3. WP
    Replies:
    5
    Views:
    1,433
  4. John Levine
    Replies:
    0
    Views:
    756
    John Levine
    Feb 2, 2012
  5. Erik Wasser
    Replies:
    5
    Views:
    499
    Peter J. Holzer
    Mar 5, 2006