get element text in DOM?

Discussion in 'Python' started by Juliano Freitas, Nov 10, 2004.

  1. How can I get the text between the <teste> tags?

    >>> xml = """<root><teste> texto </teste></root>"""
    >>> from xml.dom import minidom
    >>> document = minidom.parseString(xml)
    >>> document
    <xml.dom.minidom.Document instance at 0x4181df0c>
    >>> minidom.getElementsByTagName('teste')

    >>> element = document.getElementsByTagName('teste')
    >>> element
    [<DOM Element: teste at 0x418e110c>]
    >>> element[0].nodeType
    1

    Juliano Freitas
    Juliano Freitas, Nov 10, 2004
    #1

  2. Juliano Freitas wrote:
    > How can I get the text between the <teste> tags??
    >
    > >>> xml = """<root><teste> texto </teste></root>"""

    You must know that the text between the tags is a DOM node
    in its own right, namely a TEXT node, which is a child of the
    element node formed by the tag.

    So try:

    xml = """<root><teste> texto </teste></root>"""
    from xml.dom import minidom
    document = minidom.parseString(xml)
    element = document.getElementsByTagName('teste')
    textelt=element[0].firstChild
    print textelt.nodeType, textelt.nodeValue

    and it will print:

    3 texto

    --Irmen
    Irmen de Jong, Nov 10, 2004
    #2
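
A minimal sketch of the same idea for Python 3, where print is a function. The variable names (teste, text_node, node_type, node_value) are illustrative only and do not come from the thread. It assumes the element's first child is the text node, which holds for this sample document but not for every document.

```python
from xml.dom import minidom

xml = """<root><teste> texto </teste></root>"""
document = minidom.parseString(xml)

# getElementsByTagName returns a list-like NodeList, so index into it.
teste = document.getElementsByTagName("teste")[0]

# In this document the element's only child is a text node holding the
# characters between the tags, surrounding whitespace included.
text_node = teste.firstChild
node_type = text_node.nodeType
node_value = text_node.nodeValue

print(node_type == text_node.TEXT_NODE)
print(repr(node_value))
```

Calling node_value.strip() drops the surrounding spaces if only the word itself is wanted.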

  3. Uche Ogbuji

    Juliano Freitas wrote:
    > How can I get the text between the <teste> tags??
    >
    > >>> xml = """<root><teste> texto </teste></root>"""
    > >>> from xml.dom import minidom
    > >>> document = minidom.parseString(xml)
    > >>> document
    > <xml.dom.minidom.Document instance at 0x4181df0c>
    > >>> minidom.getElementsByTagName('teste')
    >
    > >>> element = document.getElementsByTagName('teste')
    > >>> element
    > [<DOM Element: teste at 0x418e110c>]
    > >>> element[0].nodeType
    > 1
    >
    > Juliano Freitas


    http://lists.fourthought.com/pipermail/4suite/2004-November/013027.html

    Verbatim:

    """
    Or, ObTopic, for 4Suite recent CVS:

    >>> from Ft.Xml.Domlette import NonvalidatingReader
    >>> doc = NonvalidatingReader.parseString("<root><teste> texto </teste></root>", 'urn:dummy')
    >>> print doc.xpath('string(/root/teste)')
    texto

    Simple and sweet IMHO.
    """

    --
    Uche Ogbuji Fourthought, Inc.
    http://uche.ogbuji.net http://4Suite.org http://fourthought.com
    Uche Ogbuji, Nov 12, 2004
    #3
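
For readers without 4Suite installed, a rough equivalent can be sketched with the standard library's xml.etree.ElementTree. This swaps in a different technique and is not the 4Suite API: ElementTree supports only a small path subset and has no string() function, so the sketch uses findtext and itertext instead. The names teste_text and all_text are illustrative.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<root><teste> texto </teste></root>")

# findtext takes a path relative to the element it is called on and
# returns the text that directly follows the first match's start tag.
teste_text = root.findtext("teste")

# itertext walks the whole subtree, so joining its pieces also picks up
# text inside nested child elements, closer to what string() returns.
all_text = "".join(root.find("teste").itertext())

print(repr(teste_text))
print(repr(all_text))
```

For this document both expressions give the same string, because <teste> has no child elements.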
  4. On Wed, 10 Nov 2004 17:11:09 -0200, Juliano Freitas wrote:

    > How can I get the text between the <teste> tags??
    >
    > >>> xml = """<root><teste> texto </teste></root>"""
    > >>> from xml.dom import minidom
    > >>> document = minidom.parseString(xml)
    > >>> document
    > <xml.dom.minidom.Document instance at 0x4181df0c>
    > >>> minidom.getElementsByTagName('teste')
    >
    > >>> element = document.getElementsByTagName('teste')
    > >>> element
    > [<DOM Element: teste at 0x418e110c>]
    > >>> element[0].nodeType
    > 1

    Here is a useful function I have written:

    from xml import dom

    def getText(node, recursive=False):
        """
        Get all the text associated with this node.
        With recursive == True, all text from child nodes is retrieved.
        """
        L = ['']
        for n in node.childNodes:
            if n.nodeType in (dom.Node.TEXT_NODE,
                              dom.Node.CDATA_SECTION_NODE):
                L.append(n.data)
            else:
                if not recursive:
                    return None
                # Recurse under the function's own name, passing the flag on.
                L.append(getText(n, recursive))

        return ''.join(L)

    >>> print getText(element[0])

    Regards Manlio Perillo
    Manlio Perillo, Nov 13, 2004
    #4
  5. Manlio Perillo wrote:

    > for n in node.childNodes:
    >     if n.nodeType in (dom.Node.TEXT_NODE, dom.Node.CDATA_SECTION_NODE):


    (Aside: node.TEXT_NODE would probably be better here. Can't guarantee
    that a DOM's implementation of the 'Node' interface is available as a
    class called 'Node' inside its module.)

    >         L.append(n.data)
    >     else:
    >         if not recursive:
    >             return None


    Surely 'continue'? This will exit the function (returning None instead
    of the expected empty string) the first time a non-Text node is met.

    Incidentally, DOM Level 3 Core defines the property 'textContent' to
    return pretty much exactly this (although it removes the ignorable
    whitespace). Not in minidom yet, but... <insert usual plug here>

    --
    Andrew Clover
    http://www.doxdesk.com/
    Andrew Clover, Nov 14, 2004
    #5
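
Putting the two suggestions from the last reply together, the helper could look roughly like the sketch below, written for Python 3. This is one possible reading of the advice, not the original author's code: the name get_text, the sample document with a nested <b> element, and the choice to skip comments and other non-element nodes are all assumptions made for illustration.

```python
from xml.dom import minidom


def get_text(node, recursive=False):
    """Join the text and CDATA children of node into one string.

    With recursive=True, text inside child elements is included too.
    """
    parts = []
    for child in node.childNodes:
        # Node-type constants are read from the node itself, so no class
        # named Node has to be importable from the DOM implementation.
        if child.nodeType in (child.TEXT_NODE, child.CDATA_SECTION_NODE):
            parts.append(child.data)
        elif recursive and child.nodeType == child.ELEMENT_NODE:
            parts.append(get_text(child, recursive=True))
        # Anything else (comments, or elements when not recursing) is
        # skipped, so the function always returns a string.
    return "".join(parts)


# Hypothetical sample with a nested element to show both modes.
document = minidom.parseString(
    "<root><teste> texto <b>em</b> negrito</teste></root>"
)
teste = document.getElementsByTagName("teste")[0]

shallow = get_text(teste)
deep = get_text(teste, recursive=True)

print(repr(shallow))
print(repr(deep))
```

Skipping a non-text node instead of returning early means the caller never has to check for None before using the result.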