Getting elements and text with lxml

  • Thread starter J. Pablo Fernández

J. Pablo Fernández

Hello,

I have an XML file that starts with:

<vortaro>
<art mrk="$Id: a.xml,v 1.10 2007/09/11 16:30:20 revo Exp $">
<kap>
<ofc>*</ofc>-<rad>a</rad>
</kap>

Out of it, I'd like to extract something like this (it's just one
possible structure; any structure is fine as long as all the data is
there):

[("ofc", "*"), "-", ("rad", "a")]

How can I do it? I managed to get the content of both tags and the
text up to the first tag ("\n "), but not the "-" (and in other XML
files, there's more text outside the elements).

Thanks.
 

Gabriel Genellina

Look for the "tail" attribute.
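A minimal sketch of what "tail" means here, assuming the `<kap>` fragment from the question is held in a string (the name `KAP` is just for illustration). Text that follows a child's closing tag is stored on that child as `.tail`, not on the parent, so the "-" lives on `<ofc>`:

```python
try:
    from lxml import etree  # the library used in this thread
except ImportError:
    import xml.etree.ElementTree as etree  # stdlib fallback with the same text/tail model

# Hypothetical input: the <kap> fragment from the question.
KAP = "<kap>\n  <ofc>*</ofc>-<rad>a</rad>\n</kap>"

kap = etree.fromstring(KAP)
ofc = kap[0]  # first child, <ofc>
rad = kap[1]  # second child, <rad>

print(repr(kap.text))  # text before the first child: '\n  '
print(repr(ofc.text))  # text inside <ofc>: '*'
print(repr(ofc.tail))  # text after </ofc>, before <rad>: '-'
print(repr(rad.tail))  # text after </rad>, before </kap>: '\n'
```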
 

J. Pablo Fernández

Gabriel Genellina said:
> On Fri, 16 May 2008 18:53:03 -0300, J. Pablo Fernández <[email protected]> wrote:
>> How can I do it? I managed to get the content of both tags and the
>> text up to the first tag ("\n   "), but not the "-" (and in other XML
>> files, there's more text outside the elements).
>
> Look for the "tail" attribute.

That gives me the last part, but not the one in the middle:

In : etree.tounicode(e)
Out: u'<kap>\n <ofc>*</ofc>-<rad>a</rad>\n</kap>\n'

In : e.text
Out: '\n '

In : e.tail
Out: '\n'
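The '\n' shown for e.tail is the tail of `<kap>` itself. The "-" belongs to the child `<ofc>`, so it only appears when walking the children and reading each child's `.tail`. A sketch for a single level, assuming the children have no sub-elements of their own; `flatten_children` is a hypothetical name:

```python
try:
    from lxml import etree
except ImportError:
    import xml.etree.ElementTree as etree

def flatten_children(e):
    """Return e's leading text, then (tag, text) and tail for each direct child.

    Assumes the children contain no sub-elements. Whitespace-only strings
    and None values are kept, so no data is lost.
    """
    result = [e.text]
    for child in e:  # iterating an element yields its direct children
        result.append((child.tag, child.text))
        result.append(child.tail)  # text between this child and the next one
    return result

e = etree.fromstring("<kap>\n  <ofc>*</ofc>-<rad>a</rad>\n</kap>")
print(flatten_children(e))
# ['\n  ', ('ofc', '*'), '-', ('rad', 'a'), '\n']
```

Dropping the whitespace-only entries from that list gives exactly the structure asked for in the original post.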

Thanks.
 

John Machin

J. Pablo Fernández said:
>> Look for the "tail" attribute.
>
> That gives me the last part, but not the one in the middle:
>
> In : e.text
> Out: '\n '
>
> In : e.tail
> Out: '\n'

You need the text content of your initial element's children (each
child's text and tail), which in turn needs that of their children,
and so on.

See http://effbot.org/zone/element-bits-and-pieces.htm
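A recursive sketch of that idea, assuming whitespace-only strings can be dropped and that a leaf element should collapse to ("tag", "text") as in the original question; `to_structure` is a hypothetical name, and it only relies on `.text`, `.tail` and iterating over an element's children:

```python
try:
    from lxml import etree  # the library used in this thread
except ImportError:
    import xml.etree.ElementTree as etree  # stdlib fallback, same text/tail model

def to_structure(e):
    """Return e's mixed content as a list, recursing into children.

    Each child becomes a (tag, content) pair; text outside child elements
    (e.text and each child's tail) appears as a plain string.
    Assumptions: whitespace-only strings are dropped, and a child whose
    content is a single string is collapsed to ("tag", "text").
    """
    items = []
    if e.text and e.text.strip():
        items.append(e.text)  # text before the first child
    for child in e:  # direct children, in document order
        content = to_structure(child)  # recurse into the child's own content
        if len(content) == 1 and isinstance(content[0], str):
            content = content[0]  # leaf element, e.g. ("ofc", "*")
        items.append((child.tag, content))
        if child.tail and child.tail.strip():
            items.append(child.tail)  # text between this child and the next
    return items

kap = etree.fromstring("<kap>\n  <ofc>*</ofc>-<rad>a</rad>\n</kap>")
print(to_structure(kap))
# [('ofc', '*'), '-', ('rad', 'a')]
```

Nested markup keeps its nesting: an element with mixed content becomes a (tag, list) pair instead of a (tag, string) pair. If the file contains XML comments, lxml also yields them as children (with a non-string tag), so real files may need a filter for those.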

HTH,
John
 

Stefan Behnel

J. Pablo Fernández said:
> out of it, I'd like to extract something like (I'm just showing one
> structure, any structure as long as all data is there is fine):
>
> [("ofc", "*"), "-", ("rad", "a")]

    >>> from lxml import etree
    >>> root = etree.fromstring(xml)   # xml: the document as a string
    >>> l = []
    >>> for el in root.iter():    # or root.getiterator()
    ...     l.append((el.tag, el.text))
    ...     l.append(el.tail)

or maybe this is enough:

    list(root.itertext())
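A runnable version of both suggestions on the `<kap>` fragment, assuming the loop is meant to append each element's tag/text pair followed by its tail. It shows what each approach returns:

```python
try:
    from lxml import etree
except ImportError:
    import xml.etree.ElementTree as etree

xml = "<kap>\n  <ofc>*</ofc>-<rad>a</rad>\n</kap>"  # the <kap> fragment from the question
root = etree.fromstring(xml)

# Flat walk: each element's (tag, text) followed by its tail.
l = []
for el in root.iter():  # includes root itself, then descendants in document order
    l.append((el.tag, el.text))
    l.append(el.tail)
print(l)
# [('kap', '\n  '), None, ('ofc', '*'), '-', ('rad', 'a'), '\n']

# Text only: tags are lost, but every piece of text is kept in order.
print(list(root.itertext()))
# ['\n  ', '*', '-', 'a', '\n']
```

One caveat of the flat loop: it appends an element's tail right after the element's own text, before its descendants are visited, so when children contain sub-elements the tail lands ahead of text that comes earlier in the document. A recursive walk over the children avoids that.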

Stefan
 

J. Pablo Fernández

Stefan Behnel said:
>     >>> for el in root.iter():    # or root.getiterator()
>
> or maybe this is enough:
>
>     list(root.itertext())

Hello,

My element object doesn't have iter() or itertext(); the only iter
methods it has are iterancestors, iterchildren, iterdescendants and
itersiblings.
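Without iter() or itertext(), the same text can be collected by recursing over the direct children by hand. A sketch of an itertext() stand-in, assuming iterating over an element directly works in this lxml version (iterchildren(), listed above, yields the same children); `iter_text` is a hypothetical name:

```python
try:
    from lxml import etree
except ImportError:
    import xml.etree.ElementTree as etree

def iter_text(el):
    """Yield the text content of el's subtree in document order.

    A hand-rolled stand-in for itertext() that only relies on .text,
    .tail and iterating over an element's direct children.
    """
    if el.text:
        yield el.text  # text before el's first child
    for child in el:  # same children as el.iterchildren() in lxml
        for piece in iter_text(child):  # everything inside the child
            yield piece
        if child.tail:
            yield child.tail  # text after the child's closing tag

root = etree.fromstring("<kap>\n  <ofc>*</ofc>-<rad>a</rad>\n</kap>")
print(list(iter_text(root)))
# ['\n  ', '*', '-', 'a', '\n']
```

To keep the tag names as well, yield a (tag, text) pair in place of the child's text; the tail handling stays the same.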

Thanks.
 
