elementtree question

Tim Arnold · Sep 21, 2007

Hi, I'm using elementtree and elementtidy to work with some HTML files. For
some of these files I need to enclose the body content in a new div tag,
like this:
<body>
<div class="remapped">
original contents...
</div>
</body>

I figure there must be a way to do it by creating a 'div' SubElement to the
'body' tag and somehow copying the rest of the tree under that SubElement,
but it's beyond my comprehension.

How can I accomplish this?
(I know I could put the class on the body tag itself, but that won't satisfy
the powers-that-be).

thanks,
--Tim Arnold

Ivo · Sep 21, 2007

You could also try something like this:

# Python 2 only: sgmllib was removed in Python 3.
from sgmllib import SGMLParser

class IParse(SGMLParser):
    def __init__(self, verbose=0):
        SGMLParser.__init__(self, verbose)
        self.data = ""

    def _attr_to_str(self, attrs):
        return ' '.join(['%s="%s"' % a for a in attrs])

    def start_body(self, attrs):
        self.data += "<body %s>" % self._attr_to_str(attrs)
        print "remapping"
        self.data += '''<div class="remapped">'''

    def end_body(self):
        self.data += "</div>"  # end remapping
        self.data += "</body>"

    def handle_data(self, data):
        self.data += data

    def unknown_starttag(self, tag, attrs):
        self.data += "<%s %s>" % (tag, self._attr_to_str(attrs),)

    def unknown_endtag(self, tag):
        self.data += "</%s>" % tag

if __name__ == "__main__":
    i = IParse()
    i.feed('''
<html>
<body bgcolor="#fffff">
original
<i>italic</i>
<b class="test">contents</b>...
</body>
</html>''')
    i.close()  # flush any buffered input before reading the result
    print i.data

Just look at the code of sgmllib (it is in the standard library); it is very
easy to make a parser. The code above could use some much-needed refactoring.
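
Since sgmllib is gone in Python 3, here is a rough sketch of the same idea on top of html.parser from the standard library. The class name BodyWrapper and the sample input are made up for illustration, and this is only an approximation: comments and the doctype are dropped, and attribute values are written back without re-escaping.

```python
from html.parser import HTMLParser


class BodyWrapper(HTMLParser):
    """Re-emit the parsed markup, wrapping the body contents in a div."""

    def __init__(self):
        # convert_charrefs=False so entities reach the handlers below unchanged
        super().__init__(convert_charrefs=False)
        self.parts = []

    @staticmethod
    def _attrs_to_str(attrs):
        # Attributes without a value arrive with value None
        return "".join(
            " %s" % name if value is None else ' %s="%s"' % (name, value)
            for name, value in attrs
        )

    def handle_starttag(self, tag, attrs):
        self.parts.append("<%s%s>" % (tag, self._attrs_to_str(attrs)))
        if tag == "body":
            # open the wrapper right after the body start tag
            self.parts.append('<div class="remapped">')

    def handle_endtag(self, tag):
        if tag == "body":
            # close the wrapper right before the body end tag
            self.parts.append("</div>")
        self.parts.append("</%s>" % tag)

    def handle_data(self, data):
        self.parts.append(data)

    def handle_entityref(self, name):
        self.parts.append("&%s;" % name)

    def handle_charref(self, name):
        self.parts.append("&#%s;" % name)


parser = BodyWrapper()
parser.feed('<html><body bgcolor="#ffffff">original <i>italic</i> 2&amp;3</body></html>')
parser.close()
result = "".join(parser.parts)
print(result)
```

Like the sgmllib version, this rewrites the text stream rather than building a tree, so it is best suited to simple pages.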

Gabriel Genellina · Sep 21, 2007

Tim said:
Hi, I'm using elementtree and elementtidy to work with some HTML files. For
some of these files I need to enclose the body content in a new div tag,
like this:
<body>
<div class="remapped">
original contents...
</div>
</body>

I figure there must be a way to do it by creating a 'div' SubElement to the
'body' tag and somehow copying the rest of the tree under that SubElement,
but it's beyond my comprehension.

import xml.etree.ElementTree as ET
source = """<html><head><title>Test</title></head><body>
original contents... 2&amp;3 <a href="hello/world">some text</a>
<p>Another paragraph</p>
</body></html>"""
tree = ET.XML(source)
body = tree.find("body")
newdiv = ET.Element('div', {'class':'remapped'})
newdiv.append(body)
bodyidx = list(tree).index(body)
tree[bodyidx]=newdiv
ET.dump(tree)

Mark T · Sep 21, 2007

The above wraps the body element, not the contents of the body element. I'm
no ElementTree expert, but this seems to work:

import xml.etree.ElementTree as ET
source = """<html><head><title>Test</title></head><body>
original contents... 2&amp;3 <a href="hello/world">some text</a>
<p>Another paragraph</p>
</body></html>"""
tree = ET.XML(source)
body = tree.find("body")
newdiv = ET.Element('div', {'class':'remapped'})
for e in list(body):
    newdiv.append(e)
newdiv.text = body.text
newdiv.tail = body.tail
body.clear()
body.append(newdiv)
ET.dump(tree)

Result:

<html><head><title>Test</title></head><body><div class="remapped">
original contents... 2&amp;3 <a href="hello/world">some text</a>
<p>Another paragraph</p>
</div></body></html>

-Mark

Gabriel Genellina · Sep 24, 2007

[wrong code]

The above wraps the body element, not the contents of the body element. I'm
no ElementTree expert, but this seems to work:

[better code]

Almost right. clear() removes all attributes too, so if the body element
had any attribute, it is lost. I would remove children from body at the
same time they're copied into newdiv.
(This whole thing appears to be harder than one would expect at first)

import xml.etree.ElementTree as ET  # cElementTree was removed in Python 3.9
source = """<html><head><title>Test</title></head><body lang="en">
original contents... 2&amp;3 <a href="hello/world">some text</a>
<p>Another paragraph</p>
</body></html>"""
tree = ET.XML(source)
body = tree.find("body")
newdiv = ET.Element('div', {'class':'remapped'})
# iterate over a copy, since body is modified inside the loop
for e in list(body):
    newdiv.append(e)
    body.remove(e)
newdiv.text, body.text = body.text, ''
newdiv.tail, body.tail = body.tail, ''
body.append(newdiv)
ET.dump(tree)
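
The point about clear() can be checked in isolation. This is a small sketch using the standard library's xml.etree.ElementTree; the sample markup is invented for the demonstration.

```python
import xml.etree.ElementTree as ET

# A body element with an attribute, leading text and one child
body = ET.XML('<body lang="en">some text<p>Another paragraph</p></body>')
attrs_before = dict(body.attrib)

body.clear()  # resets children, text, tail AND attributes

attrs_after = dict(body.attrib)
print(attrs_before, attrs_after, len(body), body.text)
```

So any approach that calls body.clear() has to save and restore the attributes if they matter.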

Stefan Behnel · Sep 24, 2007

Tim said:
Hi, I'm using elementtree and elementtidy to work with some HTML files. For
some of these files I need to enclose the body content in a new div tag,
like this:
<body>
<div class="remapped">
original contents...
</div>
</body>

Give lxml.etree (or lxml.html) a try:

from lxml import etree

tree = etree.parse("http://url.to/some.html", etree.HTMLParser())
body = tree.find("body")

and then:

div = etree.Element("div", {"class" : "remapped"})
div.extend(body)
body.append(div)

or alternatively:

children = list(body)
div = etree.SubElement(body, "div", {"class" : "remapped"})
div.extend(children)

http://codespeak.net/lxml/

and for lxml.html, which is currently in alpha status:

http://codespeak.net/lxml/dev/

ET 1.3 will also support the extend() function, BTW.

Stefan

Tim Arnold · Sep 24, 2007

Thanks for the great answers--I learned a lot. I'm looking forward to the ET
1.3 version. I'm currently working on some older HP-UX 10.20 machines and
haven't been able to compile lxml all the way through yet.

thanks again,
--Tim Arnold

Fredrik Lundh · Sep 26, 2007

Stefan said:
ET 1.3 will also support the extend() function, BTW.

div.extend(seq) can be trivially rewritten as

div[len(div):] = seq

and in this case, you know that len(div) is 0, so you can simply do:

div[:] = seq
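
A quick sketch to compare the three spellings, assuming a Python whose xml.etree already has Element.extend() (current versions do). The element names are arbitrary.

```python
import xml.etree.ElementTree as ET

def make_seq():
    # fresh child elements for each variant
    return [ET.Element("a"), ET.Element("b")]

div_extend = ET.Element("div")
div_extend.extend(make_seq())

div_slice_end = ET.Element("div")
div_slice_end[len(div_slice_end):] = make_seq()  # append at the end via slice assignment

div_slice_all = ET.Element("div")
div_slice_all[:] = make_seq()  # replace all (here: zero) children

tags = [[child.tag for child in d]
        for d in (div_extend, div_slice_end, div_slice_all)]
print(tags)
```

All three leave the div with the same children, in the same order.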

(this recent lxml habit of using lxml-specific versions of things that
are trivial to do with the standard API is a bit disappointing. kind of
defeats the purpose of having a standard API...)

</F>

Stefan Behnel · Sep 26, 2007

Fredrik said:
(this recent lxml habit of using lxml-specific versions of things that
are trivial to do with the standard API is a bit disappointing. kind of
defeats the purpose of having a standard API...)

ElementTree is not the only standard API that lxml is following. Another one
is the standard API of the "list" builtin type, which has an extend() method.

ah-you're-just-jealous-we-had-it-first-ly,

Stefan

Stefan Behnel · Sep 26, 2007

Tim said:
Thanks for the great answers--I learned a lot. I'm looking forward to the ET
1.3 version.

Note that there is a difference in behaviour, though. lxml.etree forces
Elements to be uniquely positioned in a tree, so the code I posted relies on
the "side effect" of automatically removing an Element from the old position
when inserting it at a different place. ElementTree does not do that, so this
code is not portable between the two libraries.
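
To illustrate the ElementTree side of this with the standard library only (the sample markup is invented): after extending the div, the children are still listed under body as well, so they have to be removed explicitly.

```python
import xml.etree.ElementTree as ET

body = ET.XML("<body><p>one</p><p>two</p></body>")
div = ET.Element("div", {"class": "remapped"})

children = list(body)
div.extend(children)

# Plain ElementTree does not move the elements: body still has both children
still_in_body = len(body)

# Remove them by hand, then attach the div
for child in children:
    body.remove(child)
body.append(div)

result = ET.tostring(body, encoding="unicode")
print(still_in_body, result)
```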

Stefan

Fredrik Lundh · Sep 26, 2007

Tim Arnold wrote:

I figure there must be a way to do it by creating a 'div' SubElement to the
'body' tag and somehow copying the rest of the tree under that SubElement,
but it's beyond my comprehension.

How can I accomplish this?
(I know I could put the class on the body tag itself, but that won't satisfy
the powers-that-be).

for completeness, here's an efficient and fairly straightforward way to
do it under plain 2.5 xml.etree:

import copy

# doc is the already-parsed document
body = doc.find(".//body")

# clone and mutate the body element
div = copy.copy(body)
div.tag = "div"
div.set("class", "remapped")

# replace the body contents with the new div
body.clear()
body[:] = [div]
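
Pulling the thread together, here is one possible sketch for current Python versions that keeps the body attributes and uses only slice assignment. The helper name wrap_body_contents and the sample document are assumptions made for this example, not part of any library.

```python
import xml.etree.ElementTree as ET


def wrap_body_contents(root, css_class="remapped"):
    """Move everything inside <body> into a new <div class=...> child."""
    body = root.find(".//body")
    if body is None:
        raise ValueError("document has no body element")
    div = ET.Element("div", {"class": css_class})
    # the leading text of body becomes the leading text of the div
    div.text, body.text = body.text, None
    div[:] = list(body)   # hand the children over to the div
    body[:] = [div]       # the div is now the only child; attributes untouched
    return div


source = """<html><head><title>Test</title></head><body lang="en">
original contents... 2&amp;3 <a href="hello/world">some text</a>
<p>Another paragraph</p>
</body></html>"""
root = ET.XML(source)
div = wrap_body_contents(root)
body = root.find("body")
print(ET.tostring(root, encoding="unicode"))
```

Because body.clear() is never called, the lang attribute on body survives.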

</F>
