Parse and clean ODT docs with lxml? Hints to get started?

Thread starter kaer
Start date Jun 4, 2010

Basically, I have to update a website with a lot of new content, which I
received as documents in the OpenOffice (.odt) format. If I open one of
those documents and save it as HTML, I can cut and paste the result into
the HTML page. That's not bad as a start, but I then need to clean up
that HTML (remove tags, remove or change attributes, ...). My first
idea is to use lxml for that. My questions:
- Is there a better way?
- Is lxml the right tool for that?
- Are there any code examples for doing that?

Have a nice day.


