Parse and clean ODT docs with lxml: hints to get started?

Discussion in 'Python' started by kaer, Jun 4, 2010.

  1. kaer

    Basically, I have to upgrade a website with a lot of new content, and I
    received the documents in OpenOffice (.odt) format. If I open one of
    them and save it as HTML, I can cut and paste the result into the web
    page. That's not bad as a start, but I need to clean up that HTML
    (remove tags, remove or change attributes, ...). My first idea is to
    use lxml for that. My questions:
    - Is there a better way?
    - Is lxml the right tool for this?
    - Are there any code examples for doing this?

    Have a nice day.
    kaer, Jun 4, 2010
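
The cleanup described in the post (remove tags, remove or change attributes) could be sketched with lxml.html roughly as follows. This is only a minimal sketch, not a definitive answer: the function name clean_fragment and the three sets of tag and attribute names are hypothetical and would need adjusting to whatever markup the OpenOffice HTML export actually produces.

```python
import lxml.html

# Hypothetical cleanup policy: adjust these sets to your own documents.
UNWRAP_TAGS = {"font", "span"}     # remove the tag, keep its text and children
DROP_TAGS = {"style", "script"}    # remove the tag and everything inside it
DROP_ATTRS = {"style", "class", "lang", "align"}


def clean_fragment(html_text):
    """Return a cleaned copy of an HTML fragment, wrapped in a <div>."""
    root = lxml.html.fragment_fromstring(html_text, create_parent="div")
    # list() takes a snapshot so elements can be removed while looping.
    for el in list(root.iter()):
        # Skip the wrapper itself, and comments (their tag is not a string).
        if el is root or not isinstance(el.tag, str):
            continue
        if el.tag in DROP_TAGS:
            el.drop_tree()   # drops the element and its content, keeps tail text
        elif el.tag in UNWRAP_TAGS:
            el.drop_tag()    # drops only the tag, content moves to the parent
        else:
            for name in DROP_ATTRS & set(el.attrib):
                del el.attrib[name]
    return lxml.html.tostring(root, encoding="unicode")


if __name__ == "__main__":
    sample = ('<p class="western" style="margin-bottom: 0cm">'
              '<font face="Arial"><b>Hello</b> world</font></p>')
    print(clean_fragment(sample))
```

Keeping the policy in plain sets means the same loop can be reused for every document, and changing an attribute instead of deleting it is a one-line change inside the final branch (for example el.set("class", "body-text")).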