RE: Easy way to remove HTML entities from an HTML document?

Discussion in 'Python' started by Robert Brewer, Jul 25, 2004.

  1. Robert Oschler wrote:
    > Is there a module/function to remove all the HTML entities
    > from an HTML document (e.g. - &nbsp, &amp, &apos, etc.)?


    Grab cleanhtml.py from the bottom of
    http://www.aminus.org/rbre/python/index.html -- you should be able to
    quickly rewrite the Plaintext class so that it only replaces (or
    removes) entities. At least the regex is already written for you.
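
    For anyone who just wants the idea without pulling the file down, here
    is a rough sketch of both options (removing and replacing). It is not
    the code from cleanhtml.py; the names ENTITY_RE, strip_entities and
    replace_entities are made up for illustration, and it assumes Python 3,
    where html.unescape is in the standard library.

```python
import html
import re

# Matches named (&amp;), decimal (&#160;) and hex (&#xA0;) entity references.
# Assumption: a trailing semicolon is required, so a bare "&T" in "AT&T"
# is left alone. Entities written without the semicolon are not matched.
ENTITY_RE = re.compile(r"&(?:#[0-9]+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);")


def strip_entities(text):
    """Remove every entity reference from text."""
    return ENTITY_RE.sub("", text)


def replace_entities(text):
    """Replace entity references with the characters they stand for."""
    return html.unescape(text)


if __name__ == "__main__":
    sample = "Fish&nbsp;&amp;&nbsp;chips &lt;hot&gt;"
    print(strip_entities(sample))
    print(replace_entities(sample))
```

    Note that replacing &lt; and &gt; puts literal angle brackets back into
    the text, so do that only after the markup itself has been stripped.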

    HTH!


    Robert Brewer
    MIS
    Amor Ministries
     
    Robert Brewer, Jul 25, 2004
