Behaviour of htmllib's HTML parser and formatter

Discussion in 'Python' started by Morten W. Petersen, Mar 11, 2005.

  1. Hi,

    I have an HTML page that displays some content, and a part of that
    content is HTML changed into regular text. The encoding of the page
    is UTF-8.

    Here's the code that makes the change (the HTML in self.contents is
    UTF-8 encoded):

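    import cStringIO
    import formatter
    import htmllib

    # Render the HTML in self.contents as plain text into an in-memory buffer.
    file = cStringIO.StringIO()
    parser = htmllib.HTMLParser(formatter.AbstractFormatter(
        formatter.DumbWriter(file=file)))
    parser.feed(self.contents)
    parser.close()
    # Keep only the first `size` bytes of the rendered text.
    data = file.getvalue()[:size]
    return data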

    This renders entities such as &nbsp; as black diamonds with a ? sign
    in them in Firefox, so I guess something is going wrong along the way.
    Any suggestions what it might be?

    Thanks,

    Morten
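
A possible explanation, offered as a sketch rather than a confirmed diagnosis: htmllib resolves named entities through htmlentitydefs.entitydefs, which maps nbsp to the single Latin-1 byte '\xa0'. That byte on its own is not valid UTF-8, so a browser reading the page as UTF-8 would show the replacement character (the black diamond). Slicing the byte string with [:size] can also cut a multi-byte UTF-8 sequence in half, with the same visible result. The example below reproduces both effects and shows one way around them. It is written for Python 3, where htmllib no longer exists, so html.parser stands in for it; TextExtractor and html_to_text are hypothetical names, not part of the original code.

```python
import html
from html.parser import HTMLParser

# Effect 1: an entity decoded to a Latin-1 byte, then displayed as UTF-8.
latin1_bytes = html.unescape("a&nbsp;b").encode("latin-1")   # b'a\xa0b'
shown_entity = latin1_bytes.decode("utf-8", errors="replace")

# Effect 2: truncating UTF-8 bytes in the middle of a multi-byte character.
cut_bytes = "café".encode("utf-8")[:4]                       # b'caf\xc3'
shown_cut = cut_bytes.decode("utf-8", errors="replace")


class TextExtractor(HTMLParser):
    """Hypothetical helper: collects text content, entities already decoded."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def html_to_text(html_bytes, size):
    """Decode first, truncate by characters, and only then encode as UTF-8."""
    parser = TextExtractor()
    parser.feed(html_bytes.decode("utf-8"))
    parser.close()
    text = "".join(parser.parts)
    return text[:size].encode("utf-8")


preview = html_to_text("<p>a&nbsp;b caf\u00e9</p>".encode("utf-8"), 7)
```

The design choice here is to work on decoded text throughout and encode once at the very end, so that both entity replacement and truncation operate on whole characters instead of bytes.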