Behaviour of htmllib's HTML parser and formatter

Morten W. Petersen

Hi,

I have an HTML page that displays some content, and a part of that
content is HTML changed into regular text. The encoding of the page
is UTF-8.

Here's the code that makes the change (the HTML in self.contents is
UTF-8 encoded):

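import cStringIO
import formatter
import htmllib

# Render the HTML to plain text in an in-memory buffer
file = cStringIO.StringIO()
parser = htmllib.HTMLParser(formatter.AbstractFormatter(
    formatter.DumbWriter(file=file)))
parser.feed(self.contents)
parser.close()
# Keep only the first `size` bytes of the rendered text
data = file.getvalue()[:size]
return data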

This renders entities such as &nbsp; as black diamonds with a question
mark in them in Firefox, so I guess something is going wrong along the
way. Any suggestions as to what it might be?
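
A minimal sketch of one possible cause, under an assumption that is not confirmed in this post: the parser writes the entity as the single Latin-1 byte 0xA0 into text that is otherwise UTF-8. That byte on its own is not valid UTF-8, so anything decoding the page as UTF-8 would substitute the replacement character U+FFFD, which matches the black diamonds described above. The sample text and variable names are made up for illustration.

```python
# -*- coding: utf-8 -*-
# Assumption: the entity came out as one Latin-1 byte (0xA0) inside
# otherwise UTF-8 encoded text. The sample text is invented.
utf8_text = u"caf\u00e9".encode("utf-8")    # valid UTF-8 bytes
latin1_nbsp = b"\xa0"                        # lone Latin-1 non-breaking space
mixed = utf8_text + latin1_nbsp

# Decoding as UTF-8 swaps the stray byte for U+FFFD (the "black diamond").
shown = mixed.decode("utf-8", "replace")
has_replacement = u"\ufffd" in shown

# Encoding the same character as UTF-8 yields two bytes that decode cleanly.
fixed = utf8_text + u"\u00a0".encode("utf-8")
clean = fixed.decode("utf-8")
```

If the assumption holds, checking the output of file.getvalue() for a bare '\xa0' byte would confirm it.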

Thanks,

Morten
 
