htmllib.HTMLParser and unicode

Achim Domma

Hi,

Should htmllib.HTMLParser be able to handle unicode input? I get the following
traceback:

    self.feed(self.data)
  File "C:\Python23\lib\sgmllib.py", line 94, in feed
    self.goahead(0)
  File "C:\Python23\lib\sgmllib.py", line 183, in goahead
    self.handle_entityref(name)
  File "C:\Python23\lib\sgmllib.py", line 390, in handle_entityref
    self.handle_data(table[name])
  File "C:\Python23\lib\htmllib.py", line 49, in handle_data
    self.savedata = self.savedata + data
UnicodeDecodeError: 'ascii' codec can't decode byte 0xa0 in position 0: ordinal not in range(128)

The input is an HTML page from the web, encoded as UTF-8. I converted the
string via data.decode('utf8') and passed the result to the feed function.

regards,
Achim
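
For readers hitting the same traceback, here is a minimal sketch of what the failing line appears to do. It rests on two assumptions read off the traceback: savedata is unicode because feed() was given unicode, and the entity table returns the single byte 0xa0 for the nbsp entity. Python 2 then coerces the byte string to unicode with the ASCII codec, which fails. The sketch is written to run on Python 3, where bytes stands in for the Python 2 str type; implicit_ascii_coercion is a hypothetical helper, not part of htmllib or sgmllib.

```python
# Stand-in for table["nbsp"] in the traceback (assumed to be the byte 0xa0).
entity_bytes = b"\xa0"


def implicit_ascii_coercion(raw):
    """Mimic the implicit str -> unicode coercion Python 2 performs on
    `unicode + str`, which decodes the byte string with the ASCII codec."""
    return raw.decode("ascii")


try:
    implicit_ascii_coercion(entity_bytes)
    error_message = None
except UnicodeDecodeError as exc:
    # Same failure class as the one reported in the traceback.
    error_message = str(exc)

# Decoding the entity value explicitly as Latin-1 sidesteps the ASCII step
# and yields the non-breaking space as a unicode character.
nbsp = entity_bytes.decode("latin-1")

print(error_message)
print(repr(nbsp))
```

If those assumptions hold, two possible workarounds are to feed the parser byte strings and decode the collected text afterwards, or to override handle_entityref in a subclass so that entity values are decoded before they reach handle_data.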
 
