Sakcee
Feeding this string to htmllib's HTMLParser:

html = '<html><!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" <head></head> <body bgcolor=#ffffff>\r\n Foo foo , blah blah'

fails with:
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/usr/lib/python2.4/sgmllib.py", line 95, in feed
    self.goahead(0)
  File "/usr/lib/python2.4/sgmllib.py", line 165, in goahead
    k = self.parse_declaration(i)
  File "/usr/lib/python2.4/markupbase.py", line 132, in parse_declaration
    self.error(
  File "/usr/lib/python2.4/htmllib.py", line 40, in error
    raise HTMLParseError(message)
htmllib.HTMLParseError: unexpected '<' char in declaration
The error is caused by the unclosed DOCTYPE declaration.
What is the best way to handle this kind of document? Should I use a
regex to check for it and strip it, or does HTMLParser offer something? Can I
override the default sgmllib behaviour?
I have to work with htmllib because of existing modules.
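A minimal sketch of the regex route, assuming the whole document is available as one string before it is fed to the parser. The helper names close_unclosed_doctype and strip_unclosed_doctype are hypothetical, not part of htmllib or sgmllib. It also assumes the DOCTYPE has no internal subset ("[...]"), which would contain its own '<' characters. Only the re module is used, so the same code should run on Python 2.4 and Python 3.

```python
import re

# Hypothetical pattern: a "<!DOCTYPE" that reaches the next "<" without ever
# meeting a ">", i.e. a declaration that was never closed.
# Group 1 is the declaration text, group 2 any whitespace before the next tag.
# Assumption: no internal subset ("[...]") inside the DOCTYPE.
_UNCLOSED_DOCTYPE = re.compile(r'(<!DOCTYPE[^<>]*?)(\s*)(?=<)', re.IGNORECASE)


def close_unclosed_doctype(html):
    """Insert the missing '>' after an unterminated DOCTYPE (hypothetical helper).

    A properly closed DOCTYPE never matches, because [^<>] cannot step past
    its '>', so well-formed input comes back unchanged.
    """
    return _UNCLOSED_DOCTYPE.sub(r'\1>\2', html, 1)


def strip_unclosed_doctype(html):
    """Alternative: drop the unterminated declaration, keep the whitespace after it."""
    return _UNCLOSED_DOCTYPE.sub(r'\2', html, 1)
```

With the existing htmllib code, the call would become something like parser.feed(close_unclosed_doctype(html)), where parser is whatever htmllib.HTMLParser instance the existing modules already create. The fix-up happens before sgmllib sees the data, so those modules need no changes. If the document is fed in several chunks, it should be repaired as a whole first, since a chunk boundary could split the DOCTYPE and hide it from the regex.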
thanks