J
John Nagle
The syntax that browsers understand as HTML comments is much less
restrictive than what BeautifulSoup understands. I keep running into
sites with formally incorrect HTML comments which are parsed happily
by browsers. Here's yet another example, this one from
"http://www.webdirectory.com". The page starts like this:
<!Hello there! Welcome to The Environment Directory!>
<!Not too much exciting HTML code here but it does the job! >
<!See ya, - JD >
<HTML><HEAD>
<TITLE>Environment Web Directory</TITLE>
Those are, of course, invalid HTML comments. But Firefox, IE, etc. handle them
without problems.
BeautifulSoup can't parse this page usefully at all.
It treats the entire page as a text chunk. It's actually
HTMLParser that parses comments, so this is really an HTMLParser
level problem.
John Nagle
restrictive than what BeautifulSoup understands. I keep running into
sites with formally incorrect HTML comments which are parsed happily
by browsers. Here's yet another example, this one from
"http://www.webdirectory.com". The page starts like this:
<!Hello there! Welcome to The Environment Directory!>
<!Not too much exciting HTML code here but it does the job! >
<!See ya, - JD >
<HTML><HEAD>
<TITLE>Environment Web Directory</TITLE>
Those are, of course, invalid HTML comments. But Firefox, IE, etc. handle them
without problems.
BeautifulSoup can't parse this page usefully at all.
It treats the entire page as a text chunk. It's actually
HTMLParser that parses comments, so this is really an HTMLParser
level problem.
John Nagle