BeautifulSoup bug when ">>>" found in attribute value
[QUOTE="Duncan Booth, post: 2051336"]
I don't think I would quibble with what BeautifulSoup extracted from that mess. The input isn't valid HTML, so any output has to be guessing at what was meant.

A lot of code for parsing HTML would assume that there was a quote missing and the tag was terminated by the first '>'. IE and Firefox seem to assume that the '>' is allowed inside the attribute. BeautifulSoup seems to have given you the best of both worlds: the attribute is parsed to the closing quote, but the tag itself ends at the first '>'.

As for inserting a semicolon after linkurl, I think you'll find it is just being nice and cleaning up an unterminated entity. Browsers (or at least IE) will often accept entities without the terminating semicolon, so that's a common problem in badly formed HTML that BeautifulSoup can fix.
[/QUOTE]
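To make the two tag-termination guesses described above concrete, here is a minimal sketch using only the standard library. It is not BeautifulSoup's code, and the `SAMPLE` markup and both helper names are hypothetical, made up for illustration rather than taken from the original thread. The last lines use `html.unescape` to show a lenient reading of an entity that lacks its semicolon, which is the kind of input the semicolon cleanup is reacting to.

```python
import html
import re

# Hypothetical input: a '>' run inside a quoted attribute value.
SAMPLE = '<a href="page.html>>>next">link</a>'


def tag_end_naive(markup: str) -> str:
    """Guess 1: the tag ends at the first '>', quotes are ignored."""
    return markup[: markup.index(">") + 1]


# Guess 2: '>' is allowed inside single- or double-quoted attribute values.
QUOTE_AWARE_TAG = re.compile(r'''<(?:[^>"']|"[^"]*"|'[^']*')*>''')


def tag_end_quote_aware(markup: str) -> str:
    """Return the leading tag, skipping any '>' that sits inside quotes."""
    match = QUOTE_AWARE_TAG.match(markup)
    if match is None:
        raise ValueError("no complete tag at start of input")
    return match.group(0)


naive_tag = tag_end_naive(SAMPLE)
quoted_tag = tag_end_quote_aware(SAMPLE)
print(naive_tag)   # tag cut off inside the attribute value
print(quoted_tag)  # tag runs to the '>' after the closing quote

# Lenient entity handling: "&lt" without ';' is still decoded.
lenient = html.unescape("1 &lt 2")
print(lenient)
```

The two helpers disagree on exactly the case from the thread title, so whichever a parser picks, some input that a browser renders happily will come out differently.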