Parsing HTML with JavaScript

mtfulmer · May 13, 2005

I am trying to extract some information from a few web pages, and I was
using the HTMLParser module. It worked fine until it got to the
JavaScript, at which point it gave a parse error. Is there a good way to
work around this, or should I just preparse the file to remove the
JavaScript manually? This is my first Python program.

Richard Brodie · May 13, 2005

> I am trying to extract some information from a few web pages, and I was
> using the HTMLParser module. It worked fine until it got to the
> JavaScript, at which point it gave a parse error.

It's fairly common for pages with JavaScript to also be invalid HTML.
HTMLParser isn't an 'ignore all errors silently and guess what it's
meant to be' parser. Unless you have known-good inputs, it's often
best to use an alternative. Some options are discussed in Uche's article
here: http://www.xml.com/pub/a/2004/09/08/pyxml.html
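
For anyone reading this thread on Python 3: the module now lives at html.parser, and recent versions no longer raise a parse error on malformed markup. The following is only a rough sketch under that assumption, not a drop-in fix for the 2005 module. It collects the visible text and ignores whatever sits inside script or style elements. The names TextExtractor and extract_text are made up for illustration.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, ignoring the contents of <script> and <style>."""

    SKIPPED = {"script", "style"}  # elements whose contents we discard

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # greater than zero while inside a skipped element
        self.chunks = []      # non-empty text fragments, in document order

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip_depth:
            self.chunks.append(text)


def extract_text(html):
    """Hypothetical helper: return the list of visible text fragments."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.chunks
```

Calling extract_text(page_source) on a page's source returns the text fragments outside the scripts. Whether this copes with a particular broken page still depends on how badly formed the markup is.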

John J. Lee · May 13, 2005

> I am trying to extract some information from a few web pages, and I was
> using the HTMLParser module. It worked fine until it got to the
> JavaScript, at which point it gave a parse error. Is there a good way to
> work around this, or should I just preparse the file to remove the
> JavaScript manually? This is my first Python program.

sgmllib is very similar to HTMLParser, but doesn't break so easily
(but sgmllib has some problems with XHTML -- swings and roundabouts).

Or, try BeautifulSoup.

John
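
The pre-parsing workaround raised in the question could look roughly like the sketch below, which uses only the standard re module. It is a crude approach: a regular expression cannot handle every way a script block may be written (for example, a literal "</script>" inside a JavaScript string ends the match early), so treat it as a stopgap for pages you have inspected. The names SCRIPT_RE and strip_scripts are hypothetical.

```python
import re

# Matches an opening <script ...> tag, everything after it, and the first
# closing </script> tag. DOTALL lets "." span newlines; IGNORECASE covers
# <SCRIPT> as well as <script>.
SCRIPT_RE = re.compile(
    r"<script\b[^>]*>.*?</script\s*>",
    re.IGNORECASE | re.DOTALL,
)


def strip_scripts(html):
    """Hypothetical helper: return html with every script block removed."""
    return SCRIPT_RE.sub("", html)
```

The cleaned string can then be passed to whichever parser is in use, so the parser never sees the JavaScript at all.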
