Parsing HTML with JavaScript

Discussion in 'Python' started by mtfulmer@tacobell.land, May 13, 2005.

  1. Guest

    I am trying to extract some information from a few web pages, and I was
    using the HTMLParser module. It worked fine until it got to the
    JavaScript, at which point it gave a parse error. Is there a good way to
    work around this, or should I just preparse the file to remove the
    JavaScript manually? This is my first Python program.
     
    , May 13, 2005
    #1

  2. <> wrote in message news:...

    > I am trying to extract some information from a few web pages, and I was
    > using the HTMLParser module. It worked fine until it got to the
    > javascript, at which it gave a parse error.


    It's fairly common for pages with JavaScript to also be invalid HTML.
    HTMLParser isn't an 'ignore all errors silently and guess what it's
    meant to be' parser. Unless you know your inputs are well-formed, it's
    often best to use an alternative. Some options are discussed in Uche's
    article here: http://www.xml.com/pub/a/2004/09/08/pyxml.html
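
For the "preparse the file" route raised in the question, a minimal sketch is shown here. It assumes the page is already in memory as a string; the names `SCRIPT_BLOCK` and `strip_scripts` are made up for illustration and come from no library. A regular expression is only a rough filter, not a real HTML parser, so treat this as a stopgap rather than a robust fix.

```python
import re

# Matches one <script ...> ... </script> element, case-insensitively,
# including script bodies that span several lines.
SCRIPT_BLOCK = re.compile(
    r"<script\b[^>]*>.*?</script\s*>",
    re.IGNORECASE | re.DOTALL,
)


def strip_scripts(html):
    """Return html with every <script> element removed (hypothetical helper)."""
    return SCRIPT_BLOCK.sub("", html)


page = (
    '<p>Price: 42</p>'
    '<SCRIPT type="text/javascript">if (a < b) { document.write("<b>"); }</SCRIPT>'
    '<p>Done</p>'
)
cleaned = strip_scripts(page)
```

The cleaned string can then be fed to the parser as before. The script body is dropped before the parser ever sees it, so markup-like text inside the JavaScript cannot trigger a parse error.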
     
    Richard Brodie, May 13, 2005
    #2

  3. John J. Lee Guest

    writes:

    > I am trying to extract some information from a few web pages, and I was
    > using the HTMLParser module. It worked fine until it got to the
    > javascript, at which it gave a parse error. Is there a good way to work
    > around this or should I just preparse the file to remove the javascript
    > manually? This is my first python program.


    sgmllib is very similar to HTMLParser but doesn't break so easily
    (although sgmllib has some problems with XHTML -- swings and roundabouts).

    Or, try BeautifulSoup.
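
For readers on Python 3, where sgmllib is no longer part of the standard library, the same idea can be sketched with the standard-library `html.parser` module instead. This is a swapped-in technique, not the sgmllib or BeautifulSoup approach named above. To the best of my knowledge, current Python 3 releases hand the contents of `<script>` to `handle_data` as raw text and do not raise on malformed markup. The class name `TextExtractor` and the function `extract_text` are hypothetical.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text chunks, ignoring anything inside <script> or <style>."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []        # text found outside skipped elements
        self._skip_depth = 0    # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip_depth:
            self.chunks.append(text)


def extract_text(html):
    """Return the non-empty text chunks of html (hypothetical helper)."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any text still buffered at end of input
    return parser.chunks


result = extract_text(
    '<p>Price: 42</p><script>if (a < b) { x("<b>"); }</script><p>Done'
)
```

Here the parser itself does the skipping, so no preparse step is needed, and the unclosed final `<p>` is tolerated rather than reported as an error.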


    John
     
    John J. Lee, May 13, 2005
    #3
