Screen-scraping, in Python, a web page that requires JavaScript?

  • Thread starter Dan Stromberg - Datallegro

Dan Stromberg - Datallegro

Is there a way, in Python, to screen-scrape a web page that uses
JavaScript?

I know about BeautifulSoup, but AFAIK, at this time BeautifulSoup is for
HTML that doesn't have embedded JavaScript.

Thanks!
 

John J. Lee

Dan Stromberg - Datallegro said:
Is there a way, in Python, to screen-scrape a web page that uses
JavaScript?

Not in pure CPython, no.

I know about BeautifulSoup, but AFAIK, at this time BeautifulSoup is for
HTML that doesn't have embedded JavaScript.

It's not that BeautifulSoup is unhappy with JS; it just has no support
for executing it. It parses the HTML as served, so anything the JS would
have added to the page at run time is simply not there.
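To make that concrete, here is a minimal sketch. It uses the standard library's html.parser as a stand-in for BeautifulSoup, purely so it runs without third-party packages. The sample page and the class name are made up for illustration. The parser copes with the script element without complaint, but the paragraph that the script would insert never appears, because nothing runs the script.

```python
from html.parser import HTMLParser

# Hypothetical page: the <div> is empty as served; a browser would fill it
# in by running the script.
PAGE = """<html><body>
<div id="content"></div>
<script>
document.getElementById("content").innerHTML = "<p>Hello from JS</p>";
</script>
</body></html>"""


class TagAndScriptCollector(HTMLParser):
    """Record every start tag seen, plus the raw text inside <script>."""

    def __init__(self):
        super().__init__()
        self.tags = []
        self.script_text = []
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)
        if tag == "script":
            self._in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        # Script content arrives as plain text; it is never executed.
        if self._in_script:
            self.script_text.append(data)


collector = TagAndScriptCollector()
collector.feed(PAGE)
collector.close()

print(collector.tags)                  # tags present in the served markup
print("".join(collector.script_text))  # the JS source, as inert text
```

The list of tags contains no "p" entry: the paragraph exists only as a string inside the script source, which is all a parse-only tool ever sees.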

There are some Java libraries that know how to execute JS embedded in
web pages, which could be used from Jython:

http://www.thefrontside.net/crosscheck

http://htmlunit.sourceforge.net/

http://httpunit.sourceforge.net/


You can also automate a browser, but that still seems to be painful in
one way or another.
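As one rough illustration of the browser route, the sketch below shells out to a Chromium-family browser in headless mode and asks it to print the DOM after scripts have run. It is only a sketch under assumptions: the executable name "chromium" and both function names are placeholders, and it relies on such a browser (one that supports the --headless and --dump-dom switches) being installed and on the PATH.

```python
import subprocess


def build_dump_dom_command(url, browser="chromium"):
    """Build the argument list for a headless DOM dump.

    `browser` is an assumed executable name; adjust it for your system.
    """
    return [browser, "--headless", "--disable-gpu", "--dump-dom", url]


def fetch_rendered_html(url, browser="chromium", timeout=30):
    """Run the browser and return the HTML it prints after executing JS."""
    result = subprocess.run(
        build_dump_dom_command(url, browser),
        capture_output=True,  # collect stdout/stderr instead of inheriting
        text=True,            # decode output to str
        timeout=timeout,      # give up if the browser hangs
        check=True,           # raise CalledProcessError on non-zero exit
    )
    return result.stdout
```

The string returned by fetch_rendered_html(url) can then be handed to an HTML parser such as BeautifulSoup in the usual way. Expect some of the pain mentioned above: pages that load data after a delay may need extra waiting that this sketch does not attempt.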


John
 
