Crawlers in Python with graphing?

bryan rasmussen

Hi

I'm wondering if there is a toolkit anywhere in Python for high-level
web crawling: one that dumps links to a dataset that could be imported
into R relatively easily, or used natively in Python to generate a
graph of the website.
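For the "generate a graph natively in Python" part, one low-dependency route is to write the link pairs out in Graphviz's DOT format. The sketch below assumes a crawl has already produced a list of (source, target) URL tuples; `to_dot` is a hypothetical helper name, and only the Python standard library is used.

```python
def to_dot(edges, name="site"):
    """Render (source, target) link pairs as a Graphviz DOT digraph string."""

    def quote(s):
        # DOT quoted strings only need embedded double quotes escaped.
        return '"' + s.replace('"', '\\"') + '"'

    lines = [f"digraph {name} {{"]
    # set() drops duplicate links; sorted() makes the output deterministic.
    for src, dst in sorted(set(edges)):
        lines.append(f"  {quote(src)} -> {quote(dst)};")
    lines.append("}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    edges = [
        ("http://example.com/", "http://example.com/about"),
        ("http://example.com/about", "http://example.com/"),
        ("http://example.com/", "http://example.com/about"),  # duplicate link
    ]
    print(to_dot(edges), end="")
```

Saving the output as, say, `site.dot` and running `dot -Tsvg site.dot -o site.svg` renders it with Graphviz; for large sites the `sfdp` layout engine may cope better than `dot`.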


It should ideally be as high-level as Wget: not download the pages,
but just follow the links and output graphs.
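A Wget-like link follower can be sketched with the standard library alone: each page is read into memory, its `<a href>` links are recorded as (source, target) pairs, and only links on the starting host are followed. This is a minimal sketch, not a finished tool: `crawl`, `fetch_html` and `LinkParser` are hypothetical names, the same-host rule and the `max_pages` cap are assumptions, and it ignores robots.txt and politeness delays.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collect absolute, fragment-free URLs from <a href> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                absolute = urljoin(self.base_url, value)
                self.links.append(urldefrag(absolute)[0])


def fetch_html(url):
    """Download one page into memory only; nothing is written to disk."""
    with urlopen(url, timeout=10) as resp:
        # Skip images, PDFs etc.: only HTML pages can contain links to follow.
        if resp.headers.get_content_type() != "text/html":
            return ""
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def crawl(start_url, fetch=fetch_html, max_pages=100):
    """Breadth-first crawl of one host, returning (source, target) link pairs."""
    host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([start_url])
    edges = []
    fetched = 0
    while queue and fetched < max_pages:
        url = queue.popleft()
        try:
            html = fetch(url)
        except OSError:  # urllib's URLError/HTTPError are OSError subclasses
            continue
        fetched += 1
        parser = LinkParser(url)
        parser.feed(html)
        parser.close()
        for link in parser.links:
            edges.append((url, link))
            # Record every link, but only follow those on the starting host.
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return edges
```

Calling `crawl("http://example.com/", max_pages=50)` returns the edge list. Passing a different `fetch` callable (for example one backed by a dict of saved HTML) lets the link logic be tested without touching the network.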



Cheers
Bryan Rasmussen
 
Marc 'BlackJack' Rintsch

bryan rasmussen said:
It should ideally be as high-level as Wget: not download the pages,
but just follow the links and output graphs.

How do you get at the links without downloading the page!?
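The page does have to be fetched, but it only needs to live in memory long enough to be parsed. The sketch below shows the parsing half with the standard library's `html.parser`; `extract_links` and `LinkExtractor` are hypothetical names, and resolving relative hrefs against the page URL with `urljoin` is an assumption about what "the links" should mean.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect absolute URLs from <a href="..."> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # html.parser passes tag and attribute names already lower-cased.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Relative hrefs such as "/about" are resolved against the page URL.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return the links in an HTML string; nothing is written to disk."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


if __name__ == "__main__":
    page = '<p><a href="/about">About</a> <a href="http://example.org/x">X</a></p>'
    print(extract_links(page, "http://example.com/index.html"))
    # prints ['http://example.com/about', 'http://example.org/x']
```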

Ciao,
Marc 'BlackJack' Rintsch
 
bryan rasmussen

Hi,

Sorry, I was imprecise: I meant not saving the downloaded pages locally.
There probably isn't one, though, so I should build one myself.
I probably just need a good crawler that can be set to dump all links
into a dataset that I can analyse with R.
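Dumping the links as a two-column CSV edge list is probably the simplest hand-off to R. A minimal sketch, assuming the crawl yields (source, target) tuples; `write_edge_list` and the `source`/`target` column names are hypothetical choices.

```python
import csv
import io


def write_edge_list(edges, out):
    """Write (source, target) pairs as a two-column CSV with a header row."""
    writer = csv.writer(out)
    writer.writerow(["source", "target"])
    writer.writerows(edges)


if __name__ == "__main__":
    edges = [
        ("http://example.com/", "http://example.com/about"),
        ("http://example.com/about", "http://example.com/"),
    ]
    # An in-memory buffer keeps the demo side-effect free; for a real file use
    # open("links.csv", "w", newline="", encoding="utf-8") instead.
    buf = io.StringIO()
    write_edge_list(edges, buf)
    print(buf.getvalue(), end="")
```

On the R side, `read.csv("links.csv")` loads the file as a data frame, and the igraph package's `graph_from_data_frame()` can turn those two columns into a directed graph.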

Cheers,
Bryan Rasmussen
 
George Sakkis

Harvestman (http://harvestman.freezope.org/) is your best bet.

George
 
gene tani

There are quite a few already (webchecker, Orchid, mechanize, mygale):

http://codesnipers.com/?q=node/223&&title=Detecting-Dead-Links

http://pxr.openlook.org/pxr/source/Tools/webchecker/

http://sig.levillage.org/?p=599
http://www.robertblum.com/articles/2005/11/21/challenge-map-i-python-web-scraping
http://www.rexx.com/~dkuhlman/quixote_htmlscraping.html
 
