Luis G.
Hi there...
I've been playing with Ruby and Nokogiri to crawl some websites for
text information, but after a while I realized that some of those
websites block my access while the script is running. Once
they block the access, the script keeps running (because I handle the
exception), but it isn't getting what it is supposed to.
After the block, if I try to access using the browser, I just can't, so
I guess they block the IP address, right?
I also tried using TOR, like this:
Nokogiri::HTML(open(url, :proxy => 'http://(ip_address):(port)'))
But I still have the same problem: it works in the beginning, but after a
while it stops working.
I could just run the crawler in steps, so it doesn't make lots of calls to
the website at the same time, but that's kinda boring...
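Roughly what I mean by running in steps is something like the sketch below. It assumes a fixed pause between requests is enough, and I don't know what rate these sites actually tolerate. The names crawl_in_steps, delay and fetcher are made up for illustration; they are not part of open-uri or Nokogiri.

```ruby
require 'open-uri'

# Fetch each URL in turn, pausing between requests so the site does not
# see a burst of calls. `delay` (seconds) and `fetcher` are hypothetical
# names for this sketch; the default fetcher uses open-uri's URI.open.
def crawl_in_steps(urls, delay: 2.0, fetcher: ->(url) { URI.open(url).read })
  results = {}
  urls.each_with_index do |url, i|
    sleep(delay) if i > 0 # pause before every request except the first
    begin
      results[url] = fetcher.call(url)
    rescue StandardError => e
      # Keep the exception instead of swallowing it, so a block is visible
      # in the results rather than silently producing nothing.
      results[url] = e
    end
  end
  results
end
```

Each returned string could then be handed to Nokogiri::HTML, and any entry that is an exception shows where the site started refusing requests.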
Have any of you faced the same problem? Do any of you have a solution for this?
thanks,
Luis