Network failure when using urllib2

jdvolz · Jan 8, 2007

I have a script that uses urllib2 to repeatedly look up web pages (in a
spider sort of way). It appears to function normally, but if it runs
too long I start to get 404 responses. If I try to use the internet
through any other program (Outlook, Firefox, etc.), that fails too.
If I stop the script, the internet connection returns.

Has anyone observed this behavior before? I am relatively new to
Python and would appreciate any suggestions.

Shuad

Ravi Teja · Jan 8, 2007

I am assuming that you are fetching the full page every little while.
You are not supposed to do that. The admin of the web site you are
constantly hitting has probably configured the server to block you
temporarily when that happens. But don't feel bad. This is a common
beginner's mistake.

Read about the proper way to do this here:
http://diveintopython.org/http_web_services/review.html
See especially section 11.3.3, Last-Modified/If-Modified-Since, on the next page.
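
As a rough illustration of the Last-Modified/If-Modified-Since idea, a
conditional GET might look like the sketch below. It assumes Python 3,
where urllib2's functionality lives in urllib.request and urllib.error.
The function names build_conditional_request and fetch_if_changed are
made up for this example, and sending an ETag via If-None-Match is an
extra that the server may or may not support.

```python
import urllib.error
import urllib.request


def build_conditional_request(url, last_modified=None, etag=None):
    """Build a GET request that asks the server to skip unchanged pages."""
    request = urllib.request.Request(url)
    if last_modified:
        # Value of the Last-Modified header from the previous fetch.
        request.add_header("If-Modified-Since", last_modified)
    if etag:
        # Value of the ETag header from the previous fetch, if there was one.
        request.add_header("If-None-Match", etag)
    return request


def fetch_if_changed(url, last_modified=None, etag=None, timeout=10):
    """Return (body, last_modified, etag); body is None if nothing changed."""
    request = build_conditional_request(url, last_modified, etag)
    try:
        response = urllib.request.urlopen(request, timeout=timeout)
    except urllib.error.HTTPError as err:
        if err.code == 304:
            # 304 Not Modified: keep using the copy we already have.
            return None, last_modified, etag
        raise
    try:
        body = response.read()
        return (body,
                response.headers.get("Last-Modified"),
                response.headers.get("ETag"))
    finally:
        # Always release the connection, even if read() fails.
        response.close()
```

The caller would store the returned Last-Modified and ETag values next to
each cached page and pass them back in on the next fetch of the same URL.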

Ravi Teja.

jdvolz · Jan 8, 2007

I am fetching different web pages (never the same one) from a web
server. Does that make a difference to whether they would try to block me?
Also, if only that site were blocking me, why does the internet stop
working in other programs when this happens in the script? It is
almost as if something sees a lot of traffic from my computer and
cuts it off, thinking it is some kind of virus or worm. I am
starting to suspect my firewall. Has anyone else had this happen?

I am going to read over that documentation you suggested to see if I
can get any ideas. Thanks for the link.

Shuad

Gabriel Genellina · Jan 8, 2007

Perhaps you're not closing connections once finished?
Try netstat -an from the command line and see how many open
connections you have.
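
If unclosed connections do turn out to be the cause, one way to make sure
every response is closed is sketched below. It assumes Python 3
(urllib.request rather than urllib2); fetch_all and its delay_seconds,
timeout and opener parameters are hypothetical names for this example,
and the pause between requests is just a courtesy to the server, not
something the original script is known to do.

```python
import contextlib
import time
import urllib.request


def fetch_all(urls, delay_seconds=1.0, timeout=10,
              opener=urllib.request.urlopen):
    """Fetch each URL in turn, closing every response before moving on."""
    pages = {}
    for url in urls:
        # closing() calls response.close() when the block exits,
        # even if read() raises an exception.
        with contextlib.closing(opener(url, timeout=timeout)) as response:
            pages[url] = response.read()
        # Pause between requests so the loop does not hammer the server.
        time.sleep(delay_seconds)
    return pages
```

While a loop like this runs, the number of connections that netstat -an
reports as ESTABLISHED or CLOSE_WAIT should stay roughly flat rather
than climbing with every page fetched.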

--
Gabriel Genellina
Softlab SRL

Ravi Teja · Jan 8, 2007

No! What I suggested should not affect traffic from other servers. I
would go with Gabriel's suggestion and check for open connections just
in case, although I can't imagine why that would give you a 404
response: a 404 comes from the server, which implies a successful
connection. If connections were the problem, I would expect an error
on the client side instead.
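
To tell the two cases apart in a script, something like the following
sketch could be used. It assumes Python 3, where urllib2's HTTPError and
URLError live in urllib.error; classify_failure and probe are
hypothetical helper names for this example.

```python
import urllib.error
import urllib.request


def classify_failure(exc):
    """Say whether a server answered or the connection itself failed."""
    # HTTPError is a subclass of URLError, so it must be checked first.
    if isinstance(exc, urllib.error.HTTPError):
        # The server was reached and sent back a status such as 404.
        return "server responded with HTTP %d" % exc.code
    if isinstance(exc, urllib.error.URLError):
        # No HTTP response at all: DNS failure, refused connection, etc.
        return "no HTTP response: %s" % exc.reason
    return "other failure: %r" % (exc,)


def probe(url, timeout=10):
    """Try one request and describe the outcome as a short string."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return "ok: HTTP %d" % response.status
    except OSError as exc:
        # URLError (and so HTTPError) derives from OSError in Python 3,
        # as do socket-level errors such as timeouts.
        return classify_failure(exc)
```

If the script logs the "server responded" case, something between the
script and the site is answering with a 404 page; if it logs the
"no HTTP response" case, the connection itself is failing.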

Of course, you can always test your suspicion about local conditions
(turn off security software briefly, or try from a different machine).
That will not help, though, if your ISP is applying safeguards against
DoS attacks from its own network to ordinary users.

Ravi Teja.
