How to test a URL request in a "while True" loop


Brian D

I'm actually using mechanize, but that's too complicated for testing
purposes. Instead, I've simulated in a urllib2 sample below an attempt
to test for a valid URL request.

I'm attempting to craft a loop that will trap failed attempts to
request a URL (in cases where the connection intermittently fails),
and repeat the URL request a few times, stopping after the Nth attempt
is tried.

Specifically, in the example below, a bad URL is requested for the
first and second iterations. On the third iteration, a valid URL will
be requested. The valid URL will be requested until the 5th iteration,
when a break statement is reached to stop the loop. The 5th iteration
also restores the values to their original state so the test is easy to repeat.

What I don't understand is how to test for a valid URL request, and
then jump out of the "while True" loop to proceed to another line of
code below the loop. There's probably faulty logic in this approach. I
imagine I should wrap the URL request in a function, and perhaps store
the response as a global variable.

This is really more of a basic Python logic question than it is a
urllib2 question.

Any suggestions?


import urllib2

user_agent = 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv: ' \
             'Gecko/2009120208 Firefox/3.0.16 (.NET CLR 3.5.30729)'
headers = {'User-Agent': user_agent}
url = 'http://this is a bad url'
count = 0
while True:
    count += 1
    try:
        print 'attempt ' + str(count)
        request = urllib2.Request(url, None, headers)
        response = urllib2.urlopen(request)
        if response:
            print 'True response.'
        if count == 5:
            # Restore the starting values so the test can be repeated
            count = 0
            url = 'http://this is a bad url'
            print 'How do I get out of this thing?'
            break
    except:
        print 'fail ' + str(count)
    if count == 3:
        url = ''  # substitute a valid URL here


What I don't understand is how to test for a valid URL request, and
then jump out of the "while True" loop to proceed to another line of
code below the loop. There's probably faulty logic in this approach. I
imagine I should wrap the URL request in a function, and perhaps store
the response as a global variable.

This is really more of a basic Python logic question than it is a
urllib2 question.

There, I've condensed your question to what you really meant to say.
You have several approaches. First, let's define some useful objects:

>>> max_attempts = 5
>>> def do_something(i):
...     assert 2 < i < 5
...

Getting back to the original question, if you want to limit the number of
attempts, don't use a while, use this:

>>> for count in xrange(max_attempts):
...     print 'attempt', count+1
...     do_something(count+1)
...
attempt 1
Traceback (most recent call last):
  File "<pyshell#55>", line 3, in <module>
    do_something(count+1)
  File "<pyshell#47>", line 2, in do_something
    assert 2 < i < 5
AssertionError

If you want to keep exceptions from ending the loop prematurely, you
add this:

>>> for count in xrange(max_attempts):
...     print 'attempt', count+1
...     try:
...         do_something(count+1)
...     except StandardError:
...         pass
...

Note that bare except clauses are *evil* and should be avoided. Most
exceptions derive from StandardError, so trap that if you want to
catch errors. Finally, to stop iterating when the errors cease, do
this:

>>> try:
...     for count in xrange(max_attempts):
...         print 'attempt', count+1
...         try:
...             do_something(count+1)
...             raise StopIteration
...         except StandardError:
...             pass
... except StopIteration:
...     pass
...
attempt 1
attempt 2
attempt 3

Note that StopIteration doesn't derive from StandardError, because
it's not an error, it's a notification. So, throw it if and when you
want to stop iterating.

BTW, note that you don't have to wrap your code in a function.
do_something could be replaced with its body and everything would
still work.
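
For anyone reading this on Python 3, here is a minimal sketch of the same idea. It is a port made under assumptions, not part of the advice above. StandardError no longer exists in Python 3, and StopIteration derives from Exception there, so an inner "except Exception" would swallow the StopIteration signal. The try/except/else form below avoids that problem. The name first_success is hypothetical, and do_something is the same kind of stand-in used in this reply.

```python
def do_something(i):
    # Stand-in for the real work: succeeds only when i is 3 or 4.
    assert 2 < i < 5


def first_success(max_attempts):
    """Return the 1-based number of the first successful attempt, or None."""
    for count in range(max_attempts):
        try:
            do_something(count + 1)
        except Exception:
            continue  # this attempt failed; move on to the next one
        else:
            return count + 1  # runs only when no exception was raised
    return None


print(issubclass(StopIteration, Exception))  # True on Python 3
print(first_success(5))
```

Returning from the else branch plays the role that raising StopIteration played in the Python 2 session.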

Brian D

I'm totally impressed. I love elegant code. Could you tell I was
trained as a VB programmer? I think I can still be reformed.

I appreciate the admonition not to use bare except clauses. I will
avoid that in the future.

I've never seen StopIteration used -- and certainly not used in
combination with a try/except pair. That was exceptionally valuable.

I think I can take it from here, so I'll just say thank you, Sam, for
steering me straight -- very nice.

Philip Semanchuk

Hi Brian,
While I don't fully understand what you're trying to accomplish by
changing the URL after 3 iterations, I suspect that some
of your trouble comes from using "while True". Your code would be
clearer if the while clause actually stated the exit condition. Here's
a suggestion (untested):

MAX_ATTEMPTS = 5

count = 0
while count < MAX_ATTEMPTS:
    count += 1
    try:
        print 'attempt ' + str(count)
        request = urllib2.Request(url, None, headers)
        response = urllib2.urlopen(request)
        if response:
            print 'True response.'
    except urllib2.URLError:
        print 'fail ' + str(count)

You could also save the results (untested):

MAX_ATTEMPTS = 5

count = 0
results = [ ]
while count < MAX_ATTEMPTS:
    count += 1
    try:
        print 'attempt ' + str(count)
        request = urllib2.Request(url, None, headers)
        f = urllib2.urlopen(request)
        # Note that here I ignore the doc that says "None may be
        # returned if no handler handles the request". Caveat emptor.
        results.append(f.info())
        f.close()
    except urllib2.URLError:
        # Even better, append actual reasons for the failure.
        results.append(False)

for result in results:
    print result

I guess if you're going to do the same number of attempts each time, a
for loop would be more expressive, but you probably get the idea.

Hope this helps
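
The for-loop variant mentioned in this reply could look roughly like the following in Python 3. This is only a sketch under assumptions: collect_results and make_fake_fetch are hypothetical names, and the fake fetch stands in for a real urllib.request.urlopen call so the example runs without a network. In Python 3, URLError is a subclass of OSError, so catching OSError covers it.

```python
def collect_results(fetch, max_attempts):
    """Call fetch() max_attempts times and record every outcome."""
    results = []
    for attempt in range(1, max_attempts + 1):
        try:
            results.append((attempt, 'ok', fetch()))
        except OSError as exc:
            # Keep the actual reason for the failure, not just a flag.
            results.append((attempt, 'fail', str(exc)))
    return results


def make_fake_fetch(failures):
    """Hypothetical fetch that fails `failures` times, then succeeds."""
    calls = {'n': 0}

    def fake_fetch():
        calls['n'] += 1
        if calls['n'] <= failures:
            raise OSError('simulated connection failure')
        return 'page %d' % calls['n']

    return fake_fetch


for result in collect_results(make_fake_fetch(2), 4):
    print(result)
```

Each entry records the attempt number, whether it worked, and either the response or the failure reason.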


Brian said:
I'm totally impressed. I love elegant code. Could you tell I was
trained as a VB programmer? I think I can still be reformed.

I appreciate the admonition not to use bare except clauses. I will
avoid that in the future.

I've never seen StopIteration used -- and certainly not used in
combination with a try/except pair. That was exceptionally valuable.

I think I can take it from here, so I'll just say thank you, Sam, for
steering me straight -- very nice.

Instead of raising StopIteration you could use 'break':

for count in xrange(max_attempts):
    print 'attempt', count + 1
    try:
        do_something(count + 1)
        break
    except StandardError:
        pass

The advantage, apart from the length, is that you can then add the
'else' clause to the 'for' loop, which will be run if it _didn't_ break
out of the loop. If you break out only after do_something() is
successful, then not breaking out means that do_something() never
succeeded:

for count in xrange(max_attempts):
    print 'attempt', count + 1
    try:
        do_something(count + 1)
        break
    except StandardError:
        pass
else:
    print 'all attempts failed'
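
The break/else pattern above can be wrapped in a reusable helper. What follows is a minimal Python 3 sketch, not the code from this thread: call_with_retries and make_flaky are hypothetical names, and the flaky callable simulates an intermittent connection instead of making a real request.

```python
def call_with_retries(func, max_attempts):
    """Call func() until it succeeds; return (result, attempts_used)."""
    last_error = None
    for count in range(max_attempts):
        try:
            result = func()
            break  # success: the else clause below is skipped
        except Exception as exc:
            last_error = exc  # remember why this attempt failed
    else:
        # Reached only when the loop ended without a break.
        raise RuntimeError('all attempts failed') from last_error
    return result, count + 1


def make_flaky(failures):
    """Hypothetical callable that fails `failures` times, then succeeds."""
    state = {'calls': 0}

    def flaky():
        state['calls'] += 1
        if state['calls'] <= failures:
            raise IOError('simulated connection failure')
        return 'response body'

    return flaky


print(call_with_retries(make_flaky(2), 5))
```

Raising in the else clause keeps the failure path separate from the success path, so the caller never has to test a sentinel value.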

Brian D

Nice to have options, Philip. Thanks! I'll give your solution a try in
mechanize as well. I really can't thank you enough for contributing to
helping me solve this issue. I love Python.

Brian D

Thanks MRAB as well. I've printed all of the replies to retain with my
pile of essential documentation.

To follow up with a complete response, I'm ripping out of my mechanize
module the essential components of the solution I got to work.

The main body of the code passes a URL to the scrape_records function.
The function attempts to open the URL five times.

If the URL is opened, a values dictionary is populated and returned to
the calling statement. If the URL cannot be opened, a fatal error is
printed and the module terminates. There's a little sleep call in the
function to leave time for any errant connection problem to resolve.

Thanks to all for your replies. I hope this helps someone else:

import urllib2, time
from mechanize import Browser

def scrape_records(url):
    maxattempts = 5
    br = Browser()
    user_agent = 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv: Gecko/2009120208 Firefox/3.0.16 (.NET CLR 3.5.30729)'
    br.addheaders = [('User-agent', user_agent)]
    for count in xrange(maxattempts):
        try:
            print url, count
            br.open(url)
            break
        except urllib2.URLError:
            print 'URL error', count
            # Pretend a failed connection was fixed
            if count == 2:
                url = ''
            # Brief pause before the next attempt
            time.sleep(1)
            pass
    else:
        print 'Fatal URL error. Process terminated.'
        return None
    # Scrape page and populate valuesDict
    valuesDict = {}
    return valuesDict

url = 'http://badurl'
valuesDict = scrape_records(url)
if valuesDict == None:
    print 'Failed to retrieve valuesDict'


Brian said:
import urllib2, time
from mechanize import Browser

def scrape_records(url):
    maxattempts = 5
    br = Browser()
    user_agent = 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv: Gecko/2009120208 Firefox/3.0.16 (.NET CLR 3.5.30729)'
    br.addheaders = [('User-agent', user_agent)]
    for count in xrange(maxattempts):
        try:
            print url, count
            br.open(url)
            break
        except urllib2.URLError:
            print 'URL error', count
            # Pretend a failed connection was fixed
            if count == 2:
                url = ''
            time.sleep(1)
            pass

'pass' isn't necessary.

    else:
        print 'Fatal URL error. Process terminated.'
        return None
    # Scrape page and populate valuesDict
    valuesDict = {}
    return valuesDict

url = 'http://badurl'
valuesDict = scrape_records(url)
if valuesDict == None:

When checking whether or not something is a singleton, such as None, use
"is" or "is not" instead of "==" or "!=".

    print 'Failed to retrieve valuesDict'

Brian D

I'm definitely acquiring some well-deserved schooling -- and it's
really appreciated. I'd seen the "is/is not" preference before, but it
just didn't stick.

I see now that "pass" is redundant -- thanks for catching that.


Steve Holden

Brian D wrote:
I'm definitely acquiring some well-deserved schooling -- and it's
really appreciated. I'd seen the "is/is not" preference before, but it
just didn't stick.

Yes, a lot of people have acquired the majority of their Python
education from this list - I have certainly learned a thing or two from
it over the years, and had some very interesting discussions.

is/is not are about object identity. Saying

a is b

is pretty much the same thing as saying

id(a) == id(b)

so it's a test that two expressions are references to the exact same
object. So it works with None, since there is only ever one value of
<type 'NoneType'>.

Be careful not to use it when there can be several different but equal
values, though.
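
To make the warning concrete, here is a small Python 3 sketch (the variable names are purely illustrative). Two lists built separately compare equal but are different objects, while None is always the same object.

```python
a = [1, 2, 3]
b = [1, 2, 3]   # equal contents, but a separate object
c = a           # a second name for the very same object

print(a == b)   # equality of values
print(a is b)   # identity: are these the same object?
print(a is c)

result = None
print(result is None)  # the recommended way to test for None
```

The first comparison is true and the second is false, which is why "is" suits singletons like None but not ordinary values.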


While I don't fully understand what you're trying to accomplish by
changing the URL after 3 iterations, I suspect that some
of your trouble comes from using "while True". Your code would be
clearer if the while clause actually stated the exit condition. Here's
a suggestion (untested):

MAX_ATTEMPTS = 5

count = 0
while count < MAX_ATTEMPTS:
    count += 1
    try:
        print 'attempt ' + str(count)
        request = urllib2.Request(url, None, headers)
        response = urllib2.urlopen(request)
        if response:
            print 'True response.'
    except urllib2.URLError:
        print 'fail ' + str(count)

Note that you may have good reason for doing it differently:

def retry(url):
    count = 0
    while True:
        count += 1
        try:
            print 'attempt', count
            request = urllib2.Request(url, None, headers)
            response = urllib2.urlopen(request)
            if response:
                print 'True response'
        except urllib2.URLError:
            if count < MAX_ATTEMPTS:
                continue
            else:
                raise

This structure is required in order for the raise to do a proper
re-raise.

BTW, your code is rather oddly indented; please stick with PEP8.


Note that you may have good reason for doing it differently:

def retry(url):
    count = 0
    while True:
        count += 1
        try:
            print 'attempt', count
            request = urllib2.Request(url, None, headers)
            response = urllib2.urlopen(request)
            if response:
                print 'True response'

Oops, that print should have been a return.
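
With that correction applied, the retry function might be sketched in Python 3 as follows. This is an illustration under assumptions, not the poster's code: fetch is a hypothetical callable standing in for the urlopen call, delay is an invented parameter, and OSError is caught because URLError derives from it in Python 3.

```python
import time


def retry(fetch, max_attempts, delay=0.0):
    """Return fetch()'s result, retrying on OSError up to max_attempts times."""
    count = 0
    while True:
        count += 1
        try:
            return fetch()  # success leaves the loop immediately
        except OSError:
            if count < max_attempts:
                time.sleep(delay)  # brief pause before the next attempt
            else:
                raise  # bare raise re-raises the OSError being handled


attempts = []


def flaky():
    """Hypothetical fetch: fails twice, then succeeds."""
    attempts.append(1)
    if len(attempts) < 3:
        raise OSError('simulated connection failure')
    return 'response'


print(retry(flaky, max_attempts=5))
```

Because the bare raise sits inside the except block, the caller sees the original exception and traceback from the final failed attempt.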
