Progress Bar with urllib2

Andrew Godwin · Apr 26, 2005

I'm trying to write a Python script to download data (well, files) from an HTTP server (well, a PHP script spitting them out, at least).
The file data is just the returned data from the request (the server script echoes the file and then dies).

I call the page using urllib2, like so:

import urllib
import urllib2

satelliteRequest = urllib2.Request(satelliteServer + "?command=download&filepath=" + filepath)
satelliteRequestData = {"username": satelliteUsername, "password": satellitePassword}
satelliteRequest.add_data(urllib.urlencode(satelliteRequestData))
satelliteOpener = urllib2.build_opener()
satelliteOpener.addheaders = [('User-agent', userAgent)]

Now, if I want to download the file all at once, I just do

satelliteData = satelliteOpener.open(satelliteRequest).read()

But some of these files are going to be really, really big, and I want to get a progress bar going.
I've tried doing a while loop like this:

chunkSize = 10240
while 1:
    dataBuffer = satelliteOpener.open(satelliteRequest).read(chunkSize)
    data += dataBuffer
    if not dataBuffer:
        break

But that just gives me the first 10240 bytes again and again. Is there something I'm missing here?
It might even be I'm calling urllib2 the wrong way (does it download when you read() or when you create the Request?)

All help is appreciated, I'm sort of stuck here.

Andrew Godwin

Trent Mick · Apr 26, 2005

Andrew Godwin wrote:
> But some of these files are going to be really, really big, and I want
> to get a progress bar going. I've tried doing a while loop like this:

Here is a little snippet that I use occasionally:

------------------ geturl.py ---------------------------
import os
import sys
import urllib

def _reporthook(numblocks, blocksize, filesize, url=None):
    #print "reporthook(%s, %s, %s)" % (numblocks, blocksize, filesize)
    base = os.path.basename(url)
    #XXX Should handle possible filesize=-1.
    try:
        percent = min((numblocks*blocksize*100)/filesize, 100)
    except:
        percent = 100
    if numblocks != 0:
        sys.stdout.write("\b"*70)
    sys.stdout.write("%-66s%3d%%" % (base, percent))

def geturl(url, dst):
    print "get url '%s' to '%s'" % (url, dst)
    if sys.stdout.isatty():
        urllib.urlretrieve(url, dst,
                           lambda nb, bs, fs, url=url: _reporthook(nb,bs,fs,url))
        sys.stdout.write('\n')
    else:
        urllib.urlretrieve(url, dst)

if __name__ == "__main__":
    if len(sys.argv) == 2:
        url = sys.argv[1]
        base = url[url.rindex('/')+1:]
        geturl(url, base)
    elif len(sys.argv) == 3:
        url, base = sys.argv[1:]
        geturl(url, base)
    else:
        print "Usage: geturl.py URL [DEST]"
        sys.exit(1)
--------------- end of geturl.py ---------------------------

Save that as geturl.py and try running:

python geturl.py http://example.com/downloads/bigfile.zip
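
The XXX comment in the snippet points at the case where the report hook receives filesize=-1, which can happen when the server does not report a size. One possible way to handle it is sketched below. The helper names `percent_done` and `format_progress` are made up for this illustration and are not part of urllib, and the sketch uses `//` so that it behaves the same on newer Python versions.

```python
def percent_done(numblocks, blocksize, filesize):
    """Return progress as an int from 0 to 100, or None if unknown.

    The arguments mirror what the report hook receives: blocks
    transferred so far, block size in bytes, and total size in bytes
    (assumed here to be -1 or 0 when the size is not known).
    """
    if filesize <= 0:
        return None
    # The last block is usually only partly full, so cap at 100.
    return min(numblocks * blocksize * 100 // filesize, 100)


def format_progress(name, percent, width=66):
    """Build a fixed-width status line: padded name plus a 4-char percent."""
    shown = "  ?%" if percent is None else "%3d%%" % percent
    return "%-*s%s" % (width, name, shown)
```

Inside `_reporthook`, the try/except could then presumably be replaced by a call to `percent_done`, with `format_progress` producing the 70-character line that the backspaces erase.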

Cheers,
Trent

John A Ferguson · Apr 26, 2005

Each time through the loop you re-open the url and thus start from the
beginning. You need to separate the opening from the reading.
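
As a rough illustration of that separation, here is a minimal sketch that opens once and then reads in a loop. The helper name `read_in_chunks` and its `report` callback are invented for this example, and `io.BytesIO` stands in for the server response so that it runs without a network. The loop only relies on a `read(size)` method, which the object returned by `satelliteOpener.open(satelliteRequest)` also has.

```python
import io


def read_in_chunks(response, chunk_size=10240, total=None, report=None):
    """Read a file-like object to the end, one chunk at a time.

    `response` is anything with a read(size) method. `total` is the
    expected size in bytes if known, else None. `report` is an optional
    callback that receives (bytes_so_far, total) after every chunk.
    """
    chunks = []
    bytes_so_far = 0
    while True:
        chunk = response.read(chunk_size)
        if not chunk:  # an empty read means there is no more data
            break
        chunks.append(chunk)
        bytes_so_far += len(chunk)
        if report is not None:
            report(bytes_so_far, total)
    return b"".join(chunks)


# Stand-in for the server response: 25000 bytes of dummy data.
payload = b"x" * 25000
response = io.BytesIO(payload)  # opened once, outside the loop

progress = []
data = read_in_chunks(response, total=len(payload),
                      report=lambda done, total: progress.append(done))
```

With the code from the original post, the response would be obtained once with `satelliteOpener.open(satelliteRequest)` and then handed to the loop. Collecting the chunks in a list and joining at the end also avoids repeatedly growing one large string, which matters for really big files.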

HTH,
John
