Progress Bar with urllib2

Andrew Godwin

I'm trying to write a Python script to download data (well, files) from an HTTP server (well, a PHP script spitting them out, at least).
The file data is just the body of the response (the server script echoes the file and then dies).

I call the page using urllib2, like so:

satelliteRequest = urllib2.Request(satelliteServer + "?command=download&filepath="+filepath)
satelliteRequestData = {"username":satelliteUsername, "password":satellitePassword}
satelliteRequest.add_data(urllib.urlencode(satelliteRequestData))
satelliteOpener = urllib2.build_opener()
satelliteOpener.addheaders = [('User-agent', userAgent)]

Now, if I want to download the file all at once, I just do

satelliteData = satelliteOpener.open(satelliteRequest).read()

But some of these files are going to be really, really big, and I want to get a progress bar going.
I've tried doing a while loop like this:

chunkSize = 10240
while 1:
    dataBuffer = satelliteOpener.open(satelliteRequest).read(chunkSize)
    data += dataBuffer
    if not dataBuffer:
        break

But that just gives me the first 10240 bytes again and again. Is there something I'm missing here?
It might even be I'm calling urllib2 the wrong way (does it download when you read() or when you create the Request?)

All help is appreciated, I'm sort of stuck here.

Andrew Godwin
 
Trent Mick

Here is a little snippet that I use occasionally:

------------------ geturl.py ---------------------------
import os
import sys
import urllib

def _reporthook(numblocks, blocksize, filesize, url=None):
    #print "reporthook(%s, %s, %s)" % (numblocks, blocksize, filesize)
    base = os.path.basename(url)
    #XXX Should handle possible filesize=-1.
    try:
        percent = min((numblocks*blocksize*100)/filesize, 100)
    except:
        percent = 100
    if numblocks != 0:
        sys.stdout.write("\b"*70)
    sys.stdout.write("%-66s%3d%%" % (base, percent))

def geturl(url, dst):
    print "get url '%s' to '%s'" % (url, dst)
    if sys.stdout.isatty():
        urllib.urlretrieve(url, dst,
                           lambda nb, bs, fs, url=url: _reporthook(nb,bs,fs,url))
        sys.stdout.write('\n')
    else:
        urllib.urlretrieve(url, dst)

if __name__ == "__main__":
    if len(sys.argv) == 2:
        url = sys.argv[1]
        base = url[url.rindex('/')+1:]
        geturl(url, base)
    elif len(sys.argv) == 3:
        url, base = sys.argv[1:]
        geturl(url, base)
    else:
        print "Usage: geturl.py URL [DEST]"
        sys.exit(1)
--------------- end of geturl.py ---------------------------


Save that as geturl.py and try running:

python geturl.py http://example.com/downloads/bigfile.zip


Cheers,
Trent
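
The XXX comment in _reporthook above notes that filesize may be -1, which urlretrieve passes when the server reports no size. One possible way to handle that case is sketched below. The helper names percent_done and format_progress are hypothetical and not part of Trent's script, and showing "???" for an unknown size is just one choice among several.

```python
def percent_done(numblocks, blocksize, filesize):
    """Return the whole-number percent complete, or None if the size is unknown."""
    if filesize <= 0:
        # A size of -1 means the server did not report one,
        # so no meaningful percentage can be computed.
        return None
    # The last block is usually only partly filled, so cap the value at 100.
    return min(numblocks * blocksize * 100 // filesize, 100)


def format_progress(name, numblocks, blocksize, filesize):
    """Build a 70-character status line in the same layout as _reporthook."""
    percent = percent_done(numblocks, blocksize, filesize)
    if percent is None:
        return "%-66s%4s" % (name, "???")
    return "%-66s%3d%%" % (name, percent)
```

Inside _reporthook, the try/except block could then be replaced by a single sys.stdout.write(format_progress(base, numblocks, blocksize, filesize)) call.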
 
John A Ferguson

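Each time through the loop you re-open the URL, so every read() starts again from
the beginning of the response. You need to separate the opening from the reading:
open the response once, before the loop, and call read(chunkSize) on that same
object inside the loop.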

HTH,
John
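
A minimal sketch of the fix John describes: the response is opened once and then read in chunks until read() returns an empty string. The function names read_with_progress and show_progress are hypothetical, and the sketch works on any file-like object so that it can be tried without a server. With the original poster's code, the object would come from a single satelliteOpener.open(satelliteRequest) call. Whether the PHP script sends a Content-Length header is an assumption, so the total may be unknown.

```python
import io
import sys


def show_progress(bytes_read, total):
    """Overwrite the current terminal line with the running byte count."""
    if total:
        percent = bytes_read * 100 // total
        sys.stdout.write("\r%3d%% (%d of %d bytes)" % (percent, bytes_read, total))
    else:
        # No total available (e.g. no Content-Length header was sent).
        sys.stdout.write("\r%d bytes" % bytes_read)
    sys.stdout.flush()


def read_with_progress(response, chunk_size=10240, total=None, report=None):
    """Read an already-opened file-like object to EOF, one chunk at a time."""
    chunks = []
    bytes_read = 0
    while True:
        data = response.read(chunk_size)  # continues where the last read stopped
        if not data:
            break  # an empty result means end of the response
        chunks.append(data)
        bytes_read += len(data)
        if report is not None:
            report(bytes_read, total)
    return b"".join(chunks)


if __name__ == "__main__":
    # Stand-in for a real response. With urllib2 this would be, roughly:
    #   response = satelliteOpener.open(satelliteRequest)   # open ONCE
    #   length = response.info().getheader("Content-Length")
    #   total = int(length) if length else None
    fake_response = io.BytesIO(b"x" * 25000)
    payload = read_with_progress(fake_response, total=25000, report=show_progress)
    sys.stdout.write("\n")
```

Collecting the chunks in a list and joining them at the end avoids repeated string concatenation. For really big files it may be better to write each chunk straight to a file on disk instead of holding everything in memory.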
 
