parallel downloads


John Deas

Hi,

I would like to write a Python script that will download a list of
files (mainly MP3s) from the Internet. For this, I thought to use urllib,
with

urlopen("myUrl").read() and then writing the resulting string to a
file.

My problem is that I would like to download several files at a time.
As I don't have much experience in programming, could you point me to the
easiest ways to do this in Python?

Thanks,

JD
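The approach JD describes (urlopen().read(), then write the bytes out) is enough for a single file. A minimal runnable sketch in today's urllib.request spelling, with fetch as a made-up helper name:

```python
from urllib.request import urlopen

def fetch(url, filename):
    # Download one file: read the whole body into memory, then write it out.
    # Fine for one MP3 at a time; for many large files you'd stream instead.
    data = urlopen(url).read()
    with open(filename, "wb") as f:
        f.write(data)
```

For example, fetch("http://example.com/show.mp3", "show.mp3") (a placeholder URL).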
 

Gary Herron

poof65 said:
For your problem you have to use threads.
Not at all true. Threads provide one way to solve this, but another is
the select function. For this simple case, select() may (or may not) be
easier to write. Pseudo-code would look something like this:

openSockets = list of sockets, one per download file
while openSockets:
    readySockets = select(openSockets ...)  # identifies sockets with data to be read
    for each s in readySockets:
        read from s and do whatever with the data
        if s is at EOF: close and remove s from openSockets

That's it. Far easier than threads.
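Filling in the details, a runnable version of that sketch might look like the following. This speaks plain HTTP over raw sockets; the host names and file names are up to the caller, and for simplicity the HTTP response headers end up in the output file along with the body:

```python
import select
import socket

def download_all(jobs, port=80):
    """Download several files concurrently with select().

    jobs is a list of (host, path, filename) tuples.  For simplicity the
    HTTP response headers are written to the file along with the body;
    a real script would strip them first.
    """
    open_sockets = {}  # maps each open socket to its output file object
    for host, path, fn in jobs:
        s = socket.create_connection((host, port))
        request = "GET %s HTTP/1.0\r\nHost: %s\r\n\r\n" % (path, host)
        s.sendall(request.encode("ascii"))
        open_sockets[s] = open(fn, "wb")

    while open_sockets:
        # Block until at least one socket has data ready to be read
        ready, _, _ = select.select(list(open_sockets), [], [])
        for s in ready:
            data = s.recv(8192)
            if data:
                open_sockets[s].write(data)
            else:  # EOF: the server closed the connection
                open_sockets[s].close()
                s.close()
                del open_sockets[s]
```

For example, download_all([("example.com", "/a.mp3", "a.mp3"), ("example.com", "/b.mp3", "b.mp3")]) (placeholder hosts and paths).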

Gary Herron
 

John Deas


Thank you both for your help. Threads are working for me. However, a
new problem for me is that the URLs I want to download are in an XML
file (I want to download podcasts), and the URL listed is not the same
as the file downloaded:

http://www.sciam.com/podcast/podcast.mp3?e_id=86102326-0B1F-A3D4-74B2BBD61E9ECD2C&ref=p_rss

will be redirected to download:

http://podcast.sciam.com/daily/sa_d_podcast_080307.mp3

Is there a way, knowing the first URL, to get the second at runtime in
my script?
 

John Deas


Found it: geturl() does the job
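For reference: geturl() on the object returned by urlopen() reports the final URL after any redirects have been followed. In today's urllib.request spelling, a small sketch (final_url is just an illustrative name):

```python
from urllib.request import urlopen

def final_url(url):
    # urlopen() follows HTTP redirects automatically; geturl() then
    # reports the URL the data actually came from.
    response = urlopen(url)
    return response.geturl()
```

Called on the sciam.com feed URL above, this would return the podcast.sciam.com address the download is redirected to.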
 

castironpi

 my problem is that I would like to download several files at a time.
Found it: geturl() does the job

That's for normalizing schemes. I believe you subclass FancyURLopener
and override the read method.
 

Gabriel Genellina

On Sat, 08 Mar 2008 14:47:45 -0200, Gary Herron wrote:
That's it. Far easier than threads.

Easier? If you omit all the relevant details, yes, it looks easy. For
example, you read some data from one socket, part of the file you're
downloading. Where do you write it? You need additional structures to
keep track of things.
Pseudocode for the threaded version, complete with socket creation:

def downloadfile(url, fn):
    s = create socket for url
    f = open filename for writing
    shutil.copyfileobj(s.makefile(), f)

for each url, filename to retrieve:
    t = threading.Thread(target=downloadfile, args=(url, filename))
    add t to threadlist
    t.start()

for each t in threadlist:
    t.join()

The downloadfile function looks simpler to me - it's what anyone would
write in a single-threaded program, with local variables and keeping full
state.
The above pseudocode can be converted directly into Python code - no
additional structures or code are required.

Of course, don't try to download a million files at the same time -
neither a million sockets nor a million threads would work.
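That pseudocode maps almost line for line onto urllib and threading. A runnable sketch in Python 3 spelling, with download_many as a made-up wrapper name (the urlopen() response object works directly with shutil.copyfileobj):

```python
import shutil
import threading
from urllib.request import urlopen

def downloadfile(url, fn):
    # One complete, self-contained download: open the URL, stream to disk.
    with urlopen(url) as response, open(fn, "wb") as f:
        shutil.copyfileobj(response, f)

def download_many(jobs):
    """jobs: list of (url, filename) pairs, downloaded concurrently."""
    threads = []
    for url, filename in jobs:
        t = threading.Thread(target=downloadfile, args=(url, filename))
        threads.append(t)
        t.start()
    for t in threads:
        t.join()
```

Each thread keeps its whole state in local variables, which is what makes this version easy to reason about.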
 

castironpi

That's it.  Far easier than threads.

I'll order an 'easiness' metric from the warehouse. Of course,
resources are parameters to the metric, such as facility given lots of
time, facility given lots of libraries, facility given hot shots, &c.

Easier? If you omit all the relevant details, yes, looks easy.

def downloadfile(url, fn):
   s = create socket for url
   f = open filename for writing
   shutil.copyfileobj(s.makefile(), f)

for each url, filename to retrieve:
[ threadlist.addandstart( threading.Thread(target=downloadfile,
args=(url,filename)) ) ][ threadlist.joineach() ]
Of course, don't try to download a million files at the same time -  
neither a million sockets nor a million threads would work.

Dammit! Then what's my million-core machine for? If architectures
"have reached the point of diminishing returns" ( off py.ideas ), then
what's the PoDR for numbers of threads per core?

Answer: One. Just write data structures and don't swap context. But
when do you want it by? What is the PoDR for amount of effort per
clock cycle saved? Get a Frank and a Brit and ask them what language
is easiest to speak.

(Answer: Math. Har *plonk*.)
 
