asynchronous downloading

Plumo

I want to download content asynchronously. This would be straightforward to do with threads or across processes, but it is difficult asynchronously, so people seem to rely on external libraries (twisted / gevent / eventlet).

(I would use gevent under different circumstances, but currently need to stick to standard libraries.)

I looked around and found there is little interest in developing a proper HTTP client on asyncore. The best I found stopped development a decade ago: http://sourceforge.net/projects/asynchttp/

What do you recommend?
And why is there poor support for asynchronous execution?

Richard
 
Mark Hammond

> I want to download content asynchronously. This would be
> straightforward to do threaded or across processes, but difficult
> asynchronously so people seem to rely on external libraries (twisted
> / gevent / eventlet).

Exactly - the fact it's difficult is why those tools compete.

> (I would use gevent under different circumstances, but currently need
> to stick to standard libraries.)

As above - use threads or processes - they are fine for relatively
modest tasks. If your needs go beyond modest, I'd reevaluate your need
to stick with just the stdlib - even demanding *sync* http apps often
wind up using modules outside the stdlib. Look into virtualenv etc if
permission to install packages is the issue.

Batteries are included free, but turbochargers are extra ;)

Mark
 
Paul Rubin

Plumo said:
> What do you recommend?

Threads.

> And why is there poor support for asynchronous execution?

The freenode #python crowd seems to hate threads and prefer twisted,
which seems to have the features you want and probably handles very
large numbers of connections better than POSIX threads do. But I find the
whole event-driven model to be an annoying abstraction inversion and
threads to be simpler, so I've stayed with threads. I keep hearing
bogeyman stories about the evil hazards of race conditions etc., but
none of that stuff has ever happened to me (yet) as far as I can tell.
The main thing is to avoid sharing mutable data between threads to the
extent that you can. Keep the threads isolated from each other except
for communication through Queues and not too much can go wrong. The
last program I wrote had around 20 threads and one or two condition
variables and I don't think any significant bugs resulted from that.
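
The "isolated threads talking only through Queues" pattern described above could look roughly like the sketch below. It uses only the standard library. The names `download_all`, `fetch_url` and `num_workers` are made up for illustration and do not come from the thread; the `fetch` parameter is there so the workers can be exercised without touching the network.

```python
import queue
import threading
import urllib.request


def fetch_url(url, timeout=10):
    """Download one URL with the standard library and return its bytes."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def download_all(urls, fetch=fetch_url, num_workers=4):
    """Download every URL using worker threads that share nothing but queues."""
    tasks = queue.Queue()
    results = queue.Queue()

    def worker():
        while True:
            url = tasks.get()
            if url is None:  # sentinel: no more work for this worker
                return
            try:
                results.put((url, fetch(url), None))
            except Exception as exc:  # report the failure instead of dying
                results.put((url, None, exc))

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for thread in threads:
        thread.start()
    for url in urls:
        tasks.put(url)
    for _ in threads:
        tasks.put(None)  # one sentinel per worker
    for thread in threads:
        thread.join()

    # All workers have exited, so the results queue is complete.
    collected = {}
    while not results.empty():
        url, body, error = results.get()
        collected[url] = (body, error)
    return collected
```

Each worker only ever touches the two queues, so no locks or condition variables are needed in this sketch. Something like `download_all(list_of_urls, num_workers=20)` would return a dict mapping each URL to a `(body, error)` pair.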

FWIW, the Erlang language is built around the above concept: it uses
super-lightweight userland threads so it can handle millions of them
concurrently, and it's used successfully for ultra-high-reliability
phone switches and similar applications that are not allowed to fail, so
it must be doing something right.

There are a few schemes like Kamaelia implementing concurrency
with Python generators or coroutines, but I think they're not widely
used, and Python coroutines are kind of crippled because they don't
carry any stack below their entry point.
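
For readers who have not seen generator-based concurrency, here is a toy sketch of the general idea. It is not how Kamaelia itself works, and `countdown` and `run_round_robin` are hypothetical names. It also hints at the limitation mentioned above: the `yield` has to appear in the generator body itself, so a plain helper function called from `countdown` could not suspend the task.

```python
from collections import deque


def countdown(name, n, log):
    """A toy task: record one step, then hand control back to the scheduler."""
    while n > 0:
        log.append((name, n))
        n -= 1
        yield  # cooperative switch point


def run_round_robin(tasks):
    """Run generator-based tasks in turn until all of them have finished."""
    ready = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            next(task)  # run the task up to its next yield
        except StopIteration:
            continue  # task finished; drop it
        ready.append(task)  # still alive: back of the line
```

Running two `countdown` tasks through `run_round_robin` interleaves their steps in a single thread, with no locking, because a task can only be switched out at a `yield`.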
 
Richard Baron Penman

>> I want to download content asynchronously. This would be
> Exactly - the fact it's difficult is why those tools compete.

It is difficult in Python because the async libraries do not offer
much. It is straightforward in some other languages.

Do you know why there is little support for asynchronous execution in
the standard libraries?
For large-scale downloading I found that thread pools do not scale well.

Richard
 
Giampaolo Rodolà

On 23 February 2012 at 07:58, Richard said:
> I want to download content asynchronously. This would be straightforward to do threaded or across processes, but difficult asynchronously so people seem to rely on external libraries (twisted / gevent / eventlet).
>
> (I would use gevent under different circumstances, but currently need to stick to standard libraries.)
>
> I looked around and found there is little interest in developing a proper HTTP client on asyncore. The best I found stopped development a decade ago: http://sourceforge.net/projects/asynchttp/
>
> What do you recommend?
> And why is there poor support for asynchronous execution?
>
> Richard

If you want to stick with asyncore try to take a look at this:
https://gist.github.com/1519999
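
For context, asyncore is at bottom a select()/poll() loop over non-blocking sockets. The sketch below shows that core idea with only `select` and `socket`. It is not the code from the gist above and not an HTTP client; `read_all` is a hypothetical helper that assumes each socket is already connected and that the peer closes the connection when it has sent everything. For HTTP, each socket would first need a request written to it.

```python
import select
import socket


def read_all(socks, chunk_size=4096):
    """Read every socket to EOF in one thread, multiplexing with select()."""
    buffers = {sock: bytearray() for sock in socks}
    pending = set(socks)
    for sock in pending:
        sock.setblocking(False)
    while pending:
        # Block until at least one socket has data (or has reached EOF).
        readable, _, _ = select.select(list(pending), [], [])
        for sock in readable:
            try:
                data = sock.recv(chunk_size)
            except BlockingIOError:
                continue  # spurious wakeup; try again on the next pass
            if data:
                buffers[sock].extend(data)
            else:  # empty read means the peer closed the connection
                pending.discard(sock)
    return [bytes(buffers[sock]) for sock in socks]
```

One thread services all the sockets here, so memory use does not grow with a per-connection thread stack. The cost is that request writing, parsing and timeouts all have to be handled inside the same loop.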

> And why is there poor support for asynchronous execution?

I'd say that's true for stdlib only (asyncore/asynchat).
There are plenty of choices amongst third party modules though.
To name one, I particularly like tornado, which is simple and powerful:
http://www.tornadoweb.org/documentation/httpclient.html

--- Giampaolo
http://code.google.com/p/pyftpdlib/
http://code.google.com/p/psutil/
http://code.google.com/p/pysendfile/
 
Plumo

My current implementation works fine below a few hundred threads, but each thread takes up a lot of memory, so it does not scale well.

I have been looking at Erlang for that reason, but found it is missing useful libraries in other areas.
 
Plumo

That example is excellent - best use of asynchat I have seen so far.

I read through the python-dev archives and found the fundamental problem is that no one maintains asyncore / asynchat.
 
Giampaolo Rodolà

On 24 February 2012 at 02:10, Plumo said:
> that example is excellent - best use of asynchat I have seen so far.
>
> I read through the python-dev archives and found the fundamental problem is no one maintains asyncore / asynchat.

Well, actually I do/did.
The point with asyncore/asynchat is that its original design is so flawed
and simplistic that it doesn't allow actual customization without
breaking compatibility.
See for example:
http://bugs.python.org/issue6692


--- Giampaolo
http://code.google.com/p/pyftpdlib/
http://code.google.com/p/psutil/
http://code.google.com/p/pysendfile/
 
Richard Baron Penman

>> I read through the python-dev archives and found the fundamental problem is no one maintains asyncore / asynchat.
> Well, actually I do/did.

ah OK. I had read this comment from a few years back:
"IIRC, there was a threat to remove asyncore because there were no
maintainers, no one was fixing bugs, no one was improving it, and no
one was really using it"

> Point with asyncore/asynchat is that its original design is so flawed
> and simplistic it doesn't allow actual customization without
> breaking compatibility.

Python 3 uses the same API - was there not enough interest to improve it?

Richard
 
