distributed queue?


Paul Rubin

Does anyone have an implementation of a distributed queue? I.e. I
have a long running computation f(x) and I'd like to be able to
evaluate it (for different values of x) on a bunch of different
computers simultaneously, the usual "worker thread" pattern except
distributed across a network. I guess this is pretty easy to write
with a centralized socket listener that dispatches requests through a
Queue to multiple threads on the same machine, each talking
synchronously to a server socket. I wonder if something like it
already exists. I see a little bit of discussion in the newsgroup
archive but no obvious pointers to code.

Thanks.
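A minimal sketch of the design described above, using only the standard library. One dispatcher thread per remote machine pulls x values from a shared `queue.Queue` and talks synchronously to that machine over a socket. The names `f`, `ComputeHandler`, `start_compute_server` and `distribute`, and the newline-delimited JSON wire format, are hypothetical choices for illustration. The compute servers are started locally only so the example runs on one machine.

```python
import json
import queue
import socket
import socketserver
import threading


def f(x):
    # Hypothetical stand-in for the long-running computation.
    return x * x


class ComputeHandler(socketserver.StreamRequestHandler):
    """Runs on each worker machine: reads one JSON value per line, replies with f(x)."""

    def handle(self):
        for line in self.rfile:
            x = json.loads(line)
            self.wfile.write((json.dumps(f(x)) + "\n").encode())


def start_compute_server(host="127.0.0.1", port=0):
    """Start a compute server in a background thread (port 0 = any free port)."""
    server = socketserver.ThreadingTCPServer((host, port), ComputeHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def distribute(xs, addresses):
    """Evaluate f(x) for every x, with one dispatcher thread per remote address."""
    xs = list(xs)
    tasks = queue.Queue()
    for item in enumerate(xs):
        tasks.put(item)
    results = [None] * len(xs)

    def dispatcher(address):
        # Each thread holds one synchronous connection to one worker machine
        # and keeps pulling work from the shared queue until it is empty.
        with socket.create_connection(address) as sock, sock.makefile("r") as reader:
            while True:
                try:
                    i, x = tasks.get_nowait()
                except queue.Empty:
                    return
                sock.sendall((json.dumps(x) + "\n").encode())
                results[i] = json.loads(reader.readline())

    threads = [threading.Thread(target=dispatcher, args=(a,)) for a in addresses]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    servers = [start_compute_server() for _ in range(3)]
    print(distribute(range(10), [s.server_address for s in servers]))
    for s in servers:
        s.shutdown()
        s.server_close()
```

In this sketch a task is simply lost if a worker machine dies mid-request; a real version would need to requeue it.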
 

Robert Kern

Take a look at the work being done on IPython:

http://ipython.scipy.org/moin/Parallel_Computing

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco
 

skip

Paul> Does anyone have an implementation of a distributed queue? I.e. I
Paul> have a long running computation f(x) and I'd like to be able to
Paul> evaluate it (for different values of x) on a bunch of different
Paul> computers simultaneously, the usual "worker thread" pattern except
Paul> distributed across a network.

PyBrenda maybe? (Dunno if it's even still around.)

Skip
 

Bjoern Schliessmann

Paul said:
> Does anyone have an implementation of a distributed queue? I.e. I
> have a long running computation f(x) and I'd like to be able to
> evaluate it (for different values of x) on a bunch of different
> computers simultaneously, the usual "worker thread" pattern except
> distributed across a network.

Doesn't sound difficult to implement.
> I guess this is pretty easy to write with a centralized socket
> listener that dispatches requests through a Queue to multiple
> threads on the same machine, each talking synchronously to a
> server socket.

(Why does everyone think that "concurrency" equals "usage of
multiple threads"?)
> I wonder if something like it already exists. I see a little bit
> of discussion in the newsgroup archive but no obvious pointers to
> code.

Try Twisted for your networking needs. With it, it's quite easy to
develop your own little protocol (e.g. on top of TCP). Once you have
that ready, writing a server and a client that use it is a matter of
minutes.

http://twistedmatrix.com/projects/core/documentation/howto/servers.html
http://twistedmatrix.com/projects/core/documentation/howto/clients.html

For bigger needs, Twisted also has RPC features.

Regards,


Björn
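Bjoern's alternative to threads is multiplexing many connections in one event loop. The sketch below shows that style with the standard library's asyncio instead of Twisted, purely so it runs without third-party packages; it is not Twisted's API. The queue server hands each connected worker one `[index, x]` pair at a time as a JSON line and sends `null` when the work runs out. `f`, `worker`, `main` and the wire format are hypothetical, and the workers run locally here only for demonstration.

```python
import asyncio
import json


def f(x):
    # Hypothetical stand-in for the long-running computation.
    return x * x


async def worker(host, port):
    """A worker (normally on another machine): compute tasks until told to stop."""
    reader, writer = await asyncio.open_connection(host, port)
    while True:
        line = await reader.readline()
        if not line:
            break  # server closed the connection
        task = json.loads(line)
        if task is None:
            break  # no work left
        i, x = task
        writer.write((json.dumps(f(x)) + "\n").encode())
        await writer.drain()
    writer.close()
    await writer.wait_closed()


async def main(xs, n_workers=3):
    """Single-threaded queue server: one coroutine per connected worker."""
    pending = list(enumerate(xs))
    n = len(pending)
    results = {}

    async def handle_worker(reader, writer):
        # The event loop multiplexes all worker connections in one thread,
        # so the plain list `pending` needs no lock.
        try:
            while pending:
                i, x = pending.pop(0)
                writer.write((json.dumps([i, x]) + "\n").encode())
                await writer.drain()
                line = await reader.readline()
                if not line:  # worker vanished: requeue its task
                    pending.append((i, x))
                    return
                results[i] = json.loads(line)
            writer.write(b"null\n")  # tell this worker there is nothing left
            await writer.drain()
        finally:
            writer.close()

    server = await asyncio.start_server(handle_worker, "127.0.0.1", 0)
    host, port = server.sockets[0].getsockname()[:2]
    async with server:
        # Local workers stand in for remote machines in this sketch.
        await asyncio.gather(*(worker(host, port) for _ in range(n_workers)))
    return [results.get(i) for i in range(n)]


if __name__ == "__main__":
    print(asyncio.run(main(range(10))))
```

In this demo the workers share the server's event loop, so a slow `f` would stall everything; on real remote machines each worker would run in its own process.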
 

Paul Rubin

Bjoern Schliessmann said:
> (Why does everyone think that "concurrency" equals "usage of
> multiple threads"?)

Well, it doesn't necessarily, but that's easiest a lot of the time.
> Try Twisted for your networking needs.

I should try to understand Twisted better one of these days, but it's
much more confusing than threads. Also, the function I want to
parallelize does blocking operations (database lookups), so in Twisted
I'd have to figure out some way to do them asynchronously.
 

Hendrik van Rooyen

Paul Rubin said:
> Well, it doesn't necessarily, but that's easiest a lot of the time.
>
> I should try to understand Twisted better one of these days, but it's
> much more confusing than threads. Also, the function I want to
> parallelize does blocking operations (database lookups), so in Twisted
> I'd have to figure out some way to do them asynchronously.

I would consider putting 'pullers' on the remote machines, in front of
whatever it is you are making parallel, which fetch the next thing to
do from a 'queue server' on the originating machine that distributes
the work.

I am not sure whether Pyro can help you, as I have only read about it
and not used it, but I think it's worth a look. For a one-to-one
setup I would not hesitate to recommend it, but I can't remember
whether it is any good for one-to-many scenarios.

- Hendrik
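Hendrik's pull-based layout maps closely onto the standard library's `multiprocessing.managers.BaseManager`, which can serve an ordinary `queue.Queue` over TCP so that remote processes pull from it. This is a different tool from Pyro, sketched under stated assumptions: `QueueManager`, `get_tasks`, `get_results`, `puller`, `run` and the authkey are hypothetical names, `None` is used as a stop sentinel, and the pullers run as local threads only so the example is self-contained. On real machines each would call `puller` with the queue server's public address.

```python
import queue
import threading
from multiprocessing.managers import BaseManager


def f(x):
    # Hypothetical stand-in for the long-running computation.
    return x * x


# Queues that live on the originating machine (the "queue server").
task_queue = queue.Queue()
result_queue = queue.Queue()


class QueueManager(BaseManager):
    pass


# Expose the two queues under these names; clients receive proxies to them.
QueueManager.register("get_tasks", callable=lambda: task_queue)
QueueManager.register("get_results", callable=lambda: result_queue)


def puller(address, authkey):
    """Runs on a remote machine: fetch the next task, compute it, send it back."""
    manager = QueueManager(address=address, authkey=authkey)
    manager.connect()
    tasks, results = manager.get_tasks(), manager.get_results()
    while True:
        item = tasks.get()
        if item is None:  # sentinel: no more work
            break
        i, x = item
        results.put((i, f(x)))


def run(xs, n_pullers=3, authkey=b"change-me"):
    """Serve the queues, fill them, let the pullers drain them, gather results."""
    xs = list(xs)
    manager = QueueManager(address=("127.0.0.1", 0), authkey=authkey)
    server = manager.get_server()
    threading.Thread(target=server.serve_forever, daemon=True).start()

    for item in enumerate(xs):
        task_queue.put(item)
    for _ in range(n_pullers):
        task_queue.put(None)  # one stop sentinel per puller

    pullers = [threading.Thread(target=puller, args=(server.address, authkey))
               for _ in range(n_pullers)]
    for t in pullers:
        t.start()
    for t in pullers:
        t.join()

    done = dict(result_queue.get() for _ in range(len(xs)))
    return [done[i] for i in range(len(xs))]


if __name__ == "__main__":
    print(run(range(10)))
```

Pulling (rather than pushing) means faster machines automatically take more tasks, which is the main attraction of Hendrik's layout.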
 

Irmen de Jong

Pyro (http://pyro.sf.net) contains two examples that do just this.
One is a distributed merge sort / md5 "cracker", the other is a
distributed prime factorization of a set of numbers.

--Irmen
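Irmen mentions Pyro's bundled examples; their code is not reproduced here. As a rough stand-in for the same remote-dispatcher pattern, the sketch below uses the standard library's `xmlrpc` modules rather than Pyro: a `Dispatcher` object hands out numbers and collects their prime factors, and each worker repeatedly calls `get_work` and `put_result`. All names are hypothetical. `0` serves as the "no more work" marker, so inputs must be positive, and plain XML-RPC integers are limited to 32 bits.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer


def factorize(n):
    """Prime factors by trial division; fine for the small numbers used here."""
    factors, d = [], 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)
    return factors


class Dispatcher:
    """Remote object that hands out numbers and collects their factorizations."""

    def __init__(self, numbers):
        self._lock = threading.Lock()  # only needed if a threaded server is used
        self._todo = list(numbers)
        self.results = {}

    def get_work(self):
        # 0 means "nothing left" (plain XML-RPC cannot send None).
        with self._lock:
            return self._todo.pop() if self._todo else 0

    def put_result(self, n, factors):
        with self._lock:
            self.results[n] = factors
        return True  # XML-RPC calls must return a value


def worker(url):
    """Runs on each remote machine: pull numbers until the dispatcher says stop."""
    with ServerProxy(url) as proxy:
        while True:
            n = proxy.get_work()
            if n == 0:
                break
            proxy.put_result(n, factorize(n))


def run(numbers, n_workers=3):
    dispatcher = Dispatcher(numbers)
    server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
    server.register_function(dispatcher.get_work)
    server.register_function(dispatcher.put_result)
    threading.Thread(target=server.serve_forever, daemon=True).start()

    url = "http://%s:%d/" % server.server_address
    workers = [threading.Thread(target=worker, args=(url,)) for _ in range(n_workers)]
    for t in workers:
        t.start()
    for t in workers:
        t.join()
    server.shutdown()
    server.server_close()
    return dispatcher.results


if __name__ == "__main__":
    print(run([12, 97, 360, 1001]))
```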
 

Bjoern Schliessmann

Paul said:
> Well, it doesn't necessarily, but that's easiest a lot of the
> time.

I don't think so. Personally, I prefer multiplexing; I think it
leads to less complex code.

> I should try to understand Twisted better one of these days, but
> it's much more confusing than threads.

Sure? I don't think so. I once tried using threads and quickly had
more code for the threading than for my actual functionality, and I
even had problems, e.g. with synchronisation or shutting down properly.

IMHO, Twisted is easier, like most multiplexing techniques: just
write your code (derived from existing client or server classes),
hook it up to the reactor, and the rest works automagically.

(You can even hook your protocol to stdin/stdout or install
a "manhole" in the server. You also have twistd which does all the
daemon and logging work for you. Yes, I'm biased ;) )

> Also, the function I want to parallelize does blocking operations
> (database lookups), so in Twisted I'd have to figure out some way
> to do them asynchronously.

Twisted _is_ asynchronous networking. It also has database classes:

http://twistedmatrix.com/documents/current/api/twisted.enterprise.html

Regards,


Björn
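Paul's worry was blocking database lookups. Twisted's own answer is `twisted.enterprise.adbapi`, which runs ordinary DB-API calls in a thread pool and hands back Deferreds. The same idea in the standard library's asyncio (used here instead of Twisted so the example has no dependencies) is to push each blocking call onto a worker thread with `asyncio.to_thread` (Python 3.9+). The table `kv`, the artificial `time.sleep`, and the function names are hypothetical.

```python
import asyncio
import os
import sqlite3
import tempfile
import time
from contextlib import closing


def make_demo_db(directory):
    """Create a tiny hypothetical key/value table to look things up in."""
    path = os.path.join(directory, "demo.db")
    with closing(sqlite3.connect(path)) as conn:
        conn.execute("CREATE TABLE kv (key TEXT PRIMARY KEY, value INTEGER)")
        conn.executemany("INSERT INTO kv VALUES (?, ?)", [("a", 1), ("b", 2), ("c", 3)])
        conn.commit()
    return path


def blocking_lookup(path, key):
    """An ordinary blocking DB-API lookup, standing in for Paul's queries."""
    time.sleep(0.1)  # pretend the query is slow
    # One connection per call, so no sqlite3 connection is shared between threads.
    with closing(sqlite3.connect(path)) as conn:
        row = conn.execute("SELECT value FROM kv WHERE key = ?", (key,)).fetchone()
    return row[0] if row else None


async def lookup_many(path, keys):
    # Each blocking call runs in the default thread pool, so the event loop
    # stays free to service other connections while the queries are running.
    return await asyncio.gather(*(asyncio.to_thread(blocking_lookup, path, k) for k in keys))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(asyncio.run(lookup_many(make_demo_db(d), ["a", "b", "c", "zz"])))
```

The lookups overlap in time, so four 0.1 s queries take roughly 0.1 s in total rather than 0.4 s, while the code that calls them stays single-threaded.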
 

A.T.Hofkamp

> Does anyone have an implementation of a distributed queue? I.e. I
> have a long running computation f(x) and I'd like to be able to
> evaluate it (for different values of x) on a bunch of different

batchlib and the underlying exec_proxy are designed to handle exactly this type
of problem.
Both of them are on PyPI (and available at my site http://se.wtb.tue.nl/~hat).

Albert