multi threading in multi processor (computer)

Aahz

Personally I need a solution which touches this discussion. I need to run
multiple processes, which I communicate with via stdin/out, simultaneously,
and my plan was to do this with threads. Any favorite document pointers,
common traps, or something else which could be good to know?

Threads and forks tend to be problematic. This is one case I'd recommend
against threads.
--
Aahz ([email protected]) <*> http://www.pythoncraft.com/

"The joy of coding Python should be in seeing short, concise, readable
classes that express a lot of action in a small amount of clear code --
not in reams of trivial code that bores the reader to death." --GvR
 
Nick Coghlan

Mike said:
Actually, this is one of the cases I was talking about. I find it
saner to convert to non-blocking I/O and use select() for
synchronization. That solves the problem, without introducing any of
the headaches related to shared access and locking that come with
threads.

Use a communicating sequential processes model for the threading and you don't
have many data synchronisation problems because you have barely any shared
access - no application data is ever shared between threads, they only send
messages to each other via message queues. Most threads simply block on their
incoming message queue permanently. Those doing blocking I/O set an appropriate
timeout on the I/O call so they can check for messages occasionally.
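A minimal sketch of the message-queue style described above, not Nick's actual code. The names `STOP`, `worker`, and `reader` are hypothetical, and one end of a `socket.socketpair()` is assumed as a stand-in for whatever blocking I/O a real thread would perform.

```python
import queue
import socket
import threading

STOP = object()  # hypothetical shutdown message, not part of any library


def worker(inbox, outbox):
    """Ordinary thread: blocks on its inbox and shares no application data."""
    while True:
        message = inbox.get()  # blocks until a message arrives
        if message is STOP:
            return
        outbox.put(message.upper())


def reader(sock, inbox, outbox):
    """I/O thread: blocks on the socket, but only for a short timeout."""
    sock.settimeout(0.05)
    while True:
        try:
            data = sock.recv(4096)
            if not data:  # peer closed the connection
                return
            outbox.put(data.decode())
        except socket.timeout:
            pass  # nothing to read yet; fall through and check for messages
        try:
            if inbox.get_nowait() is STOP:
                return
        except queue.Empty:
            pass
```

Since the threads only exchange messages, replacing `queue.Queue` with another transport should in principle leave the thread bodies largely unchanged.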

Conveniently, you end up with an architecture that supports switching to
multiple processes, or even multiple machines just by changing the transport
mechanism used by the message system.

(We did exactly this for a GUI application - detached the GUI so it talked to a
server via CORBA instead of via direct DLL calls. This meant the server could be
ported to a different platform without having to port the far more platform
specific GUI. This would have been much harder if we weren't already using a CSP
model for communication between different parts of the system)

Cheers,
Nick.
 
Dave Brueck

Mike said:
Actually, this is one of the cases I was talking about. I find it
saner to convert to non-blocking I/O and use select() for
synchronization. That solves the problem, without introducing any of
the headaches related to shared access and locking that come with
threads.

This whole tangent to the original thread intrigues me - I've found that if
you're going to use threads in any language, Python is the one to use because
the GIL reduces so many of the problems common to multithreaded programming (I'm
not a huge fan of the GIL, but its presence effectively prevents a pure Python
multithreaded app from corrupting the interpreter, which is especially handy for
those just learning Python or programming).

I've done a lot of networking applications using select/poll (usually for
performance reasons) and found that going that route *can* in some cases
simplify things but it requires looking at the problem differently, often from
perspectives that seem unnatural to me - it's not just an implementation detail
but one you have to consider during design.
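For the original question (several child processes driven over pipes), the select-style approach Mike describes might look roughly like the sketch below. It is an illustration under assumptions, not code from the thread: `run_all` is a hypothetical helper, it only reads each child's stdout, and it relies on POSIX behaviour, since `select` on pipes does not work on Windows.

```python
import os
import selectors
import subprocess
import sys


def run_all(commands):
    """Start every command at once and return each child's stdout, keyed by index."""
    selector = selectors.DefaultSelector()
    procs = []
    output = {}
    for index, argv in enumerate(commands):
        proc = subprocess.Popen(argv, stdout=subprocess.PIPE)
        procs.append(proc)
        output[index] = b""
        selector.register(proc.stdout, selectors.EVENT_READ, data=index)

    open_pipes = len(procs)
    while open_pipes:
        # Blocks until at least one child has written something or exited.
        for key, _events in selector.select():
            chunk = os.read(key.fd, 4096)
            if chunk:
                output[key.data] += chunk
            else:
                # An empty read means this child closed its stdout.
                selector.unregister(key.fileobj)
                key.fileobj.close()
                open_pipes -= 1

    selector.close()
    for proc in procs:
        proc.wait()
    return output
```

All children are serviced from one loop with no locks, which is the appeal; the cost is that every child's handling has to be written as a reaction to "this pipe is readable" rather than as straight-line code. Feeding each child's stdin would need a similar registration for writability, which is not shown here.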

One nice thing about using threads is that components of your application that
are logically separate can remain separate in the code as well - the
implementations don't have to be tied together at some common dispatch loop, and
a failure to be completely non-blocking in one component doesn't necessarily
spell disaster for the entire app (I've had apps in production where one thread
would die or get hung but I was relieved to find out that the main service
remained available).

Another related benefit is that a lot of application state is implicitly and
automatically managed by your local variables when the task is running in a
separate thread, whereas other approaches often end up forcing you to think in
terms of a state machine when you don't really care* and as a by-product you
have to [semi-]manually track the state and state transitions - for some
problems this is fine, for others it's downright tedious.

Anyway, if someone doesn't know about alternatives to threads, then that's a
shame as other approaches have their advantages (often including a certain
elegance that is just darn *cool*), but I wouldn't shy away from threads too
much either - especially in Python.

-Dave

* Simple case in point: a non-blocking logging facility. In Python you can just
start up a thread that pops strings off a Queue object and writes them to an
open file. A non-threaded version is more complicated to implement, debug, and
maintain.
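The logging case in the footnote can be sketched in a few lines. This is an assumed shape rather than Dave's code: the class name `QueueLogger`, its methods, and the use of `None` as a shutdown marker are all hypothetical.

```python
import queue
import threading


class QueueLogger:
    """Callers enqueue strings; one background thread writes them to a file."""

    def __init__(self, path):
        self._lines = queue.Queue()
        self._file = open(path, "a", encoding="utf-8")
        self._thread = threading.Thread(target=self._drain, daemon=True)
        self._thread.start()

    def log(self, text):
        # Returns immediately; the caller never waits on disk I/O.
        self._lines.put(text)

    def close(self):
        # None is an assumed end-of-stream marker; lines queued before it are written first.
        self._lines.put(None)
        self._thread.join()
        self._file.close()

    def _drain(self):
        while True:
            line = self._lines.get()  # blocks until something is queued
            if line is None:
                return
            self._file.write(line + "\n")
            self._file.flush()
```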
 
Donn Cave

Quoth Dave Brueck:
....
| Another related benefit is that a lot of application state is implicitly and
| automatically managed by your local variables when the task is running in a
| separate thread, whereas other approaches often end up forcing you to think in
| terms of a state machine when you don't really care* and as a by-product you
| have to [semi-]manually track the state and state transitions - for some
| problems this is fine, for others it's downright tedious.

I don't know if the current Stackless implementation has regained any
of this ground, but at least of historical interest here, the old one's
ability to interrupt, store and resume a computation could be used to

As you may know, it used to be, in Stackless Python, that you could have
both. Your function would suspend itself, the select loop would resume it,
for something like serialized threads. (The newer version of Stackless
lost this continuation feature, but for all I know there may be new
features that regain some of that ground.)

I put that together with real OS threads once, where the I/O loop was a
message queue instead of select. A message queueing multi-threaded
architecture can end up just as much a state transition game.

I like threads when they're used in this way, as application components
that manage some device-like thing like a socket or a graphic user interface
window, interacting through messages. Even then, though, there tend to
be a lot of undefined behaviors in events like termination of the main
thread, receipt of signals, etc.

Donn Cave, (e-mail address removed)
 
Dave Brueck

Donn said:
Quoth Dave Brueck:
...
| Another related benefit is that a lot of application state is implicitly and
| automatically managed by your local variables when the task is running in a
| separate thread, whereas other approaches often end up forcing you to think in
| terms of a state machine when you don't really care* and as a by-product you
| have to [semi-]manually track the state and state transitions - for some
| problems this is fine, for others it's downright tedious.

I don't know if the current Stackless implementation has regained any
of this ground, but at least of historical interest here, the old one's
ability to interrupt, store and resume a computation could be used to

As you may know, it used to be, in Stackless Python, that you could have
both. Your function would suspend itself, the select loop would resume it,
for something like serialized threads. (The newer version of Stackless
lost this continuation feature, but for all I know there may be new
features that regain some of that ground.)

Yep, I follow Stackless development for this very reason. Last I heard, a more
automatic scheduler was in the works, without which it can be a little confusing
about when non-I/O tasks should get resumed (and by who), but it's not
insurmountable. Ideally with Stackless you'd avoid OS threads altogether since
the interpreter takes a performance hit with them, but this can be tough if
you're e.g. also talking to a database via a blocking API.

I put that together with real OS threads once, where the I/O loop was a
message queue instead of select. A message queueing multi-threaded
architecture can end up just as much a state transition game.

Definitely, but for many cases it does not - having each thread represent a
distinct "worker" that pops some item of work off one queue, processes it, and
puts it on another queue can really simplify things. Often this maps to
real-world objects quite well, additional steps can be inserted or removed
easily (and dynamically), and each worker can be developed, tested, and debugged
independently.
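The worker-per-stage arrangement described above could be sketched as follows. The names `DONE`, `stage`, and `build_pipeline` are hypothetical, and passing a sentinel downstream is just one assumed way to shut the chain down.

```python
import queue
import threading

DONE = object()  # hypothetical end-of-work marker


def stage(func, inbox, outbox):
    """Start one worker that pops from inbox, applies func, and pushes to outbox."""

    def run():
        while True:
            item = inbox.get()
            if item is DONE:
                outbox.put(DONE)  # pass the shutdown marker downstream
                return
            outbox.put(func(item))

    thread = threading.Thread(target=run, daemon=True)
    thread.start()
    return thread


def build_pipeline(funcs):
    """Chain one worker per function; return (first queue, last queue, threads)."""
    queues = [queue.Queue() for _ in range(len(funcs) + 1)]
    threads = [stage(f, queues[i], queues[i + 1]) for i, f in enumerate(funcs)]
    return queues[0], queues[-1], threads
```

Adding or removing a step then amounts to changing the list passed to `build_pipeline`, and each function can be tested on its own without any threads.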

I like threads when they're used in this way, as application components
that manage some device-like thing like a socket or a graphic user interface
window, interacting through messages. Even then, though, there tend to
be a lot of undefined behaviors in events like termination of the main
thread, receipt of signals, etc.

That's how I tend to like using threads too. In practice I haven't found the
undefined behaviors to be too much trouble though, e.g. deciding on common
shutdown semantics for all child threads and making them daemon threads pretty
much takes care of both expected and unexpected shutdown of the main thread.
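One way to read "common shutdown semantics plus daemon threads" is sketched below. The `Service` class and its `tick` callback are hypothetical, chosen only to illustrate the idea: an explicit stop signal covers the expected shutdown, and the daemon flag covers the case where the main thread exits without asking.

```python
import threading


class Service:
    """Background worker with an explicit stop signal and a daemon thread."""

    def __init__(self, tick, interval=0.01):
        self._stop = threading.Event()
        self._thread = threading.Thread(
            target=self._run, args=(tick, interval), daemon=True
        )

    def start(self):
        self._thread.start()

    def shutdown(self, timeout=None):
        # Expected shutdown: ask the thread to finish, then wait for it.
        self._stop.set()
        self._thread.join(timeout)

    def is_alive(self):
        return self._thread.is_alive()

    def _run(self, tick, interval):
        # Event.wait returns True once the stop signal is set, ending the loop.
        while not self._stop.wait(interval):
            tick()
```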

Using threads and signals can be confusing and troublesome, but often in cases
where I would use them I end up wanting a richer interface anyway so something
besides signals is a better fit.

-Dave
 
Donn Cave

Donn Cave wrote:
[... re stackless inside-out event loop ]
Definitely, but for many cases it does not - having each thread represent a
distinct "worker" that pops some item of work off one queue, processes it, and
puts it on another queue can really simplify things. Often this maps to
real-world objects quite well, additional steps can be inserted or removed
easily (and dynamically), and each worker can be developed, tested, and debugged
independently.

Well, one of the things that makes the world interesting is
how many different universes we seem to be coming from, but
in mine, when I have divided an application into several
thread components, about the second time I need to send a
message from one thread to another, the sender needs something
back in return, as in T2 = from_thread_B(T1). At this point,
our conventional procedural model breaks up along a state fault,
so to speak, like

    ...
    to_thread_B(T1)
    return

def continue_from_T1(T1, T2):
    ...

So, yeah, now I have a model where each thread pops, processes
and pushes messages, but only because my program spent the night
in Procrustes' inn, not because it was a natural way to write
the computation. In a procedural language, anyway - there are
interesting alternatives, in particular a functional language
called O'Haskell that models threads in a "reactive object"
construct, an odd but elegant mix of state machine and pure
functional programming, but it's kind of a research project
and I know of nothing along these lines that's really supported
today.
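For comparison, one way to keep the call-like shape `T2 = from_thread_B(T1)` is to send a private reply queue along with the request and block on it. This is only a sketch under assumptions: `thread_b`, the `(value, reply queue)` message layout, and the `+ 1` stand-in computation are all hypothetical, and the sender stalls while it waits, which may be exactly what the design was trying to avoid.

```python
import queue
import threading


def thread_b(requests):
    """Thread B: answer each request on the reply queue that came with it."""
    while True:
        request = requests.get()
        if request is None:  # assumed shutdown marker
            return
        t1, reply_to = request
        reply_to.put(t1 + 1)  # stand-in for B's real computation


def from_thread_B(requests, t1):
    """Send t1 to thread B and block until B replies, like a function call."""
    reply_to = queue.Queue(maxsize=1)
    requests.put((t1, reply_to))
    return reply_to.get()
```

The procedural flow survives, but the calling thread cannot service its own inbox while it waits, so this trades the state-machine split for a blocked sender.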

Donn Cave, (e-mail address removed)
 
Aahz

Yes. Have you ever asked a polite question?

Yes. I just get a bit irritated with some of the standard lines that
people use.
 
Paul Rubin

[phr] The day is coming when even cheap computers have multiple cpu's.
See hyperthreading and the coming multi-core P4's, and the finally
announced Cell processor.

Conclusion: the GIL must die.

It's not clear to what extent these processors will perform well with
shared memory space. One of the things I remember most about Bruce
Eckel's discussions of Java and threading is just how broken Java's
threading model is in certain respects when it comes to CPU caches
failing to maintain cache coherency.

Um??? I'm not experienced with multiprocessors but I thought that
maintaining cache coherency was a requirement. What's the deal? If
coherency isn't maintained, is it really multiprocessing?

It's always going to be true that getting fully scaled performance
will require more CPUs with non-shared memory -- that's going to
mean IPC with multiple processes instead of threads.

But unless you use shared memory, the context switch overhead from
IPC becomes a bad bottleneck.

See http://poshmodule.sourceforge.net/posh/html/node1.html
for an interesting scheme of working around the GIL by spreading
naturally multi-threaded applications into multiple processes
(using shared memory). It would simplify things a lot if you could
just use threads.
 
Donn Cave

Aahz said:
Yes. I just get a bit irritated with some of the standard lines that
people use.

Hey, stop me if you've heard this one: "I used threads to solve
my problem - and now I have two problems!"

Donn Cave, (e-mail address removed)
 
Aahz

Hey, stop me if you've heard this one: "I used threads to solve
my problem - and now I have two problems!"

Point to you. ;-)
 
Dennis Lee Bieber

Hey, stop me if you've heard this one: "I used threads to solve
my problem - and now I have two problems!"
<devil's advocate mode>

Your employer was so impressed by the quickness of your first
solution that he's assigned you twice as much work...

</devil's advocate mode>

 
Adrian Casey

Aahz said:
Threads and forks tend to be problematic. This is one case I'd recommend
against threads.

Multiple threads interacting with stdin/stdout? I've done it with 2 queues.
One for feeding the threads input and one for them to use for output. In
fact, using queues takes care of the serialization problems generally
associated with many threads trying to access a single resource (e.g.
stdout). Python Queues are thread-safe so you don't have to worry about
such issues.
 
Peter Hansen

Adrian said:
Multiple threads interacting with stdin/stdout? I've done it with 2 queues.
One for feeding the threads input and one for them to use for output. In
fact, using queues takes care of the serialization problems generally
associated with many threads trying to access a single resource (e.g.
stdout). Python Queues are thread-safe so you don't have to worry about
such issues.

Hee hee.... do you realize who you're writing these comments to?

This is like someone telling _me_ I could be more effective using
test-driven development to write my code... ;-)

-Peter
 
Martin Christensen

Irmen> Naah. What about: http://www.razorvine.net/img/killGIL.jpg

Some people have too much spare time and too weird senses of
humour...

Fortunately for the rest of us. :) This one actually made me laugh
out loud.

Martin

--
Homepage: http://www.cs.auc.dk/~factotum/
GPG public key: http://www.cs.auc.dk/~factotum/gpgkey.txt
 
Aahz

Multiple threads interacting with stdin/stdout? I've done it with 2
queues. One for feeding the threads input and one for them to use
for output. In fact, using queues takes care of the serialization
problems generally associated with many threads trying to access a
single resource (e.g. stdout). Python Queues are thread-safe so you
don't have to worry about such issues.

The problem is that each sub-process really needs its own stdin/stdout.
Also, to repeat, forking tends to be problematic with threads. Finally,
as Peter implied, I'm well-known on c.l.py for responding to thread
problems with, "Really? Are you using Queue? Why not?" However, this
is one case where Queue can't help.
 
