threading support in python

km

Hi all,

Is there any PEP to introduce true threading features into Python's
next version, as in Java? I mean without having the GIL.
Compared to other languages, Python is fun to code in, but I feel it
is lagging behind in threading.

regards,
KM
 
km

Hi all,
Are there any alternative ways of attaining true threading in Python?
If the GIL doesn't go, does that mean Python is useless for
computation-intensive scientific applications that need
parallelization in a threading context?

regards,
KM
 
bayerj

Hi,

You might want to split your calculation onto different
worker-processes.

Then you can use POSH [1] to share data and objects.
You might even want to go a step further and share the data via
Sockets/XML-RPC or something like that. That makes it easy to throw
additional boxes at a specific calculation, because it can be set up in
about no time.
You can even use Twisted Spread [2] and its perspective broker to do
this on a higher level.

If that's not what you want, you are left with Java I guess.

Regards,
-Justin

[1] http://poshmodule.sourceforge.net/
[2] http://twistedmatrix.com/projects/core/documentation/howto/pb.html
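
The worker-process approach suggested above can be sketched with the standard library alone. This is a minimal sketch, assuming a current Python 3 where `concurrent.futures.ProcessPoolExecutor` is available (it postdates this thread). The names `count_primes`, `split_range` and `parallel_count_primes` are hypothetical, and prime counting merely stands in for any CPU-bound calculation.

```python
from concurrent.futures import ProcessPoolExecutor


def count_primes(bounds):
    """Count primes in the half-open range [start, stop) by trial division."""
    start, stop = bounds
    count = 0
    for n in range(max(start, 2), stop):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def split_range(start, stop, parts):
    """Split [start, stop) into `parts` contiguous chunks of near-equal size."""
    step, extra = divmod(stop - start, parts)
    chunks, lo = [], start
    for i in range(parts):
        hi = lo + step + (1 if i < extra else 0)
        chunks.append((lo, hi))
        lo = hi
    return chunks


def parallel_count_primes(start, stop, workers=4):
    """Run one chunk per worker process; each process has its own interpreter and GIL."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(count_primes, split_range(start, stop, workers)))


if __name__ == "__main__":
    # The guard matters on platforms that start workers by re-importing this module.
    print(parallel_count_primes(0, 50_000))
```

Only the chunk bounds and the integer results cross the process boundary here, which keeps the pickling overhead small. Sharing large data between the workers is a separate problem, which is where POSH or shared memory would come in.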
 
Sybren Stuvel

km enlightened us with:
Is there any PEP to introduce true threading features into Python's
next version, as in Java? I mean without having the GIL.

What is GIL? Except for the Dutch word for SCREAM that is...

Compared to other languages, Python is fun to code in, but I feel it
is lagging behind in threading.

What's wrong with the current threading? AFAIK it's directly linked to
the threading of the underlying platform.

Sybren
 
Richard Brodie

If the GIL doesn't go, does that mean Python is useless for
computation-intensive scientific applications that need
parallelization in a threading context?

No.
 
Diez B. Roggisch

Sybren said:
km enlightened us with:

What is GIL? Except for the Dutch word for SCREAM that is...

The global interpreter lock, which prevents Python threads from concurrently
modifying the interpreter's internal structures and causing segfaults.

What's wrong with the current threading? AFAIK it's directly linked to
the threading of the underlying platform.

There exist rare cases (see the link from bayerj) where the GIL is an
annoyance, and with the dawn of MP-cores all over the place it might be
considered a good idea to remove it - maybe. But I doubt that is something
to be considered for Python 2.x.

Diez
 
Sandra-24

The trouble is that there are some environments where you are forced to use
threads. Apache and mod_python are an example. You can't make use of
multiple CPUs unless you're on *nix and run with multiple processes AND
your application doesn't store large amounts of data in memory (which
mine does), so you'd have to physically double the computer's memory for
a dual-core, or quadruple it for a quad-core. And forget about running a
Windows server; Apache will not even run with multiple processes.

In years to come this will be more of an issue because single-core CPUs
will be harder to come by; you'll be throwing away half of every CPU
you buy.

-Sandra
 
Daniel Dittmar

km said:
Is there any PEP to introduce true threading features into Python's
next version, as in Java? I mean without having the GIL.
Compared to other languages, Python is fun to code in, but I feel it
is lagging behind in threading.

Some of the technical problems:

- probably breaks compatibility of extensions at the source level in a
big way, although this might be handled by SWIG, boost and other code
generators
- reference counting will have to be synchronized, which means that
Python will become slower
- removing reference counting and relying on garbage collection alone
will break many Python applications (because they rely on files being
closed at end of scope etc.)
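
The last point concerns code that leans on reference counting to close files. A minimal sketch of that pattern next to the explicit alternative, the `with` statement, which closes the file when the block exits regardless of how the implementation collects garbage. The function names are hypothetical.

```python
import os
import tempfile


def write_implicit(path, text):
    # Relies on the file object being finalized as soon as its last reference
    # goes away. CPython's reference counting does that; an implementation
    # with only a garbage collector may flush and close the file much later.
    open(path, "w").write(text)


def write_explicit(path, text):
    # The with block closes the file when the block exits, on any implementation.
    with open(path, "w") as fh:
        fh.write(text)
    return fh.closed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "report.txt")
        print(write_explicit(target, "done\n"))
```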

Daniel
 
Rob Williscroft

Daniel Dittmar wrote in comp.lang.python:
- removing reference counting and relying on garbage collection alone
will break many Python applications (because they rely on files being
closed at end of scope etc.)

They are already broken on at least 2 Python implementations, so
why worry about another one?

Rob.
 
sjdevnull

Sandra-24 said:
The trouble is that there are some environments where you are forced to use
threads. Apache and mod_python are an example. You can't make use of
multiple CPUs unless you're on *nix and run with multiple processes AND
your application doesn't store large amounts of data in memory (which
mine does), so you'd have to physically double the computer's memory for
a dual-core, or quadruple it for a quad-core.

You seem to be confused about the nature of multiple-process
programming.

If you're on a modern Unix/Linux platform and you have static read-only
data, you can just read it in before forking and it'll be shared
between the processes.
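
A minimal sketch of the read-before-fork pattern, assuming a Unix-like system where `os.fork` is available. `sum_in_child` is a hypothetical name, and the pipe is just one simple way to get an answer back to the parent. One caveat worth keeping in mind for CPython specifically: even reading an object updates its reference count, which writes to the page holding it, so some pages get copied anyway and the sharing is less complete than it would be for a plain C array.

```python
import os


def sum_in_child(table):
    """Fork a child that reads `table` (loaded before the fork) and reports a sum."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: sees the parent's data through copy-on-write pages.
        status = 1
        try:
            os.close(read_fd)
            os.write(write_fd, str(sum(table)).encode())
            status = 0
        finally:
            os._exit(status)  # leave without running the parent's cleanup handlers
    # Parent: collect the child's answer from the pipe, then reap the child.
    os.close(write_fd)
    with os.fdopen(read_fd, "rb") as reader:
        result = int(reader.read())
    os.waitpid(pid, 0)
    return result


if __name__ == "__main__":
    lookup_table = list(range(1_000_000))  # stands in for static read-only data
    print(sum_in_child(lookup_table))
```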

If it's read/write data or you're not on a Unix platform, you can use
shared memory to share it between many processes.

Threads are way overused in modern multiexecution programming. The
decision on whether to use processes or threads should come down to
whether you want to share everything, or whether you have specific
pieces of data you want to share. With processes + shm, you can gain
the security of protected memory for the majority of your code + data,
only sacrificing it where you need to share the data.

The entire Windows programming world tends to be so biased toward
multithreading that they often don't even acknowledge the existence of
generally superior alternatives. I think that's in large part because
historically on Windows 3.1/95/98 there was no good way to create
processes without running a new binary, and so a culture of threading
grew up. Even today many Windows programmers are unfamiliar with using
CreateProcessEx with SectionHandle=NULL for efficient copy-on-write
process creation.

And forget about running a
Windows server; Apache will not even run with multiple processes.

It used to run on Windows with multiple processes. If it really won't
now, use an older version or contribute a fix.

Now, the GIL is independent of this; if you really need threading in
your situation (you share almost everything and have hugely complex
data structures that are difficult to maintain in shm) then you're
still going to run into GIL serialization. If you're doing a lot of
work in native code extensions, which can release the GIL while they
run, this may not actually be a big performance hit; if not, it can be
pretty bad.
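
One concrete case of work done in native code: CPython's `hashlib` documents that it releases the GIL while hashing buffers larger than 2047 bytes, so plain threads can overlap there. A minimal sketch with hypothetical function names; how much it gains depends on the machine and is not claimed here.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def digest(block):
    """Hash one buffer; the heavy lifting happens in C with the GIL released."""
    return hashlib.sha256(block).hexdigest()


def digest_all(blocks, workers=4):
    """Hash every buffer on a thread pool, keeping results in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(digest, blocks))


if __name__ == "__main__":
    blocks = [bytes([i]) * 4_000_000 for i in range(8)]
    print(digest_all(blocks)[0][:16])
```

The same threads running a pure-Python loop instead of `digest` would take turns holding the GIL, which is the serialization described above.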
 
Paul Rubin

If it's read/write data or you're not on a Unix platform, you can use
shared memory to share it between many processes.

Threads are way overused in modern multiexecution programming. The
decision on whether to use processes or threads should come down to
whether you want to share everything, or whether you have specific
pieces of data you want to share.

Shared memory means there's a byte vector (the shared memory region)
accessible to multiple processes. The processes don't use the same
machine addresses to reference the vector. Any data structures
(e.g. those containing pointers) shared between the processes have to
be marshalled in and out of the byte vector instead of being accessed
normally. Any live objects such as open sockets have to be shared
some other way. It's not a matter of sharing "everything"; shared
memory is a pain in the neck even to share a single object. These
things really can be easier with threads.
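
The marshalling described above can be sketched as follows. This is a minimal sketch, assuming Python 3.8 or later, where `multiprocessing.shared_memory` exists (it postdates this thread). The record layout and the helper names are hypothetical, and the second handle opened by name inside the same process only stands in for a separate process attaching to the region.

```python
import struct
from multiprocessing import shared_memory

# Fixed layout for one record: a 32-bit int id followed by a 64-bit float.
RECORD = struct.Struct("<id")


def write_record(buf, index, rec_id, value):
    """Marshal one record into the byte vector at slot `index`."""
    RECORD.pack_into(buf, index * RECORD.size, rec_id, value)


def read_record(buf, index):
    """Unmarshal the record stored at slot `index`."""
    return RECORD.unpack_from(buf, index * RECORD.size)


def demo():
    owner = shared_memory.SharedMemory(create=True, size=RECORD.size * 4)
    try:
        write_record(owner.buf, 2, 7, 3.5)
        # A peer finds the region by name, not by machine address.
        peer = shared_memory.SharedMemory(name=owner.name)
        try:
            return read_record(peer.buf, 2)
        finally:
            peer.close()
    finally:
        owner.close()
        owner.unlink()


if __name__ == "__main__":
    print(demo())
```

Nothing with a pointer in it (a dict, a list of objects, an open socket) fits this scheme without a hand-written layout like `RECORD`, which is the inconvenience being pointed out.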
 
Daniel Dittmar

Rob said:
Daniel Dittmar wrote in comp.lang.python:

They are already broken on at least 2 Python implementations, so
why worry about another one?

I guess few applications or libraries are being ported from CPython to
Jython or IronPython as each is targeting a different standard library,
so this isn't that much of a problem yet.

Daniel
 
Sandra-24

You seem to be confused about the nature of multiple-process
programming.

If you're on a modern Unix/Linux platform and you have static read-only
data, you can just read it in before forking and it'll be shared
between the processes.

Not familiar with *nix programming, but I'll take your word on it.

If it's read/write data or you're not on a Unix platform, you can use
shared memory to share it between many processes.

I know how shared memory works; it's the last resort in my opinion.

Threads are way overused in modern multiexecution programming. The

It used to run on windows with multiple processes. If it really won't
now, use an older version or contribute a fix.

First of all, I'm not in control of spawning processes or threads.
Apache does that, and Apache has no MPM for Windows that uses more than
1 process. Secondly, "superior" is definitely a matter of opinion. Let's
see how you would define superior.

1) Port (a nicer word for rewrite) the worker MPM from *nix to Windows.
2) Alternatively, switch to running Linux servers (which have their
plusses) but about which I know nothing. I've been using Windows since
I was 10 years old, I'm confident in my ability to build, secure, and
maintain a Windows server. I don't think anyone would recommend me to
run Linux servers with very little in the way of Linux experience.
3) Rewrite my codebase to use some form of shared memory. This would be
a terrible nightmare that would take at least a month of development
time and a lot of heavy rewriting. It would be very difficult, but I'll
grant that it may work if done properly with only small performance
losses. Sounds like a deal.

I would find an easier time, I think, porting mod_python to .net and
leaving that GIL behind forever. Thankfully, I'm not considering such
drastic measures - yet.

Why on earth would I want to do all of that work? Just because you want
to keep this evil thing called a GIL? My suggestion is in python 3
ditch the ref counting, use a real garbage collector, and make that GIL
walk the plank. I have my doubts that it would happen, but that's fine,
the future of python is in things like IronPython and PyPy. CPython's
days are numbered. If there was a mod_dotnet I wouldn't be using
CPython anymore.

Now, the GIL is independent of this; if you really need threading in
your situation (you share almost everything and have hugely complex
data structures that are difficult to maintain in shm) then you're
still going to run into GIL serialization. If you're doing a lot of
work in native code extensions this may not actually be a big
performance hit, if not it can be pretty bad.

Actually, I'm not sure I understand you correctly. You're saying that
in an environment like apache (with 250 threads or so) and my hugely
complex shared data structures, that the GIL is going to cause a huge
performance hit? So even if I do manage to find my way around in the
Linux world, and I upgrade my memory, I'm still going to be paying for
that darned GIL?

Will the madness never end?
-Sandra
 
Steve Holden

Sandra-24 wrote:
[Sandra understands shared memory]
I would find an easier time, I think, porting mod_python to .net and
leaving that GIL behind forever. Thankfully, I'm not considering such
drastic measures - yet.

Quite right too. You haven't even sacrificed a chicken yet ...

Why on earth would I want to do all of that work? Just because you want
to keep this evil thing called a GIL? My suggestion is in python 3
ditch the ref counting, use a real garbage collector, and make that GIL
walk the plank. I have my doubts that it would happen, but that's fine,
the future of python is in things like IronPython and PyPy. CPython's
days are numbered. If there was a mod_dotnet I wouldn't be using
CPython anymore.

You write as though the GIL was invented to get in the programmer's way,
which is quite wrong. It's there to avoid deep problems with thread
interaction. Languages that haven't bitten that bullet can bite you in
quite nasty ways when you write threaded applications.

Contrary to your apparent opinion, the GIL has nothing to do with
reference-counting.

Actually, I'm not sure I understand you correctly. You're saying that
in an environment like apache (with 250 threads or so) and my hugely
complex shared data structures, that the GIL is going to cause a huge
performance hit? So even if I do manage to find my way around in the
Linux world, and I upgrade my memory, I'm still going to be paying for
that darned GIL?

I think the suggestion was rather that abandoning Python because of the
GIL might be premature optimisation. But since you appear to be sticking
with it, that might have been unnecessary advice.

Will the madness never end?

This reveals an opinion of the development team that's altogether too
low. I believe the GIL was introduced for good reasons.

regards
Steve
 
Paul Rubin

Steve Holden said:
You write as though the GIL was invented to get in the programmer's
way, which is quite wrong. It's there to avoid deep problems with
thread interaction. Languages that haven't bitten that bullet can bite
you in quite nasty ways when you write threaded applications.

And yet, Java programmers manage to write threaded applications all
day long without getting bitten (once they're used to the issues),
despite usually being less skilled than Python programmers ;-).

Contrary to your apparent opinion, the GIL has nothing to do with
reference-counting.

I think it does, i.e. one of the GIL's motivations was to protect the
management of reference counts in CPython, which otherwise wasn't
thread-safe. The obvious implementation of Py_INCREF has a race
condition, for example. The GIL documentation at

http://docs.python.org/api/threads.html

describes this in its very first paragraph.
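
The race in an unsynchronized increment can be shown with a Python-level analogy. This is only a sketch of the load/store window, not CPython's C code: the `Counted` class and its methods are hypothetical, and the two "threads" in `lost_update` are generators stepped by hand so that the bad interleaving happens every time instead of once in a while.

```python
import threading


class Counted:
    """Toy object with a reference count; an analogy, not CPython's C code."""

    def __init__(self):
        self.refcount = 1
        self._lock = threading.Lock()

    def incref_in_steps(self):
        """Unsynchronized increment, paused between its load and its store."""
        value = self.refcount       # load
        yield                       # another thread could be scheduled here
        self.refcount = value + 1   # store

    def incref_locked(self):
        """Increment with the load and store made indivisible by a lock."""
        with self._lock:
            self.refcount += 1


def lost_update():
    """Interleave two unsynchronized increments by hand; one of them is lost."""
    obj = Counted()
    first, second = obj.incref_in_steps(), obj.incref_in_steps()
    next(first)          # first "thread" loads 1
    next(second)         # second "thread" also loads 1
    next(first, None)    # first stores 2
    next(second, None)   # second stores 2 again
    return obj.refcount


def locked_increments(threads=4, per_thread=1000):
    """Run real threads through the locked increment; no update is lost."""
    obj = Counted()

    def work():
        for _ in range(per_thread):
            obj.incref_locked()

    pool = [threading.Thread(target=work) for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return obj.refcount
```

Two increments applied to a count of 1 leave it at 2 in `lost_update`, while the locked version always ends at `1 + threads * per_thread`. A per-object lock on every increment and decrement is the cost referred to elsewhere in this thread as making Python slower.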
This reveals an opinion of the development team that's altogether too
low. I believe the GIL was introduced for good reasons.

The GIL was an acceptable tradeoff when it was first created in the
previous century. First of all, it gave a way to add threads to the
existing, non-threadsafe CPython implementation without having to
rework the old code too much. Second, Python was at that time
considered a "scripting language" and there was less concern about
writing complex apps in it, especially multiprocessing apps. Third,
multiprocessor computers were themselves exotic, so people who wanted
to program them probably had exotic problems that they were willing to
jump through hoops to solve.

These days, even semi-entry-level consumer laptop computers have dual-core
CPUs, and quad Opteron boxes (8-way multiprocessing using X2
processors) are quite affordable for midrange servers or engineering
workstations, and there's endless desire to write fancy server apps
completely in Python. There is no point paying for all that
multiprocessor hardware if your programming language won't let you use
it. So, Python must punt the GIL if it doesn't want to keep
presenting undue obstacles to writing serious apps on modern hardware.
 
Sandra-24

Steve said:
Quite right too. You haven't even sacrificed a chicken yet ...

Hopefully we don't get to that point.

You write as though the GIL was invented to get in the programmer's way,
which is quite wrong. It's there to avoid deep problems with thread
interaction. Languages that haven't bitten that bullet can bite you in
quite nasty ways when you write threaded applications.

I know it was put there because it is meant to be a good thing.
However, it gets in my way. I would be perfectly happy if it were gone.
I've never written code that assumes there's a GIL. I always write my
code with all shared writable objects protected by locks. It's far more
portable, and a good habit to get into. You realize that because of the
GIL, they were discussing (and may have already implemented) Java style
synchronized dictionaries and lists for IronPython simply because
python programmers just assume they are thread safe thanks to the GIL.
I always hated that about Java. If you want to give me thread safe
collections, fine, they'll be nice for sharing between threads, but
don't make me use synchronized collections for single-threaded code.
You'll notice the newer Java collections are not synchronized, it would
seem I'm not alone in that opinion.
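
The habit described above, guarding every shared writable object with its own lock, might look like the following. A minimal sketch; `SharedCounts` and `tally` are hypothetical names. The lock turns the read-modify-write in `add` into a single step on any interpreter, whether or not it has a GIL.

```python
import threading


class SharedCounts:
    """A dict of counters whose every access goes through one lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._counts = {}

    def add(self, key, amount=1):
        # Read, modify and write back as one step under the lock.
        with self._lock:
            self._counts[key] = self._counts.get(key, 0) + amount

    def snapshot(self):
        # Hand out a copy so callers never touch the shared dict directly.
        with self._lock:
            return dict(self._counts)


def tally(words, threads=4):
    """Have several threads count the same words into one shared object."""
    counts = SharedCounts()

    def work():
        for word in words:
            counts.add(word)

    pool = [threading.Thread(target=work) for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return counts.snapshot()
```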
Contrary to your apparent opinion, the GIL has nothing to do with
reference-counting.

Actually it does. Without the GIL reference counting is not thread
safe. You have to synchronize all reference count accesses, increments,
and decrements because you have no way of knowing which objects get
shared across threads. I think with Python's current memory management,
the GIL is the lesser evil.

I'm mostly writing this to provide a different point of view, many
people seem to think (previously linked blog) that there is no downside
to the GIL, and that's just not true. However, I don't expect that the
GIL can be safely removed from CPython. I also think that it doesn't
matter because projects like IronPython and PyPy are very likely the
way of the future for Python anyway. Once you move away from C there
are so many more things you can do.

I think the suggestion was rather that abandoning Python because of the
GIL might be premature optimisation. But since you appear to be sticking
with it, that might have been unnecessary advice.

I would never abandon Python, and I hold the development team in very
high esteem. That doesn't mean there aren't a few things (like the GIL, or
super) that I don't like. But overall they've done an excellent job on
the 99% of things they've got right. I guess we don't say that enough.

I might switch from CPython sometime to another implementation, but it
won't be because of the GIL. I'm very fond of the .net framework as a
library, and I'd also rather write performance critical code in C# than
C (who wouldn't?) I'm also watching PyPy with interest.

-Sandra
 
Bryan Olson

bayerj said:
Then you can use POSH [1] to share data and objects.

Do you use POSH? How well does it work with current Python?
Any major gotchas?

I think POSH looks like a great thing to have, but the latest
version is an alpha from over three years ago. Also, it only
runs on *nix systems.
 
