About threads in Python

dutche

Hi folks, how are ya?

Here's the thing... I had to make a program with threads, and after I
finished it I found some posts and articles on Google about Python and
threads, questioning whether it really implements thread programming or
not, because of the GIL and the way Python needs to lock some objects.

My program now has about 8 threads, and each of them runs different
code based on some directives. Some of this code sometimes relies on
the C API, using classes like zipfile and tarfile, and other parts
rely only on Python code.

My question is about the efficiency of threads in Python; does
anybody have something to share?

Thanks in advance

Eduardo
 
Stefan Behnel

dutche, 21.04.2011 15:19:
> Here's the thing... I had to make a program with threads, and after I
> finished it I found some posts and articles on Google about Python and
> threads, questioning whether it really implements thread programming or
> not, because of the GIL and the way Python needs to lock some objects.
>
> My program now has about 8 threads, and each of them runs different
> code based on some directives. Some of this code sometimes relies on
> the C API, using classes like zipfile and tarfile, and other parts
> rely only on Python code.

What do you mean by "sometimes relies on the C API"? Do you mean that it
uses binary modules, as opposed to Python modules?

> My question is about the efficiency of threads in Python; does
> anybody have something to share?

From your (rather unspecific) description, I gather that you are doing
mostly I/O operations. Threading is an acceptable programming model for
that, and Python supports it just fine. The GIL is usually released for I/O
operations by the runtime, so if your program is I/O bound, you will get
full threading concurrency.
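
As a minimal sketch of that point (the archive names here are made up):
each thread extracts one zip file, and CPython releases the GIL around
the file reads and the zlib decompression, so the extractions can overlap.

    import threading
    import zipfile

    def extract(path, dest):
        # File I/O and zlib decompression drop the GIL, so these
        # threads can make progress concurrently.
        with zipfile.ZipFile(path) as zf:
            zf.extractall(dest)

    # Hypothetical archive names -- substitute your own.
    archives = ["a.zip", "b.zip", "c.zip"]
    threads = [threading.Thread(target=extract, args=(p, "out"))
               for p in archives]
    for t in threads:
        t.start()
    for t in threads:
        t.join()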

Stefan
 
sturlamolden

> My question is about the efficiency of threads in Python; does
> anybody have something to share?

Never mind all the FUD about the GIL. Most of it is ill-informed
and plain wrong.

The GIL prevents you from doing one thing: running compute-bound
code in parallel in plain Python. But that is close to an idiotic
use of Python threads anyway. I would seriously question the
competence of anyone attempting it, regardless of the GIL. It is
optimising computational code at the wrong end.

To optimise computational code, notice that Python itself
gives you roughly a 200x performance penalty. That is much more
important than not using all 4 cores of a quad-core processor.
In this case, start by identifying the bottlenecks with the
profiler. Then apply C libraries to those hot spots, or rewrite
them in Cython. If that is not sufficient, you can start to think
about using more hardware (e.g. multithreading in C or Cython).
This advice only applies to computational code, though. Most use
cases for Python are I/O bound (file I/O, GUI, database, web server,
internet client), for which the GIL is not an issue at all.
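
For the profiling step, a minimal sketch with the standard cProfile
module (the function names are placeholders for your own code):

    import cProfile
    import pstats

    def hot_loop(n):
        # stand-in for the computational part of your program
        return sum(i * i for i in range(n))

    def main():
        for _ in range(10):
            hot_loop(100000)

    # Profile main() and print the ten most expensive calls,
    # sorted by cumulative time.
    cProfile.run("main()", "profile.out")
    pstats.Stats("profile.out").sort_stats("cumulative").print_stats(10)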

Python threads will almost always do what you expect. Try
your code first -- if it doesn't scale, then ask this question.
Usually the problem will be in your own code and have nothing
to do with the GIL. This is almost certainly the case if an
I/O-bound server does not scale, as I/O-bound Python code is
(almost) never affected by the GIL.

In cases where a multi-threaded I/O solution does not scale,
you likely want an asynchronous design instead, as the problem
can be the use of threads per se. See if Twisted fits your needs.
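
As a rough sketch of what an asynchronous design looks like, here is the
classic single-threaded Twisted echo server (adapted from the standard
Twisted protocol/factory pattern; treat it as an illustration, not
drop-in code):

    from twisted.internet import protocol, reactor

    class Echo(protocol.Protocol):
        def dataReceived(self, data):
            # One thread, one event loop: each connection is handled
            # by callbacks, so there is no GIL contention to worry about.
            self.transport.write(data)

    class EchoFactory(protocol.Factory):
        def buildProtocol(self, addr):
            return Echo()

    reactor.listenTCP(8000, EchoFactory())
    reactor.run()
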
Scalability problems for an I/O-bound server might also be external
to Python. For example, the culprit could be the database server and
not the use of Python threads. Switching from SQLite to Microsoft SQL
Server will have an impact on the way your program behaves under
concurrent load: SQLite is faster per query, but it has a global lock.
If you want concurrent access to a database, a global lock in the
database is a much more important issue than a global lock in the
Python interpreter. If you want a fast response, the sluggishness of
the database can matter more than the synchronisation of the Python
code.

In the rare event that the GIL is an issue, there are still
things you can do:

You can always use processes instead of threads (i.e.
multiprocessing.Process instead of threading.Thread). Since
the API is similar, writing threaded code is never a waste of
effort.
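
To illustrate how similar the two APIs are, a minimal sketch (the
workload function is just a placeholder):

    import threading
    import multiprocessing

    def work(n):
        # placeholder for a CPU-bound job
        return sum(i * i for i in range(n))

    if __name__ == "__main__":
        # Thread-based version: the GIL serialises pure-Python CPU work.
        workers = [threading.Thread(target=work, args=(1000000,))
                   for _ in range(4)]
        # Process-based version: only the class name changes, and each
        # worker gets its own interpreter and its own GIL.
        # workers = [multiprocessing.Process(target=work, args=(1000000,))
        #            for _ in range(4)]
        for w in workers:
            w.start()
        for w in workers:
            w.join()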

There are also Python implementations that don't have a GIL
(Jython, IronPython). You can simply swap interpreters and see
if it scales better.

Testing with another interpreter or multiprocessing is a good
litmus test to see if the problem is in your own code.

Cython and Pyrex are compilers for a special CPython extension
module language. They give you full control over the GIL, as well
as the speed of C when you need it. They can make Python threads
perform as well as threads in C for computational code -- I have,
for example, compared with OpenMP to confirm this for myself, and
there is really no difference.

You thus have multiple options if the GIL gets in your way,
without a major rewrite, including:

- An interpreter without a GIL (Jython, IronPython)
- multiprocessing.Process instead of threading.Thread
- Cython or Pyrex

Things that require a little bit more effort include:

- Use a multi-threaded C library for your task, called via
  ctypes.CDLL (see the sketch after this list)
- OpenMP in C/C++, called with ctypes or Cython
- An out-of-process COM+ server + pywin32
- Fortran + f2py
- Rewrite to use os.fork (not on Windows)
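
A sketch of the ctypes.CDLL route (the library and function names below
are made up): ctypes releases the GIL for the duration of a CDLL call,
so if the C code is thread-safe, several Python threads can run it in
parallel.

    import ctypes
    import threading

    # Hypothetical shared library and function -- substitute your own.
    lib = ctypes.CDLL("./libwork.so")
    lib.crunch.argtypes = [ctypes.c_int]
    lib.crunch.restype = ctypes.c_int

    # ctypes.CDLL drops the GIL around the foreign call, so these threads
    # can execute the C function on separate cores at the same time.
    threads = [threading.Thread(target=lib.crunch, args=(10000000,))
               for _ in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()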


IMHO:

The biggest problem with the GIL is not the GIL itself, but
bullshit FUD and C libraries that don't release the GIL as
often as they should. NumPy and SciPy are notorious cases of
the latter, and there are similar cases in the standard
library as well.

If I were to give one piece of advice, it would be to just try Python
threads and see for yourself. Usually they will do what you expect.
You only have a problem if they do not, and in that case there are
plenty of things that can be done, most of them with very little
effort.


Sturla
 
Hans Georg Schaathun

: To optimise computational code, notice that Python itself
: gives you roughly a 200x performance penalty. That is much more
: important than not using all 4 cores of a quad-core processor.
: In this case, start by identifying the bottlenecks with the
: profiler. Then apply C libraries to those hot spots, or rewrite
: them in Cython. If that is not sufficient, you can start to think
: about using more hardware (e.g. multithreading in C or Cython).
: This advice only applies to computational code, though.

And not necessarily even there. The extra programmers needed to recode
in C come with more than a 200x cost factor. It is almost trivial
to make a multithreaded map implementation that could have exploited
an umpteen-core box were it not for the GIL. That would be a cheap
gain. It matters little that you could gain 100x more at 200x the
cost... Besides, the bottleneck is likely to be deeply embedded in
some library like numpy or scipy already.
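
To make the multithreaded map point concrete, a hedged sketch using
multiprocessing.dummy (a thread-backed clone of the multiprocessing
API); the workload function is a placeholder. With pure-Python
CPU-bound work the thread pool is held back by the GIL, while the
process pool with the very same map call can use all cores:

    from multiprocessing import Pool                       # processes
    from multiprocessing.dummy import Pool as ThreadPool   # threads, same API

    def f(x):
        # stand-in for a pure-Python, CPU-bound function
        return sum(i * i for i in range(x))

    if __name__ == "__main__":
        data = [200000] * 8
        # Thread-based map: trivial to write, but the GIL serialises f().
        print(ThreadPool(4).map(f, data)[:2])
        # Process-based map: the same one-line call exploits all cores.
        print(Pool(4).map(f, data)[:2])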
 
