Parallelization in Python 2.6

Discussion in 'Python' started by Robert Dailey, Aug 18, 2009.

  1. I'm looking for a way to parallelize my python script without using
    typical threading primitives. For example, C++ has pthreads and TBB to
    break things into "tasks". I would like to see something like this for
    python. So, if I have a very linear script:

    doStuff1()
    doStuff2()


    I can parallelize it easily like so:

    create_task( doStuff1 )
    create_task( doStuff2 )

    Both of these functions would be called from new threads, and once
    execution ends the threads would die. I realize this is a simple
    example and I could create my own classes for this functionality, but
    I do not want to bother if a solution already exists.

    Thanks in advance.
    Robert Dailey, Aug 18, 2009
    #1

  2. Robert Dailey wrote:
    > I'm looking for a way to parallelize my python script without using
    > typical threading primitives. For example, C++ has pthreads and TBB to
    > break things into "tasks". I would like to see something like this for
    > python. So, if I have a very linear script:
    >
    > doStuff1()
    > doStuff2()
    >
    >
    > I can parallelize it easily like so:
    >
    > create_task( doStuff1 )
    > create_task( doStuff2 )
    >
    > Both of these functions would be called from new threads, and once
    > execution ends the threads would die. I realize this is a simple
    > example and I could create my own classes for this functionality, but
    > I do not want to bother if a solution already exists.


    I think the canonical answer is to use the threading module or (preferably)
    the multiprocessing module, which is new in Py2.6.

    http://docs.python.org/library/threading.html
    http://docs.python.org/library/multiprocessing.html

    Both share a (mostly) common interface and are simple enough to use. They
    are pretty close to the above interface already.
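
    A minimal sketch of that shared interface, not code from this thread:
    the hypothetical create_task helper below takes the worker class as a
    parameter, so threading.Thread and multiprocessing.Process can be passed
    to the same call. doStuff1, doStuff2 and run_both are stand-ins invented
    for illustration.

```python
import threading


def create_task(task, worker_class=threading.Thread):
    """Start task in a new worker and return the worker so it can be joined.

    worker_class is assumed to accept target= and to offer start()/join(),
    which both threading.Thread and multiprocessing.Process do.
    """
    worker = worker_class(target=task)
    worker.start()
    return worker


results = []  # shared list; only updated in place when the workers are threads


def doStuff1():
    results.append("stuff1 done")


def doStuff2():
    results.append("stuff2 done")


def run_both():
    """Run both stand-in functions in parallel threads and wait for them."""
    del results[:]  # start from an empty list on every call
    workers = [create_task(doStuff1), create_task(doStuff2)]
    for worker in workers:
        worker.join()
    return sorted(results)
```

    Passing worker_class=multiprocessing.Process would run each function in
    its own process instead. In that case the module-level results list in
    the parent is not updated, so results have to come back through a pipe
    or queue.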

    Stefan
    Stefan Behnel, Aug 18, 2009
    #2

  3. On Aug 18, 11:19 am, Robert Dailey <> wrote:
    > I'm looking for a way to parallelize my python script without using
    > typical threading primitives. For example, C++ has pthreads and TBB to
    > break things into "tasks". I would like to see something like this for
    > python. So, if I have a very linear script:
    >
    > doStuff1()
    > doStuff2()
    >
    > I can parallelize it easily like so:
    >
    > create_task( doStuff1 )
    > create_task( doStuff2 )
    >
    > Both of these functions would be called from new threads, and once
    > execution ends the threads would die. I realize this is a simple
    > example and I could create my own classes for this functionality, but
    > I do not want to bother if a solution already exists.
    >


    If you haven't heard of the Python GIL, you'll want to find out sooner
    rather than later. Short summary: Python doesn't do threading very
    well.

    There are quite a few parallelization solutions out there for Python,
    though I can't name them off the top of my head. The way they work is
    that they have worker processes that can be spread across machines.
    When you want to parallelize a task, you send off a function to those
    worker processes.

    There are some serious caveats and problems, not the least of which is
    sharing code between the worker processes and the director, so this
    isn't a great solution.

    If you're looking for highly parallelized code, Python may not be the
    right answer. Try something like Erlang or Haskell.
    Jonathan Gardner, Aug 18, 2009
    #3
  4. On Aug 18, 3:41 pm, Jonathan Gardner <> wrote:
    > On Aug 18, 11:19 am, Robert Dailey <> wrote:
    >
    > > I'm looking for a way to parallelize my python script without using
    > > typical threading primitives. For example, C++ has pthreads and TBB to
    > > break things into "tasks". I would like to see something like this for
    > > python. So, if I have a very linear script:
    > >
    > > doStuff1()
    > > doStuff2()
    > >
    > > I can parallelize it easily like so:
    > >
    > > create_task( doStuff1 )
    > > create_task( doStuff2 )
    > >
    > > Both of these functions would be called from new threads, and once
    > > execution ends the threads would die. I realize this is a simple
    > > example and I could create my own classes for this functionality, but
    > > I do not want to bother if a solution already exists.
    >
    > If you haven't heard of the Python GIL, you'll want to find out sooner
    > rather than later. Short summary: Python doesn't do threading very
    > well.
    >
    > There are quite a few parallelization solutions out there for Python,
    > though I can't name them off the top of my head. The way they work is
    > that they have worker processes that can be spread across machines.
    > When you want to parallelize a task, you send off a function to those
    > worker processes.
    >
    > There are some serious caveats and problems, not the least of which is
    > sharing code between the worker processes and the director, so this
    > isn't a great solution.
    >
    > If you're looking for highly parallelized code, Python may not be the
    > right answer. Try something like Erlang or Haskell.


    Really, all I'm trying to do is the most trivial type of
    parallelization. Take two functions, execute them in parallel. This
    type of parallelization is called "embarrassingly parallel", and is
    the simplest form. There are no dependencies between the two
    functions. They do require read-only access to shared data, though.
    And if they are being spawned as sub-processes this could cause
    problems, unless the multiprocessing module creates pipelines or other
    means to handle this situation.
    Robert Dailey, Aug 18, 2009
    #4
  5. On Tue, 18 Aug 2009 13:45:38 -0700 (PDT), Robert Dailey
    <> declaimed the following in
    gmane.comp.python.general:


    > Really, all I'm trying to do is the most trivial type of
    > parallelization. Take two functions, execute them in parallel. This
    > type of parallelization is called "embarrassingly parallel", and is
    > the simplest form. There are no dependencies between the two
    > functions. They do require read-only access to shared data, though.
    > And if they are being spawned as sub-processes this could cause
    > problems, unless the multiprocessing module creates pipelines or other
    > means to handle this situation.


    If they are number crunchers (CPU-bound) and don't make use of
    binary extension libraries that release the GIL (for the most common
    Python implementation), they'll run faster being called in sequence
    since you won't have the overhead of task switching.

    For I/O bound tasks, which spend most of their time blocked waiting
    for an I/O, Python threads work fine with fairly rapid response time.
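
    A minimal sketch of the I/O-bound case, assuming time.sleep is an
    acceptable stand-in for a blocking I/O call (it releases the GIL in the
    same way). The names fake_io and run_concurrently are invented for
    illustration.

```python
import threading
import time


def fake_io(delay, results, index):
    """Stand-in for a blocking I/O call; time.sleep releases the GIL."""
    time.sleep(delay)
    results[index] = delay


def run_concurrently(delays):
    """Run one fake_io call per delay, each in its own thread.

    Returns the list of results and the elapsed wall-clock time in seconds.
    """
    results = [None] * len(delays)
    threads = [
        threading.Thread(target=fake_io, args=(delay, results, index))
        for index, delay in enumerate(delays)
    ]
    start = time.time()
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results, time.time() - start
```

    With four waits of 0.2 seconds each, the elapsed time should be close to
    0.2 seconds rather than 0.8, because the threads spend their time blocked
    and not holding the GIL.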
    --
    Wulfraed Dennis Lee Bieber KD6MOG
    HTTP://wlfraed.home.netcom.com/
    Dennis Lee Bieber, Aug 19, 2009
    #5
  6. Dennis Lee Bieber wrote:
    > On Tue, 18 Aug 2009 13:45:38 -0700 (PDT), Robert Dailey wrote:
    >> Really, all I'm trying to do is the most trivial type of
    >> parallelization. Take two functions, execute them in parallel. This
    >> type of parallelization is called "embarrassingly parallel", and is
    >> the simplest form. There are no dependencies between the two
    >> functions. They do require read-only access to shared data, though.
    >> And if they are being spawned as sub-processes this could cause
    >> problems, unless the multiprocessing module creates pipelines or other
    >> means to handle this situation.


    It wouldn't be worth much if it didn't, as the subprocess module handles
    everything else nicely. See the Queue classes in multiprocessing.
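
    A minimal sketch of the Queue approach, assuming the shared data is small
    enough to hand to each process as an argument. The names count_words,
    longest_word, run_in_processes and the mp parameter are invented for
    illustration; mp only has to provide Process and Queue, which the
    multiprocessing module itself does.

```python
import multiprocessing


def count_words(words):
    """First stand-in task: reads the shared data, returns a number."""
    return len(words)


def longest_word(words):
    """Second stand-in task: reads the same data, returns a string."""
    return max(words, key=len)


def _run(task, shared, results):
    # Runs in the child process: compute, then send the result to the parent.
    results.put((task.__name__, task(shared)))


def run_in_processes(tasks, shared, mp=multiprocessing):
    """Run each task in its own process and collect results by task name."""
    results = mp.Queue()
    workers = [
        mp.Process(target=_run, args=(task, shared, results))
        for task in tasks
    ]
    for worker in workers:
        worker.start()
    # Drain the queue before joining so no child blocks on a full pipe.
    collected = dict(results.get() for _ in workers)
    for worker in workers:
        worker.join()
    return collected
```

    Each child gets its own copy of the data, so read-only access is safe by
    construction. On Windows the call has to sit under an
    if __name__ == "__main__": guard, and a task that raises would leave
    results.get() waiting, so real code would want a timeout or error
    reporting.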


    > If they are number crunchers (CPU-bound) and don't make use of
    > binary extension libraries that release the GIL (for the most common
    > Python implementation), they'll run faster being called in sequence
    > since you won't have the overhead of task switching.


    .... unless, obviously, the hardware is somewhat up to date (which is not
    that uncommon for number crunching environments) and can execute more than
    one thing at once.

    Stefan
    Stefan Behnel, Aug 19, 2009
    #6
  7. On Tuesday 18 August 2009 22:45:38 Robert Dailey wrote:

    > Really, all I'm trying to do is the most trivial type of
    > parallelization. Take two functions, execute them in parallel. This
    > type of parallelization is called "embarrassingly parallel", and is
    > the simplest form. There are no dependencies between the two
    > functions. They do require read-only access to shared data, though.
    > And if they are being spawned as sub-processes this could cause
    > problems, unless the multiprocessing module creates pipelines or other
    > means to handle this situation.


    Just use the thread module then, and thread.start_new_thread.
    It just works.

    - Hendrik
    Hendrik van Rooyen, Aug 19, 2009
    #7
  8. Hendrik van Rooyen <> writes:
    > Just use thread then and thread.start_new_thread.
    > It just works.


    The GIL doesn't apply to threads made like that?!
    Paul Rubin, Aug 19, 2009
    #8
  9. On 18 Aug, 11:19, Robert Dailey <> wrote:

    > I'm looking for a way to parallelize my python script without using
    > typical threading primitives. For example, C++ has pthreads and TBB to
    > break things into "tasks".


    In C++, parallelization without "typical threading primitives" usually
    means one of three things:

    - OpenMP pragmas
    - the POSIX function fork(), unless you are using Windows
    - MPI

    In Python, you find the function os.fork and wrappers for MPI, and
    they are used as in C++. With os.fork, I like to use a context
    manager, putting the calls to fork in __enter__ and the calls to
    sys.exit in __exit__. Then I can just write code like this:

    with parallel():
        # parallel block here
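
    One way such a context manager could look, as a sketch for Unix-like
    systems only. The nprocs argument, the rank attribute and the
    ranks_seen helper are assumptions made for illustration. The children
    leave through os._exit rather than sys.exit, a deliberate change so that
    they skip the parent's cleanup handlers.

```python
import os


class parallel(object):
    """Fork nprocs - 1 children on entry; every process runs the with-block."""

    def __init__(self, nprocs=2):
        self.nprocs = nprocs
        self.rank = 0          # 0 in the parent, 1..nprocs-1 in the children
        self._children = []

    def __enter__(self):
        for rank in range(1, self.nprocs):
            pid = os.fork()
            if pid == 0:       # in the child: remember the rank, stop forking
                self.rank = rank
                self._children = []
                break
            self._children.append(pid)
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        if self.rank != 0:
            # Children end here; a non-zero status signals an exception.
            os._exit(0 if exc_type is None else 1)
        for pid in self._children:
            os.waitpid(pid, 0)  # the parent waits for every child
        return False


def ranks_seen(nprocs):
    """Have every process write its rank to a pipe; return the sorted ranks.

    Assumes nprocs <= 10 so that each rank is a single character.
    """
    read_fd, write_fd = os.pipe()
    with parallel(nprocs) as team:
        os.write(write_fd, str(team.rank).encode("ascii"))
    os.close(write_fd)
    data = os.read(read_fd, 64)
    os.close(read_fd)
    return sorted(data.decode("ascii"))
```

    Anything a child computes inside the block stays in that child unless it
    is sent back through a pipe, as ranks_seen does, or through shared memory.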

    You can also program in the same style as OpenMP using closures. Just
    wrap whatever loop or block you want to execute in parallel in a
    closure. It requires minimal editing of the serial code. Instead of

    def foobar():
        for i in iterable:
            # whatever

    you can add a closure (internal function) and do this:

    def foobar():
        def section():  # add a closure
            for i in scheduled(iterable):  # balance load
                # whatever
        parallel(section)  # spawn off threads
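
    The post does not show parallel or scheduled themselves. A minimal
    sketch of how they could be written with the threading module follows;
    the thread-local rank, the nthreads parameter and the round-robin
    schedule are all assumptions, not the author's implementation.

```python
import threading

_local = threading.local()  # holds each thread's rank and the team size


def scheduled(iterable):
    """Yield every nthreads-th item, starting at the calling thread's rank."""
    rank, nthreads = _local.rank, _local.nthreads
    for index, item in enumerate(iterable):
        if index % nthreads == rank:
            yield item


def parallel(section, nthreads=4):
    """Run section() in nthreads threads and wait for all of them."""
    def runner(rank):
        _local.rank = rank
        _local.nthreads = nthreads
        section()

    threads = [threading.Thread(target=runner, args=(rank,))
               for rank in range(nthreads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()


def foobar(values):
    """Square every value; the loop body is shared out across threads."""
    results = [None] * len(values)

    def section():  # closure: sees values and results from foobar
        for i in scheduled(range(len(values))):
            results[i] = values[i] ** 2

    parallel(section)
    return results
```

    Each thread writes to a different index, so no lock is needed here. For
    pure-Python arithmetic like this the GIL still serialises the work; the
    pattern pays off when the loop body releases the GIL.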

    Programs written in C++ are much more difficult to parallelize with
    threads because C++ does not have closures. This is why pragma-based
    parallelization (OpenMP) was invented:

    #pragma omp parallel for private(i)
    for (i=0; i<n; i++) {
        // whatever
    }

    You should know about the GIL. It prevents multiple threads from using
    the Python interpreter simultaneously. For parallel computing, this is
    a blessing and a curse. Only C extensions can release the GIL; this
    includes I/O routines in Python's standard library. If the GIL is not
    released, C library calls are guaranteed to be thread-safe.
    However, the Python interpreter will be blocked while waiting for the
    library call to return. If the GIL is released, parallelization works
    as expected; you can also utilise multi-core CPUs (it is a common
    misbelief that Python cannot do this).

    What the GIL prevents you from doing, is writing parallel compute-
    bound code in "pure python" using threads. Most likely, you don't want
    to do this. There is a 200x speed penalty from using Python over a C
    extension. If you care enough about speed to program for parallel
    execution, you should always use some C. If you still want to do this,
    you can use processes instead (os.fork, multiprocessing, MPI), as the
    GIL only affects threads.

    It should be mentioned that compute-bound code is very rare, and
    typically involves scientific computing. The only every-day example is
    3D graphics. However, this is taken care of by the GPU and libraries
    like OpenGL and Direct3D. Most parallel code you will want to write
    is I/O bound. You can use the Python standard library and threads for
    this, as it releases the GIL whenever a blocking call is made.

    I program Python for scientific computing daily (computational
    neuroscience). I have yet to experience that the GIL has hindered me
    in my work. This is because whenever I run into a computational
    bottleneck I cannot solve with NumPy, putting this tiny piece of code
    in Fortran, C or Cython involves very little work. 95% is still
    written in plain Python. The human brain is bad at detecting
    computational bottlenecks though. So it almost always pays off to
    write everything in Python first, and use the profiler to locate the
    worst offenders.
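
    A minimal sketch of that profiling step using cProfile and pstats from
    the standard library, written for Python 3. The functions
    slow_square_sum, cheap_step, work and profile_report are invented for
    illustration.

```python
import cProfile
import io
import pstats


def slow_square_sum(n):
    """Deliberately slow pure-Python loop: the bottleneck to be found."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def cheap_step():
    """A cheap function that should rank low in the report."""
    return sum([1, 2, 3])


def work():
    cheap_step()
    return slow_square_sum(200000)


def profile_report(func, limit=5):
    """Profile func() and return the top entries sorted by cumulative time."""
    profiler = cProfile.Profile()
    profiler.runcall(func)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(limit)
    return stream.getvalue()
```

    Printing profile_report(work) lists slow_square_sum near the top, which
    marks it as the candidate to move to NumPy, C, Fortran or Cython.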

    Regards,
    Sturla Molden
    sturlamolden, Aug 19, 2009
    #9
  10. On 18 Aug, 13:45, Robert Dailey <> wrote:

    > Really, all I'm trying to do is the most trivial type of
    > parallelization. Take two functions, execute them in parallel. This
    > type of parallelization is called "embarrassingly parallel", and is
    > the simplest form. There are no dependencies between the two
    > functions.


    If you are using Linux or Mac, just call os.fork for this.

    You should also know that your function "create_task" is simply

    from threading import Thread
    def create_task(task):
        Thread(target=task).start()

    If your task releases the GIL, this will work fine.


    > They do require read-only access to shared data, though.
    > And if they are being spawned as sub-processes this could cause
    > problems, unless the multiprocessing module creates pipelines or other
    > means to handle this situation.


    With forking or multiprocessing, you have to use IPC. That is, usually
    pipes, unix sockets / named pipes, or shared memory. Multiprocessing
    helps you with this. Multiprocessing also has a convenient Queue
    object for serialised read/write access to a pipe.

    You can also create shared memory with mmap.mmap, using fd 0 with
    Windows or -1 with Linux.
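
    A minimal sketch of the shared-memory route on a Unix-like system,
    combining an anonymous mmap (fd -1) with os.fork. The function name
    sum_in_child and the 8-byte layout are assumptions made for
    illustration.

```python
import mmap
import os
import struct


def sum_in_child(values):
    """Compute sum(values) in a forked child and read it back via mmap."""
    # Anonymous mapping; on Unix it is shared with children created by fork.
    buf = mmap.mmap(-1, 8)
    pid = os.fork()
    if pid == 0:
        # Child: write the result into shared memory, then exit immediately.
        status = 1
        try:
            buf[:8] = struct.pack("q", sum(values))
            status = 0
        finally:
            os._exit(status)
    os.waitpid(pid, 0)  # parent: wait until the child has written
    (total,) = struct.unpack("q", buf[:8])
    buf.close()
    return total
```

    The child sees values through the copy made by fork, so read-only input
    needs no IPC at all; only the result travels through the mapping.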
    sturlamolden, Aug 19, 2009
    #10
  11. On Wednesday 19 August 2009 10:13:41 Paul Rubin wrote:
    > Hendrik van Rooyen <> writes:
    > > Just use thread then and thread.start_new_thread.
    > > It just works.

    >
    > The GIL doesn't apply to threads made like that?!


    The GIL does apply - I was talking nonsense again. Misread the OP's
    intention.

    - Hendrik
    Hendrik van Rooyen, Aug 19, 2009
    #11
  12. On 19 Aug, 05:27, Dave Angel <> wrote:

    > But if you do it that way, it's slower than sequential.  And if you have
    > a multi-core processor, or two processors, or ...   then it gets much
    > slower yet, and slows down other tasks as well.
    >
    > With the current GIL implementation, for two CPU-bound tasks, you either
    > do them sequentially, or make a separate process.


    For CPU bound tasks, one should put the bottleneck in C/Fortran/Cython
    and release the GIL. There is a speed penalty of 200x from using
    Python instead of C. With a quadcore processor you can gain less than
    4x speed-up by parallelizing. If you really care enough about speed to
    write parallel code, the first thing you should do is migrate the
    bottleneck to C.
    sturlamolden, Aug 19, 2009
    #12
  13. On 19 Aug, 05:16, sturlamolden <> wrote:

    > You should know about the GIL. It prevents multiple threads from using
    > the Python interpreter simultaneously. For parallel computing, this is
    > a blessing and a curse. Only C extensions can release the GIL; this
    > includes I/O routines in Python's standard library. If the GIL is not
    > released, C library calls are guaranteed to be thread-safe.
    > However, the Python interpreter will be blocked while waiting for the
    > library call to return. If the GIL is released, parallelization works
    > as expected; you can also utilise multi-core CPUs (it is a common
    > misbelief that Python cannot do this).



    Since I am at it, this is how the GIL can be released:

    - Many functions in Python's standard library, particularly all
    blocking i/o functions, release the GIL.

    - In C or C++ extensions, use the macros Py_BEGIN_ALLOW_THREADS and
    Py_END_ALLOW_THREADS.

    - With ctypes, functions called from a cdll release the GIL, whereas
    functions called from a pydll do not.

    - In f2py, declaring a Fortran function threadsafe in a .pyf file or
    cf2py comment releases the GIL.

    - In Cython or Pyrex, use a "with nogil:" block to execute code
    without holding the GIL.
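
    A small sketch of the ctypes case, assuming a Unix-like system where
    ctypes.CDLL(None) exposes the C library and its usleep function. The
    helper names sleep_in_c and timed_threads are invented for illustration.

```python
import ctypes
import threading
import time

# Assumption: Unix-like system; CDLL(None) gives access to the C library.
libc = ctypes.CDLL(None)
libc.usleep.argtypes = [ctypes.c_uint]
libc.usleep.restype = ctypes.c_int


def sleep_in_c(microseconds):
    """Call C usleep through CDLL; ctypes releases the GIL during the call."""
    libc.usleep(microseconds)


def timed_threads(nthreads, microseconds):
    """Run sleep_in_c in nthreads threads; return elapsed wall-clock seconds."""
    threads = [threading.Thread(target=sleep_in_c, args=(microseconds,))
               for _ in range(nthreads)]
    start = time.time()
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return time.time() - start
```

    Three threads sleeping 0.2 seconds each should finish in roughly 0.2
    seconds in total. Loading the same library with ctypes.PyDLL would keep
    the GIL held during each call, so the waits would run one after another.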


    Regards,
    Sturla Molden
    sturlamolden, Aug 19, 2009
    #13
  14. On 19 Aug, 05:27, Dave Angel <> wrote:

    > With the current GIL implementation, for two CPU-bound tasks, you either
    > do them sequentially, or make a separate process.


    I'd also like to add that most compute-bound code should be delegated
    to specialized C libraries, many of which are prewritten. For example,
    FFTW, Intel MKL, ATLAS, LAPACK, NAG. When you do this, the GIL has no
    consequence unless it is kept locked. So even for scientific programs,
    writing parallel compute-bound code mostly involves calling into a C
    or Fortran library with the GIL released. I have yet to see compute-
    bound code that could not be easily migrated to C or Fortran, either
    using existing libraries (the common case) or specialised code.
    sturlamolden, Aug 19, 2009
    #14
  15. On 19 Aug, 05:34, Hendrik van Rooyen <> wrote:

    > The GIL does apply - I was talking nonsense again.  Misread the OP's
    > intention.


    It depends on what the OP's functions "doStuff1" and "doStuff2"
    actually do. If they release the GIL (e.g. make I/O calls) it does not
    apply. The GIL only serializes access to the interpreter.
    sturlamolden, Aug 19, 2009
    #15
  16. On 18 Aug, 11:41, Stefan Behnel <> wrote:

    > I think the canonical answer is to use the threading module or (preferably)
    > the multiprocessing module, which is new in Py2.6.
    >
    > http://docs.python.org/library/threading.html
    > http://docs.python.org/library/multiprocessing.html
    >
    > Both share a (mostly) common interface and are simple enough to use. They
    > are pretty close to the above interface already.


    There is a big difference between them, which is that multiprocessing
    does not work with closures. This means that the threading module is
    simpler to use than multiprocessing if you want to parallelize serial
    code. You just wrap a closure around whatever block of code you want
    to run in a thread. For the same reason, programming with OpenMP is
    easier than using pthreads directly in C/C++. C does not have
    closures, which is the raison d'etre for OpenMP. Multiprocessing has
    the same limitation as an abstraction for parallel programming as
    pthreads in C. Python's threading module does not, but the GIL can be a
    limitation.
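
    A short sketch of where the difference comes from, assuming the
    limitation is the pickling that multiprocessing applies whenever it has
    to send a callable to another process (always on Windows, and for Pool
    work items everywhere). The names make_task and can_pickle are invented
    for illustration.

```python
import pickle
import threading


def make_task(results, factor):
    """Return a closure that captures results and factor."""
    def task(x):
        results.append(x * factor)
    return task


def can_pickle(obj):
    """Report whether obj survives pickle.dumps, as multiprocessing needs."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


def run_in_thread(task, argument):
    """A thread needs no pickling, so a closure can be its target directly."""
    thread = threading.Thread(target=task, args=(argument,))
    thread.start()
    thread.join()
```

    A closure runs fine as a thread target, while can_pickle reports False
    for it. A module-level function pickles by name, which is why it can be
    handed to multiprocessing.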
    sturlamolden, Aug 19, 2009
    #16
  17. sturlamolden wrote:

    > On 18 Aug, 11:41, Stefan Behnel <> wrote:
    >
    >> I think the canonical answer is to use the threading module or
    >> (preferably) the multiprocessing module, which is new in Py2.6.
    >>
    >> http://docs.python.org/library/threading.html
    >> http://docs.python.org/library/multiprocessing.html
    >>
    >> Both share a (mostly) common interface and are simple enough to use. They
    >> are pretty close to the above interface already.
    >
    > There is a big difference between them, which is that multiprocessing
    > does not work with closures. This means that the threading module is
    > simpler to use than multiprocessing if you want to parallelize serial
    > code. You just wrap a closure around whatever block of code you want
    > to run in a thread.


    Do you have an example of this technique?
    Neal Becker, Aug 19, 2009
    #17