threading support in python

Discussion in 'Python' started by km, Sep 4, 2006.

  1. km

    km Guest

    Hi all,

    Is there any PEP to introduce true threading features into Python's
    next version, as in Java? I mean without having the GIL.
    Compared to other languages, Python is fun to code in, but I feel it
    is lagging behind in threading.

    regards,
    KM
     
    km, Sep 4, 2006
    #1

  2. km

    bayerj Guest

    bayerj, Sep 4, 2006
    #2

  3. km

    km Guest

    Hi all,
    Are there any alternative ways of attaining true threading in Python?
    If the GIL doesn't go, does that mean Python is useless for
    computation-intensive scientific applications that need
    parallelization in a threading context?

    regards,
    KM
    ---------------------------------------------------------------------------
    On 4 Sep 2006 07:58:00 -0700, bayerj <> wrote:
    > Hi,
    >
    > GIL won't go. You might want to read
    > http://blog.ianbicking.org/gil-of-doom.html .
    >
    > Regards,
    > -Justin
    >
     
    km, Sep 4, 2006
    #3
  4. km

    bayerj Guest

    Hi,

    You might want to split your calculation onto different
    worker-processes.

    Then you can use POSH [1] to share data and objects.
    You might even want to go a step further and share the data via
    Sockets/XML-RPC or something like that. That makes it easy to throw
    additional boxes at a specific calculation, because it can be set up in
    about no time.
    You can even use Twisted Spread [2] and its perspective broker to do
    this on a higher level.
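
    A minimal sketch of the worker-process idea, assuming a current Python
    with the standard-library multiprocessing module (which postdates this
    thread) rather than POSH. The names partial_sum and parallel_sum and the
    chunking scheme are made up for illustration, and the "fork" start
    method is assumed, so this is Unix-only as written.

```python
import multiprocessing as mp


def partial_sum(bounds):
    """CPU-bound work for one worker: sum of squares over [start, stop)."""
    start, stop = bounds
    return sum(i * i for i in range(start, stop))


def parallel_sum(n, workers=4):
    """Split range(n) into chunks and sum the squares in separate processes."""
    step = max(1, n // workers)
    chunks = [(lo, min(lo + step, n)) for lo in range(0, n, step)]
    # Assumption: the "fork" start method (Unix only). Each worker is a
    # separate interpreter process with its own GIL.
    ctx = mp.get_context("fork")
    with ctx.Pool(processes=workers) as pool:
        return sum(pool.map(partial_sum, chunks))


if __name__ == "__main__":
    print(parallel_sum(1_000_000))
```

    Only the chunk bounds and the partial results cross the process
    boundary here, which keeps the inter-process traffic small.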

    If that's not what you want, you are left with Java I guess.

    Regards,
    -Justin

    [1] http://poshmodule.sourceforge.net/
    [2] http://twistedmatrix.com/projects/core/documentation/howto/pb.html
     
    bayerj, Sep 4, 2006
    #4
  5. km enlightened us with:
    > Is there any PEP to introduce true threading features into python's
    > next version as in java? i mean without having GIL.


    What is GIL? Except for the Dutch word for SCREAM that is...

    > when compared to other languages, python is fun to code but i feel
    > its is lacking behind in threading


    What's wrong with the current threading? AFAIK it's directly linked to
    the threading of the underlying platform.

    Sybren
    --
    Sybren Stüvel
    Stüvel IT - http://www.stuvel.eu/
     
    Sybren Stuvel, Sep 4, 2006
    #5
  6. "km" <> wrote in message
    news:...

    > if GIL doesnt go then does it mean that python is useless for
    > computation intensive scientific applications which are in need of
    > parallelization in threading context ?


    No.
     
    Richard Brodie, Sep 4, 2006
    #6
  7. Sybren Stuvel wrote:

    > km enlightened us with:
    >> Is there any PEP to introduce true threading features into python's
    >> next version as in java? i mean without having GIL.

    >
    > What is GIL? Except for the Dutch word for SCREAM that is...


    The global interpreter lock, which prevents Python threads from
    concurrently modifying the interpreter's internal structures and
    causing segfaults.
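
    A rough way to see the effect, as a sketch only: time the same
    pure-Python loop run twice in a row and then in two threads. The
    function names and the loop size are arbitrary choices for
    illustration. On a CPython build with the GIL, the threaded run is
    typically no faster than the sequential one, because only one thread
    executes bytecode at a time; the exact timings depend on the machine.

```python
import threading
import time


def count_down(n):
    """Pure-Python CPU-bound loop; it needs the GIL for every step."""
    while n > 0:
        n -= 1


def time_sequential(n):
    """Run the loop twice, one after the other, and return elapsed seconds."""
    start = time.perf_counter()
    count_down(n)
    count_down(n)
    return time.perf_counter() - start


def time_threaded(n):
    """Run the loop in two threads at once and return elapsed seconds."""
    threads = [threading.Thread(target=count_down, args=(n,)) for _ in range(2)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    n = 2_000_000
    print("sequential: %.3f s" % time_sequential(n))
    print("threaded:   %.3f s" % time_threaded(n))
```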

    >> when compared to other languages, python is fun to code but i feel
    >> its is lacking behind in threading

    >
    > What's wrong with the current threading? AFAIK it's directly linked to
    > the threading of the underlying platform.


    There exist rare cases (see the link from bayerj) where the GIL is an
    annoyance, and with the dawn of multi-core CPUs all over the place,
    removing it might be considered a good idea - maybe. But I doubt that is
    something to be considered for py2.x

    Diez
     
    Diez B. Roggisch, Sep 4, 2006
    #7
  8. km

    Sandra-24 Guest

    The trouble is there are some environments where you are forced to use
    threads. Apache and mod_python are an example. You can't make use of
    multiple CPUs unless you're on *nix and run with multiple processes AND
    your application doesn't store large amounts of data in memory (which
    mine does), so you'd have to physically double the computer's memory for
    a dual-core, or quadruple it for a quad-core. And forget about running a
    Windows server; Apache will not even run with multiple processes there.

    In years to come this will be more of an issue because single-core CPUs
    will be harder to come by; you'll be throwing away half of every CPU
    you buy.

    -Sandra
     
    Sandra-24, Sep 4, 2006
    #8
  9. km wrote:
    > Is there any PEP to introduce true threading features into python's
    > next version as in java? i mean without having GIL.
    > when compared to other languages, python is fun to code but i feel its
    > is lacking behind in threading


    Some of the technical problems:

    - probably breaks compatibility of extensions at the source level in a
    big way, although this might be handled by SWIG, boost and other code
    generators
    - reference counting will have to be synchronized, which means that
    Python will become slower
    - removing reference counting and relying on garbage collection alone
    will break many Python applications (because they rely on files being
    closed at end of scope etc.)
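
    The last point can be sketched as follows. The two function names are
    invented for illustration; the with statement itself is standard Python
    (PEP 343, Python 2.5 and later). The first function relies on CPython's
    reference counting to close the file promptly, the second does not
    depend on the memory manager at all.

```python
import os
import tempfile


def write_relying_on_refcount(path, text):
    # Works on CPython because the file object's reference count drops to
    # zero as soon as the statement finishes; other implementations may
    # close the file only when their garbage collector gets around to it.
    open(path, "w").write(text)


def write_with_explicit_close(path, text):
    # Portable: the with statement closes the file when the block exits.
    with open(path, "w") as f:
        f.write(text)
    return f.closed


if __name__ == "__main__":
    target = os.path.join(tempfile.mkdtemp(), "demo.txt")
    print(write_with_explicit_close(target, "hello"))
```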

    Daniel
     
    Daniel Dittmar, Sep 4, 2006
    #9
  10. Daniel Dittmar wrote in news:edhl07$b2t$-ag.de in
    comp.lang.python:

    > - removing reference counting and relying on garbage collection alone
    > will break many Python applications (because they rely on files being
    > closed at end of scope etc.)
    >


    They are already broken on at least 2 python implementations, so
    why worry about another one.

    Rob.
    --
    http://www.victim-prime.dsl.pipex.com/
     
    Rob Williscroft, Sep 4, 2006
    #10
  11. km

    Guest

    Sandra-24 wrote:
    > The trouble is there are some environments where you are forced to use
    > threads. Apache and mod_python are an example. You can't make use of
    > multiple CPUs unless you're on *nix and run with multiple processes AND
    > your application doesn't store large amounts of data in memory (which
    > mine does), so you'd have to physically double the computer's memory for
    > a dual-core, or quadruple it for a quad-core.


    You seem to be confused about the nature of multiple-process
    programming.

    If you're on a modern Unix/Linux platform and you have static read-only
    data, you can just read it in before forking and it'll be shared
    between the processes.
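
    A minimal sketch of that pattern, assuming a Unix system with
    os.fork. LOOKUP is a stand-in for the large read-only data and
    sum_in_child is a made-up helper; the child reads the data it
    inherited and sends back only a small result through a pipe.

```python
import os

# Read-only data loaded once, before forking (a stand-in for a large table).
LOOKUP = list(range(100_000))


def sum_in_child():
    """Fork a child that reads LOOKUP without it being sent or reloaded."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()  # Unix only
    if pid == 0:
        # Child: sees the parent's memory pages copy-on-write.
        try:
            os.close(read_fd)
            os.write(write_fd, str(sum(LOOKUP)).encode())
            os.close(write_fd)
        finally:
            os._exit(0)  # never fall through into the parent's code
    # Parent: read the child's answer, then reap the child.
    os.close(write_fd)
    with os.fdopen(read_fd, "rb") as reader:
        result = int(reader.read())
    os.waitpid(pid, 0)
    return result


if __name__ == "__main__":
    print(sum_in_child())
```

    One caveat specific to CPython: reference counts are stored inside the
    objects, so even reading Python objects writes to their memory pages,
    and the sharing is less complete than it would be for a C program.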

    If it's read/write data or you're not on a Unix platform, you can use
    shared memory to share it between many processes.

    Threads are way overused in modern multiexecution programming. The
    decision on whether to use processes or threads should come down to
    whether you want to share everything, or whether you have specific
    pieces of data you want to share. With processes + shm, you can gain
    the security of protected memory for the majority of your code + data,
    only sacrificing it where you need to share the data.
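
    As a sketch of "processes + shm" in current Python, the
    standard-library multiprocessing.Value (which postdates this thread)
    places a single C integer in shared memory and pairs it with a lock.
    The helper names add_many and run are hypothetical, and the "fork"
    start method is assumed (Unix only).

```python
import multiprocessing as mp


def add_many(counter, times):
    """Worker: increment the shared counter under its built-in lock."""
    for _ in range(times):
        with counter.get_lock():
            counter.value += 1


def run(workers=4, times=1000):
    """Start the workers and return the final value of the shared counter."""
    ctx = mp.get_context("fork")  # assumption: Unix
    counter = ctx.Value("i", 0)   # one C int living in shared memory
    procs = [ctx.Process(target=add_many, args=(counter, times))
             for _ in range(workers)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return counter.value


if __name__ == "__main__":
    print(run())
```

    Everything except the counter stays private to each process, which is
    the protection the paragraph above describes.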

    The entire Windows programming world tends to be so biased toward
    multithreading that they often don't even acknowledge the existence of
    generally superior alternatives. I think that's in large part because
    historically on Windows 3.1/95/98 there was no good way to create
    processes without running a new binary, and so a culture of threading
    grew up. Even today many Windows programmers are unfamiliar with using
    NtCreateProcessEx with SectionHandle=NULL for efficient copy-on-write
    process creation.

    > And forget about running a
    > windows server, apache will not even run with multiple processes.


    It used to run on Windows with multiple processes. If it really won't
    now, use an older version or contribute a fix.

    Now, the GIL is independent of this; if you really need threading in
    your situation (you share almost everything and have hugely complex
    data structures that are difficult to maintain in shm) then you're
    still going to run into GIL serialization. If you're doing a lot of
    work in native code extensions, which can release the GIL while they
    run, this may not actually be a big performance hit; if not, it can
    be pretty bad.
     
    , Sep 4, 2006
    #11
  12. km

    Paul Rubin Guest

    "" <> writes:
    > If it's read/write data or you're not on a Unix platform, you can use
    > shared memory to share it between many processes.
    >
    > Threads are way overused in modern multiexecution programming. The
    > decision on whether to use processes or threads should come down to
    > whether you want to share everything, or whether you have specific
    > pieces of data you want to share.


    Shared memory means there's a byte vector (the shared memory region)
    accessible to multiple processes. The processes don't use the same
    machine addresses to reference the vector. Any data structures
    (e.g. those containing pointers) shared between the processes have to
    be marshalled in and out of the byte vector instead of being accessed
    normally. Any live objects such as open sockets have to be shared
    some other way. It's not a matter of sharing "everything"; shared
    memory is a pain in the neck even to share a single object. These
    things really can be easier with threads.
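
    To illustrate the marshalling step, here is a sketch using the
    standard-library multiprocessing.shared_memory module (Python 3.8+,
    so long after this thread). The 4-byte length prefix and the helpers
    put, get and roundtrip are an invented layout for illustration, and
    pickle stands in for whatever serialization is used. Both handles are
    opened in one process for brevity; a second process would attach by
    name in the same way.

```python
import pickle
import struct
from multiprocessing import shared_memory

HEADER = struct.Struct("<I")  # 4-byte length prefix (illustrative layout)


def put(shm, obj):
    """Marshal obj into the byte vector: length prefix, then pickled bytes."""
    payload = pickle.dumps(obj)
    if HEADER.size + len(payload) > shm.size:
        raise ValueError("object does not fit in the shared block")
    HEADER.pack_into(shm.buf, 0, len(payload))
    shm.buf[HEADER.size:HEADER.size + len(payload)] = payload


def get(shm):
    """Unmarshal a copy of the object stored by put()."""
    (length,) = HEADER.unpack_from(shm.buf, 0)
    return pickle.loads(bytes(shm.buf[HEADER.size:HEADER.size + length]))


def roundtrip(obj, size=4096):
    """Write obj through one handle and read it back through another."""
    owner = shared_memory.SharedMemory(create=True, size=size)
    try:
        put(owner, obj)
        peer = shared_memory.SharedMemory(name=owner.name)
        try:
            return get(peer)
        finally:
            peer.close()
    finally:
        owner.close()
        owner.unlink()


if __name__ == "__main__":
    print(roundtrip({"a": [1, 2, 3]}))
```

    Note that get() returns a fresh copy, not the original object: later
    changes on one side are invisible to the other until they are
    marshalled again.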
     
    Paul Rubin, Sep 4, 2006
    #12
  13. Rob Williscroft wrote:
    > Daniel Dittmar wrote in news:edhl07$b2t$-ag.de in
    > comp.lang.python:
    >
    >
    >>- removing reference counting and relying on garbage collection alone
    >>will break many Python applications (because they rely on files being
    >>closed at end of scope etc.)
    >>

    >
    >
    > They are already broken on at least 2 python implementations, so
    > why worry about another one.


    I guess few applications or libraries are being ported from CPython to
    Jython or IronPython as each is targeting a different standard library,
    so this isn't that much of a problem yet.

    Daniel
     
    Daniel Dittmar, Sep 4, 2006
    #13
  14. km

    Sandra-24 Guest

    > You seem to be confused about the nature of multiple-process
    > programming.
    >
    > If you're on a modern Unix/Linux platform and you have static read-only
    > data, you can just read it in before forking and it'll be shared
    > between the processes.


    Not familiar with *nix programming, but I'll take your word on it.

    > If it's read/write data or you're not on a Unix platform, you can use
    > shared memory to share it between many processes.


    I know how shared memory works, it's the last resort in my opinion.

    > Threads are way overused in modern multiexecution programming. The


    <snip>

    > It used to run on windows with multiple processes. If it really won't
    > now, use an older version or contribute a fix.


    First of all, I'm not in control of spawning processes or threads.
    Apache does that, and Apache has no MPM for Windows that uses more than
    1 process. Secondly, "superior" is definitely a matter of opinion. Let's
    see how you would define superior.

    1) Port (a nicer word for rewrite) the worker MPM from *nix to Windows.
    2) Alternatively, switch to running Linux servers (which have their
    pluses), about which I know nothing. I've been using Windows since
    I was 10 years old, I'm confident in my ability to build, secure, and
    maintain a Windows server. I don't think anyone would recommend me to
    run Linux servers with very little in the way of Linux experience.
    3) Rewrite my codebase to use some form of shared memory. This would be
    a terrible nightmare that would take at least a month of development
    time and a lot of heavy rewriting. It would be very difficult, but I'll
    grant that it may work if done properly with only small performance
    losses. Sounds like a deal.

    I would have an easier time, I think, porting mod_python to .NET and
    leaving that GIL behind forever. Thankfully, I'm not considering such
    drastic measures - yet.

    Why on earth would I want to do all of that work? Just because you want
    to keep this evil thing called a GIL? My suggestion is, in Python 3,
    ditch the ref counting, use a real garbage collector, and make that GIL
    walk the plank. I have my doubts that it would happen, but that's fine,
    the future of python is in things like IronPython and PyPy. CPython's
    days are numbered. If there was a mod_dotnet I wouldn't be using
    CPython anymore.

    > Now, the GIL is independent of this; if you really need threading in
    > your situation (you share almost everything and have hugely complex
    > data structures that are difficult to maintain in shm) then you're
    > still going to run into GIL serialization. If you're doing a lot of
    > work in native code extensions this may not actually be a big
    > performance hit, if not it can be pretty bad.


    Actually, I'm not sure I understand you correctly. You're saying that
    in an environment like apache (with 250 threads or so) and my hugely
    complex shared data structures, that the GIL is going to cause a huge
    performance hit? So even if I do manage to find my way around in the
    Linux world, and I upgrade my memory, I'm still going to be paying for
    that darned GIL?

    Will the madness never end?
    -Sandra
     
    Sandra-24, Sep 5, 2006
    #14
  15. km

    Steve Holden Guest

    Sandra-24 wrote:
    [Sandra understands shared memory]
    >
    > I would find an easier time, I think, porting mod_python to .net and
    > leaving that GIL behind forever. Thankfully, I'm not considering such
    > drastic measures - yet.
    >

    Quite right too. You haven't even sacrificed a chicken yet ...

    > Why on earth would I want to do all of that work? Just because you want
    > to keep this evil thing called a GIL? My suggestion is in python 3
    > ditch the ref counting, use a real garbage collector, and make that GIL
    > walk the plank. I have my doubts that it would happen, but that's fine,
    > the future of python is in things like IronPython and PyPy. CPython's
    > days are numbered. If there was a mod_dotnet I wouldn't be using
    > CPython anymore.
    >

    You write as though the GIL was invented to get in the programmer's way,
    which is quite wrong. It's there to avoid deep problems with thread
    interaction. Languages that haven't bitten that bullet can bite you in
    quite nasty ways when you write threaded applications.

    Contrary to your apparent opinion, the GIL has nothing to do with
    reference-counting.
    >
    >>Now, the GIL is independent of this; if you really need threading in
    >>your situation (you share almost everything and have hugely complex
    >>data structures that are difficult to maintain in shm) then you're
    >>still going to run into GIL serialization. If you're doing a lot of
    >>work in native code extensions this may not actually be a big
    >>performance hit, if not it can be pretty bad.

    >
    >
    > Actually, I'm not sure I understand you correctly. You're saying that
    > in an environment like apache (with 250 threads or so) and my hugely
    > complex shared data structures, that the GIL is going to cause a huge
    > performance hit? So even if I do manage to find my way around in the
    > Linux world, and I upgrade my memory, I'm still going to be paying for
    > that darned GIL?
    >

    I think the suggestion was rather that abandoning Python because of the
    GIL might be premature optimisation. But since you appear to be sticking
    with it, that might have been unnecessary advice.

    > Will the madness never end?


    This reveals an opinion of the development team that's altogether too
    low. I believe the GIL was introduced for good reasons.

    regards
    Steve
    --
    Steve Holden +44 150 684 7255 +1 800 494 3119
    Holden Web LLC/Ltd http://www.holdenweb.com
    Skype: holdenweb http://holdenweb.blogspot.com
    Recent Ramblings http://del.icio.us/steve.holden
     
    Steve Holden, Sep 5, 2006
    #15
  16. km

    Paul Rubin Guest

    Steve Holden <> writes:
    > You write as though the GIL was invented to get in the programmer's
    > way, which is quite wrong. It's there to avoid deep problems with
    > thread interaction. Languages that haven't bitten that bullet can bite
    > you in quite nasty ways when you write threaded applications.


    And yet, Java programmers manage to write threaded applications all
    day long without getting bitten (once they're used to the issues),
    despite usually being less skilled than Python programmers ;-).

    > Contrary to your apparent opinion, the GIL has nothing to do with
    > reference-counting.


    I think it does, i.e. one of the GIL's motivations was to protect the
    management of reference counts in CPython, which otherwise wasn't
    thread-safe. The obvious implementation of Py_INCREF has a race
    condition, for example. The GIL documentation at

    http://docs.python.org/api/threads.html

    describes this in its very first paragraph.
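
    The race in question is the classic lost update on a
    read-modify-write. The sketch below is a Python-level analogy only,
    not CPython's C code: Counter and hammer are invented names, and an
    explicit threading.Lock plays the role that the GIL plays for
    reference counts.

```python
import threading


class Counter:
    """A counter whose increment is a read-modify-write, like a refcount."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # Without the lock, two threads could both read the same old value
        # and one of the two updates would be lost.
        with self._lock:
            self.value += 1


def hammer(threads=4, times=10_000):
    """Increment one shared Counter from several threads; return the total."""
    counter = Counter()

    def work():
        for _ in range(times):
            counter.increment()

    pool = [threading.Thread(target=work) for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return counter.value


if __name__ == "__main__":
    print(hammer())
```

    Taking a lock like this on every single increment and decrement is the
    cost that a per-object locking scheme would add.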

    > > Will the madness never end?

    >
    > This reveals an opinion of the development team that's altogether too
    > low. I believe the GIL was introduced for good reasons.


    The GIL was an acceptable tradeoff when it was first created in the
    previous century. First of all, it gave a way to add threads to the
    existing, non-threadsafe CPython implementation without having to
    rework the old code too much. Second, Python was at that time
    considered a "scripting language" and there was less concern about
    writing complex apps in it, especially multiprocessing apps. Third,
    multiprocessor computers were themselves exotic, so people who wanted
    to program them probably had exotic problems that they were willing to
    jump through hoops to solve.

    These days, even semi-entry-level consumer laptop computers have dual
    core CPU's, and quad Opteron boxes (8-way multiprocessing using X2
    processors) are quite affordable for midrange servers or engineering
    workstations, and there's endless desire to write fancy server apps
    completely in Python. There is no point paying for all that
    multiprocessor hardware if your programming language won't let you use
    it. So, Python must punt the GIL if it doesn't want to keep
    presenting undue obstacles to writing serious apps on modern hardware.
     
    Paul Rubin, Sep 5, 2006
    #16
  17. 4 Sep 2006 19:19:24 -0700, Sandra-24 <>:
    > If there was a mod_dotnet I wouldn't be using
    > CPython anymore.


    I guess you won't be using CPython anymore, then: http://www.mono-project.com/Mod_mono

    --
    Felipe.
     
    Felipe Almeida Lessa, Sep 5, 2006
    #17
  18. km

    Sandra-24 Guest

    Steve Holden wrote:
    > Quite right too. You haven't even sacrificed a chicken yet ...


    Hopefully we don't get to that point.

    > You write as though the GIL was invented to get in the programmer's way,
    > which is quite wrong. It's there to avoid deep problems with thread
    > interaction. Languages that haven't bitten that bullet can bite you in
    > quite nasty ways when you write threaded applications.


    I know it was put there because it is meant to be a good thing.
    However, it gets in my way. I would be perfectly happy if it were gone.
    I've never written code that assumes there's a GIL. I always write my
    code with all shared writable objects protected by locks. It's far more
    portable, and a good habit to get into. You realize that because of the
    GIL, they were discussing (and may have already implemented) Java style
    synchronized dictionaries and lists for IronPython simply because
    python programmers just assume they are thread safe thanks to the GIL.
    I always hated that about Java. If you want to give me thread safe
    collections, fine, they'll be nice for sharing between threads, but
    don't make me use synchronized collections for single-threaded code.
    You'll notice the newer Java collections are not synchronized, it would
    seem I'm not alone in that opinion.

    > Contrary to your apparent opinion, the GIL has nothing to do with
    > reference-counting.


    Actually it does. Without the GIL reference counting is not thread
    safe. You have to synchronize all reference count accesses, increments,
    and decrements because you have no way of knowing which objects get
    shared across threads. I think with Python's current memory management,
    the GIL is the lesser evil.
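
    To make the point concrete, sys.getrefcount shows how ordinary
    operations touch the count. This is a CPython-specific sketch; the
    absolute numbers are an implementation detail, so only the
    differences are meaningful.

```python
import sys

obj = object()
before = sys.getrefcount(obj)

alias = obj                     # binding another name increments the count
after = sys.getrefcount(obj)

del alias                       # unbinding the name decrements it again
restored = sys.getrefcount(obj)

print(after - before, restored - before)
```

    Every assignment, argument pass and container insertion performs such
    an update, which is why unsynchronized counts are unsafe as soon as an
    object can be reached from two threads.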

    I'm mostly writing this to provide a different point of view. Many
    people seem to think (see the previously linked blog) that there is no
    downside to the GIL, and that's just not true. However, I don't expect that the
    GIL can be safely removed from CPython. I also think that it doesn't
    matter because projects like IronPython and PyPy are very likely the
    way of the future for Python anyway. Once you move away from C there
    are so many more things you can do.

    > I think the suggestion was rather that abandoning Python because of the
    > GIL might be premature optimisation. But since you appear to be sticking
    > with it, that might have been unnecessary advice.


    I would never abandon Python, and I hold the development team in very
    high esteem. That doesn't mean there aren't a few things (like the GIL,
    or super) that I don't like. But overall they've done an excellent job on
    the 99% of things they've got right. I guess we don't say that enough.

    I might switch from CPython to another implementation sometime, but it
    won't be because of the GIL. I'm very fond of the .NET framework as a
    library, and I'd also rather write performance-critical code in C# than
    C (who wouldn't?). I'm also watching PyPy with interest.

    -Sandra
     
    Sandra-24, Sep 5, 2006
    #18
  19. km

    Bryan Olson Guest

    bayerj wrote:

    > Then you can use POSH [1] to share data and objects.


    Do you use POSH? How well does it work with current Python?
    Any major gotchas?

    I think POSH looks like a great thing to have, but the latest
    version is an alpha from over three years ago. Also, it only
    runs on *nix systems.


    --
    --Bryan
     
    Bryan Olson, Sep 5, 2006
    #19
  20. km

    Sandra-24 Guest

    Felipe Almeida Lessa wrote:
    > 4 Sep 2006 19:19:24 -0700, Sandra-24 <>:
    > > If there was a mod_dotnet I wouldn't be using
    > > CPython anymore.

    >
    > I guess you won't be using then: http://www.mono-project.com/Mod_mono
    >

    Oh I'm aware of that, but it's not what I'm looking for. Mod_mono just
    lets you run ASP.NET on Apache. I'd much rather use Python :) Now if
    there was a way to run IronPython on Apache I'd be interested.

    -Sandra
     
    Sandra-24, Sep 5, 2006
    #20
