threading and multicores, pros and cons

Discussion in 'Python' started by Maric Michaud, Feb 14, 2007.

  1. This is a recurring problem I encounter when I try to sell Python solutions to
    my customers. I'm aware that this problem is sometimes overlooked, but that
    is the law of the market.

    I've heard a bunch of arguments defending Python's choice of a GIL, but I'm
    not quite sure of their technical background, nor of what is really important
    and what is not. These discussions often end with a prudent "Python has made one
    choice among others"... which is not really convincing.

    If some guru has a good recipe, or would like to summarize the main points, it
    would be really appreciated.

    regards,

    --
    _____________

    Maric Michaud
    _____________

    Aristote - www.aristote.info
    3 place des tapis
    69004 Lyon
    Tel: +33 426 880 097
    Mobile: +33 632 77 00 21
    Maric Michaud, Feb 14, 2007

  2. Paul Rubin

    Maric Michaud <> writes:
    > If some guru has a good recipe, or would like to summarize the main points, it
    > would be really appreciated.


    Basically Python applications are usually not too CPU-intensive; there
    are some ways you can get parallelism with reasonable extra effort;
    and for most of Python's history, multi-CPU systems have been rather
    exotic so the GIL didn't create too big a problem. Right now it is
    starting to become more of a problem than before, but it's not yet
    intolerable. Obviously something will have to be done about it in the
    long run, maybe with PyPy.
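    A minimal sketch of one such route, assuming the work splits into independent
    chunks: each chunk runs in its own interpreter process (started with the
    standard subprocess module), so each process has its own GIL and the operating
    system can schedule them on different cores. The sum-of-squares workload and
    the parallel_sum_of_squares helper are hypothetical.

```python
import subprocess
import sys

# Hypothetical CPU-bound job run in a child interpreter:
# sum of i * i for i in range(start, stop), printed to stdout.
CHILD_CODE = (
    "import sys\n"
    "start, stop = int(sys.argv[1]), int(sys.argv[2])\n"
    "print(sum(i * i for i in range(start, stop)))\n"
)


def parallel_sum_of_squares(n, workers=4):
    """Split range(n) into chunks and compute each chunk in its own process."""
    step = max(1, -(-n // workers))  # ceiling division, at least 1
    procs = []
    for start in range(0, n, step):
        stop = min(start + step, n)
        # Start every child before waiting on any, so the chunks run concurrently.
        procs.append(subprocess.Popen(
            [sys.executable, "-c", CHILD_CODE, str(start), str(stop)],
            stdout=subprocess.PIPE, text=True))
    total = 0
    for proc in procs:
        out, _ = proc.communicate()  # wait for this child and read its output
        if proc.returncode != 0:
            raise RuntimeError(f"worker exited with status {proc.returncode}")
        total += int(out)
    return total


if __name__ == "__main__":
    print(parallel_sum_of_squares(1_000_000))
```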
    Paul Rubin, Feb 14, 2007

  3. On Wednesday 14 February 2007 05:49, Paul Rubin wrote:
    > Basically Python applications are usually not too CPU-intensive; there
    > are some ways you can get parallelism with reasonable extra effort;

    Basically, even when it is not CPU-intensive, an application server needs to
    benefit from all the resources of the hardware.
    When a customer comes with his beautiful new dual-core server and gets a basic
    Plone install up and running, he will immediately compare it to J2EE and
    wonder why he should pay a consultant to make it work properly.
    At that point, it's not easy to explain to him that Python is not flawed
    compared to Java, and that he will not regret his choice in the future.
    First impressions may be decisive.

    The historical explanation won't be effective here, I'm afraid. What about
    the argument that multithreading is not so good for parallelism?
    Is it strong enough?

    Maric Michaud, Feb 14, 2007
  4. Paul Rubin

    Maric Michaud <> writes:

    > On Wednesday 14 February 2007 05:49, Paul Rubin wrote:
    > > Basically Python applications are usually not too CPU-intensive; there
    > > are some ways you can get parallelism with reasonable extra effort;

    > Basically, even when it is not CPU-intensive, an application server needs
    > to benefit from all the resources of the hardware.


    But this is impossible--if the application is not CPU intensive, by
    definition it leaves a lot of the available CPU cycles unused.

    > When a customer comes with his beautiful new dual-core server and
    > gets a basic Plone install up and running, he will immediately
    > compare it to J2EE and wonder why he should pay a consultant to make
    > it work properly. At that point, it's not easy to explain to him that
    > Python is not flawed compared to Java, and that he will not regret
    > his choice in the future. First impressions may be decisive.


    That's true: parallelism is an area where Java is ahead of us.

    > The historical explanation won't be effective here, I'm
    > afraid. What about the argument that multithreading is
    > not so good for parallelism? Is it strong enough?


    It's not much good for parallelism in the typical application that
    spends most of its time blocked waiting for I/O. That describes many
    applications; it might even be most applications. But there are
    still CPU-intensive applications that can benefit from
    parallelism, and Python has a weak spot there.
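    For the I/O-bound case, a minimal sketch assuming CPython, with time.sleep
    standing in for a blocking I/O call: blocking calls release the GIL, so
    several threads can wait at the same time even though only one of them
    executes Python bytecode at any moment. The fake_io and run_threads helpers
    are hypothetical.

```python
import threading
import time


def fake_io(results, index, delay):
    # time.sleep stands in for a blocking I/O call; CPython releases the GIL while sleeping.
    time.sleep(delay)
    results[index] = index * 10


def run_threads(count=4, delay=0.2):
    """Run `count` fake I/O waits in threads; return (results, elapsed seconds)."""
    results = [None] * count
    threads = [threading.Thread(target=fake_io, args=(results, i, delay))
               for i in range(count)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, time.perf_counter() - start


if __name__ == "__main__":
    results, elapsed = run_threads()
    # The waits overlap, so elapsed should be near one delay, not count * delay.
    print(results, f"{elapsed:.2f}s")
```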
    Paul Rubin, Feb 14, 2007
  5. On Feb 13, 9:07 pm, Maric Michaud <> wrote:
    > I've heard a bunch of arguments defending Python's choice of a GIL, but I'm
    > not quite sure of their technical background, nor of what is really important
    > and what is not. These discussions often end with a prudent "Python has made one
    > choice among others"... which is not really convincing.


    Well, INAG (I'm not a Guru), but we recently had training from a Guru.
    When we brought up this question, his response was fairly simple.
    Paraphrased for inaccuracy:

    "Some time back, a group did remove the GIL from the Python core, and
    implemented locks on the core code to make it threadsafe. Well, the
    problem was that while it worked, the necessary locks made single-threaded
    code take significantly longer to execute."

    He then proceeded to show us how to achieve the same effect
    (multithreading python for use on multi-core computers) using popen2
    and stdio pipes.
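    A rough sketch of that pipe-based approach, written with the subprocess
    module (the modern replacement for popen2). The worker script and its
    one-integer-per-line protocol are made up for illustration. Each worker is a
    separate interpreter with its own GIL, and requests are sent to all workers
    before any reply is read, so they can compute at the same time.

```python
import subprocess
import sys

# Hypothetical worker protocol: one integer per line on stdin,
# one reply line (its square) on stdout, flushed immediately.
WORKER_CODE = (
    "import sys\n"
    "for line in iter(sys.stdin.readline, ''):\n"
    "    n = int(line)\n"
    "    print(n * n, flush=True)\n"
)


def start_worker():
    """Launch a separate Python interpreter (with its own GIL) as a worker."""
    return subprocess.Popen(
        [sys.executable, "-c", WORKER_CODE],
        stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True)


def map_over_workers(workers, numbers):
    """Send one number to each worker, then collect the replies in order."""
    for worker, n in zip(workers, numbers):
        worker.stdin.write(f"{n}\n")
        worker.stdin.flush()
    # All workers are computing while we block on the first reply.
    return [int(worker.stdout.readline()) for worker, _ in zip(workers, numbers)]


def stop_worker(worker):
    worker.stdin.close()  # EOF ends the worker's read loop
    worker.stdout.close()
    return worker.wait()


if __name__ == "__main__":
    workers = [start_worker() for _ in range(3)]
    try:
        print(map_over_workers(workers, [2, 3, 4]))
    finally:
        for w in workers:
            stop_worker(w)
```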

    FWIW, ~G
    , Feb 14, 2007
  6. On Feb 14, 1:33 am, Maric Michaud <> wrote:

    > At that point, it's not easy to explain to him that Python
    > is not flawed compared to Java, and that he will not
    > regret his choice in the future.


    Database adaptors such as psycopg do release the GIL while connecting
    and exchanging data. Apache's MPM (Multi-Processing Module) can run
    mod_python, and with it multiple Python instances as separate
    processes, thus avoiding the global lock as well.
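    A rough standard-library analogue of the first point: the hashlib
    documentation notes that CPython releases the GIL while hashing data larger
    than 2047 bytes, so threads hashing large buffers can run on several cores
    at once, much as a database adaptor releases the GIL around its own C-level
    work. The digest and parallel_digests helpers are hypothetical.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def digest(block):
    # For large inputs, CPython's hashlib drops the GIL inside the C hashing code.
    return hashlib.sha256(block).hexdigest()


def parallel_digests(blocks, workers=4):
    """Hash each block in a thread pool; results come back in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(digest, blocks))


if __name__ == "__main__":
    blocks = [bytes([i]) * (8 * 1024 * 1024) for i in range(8)]  # eight 8 MiB buffers
    print(parallel_digests(blocks)[:2])
```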

    > Plone install up and running, he will immediately compare it to
    > J2EE and wonder why he should pay a consultant to make it work properly.


    I really doubt that any performance difference will be due to the
    global interpreter lock. This is not how things work. You most certainly
    have far more substantial bottlenecks in each application.
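    The usual way to find those bottlenecks is to profile before blaming the GIL.
    A minimal sketch with the standard cProfile and pstats modules; slow_part,
    fast_part, and handle_request are made-up stand-ins for real application code.

```python
import cProfile
import io
import pstats


def slow_part():
    # Deliberately wasteful loop standing in for a real hotspot.
    return sum(str(i).count("7") for i in range(200_000))


def fast_part():
    return len("hello world")


def handle_request():
    return slow_part() + fast_part()


def profile_report(func, limit=5):
    """Run func under cProfile; return (result, top entries sorted by cumulative time)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func)
    buf = io.StringIO()
    pstats.Stats(profiler, stream=buf).sort_stats("cumulative").print_stats(limit)
    return result, buf.getvalue()


if __name__ == "__main__":
    result, report = profile_report(handle_request)
    print(report)
```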

    i.
    Istvan Albert, Feb 14, 2007
  7. In article <>,
    Maric Michaud <> wrote:

    > This is a recurring problem I encounter when I try to sell Python solutions
    > to my customers. I'm aware that this problem is sometimes overlooked, but
    > that is the law of the market.
    >
    > I've heard a bunch of arguments defending Python's choice of a GIL, but I'm
    > not quite sure of their technical background, nor of what is really important
    > and what is not. These discussions often end with a prudent "Python has made one
    > choice among others"... which is not really convincing.
    >
    > If some guru has a good recipe, or would like to summarize the main points, it
    > would be really appreciated.


    When designing a new Python application I read a fair amount about the
    implications of multiple cores for using threads versus processes, and
    decided that using multiple processes was the way to go for me. On that
    note, there is a (sort of) new module available that allows interprocess
    communication via shared memory and semaphores with Python. You can find
    it here:
    http://NikitaTheSpider.com/python/shm/
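    That module's own API is not reproduced here. As a hedged standard-library
    sketch of the same shared-memory idea, two processes can map the same file
    with mmap and see each other's writes. The file layout (one little-endian
    64-bit integer) and the exchange_via_mapped_file helper are assumptions, and
    the semaphore half of that module is not shown.

```python
import mmap
import os
import struct
import subprocess
import sys
import tempfile

# Child process: map the same file and store a little-endian 64-bit integer
# in its first 8 bytes.
CHILD_CODE = (
    "import mmap, struct, sys\n"
    "with open(sys.argv[1], 'r+b') as f:\n"
    "    with mmap.mmap(f.fileno(), 8) as m:\n"
    "        m[:8] = struct.pack('<q', int(sys.argv[2]))\n"
    "        m.flush()\n"
)


def exchange_via_mapped_file(value):
    """Let a child process write `value` into memory that the parent has mapped too."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "r+b") as f:
            f.write(b"\0" * 8)  # a file must be non-empty before it can be mapped
            f.flush()
            with mmap.mmap(f.fileno(), 8) as shared:
                subprocess.run(
                    [sys.executable, "-c", CHILD_CODE, path, str(value)],
                    check=True)
                # Both mappings refer to the same file pages, so the parent
                # now sees the bytes the child wrote.
                return struct.unpack("<q", shared[:8])[0]
    finally:
        os.remove(path)
```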

    Hope this helps

    --
    Philip
    http://NikitaTheSpider.com/
    Whole-site HTML validation, link checking and more
    Nikita the Spider, Feb 14, 2007
  8. On Feb 14, 1:44 am, Paul Rubin <http://> wrote:
    > > When a customer comes with his beautiful new dual-core server and
    > > gets a basic Plone install up and running, he will immediately
    > > compare it to J2EE and wonder why he should pay a consultant to make
    > > it work properly. At that point, it's not easy to explain to him that
    > > Python is not flawed compared to Java, and that he will not regret
    > > his choice in the future. First impressions may be decisive.

    >
    > That's true: parallelism is an area where Java is ahead of us.


    Java's traditionally been ahead in one case, but well behind in
    general.

    Java has historically had no support at all for real multiple process
    solutions (akin to fork() or ZwCreateProcess() with NULL
    SectionHandle), which should make up the vast majority of parallel
    programs (basically all of those except where you don't want memory
    protection).

    Has this changed in recent Java releases? Is there a way to use
    efficient copy-on-write multiprocess architectures?
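    For comparison, Python exposes fork() directly as os.fork on POSIX systems
    (it does not exist on Windows). A minimal sketch with a hypothetical fork_demo
    helper: the child changes its own copy of a list and reports a result back
    through a pipe, while the parent's list stays untouched because the two
    processes no longer share writable memory.

```python
import os


def fork_demo(data):
    """Fork a child that changes its own copy of `data` and reports back via a pipe.

    POSIX only: os.fork does not exist on Windows.
    """
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:  # child process
        status = 1
        try:
            os.close(read_fd)
            data.append(1000)  # modifies the child's private copy-on-write memory only
            os.write(write_fd, str(sum(data)).encode())
            status = 0
        finally:
            os._exit(status)  # never fall back into the parent's code path
    os.close(write_fd)
    with os.fdopen(read_fd, "rb") as pipe:
        child_sum = int(pipe.read())
    os.waitpid(pid, 0)
    return child_sum, data


if __name__ == "__main__":
    print(fork_demo([1, 2, 3]))  # prints (1006, [1, 2, 3])
```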
    , Feb 14, 2007
  9. Paul Rubin

    "" <> writes:
    > Java has historically had no support at all for real multiple process
    > solutions (akin to fork() or ZwCreateProcess() with NULL
    > SectionHandle), which should make up the vast majority of parallel
    > programs (basically all of those except where you don't want memory
    > protection).


    I don't know what ZwCreateProcess is (sounds like a Windows-ism) but I
    remember using popen() under Java 1.1 in Solaris. That at least
    allows launching a new process and communicating with it. I don't
    know if there was anything like mmap. I think this is mostly a
    question of library functions--you could certainly write JNI
    extensions for that stuff.

    > Has this changed in recent Java releases? Is there a way to use
    > efficient copy-on-write multiprocess architectures?


    I do think they've been adding more stuff for parallelism in general.
    Paul Rubin, Feb 14, 2007
  10. On Feb 14, 4:37 pm, Paul Rubin <http://> wrote:
    > "" <> writes:
    > > Java has historically had no support at all for real multiple process
    > > solutions (akin to fork() or ZwCreateProcess() with NULL
    > > SectionHandle), which should make up the vast majority of parallel
    > > programs (basically all of those except where you don't want memory
    > > protection).

    >
    > I don't know what ZwCreateProcess is (sounds like a Windows-ism)


    Yeah, it's the Windows equivalent of fork. It does true copy-on-write, so
    you can do efficient multiprocess work.
    > but I
    > remember using popen() under Java 1.1 in Solaris. That at least
    > allows launching a new process and communicating with it.


    Yep. That's okay for limited kinds of applications.

    > I don't know if there was anything like mmap.


    That would be important as well.

    > I think this is mostly a
    > question of library functions--you could certainly write JNI
    > extensions for that stuff.


    Sure. If you're writing extensions you can work around the GIL, too.

    > > Has this changed in recent Java releases? Is there a way to use
    > > efficient copy-on-write multiprocess architectures?

    >
    > I do think they've been adding more stuff for parallelism in general.


    Up through 1.3/1.4 or so they were pretty staunchly in the "threads
    for everything!" camp, but they've added a select/poll-style call a
    couple versions back. That was a pretty big sticking point previously.
    , Feb 14, 2007
  11. Paul Rubin

    "" <> writes:
    > > question of library functions--you could certainly write JNI
    > > extensions for that stuff [access to mmap, etc.]

    > Sure. If you're writing extensions you can work around the GIL, too.


    I don't think that's comparable--if you have extensions turning off
    the GIL, they can't mess with Python data objects, which generally
    assume the GIL's presence. Python's mmap module can't do that either.

    > Up through 1.3/1.4 or so they were pretty staunchly in the "threads
    > for everything!" camp, but they've added a select/poll-style call a
    > couple versions back. That was a pretty big sticking point previously.


    They've gone much further now and they actually have some STM features:

    http://www-128.ibm.com/developerworks/java/library/j-jtp11234/
    Paul Rubin, Feb 14, 2007
  12. MRAB

    On Feb 14, 3:24 pm, wrote:
    > On Feb 13, 9:07 pm, Maric Michaud <> wrote:
    >
    > > I've heard a bunch of arguments defending Python's choice of a GIL, but I'm
    > > not quite sure of their technical background, nor of what is really important
    > > and what is not. These discussions often end with a prudent "Python has made one
    > > choice among others"... which is not really convincing.

    >
    > Well, INAG (I'm not a Guru), but we recently had training from a Guru.
    > When we brought up this question, his response was fairly simple.
    > Paraphrased for inaccuracy:
    >
    > "Some time back, a group did remove the GIL from the Python core, and
    > implemented locks on the core code to make it threadsafe. Well, the
    > problem was that while it worked, the necessary locks made single-threaded
    > code take significantly longer to execute."
    >
    > He then proceeded to show us how to achieve the same effect
    > (multithreading python for use on multi-core computers) using popen2
    > and stdio pipes.
    >

    Hmm. I wonder whether it would be possible to have a pair of python
    cores, one for single-threaded code (no locks necessary) and the other
    for multi-threaded code. When the Python program went from single-
    threaded to multi-threaded or multi-threaded to single-threaded there
    would be a switch from one core to the other.
    MRAB, Feb 14, 2007
  13. On Wednesday 14 February 2007 16:24, wrote:
    > "Some time back, a group did remove the GIL from the Python core, and
    > implemented locks on the core code to make it threadsafe. Well, the
    > problem was that while it worked, the necessary locks made single-threaded
    > code take significantly longer to execute."


    Very interesting point; this is exactly the sort of thing I'm looking for. Any
    useful links on this?

    Maric Michaud, Feb 15, 2007
  14. Paul Rubin

    Maric Michaud <> writes:
    > > "Some time back, a group did remove the GIL from the Python core, and
    > > implemented locks on the core code to make it threadsafe. Well, the
    > > problem was that while it worked, the necessary locks made single-threaded
    > > code take significantly longer to execute."

    >
    > Very interesting point; this is exactly the sort of thing I'm
    > looking for. Any useful links on this?


    I think it was a long time ago, Python 1.5.2 or something. However it
    really wasn't that useful, since as Garrick said, it slowed Python
    down. The reason was CPython's structures weren't designed for thread
    safety so it needed a huge amount of locking/releasing. For example,
    adjusting any reference count required setting and releasing a lock,
    and CPython does this all the time. Getting rid of the GIL in a
    serious way requires radically changing the interpreter, not just
    sticking some locks here and there.
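    A small CPython-specific illustration of how often reference counts change
    (it says nothing about PyPy or other implementations): merely storing an
    object in a list adds one reference per slot, and each such adjustment is
    what a GIL-free design would have to make thread-safe. refcount_delta is a
    hypothetical helper.

```python
import sys


def refcount_delta():
    """Return how much an object's reference count grows when a list holds it 3 times.

    CPython-specific: sys.getrefcount reports CPython's internal counter.
    """
    obj = object()
    before = sys.getrefcount(obj)
    holders = [obj, obj, obj]  # each list slot is a new reference to obj
    after = sys.getrefcount(obj)
    del holders  # dropping the list decrements the count again
    return after - before


if __name__ == "__main__":
    print(refcount_delta())  # 3 on CPython
```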
    Paul Rubin, Feb 15, 2007
  15. John Nagle

    If locking is expensive on x86, it's implemented wrong.
    It's done right in QNX, with inline code for the non-blocking
    case. Not sure about the current libraries for Linux, but
    by now, somebody should have gotten this right.

    John Nagle

    Paul Rubin wrote:
    > Maric Michaud <> writes:
    >
    >>>"Some time back, a group did remove the GIL from the Python core, and
    >>>implemented locks on the core code to make it threadsafe. Well, the
    >>>problem was that while it worked, the necessary locks made single-threaded
    >>>code take significantly longer to execute."
    John Nagle, Feb 15, 2007
  16. Paul Rubin

    John Nagle <> writes:
    > If locking is expensive on x86, it's implemented wrong.
    > It's done right in QNX, with inline code for the non-blocking case.


    Acquiring the lock still takes an expensive instruction, LOCK XCHG or
    whatever. I think QNX usually runs on embedded CPUs with less
    extensive caching than these multicore x86s, so the lock prefix may be
    less expensive on QNX systems.
    Paul Rubin, Feb 15, 2007
  17. John Nagle

    Paul Rubin wrote:
    > John Nagle <> writes:
    >
    >> If locking is expensive on x86, it's implemented wrong.
    >>It's done right in QNX, with inline code for the non-blocking case.

    >
    >
    > Acquiring the lock still takes an expensive instruction, LOCK XCHG or
    > whatever. I think QNX usually runs on embedded CPUs with less
    > extensive caching than these multicore x86s, so the lock prefix may be
    > less expensive on QNX systems.


    That's not so bad. See

    http://lists.freebsd.org/pipermail/freebsd-current/2004-August/033462.html

    But there are dumb thread implementations that make
    a system call for every lock.

    John Nagle
    John Nagle, Feb 15, 2007
  18. Paul Rubin

    John Nagle <> writes:
    > But there are dumb thread implementations that make
    > a system call for every lock.


    Yes, a sys call on each lock access would really be horrendous. But I
    think that in a modern cpu, LOCK XCHG costs as much as hundreds of
    regular instructions. Doing that on every adjustment of a Python
    reference count is enough to impact the interpreter significantly.
    It's not just mutating user data; every time you use an integer, or
    call a function and make an arg tuple and bind the function's locals
    dictionary, you're touching refcounts.

    The preferred locking scheme in Linux these days is called futex,
    which avoids system calls in the uncontended case--see the docs.
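    A rough way to get a feel for that overhead from Python itself, with the
    caveat that interpreter overhead dominates these numbers, so this cannot
    isolate the cost of LOCK XCHG or show futex behavior. It only compares a
    trivial increment with and without an uncontended threading.Lock around it;
    compare_lock_overhead is a hypothetical helper.

```python
import threading
import timeit


def compare_lock_overhead(number=200_000):
    """Time a trivial increment with and without an uncontended lock around it."""
    lock = threading.Lock()
    counter = [0]

    def plain():
        counter[0] += 1

    def locked():
        with lock:  # uncontended: no other thread ever holds this lock
            counter[0] += 1

    t_plain = timeit.timeit(plain, number=number)
    t_locked = timeit.timeit(locked, number=number)
    return t_plain, t_locked, counter[0]


if __name__ == "__main__":
    t_plain, t_locked, _ = compare_lock_overhead()
    print(f"plain: {t_plain:.3f}s  locked: {t_locked:.3f}s")
```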
    Paul Rubin, Feb 15, 2007
  19. On Feb 14, 4:30 pm, "MRAB" <> wrote:
    > Hmm. I wonder whether it would be possible to have a pair of python
    > cores, one for single-threaded code (no locks necessary) and the other
    > for multi-threaded code. When the Python program went from single-
    > threaded to multi-threaded or multi-threaded to single-threaded there
    > would be a switch from one core to the other.


    I have explored this option (and some simpler variants). Essentially,
    you end up rewriting a massive amount of CPython's codebase to change
    the refcount API. Even all the C extension types assume the refcount
    can be statically initialized (which may not be true if you're trying
    to make it efficient on multiple CPUs.)

    Once you realize the barrier for entry is so high you start
    considering alternative implementations. Personally, I'm watching
    PyPy to see if they get reasonable performance using JIT. Then I can
    start hacking on it.

    --
    Adam Olsen, aka Rhamphoryncus
    Rhamphoryncus, Feb 15, 2007
  20. Paul Boddie

    On 15 Feb, 00:14, "" <> wrote:
    >
    > Yeah, it's the Windows equivalent of fork. It does true copy-on-write, so
    > you can do efficient multiprocess work.


    Aside from some code floating around the net which possibly originates
    from some book on Windows systems programming, is there any reference
    material on ZwCreateProcess, is anyone actually using it as "fork on
    Windows", and would it be in any way suitable for an implementation of
    os.fork in the Python standard library? I only ask because there's a
    lot of folklore about this particular function (everyone seems to
    repeat more or less what you've just said), but aside from various
    Cygwin mailing list threads where they reject its usage, there's
    precious little information of substance.

    Not that I care about Windows, but it would be useful to be able to
    offer fork-based multiprocessing solutions to people using that
    platform. Although the python-dev people currently seem more intent on
    considering (and now hopefully rejecting) yet more syntax sugar [1],
    it'd be nice to consider matters seemingly below the python-dev
    threshold of consideration and offer some kind of roadmap for
    convenient parallel processing.
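    For what it's worth, later Python versions (3.4 and up) made the process
    start method explicit in the multiprocessing module instead of emulating
    fork on Windows: "spawn" is available on every platform, while "fork" is
    POSIX-only. A small sketch that reports what the current platform offers;
    describe_start_methods is a hypothetical helper.

```python
import multiprocessing


def describe_start_methods():
    """Return the process start methods available on this platform."""
    methods = multiprocessing.get_all_start_methods()
    return {
        "available": methods,
        "fork_available": "fork" in methods,  # False on Windows
    }


if __name__ == "__main__":
    print(describe_start_methods())
```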

    Paul

    [1] http://mail.python.org/pipermail/python-dev/2007-February/070939.html
    Paul Boddie, Feb 15, 2007