Python, multithreading & GIL

Discussion in 'Python' started by Ivan Voras, Apr 13, 2004.

  1. Ivan Voras

    I've read articles about it but I'm not sure I've got everything right. Here
    are some statements about the subject that I'm not 100% sure about:

    - when interpreter (cpython) is compiled with pthreads, python programs can
    make use of multiple processors (other statements below are for
    cpython+pthreads environment)?
    - the GIL is only placed on global variables (and makes access to global
    variables essentially serialized)? (--> if I don't use global variables, I'm
    free from GIL?)
    - python can make use of multiple IO accesses across threads: if one thread
    does file.read(), others are not blocked by it?
    - only one thread can do IO access: if one thread does file.read(), others
    cannot (they wait until the 1st read() call ends)?
    - all of the above stays the same for network IO (socket.read())?
    - all of the above is true for any call to a C function?

    Can someone say which statements are true, which are false (and an
    explanation of what is more correct :) )?

    Thanks!
    Ivan Voras, Apr 13, 2004
    #1

  2. Donn Cave

    In article <c5h75f$277$>, Ivan Voras <>
    wrote:

    > I've read articles about it but I'm not sure I've got everything right. Here
    > are some statements about the subject that I'm not 100% sure about:
    >
    > - when interpreter (cpython) is compiled with pthreads, python programs can
    > make use of multiple processors (other statements below are for
    > cpython+pthreads environment)?


    Depends on what you mean by "make use of".

    > - the GIL is only placed on global variables (and makes access to global
    > variables essentially serialized)? (--> if I don't use global variables, I'm
    > free from GIL?)


    No, it's a global variable that serializes thread execution.

    > - python can make use of multiple IO accesses across threads: if one thread
    > does file.read(), others are not blocked by it?


    Yes, because file.read releases the lock before it calls its
    underlying C function (and acquires it again afterwards before
    proceeding.)
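
    For a concrete picture, here is a minimal sketch of that (the file names
    are just placeholders): two threads reading different files can overlap,
    because file.read() drops the GIL around the underlying C read call.

        import threading

        def read_file(path):
            # file.read() releases the GIL while the C-level read runs,
            # so both threads can be blocked in the OS at the same time
            f = open(path, 'rb')
            data = f.read()
            f.close()
            return data

        t1 = threading.Thread(target=read_file, args=('/tmp/first.dat',))
        t2 = threading.Thread(target=read_file, args=('/tmp/second.dat',))
        t1.start(); t2.start()
        t1.join(); t2.join()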

    > - all of the above stays the same for network IO (socket.read())?


    Yes.

    > - all of the above is true for any call to a C function?


    No, C function interfaces are not required to release the lock,
    and in fact might reasonably elect not to. For example, a function
    that does some trivial computation, like peeking at some value in
    library state, would incur a lot of unnecessary overhead by releasing
    the lock. Other interfaces might neglect to release the lock just
    because the author didn't care about it.

    Donn Cave,
    Donn Cave, Apr 13, 2004
    #2

  3. Ivan Voras

    Donn Cave wrote:

    >>- when interpreter (cpython) is compiled with pthreads, python programs can
    >>make use of multiple processors (other statements below are for
    >>cpython+pthreads environment)?

    >
    >
    > Depends on what you mean by "make use of".


    "Simultaneusly execute different threads on different processors". I
    mean all kinds of threads: IO-based and computation-based.

    >>- the GIL is only placed on global variables (and makes access to global
    >>variables essentially serialized)? (--> if I don't use global variables, I'm
    >>free from GIL?)

    >
    > No, it's a global variable that serializes thread execution.


    Now I'm puzzled - how is that different from GIL?

    For example: if I have two or more threads that do numerical and string
    computations not involving global variables, will they execute without
    unexpected locking?
    Ivan Voras, Apr 13, 2004
    #3
  4. Jarek Zgoda

    Ivan Voras <ivoras@__geri.cc.fer.hr> writes:

    >>>- the GIL is only placed on global variables (and makes access to global
    >>>variables essentially serialized)? (--> if I don't use global variables, I'm
    >>>free from GIL?)

    >>
    >> No, it's a global variable that serializes thread execution.

    >
    > Now I'm puzzled - how is that different from GIL?
    >
    > For example: if I have two or more threads that do numerical and string
    > computations not involving global variables, will they execute without
    > unexpected locking?


    I think it's about time someone wrote a definitive document on how the GIL
    can affect our programs and how to avoid headaches when using threading
    with Python. I know what the acronym (GIL) means, and I know the
    definition, but I have very limited knowledge of threading issues in
    languages that use VM environments.

    Anyone?

    --
    Jarek Zgoda
    http://jpa.berlios.de/
    Jarek Zgoda, Apr 13, 2004
    #4
  5. Martin v. Löwis

    Ivan Voras wrote:
    >>> - when interpreter (cpython) is compiled with pthreads, python
    >>> programs can make use of multiple processors (other statements below
    >>> are for cpython+pthreads environment)?

    >>
    >>
    >>
    >> Depends on what you mean by "make use of".

    >
    >
    > "Simultaneusly execute different threads on different processors". I
    > mean all kinds of threads: IO-based and computation-based.


    In Python, no two threads will ever simultaneously interpret byte code
    instructions.

    It might be that two threads started in Python simultaneously execute
    non-Python code (like a C extension), or that one thread blocks in IO
    and the other executes byte code. However, once one thread executes
    Python byte code, no other thread in the same process will do so.
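
    A minimal sketch of that second case (just an illustration, not from any
    real program): time.sleep() is a C call that releases the GIL, so the
    counting thread keeps executing byte code while the other thread is
    blocked.

        import threading, time

        def sleeper():
            # blocks in a C call; the GIL is released for the duration
            time.sleep(2)

        def counter():
            # pure byte code; keeps running while the other thread sleeps
            n = 0
            while n < 5000000:
                n += 1

        t1 = threading.Thread(target=sleeper)
        t2 = threading.Thread(target=counter)
        t1.start(); t2.start()
        t1.join(); t2.join()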

    >
    >>> - the GIL is only placed on global variables (and makes access to
    >>> global variables essentially serialized)? (--> if I don't use global
    >>> variables, I'm free from GIL?)

    >>
    >>
    >> No, it's a global variable that serializes thread execution.

    >
    >
    > Now I'm puzzled - how is that different from GIL?


    That is the GIL: a global variable that serializes thread execution.

    However, it is *not* *only* placed on global variables. It is placed
    on any kind of byte code, and data access, with the few exceptions
    of long-running C code. So if you have two functions

    def thread1():
        while 1: pass

    def thread2():
        while 1: pass

    and you run them in two separate threads, you will *not* be free from
    the GIL. Both loops hold the GIL while executing, and give it up every
    100 or so byte code instructions.

    > For example: if I have two or more threads that do numerical and string
    > computations not involving global variables, will they execute without
    > unexpected locking?


    Depends on what you expect. There will be locking, and the threads will
    not use two processors effectively (i.e. you typically won't see any
    speedup from multiple processors if your computation is written in
    Python).
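
    A rough way to see this for yourself (a sketch only; exact timings depend
    on your machine and Python build): run the same pure-Python loop once
    serially and once split across two threads, and compare the wall-clock
    times. On a two-processor box the threaded run comes out about the same,
    or even a little slower, rather than twice as fast.

        import threading, time

        def loop(n):
            # pure byte code, so it holds the GIL except at periodic switches
            while n:
                n -= 1

        N = 5000000

        start = time.time()
        loop(N)
        serial = time.time() - start

        start = time.time()
        threads = [threading.Thread(target=loop, args=(N // 2,))
                   for _ in range(2)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        threaded = time.time() - start

        print("serial %.2fs, threaded %.2fs" % (serial, threaded))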

    Regards,
    Martin
    Martin v. Löwis, Apr 13, 2004
    #5
  6. Ivan Voras

    Martin v. Löwis wrote:


    > In Python, no two threads will ever simultaneously interpret byte code
    > instructions.
    >
    > It might be that two threads started in Python simultaneously execute
    > non-Python code (like a C extension), or that one thread blocks in IO
    > and the other executes byte code. However, once one thread executes
    > Python byte code, no other thread in the same process will do so.


    Thank you, your post gave the sort of answers I was looking for. :)
    Ivan Voras, Apr 13, 2004
    #6
  7. Donn Cave

    In article <c5hd8n$1i3$>,
    Ivan Voras <ivoras@__geri.cc.fer.hr> wrote:
    > Donn Cave wrote:
    > >>- when interpreter (cpython) is compiled with pthreads, python programs can
    > >>make use of multiple processors (other statements below are for
    > >>cpython+pthreads environment)?

    > >
    > >
    > > Depends on what you mean by "make use of".

    >
    > "Simultaneusly execute different threads on different processors". I
    > mean all kinds of threads: IO-based and computation-based.


    I'm sorry, when I wrote that I expected that I would be saying
    more about that later on, but maybe I went over that a little
    too light and I didn't mean to be cryptic. My point is that
    a Python program is (at least) two layers: the part actually
    written in Python, and the part written in C - modules written
    in C and all the library functions they call. A Python program
    can make use of multiple processors. Only one processor can
    actually be executing the interpreter, but the interpreter may
    be calling an external cryptography function in another thread,
    and listening to a socket in another, etc.

    I/O is a natural case of this - Python can't do any kind of I/O
    on its own, so we can reasonably expect concurrent I/O. Computation
    depends.

    > >>- the GIL is only placed on global variables (and makes access to global
    > >>variables essentially serialized)? (--> if I don't use global variables,
    > >>I'm
    > >>free from GIL?)

    > >
    > > No, it's a global variable that serializes thread execution.

    >
    > Now I'm puzzled - how is that different from GIL?


    I meant, the GIL isn't placed on global variables, but it is one.

    > For example: if I have two or more threads that do numerical and string
    > computations not involving global variables, will they execute without
    > unexpected locking?


    They will execute serially. I believe the interpreter schedules
    threads for some number of instructions, so each thread won't
    have to run to completion before the next one can execute - they'll
    all probably finish about the same time - but there will be only
    one interpreter thread executing at any time.

    This has been known to bother people, and some years back a very
    capable programmer on the Python scene at the time tried to fix
    it with a version of Python that was `free threaded.' I think
    the reason it's not the version of Python we're using today is
    1. It's a hard problem, and
    2. It doesn't make that much practical difference.

    That's my opinion, anyway. There are a few lengthy discussions
    of the matter in the comp.lang.python archives, for anyone who
    wants to see more opinions.

    Donn Cave,
    Donn Cave, Apr 13, 2004
    #7
  8. Project2501

    Surely there is a case for a python VM/interpreter to be able to handle
    threads without the GIL. That is, map them to whatever underlying OS
    facilities are available, and only if they are not available, do bytecode
    interleaving. After all, python relies on OS facilities for many other
    tasks.
    Project2501, Apr 14, 2004
    #8
  9. Donn Cave

    In article <>,
    Project2501 <> wrote:

    > Surely there is a case for a python VM/interpreter to be able to handle
    > threads without the GIL. That is, map them to whatever underlying OS
    > facilities are available, and only if they are not available, do bytecode
    > interleaving. After all, python relies on OS facilities for many other
    > tasks.


    Python definitely provides meaningful support for several
    types of operating system threads, including POSIX, and
    the way I understand you, it does what you say. It's not
    like the way some other interpreted languages (or Python
    in some versions of Stackless) implement threads inside
    a single OS thread; these are real OS threads
    and your Python code (i.e., the interpreter) runs "in" them.

    It's just that part of the support for concurrency is a
    lock that protects Python internal data structures from
    unsound concurrent access. That's the reason for the GIL.
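
    To make "unsound concurrent access" concrete, here is a small sketch:
    several threads appending to one shared list won't corrupt the list's
    internal structure, because the GIL serializes the byte code (and the
    C-level append) that touches it. Your own higher-level invariants still
    need an explicit lock, of course.

        import threading

        shared = []

        def worker():
            # in CPython a single list.append is effectively atomic under
            # the GIL, so no appends are lost and the list stays consistent
            for i in range(10000):
                shared.append(i)

        threads = [threading.Thread(target=worker) for _ in range(4)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()

        print(len(shared))   # 40000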

    And as I asserted, it isn't a significant problem in practice.

    Donn Cave,
    Donn Cave, Apr 14, 2004
    #9
  10. Ivan Voras

    Donn Cave wrote:

    > It's just that part of the support for concurrency is a
    > lock that protects Python internal data structures from
    > unsound concurrent access. That's the reason for the GIL.
    >
    > And as I asserted, it isn't a significant problem in practice.


    Except if you're planning for multiple processors :(
    Ivan Voras, Apr 14, 2004
    #10
  11. Donn Cave

    Quoth Ivan Voras <ivoras@__geri.cc.fer.hr>:
    | Donn Cave wrote:
    |
    | > It's just that part of the support for concurrency is a
    | > lock that protects Python internal data structures from
    | > unsound concurrent access. That's the reason for the GIL.
    | >
    | > And as I asserted, it isn't a significant problem in practice.
    |
    | Except if you're planning for multiple processors :(

    Usually even then. Most applications with a really serious
    computational load will implement the compute-intensive parts
    in C, as a Python module (or will use an existing module.)
    The ones that will implement that part in pure Python, as
    part of a multithreaded architecture that relies on SMP hardware,
    are very few. It wouldn't be a good idea even if it worked.

    Donn
    Donn Cave, Apr 14, 2004
    #11
  12. Roger Binns

    Ivan Voras wrote:
    > Donn Cave wrote:
    > > And as I asserted, it isn't a significant problem in practice.

    >
    > Except if you're planning for multiple processors :(


    To better illustrate this, when you write C code that interfaces
    with Python, it looks like this example from my libusb wrapper:

    Py_BEGIN_ALLOW_THREADS
    res=usb_bulk_read(dev, ep, bytesoutbuffer, *bytesoutbuffersize, timeout);
    Py_END_ALLOW_THREADS

    Any C code between BEGIN_ALLOW_THREADS and END_ALLOW_THREADS can
    run concurrently with any other code meeting the same criteria.
    This typically includes most forms of I/O, networking, operating
    system access, etc. Consequently, code that spends a lot of its time in
    such calls scales to multiple processors (assuming your OS scales).

    You can do the BEGIN/END threads thing in any C extensions you need.
    In practice this is good enough for most people. Their Python code
    doesn't spend much time processing. And if they did have something
    that did a time-consuming calculation (eg complex crypto), they are
    likely to have it in a C extension, or move it into a separate process
    (eg that is what a database is :)

    Worst case code would be this as the body of each thread:

    while True: pass

    It would not improve no matter how many processors you have.
    You would need to scale that by splitting your program into
    multiple processes. That then also has the benefit that
    you could put the processes on multiple machines (assuming you
    use TCP to connect them) and scale away.
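
    As a minimal sketch of the multiple-process route (POSIX only, and the
    crunch() function and its ranges are just stand-ins for real work; results
    would normally come back over a pipe or socket):

        import os

        def crunch(lo, hi):
            # stand-in for a CPU-heavy pure-Python computation
            total = 0
            for x in xrange(lo, hi):
                total += x * x
            return total

        pids = []
        for lo, hi in [(0, 500000), (500000, 1000000)]:
            pid = os.fork()
            if pid == 0:
                # child: do its share of the work, then exit
                crunch(lo, hi)
                os._exit(0)
            pids.append(pid)

        for pid in pids:
            # parent: wait for both children to finish
            os.waitpid(pid, 0)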

    Roger
    Roger Binns, Apr 14, 2004
    #12
  13. Carl Banks

    Donn Cave wrote:
    >
    >
    > Quoth Ivan Voras <ivoras@__geri.cc.fer.hr>:
    > | Donn Cave wrote:
    > |
    > | > It's just that part of the support for concurrency is a
    > | > lock that protects Python internal data structures from
    > | > unsound concurrent access. That's the reason for the GIL.
    > | >
    > | > And as I asserted, it isn't a significant problem in practice.
    > |
    > | Except if you're planning for multiple processors :(
    >
    > Usually even then. Most applications with a really serious
    > computational load


    You don't know if his application has a serious computational load.


    > will implement the compute-intensive parts
    > in C, as a Python module (or will use an existing module.)
    > The ones that will implement that part in pure Python, as
    > part of a multithreaded architecture that relies on SMP hardware,
    > are very few.


    I highly disagree. It's reasonable to want a multi-threaded, pure
    Python program to run faster with multiple processors, and without
    having to rewrite the thing in C. The GIL limits the ability of pure
    Python to take advantage of SMP, and that's a definite flaw in Python.


    > It wouldn't be a good idea even if it worked.


    Why?


    --
    CARL BANKS http://www.aerojockey.com/software
    "If you believe in yourself, drink your school, stay on drugs, and
    don't do milk, you can get work."
    -- Parody of Mr. T from a Robert Smigel Cartoon
    Carl Banks, Apr 14, 2004
    #13
  14. Simon Burton

    On Tue, 13 Apr 2004 19:10:48 +0200, Ivan Voras wrote:

    > I've read articles about it but I'm not sure I've got everything right.
    > Here are some statements about the subject that I'm not 100% sure about:
    >
    > - when interpreter (cpython) is compiled with pthreads, python programs
    > can make use of multiple processors (other statements below are for
    > cpython+pthreads environment)?


    Not really.

    > - the GIL is only placed on global variables (and makes access to global
    > variables essentially serialized)? (--> if I don't use global variables,
    > I'm free from GIL?)


    No. By "Global" we mean "everything".

    Simon.
    Simon Burton, Apr 14, 2004
    #14
  15. Ivan Voras

    Carl Banks wrote:

    >>Usually even then. Most applications with a really serious
    >>computational load

    >
    > You don't know if his application has a serious computational load.


    Depends on what you mean by computing - in my case it's not bare number
    crunching but the stuff python is good at and convenient to use, mostly
    string manipulation.


    > I highly disagree. It's reasonable to want a multi-threaded, pure
    > Python program to run faster with multiple processors, and without
    > having to rewrite the thing in C. The GIL limits the ability of pure
    > Python to take advantage of SMP, and that's a definite flaw in Python.


    I agree :)
    But now, looking at some other scripting languages, I don't see any that
    claim to be able to do what we're discussing here. Does anybody know of a
    scripting language good at "string crunching" that can exploit SMP with
    threading?

    ObNote: forking is another way, but very inconvenient...
    Ivan Voras, Apr 14, 2004
    #15
  16. Donn Cave

    In article <lI4fc.12338$B%>,
    Carl Banks <> wrote:
    > Donn Cave wrote:
    > > Quoth Ivan Voras <ivoras@__geri.cc.fer.hr>:
    > > | Donn Cave wrote:
    > > |
    > > | > It's just that part of the support for concurrency is a
    > > | > lock that protects Python internal data structures from
    > > | > unsound concurrent access. That's the reason for the GIL.
    > > | >
    > > | > And as I asserted, it isn't a significant problem in practice.
    > > |
    > > | Except if you're planning for multiple processors :(
    > >
    > > Usually even then. Most applications with a really serious
    > > computational load

    >
    > You don't know if his application has a serious computational load.


    I don't intend to guess at what his application is about,
    but that's the only case I ever hear about where it even
    theoretically matters. An application with a trivial
    computational aspect will run more or less concurrently.

    > > will implement the compute-intensive parts
    > > in C, as a Python module (or will use an existing module.)
    > > The ones that will implement that part in pure Python, as
    > > part of a multithreaded architecture that relies on SMP hardware,
    > > are very few.

    >
    > I highly disagree. It's reasonable to want a multi-threaded, pure
    > Python program to run faster with multiple processors, and without
    > having to rewrite the thing in C. The GIL limits the ability of pure
    > Python to take advantage of SMP, and that's a definite flaw in Python.
    >
    >
    > > It wouldn't be a good idea even if it worked.

    >
    > Why?


    Because it would still be slow.

    I'm not arguing that the GIL is a feature, though there may
    be a weak case for that (I've had pretty good luck with my
    Python programs in a multithreaded system that is supposed
    to be a big headache for C++ application programmers, and
    I've wondered if the modest amount of extra serialization
    Python imposes is actually helping me out there. But I haven't
    worked that idea out, because - it doesn't matter, this issue
    isn't going anywhere regardless.)

    I'm not arguing that no one cares at all, or that it's
    unreasonable to wish for it. I'm saying that the need for
    free threading doesn't add up to enough motivation for anyone
    to take on the very hairy task of implementing it. (Greg
    Stein being the exception that proves the rule - he did
    implement it, and we still have a GIL.)

    I don't know how the advent of Python compilation options will
    change this. Obviously it makes Python more attractive for
    compute intensive work, but ... can the compilers use "safe"
    data structures so you can run unlocked?

    For the sidebar, ocaml has the same system - works with native
    OS threads if built that way, but protects itself with a global
    lock. Not a global interpreter lock, because this is compiled
    code, not interpreted, but still there are data structures.
    I happened to be reading a Linux man page for pthread mutexes,
    and Xavier Leroy's name appeared at the bottom - one of the
    implementors of ocaml, I believe. I'd be interested to hear
    about other languages' support for free threading.

    Donn Cave,
    Donn Cave, Apr 14, 2004
    #16
  17. Ivan Voras

    Roger Binns wrote:

    > Ivan Voras wrote:


    >>Except if you're planning for multiple processors :(

    >
    >
    > To better illustrate this, when you write C code that interfaces
    > with Python, it looks like this example from my libusb wrapper:
    >
    > Py_BEGIN_ALLOW_THREADS
    > res=usb_bulk_read(dev, ep, bytesoutbuffer, *bytesoutbuffersize, timeout);
    > Py_END_ALLOW_THREADS
    >
    > Any C code between BEGIN_ALLOW_THREADS and END_ALLOW_THREADS can
    > run concurrently with any other code meeting the same criteria.
    > This typically includes most forms of I/O, networking, operating
    > system access etc. Consequently code that uses that a lot
    > scales to multiple processors (assuming your OS scales).


    Thanks, this clarifies a lot :)

    So, during the usb_bulk_read() call above, python can and will execute
    another pure-python (or similarly mixed Python/C) thread if such is available?
    Ivan Voras, Apr 14, 2004
    #17
  18. Jeff Epler

    On Wed, Apr 14, 2004 at 09:59:05AM -0700, Donn Cave wrote:
    > I'm not arguing that the GIL is a feature, though there may
    > be a weak case for that (I've had pretty good luck with my
    > Python programs in a multithreaded system that is supposed
    > to be a big headache for C++ application programmers, and
    > I've wondered if the modest amount of extra serialization
    > Python imposes is actually helping me out there. But I haven't
    > worked that idea out, because - it doesn't matter, this issue
    > isn't going anywhere regardless.)


    I have to relate this story:

    The application I work on recently switched from C to "C compiled by a
    C++ compiler, plus a little bit of C++ code". Basically, this sucks.
    Anyway, we've started to use parts of Boost, and I was excited to learn
    that Boost has a counted-pointer implementation.

    The simplest program I decided to try was to create and destroy a
    collection of a reference-counted object (only one C instance is
    created, and each of the 2^22 elements in the container is a reference
    or pointer to that object). In Python, this looked like so:

        class C(object): pass
        v = [C()] * (1<<22)

    and in C++ with boost:

        #include <boost/shared_ptr.hpp>
        #include <vector>

        class C { };

        int main(void) {
            boost::shared_ptr<C> p(new C);
            std::vector<boost::shared_ptr<C> > v((1<<22), p);
        }

    The C++ program consumes 35 megs and runs in 3.7 seconds, the Python
    program runs in .5 seconds and uses 22 megs. The Python program runs
    just fine with a list of size 1<<25, but boost can't handle it.
    If I compile the C++ program without support for threads, that at least
    trims the runtime to 1.5 seconds.

    The relevant detail here (oh, are you still reading?) is that making all
    those reference counts threadsafe in boost more than doubled runtime.
    Python does a *lot* of refcount modification!

    Jeff
    Jeff Epler, Apr 14, 2004
    #18
  19. Alan Kennedy

    [Carl Banks]
    >> It's reasonable to want a multi-threaded, pure
    >> Python program to run faster with multiple processors, and without
    >> having to rewrite the thing in C. The GIL limits the ability of
    >> pure Python to take advantage of SMP, and that's a definite flaw
    >> in Python.


    [Ivan Voras]
    > I agree :)
    > But now, looking at some other scripting languages, I don't see any
    > that claim to be able to do what we're discussing here. Does anybody
    > know of a scripting language good at "string crunching" that can
    > exploit SMP with threading?


    http://www.jython.org

    --
    alan kennedy
    ------------------------------------------------------
    check http headers here: http://xhaus.com/headers
    email alan: http://xhaus.com/contact/alan
    Alan Kennedy, Apr 14, 2004
    #19
  20. Roger Binns

    Ivan Voras wrote:
    > Roger Binns wrote:
    > >
    > > Py_BEGIN_ALLOW_THREADS
    > > res=usb_bulk_read(dev, ep, bytesoutbuffer, *bytesoutbuffersize, timeout);
    > > Py_END_ALLOW_THREADS

    >
    > So, during the usb_bulk_read() call above, python can and will execute
    > another pure-python (or similarly mixed Python/C) thread if such is available?


    Yes. The call to Py_BEGIN_ALLOW_THREADS releases the GIL and the call to
    Py_END_ALLOW_THREADS claims it again. Only one thread at a time can
    own the GIL.

    The Python interpreter itself will continuously execute bytecode in
    one thread until sys.getcheckinterval() bytecode instructions have
    been executed, at which point it can switch to another eligible
    interpreter thread.
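
    For reference, that interval is visible and tunable from Python (a small
    sketch, assuming the sys functions of this era, where the default is 100
    byte code instructions):

        import sys

        # how many byte code instructions run before the interpreter
        # considers switching to another thread
        print(sys.getcheckinterval())

        # make switches less frequent, trading thread responsiveness
        # for slightly less switching overhead
        sys.setcheckinterval(1000)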

    I did see mention in one of these groups about how someone did try
    replacing the GIL with finer-grained locking, and it actually performed
    noticeably worse.

    Roger
    Roger Binns, Apr 14, 2004
    #20