another thread on Python threading

Discussion in 'Python' started by cgwalters@gmail.com, Jun 3, 2007.

  1. Guest

    Hi,

    I've recently been working on an application[1] which does quite a bit
    of searching through large data structures and string matching, and I
    was thinking that it would help to put some of this CPU-intensive work
    in another thread, but of course this won't work because of Python's
    GIL.

    There's a lot of past discussion on this, and I want to bring it up
    again because with the work on Python 3000, I think it is worth trying
    to take a look at what can be done to address portions of the problem
    through language changes.

    Also, the recent hardware trend towards multicore processors is
    another reason I think it is worth taking a look at the problem again.

    = dynamic objects, locking and __slots__ =

    I remember reading (though I can't find it now) one person's attempt
    at true multithreaded programming involved adding a mutex to all
    object access. The obvious question though is - why don't other true
    multithreaded languages like Java need to lock an object when making
    changes? The answer is that they don't support adding random
    attributes to objects; in other words, they default to the equivalent
    of __slots__.

    == Why hasn't __slots__ been successful? ==

    I very rarely see Python code use __slots__. I think there are
    several reasons for this. The first is that a lot of programs don't
    need to optimize on this level. The second is that it's annoying to
    use, because it means you have to type your member variables *another*
    time (in addition to __init__ for example), which feels very un-
    Pythonic.
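
    To make the annoyance concrete, here is what __slots__ usage looks
    like today: every attribute gets typed once in the class body and
    again in __init__ (the Point class is just an illustration):

    ```python
    import time

    class Point(object):
        # Each attribute must be declared here...
        __slots__ = ('x', 'y', 'created')

        def __init__(self, x, y):
            # ...and then assigned again here.
            self.x = x
            self.y = y
            self.created = time.time()

    p = Point(1, 2)
    p.x = 10          # fine: 'x' is a declared slot
    try:
        p.z = 3       # AttributeError: no __dict__, 'z' is not a slot
    except AttributeError:
        pass
    ```

    The payoff is that instances have no per-object __dict__, which is
    what makes the fixed-attribute layout possible.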

    == Defining object attributes ==

    In my Python code, one restriction I try to follow is to set all the
    attributes I use for an object in __init__. You could do this as
    class member variables, but often I want to set them in __init__
    anyways from constructor arguments, so "defining" them in __init__
    means I only type them once, not twice.

    One random idea for Python 3000 is to make the equivalent of
    __slots__ the default, *but* instead gather the set of attributes
    from all member variables set in __init__. For example, if I write:

    class Foo(object):
        def __init__(self, bar=None):
            self.__baz = 20
            if bar:
                self.__bar = bar
            else:
                self.__bar = time.time()

    f = Foo()
    f.otherattr = 40  # this would be an error! Can't add random
                      # attributes not defined in __init__

    I would argue that the current Python default of supporting adding
    random attributes is almost never what you really want. If you *do*
    want to set random attributes, you almost certainly want to be using a
    dictionary or a subclass of one, not an object. What's nice about the
    current Python is that you don't need to redundantly type things, and
    we should preserve that while still allowing more efficient
    implementation strategies.
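
    Something close to the proposed behavior can already be approximated
    in current Python without language changes. This is only a sketch:
    the FixedObject base class is invented for illustration, and it
    enforces the rule at runtime via __setattr__ rather than by the
    static analysis of __init__ the proposal describes:

    ```python
    import time

    class FixedObject(object):
        """Reject new attributes once __init__ has finished."""
        _frozen = False

        def __setattr__(self, name, value):
            if self._frozen and not hasattr(self, name):
                raise AttributeError(
                    "can't add attribute %r after __init__" % name)
            object.__setattr__(self, name, value)

    class Foo(FixedObject):
        def __init__(self, bar=None):
            self.baz = 20
            self.bar = bar if bar is not None else time.time()
            self._frozen = True  # freeze the attribute set

    f = Foo()
    f.baz = 21                  # updating an existing attribute: fine
    try:
        f.otherattr = 40        # adding a new one: an error, as proposed
    except AttributeError:
        pass
    ```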

    = Limited threading =

    Now, I realize there are a ton of other things the GIL protects other
    than object dictionaries; with true threading you would have to touch
    the importer, the garbage collector, verify all the C extension
    modules, etc. Obviously non-trivial. What if as an initial push
    towards real threading, Python had support for "restricted threads".
    Essentially, restricted threads would be limited to a subset of the
    standard library that had been verified for thread safety, would not
    be able to import new modules, etc.

    Something like this:

    def datasearcher(list, queue):
        for item in list:
            if item.startswith('foo'):
                queue.put(item)
        queue.done()

    vals = ['foo', 'bar']
    queue = queue.Queue()
    threading.start_restricted_thread(datasearcher, vals, queue)

    def print_item(item):
        print item
    queue.set_callback(print_item)

    I know I'm making up some API above, but the point here is that
    "datasearcher" could pretty easily run in a true thread and touch
    very little of the interpreter; only support for atomic reference
    counting and a concurrent garbage collector would be needed.
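
    For comparison, here is roughly the same producer/consumer shape
    using only the threading and Queue modules that exist today (no
    invented start_restricted_thread or set_callback; a None sentinel
    stands in for queue.done()). It runs, but the search still holds
    the GIL, which is exactly the limitation being discussed:

    ```python
    import queue      # spelled 'Queue' in the Python of 2007
    import threading

    def datasearcher(items, out):
        # CPU-bound search; under the GIL this runs one thread at a time.
        for item in items:
            if item.startswith('foo'):
                out.put(item)
        out.put(None)  # sentinel: no more results

    vals = ['foo', 'bar', 'foobaz']
    q = queue.Queue()
    t = threading.Thread(target=datasearcher, args=(vals, q))
    t.start()

    results = []
    while True:
        item = q.get()
        if item is None:
            break
        results.append(item)
    t.join()
    # results == ['foo', 'foobaz']
    ```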

    Thoughts?

    [1] http://submind.verbum.org/hotwire/wiki
     
    , Jun 3, 2007
    #1

  2. Guest

    On Jun 3, 5:52 pm, Steve Howell <> wrote:

    > The pitfall here is that to reduce code duplication,
    > you might initialize certain variables in a method
    > called by __init__, because your object might want to
    > return to its initial state.


    This is a good point. I was thinking that this analysis would
    occur during module loading; i.e., it would be the equivalent of
    Java's classloading.

    What if the startup code analysis just extended to methods called
    during __init__? That seems like a relatively straightforward
    extension. Remember, we aren't actually *executing* the startup
    code in my proposal; we are just analyzing it for all possible
    execution paths which cause a member variable assignment.
     
    , Jun 3, 2007
    #2

  3. Josiah Carlson wrote:
    > I've recently been working on an application[1] which does quite a bit
    > of searching through large data structures and string matching, and I
    > was thinking that it would help to put some of this CPU-intensive work
    > in another thread, but of course this won't work because of Python's
    > GIL.


    If you are doing string searching, implement the algorithm in C, and
    call out to the C (remembering to release the GIL).

    > There's a lot of past discussion on this, and I want to bring it up
    > again because with the work on Python 3000, I think it is worth trying
    > to take a look at what can be done to address portions of the problem
    > through language changes.


    Not going to happen. All Python 3000 PEPs had a due-date at least a
    month ago (possibly even 2), so you are too late to get *any*
    substantial change in.

    > I remember reading (though I can't find it now) one person's attempt
    > at true multithreaded programming involved adding a mutex to all
    > object access. The obvious question though is - why don't other true
    > multithreaded languages like Java need to lock an object when making
    > changes?


    From what I understand, the Java runtime uses fine-grained locking on
    all objects. You just don't notice it because you don't need to write
    the acquire()/release() calls. It is done for you. (in a similar
    fashion to Python's GIL acquisition/release when switching threads)
    They also have a nice little decorator-like thingy (I'm not a Java guy,
    so I don't know the name exactly) called 'synchronized', which locks and
    unlocks the object when accessing it through a method.


    - Josiah



     
    Josiah Carlson, Jun 4, 2007
    #3
  4. sturlamolden Guest

    On Jun 4, 3:10 am, Josiah Carlson <>
    wrote:
    > wrote:
    > > I've recently been working on an application[1] which does quite a bit
    > > of searching through large data structures and string matching, and I
    > > was thinking that it would help to put some of this CPU-intensive work
    > > in another thread, but of course this won't work because of Python's
    > > GIL.

    >
    > If you are doing string searching, implement the algorithm in C, and
    > call out to the C (remembering to release the GIL).
    >
    > > There's a lot of past discussion on this, and I want to bring it up
    > > again because with the work on Python 3000, I think it is worth trying
    > > to take a look at what can be done to address portions of the problem
    > > through language changes.

    >
    > Not going to happen. All Python 3000 PEPs had a due-date at least a
    > month ago (possibly even 2), so you are too late to get *any*
    > substantial change in.
    >
    > > I remember reading (though I can't find it now) one person's attempt
    > > at true multithreaded programming involved adding a mutex to all
    > > object access. The obvious question though is - why don't other true
    > > multithreaded languages like Java need to lock an object when making
    > > changes?

    >
    > From what I understand, the Java runtime uses fine-grained locking on
    > all objects. You just don't notice it because you don't need to write
    > the acquire()/release() calls. It is done for you. (in a similar
    > fashion to Python's GIL acquisition/release when switching threads)


    The problem is CPython's reference counting. Access to reference
    counts must be synchronized.

    Java, IronPython and Jython use another scheme for garbage
    collection and do not need a GIL.

    Changing CPython's garbage collection from reference counting to a
    generational GC would be a major undertaking. There are also pros and
    cons to using reference counts instead of 'modern' garbage collectors.
    For example, unless there are cyclic references, one can always know
    when an object is garbage collected. One also avoids periodic delays
    while garbage is collected, and memory use can be more modest when a
    lot of small temporary objects are being used.

    Also beware that the GIL is only a problem for CPU bound code. IO
    bound code is not slowed by the GIL. The Python runtime itself is a
    bigger problem for CPU bound code.

    In C or Fortran, writing parallel algorithms for multiprocessor
    systems typically involves using OpenMP or MPI. Parallelizing
    algorithms using manual threading should be discouraged. It is far
    better to insert a compiler directive (#pragma omp) and let an OpenMP
    compiler do the job.

    There are a number of different options for exploiting multiple CPUs
    from CPython, including:

    - MPI (e.g. mpi4py or PyMPI)
    - PyPar
    - os.fork() on Linux or Unix
    - subprocess.Popen
    - C extensions that use OpenMP
    - C extensions that spawn threads (should be discouraged!)

    > They also have a nice little decorator-like thingy (I'm not a Java guy,
    > so I don't know the name exactly) called 'synchronize', which locks and
    > unlocks the object when accessing it through a method.


    A similar Python 'synchronized' function decorator may look like this:

    def synchronized(fun):
        from threading import RLock
        rl = RLock()
        def decorator(*args, **kwargs):
            with rl:
                retv = fun(*args, **kwargs)
            return retv
        return decorator

    It is not possible to define a 'synchronized' block though, as Python
    does not have Lisp macros :(
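
    A quick sketch of the decorator above in use: each decorated
    function gets its own re-entrant lock, so concurrent calls to it
    are serialized (the bump() counter is just a toy workload):

    ```python
    import threading

    def synchronized(fun):
        # One re-entrant lock per decorated function.
        rl = threading.RLock()
        def wrapper(*args, **kwargs):
            with rl:
                return fun(*args, **kwargs)
        return wrapper

    counter = [0]

    @synchronized
    def bump():
        counter[0] += 1

    threads = [threading.Thread(target=bump) for _ in range(10)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # counter[0] == 10
    ```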

     
    sturlamolden, Jun 4, 2007
    #4
  5. Guest

    On Jun 3, 9:10 pm, Josiah Carlson <>
    wrote:
    >
    > If you are doing string searching, implement the algorithm in C, and
    > call out to the C (remembering to release the GIL).


    I considered that, but...ick! The whole reason I'm writing this
    program in Python in the first place is so I don't have to deal
    with the mess that is involved when you do string matching and
    data structure traversal in C.

    On the other hand, there are likely C libraries out there for
    searching the kinds of data structures I use; I'll investigate.

    > > There's a lot of past discussion on this, and I want to bring it up
    > > again because with the work on Python 3000, I think it is worth trying
    > > to take a look at what can be done to address portions of the problem
    > > through language changes.

    >
    > Not going to happen. All Python 3000 PEPs had a due-date at least a
    > month ago (possibly even 2), so you are too late to get *any*
    > substantial change in.


    =( Too bad. It might be possible to do these changes in a
    backwards-compatible way, though less elegantly. For example, the
    object change could be denoted by inheriting from "fixedobject"
    or something.
     
    , Jun 4, 2007
    #5
  6. sturlamolden wrote:
    > On Jun 4, 3:10 am, Josiah Carlson <>
    > wrote:
    >> From what I understand, the Java runtime uses fine-grained locking on
    >> all objects. You just don't notice it because you don't need to write
    >> the acquire()/release() calls. It is done for you. (in a similar
    >> fashion to Python's GIL acquisition/release when switching threads)

    >
    > The problem is CPython's reference counting. Access to reference
    > counts must be synchronized.
    > Java, IronPython and Jython uses another scheme for the garbage
    > collector and do not need a GIL.


    There was a discussion regarding this on the python-ideas list recently.
    You *can* attach a lock to every object and use fine-grained locking
    to handle refcounts. Alternatively, you can use platform-specific
    atomic increments and decrements, or even a secondary 'owner thread'
    refcount that only its owning thread touches and that therefore
    needs no locking.

    It turns out that atomic updates are slow, and I wasn't able to get any
    sort of productive results using 'owner threads' (seemed generally
    negative, and was certainly more work to make happen). I don't believe
    anyone bothered to test fine-grained locking on objects.

    However, locking isn't just for refcounts; it's to make sure that thread
    A isn't mangling your object while thread B is traversing it. With
    object locking (coarse via the GIL, or fine via object-specific locks),
    you get the same guarantees, with the problem being that verifying the
    absence of deadlocks is about a bazillion times more difficult with
    fine-grained locking than with a GIL-based approach.


    > Changing CPython's garbage collection from reference counting to a
    > generational GC will be a major undertaking. There are also pros and
    > cons to using reference counts instead of 'modern' garbage collectors.
    > For example, unless there are cyclic references, one can always know
    > when an object is garbage collected. One also avoids periodic delays
    > when garbage are collected, and memory use can be more modest then a
    > lot of small temporary objects are being used.


    It was done a while ago. The results? On a single-processor machine,
    Python code ran at something like 1/4-1/3 the speed of the original
    runtime. When using 4+ processors, there were some gains in threaded
    code, but nothing substantial at that point.


    > There are a number of different options for exploiting multiple CPUs
    > from CPython, including:


    My current favorite is the processing package (available from the Python
    cheeseshop). You get much of the same API as threading, only you are
    using processes instead. It works on Windows, OSX, and *nix.
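
    For later readers: the processing package was eventually adopted
    into the standard library as multiprocessing (Python 2.6). A minimal
    sketch of its thread-like API, applied to the string-search example
    from the original post:

    ```python
    import multiprocessing  # the 'processing' package, as later standardized

    def search(items, out):
        # Runs in a separate process, so the GIL is not shared with
        # the parent interpreter.
        out.put([i for i in items if i.startswith('foo')])

    if __name__ == '__main__':
        q = multiprocessing.Queue()
        p = multiprocessing.Process(target=search,
                                    args=(['foo', 'bar', 'foobaz'], q))
        p.start()
        found = q.get()
        p.join()
        # found == ['foo', 'foobaz']
    ```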


    > def synchronized(fun):
    >     from threading import RLock
    >     rl = RLock()
    >     def decorator(*args, **kwargs):
    >         with rl:
    >             retv = fun(*args, **kwargs)
    >         return retv
    >     return decorator
    >
    > It is not possible to define a 'synchronized' block though, as Python
    > do not have Lisp macros :(


    Except that you just used the precise mechanism necessary to get a
    synchronized block in Python:

    lock = threading.Lock()

    with lock:
        # synchronized block!
        pass


    - Josiah
     
    Josiah Carlson, Jun 4, 2007
    #6
  7. sturlamolden Guest

    On Jun 4, 10:11 pm, Josiah Carlson <>
    wrote:

    > lock = threading.Lock()
    >
    > with lock:
    >     # synchronized block!
    >     pass


    True, except that the lock has to be shared among the threads. This
    explicit creation of a reentrant lock is avoided in a Java
    synchronized block.
     
    sturlamolden, Jun 4, 2007
    #7
  8. sturlamolden Guest

    On Jun 4, 10:11 pm, Josiah Carlson <>
    wrote:

    > However, locking isn't just for refcounts, it's to make sure that thread
    > A isn't mangling your object while thread B is traversing it.



    > With
    > object locking (coarse via the GIL, or fine via object-specific locks),
    > you get the same guarantees, with the problem being that fine-grained
    > locking is about a bazillion times more difficult to verify the lack of
    > deadlocks than a GIL-based approach.


    I think this is just as much a question of what the runtime should
    guarantee. One doesn't need a guarantee that two threads are not
    mangling the same object simultaneously. Instead, the runtime could
    leave it to the programmer to use explicit locks on the object, or
    synchronized blocks, to guarantee this for himself.


    > It was done a while ago. The results? On a single-processor machine,
    > Python code ran like 1/4-1/3 the speed of the original runtime. When
    > using 4+ processors, there were some gains in threaded code, but not
    > substantial at that point.


    I am not surprised. Reference counts are quite efficient, contrary to
    common belief. The problem with reference counts is cyclic references
    involving objects that define a __del__ method. As these objects are
    not eligible for cyclic garbage collection, this can produce resource
    leaks.
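
    A small illustration of the cyclic-reference point (hedged: on the
    CPython of this era, adding a __del__ method to Node would have made
    the cycle uncollectable, landing it in gc.garbage; that behavior was
    changed much later by PEP 442):

    ```python
    import gc

    class Node(object):
        pass

    a, b = Node(), Node()
    a.peer, b.peer = b, a   # a reference cycle: refcounts never reach zero
    del a, b

    # Plain reference counting alone cannot free the pair; the cyclic
    # collector has to step in.
    collected = gc.collect()
    # collected counts the unreachable objects found (at least the two
    # Node instances).
    ```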


    > My current favorite is the processing package (available from the Python
    > cheeseshop).


    Thanks. I'll take a look at that.
     
    sturlamolden, Jun 4, 2007
    #8
  9. sturlamolden wrote:
    > On Jun 4, 10:11 pm, Josiah Carlson <>
    > wrote:
    >
    >> lock = threading.Lock()
    >>
    >> with lock:
    >> #synchronized block!
    >> pass

    >
    > True, except that the lock has to be shared among the threads. This
    > explicit creation of a reentrant lock is avoided in a Java
    > synchronized block.


    You toss the lock creation in the global namespace of the module for
    which you would like to synchronize access.


    - Josiah
     
    Josiah Carlson, Jun 5, 2007
    #9
  10. sturlamolden wrote:
    > On Jun 4, 10:11 pm, Josiah Carlson <>
    > wrote:
    >
    >> However, locking isn't just for refcounts, it's to make sure that thread
    >> A isn't mangling your object while thread B is traversing it.

    >
    >
    >> With
    >> object locking (coarse via the GIL, or fine via object-specific locks),
    >> you get the same guarantees, with the problem being that fine-grained
    >> locking is about a bazillion times more difficult to verify the lack of
    >> deadlocks than a GIL-based approach.

    >
    > I think this is just as much a question of what the runtime should
    > guarantee. One doesn't need a guarantee that two threads are not
    > mangling the same object simultaneously. Instead, the runtime could
    > leave it to the programmer to use explicit locks on the object, or
    > synchronized blocks, to guarantee this for himself.


    Why? Right now we have a language where the worst that doing silly
    things with threads can get you is a deadlock, incorrect execution,
    or maybe some data corruption if you are working with files. If we
    forced all thread users to synchronize everything themselves, we
    would get an uglier language, and incorrectly written code could
    potentially cause crashes (though all of the earlier drawbacks
    still apply). In the "safety vs. speed" or "easy vs. fast" arenas,
    Python has already chosen safe and easy rather than fast. I doubt
    that is going to change any time soon.


    >> It was done a while ago. The results? On a single-processor machine,
    >> Python code ran like 1/4-1/3 the speed of the original runtime. When
    >> using 4+ processors, there were some gains in threaded code, but not
    >> substantial at that point.

    >
    > I am not surprised. Reference counts are quite efficient, contrary to
    > common belief. The problem with reference counts is cyclic references
    > involving objects that define a __del__ method. As these objects are
    > not eligible for cyclic garbage collection, this can produce resource
    > leaks.


    There was a discussion about removing __del__ within the last couple
    weeks. I didn't pay much attention to it (having learned never to use
    __del__), but I believe it involved some sort of weakref-based cleanup.
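
    A sketch of the weakref-based idea (the helper names here are made
    up for illustration): hang the cleanup on a weak reference callback
    instead of a __del__ method, so the object itself never interferes
    with cycle collection:

    ```python
    import weakref

    closed = []
    _refs = set()   # keep the weakrefs alive until their callbacks fire

    class Resource(object):
        def __init__(self, name):
            self.name = name

    def track(resource):
        # The callback runs when the resource is collected; it closes
        # over the name, not the object, so it doesn't keep the
        # resource alive.
        name = resource.name
        ref = weakref.ref(resource,
                          lambda r: (closed.append(name), _refs.discard(r)))
        _refs.add(ref)
        return resource

    r = track(Resource('db'))
    del r          # refcount hits zero; the callback fires
    # closed == ['db']
    ```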

    - Josiah
     
    Josiah Carlson, Jun 5, 2007
    #10
