Are min() and max() thread-safe?

Discussion in 'Python' started by Steven D'Aprano, Sep 17, 2009.

  1. I have two threads, one running min() and the other running max() over
    the same list. I'm getting some mysterious results which I'm having
    trouble debugging. Are min() and max() thread-safe, or am I doing
    something fundamentally silly by having them walk over the same list
    simultaneously?

    My code is as follows. Is there anything obviously wrong with it?


    import threading, time

    class MMThread(threading.Thread):
    def __init__(self, data, func, target, where):
    super(MMThread, self).__init__()
    self._data = data
    self._func = func
    self._target = target
    self._where = where
    def run(self):
    self._target[self._where] = self._func(self._data)



    def minmax(seq):
    result = [None, None]
    t1 = MMThread(seq, min, result, 0)
    t2 = MMThread(seq, max, result, 1)
    t1.start()
    t2.start()
    # Block until all threads are done.
    while any([t1.isAlive(), t2.isAlive()]):
    time.sleep(0)
    assert None not in result
    return tuple(result)
     
    Steven D'Aprano, Sep 17, 2009
    #1
    1. Advertisements

  2. [snip]


    For clarity, I'm not (yet) asking for help debugging the errors I'm
    getting -- if I were, I would post a minimal example and a traceback. I'm
    just looking for reassurance that I'm not being an idiot for even
    attempting what I'm doing.

    Thank you.
     
    Steven D'Aprano, Sep 17, 2009
    #2
    1. Advertisements

  3. See for yourself: http://svn.python.org/view/python/trunk/Python/bltinmodule.c?view=markup

    min() and max() don't release the GIL, so yes, they are safe, and
    shouldn't see a list in an inconsistent state (with regard to the
    Python interpreter, but not necessarily to your application). But a
    threaded approach is somewhat silly, since the GIL ensures that they
    *won't* walk over the same list simultaneously (two separate lists,
    for that matter).

    -Miles
     
    Miles Kaufmann, Sep 17, 2009
    #3
  4. Steven D'Aprano

    Dj Gilcrease Guest

    I dont see anything wrong with it and it works for me on a sequence of 5000
     
    Dj Gilcrease, Sep 17, 2009
    #4
  5. Mostly gobbledygook to me I'm afraid :(

    Perhaps that's true for list contents which are built-ins like ints, but
    with custom objects, I can demonstrate that the two threads operate
    simultaneously at least sometimes. Unless I'm misinterpreting what I'm
    seeing.



    Nevertheless, this is useful information to know, thank you.
     
    Steven D'Aprano, Sep 17, 2009
    #5
  6. For one time sequences like files and generators your code is broken
    for obvious reasons.

    /Niklas Norrthon
     
    Niklas Norrthon, Sep 17, 2009
    #6
  7. If min and max call into Python code, which can happen for custom
    objects, then the interpreter will occasionally release the GIL to give
    other threads a chance to run. That way min and max will operate
    interleaved (if not exactly in parallel). Even so, I see no reason for
    them to break.
     
    Hrvoje Niksic, Sep 17, 2009
    #7
  8. Whoops, sorry. Yes, if you use Python functions (or C functions that
    release the GIL) for the object comparison methods, a custom key
    function, or the sequence iterator's methods, then the the min()/max()
    calls could overlap between threads. If you have additional threads
    that could modify the list, you should synchronize access to it; if
    any of the earlier-mentioned functions modify the list, you're likely
    to get "mysterious" (or at least potentially unexpected) results even
    in a single-threaded context.

    s/sequence/iterable/

    -Miles
     
    Miles Kaufmann, Sep 17, 2009
    #8
  9. Steven D'Aprano

    John Nagle Guest

    You don't seem to be modifying the data while iterating over it,
    so there's no real race condition issue.

    All Python guarantees is that you can't break the underlying memory
    model with threading. You're not guaranteed, for example,
    that "min" is atomic. There's a short list of primitives whose
    atomicity is guaranteed, at

    http://effbot.org/zone/thread-synchronization.htm

    In particular, some operations like appending to a list and
    popping an element from a list are atomic, which is useful.

    Be aware that CPython performance on multithreaded programs
    really sucks on multicore CPUs. Not only does the GIL prevent
    two CPUs from doing useful work at the same time, the locking
    logic is horribly inefficient for multiprocessors. Adding an
    additional CPU actually increases run time as the CPUs fight
    over the lock.

    So there's little point in doing this.

    John Nagle
     
    John Nagle, Sep 17, 2009
    #9
  10. Steven D'Aprano

    Paul McGuire Guest

    If you are calculating both min and max of a sequence, here is an
    algorithm that can cut your comparisons by 25% - for objects with rich/
    time-consuming comparisons, that can really add up.

    import sys
    if sys.version[0] == 2:
    range = xrange

    def minmax(seq):
    if not seq:
    return None, None
    min_ = seq[0]
    max_ = seq[0]
    seqlen = len(seq)
    start = seqlen % 2
    for i in range(start,seqlen,2):
    a,b = seq,seq[i+1]
    if a > b:
    a,b = b,a
    if a < min_:
    min_ = a
    if b > max_:
    max_ = b
    return min_,max_

    With this test code, I verified that the comparison count drops from
    2*len to 1.5*len:

    if __name__ == "__main__":

    import sys
    if sys.version[0] == 2:
    range = xrange

    import random

    def minmax_bf(seq):
    # brute force, just call min and max on sequence
    return min(seq),max(seq)

    testseq = [random.random() for i in range(1000000)]

    print minmax_bf(testseq)
    print minmax(testseq)

    class ComparisonCounter(object):
    tally = 0
    def __init__(self,obj):
    self.obj = obj
    def __cmp__(self,other):
    ComparisonCounter.tally += 1
    return cmp(self.obj,other.obj)
    def __getattr__(self,attr):
    return getattr(self.obj, attr)
    def __str__(self):
    return str(self.obj)
    def __repr__(self):
    return repr(self.obj)

    testseq = [ComparisonCounter(random.random()) for i in range
    (10001)]

    print minmax_bf(testseq)
    print ComparisonCounter.tally
    ComparisonCounter.tally = 0

    print minmax(testseq)
    print ComparisonCounter.tally


    Plus, now that you are finding both min and max in a single pass
    through the sequence, you can wrap this in a lock to make sure of the
    atomicity of your results.

    (Just for grins, I also tried sorting the list and returning elements
    0 and -1 for min and max - I got numbers of comparisons in the range
    of 12X to 15X the length of the sequence.)

    -- Paul
     
    Paul McGuire, Sep 17, 2009
    #10
  11. Apart from the plethora of indirection, nothing I can see.

    But there is something rotten. Going more basic, look at this:

    [email protected]:~/Junk> cat jjj.py
    import thread as th
    import time

    a = range(10000000)

    def worker(func,thing):
    start_time = time.time()
    print "start time is:",start_time
    res = func(thing)
    print res
    end_time = time.time()
    print "End at:",end_time,"That took:",end_time-start_time, 'seconds'

    t1 = th.start_new_thread(worker,(min,a))
    t2 = th.start_new_thread(worker,(max,a))

    while True:
    time.sleep(1)

    [email protected]:~/Junk> python -i jjj.py
    start time is: 1253176132.64
    0
    End at: 1253176133.34 That took: 0.700681209564 seconds
    start time is: 1253176133.34
    9999999
    End at: 1253176134.18 That took: 0.847566127777 seconds
    Traceback (most recent call last):
    File "jjj.py", line 18, in <module>
    time.sleep(1)
    KeyboardInterruptNo simultaneity.
    So when I increase the range to a hundred million to try to force some
    concurrency, I get:

    [email protected]:~/Junk> python -i jjj.py
    Killed
    [email protected]:~/Junk>

    The behaviour is that it just thrashes for some minutes and then it looks like
    the os kills it.

    SuSe Linux, dual core Intel with a gig of memory:

    Python 2.5.1 (r251:54863, Dec 6 2008, 10:49:39)
    [GCC 4.2.1 (SUSE Linux)] on linux2

    I enclose the file.

    - Hendrik
     
    Hendrik van Rooyen, Sep 17, 2009
    #11
  12. Steven D'Aprano

    Carl Banks Guest

    When running min or max on a list of ints, there is probably no
    occasion for the function to release the GIL. If a thread doesn't
    release the GIL no other Python threads can run. This behavior may be
    rotten, but it's totally expected.

    Try adding "key=lambda x:x" to the function call (which adds an
    occasion where the GIL might be released); you will see that it will
    switch threads.


    Carl Banks
     
    Carl Banks, Sep 17, 2009
    #12
  13. Steven D'Aprano

    Carl Banks Guest

    Why not use "t1.join(); t2.join()" here? Is there any benefit to do
    it this way instead?

    Carl Banks
     
    Carl Banks, Sep 17, 2009
    #13
  14. The lack of concurrency was not what I was complaining about, but the second
    case that just quietly freaked out when I made the list of ints large.

    I have subsequently seen that it is red herring in this context though, as it
    is some memory problem - the crash comes during the list creation, and has
    nothing to do with the min/max processing. One can demo it by simply trying
    to create a big list at the interactive prompt, and after a while you get
    the "Killed" message.

    a = range(100000000) does it for me on my machine.

    - Hendrik
     
    Hendrik van Rooyen, Sep 17, 2009
    #14
  15. Steven D'Aprano

    ryles Guest

    The benefit is that SIGINT (Control-C) will be handled. Not sure why
    sleep(0) is used, though, rather than something less busy like sleep
    (0.5).
     
    ryles, Sep 17, 2009
    #15
  16. Steven D'Aprano

    Carl Banks Guest

    Ah, I see. Yeah, that is rotten behavior in the sense that we don't
    have 128-bit machines with 200 terabytes of ram. :) Give it a few
    months, though


    Carl Banks
     
    Carl Banks, Sep 17, 2009
    #16
    1. Advertisements

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments (here). After that, you can post your question and our members will help you out.