Are min() and max() thread-safe?

S

Steven D'Aprano

I have two threads, one running min() and the other running max() over
the same list. I'm getting some mysterious results which I'm having
trouble debugging. Are min() and max() thread-safe, or am I doing
something fundamentally silly by having them walk over the same list
simultaneously?

My code is as follows. Is there anything obviously wrong with it?


import threading, time

class MMThread(threading.Thread):
def __init__(self, data, func, target, where):
super(MMThread, self).__init__()
self._data = data
self._func = func
self._target = target
self._where = where
def run(self):
self._target[self._where] = self._func(self._data)



def minmax(seq):
result = [None, None]
t1 = MMThread(seq, min, result, 0)
t2 = MMThread(seq, max, result, 1)
t1.start()
t2.start()
# Block until all threads are done.
while any([t1.isAlive(), t2.isAlive()]):
time.sleep(0)
assert None not in result
return tuple(result)
 
S

Steven D'Aprano

I have two threads, one running min() and the other running max() over
the same list. I'm getting some mysterious results which I'm having
trouble debugging. Are min() and max() thread-safe, or am I doing
something fundamentally silly by having them walk over the same list
simultaneously?

My code is as follows. Is there anything obviously wrong with it?
[snip]


For clarity, I'm not (yet) asking for help debugging the errors I'm
getting -- if I were, I would post a minimal example and a traceback. I'm
just looking for reassurance that I'm not being an idiot for even
attempting what I'm doing.

Thank you.
 
M

Miles Kaufmann

I have two threads, one running min() and the other running max() over
the same list. I'm getting some mysterious results which I'm having
trouble debugging. Are min() and max() thread-safe, or am I doing
something fundamentally silly by having them walk over the same list
simultaneously?

See for yourself: http://svn.python.org/view/python/trunk/Python/bltinmodule.c?view=markup

min() and max() don't release the GIL, so yes, they are safe, and
shouldn't see a list in an inconsistent state (with regard to the
Python interpreter, but not necessarily to your application). But a
threaded approach is somewhat silly, since the GIL ensures that they
*won't* walk over the same list simultaneously (two separate lists,
for that matter).

-Miles
 
S

Steven D'Aprano


Mostly gobbledygook to me I'm afraid :(

min() and max() don't release the GIL, so yes, they are safe, and
shouldn't see a list in an inconsistent state (with regard to the Python
interpreter, but not necessarily to your application). But a threaded
approach is somewhat silly, since the GIL ensures that they *won't* walk
over the same list simultaneously (two separate lists, for that matter).

Perhaps that's true for list contents which are built-ins like ints, but
with custom objects, I can demonstrate that the two threads operate
simultaneously at least sometimes. Unless I'm misinterpreting what I'm
seeing.



Nevertheless, this is useful information to know, thank you.
 
N

Niklas Norrthon

I have two threads, one running min() and the other running max() over
the same list. I'm getting some mysterious results which I'm having
trouble debugging. Are min() and max() thread-safe, or am I doing
something fundamentally silly by having them walk over the same list
simultaneously?
For one time sequences like files and generators your code is broken
for obvious reasons.

/Niklas Norrthon
 
H

Hrvoje Niksic

Steven D'Aprano said:
Perhaps that's true for list contents which are built-ins like ints,
but with custom objects, I can demonstrate that the two threads
operate simultaneously at least sometimes. Unless I'm misinterpreting
what I'm seeing.

If min and max call into Python code, which can happen for custom
objects, then the interpreter will occasionally release the GIL to give
other threads a chance to run. That way min and max will operate
interleaved (if not exactly in parallel). Even so, I see no reason for
them to break.
 
M

Miles Kaufmann

Perhaps that's true for list contents which are built-ins like ints,
but
with custom objects, I can demonstrate that the two threads operate
simultaneously at least sometimes. Unless I'm misinterpreting what I'm
seeing.

Whoops, sorry. Yes, if you use Python functions (or C functions that
release the GIL) for the object comparison methods, a custom key
function, or the sequence iterator's methods, then the the min()/max()
calls could overlap between threads. If you have additional threads
that could modify the list, you should synchronize access to it; if
any of the earlier-mentioned functions modify the list, you're likely
to get "mysterious" (or at least potentially unexpected) results even
in a single-threaded context.

For one time sequences like files and generators your code is broken
for obvious reasons.

s/sequence/iterable/

-Miles
 
J

John Nagle

Steven said:
I have two threads, one running min() and the other running max() over
the same list. I'm getting some mysterious results which I'm having
trouble debugging. Are min() and max() thread-safe, or am I doing
something fundamentally silly by having them walk over the same list
simultaneously?

My code is as follows. Is there anything obviously wrong with it?

You don't seem to be modifying the data while iterating over it,
so there's no real race condition issue.

All Python guarantees is that you can't break the underlying memory
model with threading. You're not guaranteed, for example,
that "min" is atomic. There's a short list of primitives whose
atomicity is guaranteed, at

http://effbot.org/zone/thread-synchronization.htm

In particular, some operations like appending to a list and
popping an element from a list are atomic, which is useful.

Be aware that CPython performance on multithreaded programs
really sucks on multicore CPUs. Not only does the GIL prevent
two CPUs from doing useful work at the same time, the locking
logic is horribly inefficient for multiprocessors. Adding an
additional CPU actually increases run time as the CPUs fight
over the lock.

So there's little point in doing this.

John Nagle
 
P

Paul McGuire

I have two threads, one running min() and the other running max() over
the same list. I'm getting some mysterious results which I'm having
trouble debugging. Are min() and max() thread-safe, or am I doing
something fundamentally silly by having them walk over the same list
simultaneously?

If you are calculating both min and max of a sequence, here is an
algorithm that can cut your comparisons by 25% - for objects with rich/
time-consuming comparisons, that can really add up.

import sys
if sys.version[0] == 2:
range = xrange

def minmax(seq):
if not seq:
return None, None
min_ = seq[0]
max_ = seq[0]
seqlen = len(seq)
start = seqlen % 2
for i in range(start,seqlen,2):
a,b = seq,seq[i+1]
if a > b:
a,b = b,a
if a < min_:
min_ = a
if b > max_:
max_ = b
return min_,max_

With this test code, I verified that the comparison count drops from
2*len to 1.5*len:

if __name__ == "__main__":

import sys
if sys.version[0] == 2:
range = xrange

import random

def minmax_bf(seq):
# brute force, just call min and max on sequence
return min(seq),max(seq)

testseq = [random.random() for i in range(1000000)]

print minmax_bf(testseq)
print minmax(testseq)

class ComparisonCounter(object):
tally = 0
def __init__(self,obj):
self.obj = obj
def __cmp__(self,other):
ComparisonCounter.tally += 1
return cmp(self.obj,other.obj)
def __getattr__(self,attr):
return getattr(self.obj, attr)
def __str__(self):
return str(self.obj)
def __repr__(self):
return repr(self.obj)

testseq = [ComparisonCounter(random.random()) for i in range
(10001)]

print minmax_bf(testseq)
print ComparisonCounter.tally
ComparisonCounter.tally = 0

print minmax(testseq)
print ComparisonCounter.tally


Plus, now that you are finding both min and max in a single pass
through the sequence, you can wrap this in a lock to make sure of the
atomicity of your results.

(Just for grins, I also tried sorting the list and returning elements
0 and -1 for min and max - I got numbers of comparisons in the range
of 12X to 15X the length of the sequence.)

-- Paul
 
H

Hendrik van Rooyen

I have two threads, one running min() and the other running max() over
the same list. I'm getting some mysterious results which I'm having
trouble debugging. Are min() and max() thread-safe, or am I doing
something fundamentally silly by having them walk over the same list
simultaneously?

My code is as follows. Is there anything obviously wrong with it?

Apart from the plethora of indirection, nothing I can see.

But there is something rotten. Going more basic, look at this:

hvr@Linuxbox:~/Junk> cat jjj.py
import thread as th
import time

a = range(10000000)

def worker(func,thing):
start_time = time.time()
print "start time is:",start_time
res = func(thing)
print res
end_time = time.time()
print "End at:",end_time,"That took:",end_time-start_time, 'seconds'

t1 = th.start_new_thread(worker,(min,a))
t2 = th.start_new_thread(worker,(max,a))

while True:
time.sleep(1)

hvr@Linuxbox:~/Junk> python -i jjj.py
start time is: 1253176132.64
0
End at: 1253176133.34 That took: 0.700681209564 seconds
start time is: 1253176133.34
9999999
End at: 1253176134.18 That took: 0.847566127777 seconds
Traceback (most recent call last):
File "jjj.py", line 18, in <module>
time.sleep(1)
KeyboardInterruptNo simultaneity.
So when I increase the range to a hundred million to try to force some
concurrency, I get:

hvr@Linuxbox:~/Junk> python -i jjj.py
Killed
hvr@Linuxbox:~/Junk>

The behaviour is that it just thrashes for some minutes and then it looks like
the os kills it.

SuSe Linux, dual core Intel with a gig of memory:

Python 2.5.1 (r251:54863, Dec 6 2008, 10:49:39)
[GCC 4.2.1 (SUSE Linux)] on linux2

I enclose the file.

- Hendrik
 
C

Carl Banks

Apart from the plethora of indirection, nothing I can see.

But there is something rotten.  Going more basic, look at this:

hvr@Linuxbox:~/Junk> cat jjj.py
import thread as th
import time

a = range(10000000)

def worker(func,thing):
        start_time = time.time()
        print "start time is:",start_time
        res = func(thing)
        print res
        end_time = time.time()
        print "End at:",end_time,"That took:",end_time-start_time, 'seconds'

t1 = th.start_new_thread(worker,(min,a))
t2 = th.start_new_thread(worker,(max,a))

while True:
        time.sleep(1)

hvr@Linuxbox:~/Junk> python -i jjj.py
start time is: 1253176132.64
0
End at: 1253176133.34 That took: 0.700681209564 seconds
start time is: 1253176133.34
9999999
End at: 1253176134.18 That took: 0.847566127777 seconds
Traceback (most recent call last):
  File "jjj.py", line 18, in <module>
    time.sleep(1)
KeyboardInterrupt

No simultaneity.

When running min or max on a list of ints, there is probably no
occasion for the function to release the GIL. If a thread doesn't
release the GIL no other Python threads can run. This behavior may be
rotten, but it's totally expected.

Try adding "key=lambda x:x" to the function call (which adds an
occasion where the GIL might be released); you will see that it will
switch threads.


Carl Banks
 
C

Carl Banks

def minmax(seq):
    result = [None, None]
    t1 = MMThread(seq, min, result, 0)
    t2 = MMThread(seq, max, result, 1)
    t1.start()
    t2.start()
    # Block until all threads are done.
    while any([t1.isAlive(), t2.isAlive()]):
        time.sleep(0)

Why not use "t1.join(); t2.join()" here? Is there any benefit to do
it this way instead?

Carl Banks
 
H

Hendrik van Rooyen

On Sep 17, 2:18 am, Hendrik van Rooyen <[email protected]>

wrote:
When running min or max on a list of ints, there is probably no
occasion for the function to release the GIL. If a thread doesn't
release the GIL no other Python threads can run. This behavior may be
rotten, but it's totally expected.

The lack of concurrency was not what I was complaining about, but the second
case that just quietly freaked out when I made the list of ints large.

I have subsequently seen that it is red herring in this context though, as it
is some memory problem - the crash comes during the list creation, and has
nothing to do with the min/max processing. One can demo it by simply trying
to create a big list at the interactive prompt, and after a while you get
the "Killed" message.

a = range(100000000) does it for me on my machine.

- Hendrik
 
R

ryles

def minmax(seq):
    result = [None, None]
    t1 = MMThread(seq, min, result, 0)
    t2 = MMThread(seq, max, result, 1)
    t1.start()
    t2.start()
    # Block until all threads are done.
    while any([t1.isAlive(), t2.isAlive()]):
        time.sleep(0)

Why not use "t1.join(); t2.join()" here?  Is there any benefit to do
it this way instead?

Carl Banks

The benefit is that SIGINT (Control-C) will be handled. Not sure why
sleep(0) is used, though, rather than something less busy like sleep
(0.5).
 
C

Carl Banks

I have subsequently seen that it is red herring in this context though, as it
is some memory problem - the crash  comes during the list creation, and has
nothing to do with the min/max processing.  One can demo it by simply trying
to create a big list at the interactive prompt, and after a while you get
the "Killed" message.

a = range(100000000)   does it for me on my machine.

Ah, I see. Yeah, that is rotten behavior in the sense that we don't
have 128-bit machines with 200 terabytes of ram. :) Give it a few
months, though


Carl Banks
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,755
Messages
2,569,536
Members
45,013
Latest member
KatriceSwa

Latest Threads

Top