Are min() and max() thread-safe?

Steven D'Aprano · Sep 17, 2009

I have two threads, one running min() and the other running max() over
the same list. I'm getting some mysterious results which I'm having
trouble debugging. Are min() and max() thread-safe, or am I doing
something fundamentally silly by having them walk over the same list
simultaneously?

My code is as follows. Is there anything obviously wrong with it?

import threading, time

class MMThread(threading.Thread):
def __init__(self, data, func, target, where):
super(MMThread, self).__init__()
self._data = data
self._func = func
self._target = target
self._where = where
def run(self):
self._target[self._where] = self._func(self._data)

def minmax(seq):
result = [None, None]
t1 = MMThread(seq, min, result, 0)
t2 = MMThread(seq, max, result, 1)
t1.start()
t2.start()
# Block until all threads are done.
while any([t1.isAlive(), t2.isAlive()]):
time.sleep(0)
assert None not in result
return tuple(result)

Steven D'Aprano · Sep 17, 2009

I have two threads, one running min() and the other running max() over
the same list. I'm getting some mysterious results which I'm having
trouble debugging. Are min() and max() thread-safe, or am I doing
something fundamentally silly by having them walk over the same list
simultaneously?

My code is as follows. Is there anything obviously wrong with it?

[snip]

For clarity, I'm not (yet) asking for help debugging the errors I'm
getting -- if I were, I would post a minimal example and a traceback. I'm
just looking for reassurance that I'm not being an idiot for even
attempting what I'm doing.

Thank you.

Miles Kaufmann · Sep 17, 2009

I have two threads, one running min() and the other running max() over
the same list. I'm getting some mysterious results which I'm having
trouble debugging. Are min() and max() thread-safe, or am I doing
something fundamentally silly by having them walk over the same list
simultaneously?

See for yourself: http://svn.python.org/view/python/trunk/Python/bltinmodule.c?view=markup

min() and max() don't release the GIL, so yes, they are safe, and
shouldn't see a list in an inconsistent state (with regard to the
Python interpreter, but not necessarily to your application). But a
threaded approach is somewhat silly, since the GIL ensures that they
*won't* walk over the same list simultaneously (two separate lists,
for that matter).

-Miles

Dj Gilcrease · Sep 17, 2009

I dont see anything wrong with it and it works for me on a sequence of 5000

Steven D'Aprano · Sep 17, 2009

See for yourself:
http://svn.python.org/view/python/trunk/Python/bltinmodule.c?view=markup

Mostly gobbledygook to me I'm afraid

min() and max() don't release the GIL, so yes, they are safe, and
shouldn't see a list in an inconsistent state (with regard to the Python
interpreter, but not necessarily to your application). But a threaded
approach is somewhat silly, since the GIL ensures that they *won't* walk
over the same list simultaneously (two separate lists, for that matter).

Perhaps that's true for list contents which are built-ins like ints, but
with custom objects, I can demonstrate that the two threads operate
simultaneously at least sometimes. Unless I'm misinterpreting what I'm
seeing.

Nevertheless, this is useful information to know, thank you.

Niklas Norrthon · Sep 17, 2009

I have two threads, one running min() and the other running max() over
the same list. I'm getting some mysterious results which I'm having
trouble debugging. Are min() and max() thread-safe, or am I doing
something fundamentally silly by having them walk over the same list
simultaneously?

For one time sequences like files and generators your code is broken
for obvious reasons.

/Niklas Norrthon

Hrvoje Niksic · Sep 17, 2009

Steven D'Aprano said:
Perhaps that's true for list contents which are built-ins like ints,
but with custom objects, I can demonstrate that the two threads
operate simultaneously at least sometimes. Unless I'm misinterpreting
what I'm seeing.

If min and max call into Python code, which can happen for custom
objects, then the interpreter will occasionally release the GIL to give
other threads a chance to run. That way min and max will operate
interleaved (if not exactly in parallel). Even so, I see no reason for
them to break.

Miles Kaufmann · Sep 17, 2009

Perhaps that's true for list contents which are built-ins like ints,
but
with custom objects, I can demonstrate that the two threads operate
simultaneously at least sometimes. Unless I'm misinterpreting what I'm
seeing.

Whoops, sorry. Yes, if you use Python functions (or C functions that
release the GIL) for the object comparison methods, a custom key
function, or the sequence iterator's methods, then the the min()/max()
calls could overlap between threads. If you have additional threads
that could modify the list, you should synchronize access to it; if
any of the earlier-mentioned functions modify the list, you're likely
to get "mysterious" (or at least potentially unexpected) results even
in a single-threaded context.

For one time sequences like files and generators your code is broken
for obvious reasons.

s/sequence/iterable/

-Miles

John Nagle · Sep 17, 2009

Steven said:
I have two threads, one running min() and the other running max() over
the same list. I'm getting some mysterious results which I'm having
trouble debugging. Are min() and max() thread-safe, or am I doing
something fundamentally silly by having them walk over the same list
simultaneously?

My code is as follows. Is there anything obviously wrong with it?

You don't seem to be modifying the data while iterating over it,
so there's no real race condition issue.

All Python guarantees is that you can't break the underlying memory
model with threading. You're not guaranteed, for example,
that "min" is atomic. There's a short list of primitives whose
atomicity is guaranteed, at

http://effbot.org/zone/thread-synchronization.htm

In particular, some operations like appending to a list and
popping an element from a list are atomic, which is useful.

Be aware that CPython performance on multithreaded programs
really sucks on multicore CPUs. Not only does the GIL prevent
two CPUs from doing useful work at the same time, the locking
logic is horribly inefficient for multiprocessors. Adding an
additional CPU actually increases run time as the CPUs fight
over the lock.

So there's little point in doing this.

John Nagle

Paul McGuire · Sep 17, 2009

I have two threads, one running min() and the other running max() over
the same list. I'm getting some mysterious results which I'm having
trouble debugging. Are min() and max() thread-safe, or am I doing
something fundamentally silly by having them walk over the same list
simultaneously?

If you are calculating both min and max of a sequence, here is an
algorithm that can cut your comparisons by 25% - for objects with rich/
time-consuming comparisons, that can really add up.

import sys
if sys.version[0] == 2:
range = xrange

def minmax(seq):
if not seq:
return None, None
min_ = seq[0]
max_ = seq[0]
seqlen = len(seq)
start = seqlen % 2
for i in range(start,seqlen,2):
a,b = seq,seq[i+1]
if a > b:
a,b = b,a
if a < min_:
min_ = a
if b > max_:
max_ = b
return min_,max_

With this test code, I verified that the comparison count drops from
2*len to 1.5*len:

if __name__ == "__main__":

import sys
if sys.version[0] == 2:
range = xrange

import random

def minmax_bf(seq):
# brute force, just call min and max on sequence
return min(seq),max(seq)

testseq = [random.random() for i in range(1000000)]

print minmax_bf(testseq)
print minmax(testseq)

class ComparisonCounter(object):
tally = 0
def __init__(self,obj):
self.obj = obj
def __cmp__(self,other):
ComparisonCounter.tally += 1
return cmp(self.obj,other.obj)
def __getattr__(self,attr):
return getattr(self.obj, attr)
def __str__(self):
return str(self.obj)
def __repr__(self):
return repr(self.obj)

testseq = [ComparisonCounter(random.random()) for i in range
(10001)]

print minmax_bf(testseq)
print ComparisonCounter.tally
ComparisonCounter.tally = 0

print minmax(testseq)
print ComparisonCounter.tally

Plus, now that you are finding both min and max in a single pass
through the sequence, you can wrap this in a lock to make sure of the
atomicity of your results.

(Just for grins, I also tried sorting the list and returning elements
0 and -1 for min and max - I got numbers of comparisons in the range
of 12X to 15X the length of the sequence.)

-- Paul

Hendrik van Rooyen · Sep 17, 2009

I have two threads, one running min() and the other running max() over
the same list. I'm getting some mysterious results which I'm having
trouble debugging. Are min() and max() thread-safe, or am I doing
something fundamentally silly by having them walk over the same list
simultaneously?

My code is as follows. Is there anything obviously wrong with it?

Apart from the plethora of indirection, nothing I can see.

But there is something rotten. Going more basic, look at this:

hvr@Linuxbox:~/Junk> cat jjj.py
import thread as th
import time

a = range(10000000)

def worker(func,thing):
start_time = time.time()
print "start time is:",start_time
res = func(thing)
print res
end_time = time.time()
print "End at:",end_time,"That took:",end_time-start_time, 'seconds'

t1 = th.start_new_thread(worker,(min,a))
t2 = th.start_new_thread(worker,(max,a))

while True:
time.sleep(1)

hvr@Linuxbox:~/Junk> python -i jjj.py
start time is: 1253176132.64
0
End at: 1253176133.34 That took: 0.700681209564 seconds
start time is: 1253176133.34
9999999
End at: 1253176134.18 That took: 0.847566127777 seconds
Traceback (most recent call last):
File "jjj.py", line 18, in <module>
time.sleep(1)
KeyboardInterruptNo simultaneity.
So when I increase the range to a hundred million to try to force some
concurrency, I get:

hvr@Linuxbox:~/Junk> python -i jjj.py
Killed
hvr@Linuxbox:~/Junk>

The behaviour is that it just thrashes for some minutes and then it looks like
the os kills it.

SuSe Linux, dual core Intel with a gig of memory:

Python 2.5.1 (r251:54863, Dec 6 2008, 10:49:39)
[GCC 4.2.1 (SUSE Linux)] on linux2

I enclose the file.

- Hendrik

Carl Banks · Sep 17, 2009

Apart from the plethora of indirection, nothing I can see.

But there is something rotten. Going more basic, look at this:

hvr@Linuxbox:~/Junk> cat jjj.py
import thread as th
import time

a = range(10000000)

def worker(func,thing):
start_time = time.time()
print "start time is:",start_time
res = func(thing)
print res
end_time = time.time()
print "End at:",end_time,"That took:",end_time-start_time, 'seconds'

t1 = th.start_new_thread(worker,(min,a))
t2 = th.start_new_thread(worker,(max,a))

while True:
time.sleep(1)

hvr@Linuxbox:~/Junk> python -i jjj.py
start time is: 1253176132.64
0
End at: 1253176133.34 That took: 0.700681209564 seconds
start time is: 1253176133.34
9999999
End at: 1253176134.18 That took: 0.847566127777 seconds
Traceback (most recent call last):
File "jjj.py", line 18, in <module>
time.sleep(1)
KeyboardInterrupt

No simultaneity.

When running min or max on a list of ints, there is probably no
occasion for the function to release the GIL. If a thread doesn't
release the GIL no other Python threads can run. This behavior may be
rotten, but it's totally expected.

Try adding "key=lambda x:x" to the function call (which adds an
occasion where the GIL might be released); you will see that it will
switch threads.

Carl Banks

Carl Banks · Sep 17, 2009

def minmax(seq):
result = [None, None]
t1 = MMThread(seq, min, result, 0)
t2 = MMThread(seq, max, result, 1)
t1.start()
t2.start()
# Block until all threads are done.
while any([t1.isAlive(), t2.isAlive()]):
time.sleep(0)

Why not use "t1.join(); t2.join()" here? Is there any benefit to do
it this way instead?

Carl Banks

Hendrik van Rooyen · Sep 17, 2009

On Sep 17, 2:18 am, Hendrik van Rooyen <[email protected]>

wrote:

When running min or max on a list of ints, there is probably no
occasion for the function to release the GIL. If a thread doesn't
release the GIL no other Python threads can run. This behavior may be
rotten, but it's totally expected.

The lack of concurrency was not what I was complaining about, but the second
case that just quietly freaked out when I made the list of ints large.

I have subsequently seen that it is red herring in this context though, as it
is some memory problem - the crash comes during the list creation, and has
nothing to do with the min/max processing. One can demo it by simply trying
to create a big list at the interactive prompt, and after a while you get
the "Killed" message.

a = range(100000000) does it for me on my machine.

- Hendrik

ryles · Sep 17, 2009

def minmax(seq):
result = [None, None]
t1 = MMThread(seq, min, result, 0)
t2 = MMThread(seq, max, result, 1)
t1.start()
t2.start()
# Block until all threads are done.
while any([t1.isAlive(), t2.isAlive()]):
time.sleep(0)

Click to expand...

Why not use "t1.join(); t2.join()" here? Is there any benefit to do
it this way instead?

Carl Banks

The benefit is that SIGINT (Control-C) will be handled. Not sure why
sleep(0) is used, though, rather than something less busy like sleep
(0.5).

Carl Banks · Sep 17, 2009

I have subsequently seen that it is red herring in this context though, as it
is some memory problem - the crash comes during the list creation, and has
nothing to do with the min/max processing. One can demo it by simply trying
to create a big list at the interactive prompt, and after a while you get
the "Killed" message.

a = range(100000000) does it for me on my machine.

Ah, I see. Yeah, that is rotten behavior in the sense that we don't
have 128-bit machines with 200 terabytes of ram.

Give it a few
months, though

Carl Banks

Are range iterators thread safe?	3	Oct 20, 2011
querry on queue ( thread safe ) multithreading	1	Mar 11, 2014
builtin max() and weak ordering	1	Mar 3, 2011
psphere: how to make thread safe	3	Aug 22, 2012
os walk() and threads problems (os.walk are thread safe?)	7	Nov 13, 2007
stop thread from outside 'run'	0	Jun 5, 2012
[gtk+thread] Why worker thread never wakes from time.sleep()?	0	Jan 3, 2010
deque and thread-safety	0	Oct 12, 2012

Are min() and max() thread-safe?

Steven D'Aprano

Steven D'Aprano

Miles Kaufmann

Dj Gilcrease

Steven D'Aprano

Niklas Norrthon

Hrvoje Niksic

Miles Kaufmann

John Nagle

Paul McGuire

Hendrik van Rooyen

Carl Banks

Carl Banks

Hendrik van Rooyen

ryles

Carl Banks

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads