maximum() efficency

S

Steve R. Hastings

I was looking at a Python function to find the maximum from a list.
The original was more complicated; I simplified it. The built-in max()
function can replace the simplified example, but not the original.


def maximum(lst):
maxval = lst[0]

for i in xrange(1, len(lst)):
v = lst
if maxval > v:
maxval = v

return maxval



Immediately I started thinking about ways to make this more efficient. My
first thought was to use iterators:


def maximum(lst):
itr = iter(lst)
maxval = itr.next()

for v in itr:
if maxval > v:
maxval = v

return maxval



Then I thought, I wonder if there is a faster solution that *doesn't* use
iterators, for use with older versions of Python. I came up with:


def maximum(lst):
a = []
for v in lst:
try:
if a[0] > v:
a[0] = v
except IndexError:
a.append(v)
return a[0]



Of course, it's a little ugly to use a[0] instead of "maxval". And I'm
trying not to index a list, and here I am indexing another list. So:


def maximum(lst):
for v in lst:
try:
if maxval > v:
maxval = v
except NameError:
maxval = v
return maxval



And we come at last to my actual question: Is this guaranteed to work on
all versions of Python?

In my testing on my computer, the above function works perfectly. Even
if you do this:
maxval = 100
print maximum(cmp, [2, 0, 3, 1])
3

In other words, you get a NameError in the function even if there is a
global variable with that name. That's with Python 2.4, however... is
this function guaranteed to work in all Python versions? I am very
comfortable with the a[0] version, since I explicitly set a to an empty
list, I know a[0] will always raise that IndexError.


P.S. I went ahead and benchmarked the above functions, plus one more
that is similar to the a[0] solution but used an empty dictionary instead
of a zero-length list. I created a list of a million zeroes, and then
stored a 1 at the beginning of the list. Thus the maximum functions would
spend all their time iterating through values and comparing them, and very
little time updating maxval or its equivalent. Times to run each function
100 times on the large list:

36.8 seconds xrange
22.3 seconds iterator
39.2 seconds IndexError on a[0]
31.5 seconds NameError with maxval
43.4 seconds KeyError on empty dictionary


The conclusions I draw from these numbers:

The sneaky tricks with exceptions don't bring enough speed to be worth
using; two of them were actually slower than the xrange solution. Even
the NameError one was only a little bit faster, and simple is better than
complex, so I don't think I'd ever actually use it.

The clear winner was the iterator version. It was much faster than the
others, and in my opinion it is simpler and easier to understand than any
of the others.
 
S

Steven Bethard

Steve said:
I was looking at a Python function to find the maximum from a list.
The original was more complicated; I simplified it. The built-in max()
function can replace the simplified example, but not the original.

What's the original? Are you sure max can't solve it with an
appropriate decorate-sort-undecorate (DSU) or the key= argument coming
in Python 2.5?

STeVe
 
M

Mitja Trampus

Steve said:
I was looking at a Python function to find the maximum from a list.
The original was more complicated; I simplified it. The built-in max()
function can replace the simplified example, but not the original.

But you forgot to shuw us the original...

[snip several implementations]
Of course, it's a little ugly to use a[0] instead of "maxval". And I'm
trying not to index a list, and here I am indexing another list. So:

def maximum(lst):
for v in lst:
try:
if maxval > v:
maxval = v
except NameError:
maxval = v
return maxval

In my testing on my computer, the above function works perfectly. Even
if you do this:
maxval = 100
print maximum(cmp, [2, 0, 3, 1])
3

In other words, you get a NameError in the function even if there is a
global variable with that name. That's with Python 2.4, however... is
this function guaranteed to work in all Python versions?

Not 100% sure, but I think yes. People with longer experience in python can answer
definitely. Or you can try it out yourself.
The reason it works is that this:
def f():
print a
a=7; f()
works and prints 7 (global a is used), while this
def f():
print a
a = 12
a=7; f()
does NOT work. Python notices that a is going to get assigned to inside f and treats it as
a local variable. This is also the case with your code. Try out the above examples without
a=7 and notice the different error messages...
The clear winner was the iterator version. It was much faster than the
others, and in my opinion it is simpler and easier to understand than any
of the others.

I would have done it in the same way, but probably without the iterators. I.e., like this:

def maximum(lst):
try: m = lst[0]
except (TypeError, IndexError): raise Exception "Non-sequence or empty sequence given to
maximum")

# (you forgot the above sanity checks in your code.
# not strictly necessary, but nice to have.)

for x in lst:
if x>m: m=x
return m
 
P

Paul McGuire

Steve R. Hastings said:
I was looking at a Python function to find the maximum from a list.
The original was more complicated; I simplified it. The built-in max()
function can replace the simplified example, but not the original.
If you are interested in both the min and max values, here is an algorithm
that performs only 3 comparisons for every 2 list items, instead of the
brute force method's 4 comparisons. This walks the input list in pairs,
compares the two items to each other, then compares the lesser with the
current min and the greater with the current max.

def minMax(L):
lenL = len(L)
if lenL == 0:
return None,None
if lenL == 1:
return L[0],L[0]

min_ = max_ = L[0]

if lenL % 2:
i = 1
else:
i = 0
while i < lenL:
a,b = L,L[i+1]
if a > b:
a,b = b,a
if a < min_: min_ = a
if b > max_: max_ = b
i += 2

return min_,max_

Of course, this much Python bytecode is not near as fast as simply calling
the builtins min() and max(). But, if you add psyco to the mix, things
aren't so cut-and-dried. I tested this method using a list of
randomly-generated strings, and after the list length exceeds 100-500 or so,
minMax starts to outperform the compiled min() and max(). The following
table shows the timing for the brute force min() and max() calls, followed
by minMax():

List length 1: 0.0000229 0.0000612
List length 2: 0.0000056 0.0001081
List length 10: 0.0000059 0.0000707
List length 100: 0.0000154 0.0000087
List length 500: 0.0000589 0.0000534
List length 1000: 0.0001176 0.0000670
List length 10000: 0.0011485 0.0008954
List length 100000: 0.0126720 0.0077379

Using the OP's test of 1 million zeros with the first zero changed to a 1,
here is the performance of minMax vs. min() and max():

List length 1000000: 0.1235953 0.0126896

minMax is 10X faster! Ironically, it's faster calling minMax() than either
min() or max().

If your list contains objects that are expensive to compare, the crossover
list length may be much shorter.

-- Paul
 
S

Steve R. Hastings

What's the original?



def minimum(cmp, lst):
"""minimum(cmp, lst)

Returns the minimal element in non-empty list LST with elements
compared via CMP() which should return values with the same semantics
as Python's cmp(). If there are several minimal elements, the last
one is returned.
"""

if not lst:
raise ValueError, 'empty list'

minval = lst[0]

for i in xrange(1, len(lst)):
v = lst
if cmp(minval, v) > 0:
minval = v

return minval



This is from Google's "goopy" package.

http://goog-goopy.sourceforge.net/
 
S

Steve R. Hastings

I would have done it in the same way, but probably without the iterators. I.e., like this:

def maximum(lst):
try: m = lst[0]
except (TypeError, IndexError): raise Exception "Non-sequence or empty sequence given to
maximum")

# (you forgot the above sanity checks in your code.
# not strictly necessary, but nice to have.)

for x in lst:
if x>m: m=x
return m

I left out the original sanity check, because I just wanted a streamlined
simple example.


I had a nagging feeling that I was missing something simple, and you have
put your finger on it. That's perfect! It's simple, it's clear, and it
will work on any version of Python. Thanks!
 
S

Steve R. Hastings

Actually, now that I think about it, the version using iter() has one
advantage over your version: it will work correctly if passed either a
list or an iterator. So, for versions of Python that have iterators, I'd
use the iter() version, but for older versions of Python, I'd use your
version.

P.S. I benchmarked your version; it ran in 22.0 seconds, just a gnat's
whisker faster than the iter() version.
 
P

Paul McGuire

Steve R. Hastings said:
What's the original?



def minimum(cmp, lst):
"""minimum(cmp, lst)

Returns the minimal element in non-empty list LST with elements
compared via CMP() which should return values with the same semantics
as Python's cmp(). If there are several minimal elements, the last
one is returned.
"""

if not lst:
raise ValueError, 'empty list'

minval = lst[0]

for i in xrange(1, len(lst)):
v = lst
if cmp(minval, v) > 0:
minval = v

return minval



This is from Google's "goopy" package.

http://goog-goopy.sourceforge.net/


The doc string is not correct. If there are several minimal elements, the
*first* one is returned. See this example:

import math

class Vector(object):
def __init__(self,*x):
# x is list of n-dimensional coordinates
self.x = x

def magnitude(self):
return math.sqrt(sum(x**2 for x in self.x))

def __cmp__(self,other):
return cmp(self.magnitude(), other.magnitude())

def __str__(self):
return "Vector(%s)" % ",".join(str(x) for x in self.x)

a = Vector(0,1)
b = Vector(0,0)
c = Vector(1,0)

# these print 1, -1, and 0 to show that cmp() and Vector are doing the right
thing
print cmp(a,b)
print cmp(b,a)
print cmp(a,c)

print min(a,b) # gives Vector(0,0), or b
print max(a,b,c) # gives Vector(0,1) or a
print min(a,c) # gives Vector(0,1) or a

print minimum(Vector.__cmp__, [a,b]) # gives Vector(0,0), or b
print minimum(Vector.__cmp__, [a,c]) # gives Vector(0,1) or a


-- Paul
 
S

Steven Bethard

Steve said:
What's the original?

def minimum(cmp, lst):
"""minimum(cmp, lst)

Returns the minimal element in non-empty list LST with elements
compared via CMP() which should return values with the same semantics
as Python's cmp(). If there are several minimal elements, the last
one is returned.
"""

if not lst:
raise ValueError, 'empty list'

minval = lst[0]

for i in xrange(1, len(lst)):
v = lst
if cmp(minval, v) > 0:
minval = v

return minval

This is from Google's "goopy" package.

http://goog-goopy.sourceforge.net/


Ahh. Yeah, nasty cmp= arguments certainly do screw that up.

If you're not just trying to implement a particular API, and you
actually have a use-case you think you need this for, you might consider
a key= argument instead of a cmp= argument. It'll probably be at least
as fast, and Python 2.5 will have one on both min() and max(). It'll
work something like:
http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/389659


STeVe
 
S

Steve R. Hastings

The doc string is not correct.

The fault is mine, I'm afraid. I had thought I was copying that from an
intact original version, but I must have edited it. I don't remember
doing it, but I must have.

To check, I went to Sourceforge and downloaded goopy again. Here, for
the record, is the original Google maximum():



def maximum(cmp, lst):
"""maximum(cmp, lst)

Returns the maximal element in non-empty list LST with elements
compared via CMP() which should return values with the same semantics
as Python's cmp(). If there are several maximal elements, the last
one is returned.
"""

if not lst:
raise ValueError, 'empty list'

maxval = lst[0]

for i in xrange(1, len(lst)):
v = lst
if cmp(maxval, v) <= 0:
maxval = v

return maxval



I apologize for the error.

P.S. I'm happy to see that Google uses "lst" for a list. I do, too. I
don't much like the convention of using "L" for a list, but I agree with
Guido that using "l" is bad (looks too much like a one: "1"), so "lst" it
is.

I would have put the "cmp" argument second, and made it default to the
built-in Python "cmp" function if not specified. Oh, well.
 
A

Arne Ludwig

Just for completeness: The functions in Steve's original post named
maximum calculate the minimum.

Also, timing-wise, on my machine with a random list of 200000 integers
Steve's iteration version and Mitja's version are about equal, the
system built-in is equal or slightly slower, and Paul's version about
3-4x slower. If the comparison function is very complex, the mileage
may vary of course.
 
P

Paul McGuire

Arne Ludwig said:
Just for completeness: The functions in Steve's original post named
maximum calculate the minimum.

Also, timing-wise, on my machine with a random list of 200000 integers
Steve's iteration version and Mitja's version are about equal, the
system built-in is equal or slightly slower, and Paul's version about
3-4x slower. If the comparison function is very complex, the mileage
may vary of course.

Arne -

The version I posted (which is not mine by the way, but I've lost the
original citation) becomes much more competitive when run with psyco. With
your list of 200,000 integers, I believe it will outpace even the C library
built-ins.

-- Paul
 
S

Steve R. Hastings

Just for completeness: The functions in Steve's original post named
maximum calculate the minimum.

Er, yes. I benchmarked them but clearly I should have function-tested
them as well.



Here is, I hope, my last word on maximum(). It figures out whether it got
an iterator or a list, and does the right thing. I believe it will
compile and run correctly whether your version of Python has iterators or
not. And, it doesn't compute the minimum.



def maximum(cmp, seq):
"""
Return maximum value of seq, as chosen by provided cmp() function.

If seq contains multiple maximal values, return the last one.

Arguments:
cmp -- function reference; function should return values with the
same semantics as Python's cmp() function.
seq -- any sequence (a list or an iterator). If seq is empty, this
function will raise ValueError.
"""
# is it a list?
if hasattr(seq, "__getitem__"):
if len(seq) == 0:
raise ValueError, "seq must contain at least 1 element"

maxval = seq[0]
for v in seq:
if cmp(maxval, v) <= 0:
maxval = v
return maxval


# is it an iterator?
if hasattr(seq, "__iter__") and hasattr(seq, "next"):
try:
maxval = seq.next()
except StopIteration:
raise ValueError, "seq must contain at least 1 element"

for v in seq:
if cmp(maxval, v) <= 0:
maxval = v
return maxval


raise TypeError, "seq must be a list or an iterator"
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,744
Messages
2,569,484
Members
44,904
Latest member
HealthyVisionsCBDPrice

Latest Threads

Top