Which is faster?

cnb · Aug 30, 2008

For a big nbr of it might matter?
Is av_grade O(n*2) and the first O(n) when it comes to adding or is
"sum x for x in y" just traversing the list ones, accumulating the
values, it doesnt first build the list and then travese it for sum?

def averageGrade(self):
tot = 0
for review in self.reviews:
tot += review.grade
return tot / len(self.reviews)

def av_grade(self):
return sum(review.grade for review in self.reviews) / \
len(self.reviews)

Fredrik Lundh · Aug 30, 2008

cnb said:
For a big nbr of it might matter?
Is av_grade O(n*2) and the first O(n) when it comes to adding or is
"sum x for x in y" just traversing the list ones, accumulating the
values, it doesnt first build the list and then travese it for sum?

sum() doesn't build a list, but even if it would do that, it'd still be
O(n) -- looping over an array twice doesn't change the time complexity.

to find out which one's actually faster, benchmark them.

</F>

Gabriel Genellina · Aug 30, 2008

En Sat said:
For a big nbr of it might matter?
Is av_grade O(n*2) and the first O(n) when it comes to adding or is
"sum x for x in y" just traversing the list ones, accumulating the
values, it doesnt first build the list and then travese it for sum?

AFAIK both versions are O(n).
sum(x for x in y) contains a "generator expression" [1], it does not
create an intermediate list.
sum([x for x in y]) contains a "list comprehension" [2] and it does create
an intermediate list.

I'd say the version using sum() should be faster, because the iteration is
implemented inside the interpreter, not in Python code as the other one.
But you should actually measure their relative performance; use the timeit
module for that.

def averageGrade(self):
tot = 0
for review in self.reviews:
tot += review.grade
return tot / len(self.reviews)

def av_grade(self):
return sum(review.grade for review in self.reviews) / \
len(self.reviews)

[1] http://docs.python.org/tut/node11.html#SECTION00111100000000000000000
[2] http://docs.python.org/tut/node7.html

Steven D'Aprano · Aug 30, 2008

def averageGrade(self):
tot = 0
for review in self.reviews:
tot += review.grade
return tot / len(self.reviews)

def av_grade(self):
return sum(review.grade for review in self.reviews) / \
len(self.reviews)

Re-writing the functions so they can be tested alone:

def averageGrade(alist):
tot = 0.0
for x in alist:
tot += x
return tot/len(alist)

def av_grade(alist):
return sum(alist)/len(alist)

.... 'from __main__ import alist, averageGrade').repeat(number=100000)
[3.9559240341186523, 3.4910569190979004, 3.4856188297271729].... 'from __main__ import alist, av_grade').repeat(number=100000)
[2.0255107879638672, 1.0968310832977295, 1.0733180046081543]

The version with sum() is much faster. How about with lots of data?
.... 'from __main__ import alist, averageGrade').repeat(number=50)
[17.699107885360718, 18.182793140411377, 18.651514053344727].... 'from __main__ import alist, av_grade').repeat(number=50)
[17.125216007232666, 15.72636890411377, 16.309713840484619]

sum() is still a little faster.

Gabriel Genellina · Aug 30, 2008

En Sat, 30 Aug 2008 03:15:30 -0300, Steven D'Aprano

def averageGrade(self):
tot = 0
for review in self.reviews:
tot += review.grade
return tot / len(self.reviews)

def av_grade(self):
return sum(review.grade for review in self.reviews) / \
len(self.reviews)

Click to expand...

Re-writing the functions so they can be tested alone:

def averageGrade(alist):
tot = 0.0
for x in alist:
tot += x
return tot/len(alist)

def av_grade(alist):
return sum(alist)/len(alist)

... 'from __main__ import alist, averageGrade').repeat(number=100000)
[3.9559240341186523, 3.4910569190979004, 3.4856188297271729]... 'from __main__ import alist, av_grade').repeat(number=100000)
[2.0255107879638672, 1.0968310832977295, 1.0733180046081543]

The version with sum() is much faster. How about with lots of data?
... 'from __main__ import alist, averageGrade').repeat(number=50)
[17.699107885360718, 18.182793140411377, 18.651514053344727]... 'from __main__ import alist, av_grade').repeat(number=50)
[17.125216007232666, 15.72636890411377, 16.309713840484619]

sum() is still a little faster.

Mmm, in this last test you're measuring the long integer operations
performance (because the sum exceeds largely what can be represented in a
plain integer). Long integers are so slow that the difference between both
loops becomes negligible.

I've tried again using float values:
alist = [float(x) for x in xrange(1000000)]
and got consistent results for any input size (the version using sum() is
about twice as fast as the for loop)

cnb · Aug 30, 2008

how does doing something twice not change complexity? yes it maybe
belongs to the same complexity-class but is still twice as slow no?

Fredrik Lundh · Aug 30, 2008

cnb said:
how does doing something twice not change complexity? yes it maybe
belongs to the same complexity-class but is still twice as slow no?

doing two things quickly can still be faster than doing one thing slowly.

</F>

Diez B. Roggisch · Aug 30, 2008

cnb said:
how does doing something twice not change complexity? yes it maybe
belongs to the same complexity-class but is still twice as slow no?

Because big O notation is not about constant factors. Or even subterms
with lower powers.

http://en.wikipedia.org/wiki/Big_O_notation

Diez

Lie · Aug 30, 2008

how does doing something twice not change complexity? yes it maybe
belongs to the same complexity-class but is still twice as slow no?

Who is doing something twice? Definitely not sum().

sum() does not create intermediate list, and if you pass generator
expressions in it you wouldn't make any intermediate list at all, thus
simple looping but in interpreter code. sum() that is passed a list
comprehension should be faster for extremely small numbers of values
to sum, but we don't care about small things, do we? But using
intermediate list should be fast enough for even large numbers of
values, the time when it can't cope anymore would be when the
intermediate list takes half of your memory (how often is that for
regular applications?).

Fredrik Lundh · Aug 30, 2008

Lie said:
Who is doing something twice? Definitely not sum().

nobody's claiming that -- but as mentioned above, even if sum() had done
two passes over the source sequence, it'd still be O(n).

</F>

Raymond Hettinger · Aug 30, 2008

def av_grade(self):
return sum(review.grade for review in self.reviews) / \
len(self.reviews)

Minor point. Consider making the divisor: float(len(self.reviews)).
It would be a bummer to throw-off the average because of floor
division.

Also consider an itertools approach:
sum(itertools.imap(operator.itemgetter('review'), self.reviews))

BTW, the sum() in Py2.6 is *much* faster than before. It should run
circles around any other approach.

As Effbot says, if you really care about speed, then just time both
approaches. However, it's a good run of thumb that builtins are hard
to beat by simulating them in pure python (unless you're using Psyco
as an optimizer).

Raymond

Steven D'Aprano · Aug 31, 2008

Mmm, in this last test you're measuring the long integer operations
performance (because the sum exceeds largely what can be represented in
a plain integer). Long integers are so slow that the difference between
both loops becomes negligible.

Nicely caught, and thank you for pointing that out.

return reduce(lambda x, y: x.grade+y.grade, self.reviews)	1	Aug 29, 2008
Using class-attribute as key to sort?	3	Aug 29, 2008
Range / empty list issues??	1	Dec 11, 2023
why memoizing is faster	0	Mar 24, 2011
I Need Fix In Code	1	Apr 12, 2023
Why is this so much faster?	3	Jun 2, 2011
SENTINEL CONTROL LOOP WHEN DEALING WITH TWO ARRAYS	1	Oct 26, 2023
C program: memory leak/ segmentation fault/ memory limit exceeded	0	Nov 12, 2022

Which is faster?

cnb

Fredrik Lundh

Gabriel Genellina

Steven D'Aprano

Gabriel Genellina

cnb

Fredrik Lundh

Diez B. Roggisch

Lie

Fredrik Lundh

Raymond Hettinger

Steven D'Aprano

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads