Which is faster?

Discussion in 'Python' started by cnb, Aug 30, 2008.

  1. cnb

    cnb Guest

    For a big nbr of it might matter?
    Is av_grade O(n*2) and the first O(n) when it comes to adding or is
    "sum x for x in y" just traversing the list ones, accumulating the
    values, it doesnt first build the list and then travese it for sum?

    def averageGrade(self):
    tot = 0
    for review in self.reviews:
    tot += review.grade
    return tot / len(self.reviews)

    def av_grade(self):
    return sum(review.grade for review in self.reviews) / \
    len(self.reviews)
     
    cnb, Aug 30, 2008
    #1
    1. Advertising

  2. cnb wrote:

    > For a big nbr of it might matter?
    > Is av_grade O(n*2) and the first O(n) when it comes to adding or is
    > "sum x for x in y" just traversing the list ones, accumulating the
    > values, it doesnt first build the list and then travese it for sum?


    sum() doesn't build a list, but even if it would do that, it'd still be
    O(n) -- looping over an array twice doesn't change the time complexity.

    to find out which one's actually faster, benchmark them.

    </F>
     
    Fredrik Lundh, Aug 30, 2008
    #2
    1. Advertising

  3. En Sat, 30 Aug 2008 01:26:35 -0300, cnb <> escribi�:

    > For a big nbr of it might matter?
    > Is av_grade O(n*2) and the first O(n) when it comes to adding or is
    > "sum x for x in y" just traversing the list ones, accumulating the
    > values, it doesnt first build the list and then travese it for sum?


    AFAIK both versions are O(n).
    sum(x for x in y) contains a "generator expression" [1], it does not
    create an intermediate list.
    sum([x for x in y]) contains a "list comprehension" [2] and it does create
    an intermediate list.

    I'd say the version using sum() should be faster, because the iteration is
    implemented inside the interpreter, not in Python code as the other one.
    But you should actually measure their relative performance; use the timeit
    module for that.

    > def averageGrade(self):
    > tot = 0
    > for review in self.reviews:
    > tot += review.grade
    > return tot / len(self.reviews)
    >
    > def av_grade(self):
    > return sum(review.grade for review in self.reviews) / \
    > len(self.reviews)


    [1] http://docs.python.org/tut/node11.html#SECTION00111100000000000000000
    [2] http://docs.python.org/tut/node7.html
    --
    Gabriel Genellina
     
    Gabriel Genellina, Aug 30, 2008
    #3
  4. On Fri, 29 Aug 2008 21:26:35 -0700, cnb wrote:

    > def averageGrade(self):
    > tot = 0
    > for review in self.reviews:
    > tot += review.grade
    > return tot / len(self.reviews)
    >
    > def av_grade(self):
    > return sum(review.grade for review in self.reviews) / \
    > len(self.reviews)


    Re-writing the functions so they can be tested alone:

    def averageGrade(alist):
    tot = 0.0
    for x in alist:
    tot += x
    return tot/len(alist)


    def av_grade(alist):
    return sum(alist)/len(alist)


    >>> from timeit import Timer
    >>> # small amount of items

    .... alist = range(100)
    >>> Timer('averageGrade(alist)',

    .... 'from __main__ import alist, averageGrade').repeat(number=100000)
    [3.9559240341186523, 3.4910569190979004, 3.4856188297271729]
    >>>
    >>> Timer('av_grade(alist)',

    .... 'from __main__ import alist, av_grade').repeat(number=100000)
    [2.0255107879638672, 1.0968310832977295, 1.0733180046081543]


    The version with sum() is much faster. How about with lots of data?

    >>> alist = xrange(1000000)
    >>> Timer('averageGrade(alist)',

    .... 'from __main__ import alist, averageGrade').repeat(number=50)
    [17.699107885360718, 18.182793140411377, 18.651514053344727]
    >>>
    >>> Timer('av_grade(alist)',

    .... 'from __main__ import alist, av_grade').repeat(number=50)
    [17.125216007232666, 15.72636890411377, 16.309713840484619]

    sum() is still a little faster.



    --
    Steven
     
    Steven D'Aprano, Aug 30, 2008
    #4
  5. En Sat, 30 Aug 2008 03:15:30 -0300, Steven D'Aprano
    <> escribi�:

    > On Fri, 29 Aug 2008 21:26:35 -0700, cnb wrote:
    >
    >> def averageGrade(self):
    >> tot = 0
    >> for review in self.reviews:
    >> tot += review.grade
    >> return tot / len(self.reviews)
    >>
    >> def av_grade(self):
    >> return sum(review.grade for review in self.reviews) / \
    >> len(self.reviews)

    >
    > Re-writing the functions so they can be tested alone:
    >
    > def averageGrade(alist):
    > tot = 0.0
    > for x in alist:
    > tot += x
    > return tot/len(alist)
    >
    >
    > def av_grade(alist):
    > return sum(alist)/len(alist)
    >
    >
    >>>> from timeit import Timer
    >>>> # small amount of items

    > ... alist = range(100)
    >>>> Timer('averageGrade(alist)',

    > ... 'from __main__ import alist, averageGrade').repeat(number=100000)
    > [3.9559240341186523, 3.4910569190979004, 3.4856188297271729]
    >>>>
    >>>> Timer('av_grade(alist)',

    > ... 'from __main__ import alist, av_grade').repeat(number=100000)
    > [2.0255107879638672, 1.0968310832977295, 1.0733180046081543]
    >
    >
    > The version with sum() is much faster. How about with lots of data?
    >
    >>>> alist = xrange(1000000)
    >>>> Timer('averageGrade(alist)',

    > ... 'from __main__ import alist, averageGrade').repeat(number=50)
    > [17.699107885360718, 18.182793140411377, 18.651514053344727]
    >>>>
    >>>> Timer('av_grade(alist)',

    > ... 'from __main__ import alist, av_grade').repeat(number=50)
    > [17.125216007232666, 15.72636890411377, 16.309713840484619]
    >
    > sum() is still a little faster.


    Mmm, in this last test you're measuring the long integer operations
    performance (because the sum exceeds largely what can be represented in a
    plain integer). Long integers are so slow that the difference between both
    loops becomes negligible.

    I've tried again using float values:
    alist = [float(x) for x in xrange(1000000)]
    and got consistent results for any input size (the version using sum() is
    about twice as fast as the for loop)

    --
    Gabriel Genellina
     
    Gabriel Genellina, Aug 30, 2008
    #5
  6. cnb

    cnb Guest

    how does doing something twice not change complexity? yes it maybe
    belongs to the same complexity-class but is still twice as slow no?
     
    cnb, Aug 30, 2008
    #6
  7. cnb wrote:

    > how does doing something twice not change complexity? yes it maybe
    > belongs to the same complexity-class but is still twice as slow no?


    doing two things quickly can still be faster than doing one thing slowly.

    </F>
     
    Fredrik Lundh, Aug 30, 2008
    #7
  8. cnb schrieb:
    > how does doing something twice not change complexity? yes it maybe
    > belongs to the same complexity-class but is still twice as slow no?


    Because big O notation is not about constant factors. Or even subterms
    with lower powers.

    http://en.wikipedia.org/wiki/Big_O_notation

    Diez
     
    Diez B. Roggisch, Aug 30, 2008
    #8
  9. cnb

    Lie Guest

    On Aug 30, 5:30 pm, cnb <> wrote:
    > how does doing something twice not change complexity? yes it maybe
    > belongs to the same complexity-class but is still twice as slow no?


    Who is doing something twice? Definitely not sum().

    sum() does not create intermediate list, and if you pass generator
    expressions in it you wouldn't make any intermediate list at all, thus
    simple looping but in interpreter code. sum() that is passed a list
    comprehension should be faster for extremely small numbers of values
    to sum, but we don't care about small things, do we? But using
    intermediate list should be fast enough for even large numbers of
    values, the time when it can't cope anymore would be when the
    intermediate list takes half of your memory (how often is that for
    regular applications?).
     
    Lie, Aug 30, 2008
    #9
  10. Lie wrote:

    >> how does doing something twice not change complexity? yes it maybe
    >> belongs to the same complexity-class but is still twice as slow no?

    >
    > Who is doing something twice? Definitely not sum().


    nobody's claiming that -- but as mentioned above, even if sum() had done
    two passes over the source sequence, it'd still be O(n).

    </F>
     
    Fredrik Lundh, Aug 30, 2008
    #10
  11. On Aug 29, 9:26 pm, cnb <> wrote:
    > def av_grade(self):
    >      return sum(review.grade for review in self.reviews) / \
    >               len(self.reviews)


    Minor point. Consider making the divisor: float(len(self.reviews)).
    It would be a bummer to throw-off the average because of floor
    division.

    Also consider an itertools approach:
    sum(itertools.imap(operator.itemgetter('review'), self.reviews))

    BTW, the sum() in Py2.6 is *much* faster than before. It should run
    circles around any other approach.

    As Effbot says, if you really care about speed, then just time both
    approaches. However, it's a good run of thumb that builtins are hard
    to beat by simulating them in pure python (unless you're using Psyco
    as an optimizer).


    Raymond
     
    Raymond Hettinger, Aug 30, 2008
    #11
  12. On Sat, 30 Aug 2008 04:11:33 -0300, Gabriel Genellina wrote:

    > Mmm, in this last test you're measuring the long integer operations
    > performance (because the sum exceeds largely what can be represented in
    > a plain integer). Long integers are so slow that the difference between
    > both loops becomes negligible.


    Nicely caught, and thank you for pointing that out.


    --
    Steven
     
    Steven D'Aprano, Aug 31, 2008
    #12
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Weng Tianxiang
    Replies:
    12
    Views:
    1,655
  2. Andreas Klemt
    Replies:
    1
    Views:
    461
    Steve C. Orr, MCSD
    Jul 23, 2003
  3. S. Justin Gengo
    Replies:
    2
    Views:
    367
    S. Justin Gengo
    Aug 20, 2003
  4. Bob
    Replies:
    1
    Views:
    2,705
  5. luca

    which one is faster?

    luca, Mar 7, 2004, in forum: Java
    Replies:
    1
    Views:
    334
    William Brogden
    Mar 8, 2004
Loading...

Share This Page