sum for sequences?

Discussion in 'Python' started by kj, Mar 24, 2010.

  1. kj

    kj Guest

    Is there a sequence-oriented equivalent to the sum built-in? E.g.:

    seq_sum(((1, 2), (5, 6))) --> (1, 2) + (5, 6) --> (1, 2, 5, 6)

    ?

    (By "sequence" I'm referring primarily to lists and tuples, and
    excluding strings, since for these there is ''.join()).

    TIA!

    ~K
     
    kj, Mar 24, 2010
    #1
    1. Advertising

  2. kj

    Glazner Guest

    On Mar 24, 5:29 pm, kj <> wrote:
    > Is there a sequence-oriented equivalent to the sum built-in?  E.g.:
    >
    >   seq_sum(((1, 2), (5, 6))) --> (1, 2) + (5, 6) --> (1, 2, 5, 6)
    >
    > ?
    >
    > (By "sequence" I'm referring primarily to lists and tuples, and
    > excluding strings, since for these there is ''.join()).
    >
    > TIA!
    >
    > ~K


    try itertools.chain
     
    Glazner, Mar 24, 2010
    #2
    1. Advertising

  3. kj

    Neil Cerutti Guest

    On 2010-03-24, kj <> wrote:
    >
    >
    > Is there a sequence-oriented equivalent to the sum built-in? E.g.:
    >
    > seq_sum(((1, 2), (5, 6))) --> (1, 2) + (5, 6) --> (1, 2, 5, 6)
    >
    > ?
    >
    > (By "sequence" I'm referring primarily to lists and tuples, and
    > excluding strings, since for these there is ''.join()).


    reduce, or functools.reduce in Python 3.1.

    >>> functools.reduce(operator.add, ((1, 2), (5, 6)))

    (1, 2, 5, 6)

    --
    Neil Cerutti
    "It's not fun to build walls. But it's even less fun to live
    without walls in a world full of zombies." --Greedy Goblin
     
    Neil Cerutti, Mar 24, 2010
    #3
  4. kj

    Steve Holden Guest

    kj wrote:
    >
    > Is there a sequence-oriented equivalent to the sum built-in? E.g.:
    >
    > seq_sum(((1, 2), (5, 6))) --> (1, 2) + (5, 6) --> (1, 2, 5, 6)
    >
    > ?
    >
    > (By "sequence" I'm referring primarily to lists and tuples, and
    > excluding strings, since for these there is ''.join()).
    >

    Do you mean you want to flatten a list structure? There have been
    several discussions about this, which Google will find for you quite easily.

    regards
    Steve
    --
    Steve Holden +1 571 484 6266 +1 800 494 3119
    See PyCon Talks from Atlanta 2010 http://pycon.blip.tv/
    Holden Web LLC http://www.holdenweb.com/
    UPCOMING EVENTS: http://holdenweb.eventbrite.com/
     
    Steve Holden, Mar 24, 2010
    #4
  5. On Wed, 24 Mar 2010 15:29:07 +0000, kj wrote:

    > Is there a sequence-oriented equivalent to the sum built-in? E.g.:
    >
    > seq_sum(((1, 2), (5, 6))) --> (1, 2) + (5, 6) --> (1, 2, 5, 6)
    >
    > ?


    Yes, sum.

    help(sum) is your friend.

    >>> a = range(2)
    >>> b = range(3)
    >>> c = range(4)
    >>> sum((a, b, c), [])

    [0, 1, 0, 1, 2, 0, 1, 2, 3]


    Beware though that sum on lists and tuples will be fairly inefficient if
    you have lots of them. You may find that this will be much more efficient:

    result = []
    for seq in sequences:
    result.extend(seq)



    --
    Steven
     
    Steven D'Aprano, Mar 24, 2010
    #5
  6. kj

    Paul Rubin Guest

    kj <> writes:
    > Is there a sequence-oriented equivalent to the sum built-in? E.g.:
    > seq_sum(((1, 2), (5, 6))) --> (1, 2) + (5, 6) --> (1, 2, 5, 6)


    use itertools.chain for this. A few people have mentioned that sum will
    also work, but I think for that purpose it could have O(n**2)
    complexity.
     
    Paul Rubin, Mar 24, 2010
    #6
  7. kj

    TomF Guest

    On 2010-03-24 14:07:24 -0700, Steven D'Aprano
    <> said:
    > On Wed, 24 Mar 2010 15:29:07 +0000, kj wrote:
    >
    >> Is there a sequence-oriented equivalent to the sum built-in? E.g.:
    >>
    >> seq_sum(((1, 2), (5, 6))) --> (1, 2) + (5, 6) --> (1, 2, 5, 6)
    >>
    >> ?

    >
    > Yes, sum.
    >
    > help(sum) is your friend.


    You might not want to be so glib. The sum doc sure doesn't sound like
    it should work on lists.

    Returns the sum of a sequence of numbers (NOT strings) plus the value
    of parameter 'start' (which defaults to 0).

    -Tom
     
    TomF, Mar 25, 2010
    #7
  8. On Wed, 24 Mar 2010 23:50:23 -0700, TomF wrote:

    > On 2010-03-24 14:07:24 -0700, Steven D'Aprano
    > <> said:
    >> On Wed, 24 Mar 2010 15:29:07 +0000, kj wrote:
    >>
    >>> Is there a sequence-oriented equivalent to the sum built-in? E.g.:
    >>>
    >>> seq_sum(((1, 2), (5, 6))) --> (1, 2) + (5, 6) --> (1, 2, 5, 6)
    >>>
    >>> ?

    >>
    >> Yes, sum.
    >>
    >> help(sum) is your friend.

    >
    > You might not want to be so glib. The sum doc sure doesn't sound like
    > it should work on lists.
    >
    > Returns the sum of a sequence of numbers (NOT strings) plus the
    > value of parameter 'start' (which defaults to 0).



    What part of that suggested to you that sum might not be polymorphic?
    Sure, it says numbers (which should be changed, in my opinion), but it
    doesn't specify what sort of numbers -- ints, floats, or custom types
    that have an __add__ method. It also singles out strings as excluded. Why
    would you need to explicitly exclude strings, since they're not numbers,
    if sum *only* works with numbers?

    E.g. help(math.sin) could have said this, but doesn't:

    Return the sine of x (NOT a dictionary)

    It doesn't need to, because dicts aren't exceptional: sin doesn't work on
    anything *but* numbers. There's no __sin__ method to call on arbitrary
    types.

    The fact that sum does single out strings is a clear sign that strings
    are treated as exceptional and suggests strongly that summing arbitrary
    types should work. I'm not saying that help(sum) explicitly states that
    it works with lists (it clearly doesn't), but it does suggest the
    possibility and makes the experiment worth trying.

    I'll also note that the Fine Manual makes it even more clear that sum is
    polymorphic:

    http://docs.python.org/library/functions.html#sum




    --
    Steven
     
    Steven D'Aprano, Mar 25, 2010
    #8
  9. kj

    Neil Cerutti Guest

    On 2010-03-25, Steven D'Aprano <> wrote:
    >> You might not want to be so glib. The sum doc sure doesn't
    >> sound like it should work on lists.
    >>
    >> Returns the sum of a sequence of numbers (NOT strings) plus the
    >> value of parameter 'start' (which defaults to 0).

    >
    > What part of that suggested to you that sum might not be polymorphic?
    > Sure, it says numbers (which should be changed, in my opinion), but it
    > doesn't specify what sort of numbers -- ints, floats, or custom types
    > that have an __add__ method.


    WTF.

    --
    Neil Cerutti
    "It's not fun to build walls. But it's even less fun to live
    without walls in a world full of zombies." --Greedy Goblin
     
    Neil Cerutti, Mar 25, 2010
    #9
  10. Neil Cerutti, 25.03.2010 13:37:
    > On 2010-03-25, Steven D'Aprano wrote:
    >>> You might not want to be so glib. The sum doc sure doesn't
    >>> sound like it should work on lists.
    >>>
    >>> Returns the sum of a sequence of numbers (NOT strings) plus the
    >>> value of parameter 'start' (which defaults to 0).

    >>
    >> What part of that suggested to you that sum might not be polymorphic?
    >> Sure, it says numbers (which should be changed, in my opinion), but it
    >> doesn't specify what sort of numbers -- ints, floats, or custom types
    >> that have an __add__ method.

    >
    > WTF.


    Warning: truth found!

    Stefan
     
    Stefan Behnel, Mar 25, 2010
    #10
  11. * Neil Cerutti:
    > On 2010-03-25, Steven D'Aprano <> wrote:
    >>> You might not want to be so glib. The sum doc sure doesn't
    >>> sound like it should work on lists.
    >>>
    >>> Returns the sum of a sequence of numbers (NOT strings) plus the
    >>> value of parameter 'start' (which defaults to 0).

    >> What part of that suggested to you that sum might not be polymorphic?
    >> Sure, it says numbers (which should be changed, in my opinion), but it
    >> doesn't specify what sort of numbers -- ints, floats, or custom types
    >> that have an __add__ method.

    >
    > WTF.


    I think Steven's argument is that it would be pointless for 'sum' to distinguish
    between user-defined numerical types and other types that happen to support '+'.

    It could make such a distinction since there's a type hierarchy for numbers, but
    then that should IMHO be more clearly documented.

    However, given that it isn't restricted to numbers, the restriction wrt. strings
    is a bit perplexing in the context of modern CPython. But for Python
    implementations that don't offer the '+=' optimization it might help to avoid
    gross inefficiencies, namely quadratic time string concatenation. E.g., here's a
    natural implementation of sum -- with unoptimized '+=' yielding quadratic time
    for the string concatenation (with modern CPython it's linear time, though):

    <example>
    >>> def sum_all( values, start = 0 ):

    ... s = start
    ... for v in values: s += v
    ... return s
    ...
    >>> sum_all( (1, 2, 3, 4) )

    10
    >>> sum_all( ("a", "b", "c", "d"), "" )

    'abcd'
    >>> sum_all( ((1, 2), (3, 4), (5, 6)), () )

    (1, 2, 3, 4, 5, 6)
    >>> _

    </example>

    However, if that hypothesis about the rationale is correct, then 'sum' should
    also be restricted to not handle tuples or lists, so forth, but at least the
    CPython implementation does.

    So perhaps the documentation needs to be more clear?


    Cheers,

    - Alf
     
    Alf P. Steinbach, Mar 25, 2010
    #11
  12. On Thu, 25 Mar 2010 14:02:05 +0100, Alf P. Steinbach wrote:

    > * Neil Cerutti:
    >> On 2010-03-25, Steven D'Aprano <>
    >> wrote:
    >>>> You might not want to be so glib. The sum doc sure doesn't sound
    >>>> like it should work on lists.
    >>>>
    >>>> Returns the sum of a sequence of numbers (NOT strings) plus the
    >>>> value of parameter 'start' (which defaults to 0).
    >>> What part of that suggested to you that sum might not be polymorphic?
    >>> Sure, it says numbers (which should be changed, in my opinion), but it
    >>> doesn't specify what sort of numbers -- ints, floats, or custom types
    >>> that have an __add__ method.

    >>
    >> WTF.

    >
    > I think Steven's argument is that it would be pointless for 'sum' to
    > distinguish between user-defined numerical types and other types that
    > happen to support '+'.


    Before Python2.6, which introduced a numeric tower, Python *couldn't*
    reliably distinguish between numeric types and other types that
    overloaded +. Since Python discourages type-checking in favour of duck-
    typing and try...except, this is seen as a good thing.

    My argument is that sum isn't hard-coded to only work on the built-ins
    ints or floats, but it supports any object that you can use the +
    operator on. The *sole* exceptions are str and unicode (not even
    UserString), and even there it is very simple to overcome the restriction:

    >>> sum(['a', 'b'], '')

    Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
    TypeError: sum() can't sum strings [use ''.join(seq) instead]
    >>> class S:

    .... def __add__(self, other):
    .... return other
    ....
    >>> sum(['a', 'b'], S())

    'ab'


    [...]
    > However, given that it isn't restricted to numbers, the restriction wrt.
    > strings is a bit perplexing in the context of modern CPython. But for
    > Python implementations that don't offer the '+=' optimization it might
    > help to avoid gross inefficiencies, namely quadratic time string
    > concatenation.


    I agree -- the Python philosophy is to allow the user to shoot themselves
    in the foot if they wish to. You're responsible for the Big Oh behaviour
    of your code, not the compiler.


    [...]
    > However, if that hypothesis about the rationale is correct, then 'sum'
    > should also be restricted to not handle tuples or lists, so forth, but
    > at least the CPython implementation does.


    The reasoning is that naive users are far, far more likely to try summing
    a large list of strings than to try summing a large list of lists, and
    therefore in practical terms the consequences of allowing sum on lists is
    slight enough and rare enough to not be worth the check.

    I suspect that this is just an after the fact rationalisation, and that
    the real reason is that those responsible for the hand-holding in sum
    merely forgot, or didn't know, that repeated addition of lists and tuples
    is also O(N**2). But I've never cared enough to dig through the archives
    to find out.



    --
    Steven
     
    Steven D'Aprano, Mar 25, 2010
    #12
  13. kj

    Steve Howell Guest

    On Mar 24, 4:19 pm, Paul Rubin <> wrote:
    > kj <> writes:
    > > Is there a sequence-oriented equivalent to the sum built-in?  E.g.:
    > >   seq_sum(((1, 2), (5, 6))) --> (1, 2) + (5, 6) --> (1, 2, 5, 6)

    >
    > use itertools.chain for this.  A few people have mentioned that sum will
    > also work, but I think for that purpose it could have O(n**2)
    > complexity.


    I agree on the practical matter that itertools.chain and other
    solutions are usually the way to go for most tasks that involve
    iterating through several lists.

    From a purely academic standpoint, I'm not convinced that sum() is
    inefficient in terms of big-O complexity, though.

    showell@showell-laptop:~$ python
    Python 2.6.2 (release26-maint, Apr 19 2009, 01:56:41)
    [GCC 4.3.3] on linux2
    >>> class StupidList:

    ... def __init__(self, lst):
    ... print 'creating', lst
    ... self.lst = lst
    ... def __add__(self, other):
    ... self.lst += '|'
    ... self.lst.extend(other.lst)
    ... return self
    ...
    >>> result = sum([StupidList([1, 2]), StupidList([3,4]),

    StupidList([5,6])], StupidList([0]))
    creating [1, 2]
    creating [3, 4]
    creating [5, 6]
    creating [0]
    >>> result.lst

    [0, '|', 1, 2, '|', 3, 4, '|', 5, 6]

    If I'm interpreting the above program correctly, then sum() is doing
    the most efficient thing under the hood--it appears to do the
    equivalent of += without creating unnecessary objects for intermediate
    sums.

    I think the special-case error message might be a case where
    practicality simply beats out purity. It would be nice if sum() were
    completely duck-typed-let-you-shoot-yourself-in-foot-if-you-know-what-
    you-are-doing, but maybe this was such a pitfall at one time, that
    extra safeguards were put into sum(). I wonder how severely sum(),
    without the restriction, would underperform join() on modern versions
    of Python, though.

    >>> sum('1', '2')

    Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
    TypeError: sum() can't sum strings [use ''.join(seq) instead]

    Note that you can easily fake out sum() to get duck typing.

    >>> class EmptyStringStarter:

    ... def __add__(self, other): return other
    ...
    >>> empty = EmptyStringStarter()
    >>> sum(['hello ', 'world'], empty)

    'hello world'
     
    Steve Howell, Mar 26, 2010
    #13
  14. kj

    Steve Howell Guest

    On Mar 26, 7:31 am, Steve Howell <> wrote:
    > On Mar 24, 4:19 pm, Paul Rubin <> wrote:
    >
    > > kj <> writes:
    > > > Is there a sequence-oriented equivalent to the sum built-in?  E.g.:
    > > >   seq_sum(((1, 2), (5, 6))) --> (1, 2) + (5, 6) --> (1, 2, 5, 6)

    >
    > > use itertools.chain for this.  A few people have mentioned that sum will
    > > also work, but I think for that purpose it could have O(n**2)
    > > complexity.

    >
    > I agree on the practical matter that itertools.chain and other
    > solutions are usually the way to go for most tasks that involve
    > iterating through several lists.
    >
    > From a purely academic standpoint, I'm not convinced that sum() is
    > inefficient in terms of big-O complexity, though.
    >
    >  showell@showell-laptop:~$ python
    >  Python 2.6.2 (release26-maint, Apr 19 2009, 01:56:41)
    >  [GCC 4.3.3] on linux2
    >  >>> class StupidList:
    >  ...   def __init__(self, lst):
    >  ...     print 'creating', lst
    >  ...     self.lst = lst
    >  ...   def __add__(self, other):
    >  ...     self.lst += '|'
    >  ...     self.lst.extend(other.lst)
    >  ...     return self
    >  ...
    >  >>> result = sum([StupidList([1, 2]), StupidList([3,4]),
    > StupidList([5,6])], StupidList([0]))
    >  creating [1, 2]
    >  creating [3, 4]
    >  creating [5, 6]
    >  creating [0]
    >  >>> result.lst
    >  [0, '|', 1, 2, '|', 3, 4, '|', 5, 6]
    >
    > If I'm interpreting the above program correctly, then sum() is doing
    > the most efficient thing under the hood--it appears to do the
    > equivalent of += without creating unnecessary objects for intermediate
    > sums.
    >
    > I think the special-case error message might be a case where
    > practicality simply beats out purity.  It would be nice if sum() were
    > completely duck-typed-let-you-shoot-yourself-in-foot-if-you-know-what-
    > you-are-doing, but maybe this was such a pitfall at one time, that
    > extra safeguards were put into sum().  I wonder how severely sum(),
    > without the restriction, would underperform join() on modern versions
    > of Python, though.
    >
    >  >>> sum('1', '2')
    >  Traceback (most recent call last):
    >    File "<stdin>", line 1, in <module>
    >  TypeError: sum() can't sum strings [use ''.join(seq) instead]
    >
    > Note that you can easily fake out sum() to get duck typing.
    >
    >  >>> class EmptyStringStarter:
    >  ...   def __add__(self, other): return other
    >  ...
    >  >>> empty = EmptyStringStarter()
    >  >>> sum(['hello ', 'world'], empty)
    >  'hello world'


    Looking at the code answers my own questions:

    http://svn.python.org/view/python/trunk/Python/bltinmodule.c?view=markup

    Look for builtin_sum().

    First you see the guard against strings:

    /* reject string values for 'start' parameter */
    if (PyObject_TypeCheck(result, &PyBaseString_Type)) {
    PyErr_SetString(PyExc_TypeError,
    "sum() can't sum strings [use ''.join(seq) instead]");
    Py_DECREF(iter);
    return NULL;
    }
    Py_INCREF(result);


    Also, Paul's suspicion that sum() works in O(N squared) for lists is
    confirmed by this comment:

    /* It's tempting to use PyNumber_InPlaceAdd instead of
    PyNumber_Add here, to avoid quadratic running time
    when doing 'sum(list_of_lists, [])'. However, this
    would produce a change in behaviour: a snippet like
    empty = []
    sum([[x] for x in range(10)], empty)
    would change the value of empty. */

    It's interesting, though, that you can construct classes pretty
    easily, as I did above, that avoid the quadratic behavior, as long as
    you do not mind mutating the start value, which I think is usually
    fine, since the original start value usually is not useful afterward
    anyway.
     
    Steve Howell, Mar 26, 2010
    #14
  15. On Fri, 26 Mar 2010 07:31:17 -0700, Steve Howell wrote:

    > From a purely academic standpoint, I'm not convinced that sum() is
    > inefficient in terms of big-O complexity, though.
    >
    > showell@showell-laptop:~$ python
    > Python 2.6.2 (release26-maint, Apr 19 2009, 01:56:41) [GCC 4.3.3] on
    > linux2
    > >>> class StupidList:

    [...]

    But it's not *sum* that is inefficient, it is sum *with a particular data
    structure*.

    Sure, if you create your own data structure, you can make it as efficient
    as you like. Obviously the sum algorithm itself has to perform one
    addition per item, or O(N), which scales tolerably well. But each
    addition has a cost. If the cost is constant, then sum() as a whole
    remains O(N). But if the cost of addition varies with N, sum() degrades
    badly.

    We can compare the performance of sum with different data structures,
    starting with plain integers versus long integers:

    >>> from timeit import Timer
    >>> setup = 'data = [%d]*%d'
    >>> for i in range(6):

    .... t1 = Timer('sum(data, 0)', setup % (1, 10**i))
    .... t2 = Timer('sum(data, 0)', setup % (10**50, 10**i))
    .... print min(t1.repeat(number=1000)),
    .... print min(t2.repeat(number=1000))
    ....
    0.00179290771484 0.00263810157776
    0.00340414047241 0.00854396820068
    0.0190401077271 0.0502791404724
    0.155302047729 0.645124912262
    0.794432878494 2.55748295784
    7.97877693176 25.3812758923

    Both scale about as well, but the cost varies significantly: arithmetic
    on very large longints is expensive.

    Strings, with a trick to fool sum into accepting them, versus lists. Note
    that I changed the number of iterations from 6 down to 5. The reason why
    will become obvious:

    >>> class EmptyStringStarter:

    .... def __add__(self, other):
    .... return other
    ....
    >>> empty = EmptyStringStarter()
    >>> setup = """from __main__ import empty; data = [%r]*%d"""
    >>>
    >>> for i in range(5):

    .... t1 = Timer('sum(data, empty)', setup % ('a', 10**i))
    .... t2 = Timer('sum(data, [])', setup % ([1], 10**i))
    .... print min(t1.repeat(number=1000)),
    .... print min(t2.repeat(number=1000))
    ....
    0.00849103927612 0.00226998329163
    0.0121459960938 0.0082700252533
    0.0489149093628 0.186735153198
    0.428920030594 5.28623914719
    14.6552250385 589.102822065


    Strings perform tolerably well, up to a point, but lists perform
    terribly. And in fact, the relatively good performance of strings is an
    artifact of recent versions of CPython. In Jython and IronPython, and
    older versions of CPython, it will behave as poorly as lists.


    > I wonder how severely sum(), without
    > the restriction, would underperform join() on modern versions of Python,
    > though.



    Take note that, in order to get an answer in reasonable time, I've
    reduced the number of timing iterations drastically:

    >>> for i in range(6):

    .... t1 = Timer('sum(data, empty)', setup % ('a', 10**i))
    .... t2 = Timer('"".join(data)', setup % ('a', 10**i))
    .... print min(t1.repeat(number=10)),
    .... print min(t2.repeat(number=10))
    ....
    8.89301300049e-05 1.09672546387e-05
    0.000131845474243 2.19345092773e-05
    0.000591993331909 9.29832458496e-05
    0.0101289749146 0.00082802772522
    0.365957021713 0.00884819030762
    24.2072279453 0.0421011447906

    Note the performance degradation of sum. It gets worse. Much worse:

    >>> for i in range(4, 7):

    .... t1 = Timer('sum(data, empty)', setup % ('a', 10**i))
    .... t2 = Timer('"".join(data)', setup % ('a', 10**i))
    .... print min(t1.repeat(number=1)), # note fewer iterations
    .... print min(t2.repeat(number=1))
    ....
    0.031229019165 0.000817060470581
    2.45445990562 0.00365781784058
    1024.79705095 0.0398509502411

    This is absolutely catastrophic performance degradation.



    --
    Steven
     
    Steven D'Aprano, Mar 27, 2010
    #15
  16. kj

    Steve Howell Guest

    On Mar 27, 3:18 am, Steven D'Aprano <st...@REMOVE-THIS-
    cybersource.com.au> wrote:
    > On Fri, 26 Mar 2010 07:31:17 -0700, Steve Howell wrote:
    > > From a purely academic standpoint, I'm not convinced that sum() is
    > > inefficient in terms of big-O complexity, though.

    >
    > >  showell@showell-laptop:~$ python
    > >  Python 2.6.2 (release26-maint, Apr 19 2009, 01:56:41) [GCC 4.3.3] on
    > >  linux2
    > >  >>> class StupidList:

    >
    > [...]
    >
    > But it's not *sum* that is inefficient, it is sum *with a particular data
    > structure*.
    >


    Yep, the implied context was with particular data structures.

    > Sure, if you create your own data structure, you can make it as efficient
    > as you like. Obviously the sum algorithm itself has to perform one
    > addition per item, or O(N), which scales tolerably well. But each
    > addition has a cost. If the cost is constant, then sum() as a whole
    > remains O(N). But if the cost of addition varies with N, sum() degrades
    > ba
    >


    The surprising part of sum() is not that the outer loop to do the sums
    is O(N). It is hard to imagine any other implementation (without
    parallelizing it).

    The mildly surprising part of sum() is that is does add vs. add-in-
    place, which leads to O(N) vs. O(1) for the inner loop calls, for
    certain data structures, notably lists, even though none of the
    intermediate results get used by the caller. For lists, you could
    make a more efficient variant of sum() that clones the start value and
    does add-in-place.

    I could guess pretty confidently that the reason this optimization was
    never tried is that sum() has always been intended to be used on
    numerics, since other alternatives exist for strings (join), lists
    (chain), and hand-coded data classes that support add-in-place (roll-
    your-own loop).

    The documentation is pretty clear on the intention that sum() is
    intended for numbers:

    http://docs.python.org/library/functions.html#sum

    Except for strings, the docs are not explicit about efficiency
    concerns for other data structures, or the fact that the reference
    implementation does add vs. add-in-place under the hood.




    http://docs.python.org/library/functions.html#sum
     
    Steve Howell, Mar 28, 2010
    #16
  17. On Sun, 28 Mar 2010 07:16:10 -0700, Steve Howell wrote:

    > The mildly surprising part of sum() is that is does add vs. add-in-
    > place, which leads to O(N) vs. O(1) for the inner loop calls, for
    > certain data structures, notably lists, even though none of the
    > intermediate results get used by the caller. For lists, you could make
    > a more efficient variant of sum() that clones the start value and does
    > add-in-place.


    I have no doubt that you could make a version of sum for lists which is
    more efficient than the existing one. After all, join more or less does
    the same sort of thing, and it's very efficient. But don't think that add-
    in-place is necessarily cheap. List appends are amortized O(1) each; if
    you are adding M lists of N items each, that gives you O(M*N).

    It's possible to improve the performance a tad if you can make multiple
    appends in roughly constant time, which is what list.extend (probably?)
    does, but only up to a point. Lists are over-allocated, but if you try to
    add more items than there is room for, you need to make the list bigger,
    and that means reallocating memory, which could easily be O(N**2) or
    worse, depending on how good your OS's memory management is. Under Linux,
    at least by default, malloc will never fail, but there's no guarantee how
    long it will take to return. If the OOM killer has to start shutting down
    other applications, and paging more and more memory to disk, eventually
    malloc will return (or the system will core dump), but it could take a
    while...



    --
    Steven
     
    Steven D'Aprano, Mar 28, 2010
    #17
  18. kj

    Steve Howell Guest

    On Mar 28, 8:17 am, Duncan Booth <> wrote:
    > Steve Howell <> wrote:
    > > The mildly surprising part of sum() is that is does add vs. add-in-
    > > place, which leads to O(N) vs. O(1) for the inner loop calls, for
    > > certain data structures, notably lists, even though none of the
    > > intermediate results get used by the caller.  For lists, you could
    > > make a more efficient variant of sum() that clones the start value and
    > > does add-in-place.

    >
    > Doing add-in-place isn't the only way to make sum more efficient: if you
    > assume that addition is associative (which of course the builtin sum can't)
    > then you can form partial sums. e.g. instead of calculating:
    >
    >   (((((((a + b) + c) + d) + e) + f) + g) + h)
    >
    > you calculate:
    >
    >   (((a + b) + (c + d)) + ((e + f) + (g + h)))
    >
    > Obviously this requires more space than the naive sum, but not as much as
    > you might at first expect: you only need to hold log(n) intermediates
    > values at any time.
    >


    Yep, I did mention in my post that the outer loop does not *have* to
    be O(N), if you can parallelize it. Apart from reducing
    intermediates, the divide-and-conquer method does not reduce overall
    computation time unless you have multiple processors, correct?

    > > I could guess pretty confidently that the reason this optimization was
    > > never tried is that sum() has always been intended to be used on
    > > numerics, since other alternatives exist for strings (join), lists
    > > (chain), and hand-coded data classes that support add-in-place (roll-
    > > your-own loop).

    >
    > Doing it this way helps summing lists or strings (though not as much as
    > str.join), but it also helps if you need to sum a long list of similarly
    > sized floats as you'll get a more accurate answer.
    >


    Interesting! That makes sense. The docs for math.fsum() suggest that
    partial sums are used to maintain precision.

    http://docs.python.org/library/math.html#math.fsum

    > Seehttp://groups.google.com/group/comp.lang.python/browse_thread/thread/....
    > faf92f532e/027cef7d4429aa3a
    > for an earlier discussion of this, or just Google comp.lang.python for
    > 'pairwise sum'.
    >
    > Here's the code I posted in that thread:
    >
    > def sumpairs(seq):
    >     tmp = []
    >     for i,v in enumerate(seq):
    >         if i&1:
    >             tmp[-1] = tmp[-1] + v
    >             i = i + 1
    >             n = i & -i
    >             while n > 2:
    >                 t = tmp.pop(-1)
    >                 tmp[-1] = tmp[-1] + t
    >                 n >>= 1
    >         else:
    >             tmp.append(v)
    >     while len(tmp) > 1:
    >         t = tmp.pop(-1)
    >         tmp[-1] = tmp[-1] + t
    >     return tmp[0]
    >
    > and I claimed that my Python coded sumpairs function was faster than the
    > builtin sum on a list of lists once you had more than about 210 items.
    > I never did get round to rewriting it in C for a more realistic speed
    > comparison: summing integers my Python version is about 60 times slower
    > than the builtin.
     
    Steve Howell, Mar 28, 2010
    #18
  19. kj

    Steve Howell Guest

    On Mar 28, 7:57 am, Steven D'Aprano <st...@REMOVE-THIS-
    cybersource.com.au> wrote:
    > On Sun, 28 Mar 2010 07:16:10 -0700, Steve Howell wrote:
    > > The mildly surprising part of sum() is that is does add vs. add-in-
    > > place, which leads to O(N) vs. O(1) for the inner loop calls, for
    > > certain data structures, notably lists, even though none of the
    > > intermediate results get used by the caller.  For lists, you could make
    > > a more efficient variant of sum() that clones the start value and does
    > > add-in-place.

    >
    > I have no doubt that you could make a version of sum for lists which is
    > more efficient than the existing one. After all, join more or less does
    > the same sort of thing, and it's very efficient. But don't think that add-
    > in-place is necessarily cheap. List appends are amortized O(1) each; if
    > you are adding M lists of N items each, that gives you O(M*N).
    >


    O(M*N) is still cheaper than O(M*N*N).

    > It's possible to improve the performance a tad if you can make multiple
    > appends in roughly constant time, which is what list.extend (probably?)
    > does, but only up to a point. Lists are over-allocated, but if you try to
    > add more items than there is room for, you need to make the list bigger,
    > and that means reallocating memory, which could easily be O(N**2) or
    > worse, depending on how good your OS's memory management is. Under Linux,
    > at least by default, malloc will never fail, but there's no guarantee how
    > long it will take to return. If the OOM killer has to start shutting down
    > other applications, and paging more and more memory to disk, eventually
    > malloc will return (or the system will core dump), but it could take a
    > while...
    >


    Even though extend() obviously has to do memory allocations along the
    way itself, it is still more efficient than the alternative. No need
    to speculate, you can measure these methods on your platform:

    M = 10
    N = 1000

    def in_place(
    start = [],
    sublists = ([[None] * M]) * N
    ):
    accum = start[:]
    for sublist in sublists:
    accum.extend(sublist)
    return accum

    def with_intermediates(
    start = [],
    sublists = ([[None] * M]) * N
    ):
    accum = start
    for sublist in sublists:
    accum = accum + sublist
    return accum
     
    Steve Howell, Mar 28, 2010
    #19
  20. On Mar 28, 10:17 am, Duncan Booth <>
    wrote:
    > Doing add-in-place isn't the only way to make sum more efficient: if you
    > assume that addition is associative (which of course the builtin sum can't)
    > then you can form partial sums. e.g. instead of calculating:

    ....
    >
    > Doing it this way helps summing lists or strings (though not as much as
    > str.join), but it also helps if you need to sum a long list of similarly
    > sized floats as you'll get a more accurate answer.


    Also, partial sums would be a clear winner over add-in-place if
    someone were dumb^H^H^H^Hnaive enough to use sum() on a long list of
    tuples :)
     
    Patrick Maupin, Mar 28, 2010
    #20
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Olaf Petzold

    vhdl textio and escape sequences

    Olaf Petzold, Nov 26, 2005, in forum: VHDL
    Replies:
    1
    Views:
    3,536
    Mike Treseler
    Nov 28, 2005
  2. Replies:
    2
    Views:
    320
  3. Harald Kirsch
    Replies:
    0
    Views:
    341
    Harald Kirsch
    Nov 19, 2004
  4. Cree
    Replies:
    5
    Views:
    1,015
  5. Franck Ditter

    sum works in sequences (Python 3)

    Franck Ditter, Sep 19, 2012, in forum: Python
    Replies:
    10
    Views:
    337
    Hans Mulder
    Sep 19, 2012
Loading...

Share This Page