Feature suggestion: sum() ought to use a compensated summation algorithm

Rhamphoryncus · May 5, 2008

Arnaud Delobelle wrote:
sum() works for any sequence of objects with an __add__ method, not
just floats! Your algorithm is specific to floats.
This occurred to me also, but then I tried
sum(['abc', 'efg'], '')

Click to expand...

Click to expand...

Interesting, I always thought that sum is like shortcut of
reduce(operator.add, ...), but I was mistaken.

Click to expand...

reduce() is more forgiving:
reduce(operator.add, ['abc', 'efg'], '' ) # it works
'abcefg'

Click to expand...

Hm, it works for lists:
sum(([1], [2]), [])
[1, 2]

However I find the seccond argument hack ugly.
Does the sum way have any performance advantages over the reduce way?

The relative differences are irrelevant. The important thing is they
both perform quadratically, which makes them the wrong choice (unless
you can guarantee your inputs are trivial.)

$ python2.5 -m timeit 'x = [[0]*1000]*10' 'sum(x, [])'
1000 loops, best of 3: 566 usec per loop
$ python2.5 -m timeit 'x = [[0]*1000]*100' 'sum(x, [])'
10 loops, best of 3: 106 msec per loop
$ python2.5 -m timeit 'x = [[0]*1000]*1000' 'sum(x, [])'
10 loops, best of 3: 18.1 sec per loop

Per element of the outer list, that's 0.0566, 1.06, 18.1 msec.
Nasty. Use this instead:

$ python2.5 -m timeit -s 'x = [[0]*1000]*10' 'y = []' 'for a in x:
y.extend(a)'
10000 loops, best of 3: 80.2 usec per loop
$ python2.5 -m timeit -s 'x = [[0]*1000]*100' 'y = []' 'for a in x:
y.extend(a)'
1000 loops, best of 3: 821 usec per loop
$ python2.5 -m timeit -s 'x = [[0]*1000]*1000' 'y = []' 'for a in x:
y.extend(a)'
10 loops, best of 3: 24.3 msec per loop
$ python2.5 -m timeit -s 'x = [[0]*1000]*10000' 'y = []' 'for a in x:
y.extend(a)'
10 loops, best of 3: 241 msec per loop
$ python2.5 -m timeit -s 'x = [[0]*1000]*100000' 'y = []' 'for a in x:
y.extend(a)'
10 loops, best of 3: 2.35 sec per loop

Per element of the outer list t hat's 0.00802, 0.00821, 0.0243,
0.0241, 0.0235 msec. At 1000 elements it seemed to cross some
threshold (maybe cache related, maybe allocation related), but overall
it scales linearly. Know your algorithm's complexity.

Robert Kern · May 5, 2008

Gabriel said:
Python doesn't require __add__ to be associative, so this should not be used as a general sum replacement. But if you know that you're adding floating point numbers you can use whatever algorithm best fits you. Or use numpy arrays; I think they implement Kahan summation or a similar algorithm.

No, we do a straightforward sum, too. Less magic in the default case.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco

Gabriel Genellina · May 5, 2008

En Mon said:
Well, 'reasonable' is a subjective notion in this case I have
never seen a non-associative use of the plus sign in maths, so I would
tend to take associativity for granted when seeing a + b + c in code,
therefore I would avoid such uses. (Multiplication signs are a
different story.) Of course others may feel differently about this,
and technically there's nothing in the way of using a non-associative
__add__!

You're approaching Python from a mathematical point of view, I presume?
Python has a perfectly defined meaning for a + b + c: simply, expressions
are evaluated from left to right
<http://docs.python.org/ref/evalorder.html>.
a+b+c is the same as (a+b)+c and *not* the same thing as a+(b+c).
Parenthesis aren't required in the first case, not because addition is
associative, but because evaluation order is clearly defined.
__add__, __mul__ and all other special methods can be used (and in fact,
*are* used) for many kind of objects, not just numbers; the Tree example
by Duncan Booth is a perfectly valid application. (Of course this usage
can be abused, as any other feature, but it's not so common)

Gabriel Genellina · May 6, 2008

Awesome!

See:
http://svn.python.org/view/sandbox/trunk/statistics/statistics.py?rev=40981&view=markup

Wonderful!

Terry Reedy · May 8, 2008

| Gabriel Genellina wrote:
| >
| > Python doesn't require __add__ to be associative, so this should not be
used as a general sum replacement.
|
| It does not _require_ this, but using an __add__ that is not commutative
| and associative, or has side effects, would qualify as a serious misuse,

seq + seq is not commutative ;-)

Rhamphoryncus · May 9, 2008

Gabriel Genellina wrote:

| >
| > Python doesn't require __add__ to be associative, so this should not be
used as a general sum replacement.
|
| It does not _require_ this, but using an __add__ that is not commutative
| and associative, or has side effects, would qualify as a serious misuse,

seq + seq is not commutative ;-)

There's no point switching to a special algorithm unless a float is
detected. What you're really arguing is seq + seq + float is not
commutative. In either case we may have to hurt you.

Need Help: Program to Accept 2 Matrices and Show their Sum	0	Aug 21, 2022
How to use Densenet121 in monai	0	Feb 16, 2024
Can someone pls help me with a little algorithm script	1	Nov 28, 2024
How can I calculate the last payment of the year to be the sum of all previous payments for that year and subtracting it from Research Costs value?	7	Aug 21, 2023
[newbie] Recursive algorithm - review	5	Jan 3, 2014
Trouble with prediction code, for the life of me I can't figure out why it isnt running properly. Help would be appreciated.	0	Jul 8, 2023
I am making a Snake game and it has a: "raise Terminator/turtle.Terminator" message.	2	Dec 20, 2021
Range / empty list issues??	1	Dec 10, 2023

Feature suggestion: sum() ought to use a compensated summation algorithm

Rhamphoryncus

Robert Kern

Gabriel Genellina

Gabriel Genellina

Terry Reedy

Rhamphoryncus

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads