Feature suggestion: sum() ought to use a compensated summation algorithm

  • Thread starter Szabolcs Horvát
  • Start date
R

Rhamphoryncus

Arnaud Delobelle wrote:
sum() works for any sequence of objects with an __add__ method, not
just floats! Your algorithm is specific to floats.
This occurred to me also, but then I tried
sum(['abc', 'efg'], '')
Interesting, I always thought that sum is like shortcut of
reduce(operator.add, ...), but I was mistaken.
reduce() is more forgiving:
reduce(operator.add, ['abc', 'efg'], '' ) # it works
'abcefg'

Hm, it works for lists:
sum(([1], [2]), [])
[1, 2]

However I find the seccond argument hack ugly.
Does the sum way have any performance advantages over the reduce way?

The relative differences are irrelevant. The important thing is they
both perform quadratically, which makes them the wrong choice (unless
you can guarantee your inputs are trivial.)

$ python2.5 -m timeit 'x = [[0]*1000]*10' 'sum(x, [])'
1000 loops, best of 3: 566 usec per loop
$ python2.5 -m timeit 'x = [[0]*1000]*100' 'sum(x, [])'
10 loops, best of 3: 106 msec per loop
$ python2.5 -m timeit 'x = [[0]*1000]*1000' 'sum(x, [])'
10 loops, best of 3: 18.1 sec per loop

Per element of the outer list, that's 0.0566, 1.06, 18.1 msec.
Nasty. Use this instead:

$ python2.5 -m timeit -s 'x = [[0]*1000]*10' 'y = []' 'for a in x:
y.extend(a)'
10000 loops, best of 3: 80.2 usec per loop
$ python2.5 -m timeit -s 'x = [[0]*1000]*100' 'y = []' 'for a in x:
y.extend(a)'
1000 loops, best of 3: 821 usec per loop
$ python2.5 -m timeit -s 'x = [[0]*1000]*1000' 'y = []' 'for a in x:
y.extend(a)'
10 loops, best of 3: 24.3 msec per loop
$ python2.5 -m timeit -s 'x = [[0]*1000]*10000' 'y = []' 'for a in x:
y.extend(a)'
10 loops, best of 3: 241 msec per loop
$ python2.5 -m timeit -s 'x = [[0]*1000]*100000' 'y = []' 'for a in x:
y.extend(a)'
10 loops, best of 3: 2.35 sec per loop

Per element of the outer list t hat's 0.00802, 0.00821, 0.0243,
0.0241, 0.0235 msec. At 1000 elements it seemed to cross some
threshold (maybe cache related, maybe allocation related), but overall
it scales linearly. Know your algorithm's complexity.
 
R

Robert Kern

Gabriel said:
Python doesn't require __add__ to be associative, so this should not be used as a general sum replacement. But if you know that you're adding floating point numbers you can use whatever algorithm best fits you. Or use numpy arrays; I think they implement Kahan summation or a similar algorithm.

No, we do a straightforward sum, too. Less magic in the default case.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco
 
G

Gabriel Genellina

Well, 'reasonable' is a subjective notion in this case :) I have
never seen a non-associative use of the plus sign in maths, so I would
tend to take associativity for granted when seeing a + b + c in code,
therefore I would avoid such uses. (Multiplication signs are a
different story.) Of course others may feel differently about this,
and technically there's nothing in the way of using a non-associative
__add__! :)

You're approaching Python from a mathematical point of view, I presume?
Python has a perfectly defined meaning for a + b + c: simply, expressions
are evaluated from left to right
<http://docs.python.org/ref/evalorder.html>.
a+b+c is the same as (a+b)+c and *not* the same thing as a+(b+c).
Parenthesis aren't required in the first case, not because addition is
associative, but because evaluation order is clearly defined.
__add__, __mul__ and all other special methods can be used (and in fact,
*are* used) for many kind of objects, not just numbers; the Tree example
by Duncan Booth is a perfectly valid application. (Of course this usage
can be abused, as any other feature, but it's not so common)
 
T

Terry Reedy

| Gabriel Genellina wrote:
| >
| > Python doesn't require __add__ to be associative, so this should not be
used as a general sum replacement.
|
| It does not _require_ this, but using an __add__ that is not commutative
| and associative, or has side effects, would qualify as a serious misuse,

seq + seq is not commutative ;-)
 
R

Rhamphoryncus

Gabriel Genellina wrote:

| >
| > Python doesn't require __add__ to be associative, so this should not be
used as a general sum replacement.
|
| It does not _require_ this, but using an __add__ that is not commutative
| and associative, or has side effects, would qualify as a serious misuse,

seq + seq is not commutative ;-)

There's no point switching to a special algorithm unless a float is
detected. What you're really arguing is seq + seq + float is not
commutative. In either case we may have to hurt you. ;)
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,781
Messages
2,569,616
Members
45,306
Latest member
TeddyWeath

Latest Threads

Top