Steven D'Aprano said:
If the values vary greatly in magnitude, you probably want to add them
from smallest to biggest; other than that, how else can you calculate the
mean?
It doesn't have to be smallest to largest, but the important thing is not
to be adding the millionth element to the sum of the preceding 999999
values. So if I remember correctly one option is to add pairs of numbers,
then sum the pairs and so on.
I think my point really is that nobody is going to bother doing this when
they just want to sum a few numbers, and I'm not really surprised that the
builtin sum doesn't do it either, but I was slightly surprised that numpy
doesn't try to minimise loss of precision.
Here's a quick attempt at implementing it which demonstrates how the
builtin sum loses precision somewhat faster than a pairwise sum (and is
rather dependant on the input order). sumpair is of course a lot slower
than the builtin sum, but it might be interesting to see how a C coded
version compared. I can't see that it would necessarily be much slower: it
just adds some counters, bit twiddling and a short temporary array to the
calculation.
C:\temp>sumpairs.py
builtin sum 500000.00000083662
pairwise sum 500000.00000499998
sorted
builtin sum 500000.00000499998
pairwise sum 500000.00000499998
reverse sorted
builtin sum 500000.0
pairwise sum 500000.00000500004
all the same
builtin sum 1000000.0000016732
pairwise sum 1000000.00001
------------ sumpairs.py ------------------------------------------
# Sum implemented so that intermediate results sum a similar number
# of the input values.
def sumpairs(seq):
tmp = []
for i,v in enumerate(seq):
if i&1:
tmp[-1] += v
i = i + 1
n = i & -i
while n > 2:
t = tmp.pop(-1)
tmp[-1] = tmp[-1] + t
n >>= 1
else:
tmp.append(v)
tmp.reverse()
return sum(tmp)
v = [1000000, 0.00001] * 1000000
print "builtin sum ", repr(sum(v)/len(v))
print "pairwise sum", repr(sumpairs(v)/len(v))
print "sorted"
v.sort()
print "builtin sum ", repr(sum(v)/len(v))
print "pairwise sum", repr(sumpairs(v)/len(v))
print "reverse sorted"
v.reverse()
print "builtin sum ", repr(sum(v)/len(v))
print "pairwise sum", repr(sumpairs(v)/len(v))
print "all the same"
v = [1000000.00001] * 1000000
print "builtin sum ", repr(sum(v)/len(v))
print "pairwise sum", repr(sumpairs(v)/len(v))-----------------------------
----------------------------------
In case the code is a bit hard to follow this version makes it clearer what
is happening:
------------- sumpair2.py ---------------
def sumpairs(seq):
tmp = []
for i,v in enumerate(seq):
if i&1:
tmp[-1] += v
print "add ", tmp
i = i + 1
n = i & -i
while n > 2:
t = tmp.pop(-1)
tmp[-1] = tmp[-1] + t
n >>= 1
print "reduce", tmp
else:
tmp.append(v)
print "append", tmp
tmp.reverse()
return sum(tmp)
print sumpairs([1]*40)
-----------------------------------------
which gives the output:
append [1]
add [2]
append [2, 1]
add [2, 2]
reduce [4]
append [4, 1]
add [4, 2]
append [4, 2, 1]
add [4, 2, 2]
reduce [4, 4]
reduce [8]
append [8, 1]
add [8, 2]
append [8, 2, 1]
add [8, 2, 2]
reduce [8, 4]
append [8, 4, 1]
add [8, 4, 2]
append [8, 4, 2, 1]
add [8, 4, 2, 2]
reduce [8, 4, 4]
reduce [8, 8]
reduce [16]
append [16, 1]
add [16, 2]
append [16, 2, 1]
add [16, 2, 2]
reduce [16, 4]
append [16, 4, 1]
add [16, 4, 2]
append [16, 4, 2, 1]
add [16, 4, 2, 2]
reduce [16, 4, 4]
reduce [16, 8]
append [16, 8, 1]
add [16, 8, 2]
append [16, 8, 2, 1]
add [16, 8, 2, 2]
reduce [16, 8, 4]
append [16, 8, 4, 1]
add [16, 8, 4, 2]
append [16, 8, 4, 2, 1]
add [16, 8, 4, 2, 2]
reduce [16, 8, 4, 4]
reduce [16, 8, 8]
reduce [16, 16]
reduce [32]
append [32, 1]
add [32, 2]
append [32, 2, 1]
add [32, 2, 2]
reduce [32, 4]
append [32, 4, 1]
add [32, 4, 2]
append [32, 4, 2, 1]
add [32, 4, 2, 2]
reduce [32, 4, 4]
reduce [32, 8]
40