sum and strings


Georg Brandl

Paul said:
Just because it's common doesn't mean it's obvious. In my opinion
it's as ugly as sin, and the fact that it's an idiom shows a
shortcoming in Python. The obvious reason for summing strings is that
it's a long-established feature of Python that a+b concatenates two
strings, so summing a,b,c,d,e should result in a+b+c+d+e.

Which is exactly how I would concatenate five strings.

For concatenating longer sequences of strings, however, this approach is not
sensible if it needs to be fast.
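
The usual alternative is the ''.join idiom, which builds the result in
one pass instead of re-copying the partial result on every +. Roughly:

parts = ['a', 'b', 'c', 'd', 'e']
result = ''.join(parts)    # one pass, linear in the total length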

Georg
 

Bill Pursell

Georg said:
Why would you try to sum up strings? Besides, the ''.join idiom is quite
common in Python.

One could extend this argument to disallow the following:
 

Sybren Stuvel

Paul Rubin enlightened us with:
Huh? Just because some obscure weirdness like this is in the
manual, doesn't make it obvious or natural.

I think it's quite natural. The "fact" that it's some obscure
weirdness is all in your head.

Sybren
 

Rhamphoryncus

Bill said:
One could extend this argument to disallow the following:

It's worthwhile to note that the use of + as the concatenation operator
is arbitrary. It could just as well have been | or &, and has no
relationship with mathematical addition. Were history different
perhaps Guido would have gone with | or & instead, and we wouldn't be
having this conversation.

It's also worth stressing (not in response to your post, but others)
that sum([[1],[2],[3]], []) is just as bad as attempting to sum
strings, both conceptually (it's not mathematical addition), and
performance-wise. Don't do it. :)

I believe the preferred method to flatten a list of lists is this:

shallow = []
for i in deep:
    shallow.extend(i)

Yes, it's three lines. It's also very easy to read. reduce() and
sum() are not.
 

Carl Banks

Rhamphoryncus said:
It's also worth stressing (not in response to your post, but others)
that sum([[1],[2],[3]], []) is just as bad as attempting to sum
strings, both conceptually (it's not mathematical addition), and
performance-wise. Don't do it. :)

Interesting. I would have guessed that, internally, sum was
implemented with in-place operations (except for the first operation,
which would have to be non-in-place so that it wouldn't modify a
mutable initial value). But looking at the source, I see that I would
have guessed wrongly: it does not use in-place operations. So,
although lists are optimized for growing, sum would end up creating a
new list each iteration anyways. Thus it does have the same penalty
that strings would have. Good to know.
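
In rough Python terms (the real sum() is C code, and these function names
are just for illustration), the two strategies look like this:

def sum_like(seq, start=0):
    # what the builtin effectively does: every + builds a new object
    total = start
    for item in seq:
        total = total + item
    return total

def inplace_sum(seq, start):
    # the in-place strategy described above: copy once up front so the
    # caller's start value is never mutated, then let += grow the result
    if not seq:
        return start
    total = start + seq[0]
    for item in seq[1:]:
        total += item
    return total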

Obviously, sum can't check for every possible type that would be
inefficient, but at least it could check for common builtins (str,
list, tuple). That it only checks for strings suggests to me that this
really is just a case of protecting the unwary. Concatenating strings
is common enough, and the drawbacks of using sum bad enough, that a
special case was considered justified. Lists and tuples, though
theoretically they have the same issues as strings, probably don't
justify a special case because they're not as common.
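
For instance, in a CPython 2.x session (the error wording may differ
between versions, and the traceback is abbreviated here):

>>> sum([[1], [2], [3]], [])     # lists are accepted, though also O(n**2)
[1, 2, 3]
>>> sum([(1,), (2,)], ())        # so are tuples
(1, 2)
>>> sum(['ab', 'cd'], '')        # only a string start value is rejected
Traceback (most recent call last):
  ...
TypeError: sum() can't sum strings [use ''.join(seq) instead]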

I believe the preferred method to flatten a list of lists is this:

shallow = []
for i in deep:
    shallow.extend(i)

Yes, it's three lines. It's also very easy to read. reduce() and
sum() are not.

sum() is really for numerical uses; people ought to just stick to using
it that way.


Carl Banks
 

Paddy

Rhamphoryncus said:
It's worthwhile to note that the use of + as the concatenation operator
is arbitrary. It could just as well have been | or &, and has no
relationship with mathematical addition.

The effect of the string concatenation operator is only secondary.
Secondary to the use of the word sum, and to what could 'reasonably' be
conceived as sum's effect on non-numeric types.
It's also worth stressing (not in response to your post, but others)
that sum([[1],[2],[3]], []) is just as bad as attempting to sum
strings, both conceptually (it's not mathematical addition)

Unfortunately, the words sum and summation are linked to other words
that are commonly used to describe what is happening to the numbers,
the lists, and the strings.
Words such as accumulate, concatenate, aggregate, collect, assemble, as
well as add.
and performance-wise. Don't do it. :)

Amen to that.
I believe the preferred method to flatten a list of lists is this:

shallow = []
for i in deep:
    shallow.extend(i)

Yes, it's three lines. It's also very easy to read. reduce() and
sum() are not.

I'd like to squeeze in the listcomp version, not because it is one line
shorter, but because I, and maybe others, prefer short listcomps such as
the following:

shallow = []
[shallow.extend(i) for i in deep]

-Pad.
 

Rhamphoryncus

Paddy said:
Rhamphoryncus said:
It's worthwhile to note that the use of + as the concatenation operator
is arbitrary. It could just as well have been | or &, and has no
relationship with mathematical addition.

The effect of the string concatenation operator is only secondary.
Secondary to the use of the word sum, and to what could 'reasonably' be
conceived as sum's effect on non-numeric types.
It's also worth stressing (not in response to your post, but others)
that sum([[1],[2],[3]], []) is just as bad as attempting to sum
strings, both conceptually (it's not mathematical addition)

Unfortunately, the words sum and summation are linked to other words
that are commonly used to describe what is happening to the numbers,
the lists, and the strings.
Words such as accumulate, concatenate, aggregate, collect, assemble, as
well as add.

String concatenation and numeric addition only group together under the
vaguest of English terms, "putting things together". String
concatenation violates many mathematical properties of
addition/summation, and simply isn't related.

It is unfortunate that many languages (including Python) promote the
confusion by using + as a "put things together" operator, i.e. for both
mathematical addition and string concatenation.

I believe the preferred method to flatten a list of lists is this:

shallow = []
for i in deep:
    shallow.extend(i)

Yes, it's three lines. It's also very easy to read. reduce() and
sum() are not.

I'd like to squeeze in the listcomp version, not because it is one line
shorter, but because I, and maybe others, prefer short listcomps such as
the following:

shallow = []
[shallow.extend(i) for i in deep]

I'm sure this has been mentioned before, but listcomps are for when you
want to store the list and use it for further things, not for when you
want a side effect. TOOWTDI.

And of course, if saving a line was the reason:

shallow = []
for i in deep: shallow.extend(i)
 

Gerhard Fiedler

shallow = []
[shallow.extend(i) for i in deep]

I'm sure this has been mentioned before, but listcomps are for when you
want to store the list and use it for further things, not for when you
want a side effect. TOOWTDI.

Can you please explain what you mean with this, and maybe why?

Thanks,
Gerhard
 

Marc 'BlackJack' Rintsch

Gerhard Fiedler said:
shallow = []
[shallow.extend(i) for i in deep]

I'm sure this has been mentioned before, but listcomps are for when you
want to store the list and use it for further things, not for when you
want a side effect. TOOWTDI.

Can you please explain what you mean with this, and maybe why?

You should not abuse list comps just to have a one-liner. Only use them
if you really want to build a list and not just for side effects. The
above one-liner builds a list of ``None``s of length ``len(deep)`` for no
reason, just to throw them away.
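
For example:

>>> deep = [[1], [2], [3]]
>>> shallow = []
>>> [shallow.extend(i) for i in deep]    # extend() returns None each time
[None, None, None]
>>> shallow
[1, 2, 3]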

Ciao,
Marc 'BlackJack' Rintsch
 

Alex Martelli

Rhamphoryncus said:
I believe the preferred method to flatten a list of lists is this:

shallow = []
for i in deep:
    shallow.extend(i)

Yes, it's three lines. It's also very easy to read. reduce() and
sum() are not.

I'd like to squeeze in the listcomp version, not because it is one line
shorter, but because I, and maybe others, prefer short listcomps such as
the following:

shallow = []
[shallow.extend(i) for i in deep]

I'm sure this has been mentioned before, but listcomps are for when you
want to store the list and use it for further things, not for when you
want a side effect. TOOWTDI.

And of course, if saving a line was the reason:

shallow = []
for i in deep: shallow.extend(i)

Another alternative equivalent to

shallow = sum(deep, [])

but based on the "proper" usage of list comprehensions is

shallow = [ item for sublist in deep for item in sublist ]


In terms of performance, however, the simple loop that you (rhamph)
posted is generally best -- e.g., with Python 2.5c1 on a MacbookPro:

brain:~/downloads alex$ python -mtimeit -s'deep=[range(9)]*9'
's=sum(deep,[])'
100000 loops, best of 3: 11.2 usec per loop

brain:~/downloads alex$ python -mtimeit -s'deep=[range(9)]*9' 's=[]
for sublist in deep: s.extend(sublist)'
100000 loops, best of 3: 6.92 usec per loop

brain:~/downloads alex$ python -mtimeit -s'deep=[range(9)]*9' 's=[]
[s.extend(sublist) for sublist in deep]'
100000 loops, best of 3: 8.48 usec per loop

brain:~/downloads alex$ python -mtimeit -s'deep=[range(9)]*9' 's=[item
for sublist in deep for item in sublist]'
100000 loops, best of 3: 17.1 usec per loop


Alex
 

Steven D'Aprano

If the args are strings, the above is a quadratic time algorithm and
it's better to throw an error than create such a trap for an unwary user.

That's a nonsense argument. There is no shortage of slow algorithms, and
some of them are built into Python. When did O(n**2) become an error
condition? And overhead matters: if I'm only doing a few concatenations,
it is significantly faster to add the strings using + than to use join()
(at least in Python 2.3 -- YMMV):
s = timeit.Timer("''.join(L)", "L=['a'*50, 'b'*50, 'c'*50]")
s.timeit()
1.3470098972320557
t = timeit.Timer("a+b+c", "a,b,c = 'a'*50, 'b'*50, 'c'*50")
t.timeit()
1.0698421001434326

There's a word for optimizations that actually result in code running 25%
slower.

I applaud that Python's language developers care about efficiency. I
applaud that the documentation warns people about traps and potential
areas of inefficiency. But I think it is a shame that sum() does a
special type-check to avoid something which is only sometimes slow. It
doesn't protect against O(n**2) performance; it merely protects against
just one of an infinite number of possible "traps for the unwary".

I would have thought it would be better for sum() to raise a warning, not
an exception. If we take seriously the argument that sum implies
addition, and that string concatenation isn't really addition, then sum()
should also refuse to operate on lists and tuples and any other
non-numeric class. Either would be better than sum() "protecting" the user
from himself, except when it doesn't.
 

Georg Brandl

Steven said:
If the args are strings, the above is a quadratic time algorithm and
it's better to throw an error than create such a trap for an unwary user.

That's a nonsense argument. There is no shortage of slow algorithms, and
some of them are built into Python. When did O(n**2) become an error
condition? And overhead matters: if I'm only doing a few concatenations,
it is significantly faster to add the strings using + than to use join()
(at least in Python 2.3 -- YMMV):
s = timeit.Timer("''.join(L)", "L=['a'*50, 'b'*50, 'c'*50]")
s.timeit()
1.3470098972320557
t = timeit.Timer("a+b+c", "a,b,c = 'a'*50, 'b'*50, 'c'*50")
t.timeit()
1.0698421001434326

There's a word for optimizations that actually result in code running 25%
slower.

I applaud that Python's language developers care about efficiency. I
applaud that the documentation warns people about traps and potential
areas of inefficiency. But I think it is a shame that sum() does a
special type-check to avoid something which is only sometimes slow. It
doesn't protect against O(n**2) performance; it merely protects against
just one of an infinite number of possible "traps for the unwary".

I would have thought it would be better for sum() to raise a warning, not
an exception. If we take seriously the argument that sum implies
addition, and that string concatenation isn't really addition, then sum()
should also refuse to operate on lists and tuples and any other
non-numeric class. Either would be better than sum() "protecting" the user
from himself, except when it doesn't.

Well, present that on python-dev.

Georg
 

Fredrik Lundh

Alex said:
In terms of performance, however, the simple loop that you (rhamph)
posted is generally best -- e.g., with Python 2.5c1 on a MacbookPro:

brain:~/downloads alex$ python -mtimeit -s'deep=[range(9)]*9'
's=sum(deep,[])'
100000 loops, best of 3: 11.2 usec per loop

brain:~/downloads alex$ python -mtimeit -s'deep=[range(9)]*9' 's=[]
for sublist in deep: s.extend(sublist)'
100000 loops, best of 3: 6.92 usec per loop

at least on this machine, map(s.extend) is slightly faster than the loop:

timeit -s"deep=[range(9)]*9" "s=[]" "for sublist in deep:
s.extend(sublist)"
100000 loops, best of 3: 5.59 usec per loop

timeit -s"deep=[range(9)]*9" "s=[]" "map(s.extend, deep)"
100000 loops, best of 3: 5.26 usec per loop

</F>
 

Fredrik Lundh

Steven said:
That's a nonsense argument. There is no shortage of slow algorithms, and
some of them are built into Python. When did O(n**2) become an error
condition? And overhead matters: if I'm only doing a few concatenations,
it is significantly faster to add the strings using + than to use join()
(at least in Python 2.3 -- YMMV):
s = timeit.Timer("''.join(L)", "L=['a'*50, 'b'*50, 'c'*50]")
s.timeit()
1.3470098972320557
t = timeit.Timer("a+b+c", "a,b,c = 'a'*50, 'b'*50, 'c'*50")
t.timeit()
1.0698421001434326

and what exactly does the fact that Python can do operator-based
dispatch much faster than it can do method-based dispatch have to
do with sum's inability to add strings ?

did you have some kind of "zero overhead for some function calls"
optimization in mind ?

</F>
 

Carl Banks

Steven said:
But I think it is a shame that sum() does a
special type-check to avoid something which is only sometimes slow. It
doesn't protect against O(n**2) performance; it merely protects against
just one of an infinite number of possible "traps for the unwary".

I'm not so sure special casing was a good idea myself, but...

If you allow for a small number of special cases in the language to
protect against the most harmful and error-prone traps, this case would
be one of the top ones, IMHO.


Carl Banks


and, you know, if you really want to concatenate strings with sum, you
can

class noopadd(object):
    def __add__(self, other):
        return other

sum(["abc","def","ghi"],noopadd())
 

Paddy

Carl said:
and, you know, if you really want to concatenate strings with sum, you
can

class noopadd(object):
    def __add__(self, other):
        return other

sum(["abc","def","ghi"],noopadd())

;-)

- Pad.
 

Steven D'Aprano

Steven said:
That's a nonsense argument. There is no shortage of slow algorithms, and
some of them are built into Python. When did O(n**2) become an error
condition? And overhead matters: if I'm only doing a few concatenations,
it is significantly faster to add the strings using + than to use join()
(at least in Python 2.3 -- YMMV):
s = timeit.Timer("''.join(L)", "L=['a'*50, 'b'*50, 'c'*50]")
s.timeit()
1.3470098972320557
t = timeit.Timer("a+b+c", "a,b,c = 'a'*50, 'b'*50, 'c'*50")
t.timeit()
1.0698421001434326

and what exactly does the fact that Python can do operator-based
dispatch much faster than it can do method-based dispatch have to
do with sum's inability to add strings ?

Sorry, I don't get you here. Are you saying that string addition doesn't
call the operands' __add__ methods?

Obviously I would have preferred to have done a direct test of sum()
versus join(), but I can't since sum() goes out of its way to make that
difficult.

Or maybe I can...
t5 = timeit.Timer("sum(L, M)", """
class Magic:
    def __add__(self, other): return other
L = ['a', 'b']
M = Magic()
""")
t5.timeit()
12.319401025772095

That's bizarrely slow for concatenating two characters. It's an order of
magnitude slower than doing a direct addition of two characters, and about
one third the speed of a pure Python implementation:
setup1 = """
def mysum(strlist):
    t = ""
    for s in strlist: t = t+s
    return t
a, b, c, d = 'a', 'b', 'c', 'd'
"""
t6 = timeit.Timer("mysum([a,b,c,d])", setup1)
t6.timeit()
4.3943448066711426

What's going on here? What have I missed? I know adding strings is
O(n**2), but these results don't make any sense to me.
 

Fredrik Lundh

Steven said:
Sorry, I don't get you here. Are you saying that string addition doesn't
call the operands' __add__ methods?

the "+" instruction (BINARY_ADD) special-cases integers and 8-bit
strings, and uses a single C call via a type slot for other types.

in contrast, "join" and "sum" need to do a full name lookup and a
full Python-level function call. compared to the effort needed to
copy 100 bytes, that's a rather expensive set of operations.
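
this is easy to see with the dis module (the opcode sequence below is a
summary; exact output differs between versions):

import dis
# "a + b + c" compiles to LOAD_NAME a, LOAD_NAME b, BINARY_ADD,
# LOAD_NAME c, BINARY_ADD, RETURN_VALUE -- each "+" is a single opcode
# that the eval loop can special-case for ints and 8-bit strings.
dis.dis(compile("a + b + c", "<example>", "eval"))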

</F>
 

neophyte

Fredrik said:
do you often write programs that "sums" various kinds of data types, and
for which performance issues are irrelevant?

or are you just stuck in a "I have this idea" loop?

Maybe he's just insisting on the principle of least surprise?
Personally, I'd rather expect sum() to work for strings and Python to
issue a warning instead of raising a type error. That warning might
also be appropriate when using sum() on other builtin sequence types.
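
A sketch of what that could look like, written as a plain Python wrapper
(tolerant_sum is a made-up name, not a real builtin):

import warnings

def tolerant_sum(seq, start=0):
    # Hypothetical sum() variant: warn (once) about string concatenation
    # instead of raising TypeError.
    warned = False
    total = start
    for item in seq:
        if isinstance(item, str) and not warned:
            warnings.warn("summing strings is O(n**2); ''.join() is "
                          "usually faster", RuntimeWarning, stacklevel=2)
            warned = True
        total = total + item
    return total

Used as, for example:

>>> tolerant_sum(['a', 'b', 'c'], '')    # warns once, then returns the string
'abc'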
 
