sum and strings

Paddy

I was browsing the Voidspace blog item on "Flattening Lists", and
followed up on the use of sum to do the flattening.
A solution was:
>>> nestedList = [[1, 2], [3, 4], [5, 6]]
>>> sum(nestedList, [])
[1, 2, 3, 4, 5, 6]

I would not have thought of using sum in this way. When I did help(sum)
the docstring was very number-centric: It went further, and precluded
its use on strings:
Help on built-in function sum in module __builtin__:

sum(...)
sum(sequence, start=0) -> value

Returns the sum of a sequence of numbers (NOT strings) plus the value
of parameter 'start'. When the sequence is empty, returns start.

The string preclusion would not help with duck-typing (in general), so
I decided to consult the ref doc on sum:

sum( sequence[, start])

Sums start and the items of a sequence, from left to right, and returns
the total. start defaults to 0. The sequence's items are normally
numbers, and are not allowed to be strings. The fast, correct way to
concatenate sequence of strings is by calling ''.join(sequence). Note
that sum(range(n), m) is equivalent to reduce(operator.add, range(n),
m). New in version 2.3.


The above was a much better description of sum for me, and with an
inquisitive mind, I like to think that I might have come up with using
sum to flatten nestedList :)
But there was still that warning about using strings.

I therefore tried sum versus their reduce "equivalent" for strings:
Traceback (most recent call last):
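
For readers who want to reproduce the comparison, here is a minimal sketch. It assumes Python 3, where reduce has moved to functools (the thread itself uses Python 2, where reduce is a builtin), and the sample list `words` is made up for illustration rather than taken from the original post.

```python
import operator
from functools import reduce  # builtin in Python 2, in functools in Python 3

words = ["A", "B", "C", "D"]  # hypothetical sample data

# For numbers the two spellings agree, as the reference doc says.
assert sum(range(5), 10) == reduce(operator.add, range(5), 10) == 20

# For strings, reduce concatenates without complaint...
joined_by_reduce = reduce(operator.add, words, "")
print(joined_by_reduce)  # ABCD

# ...while sum rejects a string start value with a TypeError.
try:
    sum(words, "")
except TypeError as exc:
    sum_error = str(exc)
else:
    sum_error = None
print(sum_error)
```

So the "equivalence" in the reference doc holds for numbers, but sum adds an explicit type check that reduce does not have.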

Well, after all the above, there is a question:

Why not make sum work for strings too?

It would remove what seems like an arbitrary restriction and aid
duck-typing. If the answer is that the sum optimisations don't work for
the string datatype, then wouldn't it be better to put a trap in the
sum code diverting strings to the reduce equivalent?

Just a thought,

- Paddy.
 
Sybren Stuvel

Paddy enlightened us with:
Well, after all the above, there is a question:

Why not make sum work for strings too?

Because of "there should only be one way to do it, and that way should
be obvious". There are already the str.join and unicode.join methods,
which are more powerful than sum.

Sybren
 
Fredrik Lundh

Sybren said:
Because of "there should only be one way to do it, and that way should
be obvious".

I would have thought that "performance" and "proper use of English" were
more relevant, though.

</F>
 
Paul Rubin

Sybren Stuvel said:
Because of "there should only be one way to do it, and that way should
be obvious". There are already the str.join and unicode.join methods,

Those are obvious???
 
Georg Brandl

Paul said:
Those are obvious???

Why would you try to sum up strings? Besides, the ''.join idiom is quite
common in Python.

In this special case, ''.join is much faster than sum(), which is why
sum() refuses to concatenate strings.

Georg
 
bearophileHUGS

Paul Rubin:
Sybren Stuvel:

Those are obvious???

They aren't fully obvious (because they are methods of the separator
string), but after reading some documentation about string methods, and
after some tests done in the Python shell, you too can probably use
them without much trouble.

Bye,
bearophile
 
Steve Holden

Paul Rubin:



They aren't fully obvious (because they are methods of the separator
string), but after reading some documentation about string methods, and
after some tests done in the Python shell, you too can probably use
them without much trouble.
Using a bound method can make it a little more obvious.
>>> cat = "".join
>>> cat(['one', 'two', 'three'])
'onetwothree'
>>> cat([u'one', u'two', u'three'])
u'onetwothree'
>>>

regards
Steve
 
Paddy

Sybren said:
Paddy enlightened us with:

Because of "there should only be one way to do it, and that way should
be obvious". There are already the str.join and unicode.join methods,
which are more powerful than sum.

Sybren
I get where you are coming from, but in this case we have a function,
sum, that is not as general as it could be. sum is already here, and
works for some types but not for strings, which seems an arbitrary
limitation that impedes duck typing.

- Pad.

P.S. I can see why, and am used to the ''.join method. A newbie
introduced to sum for integers might naturally try and concatenate
strings using sum too.
 
Paddy

Sybren said:
Paddy enlightened us with:

Because of "there should only be one way to do it, and that way should
be obvious". There are already the str.join and unicode.join methods,
which are more powerful than sum.

Sybren
--
The problem with the world is stupidity. Not saying there should be a
capital punishment for stupidity, but why don't we just take the
safety labels off of everything and let the problem solve itself?
Frank Zappa

Here is where I see the break in the 'flow':
>>> 1+2+3
6
>>> sum([1,2,3], 0)
6
>>> [1] + [2] +[3]
[1, 2, 3]
>>> sum([[1],[2],[3]], [])
[1, 2, 3]
>>> '1' + '2' + '3'
'123'
>>> sum(['1','2','3'], '')
Traceback (most recent call last):

- Pad.
 
Georg Brandl

Paddy said:
I get where you are coming from, but in this case we have a function,
sum, that is not as general as it could be. sum is already here, and
works for some types but not for strings, which seems an arbitrary
limitation that impedes duck typing.

Only that it isn't arbitrary.
- Pad.

P.S. I can see why, and am used to the ''.join method. A newbie
introduced to sum for integers might naturally try and concatenate
strings using sum too.

Yes, and he's immediately told what to do instead.

Georg
 
Fredrik Lundh

Paddy said:
Here is where I see the break in the 'flow':
>>> 1+2+3
6
>>> sum([1,2,3], 0)
6
>>> [1] + [2] +[3]
[1, 2, 3]
>>> sum([[1],[2],[3]], [])
[1, 2, 3]
>>> '1' + '2' + '3'
'123'
>>> sum(['1','2','3'], '')
Traceback (most recent call last):
File "<interactive input>", line 1, in ?
TypeError: sum() can't sum strings [use ''.join(seq) instead]

do you often write programs that "sum" various kinds of data types, and
for which performance issues are irrelevant?

or are you just stuck in a "I have this idea" loop?

</F>
 
Paddy

Fredrik said:
Paddy said:
Here is where I see the break in the 'flow':
>>> 1+2+3
6
>>> sum([1,2,3], 0)
6
>>> [1] + [2] +[3]
[1, 2, 3]
>>> sum([[1],[2],[3]], [])
[1, 2, 3]
>>> '1' + '2' + '3'
'123'
>>> sum(['1','2','3'], '')
Traceback (most recent call last):
File "<interactive input>", line 1, in ?
TypeError: sum() can't sum strings [use ''.join(seq) instead]

Hi Fredrik,
do you often write programs that "sum" various kinds of data types,
and for which performance issues are irrelevant?

I was asking if sum was indeed an optimization that did not work for
strings.
I was pointing out its effect on duck typing.
I was hoping that someone might know the history and implementation of
sum and could point out the design decisions taken at the time and/or
discuss the merits of making sum accept strings, or indeed any type
that works with operator.add and that has an additive identity value.
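
To illustrate what "any type that works with operator.add and has an additive identity" means in practice, here is a small sketch. The `Vec` class is hypothetical and exists only for this example; the point is that sum already accepts such types when given a suitable start value, and that strings are the one case singled out for rejection.

```python
class Vec:
    """Hypothetical 2-D vector, defined only for this illustration."""

    def __init__(self, x, y):
        self.x, self.y = x, y

    def __add__(self, other):
        # Supporting + is all that sum needs from the items.
        return Vec(self.x + other.x, self.y + other.y)

    def __repr__(self):
        return "Vec(%r, %r)" % (self.x, self.y)


# The start value plays the role of the additive identity.
total = sum([Vec(1, 2), Vec(3, 4), Vec(5, 6)], Vec(0, 0))
print(total)  # Vec(9, 12)

# Tuples work the same way, with () as the identity.
flat = sum([(1,), (2,), (3,)], ())
print(flat)  # (1, 2, 3)
```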

Python's designers seem to know and apply the advantages of having fewer
'themes' that can be applied with fewer constraints. I am curious about
such a constraint on sum.

- Paddy.
 
Paul Rubin

Georg Brandl said:
Why would you try to sum up strings? Besides, the ''.join idiom is quite
common in Python.

Just because it's common doesn't mean it's obvious. In my opinion
it's as ugly as sin, and the fact that it's an idiom shows a
shortcoming in Python. The obvious reason for summing strings is that
it's a long-established feature of Python that a+b concatenates two
strings, so summing a,b,c,d,e should result in a+b+c+d+e.
 
Paddy

Georg said:
Paddy wrote:

Only that it isn't arbitrary.
Hi Georg,
I said it *seemed* arbitrary. I doubt that it is arbitrary, and thought
someone would say why the restriction is necessary.
Yes, and he's immediately told what to do instead.
Yep, that's the what. Now as to the why?

- paddy.
 
Paul Rubin

Sybren Stuvel said:
Yup. Just read the string documentation and you're off.

Huh? Just because some obscure weirdness like this is in the manual,
doesn't make it obvious or natural.
 
Paul Rubin

Paddy said:
Python's designers seem to know and apply the advantages of having fewer
'themes' that can be applied with fewer constraints. I am curious about
such a constraint on sum.

The opposing argument is that sum is sort of like reduce, i.e.
sum((a,b,c,d)) could conceivably be implemented as

temp = a
temp += b
temp += c
temp += d
return temp

If the args are strings, the above is a quadratic time algorithm and
it's better to throw an error than create such a trap for an unwary user.
The obvious fix is for the manual to specify that the sum function should
do the right thing and use a sensible algorithm.
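
The quadratic claim can be made concrete with a simple cost model. The sketch below does not time anything; it only counts characters copied, on the assumption that every concatenation builds a brand-new string (CPython can sometimes avoid the copy for `+=`, so treat this as the worst case rather than a measurement). Both function names are invented for this illustration.

```python
def chars_copied_by_repeated_add(strings):
    """Model temp = temp + s, assuming each + allocates a new string."""
    copied = 0
    length = 0
    for s in strings:
        length += len(s)   # the new string holds everything so far plus s
        copied += length   # building it copies each of those characters
    return copied


def chars_copied_by_join(strings):
    """Model ''.join: size the result once, copy each character once."""
    return sum(len(s) for s in strings)


pieces = ["x"] * 1000
print(chars_copied_by_repeated_add(pieces))  # 500500
print(chars_copied_by_join(pieces))          # 1000
```

For n one-character strings the repeated-add model copies n(n+1)/2 characters against n for join, which is the gap the error message is steering people away from.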

Semi-relevant quotation:

"Let's take an example. Consider an abstract data type stack. It's not
enough to have Push and Pop connected with the axiom wherein you push
something onto the stack and after you pop the stack you get the same
thing back. It is of paramount importance that pushing the stack is a
constant time operation regardless of the size of the stack. If I
implement the stack so that every time I push it becomes slower and
slower, no one will want to use this stack.

We need to separate the implementation from the interface but not at
the cost of totally ignoring complexity. Complexity has to be and is a
part of the unwritten contract between the module and its user. The
reason for introducing the notion of abstract data types was to allow
interchangeable software modules. You cannot have interchangeable
modules unless these modules share similar complexity behavior. If I
replace one module with another module with the same functional
behavior but with different complexity tradeoffs, the user of this
code will be unpleasantly surprised. I could tell him anything I like
about data abstraction, and he still would not want to use the
code. Complexity assertions have to be part of the interface."

--Alex Stepanov (designer of C++ standard template library)
http://www.sgi.com/tech/stl/drdobbs-interview.html
 
Paddy

Paul said:
The opposing argument is that sum is sort of like reduce, i.e.
sum((a,b,c,d)) could conceivably be implemented as

temp = a
temp += b
temp += c
temp += d
return temp

If the args are strings, the above is a quadratic time algorithm and
it's better to throw an error than create such a trap for an unwary user.
The obvious fix is for the manual to specify that the sum function should
do the right thing and use a sensible algorithm.

Semi-relevant quotation:

"Let's take an example. Consider an abstract data type stack. It's not
enough to have Push and Pop connected with the axiom wherein you push
something onto the stack and after you pop the stack you get the same
thing back. It is of paramount importance that pushing the stack is a
constant time operation regardless of the size of the stack. If I
implement the stack so that every time I push it becomes slower and
slower, no one will want to use this stack.

We need to separate the implementation from the interface but not at
the cost of totally ignoring complexity. Complexity has to be and is a
part of the unwritten contract between the module and its user. The
reason for introducing the notion of abstract data types was to allow
interchangeable software modules. You cannot have interchangeable
modules unless these modules share similar complexity behavior. If I
replace one module with another module with the same functional
behavior but with different complexity tradeoffs, the user of this
code will be unpleasantly surprised. I could tell him anything I like
about data abstraction, and he still would not want to use the
code. Complexity assertions have to be part of the interface."

--Alex Stepanov (designer of C++ standard template library)
http://www.sgi.com/tech/stl/drdobbs-interview.html


Thanks Paul.
I also found this from Guido:
http://mail.python.org/pipermail/python-dev/2003-April/034853.html
And this, in the same thread:
http://mail.python.org/pipermail/python-dev/2003-April/034854.html

So,
The upshot is that complexity matters. The algorithm used in sum is
'very wrong' for use with strings, and there is no clean way to switch
to the preferred method for strings (''.join()).


Thanks group,
- Paddy.
 
Paul Rubin

Dennis Lee Bieber

I was asking if sum was indeed an optimization that did not work for
strings.

I'm surprised it actually appended the lists...
I was pointing out its effect on duck typing.

At least for me, the most common usage for something like sum()
would be

alst = [ bunch, of, numbers ]

avg = sum(alst) / len(alst)
--
Wulfraed Dennis Lee Bieber KD6MOG
(e-mail address removed) (e-mail address removed)
HTTP://wlfraed.home.netcom.com/
(Bestiaria Support Staff: (e-mail address removed))
HTTP://www.bestiaria.com/
 
