sum and strings


Tim Chase

Maybe he's just insisting on the principle of least surprise?
Personally, I'd rather expect sum() to work for strings and Python to
issue a warning instead of raising a type error. That warning might
also be appropriate when using sum() on other builtin sequence types.

In its own way, it makes good sense to not be surprising:
>>> # using regular numbers
>>> n1 = 1
>>> n2 = 2
>>> sum([n1,n2])
3
>>> # using complex numbers
>>> c1 = 1+2j
>>> c2 = 5+11j
>>> sum([c1,c2])
(6+13j)
>>> # using a custom class with __add__
>>> class Q(object):
...     def __init__(self, n, i,j,k):
...         self.n = n
...         self.i = i
...         self.j = j
...         self.k = k
...     def __add__(self, other):
...         return Q(self.n+other.n,
...             self.i+other.i,
...             self.j+other.j,
...             self.k+other.k)
...
>>> q1 = Q(1,2,3,5)
>>> q2 = Q(7,11,13,17)
>>> q3 = q1 + q2
>>> q3.n, q3.i, q3.j, q3.k
(8, 13, 16, 22)
>>> sum([q1,q2])
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: unsupported operand type(s) for +: 'int' and 'Q'


Just because something is slow or sub-optimal doesn't mean it
should be an error. Otherwise calls to time.sleep() should throw
an error...
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
InefficientError: call takes more than 1 second to complete


[scratches head]

+1 regarding principle of least surprise on sum()

-tkc
 

Fredrik Lundh

Tim said:
>>> q3 = q1 + q2
>>> q3.n, q3.i, q3.j, q3.k
(8, 13, 16, 22)
>>> sum([q1,q2])
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: unsupported operand type(s) for +: 'int' and 'Q'

Just because something is slow or sub-optimal doesn't mean it
should be an error.

that's not an error because it would be "slow or sub-optimal" to add
custom objects, that's an error because you don't understand how "sum"
works.

(hint: sum != reduce)
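
To make the hint concrete, here is a minimal sketch in Python 3 syntax. The class Q2 is a hypothetical two-field stand-in for the Q class quoted above, not code from the thread. sum() seeds its accumulator with 0 unless told otherwise, while reduce() without an initializer seeds it with the first item:

```python
from functools import reduce
import operator


class Q2:
    """Hypothetical two-field stand-in for the Q class in the thread."""

    def __init__(self, n=0, i=0):
        self.n, self.i = n, i

    def __add__(self, other):
        return Q2(self.n + other.n, self.i + other.i)


items = [Q2(1, 2), Q2(7, 11)]

# reduce() starts from items[0], so only Q2 + Q2 is ever evaluated.
total = reduce(operator.add, items)

# sum() starts from 0, so its first step is 0 + Q2, which fails.
try:
    sum(items)
    sum_failed = False
except TypeError:
    sum_failed = True

# An explicit start value gives sum() a compatible seed.
total_from_sum = sum(items, Q2())
```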

</F>
 

Tim Chase

Just because something is slow or sub-optimal doesn't mean it
should be an error.

that's not an error because it would be "slow or sub-optimal" to add
custom objects, that's an error because you don't understand how "sum"
works.

(hint: sum != reduce)

No, clearly sum!=reduce...no dispute there...

so we go ahead and get the sum([q1,q2]) working by specifying a
starting value sum([q1,q2], Q()):
>>> class Q(object):
...     def __init__(self, n=0, i=0,j=0,k=0):
...         self.n = n
...         self.i = i
...         self.j = j
...         self.k = k
...     def __add__(self, other):
...         return Q(self.n+other.n,
...             self.i+other.i,
...             self.j+other.j,
...             self.k+other.k)
...     def __repr__(self):
...         return "<Q(%i,%i,%i,%i)>" % (
...             self.n,
...             self.i,
...             self.j,
...             self.k)
...
>>> q1 = Q(1,2,3,5)
>>> q2 = Q(7,11,13,17)
>>> q1+q2
<Q(8,13,16,22)>
>>> sum([q1,q2])
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: unsupported operand type(s) for +: 'int' and 'Q'
>>> sum([q1,q2], Q())
<Q(8,13,16,22)>


Thus, sum seems to work just fine for objects containing an
__add__ method. However, strings contain an __add__ method.
>>> hasattr("", "__add__")
True

yet, using the same pattern...
>>> sum(["hello", "world"], "")
Traceback (most recent call last):
File "<stdin>", line 1, in ?
TypeError: sum() can't sum strings [use ''.join(seq) instead]


Which seems like an arbitrary prejudice against strings...flying
in the face of python's duck-typing. If it has an __add__
method, duck-typing says you should be able to provide a starting
place and a sequence of things to add to it, and get the sum.

However, a new sum2() function can be created...
>>> def sum2(seq, start=0):
...     for item in seq:
...         start += item
...     return start
...

which does what one would expect the definition of sum() should
be doing behind the scenes.
>>> # generate the expected error, same as above
>>> sum2([q1,q2])
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "<stdin>", line 3, in sum2
TypeError: unsupported operand type(s) for +=: 'int' and 'Q'
>>> # employ the same solution of a proper starting point
>>> sum2([q1,q2], Q())
<Q(8,13,16,22)>
>>> # do the same thing for strings
>>> sum2(["hello", "world"], "")
'helloworld'

and sum2() works just like sum(), only it happily takes strings
without prejudice.
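
For anyone who wants to try this outside the interactive prompt, here is a sketch of the same idea as a plain function. One variation, which is an assumption of this sketch and not part of the post: it uses `total = total + item` instead of `+=`, because `+=` on a list start value would extend the caller's list in place.

```python
def sum2(seq, start=0):
    """Add every item of seq to start using the binary + operator."""
    total = start
    for item in seq:
        # Plain + rather than += so that a mutable start value,
        # such as a list, is never modified in place.
        total = total + item
    return total


words = sum2(["hello", "world"], "")

empty = []
flat = sum2([[1], [2, 3]], empty)
```

Note that every `+` on strings builds a new string, so this takes time quadratic in the total length for long lists, which is the usual argument for ''.join().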

From help(sum):
"Returns the sum of a sequence of numbers (NOT strings) plus the
value of parameter 'start'. When the sequence is empty, returns
start."

It would be as strange as if enumerate() didn't take strings, and
instead forced you to use some other method for enumerating strings:
>>> enumerate("hello")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: enumerate() can't enumerate strings [use "hello".enumerator() instead]

Why the arbitrary breaking of duck-typing for strings in sum()?
Why make them second-class citizens?

The interpreter is clearly smart enough to recognize when the
condition occurs such that it can throw the error...thus, why not
add a few more smarts and have it simply translate it into
"start+''.join(sequence)" to maintain predictable behavior
according to duck-typing?

-tkc
 

Fredrik Lundh

Tim said:
The interpreter is clearly smart enough to recognize when the
condition occurs such that it can throw the error...thus, why not
add a few more smarts and have it simply translate it into
"start+''.join(sequence)" to maintain predictable behavior
according to duck-typing?

"join" doesn't use __add__ at all, so I'm not sure in what sense that
would be more predictable. I'm probably missing something, but I cannot
think of any core method that uses a radically different algorithm based
on the *type* of one argument.

besides, in all dictionaries I've consulted, the word "sum" means
"adding numbers". are you sure it wouldn't be more predictable if "sum"
converted strings to numbers ?

(after all, questions about type errors like "cannot concatenate 'str'
and 'int' objects" and "unsupported operand type(s) for +: 'int' and
'str'" are a *lot* more common than questions about sum() on string lists.)

</F>
 

Gabriel Genellina

The interpreter is clearly smart enough to recognize when the
condition occurs such that it can throw the error...thus, why not
add a few more smarts and have it simply translate it into
"start+''.join(sequence)" to maintain predictable behavior
according to duck-typing?

sequences don't have to be homogeneous, and iterators can't go back.
But let GvR say that in his own words:
http://mail.python.org/pipermail/python-dev/2003-April/034854.html



Gabriel Genellina
Softlab SRL





 

Fredrik Lundh

Gabriel said:
sequences don't have to be homogeneous, and iterators can't go back.
But let GvR say that in his own words:
http://mail.python.org/pipermail/python-dev/2003-April/034854.html

you could of course dispatch on the type of the second argument (the
start value), but that'd be at least as silly. here's the relevant
pronouncement:

http://mail.python.org/pipermail/python-dev/2003-April/034853.html

and here's an elaboration by the martellibot:

http://mail.python.org/pipermail/python-dev/2003-April/034855.html

(and note that the python-dev consensus is that sum is for numbers and
join is for strings. anyone who cares about writing readable code knows
that names matter; different things should have different names.)

(I still think a "join" built-in would be nice, though. but anyone who
argues that "join" should support numbers too will be whacked with a
great big halibut.)

</F>
 

Paul Rubin

Fredrik Lundh said:
"join" doesn't use __add__ at all, so I'm not sure in what sense that
would be more predictable. I'm probably missing something, but I
cannot think of any core method that uses a radically different
algorithm based on the *type* of one argument.

3 ** 2, vs. 3 ** 2.5

Actually it's worse than depending on just the type:

(-3.0) ** 2.0 works, while (-3.0) ** 2.5 throws an error.

But you can do

(-3.0) ** (2.5 + 0.0j)

for yet another radically different algorithm.

I'm not sure whether the Python spec requires (-3.0)**2.0 to work, or
if it's just an implementation-specific hack. In most languages, it
wouldn't work.
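
A sketch of the cases above, for anyone who wants to poke at them. This assumes Python 3, where the behaviour differs from what is described here in one respect: the ** operator returns a complex number for a negative float raised to a fractional power instead of raising, while math.pow still raises ValueError.

```python
import math

int_result = 3 ** 2                      # int ** int gives an int
float_result = 3 ** 2.5                  # int ** float gives a float
even_power = (-3.0) ** 2.0               # negative base, integral exponent
fractional_power = (-3.0) ** 2.5         # complex in Python 3
complex_power = (-3.0) ** (2.5 + 0.0j)   # complex exponent gives a complex

# math.pow stays in the reals and rejects a negative base combined
# with a fractional exponent.
try:
    math.pow(-3.0, 2.5)
    pow_failed = False
except ValueError:
    pow_failed = True
```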
 

Steven D'Aprano

besides, in all dictionaries I've consulted, the word "sum" means
"adding numbers". are you sure it wouldn't be more predictable if "sum"
converted strings to numbers ?

(and note that the python-dev consensus is that sum is for numbers and
join is for strings. anyone who cares about writing readable code knows
that names matter; different things should have different names.)

And yet sum() sums lists, tuples, custom classes, and many other things
which are not numbers. Did python-dev not notice this? I doubt it.
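
A short sketch of what that looks like in practice (Python 3 syntax; the variable names are only illustrative). Lists and tuples go through with a suitable start value, and only str is singled out:

```python
nested_lists = [[1, 2], [3], [4, 5]]
flat_list = sum(nested_lists, [])      # list + list is accepted

nested_tuples = [(1, 2), (3,)]
flat_tuple = sum(nested_tuples, ())    # tuple + tuple is accepted

# The same pattern with a str start value is rejected outright.
try:
    sum(["hello", "world"], "")
    strings_rejected = False
except TypeError:
    strings_rejected = True
```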

I've read all the arguments in favour of type-checking the arguments to
sum() (actually only the second argument). I'm not convinced that this
case is special enough to break the rules by raising an exception for
strings. (Insert usual arguments about consistency, duck typing, least
surprise, etc.)

There is an alternative: raise a warning instead of an exception. That
permits the admirable strategy of educating users that join() is
generally a better way to concatenate a large number of strings, while
still allowing programmers who want to shoot themselves in the foot to do
so. Python generally allows the programmer to shoot himself in the foot,
if the programmer insists, e.g. private attributes are by convention, not
enforced by the language. That's one of the nice things about the
language: it *suggests* what to do, but doesn't *insist* it knows what you
want better than you do.
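
As a rough sketch of the warning approach (Python 3 syntax): warning_sum is a hypothetical name, and the choice of RuntimeWarning and the message text are assumptions made for illustration, not anything proposed on python-dev.

```python
import warnings


def warning_sum(seq, start=0):
    """Hypothetical sum() variant: warn on a str start, then add anyway."""
    if isinstance(start, str):
        warnings.warn(
            "summing strings is slow; consider ''.join(seq)",
            RuntimeWarning,
            stacklevel=2,
        )
    total = start
    for item in seq:
        total = total + item
    return total


# Capture the warning so its presence can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    joined = warning_sum(["hello", "world"], "")
```

Anyone who really wants the slow path can silence the warning with the usual warnings filters, and everyone else gets the hint about join().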
 

Paul Rubin

Steven D'Aprano said:
There is an alternative: raise a warning instead of an exception. That
permits the admirable strategy of educating users that join() is
generally a better way to concatenate a large number of strings, while
still allowing programmers who want to shoot themselves in the foot to do so.

This is reasonable, but why not just do the right thing and use the
right algorithm? There is, after all, no guarantee in the docs that
''.join() uses the right algorithm either. I've come to the view that
the docs should explicitly specify this type of thing, per the
Stepanov interview mentioned earlier. You may be right that even when
it's not clear what the best algorithm is, using a correct but slow
algorithm is preferable to throwing an error.

But I think for strings, we should rethink how this kind of operation
is done, and build up the sum of strings in terms of some kind of mutable
string object resembling Java's StringBuf or Python's cStringIO objects.
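
A minimal sketch of that idea, assuming Python 3, where io.StringIO plays the role that cStringIO played in Python 2. The function name concat is hypothetical:

```python
import io


def concat(strings, start=""):
    """Sketch: accumulate pieces in a mutable buffer, then build one str."""
    buffer = io.StringIO()
    buffer.write(start)
    for piece in strings:
        buffer.write(piece)  # raises TypeError for non-str items
    return buffer.getvalue()


result = concat(["hello", "world"])
```

The buffer is consumed in a single pass, so this also works on an iterator that cannot be rewound.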
 

Hendrik van Rooyen

8<----------------------------------------

| (I still think a "join" built-in would be nice, though. but anyone who
| argues that "join" should support numbers too will be whacked with a
| great big halibut.)
|
| </F>

Strange this - you don't *LOOK* like a Gaulish Blacksmith...
 

Bryan Olson

Fredrik said:
[...] besides, in all dictionaries I've consulted, the word "sum"
means "adding numbers".

That's a result of not looking deeply enough.

Fredrik Lundh is partially right, in that "Sum" usually refers
to addition of numbers. Nevertheless, the idea that "sum" must
refer to numbers is demonstrably wrong: Googling "define: sum"
shows several counter-examples, the first of which is:

The final aggregate; "the sum of all our troubles did not
equal the misery they suffered."

Numbers? What numbers? Lundh may be telling the truth about
all the dictionaries he consulted, but anyone with Internet
access could have looked -- and found -- further.

Dictionary definition is not really the issue. Who could miss
the correspondence between the addition operation and the sum
function? Why would we choose a language or type system that
cannot even make "+" and "sum" behave consistently?

are you sure it wouldn't be more predictable if "sum"
converted strings to numbers ?

Definitely reject that idea. People so misguided as to want
silent string-to-int conversion can use Perl.

The problem here is unifying the "+" operator and the "sum"
function. The former is one of Python's several 'special'
methods; the latter is one of Python's pre-defined
functions. See:

http://docs.python.org/ref/numeric-types.html
http://docs.python.org/lib/built-in-funcs.html

And herein is the problem: A class may implement "__add__" any
way the programmer chooses. Python should require, or at least
document requirements, on further properties of addition. Python
should insist that addition be symmetric and transitive, and
classes implementing addition provide an additive identity.

(after all, questions about type errors like "cannot concatenate 'str'
and 'int' objects" and "unsupported operand type(s) for +: 'int' and
'str'" are a *lot* more common than questions about sum() on string lists.)

Right. Duck-typing can work, while sloppy typing is doomed.
 

Paul Rubin

Bryan Olson said:
And herein is the problem: A class may implement "__add__" any
way the programmer chooses. Python should require, or at least
document requirements, on further properties of addition. Python
should insist that addition be symmetric and transitive, and
classes implementing addition provide an additive identity.

Are you saying "abc"+"def" should not be concatenation? I guess
that's reasonable. As long as + is string concatenation though, the
principle of least astonishment suggests that "sum" should
concatenate several strings.

I'm not sure what you mean by addition being symmetric or transitive;
it is not an equivalence relation. Do you mean commutative and
associative? I'm not sure if floating-point arithmetic has those
properties, strictly speaking.
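
A quick check of both properties, as a sketch using the standard IEEE 754 doubles that Python floats are on common platforms. Float addition commutes but is not associative, and string concatenation is associative but does not commute:

```python
# Grouping changes the rounding, so float addition is not associative.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
float_is_associative = (left == right)

# String concatenation: grouping is irrelevant, order is not.
str_is_associative = ("a" + "b") + "c" == "a" + ("b" + "c")
str_is_commutative = "a" + "b" == "b" + "a"
```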
 

Neil Cerutti

Are you saying "abc"+"def" should not be concatenation? I
guess that's reasonable. As long as + is string concatenation
though, the principle of least astonishment suggests that "sum"
should concatenate several strings.

I'm not sure what you mean by addition being symmetric or
transitive; it is not an equivalence relation. Do you mean
commutative and associative? I'm not sure if floating-point
arithmetic has those properties, strictly speaking.

The interesting part of the discussion, to me, was that Alex
Martelli's initial implementation included the string
optimization so that a list with a string as its first element
called ''.join(lst) instead. But that optimization turns out to
be valid only if every element of the list is a string.

So there isn't, it seems, a practical way of implementing the
sum(list of strings) -> ''.join(list of strings) optimization.
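
A sketch of why dispatching on the first element is not enough (Python 3 syntax). The class Tag is hypothetical, invented only to show a non-string object that can legitimately be added to a str:

```python
class Tag:
    """Hypothetical non-string type that knows how to be added to a str."""

    def __init__(self, text):
        self.text = text

    def __radd__(self, other):
        # Called for  str + Tag  after str.__add__ returns NotImplemented.
        return other + "<" + self.text + ">"


mixed = ["a", Tag("b"), "c"]

# Repeated + works: "" + "a", then "a" + Tag("b"), then "a<b>" + "c".
total = ""
for item in mixed:
    total = total + item

# The join shortcut, picked because the first element is a str, fails.
try:
    "".join(mixed)
    join_failed = False
except TypeError:
    join_failed = True
```

With an iterator instead of a list, the failure would only be discovered after some items had already been consumed, and there would be no way to go back and redo the work with +.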
 

Paul Rubin

Neil Cerutti said:
So there isn't, it seems, a practical way of implementing the
sum(list of strings) -> ''.join(list of strings) optimization.

''.join may not be the right way to do it, but obviously there are
other ways. This isn't rocket science.
 

Bryan Olson

Paul said:
Are you saying "abc"+"def" should not be concatenation? I guess
that's reasonable.

No, I'm definitely not saying that, or at least I didn't mean
that.

As long as + is string concatenation though, the
principle of least astonishment suggests that "sum" should
concatenate several strings.

Absolutely.

I'm not sure what you mean by addition being symmetric or transitive;
it is not an equivalence relation. Do you mean commutative and
associative?

Oops, yes, of course. Posting too late at night.
 

Paddy

Paddy wrote:
Why not make sum work for strings too?

It would remove what seems like an arbitrary restriction and aid
duck-typing. If the answer is that the sum optimisations don't work for
the string datatype, then wouldn't it be better to put a trap in the
sum code diverting strings to the reduce equivalent?

Just a thought,

- Paddy.

from __no_future__ import sum

assert "Not likely" == sum(["Not ", "likely"], "", least_surprise=True)

:)
 

Fredrik Lundh

Hendrik said:
| (I still think a "join" built-in would be nice, though. but anyone who
| argues that "join" should support numbers too will be whacked with a
| great big halibut.)

Strange this - you don't *LOOK* like a Gaulish Blacksmith...

no, but I have a nice safari outfit.

(hint: this is comp.lang.python, not comp.lang.asterix)

</F>
 
