sum works in sequences (Python 3)

Discussion in 'Python' started by Franck Ditter, Sep 19, 2012.

  1. Hello,
    I wonder why sum does not work on the string sequence in Python 3 :

    >>> sum((8,5,9,3))
    25
    >>> sum([5,8,3,9,2])
    27
    >>> sum('rtarze')
    TypeError: unsupported operand type(s) for +: 'int' and 'str'

    I naively thought that sum('abc') would expand to 'a'+'b'+'c'
    And the error message is somewhat cryptic...

    franck
     
    Franck Ditter, Sep 19, 2012
    #1

  2. Franck Ditter

    Neil Cerutti Guest

    On 2012-09-19, Franck Ditter <> wrote:
    > Hello,
    > I wonder why sum does not work on the string sequence in Python 3 :
    >
    > >>> sum((8,5,9,3))
    > 25
    > >>> sum([5,8,3,9,2])
    > 27
    > >>> sum('rtarze')
    > TypeError: unsupported operand type(s) for +: 'int' and 'str'
    >
    > I naively thought that sum('abc') would expand to 'a'+'b'+'c'
    > And the error message is somewhat cryptic...


    You got that error message because the default value for the
    second 'start' argument is 0. The function tried to add 'r' to 0.
    That said:

    >>> sum('rtarze', '')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: sum() can't sum strings [use ''.join(seq) instead]
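
To make both error messages concrete, here is a rough pure-Python sketch of how sum folds its argument from left to right. The names sum_sketch and error_message are hypothetical, made up for this illustration; the real builtin is written in C, and this is only an approximation of its behaviour.

```python
def sum_sketch(iterable, start=0):
    """Hypothetical pure-Python approximation of the builtin sum()."""
    # Mirrors the builtin's explicit rejection of a str start value.
    if isinstance(start, str):
        raise TypeError("sum() can't sum strings [use ''.join(seq) instead]")
    total = start
    for item in iterable:
        # With the default start=0 and item='r', this evaluates 0 + 'r'.
        total = total + item
    return total


def error_message(func, *args):
    """Return the TypeError message raised by func(*args), or None."""
    try:
        func(*args)
    except TypeError as exc:
        return str(exc)
    return None


print(sum_sketch((8, 5, 9, 3)))
print(error_message(sum_sketch, 'rtarze'))      # the "cryptic" int + str error
print(error_message(sum_sketch, 'rtarze', ''))  # the explicit string check
```

The first message comes from the + operator itself, which is why it never mentions sum at all.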

    --
    Neil Cerutti
     
    Neil Cerutti, Sep 19, 2012
    #2

  3. Franck Ditter

    Ian Kelly Guest

    On Wed, Sep 19, 2012 at 8:41 AM, Franck Ditter <> wrote:
    > Hello,
    > I wonder why sum does not work on the string sequence in Python 3 :
    >
    > >>> sum((8,5,9,3))
    > 25
    > >>> sum([5,8,3,9,2])
    > 27
    > >>> sum('rtarze')
    > TypeError: unsupported operand type(s) for +: 'int' and 'str'
    >
    > I naively thought that sum('abc') would expand to 'a'+'b'+'c'
    > And the error message is somewhat cryptic...


    The docstring notes that it does not work on strings:

    sum(...)
    sum(sequence[, start]) -> value

    Returns the sum of a sequence of numbers (NOT strings) plus the value
    of parameter 'start' (which defaults to 0). When the sequence is
    empty, returns start.

    I think this restriction is mainly for efficiency. sum(['a', 'b',
    'c', 'd', 'e']) would be the equivalent of 'a' + 'b' + 'c' + 'd' +
    'e', which is an inefficient way to add together strings. You should
    use ''.join instead:

    >>> ''.join('abc')
    'abc'
     
    Ian Kelly, Sep 19, 2012
    #3
  4. Franck Ditter

    Neil Cerutti Guest

    On 2012-09-19, Ian Kelly <> wrote:
    > It notes in the doc string that it does not work on strings:
    >
    > sum(...)
    > sum(sequence[, start]) -> value
    >
    > Returns the sum of a sequence of numbers (NOT strings) plus
    > the value of parameter 'start' (which defaults to 0). When
    > the sequence is empty, returns start.
    >
    > I think this restriction is mainly for efficiency. sum(['a',
    > 'b', 'c', 'd', 'e']) would be the equivalent of 'a' + 'b' + 'c'
    > + 'd' + 'e', which is an inefficient way to add together
    > strings. You should use ''.join instead:


    While the docstring is still useful, it has diverged from the
    documentation a little bit.

    sum(iterable[, start])

    Sums start and the items of an iterable from left to right and
    returns the total. start defaults to 0. The iterable's items
    are normally numbers, and the start value is not allowed to be
    a string.

    For some use cases, there are good alternatives to sum(). The
    preferred, fast way to concatenate a sequence of strings is by
    calling ''.join(sequence). To add floating point values with
    extended precision, see math.fsum(). To concatenate a series of
    iterables, consider using itertools.chain().
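
A minimal sketch of the three alternatives named in that documentation excerpt. The sample data is made up for illustration, and no output is claimed for the plain sum of floats because its rounding behaviour can differ between Python versions.

```python
import itertools
import math

# Concatenating strings: ''.join instead of sum.
words = ['Monty', ' ', 'Python']
joined = ''.join(words)

# Adding floats: math.fsum tracks the lost low-order bits.
tenths = [0.1] * 10
naive_total = sum(tenths)          # may or may not carry rounding error
precise_total = math.fsum(tenths)  # correctly rounded result

# Concatenating iterables: itertools.chain instead of sum(..., []).
nested = [[2, 3], [5, 8], [13, 21]]
flattened = list(itertools.chain.from_iterable(nested))

print(joined)
print(naive_total, precise_total)
print(flattened)
```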

    Are iterables and sequences different enough to warrant posting a
    bug report?

    --
    Neil Cerutti
     
    Neil Cerutti, Sep 19, 2012
    #4
  5. Franck Ditter

    Steve Howell Guest

    On Sep 19, 8:06 am, Neil Cerutti <> wrote:
    > On 2012-09-19, Ian Kelly <> wrote:
    >
    > > It notes in the doc string that it does not work on strings:

    >
    > > sum(...)
    > >     sum(sequence[, start]) -> value

    >
    > >     Returns the sum of a sequence of numbers (NOT strings) plus
    > >     the value of parameter 'start' (which defaults to 0).  When
    > >     the sequence is empty, returns start.

    >
    > > I think this restriction is mainly for efficiency.  sum(['a',
    > > 'b', 'c', 'd', 'e']) would be the equivalent of 'a' + 'b' + 'c'
    > > + 'd' + 'e', which is an inefficient way to add together
    > > strings.  You should use ''.join instead:

    >
    > While the docstring is still useful, it has diverged from the
    > documentation a little bit.
    >
    >   sum(iterable[, start])
    >
    >   Sums start and the items of an iterable from left to right and
    >   returns the total. start defaults to 0. The iterable's items
    >   are normally numbers, and the start value is not allowed to be
    >   a string.
    >
    >   For some use cases, there are good alternatives to sum(). The
    >   preferred, fast way to concatenate a sequence of strings is by
    >   calling ''.join(sequence). To add floating point values with
    >   extended precision, see math.fsum(). To concatenate a series of
    >   iterables, consider using itertools.chain().
    >
    > Are iterables and sequences different enough to warrant posting a
    > bug report?
    >


    Sequences are iterables, so I'd say the docs are technically correct,
    but maybe I'm misunderstanding what you would be trying to clarify.
     
    Steve Howell, Sep 19, 2012
    #5
  6. On Wed, 19 Sep 2012 09:03:03 -0600, Ian Kelly wrote:

    > I think this restriction is mainly for efficiency. sum(['a', 'b', 'c',
    > 'd', 'e']) would be the equivalent of 'a' + 'b' + 'c' + 'd' + 'e', which
    > is an inefficient way to add together strings.


    It might not be obvious to some people why repeated addition is so
    inefficient, and in fact if people try it with modern Python (version 2.4
    or later), they may not notice any inefficiency.

    But the example given, 'a' + 'b' + 'c' + 'd' + 'e', potentially ends up
    creating four strings, only to immediately throw away three of them:

    * first it concats 'a' to 'b', giving the new string 'ab'
    * then 'ab' + 'c', creating a new string 'abc'
    * then 'abc' + 'd', creating a new string 'abcd'
    * then 'abcd' + 'e', creating a new string 'abcde'

    Each new string requires a block of memory to be allocated, potentially
    requiring other blocks of memory to be moved out of the way (at least for
    large blocks).

    With only five characters in total, you won't really notice any slowdown,
    but with large enough numbers of strings, Python could potentially spend
    a lot of time building, and throwing away, intermediate strings. Pure
    wasted effort.
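
The wasted work described above can be counted with a deliberately simplified cost model: assume every + writes out a complete new string, while join writes each character once. The function names are hypothetical, and this is a model of the naive case, not a measurement of what CPython actually does.

```python
def chars_copied_by_concat(pieces):
    """Characters written if every + builds a brand-new string."""
    copied = 0
    it = iter(pieces)
    result = next(it, "")
    for piece in it:
        result = result + piece
        copied += len(result)  # the whole intermediate string is written out
    return copied


def chars_copied_by_join(pieces):
    """Characters written if the result is built in a single pass."""
    return sum(len(piece) for piece in pieces)


letters = ['a', 'b', 'c', 'd', 'e']
print(chars_copied_by_concat(letters), chars_copied_by_join(letters))

many = ['x'] * 1000
print(chars_copied_by_concat(many), chars_copied_by_join(many))
```

For the five letters the model gives 14 characters against 5; for a thousand one-character strings it gives 500499 against 1000, which is the quadratic growth in question.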

    For another look at this, see:
    http://www.joelonsoftware.com/articles/fog0000000319.html

    I say "could" because starting in Python 2.4, there is a nifty
    optimization in Python (CPython only, not Jython or IronPython) that can
    *sometimes* recognise repeated string concatenation and make it less
    inefficient. It depends on the details of the specific strings used, and
    the operating system's memory management. When it works, it can make
    string concatenation almost as efficient as ''.join(). When it doesn't
    work, repeated concatenation is PAINFULLY slow, hundreds or thousands of
    times slower than join.


    --
    Steven
     
    Steven D'Aprano, Sep 19, 2012
    #6
  7. On Wed, 19 Sep 2012 15:07:04 +0000, Alister wrote:

    > Summation is a mathematical function that works on numbers Concatenation
    > is the process of appending 1 string to another
    >
    > although they are not related to each other they do share the same
    > operator(+) which is the cause of confusion. attempting to duck type
    > this function would cause ambiguity for example what would you expect
    > from
    >
    > sum ('a','b',3,4)
    >
    > 'ab34' or 'ab7' ?


    Neither. I would expect sum to do exactly what the + operator does if
    given two incompatible arguments: raise an exception.

    And in fact, that's exactly what it does.

    py> sum ([1, 2, 'a'])
    Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
    TypeError: unsupported operand type(s) for +: 'int' and 'str'



    --
    Steven
     
    Steven D'Aprano, Sep 19, 2012
    #7
  8. Franck Ditter

    Ian Kelly Guest

    On Wed, Sep 19, 2012 at 9:37 AM, Steve Howell <> wrote:
    > Sequences are iterables, so I'd say the docs are technically correct,
    > but maybe I'm misunderstanding what you would be trying to clarify.


    The doc string suggests that the argument to sum() must be a sequence,
    when in fact any iterable will do. The restriction in the docs should
    be relaxed to match the reality.
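
A short check of that claim, using a generator expression as an example of an iterable that is not a sequence. The variable names are only for illustration.

```python
from collections.abc import Iterable, Sequence

# A generator supports iteration but not indexing or len().
squares = (n * n for n in range(5))

is_iterable = isinstance(squares, Iterable)
is_sequence = isinstance(squares, Sequence)
total = sum(squares)  # 0 + 1 + 4 + 9 + 16

print(is_iterable, is_sequence, total)
```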
     
    Ian Kelly, Sep 19, 2012
    #8
  9. Franck Ditter

    Steve Howell Guest

    On Sep 19, 11:34 am, Ian Kelly <> wrote:
    > On Wed, Sep 19, 2012 at 9:37 AM, Steve Howell <> wrote:
    > > Sequences are iterables, so I'd say the docs are technically correct,
    > > but maybe I'm misunderstanding what you would be trying to clarify.

    >
    > The doc string suggests that the argument to sum() must be a sequence,
    > when in fact any iterable will do.  The restriction in the docs should
    > be relaxed to match the reality.


    Ah. The docstring looks to be fixed in 3.1.3, but not in Python 2.


    Python 3.1.3 (r313:86834, Mar 13 2011, 00:40:38)
    [GCC 4.4.5] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> sum.__doc__
    "sum(iterable[, start]) -> value\n\nReturns the sum of an iterable of
    numbers (NOT strings) plus the value\nof parameter 'start' (which
    defaults to 0). When the iterable is\nempty, returns start."


    Python 2.6.6 (r266:84292, Mar 13 2011, 00:35:19)
    [GCC 4.4.5] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> sum.__doc__
    "sum(sequence[, start]) -> value\n\nReturns the sum of a sequence of
    numbers (NOT strings) plus the value\nof parameter 'start' (which
    defaults to 0). When the sequence is\nempty, returns start."
    >>>
     
    Steve Howell, Sep 19, 2012
    #9
  10. Franck Ditter

    Terry Reedy Guest

    On 9/19/2012 11:07 AM, Alister wrote:

    > Summation is a mathematical function that works on numbers
    > Concatenation is the process of appending 1 string to another
    >
    > although they are not related to each other they do share the same
    > operator(+) which is the cause of confusion.


    If one represents counts in unary, as a sequence or tally of 1s (or
    other markers indicating 'successor' or 'increment'), then count
    addition is sequence concatenation. I think Guido got it right.

    It happens that when the members of all sequences are identical, there
    is a much more compact exponential place value notation that enables
    more efficient addition and other operations. When not, other tricks are
    needed to avoid so much copying that an inherently O(N) operation
    balloons into an O(N*N) operation.
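
The unary argument can be tried out directly, as a small sketch: counts are represented as lists of 1s, and concatenating the tallies gives the same answer as adding the numbers. The helpers to_unary and from_unary are hypothetical names for this illustration.

```python
def to_unary(n):
    """Represent a non-negative count as a tally of n markers."""
    return [1] * n


def from_unary(tally):
    """Read a tally back as an ordinary integer."""
    return len(tally)


counts = [3, 4, 2]
tallies = [to_unary(n) for n in counts]

# Concatenating the tallies is addition in unary notation.
combined = sum(tallies, [])

print(from_unary(combined), sum(counts))
```

Note that sum(tallies, []) copies the growing list at every step, which is exactly the O(N*N) ballooning mentioned above.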

    --
    Terry Jan Reedy
     
    Terry Reedy, Sep 19, 2012
    #10
  11. Franck Ditter

    Hans Mulder Guest

    On 19/09/12 17:07:04, Alister wrote:
    > On Wed, 19 Sep 2012 16:41:20 +0200, Franck Ditter wrote:
    >
    >> Hello,
    >> I wonder why sum does not work on the string sequence in Python 3 :
    >>
    >> >>> sum((8,5,9,3))
    >> 25
    >> >>> sum([5,8,3,9,2])
    >> 27
    >> >>> sum('rtarze')
    >> TypeError: unsupported operand type(s) for +: 'int' and 'str'
    >>
    >> I naively thought that sum('abc') would expand to 'a'+'b'+'c'
    >> And the error message is somewhat cryptic...
    >>
    >> franck

    >
    > Summation is a mathematical function that works on numbers
    > Concatenation is the process of appending 1 string to another


    Actually, the 'sum' builtin function is quite capable of
    concatenating objects, for example lists:

    >>> sum(([2,3], [5,8], [13,21]), [])
    [2, 3, 5, 8, 13, 21]

    But if you pass a string as a starting value, you get an error:

    >>> sum([], '')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: sum() can't sum strings [use ''.join(seq) instead]

    In fact, you can bamboozle 'sum' into concatenating strings
    by tricking it with a non-string starting value:

    >>> class not_a_string(object):
    ...     def __add__(self, other):
    ...         return other
    ...
    >>> sum("rtarze", not_a_string())
    'rtarze'
    >>> sum(["Monty ", "Python", "'s Fly", "ing Ci", "rcus"],
    ...     not_a_string())
    "Monty Python's Flying Circus"
    >>>
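
For comparison, a sketch of the same left-to-right fold done with functools.reduce and operator.add, which has no special case for strings. This is an alternative to the trick above rather than a recommendation; ''.join remains the usual way to build the string.

```python
import functools
import operator

pieces = ["Monty ", "Python", "'s Fly", "ing Ci", "rcus"]

# reduce applies + pairwise from the left, with no string check.
folded = functools.reduce(operator.add, pieces)
joined = ''.join(pieces)

print(folded)
print(folded == joined)
```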



    Hope this helps,

    -- HansM
     
    Hans Mulder, Sep 19, 2012
    #11