Re: restriction on sum: intentional bug?

Discussion in 'Python' started by Ethan Furman, Oct 19, 2009.

  1. Ethan Furman

    Ethan Furman Guest

    Dave Angel wrote:
    > Dieter Maurer wrote:
    >
    >> Christian Heimes <> writes on Fri, 16 Oct 2009
    >> 17:58:29 +0200:
    >>
    >>
    >>> Alan G Isaac schrieb:
    >>>
    >>>
    >>>> I expected this to be fixed in Python 3:
    >>>>
    >>>>
    >>>>
    >>>>>>> sum(['ab','cd'],'')
    >>>>>>>
    >>>>
    >>>> Traceback (most recent call last):
    >>>> File "<stdin>", line 1, in <module>
    >>>> TypeError: sum() can't sum strings [use ''.join(seq) instead]
    >>>>
    >>>> Of course it is not a good way to join strings,
    >>>> but it should work, should it not? Naturally,
    >>>>
    >>>
    >>> It's not a bug. sum() doesn't work on strings deliberately. ''.join()
    >>> *is* the right and good way to concatenate strings.
    >>>

    >>
    >> Apparently, "sum" special cases 'str' in order to teach people to use
    >> "join".
    >> It would have been as much work and much more friendly, to just use
    >> "join"
    >> internally to implement "sum" when this is possible.
    >>
    >> Dieter
    >>

    >
    > Earlier, I would have agreed with you. I assumed that this could be
    > done invisibly, with the only difference being performance. But you
    > can't know whether join will do the trick without error till you know
    > that all the items are strings or Unicode strings. And you can't check
    > that without going through the entire iterator. At that point it's too
    > late to change your mind, as you can't back up an iterator. So the user
    > who supplies a list with mixed strings and other stuff will get an
    > unexpected error, one that join generates.
    >
    > To put it simply, I'd say that sum() should not dispatch to join()
    > unless it could be sure that no errors might result.
    >
    > DaveA
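
Dave's point about iterators can be made concrete. The sketch below is only an illustration, written for Python 3; `join_or_add` is a hypothetical helper, not anything in the standard library. To decide whether `''.join` is safe, it has to pull the whole iterable into a list first, which is the buffering a transparent dispatch inside `sum()` would also need.

```python
def join_or_add(iterable, start=""):
    """Hypothetical helper: use str.join only when every item is a str."""
    items = list(iterable)          # a generator can only be walked once, so buffer it
    if all(isinstance(item, str) for item in items):
        return start + "".join(items)
    total = start
    for item in items:              # otherwise fall back to plain +, item by item
        total = total + item
    return total


gen = (s for s in ["ab", "cd"])
result = join_or_add(gen)           # "abcd"
leftover = list(gen)                # [] because the generator is already exhausted
```

Without the `list()` call, the type check alone would use up the generator and leave nothing to add.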


    How is this different than passing a list to sum with other incompatible
    types?

    Python 2.5.4 (r254:67916, Dec 23 2008, 15:10:54) [MSC v.1310 32 bit
    (Intel)] on win32
    Type "help", "copyright", "credits" or "license" for more information.
    >>> class Dummy(object):
    ...     pass
    ...
    >>> test1 = [1, 2, 3.4, Dummy()]
    >>> sum(test1)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: unsupported operand type(s) for +: 'float' and 'Dummy'
    >>> test2 = ['a', 'string', 'and', 'a', Dummy()]
    >>> ''.join(test2)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: sequence item 4: expected string, Dummy found

    Looks like a TypeError either way; only the verbiage changes.

    ~Ethan~
     
    Ethan Furman, Oct 19, 2009
    #1
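
The session in post #1 is from Python 2.5. A rough Python 3 equivalent, as a sketch only, is below; `error_of` is a hypothetical helper introduced here to capture the exception instead of printing a traceback. The exact error messages differ between versions, so only the exception types are compared.

```python
class Dummy:
    pass


def error_of(func, *args):
    """Return the exception raised by func(*args), or None if it succeeds."""
    try:
        func(*args)
    except Exception as exc:
        return exc
    return None


sum_error = error_of(sum, [1, 2, 3.4, Dummy()])
join_error = error_of("".join, ["a", "string", "and", "a", Dummy()])

print(type(sum_error).__name__, "/", type(join_error).__name__)
```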

  2. Carl Banks

    Carl Banks Guest

    On Oct 18, 4:07 pm, Ethan Furman <> wrote:
    > Dave Angel wrote:
    > > Earlier, I would have agreed with you.  I assumed that this could be
    > > done invisibly, with the only difference being performance.  But you
    > > can't know whether join will do the trick without error till you know
    > > that all the items are strings or Unicode strings.  And you can't check
    > > that without going through the entire iterator.  At that point it's too
    > > late to change your mind, as you can't back up an iterator.  So the user
    > > who supplies a list with mixed strings and other stuff will get an
    > > unexpected error, one that join generates.

    >
    > > To put it simply, I'd say that sum() should not dispatch to join()
    > > unless it could be sure that no errors might result.

    >
    > How is this different than passing a list to sum with other incompatible
    > types?
    >
    > Python 2.5.4 (r254:67916, Dec 23 2008, 15:10:54) [MSC v.1310 32 bit
    > (Intel)] on win32
    > Type "help", "copyright", "credits" or "license" for more information.
    >  >>> class Dummy(object):
    > ...     pass
    > ...
    >  >>> test1 = [1, 2, 3.4, Dummy()]
    >  >>> sum(test1)
    > Traceback (most recent call last):
    >    File "<stdin>", line 1, in <module>
    > TypeError: unsupported operand type(s) for +: 'float' and 'Dummy'
    >  >>> test2 = ['a', 'string', 'and', 'a', Dummy()]
    >  >>> ''.join(test2)
    > Traceback (most recent call last):
    >    File "<stdin>", line 1, in <module>
    > TypeError: sequence item 4: expected string, Dummy found
    >
    > Looks like a TypeError either way; only the verbiage changes.



    This test doesn't mean very much since you didn't pass the same
    list to both calls. The claim is that "".join() might do something
    different than a non-special-cased sum() would have when called on the
    same list, and indeed that is true.

    Consider this thought experiment:


    class Something(object):
        def __radd__(self, other):
            return other + "q"

    x = ["a","b","c",Something()]


    If x were passed to "".join(), it would throw an exception; but if
    passed to a sum() without any special casing, it would successfully
    return "abcq".

    Thus there is divergence in the two behaviors, thus transparently
    calling "".join() to perform the summation is a Bad Thing Indeed, a
    much worse special-case behavior than throwing an exception.


    Carl Banks
     
    Carl Banks, Oct 19, 2009
    #2
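
The divergence described in post #2 can be checked directly. This is a minimal Python 3 sketch, assuming that an un-special-cased `sum(x, "")` would simply apply `+` from left to right; the explicit loop stands in for that hypothetical behavior.

```python
class Something:
    def __radd__(self, other):
        return other + "q"


x = ["a", "b", "c", Something()]

# Plain left-to-right addition, standing in for a sum() with no special casing.
total = ""
for item in x:
    total = total + item      # the last step falls back to Something.__radd__

try:
    joined = "".join(x)
except TypeError:
    joined = None             # join insists that every item is a str
```

Here `total` ends up as `"abcq"` while `joined` stays `None`, so the two approaches disagree on the same list.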

  3. Ethan Furman

    Ethan Furman Guest

    Carl Banks wrote:
    > On Oct 18, 4:07 pm, Ethan Furman <> wrote:
    >
    >>Dave Angel wrote:
    >>
    >>>Earlier, I would have agreed with you. I assumed that this could be
    >>>done invisibly, with the only difference being performance. But you
    >>>can't know whether join will do the trick without error till you know
    >>>that all the items are strings or Unicode strings. And you can't check
    >>>that without going through the entire iterator. At that point it's too
    >>>late to change your mind, as you can't back up an iterator. So the user
    >>>who supplies a list with mixed strings and other stuff will get an
    >>>unexpected error, one that join generates.

    >>
    >>>To put it simply, I'd say that sum() should not dispatch to join()
    >>>unless it could be sure that no errors might result.

    >>
    >>How is this different than passing a list to sum with other incompatible
    >>types?
    >>
    >>Python 2.5.4 (r254:67916, Dec 23 2008, 15:10:54) [MSC v.1310 32 bit
    >>(Intel)] on win32
    >>Type "help", "copyright", "credits" or "license" for more information.
    >> >>> class Dummy(object):
    >>...     pass
    >>...
    >> >>> test1 = [1, 2, 3.4, Dummy()]
    >> >>> sum(test1)

    >>Traceback (most recent call last):
    >> File "<stdin>", line 1, in <module>
    >>TypeError: unsupported operand type(s) for +: 'float' and 'Dummy'
    >> >>> test2 = ['a', 'string', 'and', 'a', Dummy()]
    >> >>> ''.join(test2)

    >>Traceback (most recent call last):
    >> File "<stdin>", line 1, in <module>
    >>TypeError: sequence item 4: expected string, Dummy found
    >>
    >>Looks like a TypeError either way; only the verbiage changes.

    >
    >
    >
    > This test doesn't mean very much since you didn't pass the same
    > list to both calls. The claim is that "".join() might do something
    > different than a non-special-cased sum() would have when called on the
    > same list, and indeed that is true.
    >
    > Consider this thought experiment:
    >
    >
    > class Something(object):
    >     def __radd__(self,other):
    >         return other + "q"
    >
    > x = ["a","b","c",Something()]
    >
    >
    > If x were passed to "".join(), it would throw an exception; but if
    > passed to a sum() without any special casing, it would successfully
    > return "abcq".
    >
    > Thus there is divergence in the two behaviors, thus transparently
    > calling "".join() to perform the summation is a Bad Thing Indeed, a
    > much worse special-case behavior than throwing an exception.
    >
    >
    > Carl Banks


    Unfortunately, I don't know enough about how join works to know that,
    but I'll take your word for it. Perhaps the better solution then is to
    not worry about optimization, and just call __add__ on the objects.
    Then it either works, or throws the appropriate error.

    This is obviously slow on strings, but mention of that is already in the
    docs, and profiling will also turn up such bottlenecks. Get the code
    working first, then optimize, yes? We've all seen questions on this
    list with folk using the accumulator method for joining strings, and
    then wondering why it's so slow -- the answer given is the same as we
    would give for sum()ing a list of strings -- use join instead. Then we
    have Python following the same advice we give out -- don't break
    duck-typing, any ensuing errors are the responsibility of the caller.

    ~Ethan~
     
    Ethan Furman, Oct 19, 2009
    #3
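
For context on how narrow the special case is: the built-in already "just adds" for most types and refuses only a string start value. A quick illustrative check, run under Python 3 (behavior in other versions is not verified here):

```python
# Lists are added pairwise with +, with no special casing (and no speed-up either).
flat = sum([[1, 2], [3], [4, 5]], [])

# A str start value is rejected up front.
try:
    sum(["ab", "cd"], "")
    str_rejected = False
except TypeError:
    str_rejected = True
```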
  4. On Sun, 18 Oct 2009 19:52:41 -0700, Ethan Furman wrote:

    > This is obviously slow on strings, but mention of that is already in the
    > docs, and profiling will also turn up such bottlenecks. Get the code
    > working first, then optimize, yes?


    Well, maybe. Premature optimization and all, but sometimes you just
    *know* something is going to be slow, so you avoid it.

    And it's amazing how O(N**2) algorithms can hide for years. Within the
    last month or two, there was a bug reported for httplib involving
    repeated string concatenation:

    http://bugs.python.org/issue6838

    I can only imagine that the hidden O(N**2) behaviour was there in the
    code for years before somebody noticed it, reported it, spent hours
    debugging it, and finally discovered the cause and produced a patch.

    The amazing thing is, if you look in the httplib.py module, you see this
    comment:

    # XXX This accumulates chunks by repeated string concatenation,
    # which is not efficient as the number or size of chunks gets big.
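
The usual remedy for that pattern is to collect the pieces in a list and join once. The sketch below is not the httplib code or its patch; `read_all_slow` and `read_all_fast` are hypothetical names used only to contrast the two styles.

```python
def read_all_slow(chunks):
    """Repeated concatenation: each += may copy everything read so far."""
    data = ""
    for chunk in chunks:
        data += chunk
    return data


def read_all_fast(chunks):
    """Collect the pieces in a list and join once at the end."""
    parts = []
    for chunk in chunks:
        parts.append(chunk)
    return "".join(parts)


chunks = ["spam"] * 1000
same = read_all_slow(chunks) == read_all_fast(chunks)
```

Both return the same string; the difference is only in how much copying can happen along the way.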




    > We've all seen questions on this
    > list with folk using the accumulator method for joining strings, and
    > then wondering why it's so slow -- the answer given is the same as we
    > would give for sum()ing a list of strings -- use join instead. Then we
    > have Python following the same advice we give out -- don't break
    > duck-typing, any ensuing errors are the responsibility of the caller.


    I'd be happy for sum() to raise a warning rather than an exception, and
    to do so for both strings and lists. Python, after all, is happy to let
    people shoot themselves in the foot, but it's only fair to give them
    warning the gun is loaded :)



    --
    Steven
     
    Steven D'Aprano, Oct 19, 2009
    #4
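
The "warn, don't raise" idea from post #4 might look something like this. It is a sketch under stated assumptions: `warning_sum` is a hypothetical function, and the choice of `RuntimeWarning` and the message text are made up for illustration.

```python
import warnings


def warning_sum(iterable, start=0):
    """Hypothetical sum() that warns, rather than raises, on str and list."""
    if isinstance(start, (str, list)):
        warnings.warn(
            "summing %s objects with + is quadratic" % type(start).__name__,
            RuntimeWarning,
            stacklevel=2,
        )
    total = start
    for item in iterable:
        total = total + item
    return total


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = warning_sum(["ab", "cd"], "")
```

The call still returns `"abcd"`, and `caught` holds the single warning that was issued.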
  5. Tim Chase

    Tim Chase Guest

    Carl Banks wrote:
    > Consider this thought experiment:
    >
    > class Something(object):
    >     def __radd__(self,other):
    >         return other + "q"
    >
    > x = ["a","b","c",Something()]
    >
    > If x were passed to "".join(), it would throw an exception; but if
    > passed to a sum() without any special casing, it would successfully
    > return "abcq".


    Okay...this is the best argument I've heard for not using
    "".join() {Awards Carl one (1) internet} It's a peculiar thing
    to do as a programmer, but "".join() certainly produces an
    unexpected behavior which I'd say is worse. And a lot of this
    discussion has revolved around letting programmers do peculiar
    things if they want.

    So as of Carl's example, I'm now pretty solidly in the "Stop
    throwing an exception, just sum the parts even if it's
    inefficient" camp and no longer straddling between that and the
    "".join() camp. But I'm definitely still not in the "throwing
    exceptions is a good thing" camp.

    -tkc
     
    Tim Chase, Oct 19, 2009
    #5
  6. Carl Banks

    Carl Banks Guest

    On Oct 19, 3:24 am, Tim Chase <> wrote:
    > Carl Banks wrote:
    > > Consider this thought experiment:

    >
    > > class Something(object):
    > >     def __radd__(self,other):
    > >         return other + "q"

    >
    > > x = ["a","b","c",Something()]

    >
    > > If x were passed to "".join(), it would throw an exception; but if
    > > passed to a sum() without any special casing, it would successfully
    > > return "abcq".

    >
    > Okay...this is the best argument I've heard for not using
    > "".join()  {Awards Carl one (1) internet}


    Well that was my argument in the last post you followed up to, I just
    used a bad example. Actually this example was described by Dave
    Angel, so you should give the internet to him.


    Carl Banks
     
    Carl Banks, Oct 19, 2009
    #6
  7. En Sun, 18 Oct 2009 21:50:55 -0300, Carl Banks <>
    escribió:

    > Consider this thought experiment:
    >
    >
    > class Something(object):
    >     def __radd__(self,other):
    >         return other + "q"
    >
    > x = ["a","b","c",Something()]
    >
    >
    > If x were passed to "".join(), it would throw an exception; but if
    > passed to a sum() without any special casing, it would successfully
    > return "abcq".
    >
    > Thus there is divergence in the two behaviors, thus transparently
    > calling "".join() to perform the summation is a Bad Thing Indeed, a
    > much worse special-case behavior than throwing an exception.


    Just for completeness, and in case anyone would like to try this O(n²)
    process, sum(x) may be rewritten as:

    import operator

    x = ["a","b","c",Something()]
    print reduce(operator.add, x)

    which does exactly the same thing, with the same quadratic behavior as
    sum(), but prints "abcq" as expected.

    --
    Gabriel Genellina
     
    Gabriel Genellina, Oct 20, 2009
    #7
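
The snippet in post #7 is Python 2. A Python 3 port, offered as a sketch, needs `reduce` imported from `functools` and `print` called as a function:

```python
from functools import reduce
import operator


class Something:
    def __radd__(self, other):
        return other + "q"


x = ["a", "b", "c", Something()]
result = reduce(operator.add, x)
print(result)
```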
