String concatenation

Discussion in 'Python' started by Jonas Galvez, Jun 19, 2004.

  1. Jonas Galvez

    Jonas Galvez Guest

    Is it true that joining the string elements of a list is faster than
    concatenating them via the '+' operator?

    "".join(['a', 'b', 'c'])

    vs

    'a'+'b'+'c'

    If so, can anyone explain why?



    \\ jonas galvez
    // jonasgalvez.com
     
    Jonas Galvez, Jun 19, 2004
    #1

  2. Jonas Galvez

    Peter Hansen Guest

    Jonas Galvez wrote:

    > Is it true that joining the string elements of a list is faster than
    > concatenating them via the '+' operator?
    >
    > "".join(['a', 'b', 'c'])
    >
    > vs
    >
    > 'a'+'b'+'c'
    >
    > If so, can anyone explain why?


    It's because the latter one has to build a temporary
    string consisting of 'ab' first, then the final string
    with 'c' added, while the join can (and probably does) add up
    all the lengths of the strings to be joined and build the final
    string all in one go.

    Note that there's also '%s%s%s' % ('a', 'b', 'c'), which is
    probably on par with the join technique for both performance
    and lack of readability.

    More importantly, though, you should probably not pick the join
    approach over the concatenation approach based on performance.
    Concatenation is more readable in the above case (ignoring the
    fact that it's a contrived example), as you're being more
    explicit about your intentions.

    Joining lists is popular because of the terribly bad performance
    of += when one is gradually building up a string in pieces,
    compared with appending to a list and then doing a join at the end.

    So

    l = []
    l.append('a')
    l.append('b')
    l.append('c')
    s = ''.join(l)

    is _much_ faster (therefore better) in real-world cases than

    s = ''
    s += 'a'
    s += 'b'
    s += 'c'

    With the latter, picture many more and much longer strings. Each
    += creates a new string containing the contents of the two old
    strings, so the result grows steadily longer and gets copied again
    on every step. That wasted copying is why it's very bad for
    memory and performance.

    The list approach doesn't copy the strings at all, but just
    holds references to them in a list (which does grow in a
    similar but much more efficient manner). The join figures
    out the sizes of all of the strings and allocates enough
    space to do only a single copy from each.
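
    To put rough numbers on the copying described above, here is a
    minimal sketch that only models the cost; it does not measure a
    real interpreter. It assumes every += copies the whole old string
    plus the new piece, and that join copies each piece exactly once.
    The function names are made up for illustration.

```python
def chars_copied_by_concat(pieces):
    """Model s += piece as copying the old string plus the new piece."""
    copied = 0
    length = 0
    for piece in pieces:
        length += len(piece)
        copied += length  # the new string holds old contents + piece
    return copied


def chars_copied_by_join(pieces):
    """Model ''.join(pieces) as one copy of each piece into the result."""
    return sum(len(piece) for piece in pieces)


pieces = ["x" * 10] * 1000
print(chars_copied_by_concat(pieces))  # 5005000, grows with the square of n
print(chars_copied_by_join(pieces))    # 10000, grows linearly with n
```

    Under this model, 1000 pieces of 10 characters cost about 500
    times more copying with += than with a single join.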

    Again though, other than the += versus .append() case, you should
    probably not pick ''.join() over + since readability will
    suffer more than your performance will improve.

    -Peter
     
    Peter Hansen, Jun 21, 2004
    #2

  3. Jonas Galvez

    Duncan Booth Guest

    Peter Hansen wrote:

    > Jonas Galvez wrote:
    >
    >> Is it true that joining the string elements of a list is faster than
    >> concatenating them via the '+' operator?
    >>

    > Note that there's also '%s%s%s' % ('a', 'b', 'c'), which is
    > probably on par with the join technique for both performance
    > and lack of readability.


    A few more points.

    Yes, the format string in this example isn't the clearest, but if you have
    a case where some of the strings are fixed and others vary, then the format
    string can be the clearest.

    e.g.

    '<a href="%s" alt="%s">%s</a>' % (uri, alt, text)

    rather than:

    '<a href="'+uri+'" alt="'+alt+'">'+text+'</a>'

    In many situations I find I use a combination of all three techniques.
    Build a list of strings to be concatenated to produce the final output, but
    each of these strings might be built from a format string or simple
    addition as above.
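
    As a sketch of that combination (the function name and the sample
    data are hypothetical, and no HTML escaping is attempted), each
    piece is built with a format string, collected in a list, and
    joined once at the end:

```python
def render_links(links):
    """Build an HTML fragment from (uri, alt, text) tuples."""
    parts = []
    for uri, alt, text in links:
        # Each piece comes from a format string...
        parts.append('<a href="%s" alt="%s">%s</a>' % (uri, alt, text))
    # ...and the pieces are joined once, here with a newline separator.
    return "\n".join(parts)


print(render_links([("/a", "A", "one"), ("/b", "B", "two")]))
```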

    On the readability of ''.join(), I would suggest never writing it more than
    once. That means I tend to do something like:

    concatenate = ''.join
    ...
    concatenate(myList)

    Or

    def concatenate(*args):
        return ''.join(args)
    ...
    concatenate('a', 'b', 'c')

    depending on how it is to be used.

    It's also worth saying that a lot of the time you find you don't want the
    empty separator at all (e.g. maybe newline is more appropriate), and in
    this case the join really does become easier than simple addition. Again,
    though, it is worth wrapping it so that your intention at the point of
    call is clear.

    Finally, a method call on a bare string (''.join, or '\n'.join) looks
    sufficiently bad that if, for some reason, you don't want to give it a name
    as above, I would suggest using the alternative form for calling it:

    str.join('\n', aList)

    rather than:

    '\n'.join(aList)
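
    The two spellings call the same method and give the same result,
    so the choice is purely about how it reads. A quick check, with a
    made-up list:

```python
a_list = ["first", "second", "third"]

bound_form = "\n".join(a_list)         # method called on the separator
unbound_form = str.join("\n", a_list)  # same method, called via the type

print(bound_form == unbound_form)  # True
```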
     
    Duncan Booth, Jun 21, 2004
    #3
  4. Jonas Galvez

    David Fraser Guest

    Peter Hansen wrote:
    > Jonas Galvez wrote:
    >
    >> Is it true that joining the string elements of a list is faster than
    >> concatenating them via the '+' operator?
    >>
    >> "".join(['a', 'b', 'c'])
    >>
    >> vs
    >>
    >> 'a'+'b'+'c'
    >>
    >> If so, can anyone explain why?

    >
    >
    > It's because the latter one has to build a temporary
    > string consisting of 'ab' first, then the final string
    > with 'c' added, while the join can (and probably does) add up
    > all the lengths of the strings to be joined and build the final
    > string all in one go.


    An idea sprang to mind: often (particularly when generating web pages)
    one wants to do lots of += without thinking about "".join.
    So what about creating a class that does this quickly?
    The following class does so, and is much faster when adding together
    lots of strings. I only seem to see performance gains above about 6000
    strings...

    David

    class faststr(str):
        def __init__(self, *args, **kwargs):
            self.appended = []
            str.__init__(self, *args, **kwargs)
        def __add__(self, otherstr):
            self.appended.append(otherstr)
            return self
        def getstr(self):
            return str(self) + "".join(self.appended)

    def testadd(start, n):
        for i in range(n):
            start += str(i)
        if hasattr(start, "getstr"):
            return start.getstr()
        else:
            return start

    if __name__ == "__main__":
        import sys
        if len(sys.argv) >= 3 and sys.argv[2] == "fast":
            start = faststr("test")
        else:
            start = "test"
        s = testadd(start, int(sys.argv[1]))
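
    A variation on the same idea, as a rough sketch assuming Python 3:
    the hypothetical StringBuilder class below is not a str subclass.
    It keeps the pieces in a list, supports only +=, and joins when
    converted with str(). That way a plain a + b expression cannot
    silently modify a.

```python
class StringBuilder:
    """Collect pieces in a list and join them once on request."""

    def __init__(self, initial=""):
        self._parts = [initial] if initial else []

    def __iadd__(self, piece):
        # += appends to the list instead of building a new string.
        self._parts.append(str(piece))
        return self

    def __str__(self):
        # The single join happens only when the text is needed.
        return "".join(self._parts)


page = StringBuilder("test")
for i in range(5):
    page += i
print(str(page))  # test01234
```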
     
    David Fraser, Jun 23, 2004
    #4
  5. Let's try this:

    def test_concat():
        s = ''
        for i in xrange(test_len):
            s += str(i)
        return s

    def test_join():
        s = []
        for i in xrange(test_len):
            s.append(str(i))
        return ''.join(s)

    def test_join2():
        return ''.join(map(str, range(test_len)))

    Results, with and without psyco:


    test_len = 1000
    String concatenation (normal)      4.85290050507 ms
    [] append + join (normal)          4.27646517754 ms
    map + join (normal)                2.37970948219 ms

    String concatenation (psyco)       2.0838675499 ms
    [] append + join (psyco)           2.29129695892 ms
    map + join (psyco)                 2.21130692959 ms

    test_len = 5000
    String concatenation (normal)      40.3251230717 ms
    [] append + join (normal)          23.3911275864 ms
    map + join (normal)                13.844203949 ms

    String concatenation (psyco)       9.65108215809 ms
    [] append + join (psyco)           13.0564379692 ms
    map + join (psyco)                 13.342962265 ms

    test_len = 10000
    String concatenation (normal)      163.02690506 ms
    [] append + join (normal)          47.6168513298 ms
    map + join (normal)                28.5276055336 ms

    String concatenation (psyco)       19.6494650841 ms
    [] append + join (psyco)           26.637775898 ms
    map + join (psyco)                 26.7823898792 ms

    test_len = 20000
    String concatenation (normal)      4556.57429695 ms
    [] append + join (normal)          92.0199871063 ms
    map + join (normal)                56.7145824432 ms

    String concatenation (psyco)       42.247030735 ms
    [] append + join (psyco)           58.3201909065 ms
    map + join (psyco)                 53.8239884377 ms


    Conclusion:

    - join is faster, but worth the annoyance only if you join thousands of
    strings
    - map is useful
    - psyco makes join unnecessary, if you can use it (that depends on which
    web framework you use)
    - Python is really pretty fast even without psyco (it runs about one
    mips!)

    Note:

    Did I mention psyco has a special optimization for string concatenation?
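
    For anyone wanting to repeat the comparison, here is a minimal
    harness using the standard timeit module. It is a sketch, not the
    script that produced the numbers above: it assumes Python 3 (so
    range instead of xrange, and no psyco), and TEST_LEN and the
    repeat count are arbitrary choices. Later CPython releases can
    sometimes extend a string in place on +=, so the gap may be much
    smaller than in the table.

```python
import timeit

TEST_LEN = 10000  # assumed size; the table above used 1000 to 20000
REPEATS = 20      # arbitrary number of calls to average over


def test_concat():
    s = ""
    for i in range(TEST_LEN):
        s += str(i)
    return s


def test_join():
    parts = []
    for i in range(TEST_LEN):
        parts.append(str(i))
    return "".join(parts)


def test_join2():
    return "".join(map(str, range(TEST_LEN)))


for func in (test_concat, test_join, test_join2):
    seconds = timeit.timeit(func, number=REPEATS)
    print("%-12s %8.3f ms per call" % (func.__name__, seconds / REPEATS * 1000))
```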
     
    Pierre-Frédéric Caillaud, Jun 23, 2004
    #5
  6. Jonas Galvez

    Steve Holden Guest

    Duncan Booth wrote:

    [...]
    > Finally, a method call on a bare string (''.join, or '\n'.join) looks
    > sufficiently bad that if, for some reason, you don't want to give it a name
    > as above, I would suggest using the alternative form for calling it:
    >
    > str.join('\n', aList)
    >
    > rather than:
    >
    > '\n'.join(aList)


    This is, of course, pure prejudice. Not that there's anything wrong with
    that ...

    regards
    Steve
     
    Steve Holden, Jun 25, 2004
    #6