String concatenation

Jonas Galvez

Is it true that joining the string elements of a list is faster than
concatenating them via the '+' operator?

"".join(['a', 'b', 'c'])

vs

'a'+'b'+'c'

If so, can anyone explain why?



\\ jonas galvez
// jonasgalvez.com
 
Peter Hansen

Jonas said:
> Is it true that joining the string elements of a list is faster than
> concatenating them via the '+' operator?
>
> "".join(['a', 'b', 'c'])
>
> vs
>
> 'a'+'b'+'c'
>
> If so, can anyone explain why?

It's because the latter one has to build a temporary
string consisting of 'ab' first, then the final string
with 'c' added, while the join can (and probably does) add up
all the lengths of the strings to be joined and build the final
string all in one go.
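The left-to-right evaluation described above can be made visible with a small str subclass that records every string its + produces. TracedStr and the created list are hypothetical names used only for illustration. The subclass is needed partly because CPython can fold an all-literal expression like 'a'+'b'+'c' at compile time, which would hide the temporary.

```python
created = []  # every string produced by '+', in order of creation


class TracedStr(str):
    """Illustrative str subclass whose '+' logs each result it builds."""

    def __add__(self, other):
        result = TracedStr(str.__add__(self, other))
        created.append(str(result))  # store a plain str copy of the result
        return result


s = TracedStr('a') + 'b' + 'c'   # evaluated as (TracedStr('a') + 'b') + 'c'
print(created)                   # ['ab', 'abc']: 'ab' is the temporary
```

Two strings get built for a three-piece sum; in general, n pieces joined with + produce n - 1 intermediate and final strings.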

Note that there's also '%s%s%s' % ('a', 'b', 'c'), which is
probably on par with the join technique for both performance
and lack of readability.

Note much more importantly, however, that you should probably
not pick the join approach over the concatenation approach
based on performance. Concatenation is more readable in the
above case (ignoring the fact that it's a contrived example),
as you're being more explicit about your intentions.

The reason joining lists is popular is the terribly bad
performance of += when one gradually builds up a string in
pieces, instead of appending the pieces to a list and doing a
single join at the end.

So

l = []
l.append('a')
l.append('b')
l.append('c')
s = ''.join(l)

is _much_ faster (therefore better) in real-world cases than

s = ''
s += 'a'
s += 'b'
s += 'c'

With the latter, if you picture longer and many more strings,
and realize that each += causes a new string to be created
consisting of the contents of the two old strings joined together,
steadily growing longer and requiring lots of wasted copying,
you can see why it's very bad on memory and performance.
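A rough way to see the cost is to count characters copied under the simple model just described, where each += allocates a new string and copies both operands into it. This is a back-of-the-envelope sketch with hypothetical helper names. It ignores interpreter-specific tricks (CPython, for example, can sometimes resize a string in place for s += t when nothing else references s), so real timings can differ.

```python
def chars_copied_by_plus_equals(pieces):
    """Characters copied if every s += p builds a brand-new string."""
    copied = 0
    length = 0
    for p in pieces:
        length += len(p)   # size of the new string after this +=
        copied += length   # old contents plus p are all copied again
    return copied


def chars_copied_by_join(pieces):
    """Characters copied if join sizes the result first and copies each piece once."""
    return sum(len(p) for p in pieces)


pieces = ['x'] * 20000   # 20000 one-character strings
print(chars_copied_by_plus_equals(pieces))  # 200010000
print(chars_copied_by_join(pieces))         # 20000
```

Under this model the += count grows with the square of the number of pieces (n(n+1)/2 for n one-character pieces), while the join count grows linearly.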

The list approach doesn't copy the strings at all, but just
holds references to them in a list (which does grow in a
similar but much more efficient manner). The join figures
out the sizes of all of the strings and allocates enough
space to do only a single copy from each.
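To make the "size first, then copy once" idea concrete, here is an illustrative two-pass join over bytes objects. It is a sketch of the strategy described above, not CPython's actual implementation, and two_pass_join is a hypothetical name. Bytes are used so that a preallocated bytearray can act as the single buffer.

```python
def two_pass_join(parts, sep=b''):
    """Sketch of join: add up the lengths, allocate once, copy each piece once."""
    parts = list(parts)  # we iterate twice, so materialise any iterator
    if not parts:
        return b''
    # Pass 1: total size of the result, including separators.
    total = sum(len(p) for p in parts) + len(sep) * (len(parts) - 1)
    buf = bytearray(total)  # one allocation of the final size
    pos = 0
    # Pass 2: copy every piece (and separator) into its slot.
    for i, p in enumerate(parts):
        if i:
            buf[pos:pos + len(sep)] = sep
            pos += len(sep)
        buf[pos:pos + len(p)] = p
        pos += len(p)
    return bytes(buf)  # converting to immutable bytes makes one more copy
```

For example, two_pass_join([b'a', b'b', b'c'], b', ') gives the same result as b', '.join([b'a', b'b', b'c']).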

Again though, other than the += versus .append() case, you should
probably not pick ''.join() over + since readability will
suffer more than your performance will improve.

-Peter
 
Duncan Booth

> Note that there's also '%s%s%s' % ('a', 'b', 'c'), which is
> probably on par with the join technique for both performance
> and lack of readability.

A few more points.

Yes, the format string in this example isn't the clearest, but if you have
a case where some of the strings are fixed and others vary, then the format
string can be the clearest.

e.g.

'<a href="%s" alt="%s">%s</a>' % (uri, alt, text)

rather than:

'<a href="'+uri+'" alt="'+alt+'">'+text+'</a>'

In many situations I find I use a combination of all three techniques.
Build a list of strings to be concatenated to produce the final output, but
each of these strings might be built from a format string or simple
addition as above.
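As a sketch of combining all three techniques, the hypothetical render_links() below builds a list of pieces, making one with simple addition and the rest with a format string, and joins them once at the end. The function name and the (uri, alt, text) tuple layout are assumptions for illustration, and no HTML escaping is done.

```python
def render_links(title, links):
    """Hypothetical helper: links is a list of (uri, alt, text) tuples."""
    parts = ['<h1>' + title + '</h1>', '<ul>']  # short piece: simple addition
    for uri, alt, text in links:
        # fixed markup with varying values: a format string reads best
        parts.append('<li><a href="%s" alt="%s">%s</a></li>' % (uri, alt, text))
    parts.append('</ul>')
    return '\n'.join(parts)  # one join over all collected pieces (no escaping)


print(render_links('Links', [('http://example.com/', 'example', 'Example')]))
```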

On the readability of ''.join(), I would suggest never writing it more than
once. That means I tend to do something like:

concatenate = ''.join
...
concatenate(myList)

Or

def concatenate(*args):
    return ''.join(args)
...
concatenate('a', 'b', 'c')

depending on how it is to be used.

It's also worth saying that a lot of the time you find you don't want the
empty separator at all (e.g. maybe newline is more appropriate), and in
this case join really does become easier than simple addition, but
again it is worth wrapping it so that your intention at the point of call
is clear.
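Following the suggestion to wrap join so the call site states its intent, a newline-separated version might look like the sketch below; join_lines is a hypothetical name. With simple addition, the same result needs the separator written out between every pair of pieces.

```python
def join_lines(*args):
    """Hypothetical wrapper: join the arguments with newlines."""
    return '\n'.join(args)


a, b, c = 'first', 'second', 'third'
by_join = join_lines(a, b, c)
by_addition = a + '\n' + b + '\n' + c   # separator repeated by hand
print(by_join == by_addition)           # True
```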

Finally, a method call on a bare string (''.join, or '\n'.join) looks
sufficiently bad that if, for some reason, you don't want to give it a name
as above, I would suggest using the alternative form for calling it:

str.join('\n', aList)

rather than:

'\n'.join(aList)
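Both spellings call the same method: looking join up on the str class gives a function that takes the separator as its first argument. A quick check, with a sample aList:

```python
aList = ['one', 'two', 'three']

# str.join looked up on the class takes the separator as its first argument,
# so these two calls are equivalent.
print(str.join('\n', aList) == '\n'.join(aList))  # True
```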
 
David Fraser

Peter said:
> Jonas said:
>> Is it true that joining the string elements of a list is faster than
>> concatenating them via the '+' operator?
>>
>> "".join(['a', 'b', 'c'])
>>
>> vs
>>
>> 'a'+'b'+'c'
>>
>> If so, can anyone explain why?
>
> It's because the latter one has to build a temporary
> string consisting of 'ab' first, then the final string
> with 'c' added, while the join can (and probably does) add up
> all the lengths of the strings to be joined and build the final
> string all in one go.

An idea sprang to mind: often (particularly when generating web pages) one
wants to do lots of += without thinking about "".join.
So what about creating a class that does this quickly?
The following class does this and is much faster when adding together
lots of strings. I only seem to see performance gains above about 6000
strings...

David

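class faststr(str):
    def __init__(self, *args, **kwargs):
        # str's value is already fixed by str.__new__; only add the
        # list that + will collect pieces into
        self.appended = []
    def __add__(self, otherstr):
        # record the piece instead of building a new string
        # (note: this mutates self and returns it)
        self.appended.append(otherstr)
        return self
    def getstr(self):
        # build the real string with a single join at the end
        return str(self) + "".join(self.appended)

def testadd(start, n):
    # += falls back to __add__, so faststr just collects the pieces
    for i in range(n):
        start += str(i)
    if hasattr(start, "getstr"):
        return start.getstr()
    else:
        return start

if __name__ == "__main__":
    import sys
    # command-line arguments: N [fast], e.g. 10000 fast
    if len(sys.argv) >= 3 and sys.argv[2] == "fast":
        start = faststr("test")
    else:
        start = "test"
    s = testadd(start, int(sys.argv[1]))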
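One thing to be aware of with faststr is that __add__ changes the object and returns it, so b = a + 'x' also modifies a. A sketch of a variant that only supports += (through __iadd__) keeps the same "collect now, join once" idea without that surprise. StringBuilder is a hypothetical name; the standard library's io.StringIO offers a similar write-then-getvalue pattern.

```python
class StringBuilder(object):
    """Collects pieces with +=; builds the final string with one join."""

    def __init__(self, initial=''):
        self._parts = [initial]

    def __iadd__(self, piece):
        # += only records the piece; nothing is copied yet
        self._parts.append(piece)
        return self

    def getvalue(self):
        # single join at the end, as in the list-plus-join idiom
        return ''.join(self._parts)


sb = StringBuilder('test')
for i in range(5):
    sb += str(i)
print(sb.getvalue())  # test01234
```

Because StringBuilder defines no __add__, an expression like sb + 'x' raises TypeError instead of silently modifying sb.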
 
Pierre-Frédéric Caillaud

Let's try this:

# module-level size read by all three tests (the runs below use 1000 to 20000)
test_len = 1000

def test_concat():
    s = ''
    for i in range(test_len):
        s += str(i)
    return s

def test_join():
    s = []
    for i in range(test_len):
        s.append(str(i))
    return ''.join(s)

def test_join2():
    return ''.join(map(str, range(test_len)))

Results, with and without psyco:


test_len = 1000
String concatenation (normal) 4.85290050507 ms.
[] append + join (normal) 4.27646517754 ms.
map + join (normal) 2.37970948219 ms.

String concatenation (psyco) 2.0838675499 ms.
[] append + join (psyco) 2.29129695892 ms.
map + join (psyco) 2.21130692959 ms.

test_len = 5000
String concatenation (normal) 40.3251230717 ms.
[] append + join (normal) 23.3911275864 ms.
map + join (normal) 13.844203949 ms.

String concatenation (psyco) 9.65108215809 ms.
[] append + join (psyco) 13.0564379692 ms.
map + join (psyco) 13.342962265 ms.

test_len = 10000
String concatenation (normal) 163.02690506 ms.
[] append + join (normal) 47.6168513298 ms.
map + join (normal) 28.5276055336 ms.

String concatenation (psyco) 19.6494650841 ms.
[] append + join (psyco) 26.637775898 ms.
map + join (psyco) 26.7823898792 ms.

test_len = 20000
String concatenation (normal) 4556.57429695 ms.
[] append + join (normal) 92.0199871063 ms.
map + join (normal) 56.7145824432 ms.

String concatenation (psyco) 42.247030735 ms.
[] append + join (psyco) 58.3201909065 ms.
map + join (psyco) 53.8239884377 ms.


Conclusion:

- join is faster, but worth the annoyance only if you join thousands of strings
- map is useful
- psyco makes join useless if you can use it (which depends on which web
framework you use)
- Python is really pretty fast even without psyco (it runs at about one
MIPS!)

Note:

Did I mention psyco has a special optimization for string concatenation?
 
Steve Holden

Duncan Booth wrote:

[...]
> Finally, a method call on a bare string (''.join, or '\n'.join) looks
> sufficiently bad that if, for some reason, you don't want to give it a name
> as above, I would suggest using the alternative form for calling it:
>
> str.join('\n', aList)
>
> rather than:
>
> '\n'.join(aList)

This is, of course, pure prejudice. Not that there's anything wrong with
that ...

regards
Steve
 
