String concatenation performance with +=


Sammo

String concatenation has been optimized since 2.3, so using += should
be fairly fast.

In my first test, I tried concatenating a 4096-byte string 1000 times
with the following code, and the result was indeed very fast (12.352 ms
on my machine).

import time
t = time.time()
mydata = ""
moredata = "A"*4096
for i in range(1000):
    mydata += moredata # 12.352 ms
print "%0.3f ms" % (1000*(time.time() - t))

However, I got a different result in my second test, which is
implemented in a class with a feed() method. This test took 4653.522
ms on my machine, which is 350x slower than the previous test!

import time

class StringConcatTest:
    def __init__(self):
        self.mydata = ""

    def feed(self, moredata):
        self.mydata += moredata # 4653.522 ms

moredata = "A"*4096
test = StringConcatTest()
t = time.time()
for i in range(1000):
    test.feed(moredata)
print "%0.3f ms" % (1000*(time.time() - t))

Note that I need to do something to mydata INSIDE the loop, so please
don't tell me to append moredata to a list and then use "".join after
the loop.

Why is the second test so much slower?

Benjamin Peterson

Sammo said:
String concatenation has been optimized since 2.3, so using += should
be fairly fast.

This is implementation dependent and shouldn't be relied upon.
Note that I need to do something to mydata INSIDE the loop, so please
don't tell me to append moredata to a list and then use "".join after
the loop.

Then why not just mutate the list and then call "".join?
Why is the second test so much slower?

Probably several reasons:

1. Function call overhead is quite large compared to these simple operations.
2. You are resolving attribute names.
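
Both effects are measurable with timeit. A rough sketch (Python 3 syntax; the exact numbers vary by machine and CPython version, and the in-place behaviour is an implementation detail):

```python
import timeit

# += on a plain local name: CPython's in-place resize optimization can apply
local_t = timeit.timeit('s += "A" * 64', setup='s = ""', number=2000)

# += on an instance attribute: each pass pays an attribute lookup and store,
# and the in-place optimization does not kick in, so the work goes quadratic
attr_t = timeit.timeit(
    'obj.s += "A" * 64',
    setup='class K(object): pass\nobj = K(); obj.s = ""',
    number=2000)

print(local_t < attr_t)
```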
 

Sammo

Okay, this is what I have tried for string concatenation:

1. Using += implemented using simple operations (12 ms)
2. Using += implemented inside a class (4000+ ms)
3. Using "".join implemented using simple operations (4000+ ms)
4. Using "".join implemented inside a class (4000+ ms)

This is implementation dependent and shouldn't be relied upon.


Then why not just mutate the list and then call "".join?

AFAIK, using list mutation and "".join only improves performance if
the "".join is executed outside of the loop. In fact, in Python 2.5.2,
using "".join inside the loop is actually much slower compared to my
original test, which concatenates using +=.

My original test with simple operations took 12 ms to run:

import time
t = time.time()
mydata = ""
moredata = "A"*4096
for i in range(1000):
    mydata += moredata # 12.352 ms
    # do some stuff to mydata
    # ...
print "%0.3f ms" % (1000*(time.time() - t))

New code modified to mutate the list, then call "".join now takes 4417
ms to run. This is much slower!

import time
t = time.time()
mydata = []
moredata = "A"*4096
for i in range(1000):
    mydata.append(moredata)
    mydata = ["".join(mydata)]
    # do some stuff to mydata
    # ...
print "%0.3f ms" % (1000*(time.time() - t))

Using list mutation and "".join, implemented in a class. This took
4434 ms to run, which is again much slower than the original test.
Note that it is about the same speed as using += implemented in a
class.

import time
moredata = "A"*4096

class StringConcatTest:
    def __init__(self):
        self.mydata = []

    def feed(self, moredata):
        self.mydata.append(moredata)
        self.mydata = ["".join(self.mydata)]
        # do some stuff to self.mydata
        # ...

test = StringConcatTest()
t = time.time()
for i in range(1000):
    test.feed(moredata)
print "%0.3f ms" % (1000*(time.time() - t))

Probably several reasons:

1. Function call overhead is quite large compared to these simple operations.
2. You are resolving attribute names.

The main problem I need help with is how to improve the performance of
the code implemented in a class. It is currently 350x slower than the
first test using simple operations, so surely there's got to be a way.
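
One way to get the class close to the fast version (a sketch with names of my own choosing, not code from this thread): keep the chunks in a list attribute and join only on demand, so feed() never copies the accumulated data:

```python
class StringBuffer(object):
    """Hypothetical accumulator: cheap feed(), lazy join."""
    def __init__(self):
        self._chunks = []

    def feed(self, moredata):
        # list.append is O(1); no copy of the accumulated data
        self._chunks.append(moredata)

    def value(self):
        # collapse to one chunk so repeated calls stay cheap
        joined = "".join(self._chunks)
        self._chunks = [joined]
        return joined

buf = StringBuffer()
for _ in range(1000):
    buf.feed("A" * 4096)
print(len(buf.value()))  # -> 4096000
```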
 

Steven D'Aprano

Benjamin said:
This is implementation dependent and shouldn't be relied upon.

It's also a fairly simple optimization and really only applies to direct
object access, not items or attributes.
>>> Timer('s += "x"', 's = ""').repeat(number=100000)
[0.067316055297851562, 0.063985109329223633, 0.066659212112426758]
>>> Timer('s[0] += "x"', 's = [""]').repeat(number=100000)
[3.0495560169219971, 2.2938292026519775, 2.2914319038391113]
>>> Timer('s.s += "x"',
...     'class K(object): pass \ns=K();s.s = ""').repeat(number=100000)
[3.3624241352081299, 2.3346412181854248, 2.9846079349517822]


Then why not just mutate the list and then call "".join?

Yes, there almost certainly is a way to avoid the repeated concatenation.

Probably several reasons:

1. Function call overhead is quite large compared to these simple operations.
2. You are resolving attribute names.

3. Because the optimization isn't applied in this case.
 

Steven D'Aprano

Steven said:
Benjamin said:
This is implementation dependent and shouldn't be relied upon.

It's also a fairly simple optimization and really only applies to direct
object access, not items or attributes.
[0.067316055297851562, 0.063985109329223633, 0.066659212112426758]

Oops, sorry, I should have said... Timer is timeit.Timer.
 

Steven D'Aprano

Sammo said:
Okay, this is what I have tried for string concatenation:

1. Using += implemented using simple operations (12 ms)
2. Using += implemented inside a class (4000+ ms)
3. Using "".join implemented using simple operations (4000+ ms)
4. Using "".join implemented inside a class (4000+ ms)


I think you're doing more work than you need to. Here are my results:
>>> def join_test():
...     L = []
...     block = "abcdefghijklmnopqrstuvwxyz"*1000
...     for i in xrange(1000):
...         L.append(block)
...     return ''.join(L)
...
>>> def concat_test():
...     s = ""
...     block = "abcdefghijklmnopqrstuvwxyz"*1000
...     for i in xrange(1000):
...         s += block
...     return s
...
>>> Timer('concat_test()', 'from __main__ import concat_test').repeat(number=100)
[5.397799015045166, 4.3106229305267334, 4.3492171764373779]
>>> Timer('join_test()', 'from __main__ import join_test').repeat(number=100)
[4.5378072261810303, 3.9887290000915527, 3.919511079788208]


The join() idiom is slightly faster than repeated concatenation, even with
the optimization. If you're not seeing that result, you need to consider
what extra work you're doing that you don't need to.

AFAIK, using list mutation and "".join only improves performance if
the "".join is executed outside of the loop.

Naturally. If you needlessly join over and over again, instead of delaying
until the end, then you might as well do string concatenation without the
optimization.

join() isn't some magical function that makes your bugs go away. You have to
use it sensibly. What you've done is a variant of Shlemiel the
road-painter's algorithm:

http://www.joelonsoftware.com/articles/fog0000000319.html



Perhaps you have to repeatedly do work on the *temporary* results in between
loops? Something like this toy example?

s = ""
block = "abcdefghijklmnopqrstuvwxyz"*1000
for i in xrange(1000):
    s += block
    # do something with the partially concatenated string
    print "partial length is", len(s)
# all finished
do_something(s)

You've written that using join:

L = []
block = "abcdefghijklmnopqrstuvwxyz"*1000
for i in xrange(1000):
    L.append(block)
    # do something with the partially concatenated string
    L = [ ''.join(L) ]
    print "partial length is", len(L[0])
# all finished
s = ''.join(L)
do_something(s)

Okay, but we can re-write that more sensibly:

L = []
block = "abcdefghijklmnopqrstuvwxyz"*1000
for i in xrange(1000):
    L.append(block)
    # do something with the partially concatenated string
    print "partial length is", sum(len(item) for item in L)
# all finished
s = ''.join(L)
do_something(s)

There's still a Shlemiel algorithm in there, but it's executed in fast C
instead of slow Python and it doesn't involve copying large blocks of
memory, only walking them, so you won't notice as much. Can we be smarter?

L = []
block = "abcdefghijklmnopqrstuvwxyz"*1000
partial_length = 0
for i in xrange(1000):
    L.append(block)
    partial_length += len(block)
    # do something with the partially concatenated string
    print "partial length is", partial_length
# all finished
s = ''.join(L)
do_something(s)


Naturally this is a toy example, but I think it illustrates one technique
for avoiding turning an O(N) algorithm into an O(N**2) algorithm.
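
Folded back into a feed()-style class (my adaptation of the running-total idea, not code from the thread), each call stays O(1):

```python
class Feeder(object):
    """Hypothetical feed() class using the running-total technique."""
    def __init__(self):
        self.chunks = []
        self.partial_length = 0  # bookkeeping instead of joining

    def feed(self, block):
        self.chunks.append(block)
        self.partial_length += len(block)  # O(1) work per call

    def result(self):
        return "".join(self.chunks)  # one join, at the end

f = Feeder()
for _ in range(1000):
    f.feed("abcdefghijklmnopqrstuvwxyz")
print(f.partial_length)  # -> 26000
```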

Good luck!
 

Sammo

It's also a fairly simple optimization and really only applies to direct
object access, not items or attributes.




3. Because the optimization isn't applied in this case.

Thanks Steven -- that's the answer I was looking for. The += string
concatenation optimization only applies to local variables. In my
case, I incorrectly assumed it applied to attributes.

Can you point me to any references regarding the limitations of the
optimization? I guess it's another Python idiom that's documented
somewhere ...
 

Sammo

AFAIK, using list mutation and "".join only improves performance if
the "".join is executed outside of the loop.

Naturally. If you needlessly join over and over again, instead of delaying
until the end, then you might as well do string concatenation without the
optimization.

join() isn't some magical function that makes your bugs go away. You have to
use it sensibly. What you've done is a variant of Shlemiel the
road-painter's algorithm:

http://www.joelonsoftware.com/articles/fog0000000319.html

[snip toy examples]

So even though I can't delay the "".join operation until after the
loop, I can still improve performance by reducing the length of
"".join operations inside the loop. In my case, I need to temporarily
work on the latest two blocks only.

L = []
block = "abcdefghijklmnopqrstuvwxyz"*1000
for i in xrange(1000):
    L.append(block)
    # do something with the latest two blocks
    tmp = "".join(L[-2:])
# all finished
s = ''.join(L)
do_something(s)

Unfortunately, the downside is that the code becomes more difficult to
read compared to using the obvious +=. If only the optimization worked
for attributes ...
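
If only the latest two blocks are ever needed, keeping the previous block in its own attribute avoids even the L[-2:] slice-and-join. A sketch under that assumption (the class and names here are hypothetical):

```python
class TwoBlockFeeder(object):
    """Hypothetical: expose the last two blocks without slicing."""
    def __init__(self):
        self.chunks = []
        self._prev = ""

    def feed(self, block):
        window = self._prev + block  # touch only two blocks: O(block size)
        self.chunks.append(block)
        self._prev = block
        return window

    def result(self):
        return "".join(self.chunks)  # single join at the end

f = TwoBlockFeeder()
f.feed("AAAA")
window = f.feed("BBBB")
print(window)  # -> AAAABBBB
```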
 

Steven D'Aprano

Nick said:
The optimized += depends on there being no other references to the
string. Strings are immutable in Python, so concatenation must normally
build a new string. However, the += operation was optimized to do an
in-place append if and only if there are no other references to the string.

You can see this demonstrated here

$ python -m timeit -s 'a="x"' 'a+="x"'
1000000 loops, best of 3: 0.231 usec per loop

$ python -m timeit -s 'a="x"; b=a' 's = a; a+="x"'
100000 loops, best of 3: 30.1 usec per loop

That is a fantastic explanation of why you shouldn't rely on the concat
optimization, although the assignment b=a in the setup isn't necessary and
is a little misleading. Thank you.
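
The same demonstration can be run from a script (Python 3 syntax; where the crossover falls depends on the CPython version, so treat the comparison as illustrative):

```python
import timeit

# sole reference to the string: the in-place resize can apply
fast = timeit.timeit('a += "x"', setup='a = "x"', number=20000)

# 's = a' keeps a second live reference during +=, forcing a full
# copy of the ever-growing string on every iteration
slow = timeit.timeit('s = a; a += "x"', setup='a = "x"', number=20000)

print(fast < slow)
```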
 
