How naive is Python?

J

John Nagle

How naive (in the sense that compiler people use the term)
is the current Python system? For example:

def foo() :
s = "This is a test"
return(s)

s2 = foo()

How many times does the string get copied?

Or, for example:

s1 = "Test1"
s2 = "Test2"
s3 = "Test3"
s = s1 + s2 + s3

Any redundant copies performed, or is that case optimized?

How about this?

kcount = 1000
s = ''
for i in range(kcount) :
s += str(i) + ' '

Is this O(N) or O(N^2) because of recopying of "s"?

I just want a sense of what's unusually inefficient in the
current implementation. Thanks.

John Nagle
 
S

skip

John> How naive (in the sense that compiler people use the term) is the
John> current Python system? For example:

John> def foo() :
John> s = "This is a test"
John> return(s)

John> s2 = foo()

John> How many times does the string get copied?

Never. s and s2 just refer to the same object (strings are immutable).
Executing this:

def foo() :
print id("This is a test")
s = "This is a test"
print id(s)
return(s)

s2 = foo()
print id(s2)

prints the same value three times.

John> Or, for example:

John> s1 = "Test1"
John> s2 = "Test2"
John> s3 = "Test3"
John> s = s1 + s2 + s3

John> Any redundant copies performed, or is that case optimized?

Not optimized. You can see that using the dis module:

4 0 LOAD_CONST 1 ('Test1')
3 STORE_FAST 0 (s1)

5 6 LOAD_CONST 2 ('Test2')
9 STORE_FAST 1 (s2)

6 12 LOAD_CONST 3 ('Test3')
15 STORE_FAST 2 (s3)

7 18 LOAD_FAST 0 (s1)
21 LOAD_FAST 1 (s2)
24 BINARY_ADD
25 LOAD_FAST 2 (s3)
28 BINARY_ADD
29 STORE_FAST 3 (s)
32 LOAD_CONST 0 (None)
35 RETURN_VALUE

The BINARY_ADD opcode creates a new string.

John> How about this?

John> kcount = 1000
John> s = ''
John> for i in range(kcount) :
John> s += str(i) + ' '

John> Is this O(N) or O(N^2) because of recopying of "s"?

O(N). Here's a demonstration of that:

#!/usr/bin/env python

from __future__ import division

def foo(kcount):
s = ''
for i in xrange(kcount) :
s += str(i) + ' '

import time

for i in xrange(5,25):
t = time.time()
foo(2**i)
t = time.time() - t
print 2**i, t, t/2**i

On my laptop t roughly doubles for each iteration and prints around 5e-06
for t/2**i in all cases.

Skip
 
R

Roy Smith

John Nagle said:
How naive (in the sense that compiler people use the term)
is the current Python system? For example:

def foo() :
s = "This is a test"
return(s)

s2 = foo()

How many times does the string get copied?

All of those just move around pointers to the same (interned) string.
How about this?

kcount = 1000
s = ''
for i in range(kcount) :
s += str(i) + ' '

Is this O(N) or O(N^2) because of recopying of "s"?

This is a well-known (indeed, the canonical) example of quadratic behavior
in Python. The standard solution is to store all the strings (again,
really just pointers to the strings) in a list, then join all the elements:

temp = []
for i in range (1000):
temp.append (str(i))
s = "".join (temp)

That ends up copying each string once (during the join operation), and is
O(N) overall.
 
S

Steven D'Aprano

How naive (in the sense that compiler people use the term)
is the current Python system? For example:

def foo() :
s = "This is a test"
return(s)

s2 = foo()

How many times does the string get copied?


Let's find out. Results below for Python 2.3 -- other versions may vary.
.... s = "This is a test"
.... return s
........ return "This is a test"
.... 2 0 LOAD_CONST 1 ('This is a test')
3 STORE_FAST 0 (s)

3 6 LOAD_FAST 0 (s)
9 RETURN_VALUE
10 LOAD_CONST 0 (None)
13 RETURN_VALUE 2 0 LOAD_CONST 1 ('This is a test')
3 RETURN_VALUE
4 LOAD_CONST 0 (None)
7 RETURN_VALUE

foo and foo2 functions compile to different byte-code. foo does a little
more work than foo2, but it is likely to be a trivial difference.
(True, True)

So the string "This is a test" within foo is not copied each time the
function is called. However, the string "This is a test" is duplicated
between foo and foo2 (the two functions don't share the same string
instance):
(True, False)


Or, for example:

s1 = "Test1"
s2 = "Test2"
s3 = "Test3"
s = s1 + s2 + s3

Any redundant copies performed, or is that case optimized?

I don't believe it is optimized. I believe that in Python 2.5 simple
numeric optimizations are performed, so that "x = 1 + 3" would compile to
"x = 4", but I don't think that holds for strings. If you are running 2.5,
you could find out with dis.dis.


How about this?

kcount = 1000
s = ''
for i in range(kcount) :
s += str(i) + ' '

Is this O(N) or O(N^2) because of recopying of "s"?

That will be O(N**2), except that CPython version 2.4 (or is it 2.5?) can,
sometimes, optimize it. Note that this is an implementation detail, and
doesn't hold for other Pythons like Jython, IronPython or PyPy.
 
J

John Machin

John> How naive (in the sense that compiler people use the term) is the
John> current Python system? For example:

John> def foo() :
John> s = "This is a test"
John> return(s)

John> s2 = foo()

John> How many times does the string get copied?

Never. s and s2 just refer to the same object (strings are immutable).
Executing this:

def foo() :
print id("This is a test")
s = "This is a test"
print id(s)
return(s)

s2 = foo()
print id(s2)

prints the same value three times.

John> Or, for example:

John> s1 = "Test1"
John> s2 = "Test2"
John> s3 = "Test3"
John> s = s1 + s2 + s3

John> Any redundant copies performed, or is that case optimized?

Not optimized. You can see that using the dis module:

4 0 LOAD_CONST 1 ('Test1')
3 STORE_FAST 0 (s1)

5 6 LOAD_CONST 2 ('Test2')
9 STORE_FAST 1 (s2)

6 12 LOAD_CONST 3 ('Test3')
15 STORE_FAST 2 (s3)

7 18 LOAD_FAST 0 (s1)
21 LOAD_FAST 1 (s2)
24 BINARY_ADD
25 LOAD_FAST 2 (s3)
28 BINARY_ADD
29 STORE_FAST 3 (s)
32 LOAD_CONST 0 (None)
35 RETURN_VALUE

The BINARY_ADD opcode creates a new string.

John> How about this?

John> kcount = 1000
John> s = ''
John> for i in range(kcount) :
John> s += str(i) + ' '

John> Is this O(N) or O(N^2) because of recopying of "s"?

O(N). Here's a demonstration of that:

#!/usr/bin/env python

from __future__ import division

def foo(kcount):
s = ''
for i in xrange(kcount) :
s += str(i) + ' '

import time

for i in xrange(5,25):
t = time.time()
foo(2**i)
t = time.time() - t
print 2**i, t, t/2**i

On my laptop t roughly doubles for each iteration and prints around 5e-06
for t/2**i in all cases.

Sorry, Skip, but I find that very hard to believe. The foo() function
would take quadratic time if it were merely adding on pieces of
constant size -- however len(str(i)) is not a constant, it is
O(log10(i)), so the time should be super-quadratic. Following are the
results I got with Python 2.5, Windows XP Pro, 3.2Ghz Intel dual-core
processor with the last line changed to print i, 2**i, t, t/2**i coz I
can't do log2() in my head :)

[snip]
18 262144 0.889999866486 3.39508005709e-006
19 524288 3.21900010109 6.13975544184e-006
20 1048576 12.2969999313 1.17273330034e-005
21 2097152 54.25 2.58684158325e-005
22 4194304 229.0 5.45978546143e-005
[k/b interrupt]

Cheers,
John
 
D

Duncan Booth

Roy Smith said:
All of those just move around pointers to the same (interned) string.

Correct about the pointers, but the string is not interned:

Steven D'Aprano said:
(True, True)

So the string "This is a test" within foo is not copied each time the
function is called. However, the string "This is a test" is duplicated
between foo and foo2 (the two functions don't share the same string
instance):

(True, False)

In this specific example the two functions don't share the same string, but
that won't always be the case: if the string had been interned this would
have printed (True, True).

e.g. Removing all the spaces from the string produces a string which is
interned.
s = "Thisisatest"
return s
return "Thisisatest"
True

Bottom line, never make assumptions about when two literal strings will
share the same object and when they won't.
 
S

skip

John> Sorry, Skip, but I find that very hard to believe. The foo()
John> function would take quadratic time if it were merely adding on
John> pieces of constant size -- however len(str(i)) is not a constant,
John> it is O(log10(i)), so the time should be
John> super-quadratic.

Sorry, I should have pointed out that I'm using the latest version of
Python. I believe += for strings has been optimized to simply extend s when
there is only a single reference to it.

Skip
 
S

skip

John> Sorry, Skip, but I find that very hard to believe. The foo()
John> function would take quadratic time if it were merely adding on
John> pieces of constant size -- however len(str(i)) is not a constant,
John> it is O(log10(i)), so the time should be super-quadratic.

me> Sorry, I should have pointed out that I'm using the latest version
me> of Python. I believe += for strings has been optimized to simply
me> extend s when there is only a single reference to it.

Actually, it isn't until I work my way back to 2.3 that I start to see
quadratic behavior:

% python2.6 strcopy.py
32 0.00022292137146 6.96629285812e-06
64 0.000907897949219 1.41859054565e-05
128 0.000649929046631 5.0775706768e-06
256 0.00111794471741 4.36697155237e-06
512 0.00260806083679 5.09386882186e-06
1024 0.00437998771667 4.27733175457e-06
2048 0.00921607017517 4.50003426522e-06
4096 0.0191979408264 4.68699727207e-06
8192 0.0694131851196 8.47328919917e-06
16384 0.0976829528809 5.96209429204e-06
32768 0.194766998291 5.94381708652e-06
% python2.5 strcopy.py
32 0.000439167022705 1.37239694595e-05
64 0.000303030014038 4.73484396935e-06
128 0.000631809234619 4.93600964546e-06
256 0.00112318992615 4.38746064901e-06
512 0.00279307365417 5.45522198081e-06
1024 0.00446391105652 4.35928814113e-06
2048 0.00953102111816 4.65381890535e-06
4096 0.0198018550873 4.83443727717e-06
8192 0.175454854965 2.14178289752e-05
16384 0.103327989578 6.30663998891e-06
32768 0.191411972046 5.84142981097e-06
% python2.4 strcopy.py
32 0.000230073928833 7.18981027603e-06
64 0.000307083129883 4.79817390442e-06
128 0.00114107131958 8.91461968422e-06
256 0.00116014480591 4.53181564808e-06
512 0.00231313705444 4.51784580946e-06
1024 0.00459003448486 4.48245555162e-06
2048 0.00974178314209 4.75673004985e-06
4096 0.019122838974 4.66866185889e-06
8192 0.0717267990112 8.75571276993e-06
16384 0.112125873566 6.84362021275e-06
32768 0.188065052032 5.73928991798e-06
% python2.3 strcopy.py
32 0.000223875045776 6.99609518051e-06
64 0.000385999679565 6.03124499321e-06
128 0.000766038894653 5.98467886448e-06
256 0.00154304504395 6.02751970291e-06
512 0.00309181213379 6.03869557381e-06
1024 0.0065929889679 6.43846578896e-06
2048 0.0147500038147 7.20215030015e-06
4096 0.14589214325 3.56181990355e-05
8192 0.778324127197 9.50102694333e-05
16384 3.20213103294 0.000195442567929
32768 14.7389230728 0.000449796236353
% python2.2 strcopy.py
32 0.00031590461731 9.87201929092e-06
64 0.000494003295898 7.71880149841e-06
128 0.00090217590332 7.04824924469e-06
256 0.00173211097717 6.76605850458e-06
512 0.00362610816956 7.08224251866e-06
1024 0.00711607933044 6.94929622114e-06
2048 0.0158200263977 7.7246222645e-06
4096 0.152237892151 3.71674541384e-05
8192 0.89648604393 0.000109434331534
16384 3.14483094215 0.000191945247934
32768 13.3367011547 0.000407003819419

Skip
 
J

John Nagle

John> Sorry, Skip, but I find that very hard to believe. The foo()
John> function would take quadratic time if it were merely adding on
John> pieces of constant size -- however len(str(i)) is not a constant,
John> it is O(log10(i)), so the time should be
John> super-quadratic.

Sorry, I should have pointed out that I'm using the latest version of
Python. I believe += for strings has been optimized to simply extend s when
there is only a single reference to it.

I'd heard that, but wasn't sure. That's the kind of answer I was
looking for.

That kind of optimization is a huge win.

John Nagle
 
J

John Machin

John> Sorry, Skip, but I find that very hard to believe. The foo()
John> function would take quadratic time if it were merely adding on
John> pieces of constant size -- however len(str(i)) is not a constant,
John> it is O(log10(i)), so the time should be super-quadratic.

me> Sorry, I should have pointed out that I'm using the latest version
me> of Python. I believe += for strings has been optimized to simply
me> extend s when there is only a single reference to it.

Sorry, I should have pointed out that you need to read up about
time.time() -- what is its resolution on your box? -- and time.clock()
and the timeit module.
Actually, it isn't until I work my way back to 2.3 that I start to see
quadratic behavior:

% python2.6 strcopy.py
32 0.00022292137146 6.96629285812e-06
64 0.000907897949219 1.41859054565e-05
128 0.000649929046631 5.0775706768e-06
256 0.00111794471741 4.36697155237e-06
512 0.00260806083679 5.09386882186e-06
1024 0.00437998771667 4.27733175457e-06
2048 0.00921607017517 4.50003426522e-06
4096 0.0191979408264 4.68699727207e-06
8192 0.0694131851196 8.47328919917e-06
16384 0.0976829528809 5.96209429204e-06
32768 0.194766998291 5.94381708652e-06
% python2.5 strcopy.py
32 0.000439167022705 1.37239694595e-05
64 0.000303030014038 4.73484396935e-06
128 0.000631809234619 4.93600964546e-06
256 0.00112318992615 4.38746064901e-06
512 0.00279307365417 5.45522198081e-06
1024 0.00446391105652 4.35928814113e-06
2048 0.00953102111816 4.65381890535e-06
4096 0.0198018550873 4.83443727717e-06
8192 0.175454854965 2.14178289752e-05
16384 0.103327989578 6.30663998891e-06
32768 0.191411972046 5.84142981097e-06
[snip]

Your ability to "see" quadratic behavoiur appears to be impaired by (1)
the low resolution and/or erratic nature of time.time() on your system
(2) stopping the experiment just before the results become interesting.

Try this with the latest *production* release (2.5):

C:\junk>cat fookount.py
def foo(kcount):
s = ''
for i in xrange(kcount) :
s += str(i) + ' '

C:\junk>for /L %n in (10,1,19) do \python25\python -mtimeit -s"from
fookount import foo" "foo(2**%n)"

C:\junk>\python25\python -mtimeit -s"from fookount import foo"
"foo(2**10)"
100 loops, best of 3: 1.1 msec per loop

C:\junk>\python25\python -mtimeit -s"from fookount import foo"
"foo(2**11)"
100 loops, best of 3: 2.34 msec per loop

C:\junk>\python25\python -mtimeit -s"from fookount import foo"
"foo(2**12)"
100 loops, best of 3: 4.79 msec per loop

C:\junk>\python25\python -mtimeit -s"from fookount import foo"
"foo(2**13)"
10 loops, best of 3: 9.57 msec per loop

C:\junk>\python25\python -mtimeit -s"from fookount import foo"
"foo(2**14)"
10 loops, best of 3: 19.8 msec per loop

C:\junk>\python25\python -mtimeit -s"from fookount import foo"
"foo(2**15)"
10 loops, best of 3: 40.7 msec per loop

C:\junk>\python25\python -mtimeit -s"from fookount import foo"
"foo(2**16)"
10 loops, best of 3: 82.1 msec per loop

C:\junk>\python25\python -mtimeit -s"from fookount import foo"
"foo(2**17)"
10 loops, best of 3: 242 msec per loop

C:\junk>\python25\python -mtimeit -s"from fookount import foo"
"foo(2**18)"
10 loops, best of 3: 886 msec per loop

C:\junk>\python25\python -mtimeit -s"from fookount import foo"
"foo(2**19)"
10 loops, best of 3: 3.21 sec per loop

Cheers,
John
 
J

John Bauman

Actually, it isn't until I work my way back to 2.3 that I start to see
quadratic behavior:

Yes, that's because the behavior was changed for 2.4, so it wouldn't be
quadratic in this case.
 
S

skip

John> (e-mail address removed) wrote:
John> Sorry, Skip, but I find that very hard to believe. The foo()
John> function would take quadratic time if it were merely adding on
John> pieces of constant size -- however len(str(i)) is not a constant,
John> it is O(log10(i)), so the time should be super-quadratic. me> Sorry, I should have pointed out that I'm using the latest version
me> of Python. I believe += for strings has been optimized to simply
me> extend s when there is only a single reference to it.

John> Sorry, I should have pointed out that you need to read up about
John> time.time() -- what is its resolution on your box? -- and
John> time.clock() and the timeit module.

time.time() should be plenty good enough for this particular task on my
machine (Mac OSX). time.time() has plenty of resolution:
5.00679016113e-06

It's time.clock() on Unix-y systems that is too coarse:
0.0

I ran the size of the loop to 2**24 and still only saw linear growth in the
later versions of Python. The machine (my Mac laptop) was otherwise idle.
I only reduced the loop length in what I posted before because of the
quadratic growth in Python 2.2 and 2.3. It took so long to run my script
using 2.2 and 2.3 I made a slight change so that it bailed out of the larger
loops:

#!/usr/bin/env python

from __future__ import division

def foo(kcount):
s = ''
for i in xrange(kcount) :
s += str(i) + ' '

import time

for i in xrange(5,25):
t = time.time()
foo(2**i)
t = time.time() - t
print "2**%d"%i, 2**i, t, t/2**i
if t > 200:
break

I then ran it using this shell command:

for v in 2.6 2.5 2.4 2.3 2.2 ; do
echo $v
time python$v strcopy.py
done 2>&1 \
| tee strcopy.out

The output is:

2.6
2**5 32 0.000240802764893 7.52508640289e-06
2**6 64 0.000278949737549 4.3585896492e-06
2**7 128 0.000599145889282 4.68082726002e-06
2**8 256 0.00878977775574 3.43350693583e-05
2**9 512 0.00221586227417 4.32785600424e-06
2**10 1024 0.00433588027954 4.23425808549e-06
2**11 2048 0.00897288322449 4.38129063696e-06
2**12 4096 0.0197570323944 4.82349423692e-06
2**13 8192 0.0359501838684 4.38845017925e-06
2**14 16384 0.0721030235291 4.40081930719e-06
2**15 32768 0.146120071411 4.45923069492e-06
2**16 65536 0.292731046677 4.46672129328e-06
2**17 131072 0.692205905914 5.28111195308e-06
2**18 262144 1.20644402504 4.60221872345e-06
2**19 524288 3.34210991859 6.37456878394e-06
2**20 1048576 6.86596488953 6.54789437249e-06
2**21 2097152 10.0534589291 4.79386278585e-06
2**22 4194304 21.4015710354 5.1025321568e-06
2**23 8388608 40.8173680305 4.86580944425e-06
2**24 16777216 84.5512800217 5.039649011e-06

real 2m50.195s
user 2m26.258s
sys 0m2.998s
2.5
2**5 32 0.000205039978027 6.40749931335e-06
2**6 64 0.000274896621704 4.29525971413e-06
2**7 128 0.000594139099121 4.64171171188e-06
2**8 256 0.00110697746277 4.32413071394e-06
2**9 512 0.00236988067627 4.62867319584e-06
2**10 1024 0.0045051574707 4.39956784248e-06
2**11 2048 0.00938105583191 4.58059366792e-06
2**12 4096 0.0197520256042 4.82227187604e-06
2**13 8192 0.0375790596008 4.58728754893e-06
2**14 16384 0.0780160427094 4.76172135677e-06
2**15 32768 0.148911952972 4.54443215858e-06
2**16 65536 0.307368040085 4.69006408821e-06
2**17 131072 0.703125953674 5.36442530574e-06
2**18 262144 1.22114300728 4.6582908908e-06
2**19 524288 2.62232589722 5.00168971485e-06
2**20 1048576 5.06462287903 4.83000076201e-06
2**21 2097152 10.3055510521 4.9140696774e-06
2**22 4194304 24.6841719151 5.88516519429e-06
2**23 8388608 42.5984380245 5.07812953288e-06
2**24 16777216 89.6156759262 5.34151052989e-06

real 2m58.236s
user 2m29.354s
sys 0m2.978s
2.4
2**5 32 0.000231981277466 7.24941492081e-06
2**6 64 0.000316858291626 4.95091080666e-06
2**7 128 0.000571966171265 4.46848571301e-06
2**8 256 0.00112700462341 4.40236181021e-06
2**9 512 0.00228881835938 4.47034835815e-06
2**10 1024 0.00619387626648 6.04870729148e-06
2**11 2048 0.00927710533142 4.52983658761e-06
2**12 4096 0.0188140869141 4.593282938e-06
2**13 8192 0.0386338233948 4.71604289487e-06
2**14 16384 0.0761170387268 4.64581535198e-06
2**15 32768 0.153247117996 4.67673089588e-06
2**16 65536 0.306257009506 4.67311110697e-06
2**17 131072 0.724220991135 5.52536766918e-06
2**18 262144 1.23747801781 4.7206040108e-06
2**19 524288 2.69648981094 5.1431461543e-06
2**20 1048576 5.20070004463 4.9597740599e-06
2**21 2097152 10.6776590347 5.09150459038e-06
2**22 4194304 21.149684906 5.04247782374e-06
2**23 8388608 46.8901240826 5.58973837883e-06
2**24 16777216 110.079385042 6.56124264253e-06

real 3m19.593s
user 2m32.932s
sys 0m3.100s
2.3
2**5 32 0.000223159790039 6.97374343872e-06
2**6 64 0.000349998474121 5.46872615814e-06
2**7 128 0.000737905502319 5.76488673687e-06
2**8 256 0.00150609016418 5.88316470385e-06
2**9 512 0.00307989120483 6.01541250944e-06
2**10 1024 0.00642395019531 6.27338886261e-06
2**11 2048 0.0161211490631 7.87165481597e-06
2**12 4096 0.110109090805 2.68821022473e-05
2**13 8192 0.787949800491 9.61852783803e-05
2**14 16384 3.3133919239 0.000202233393793
2**15 32768 14.3907749653 0.000439171599282
2**16 65536 60.2394678593 0.000919181333302
2**17 131072 295.17253089 0.00225198769294

real 6m15.110s
user 1m54.907s
sys 3m26.747s
2.2
2**5 32 0.000303030014038 9.46968793869e-06
2**6 64 0.000451803207397 7.05942511559e-06
2**7 128 0.00087308883667 6.82100653648e-06
2**8 256 0.00233697891235 9.12882387638e-06
2**9 512 0.00344800949097 6.73439353704e-06
2**10 1024 0.00730109214783 7.12997280061e-06
2**11 2048 0.017196893692 8.39692074805e-06
2**12 4096 0.112847805023 2.75507336482e-05
2**13 8192 0.840929031372 0.00010265246965
2**14 16384 3.05718898773 0.000186596007552
2**15 32768 13.0093569756 0.000397014067858
2**16 65536 57.9497959614 0.00088424371279
2**17 131072 294.361263037 0.00224579821042

real 6m9.514s
user 1m55.257s
sys 3m26.943s

It's clear that the behavior of the 2.4, 2.5 and 2.6 (cvs) versions is
substantially different than the 2.2 and 2.3 versions, and grows linearly as
the string length grows. I believe any slight nonlinearity in the 2.4-2.6
versions is just due to quadratic behavior in the Mac's realloc function.

Skip
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,769
Messages
2,569,582
Members
45,070
Latest member
BiogenixGummies

Latest Threads

Top