A performance issue when using default value

K

keakon

I've found strange performance issue when using default value, the
test code is list below:

from timeit import Timer

def f(x):
y = x
y.append(1)
return y

def g(x=[]):
y = []
y.append(1)
return y

def h(x=[]):
y = x
y.append(1)
return y

def f2(x):
y = x
y.append(1)
return y + []

def g2(x=[]):
y = []
y.append(1)
return y + []

def h2(x=[]):
y = x
y.append(1)
return y + []

TIMES = 10000
print Timer('f([])','from __main__ import f, g, h').timeit(TIMES)
print Timer('g()','from __main__ import f, g, h').timeit(TIMES)
print Timer('h([])','from __main__ import f, g, h').timeit(TIMES)
print Timer('h()','from __main__ import f, g, h').timeit(TIMES)
print Timer('f2([])','from __main__ import f2, g2, h2').timeit(TIMES)
print Timer('g2()','from __main__ import f2, g2, h2').timeit(TIMES)
print Timer('h2([])','from __main__ import f2, g2, h2').timeit(TIMES)
print Timer('h2()','from __main__ import f2, g2, h2').timeit(TIMES)


I tested it with Python 2.5.4, 2.6.4 and 3.1.1 on Windows XP, and get
almost the same result:
0.00449247041174
0.00439608944712
0.00455867994396
0.00327471787615
0.00791581052899
0.00684919452053
0.00734311204357
0.30974942346

h2() is about 42 times slower than h2([]), but h() is a litter faster
than h([]).

If change TIMES to 20000, other results are 2 times than before, but h2
() is 4 times(about 1.2 sec) than before.

Is there any tricks in it?
 
A

alex23

keakon said:
def h2(x=[]):
  y = x
  y.append(1)
  return y + []
h2() is about 42 times slower than h2([]), but h() is a litter faster
than h([]).

Are you aware that 'y = x' _doesn't_ make a copy of [], that it
actually points to the same list as x?

My guess is that the slowdown you're seeing is due to the increasing
size of x _per timing iteration_. h2([]) will pass a new list each
time, while h2() will append to the same list _every_ time. The
difference between h & h2 is due to the concatenation of a new list to
the built one: the longer the default list grows, the longer this will
take, as extending a list takes O(k) time, with k being the number of
elements.
 
K

keakon

Even this style is 41 times faster than h2():

def i2(x=[]):
y = x
if not y: # add this line
y = [] # add this line
y.append(1)
return y + []

print Timer('i2()','from __main__ import f2, g2, h2, i2').timeit
(TIMES)

Time: 0.00742356919664
 
C

Chris Rebert

I've found strange performance issue when using default value, the
test code is list below:

from timeit import Timer

def f(x):
 y = x
 y.append(1)
 return y

def g(x=[]):
 y = []
 y.append(1)
 return y

def h(x=[]):
 y = x
 y.append(1)
 return y

def f2(x):
 y = x
 y.append(1)
 return y + []

def g2(x=[]):
 y = []
 y.append(1)
 return y + []

def h2(x=[]):
 y = x
 y.append(1)
 return y + []

TIMES = 10000
print Timer('f([])','from __main__ import f, g, h').timeit(TIMES)
print Timer('g()','from __main__ import f, g, h').timeit(TIMES)
print Timer('h([])','from __main__ import f, g, h').timeit(TIMES)
print Timer('h()','from __main__ import f, g, h').timeit(TIMES)
print Timer('f2([])','from __main__ import f2, g2, h2').timeit(TIMES)
print Timer('g2()','from __main__ import f2, g2, h2').timeit(TIMES)
print Timer('h2([])','from __main__ import f2, g2, h2').timeit(TIMES)
print Timer('h2()','from __main__ import f2, g2, h2').timeit(TIMES)


I tested it with Python 2.5.4, 2.6.4 and 3.1.1 on Windows XP, and get
almost the same result:
0.00449247041174
0.00439608944712
0.00455867994396
0.00327471787615
0.00791581052899
0.00684919452053
0.00734311204357
0.30974942346

h2() is about 42 times slower than h2([]), but h() is a litter faster
than h([]).

If change TIMES to 20000, other results are 2 times than before, but h2
() is 4 times(about 1.2 sec) than before.

Is there any tricks in it?

Are you aware of the following pitfall?:
.... a.append(7)
.... return a
print foo() [7]
print foo() [7, 7]
print foo()
[7, 7, 7]

i.e. the default argument is only evaluated once (when the function is
defined) and is then reused every call when the caller doesn't provide
a value.

Cheers,
Chris
 
S

Steven D'Aprano

I've found strange performance issue when using default value, the test
code is list below:

from timeit import Timer

def f(x):
y = x
y.append(1)
return y

def g(x=[]):
y = []
y.append(1)
return y

def h(x=[]):
y = x
y.append(1)
return y

def f2(x):
y = x
y.append(1)
return y + []

def g2(x=[]):
y = []
y.append(1)
return y + []

def h2(x=[]):
y = x
y.append(1)
return y + []

TIMES = 10000
print Timer('f([])','from __main__ import f, g, h').timeit(TIMES) print
Timer('g()','from __main__ import f, g, h').timeit(TIMES) print
Timer('h([])','from __main__ import f, g, h').timeit(TIMES) print
Timer('h()','from __main__ import f, g, h').timeit(TIMES) print
Timer('f2([])','from __main__ import f2, g2, h2').timeit(TIMES) print
Timer('g2()','from __main__ import f2, g2, h2').timeit(TIMES) print
Timer('h2([])','from __main__ import f2, g2, h2').timeit(TIMES) print
Timer('h2()','from __main__ import f2, g2, h2').timeit(TIMES)


I tested it with Python 2.5.4, 2.6.4 and 3.1.1 on Windows XP, and get
almost the same result:
0.00449247041174
0.00439608944712
0.00455867994396
0.00327471787615
0.00791581052899
0.00684919452053
0.00734311204357
0.30974942346


You've only run those results once, haven't you?

That is not trustworthy. In a multi-tasking operating system, including
Windows, some other process might have run for some milliseconds,
throwing the results off.

More importantly, if you had run it multiple times, you'd see this:

.... Timer('h2()','from __main__ import f2, g2, h2').timeit(TIMES)
....
0.0190200805664
0.315117120743
0.941411972046
1.56618499756
2.18553495407
2.79832291603
3.44865894318


h2 (and probably h, but I haven't tested it) get slower and slower and
slower every time you run it.

Why? I'm pretty sure this is in the FAQs, but default values are created
once, not every time the function is called. So you create a function
with the default value of []. Call the function, and that list gets 1
appended to it, so the default value is now [1]. Call the function again,
and it becomes [1, 1]. And so on, and so on, and so on.

The other functions create a new empty list each time the function is
called, so you're comparing the time it takes to grow an ever-bigger list
with the time it takes to create a tiny list.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,744
Messages
2,569,483
Members
44,903
Latest member
orderPeak8CBDGummies

Latest Threads

Top