why are these exec times so different?


r.e.s.

Should one expect the following execution times
to be so different? ...

Module_A takes about 4 seconds to run:
x = (stuff_1)
(stuff_2)

Module_B takes about 80 seconds to run:
x = (stuff_1)
def f(y):
    (stuff_2)
f(x)

where (stuff_1) is the same in both modules,
and likewise (stuff_2). I understand that they
should differ in execution time, but by a
factor of 20?!

(I'll supply the actual code if requested, but
I didn't want to clutter things up needlessly.
I'm using PythonWin, win32all build 163.)

Thanks.
 

Bengt Richter

> Should one expect the following execution times
> to be so different? ...
>
> Module_A takes about 4 seconds to run:
> x = (stuff_1)
> (stuff_2)
>
> Module_B takes about 80 seconds to run:
> x = (stuff_1)
> def f(y):
>     (stuff_2)
> f(x)
>
> where (stuff_1) is the same in both modules,
> and likewise (stuff_2). I understand that they
> should differ in execution time, but by a
> factor of 20?!
>
> (I'll supply the actual code if requested, but
> I didn't want to clutter things up needlessly.
> I'm using PythonWin, win32all build 163.)
Timing is tricky, because of all the extra time that can sneak in,
so the thing is to make the circumstances for both timings as nearly
equal as possible. It's best if you can run them separately, in separate
executions of Python, and do a short loop, taking the best time of several
runs. Don't do too many if you have a CPU that gets hot and slows its clock
to avoid burning; by the same token, don't start a second test before the
CPU has cooled off from the first. Looping should get rid of cache-miss
time loss, and, with luck, iterations skewed by garbage collection will be
thrown out when you take the best time. Usually looping and taking the best
will be OK to do in the same program for several tests, but sometimes not.
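
A minimal best-of-N timing helper along those lines might look like the
following sketch (the name best_of is made up for illustration; time.clock
was the usual wall-clock timer on Windows in this era of Python):

import time

def best_of(trials, func, *args):
    """Run func(*args) several times; return the smallest elapsed time."""
    best = None
    for i in xrange(trials):
        t0 = time.clock()
        func(*args)
        t1 = time.clock()
        elapsed = t1 - t0
        if best is None or elapsed < best:
            best = elapsed
    return best

# e.g.: print best_of(5, some_function, some_arg)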

Make sure you are comparing apples to apples, too. I.e., if stuff_2 involves
copying a huge list, or using range instead of xrange, etc., it's not a fair
comparison. Generally, stuff done with local variables inside a function
ought to be faster than the same code accessing globals.
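
If a tight loop keeps reaching for a global or a module attribute, one
common trick is to bind it to a local name first. A sketch (math.sqrt here
is just a stand-in for whatever the loop actually calls):

import math

def slow_version(xs):
    # each iteration pays a global lookup plus an attribute lookup
    out = []
    for x in xs:
        out.append(math.sqrt(x))
    return out

def fast_version(xs, sqrt=math.sqrt):
    # sqrt is now a local, bound once when the def statement is executed
    out = []
    for x in xs:
        out.append(sqrt(x))
    return out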

Not to mention all the background stuff that could be going on. Don't be dragging windows
around or be playing solitaire or playing music or downloading while testing. Maybe some
spy-ware kicked in and called home? ;-)

Regards,
Bengt Richter
 

Jeff Epler

> Should one expect the following execution times
> to be so different? ...

No reason comes immediately to mind. Usually,
def f():
    suite
f()
will run a little faster than
suite
because LOAD_FAST variable references, which are used for function-scope
variables inside functions but not at the module top-level scope, are
speedier than the LOAD_NAME and LOAD_GLOBAL opcodes.
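
You can see this for yourself with the dis module; a quick sketch comparing
the bytecode for a function-local variable against the same assignment
compiled at module scope:

import dis

def f():
    y = 1       # inside a function: compiles to STORE_FAST / LOAD_FAST
    return y

dis.dis(f)
dis.dis(compile('x = 1', '<module>', 'exec'))  # module scope: STORE_NAME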

So it's safe to say that the real reason for this huge difference in
favor of the "global scope" version lies in your code somewhere. The
part you didn't post.

Jeff
 

r.e.s.

To show that what I'm seeing isn't because of
local/global differences, here are dummy functions
that exhibit the same behavior. (And I've tried to
minimize OS background activity, etc., when testing.)

The results are consistently repeatable: For the
given string s, f1(s) takes >10 times longer to run
than f2(s) -- f1(s) and f2(s) return the same value,
and contain essentially the same code, except that
f2 calls function g to do the common inner loop:

def f1(s):
    """ do outer & inner loop """
    D = {}
    for i in xrange(len(s)):
        for j in xrange(len(s)-i+1):
            D[s[j:j+i]] = True
    return len(D)

def f2(s):
    """ same as f1, but call g to do inner loop """
    count = 0
    for i in xrange(len(s)):
        count += g(s,i)
    return count

def g(s,i):
    """ the common inner loop """
    D = {}
    for j in xrange(len(s)-i+1):
        D[s[j:j+i]] = True
    return len(D)

def string01(n):
    """Return first n chars of a fixed pseudorandom string."""
    random.seed(0)
    return ''.join([ random.choice('01') for i in xrange(n) ])

import random, time
s = string01(1000)
t0 = time.clock()
test, count = 'f1:', f1(s)
#test, count = 'f2:', f2(s)
t1 = time.clock()
print test, count, t1-t0
 

r.e.s.

Gads! As soon as I clicked Send, I saw the obvious.

def f1(s):
    """ do outer & inner loop """
    D = {}
    for i in xrange(len(s)):
        for j in xrange(len(s)-i+1):
            D[s[j:j+i]] = True
    return len(D)

To be comparable to f2, the `D = {}` needs to move inside the outer
loop, so that a fresh D is built for each i, just as each call to g
does; otherwise D grows much larger -- it's the growth of D that
accounts for the timing difference I was noticing! (See the sketch
just below.)
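
For concreteness, a comparable version would also accumulate the per-i
counts the way f2 does. A sketch (the name f1_fixed is made up here):

def f1_fixed(s):
    """ like f1, but with a fresh D per outer pass, as f2 gets via g """
    count = 0
    for i in xrange(len(s)):
        D = {}
        for j in xrange(len(s)-i+1):
            D[s[j:j+i]] = True
        count += len(D)
    return count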

Sorry for bothering everyone with my "learning
experience".
 

Skip Montanaro

res> To show that what I'm seeing isn't because of local/global
res> differences, here are dummy functions that exhibit the same
res> behavior. (And I've tried to minimize os background activity, etc,
res> when testing.)

res> The results are consistently repeatable: For the given string s,
res> f1(s) takes >10 times longer to run than f2(s) -- f1(s) and f2(s)
res> return the same value, and contain essentially the same code,
res> except that f2 calls function g to do the common inner loop:

...

On my machine, f2 runs about twice as fast as f1. The only obvious
difference I can see is that in f1 D grows to length 491678, while in g the
dictionary D is always much smaller than that. The resizing cost of f1.D
may outweigh the creation costs of all the smaller g.D instances.

Here's a slightly modified version of your script which prints the
distribution of g.D sizes (modulo 10) at the end of f2. Those which are
very small won't need to be resized at all. A dictionary of length 490000
will have to be resized quite a bit. Also, the larger dictionary (depending
on how the key hashing works) may incur a much larger number of collisions
than the many smaller dictionaries, further decreasing performance.

Skip

def f1(s):
    """ do outer & inner loop """
    D = {}
    for i in xrange(len(s)):
        for j in xrange(len(s)-i+1):
            D[s[j:j+i]] = True
    return len(D)

def f2(s):
    """ same as f1, but call g to do inner loop """
    count = 0
    dsizes = {}
    for i in xrange(len(s)):
        c = g(s,i)
        n = dsizes.get(c%10, 0) + 1
        dsizes[c%10] = n
        count += c
    print dsizes
    return count

def g(s,i):
    """ the common inner loop """
    D = {}
    for j in xrange(len(s)-i+1):
        D[s[j:j+i]] = True
    return len(D)

def string01(n):
    """Return first n chars of a fixed pseudorandom string."""
    random.seed(0)
    return ''.join([ random.choice('01') for i in xrange(n) ])

import random, time
s = string01(1000)
for func in (f1, f2):
    t0 = time.clock()
    count = func(s)
    t1 = time.clock()
    print func.func_name, count, t1-t0
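
To isolate just the dictionary-size effect described above, one could also
time a single large dict against many small dicts holding the same total
number of keys. A rough sketch (the sizes are arbitrary, and how big the
gap is, if any, will depend on the machine):

import time

def one_big_dict(n):
    """ insert n distinct keys into a single growing dict """
    D = {}
    for i in xrange(n):
        D[i] = True
    return len(D)

def many_small_dicts(n, k):
    """ insert the same n keys, k at a time, into fresh small dicts """
    total = 0
    for start in xrange(0, n, k):
        D = {}
        for i in xrange(start, min(start + k, n)):
            D[i] = True
        total += len(D)
    return total

n = 500000
t0 = time.clock()
one_big_dict(n)
t1 = time.clock()
many_small_dicts(n, 500)
t2 = time.clock()
print 'one big:', t1-t0, 'many small:', t2-t1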
 

r.e.s.

Skip Montanaro said:
> The only obvious difference I can see is that in f1 D grows to
> length 491678, while in g the dictionary D is always much smaller
> than that. The resizing cost of f1.D may outweigh the creation
> costs of all the smaller g.D instances.

Exactly ... I'd somehow failed to notice that.

> Here's a slightly modified version of your script which prints the
> distribution of g.D sizes (modulo 10) at the end of f2. Those which
> are very small won't need to be resized at all. A dictionary of
> length 490000 will have to be resized quite a bit. Also, the larger
> dictionary (depending on how the key hashing works) may incur a much
> larger number of collisions than the many smaller dictionaries,
> further decreasing performance.
> [...]

Thanks for the feedback. Being new to Python,
I find these little code snippets quite helpful.

--r.e.s.
 
