Speed quirk: redundant line gives six-fold speedup

M

MrJean1

Two observations:

1 - The difference in run time with and without the dummy* globals is
due to a difference in the number of invokations of the search()
function: 1,140 resp. 27,530 in my environment.

To verify, just change the line

def search():
....

to

searches = 0
def search():
global searches
searches += 1
....

and add at the very end

print searches, "searches"


2 - The run times with and without the dummy* variables is equal(ly
slow) if the LLentry() class and min() function call are modified to be
independent of the object value.

Change line

class LLentry: pass

to

LLinst = 0
class LLentry(object):
def __init__(self):
global LLinst
LLinst += 1
self.I = LLinst

and change line

mm = min((c.S, c) for c in rowitems(h))[1].D

to

mm = min((c.S, c.I, c) for c in rowitems(h))[2].D


/Jean Brouwers
 
C

Claudio Grondi

I've learnt my lesson :) Thank you for your help, and apologies
for wasting other people's time with this as well as my own!

I've learnt my lesson reading through this thread, too.

I am glad to be given the chance of wasting my time
with it and very happy and thankful, that you posted
your problem here before I have bumped into similar
one myself. I suppose, that there are many others
on this newsgroup who appreciated your posting,
too.

Claudio
 
B

Benjamin Niemann

Raymond said:
Thanks for your post. It is cute, brilliant, and interesting.

I haven't had time to investigate but can point at the likely cause.
All of the global names are stored in a single hash table. Search time
is dictated by two factors, the sparseness of the hash table and the
distribution of hash values. With respect to sparseness, whenever that
table becomes 2/3 full, it grows by a factor of four and becomes only
1/6 full (leading to many fewer collisions). With respect to
distribution, it should be noted that string hash values are decidely
non-random and your variable names likely congested consecutive spaces
in a nearly full table (resulting in seven times as many search probes
to find a global value).

When the extra value was added, it likely resized the table four-fold
and redistributed the hash values into fewer consecutive positions.

If that's the cause, then here's another reason to use long, descriptive
names instead of C64-BASIC style a, b, c, i, j... - with long names the
chances of hash collisions are pretty low.
Or everyone will start optimizing their programs by using long, *random*
names ;)
 
R

Raymond Hettinger

[Raymond Hettinger]
[Benjamin Niemann]
If that's the cause, then here's another reason to use long, descriptive
names instead of C64-BASIC style a, b, c, i, j... - with long names the
chances of hash collisions are pretty low.
Or everyone will start optimizing their programs by using long, *random*
names ;)

The wink applied to second line but could have also applied to the
first. For the most part, the non-randomness of string hashes tends to
work in your favor -- it can result in collision free hash tables.

For optimization, the best approach is to use local variables in your
innermost loops.

Alternatively, if you want to be tricky, try something like this:

globals().update(globals())

What this does and why it can improve search times is left as an
exercise for the reader :)


Raymond
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Similar Threads


Members online

No members online now.

Forum statistics

Threads
473,774
Messages
2,569,598
Members
45,152
Latest member
LorettaGur
Top