Performance Issues please help

PyPK · Jun 2, 2005

I was testing this piece of code and it takes about 24-30 seconds to do
a look up on a list(m) of size 1000x1000
m -> list of size 1000x1000
import time
print time.ctime()
box = {}
for r,row in enumerate(m):
for c,e in enumerate(row):
if box.has_key(e):
params = box[e]
box[e] = ( min(c, params[0]), min(r, params[1]),
max(c, params[2]), max(r, params[3] ) )
else:
box[e] = [c, r, c, r]
print time.ctime()

Can some one tell me what exactly is taking more time here. Is it
because I am using dictionaries or what is the problem. Can some one
help me improve this .Is there a better way to write this.

John Machin · Jun 2, 2005

PyPK said:
I was testing this piece of code and it takes about 24-30 seconds to do
a look up on a list(m) of size 1000x1000
m -> list of size 1000x1000
import time
print time.ctime()
box = {}
for r,row in enumerate(m):
for c,e in enumerate(row):
if box.has_key(e):
params = box[e]
box[e] = ( min(c, params[0]), min(r, params[1]),
max(c, params[2]), max(r, params[3] ) )
else:
box[e] = [c, r, c, r]
print time.ctime()

Can some one tell me what exactly is taking more time here. Is it
because I am using dictionaries or what is the problem. Can some one
help me improve this .Is there a better way to write this.

Without gross changes to the algorithm, and in no particular order:
0. Stop worrying. Find something else to do during the 30 seconds.
1. Use psyco, if your [unspecified] platform supports it.
2. Why has_key()? Try "if e in box:" instead, if your [unspecified]
version of Python supports it. If it doesn't, (a) consider upgrading (b)
try doing box_has_key = box.has_key outside the loops and using the
result inside the loops.
3. Ensure this code is inside a function, not at global level in a
script [not apparent from your post].
4. Outside the loops, put "local_min = min", ditto for max. Then use
box[e] = (local_min(c, etc etc
5. Upgrade your [unspecified] hardware.
6. Use try/except, if indicated by the [unspecified] percentage of times
"e in box" is true. Hint: printing len(box) at the end might be useful.
7. Consider using pyrex.
8. Consider using numarray/mumeric.

Info req'd for discussing algorithm changes:
1. How much free memory do you have?
2. What can you tell us about the distribution of "e"?
3. Is "m" a rectangle, or are the rows of variable length?
4. Do you have control over "m" i.e. are you stuck with that data
structure, or can we design a different one?

PyPK · Jun 2, 2005

Info req'd for discussing algorithm changes:

1. How much free memory do you have?

- Memory is not a problem but I am working on some timing
constraints.Thats is the reason I am a bit concerned abt these 30
seconds

2. What can you tell us about the distribution of "e"?

- the distribution of e is not fixed.It changes based on what comes
first in the scan of the list.

3. Is "m" a rectangle, or are the rows of variable length?

- m is not a restnagle .its of varied dimensions.

4. Do you have control over "m" i.e. are you stuck with that data

structure, or can we design a different one?

-I can use 'm' as an array or a list. Other than that I am stuck.

I am suspecious abput dictionary approach here. Is this slower than if
i do it with arrays or lists with index of array as my key ?

John Machin · Jun 2, 2005

PyPK said:
I am suspecious abput dictionary approach here. Is this slower than if
i do it with arrays or lists with index of array as my key ?

"with index of array as my key" is either redundant -- indicating that
your answer to my question about "e" should have included something like
"e is an integer that can range from blah0 to blah_max" -- or not
understandable.

In any case, seeing that the code in question is only a few lines, why
don't *you* write an alternative version and see how fast it runs? And
let us know?

Peter Otten · Jun 2, 2005

PyPK said:
I was testing this piece of code and it takes about 24-30 seconds to do
a look up on a list(m) of size 1000x1000
m -> list of size 1000x1000
import time
print time.ctime()
box = {}
for r,row in enumerate(m):
for c,e in enumerate(row):
if box.has_key(e):
params = box[e]
box[e] = ( min(c, params[0]), min(r, params[1]),
max(c, params[2]), max(r, params[3] ) )
else:
box[e] = [c, r, c, r]
print time.ctime()

Can some one tell me what exactly is taking more time here. Is it
because I am using dictionaries or what is the problem. Can some one
help me improve this .Is there a better way to write this.

AFAICT the row index 'r' can only grow. Therefore one min() and one max()
should be superfluous (untested):

def make_box(m):
box = {}
for r, row in enumerate(m):
for c, e in enumerate(row):
if e in box:
left, top, right, bottom = box[e]
box[e] = (min(c, left), top, max(c, right), r)
else:
box[e] = (c, r, c, r)
return box

Peter

PyPK · Jun 2, 2005

Yep that improved the speed by about 50% now it takes about 10 secs
instead of 24 seconds..Thanks much. I guess that is the best we could
do right.It would be really helpful if I could get it less than 5
seconds. Any suggestions on that??

Michael Spencer · Jun 2, 2005

PyPK said:
Yep that improved the speed by about 50% now it takes about 10 secs
instead of 24 seconds..Thanks much. I guess that is the best we could
do right.It would be really helpful if I could get it less than 5
seconds. Any suggestions on that??

Things to try:
* in-lining the min and max expressions
* depending on the distribution of e, it may be faster to catch KeyErrors

def search1(m):
box = {}
for r,row in enumerate(m):
for c,e in enumerate(row):
try:
minc, minr, maxc, maxr = box[e]
box[e] = ( c < minc and c or minc,
r < minr and r or minr,
c > maxc and c or maxc,
r > maxr and r or maxr)
except KeyError:
box[e] = (c, r, c, r)
return box

Michael

Michael Spencer · Jun 2, 2005

Michael said:
> def search1(m):
box = {}
for r,row in enumerate(m):
for c,e in enumerate(row):
try:
minc, minr, maxc, maxr = box[e]
box[e] = ( c < minc and c or minc,
r < minr and r or minr,
c > maxc and c or maxc,
r > maxr and r or maxr)
except KeyError:
box[e] = (c, r, c, r)
return box

Michael

Sorry, my earlier solution was incorrect since:
c < minc and c or minc # evaluates to minc if c == 0
Two of the tests are unnecessary, as pointed out by Peter
The remaining test:
c > maxc and c or maxc
does not suffer from the same problem, since c cannot both be 0 and greater than
maxc

The following version, still untested ;-) has the correction:

def search2(m):
box = {}
for r,row in enumerate(m):
for c,e in enumerate(row):
try: # use this form if e will be found 90%+ of the time
# otherwise, use: if e in box:
minc, minr, maxc, maxr = box[e]
# note correction for c == 0
# also Peter's simplification
box[e] = ( c and (c < minc and c or minc),
minr,
c > maxc and c or maxc,
r)

except KeyError:
box[e] = (c, r, c, r)
return box

Michael

John Machin · Jun 2, 2005

Michael said:
minc, minr, maxc, maxr = box[e]
# note correction for c == 0
# also Peter's simplification
box[e] = ( c and (c < minc and c or minc),
minr,
c > maxc and c or maxc,
r)

This may be slightly faster (when c > 0), and slightly less opaque:

minc, minr, maxc, maxr = box[e]
# note correction for c == 0
# also Peter's simplification
if c < minc:
minc = c
box[e] = ( minc,
minr,
c > maxc and c or maxc,
r)

John Machin · Jun 2, 2005

Michael said:
minc, minr, maxc, maxr = box[e]
# note correction for c == 0
# also Peter's simplification
box[e] = ( c and (c < minc and c or minc),
minr,
c > maxc and c or maxc,
r)

This may be slightly faster (when c > 0), and slightly less opaque:

minc, minr, maxc, maxr = box[e]
# note correction for c == 0
# also Peter's simplification
if c < minc:
minc = c
box[e] = ( minc,
minr,
c > maxc and c or maxc,
r)

I dont get this. Please help me!!	2	Jan 24, 2023
Translater + module + tkinter	1	Feb 16, 2023
Need help with this script	4	Mar 12, 2023
Help with my responsive home page	2	Dec 14, 2022
Trouble with prediction code, for the life of me I can't figure out why it isnt running properly. Help would be appreciated.	0	Jul 8, 2023
Help needed with code	5	Mar 7, 2021
Help with code	0	Jun 12, 2022
Need help with code on website (noob)	2	Jul 18, 2022

Performance Issues please help

PyPK

John Machin

PyPK

John Machin

Peter Otten

PyPK

Michael Spencer

Michael Spencer

John Machin

John Machin

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads