Performance Issues please help

Discussion in 'Python' started by PyPK, Jun 2, 2005.

  1. PyPK

    PyPK Guest

    I was testing this piece of code and it takes about 24-30 seconds to do
    a look up on a list(m) of size 1000x1000
    m -> list of size 1000x1000
    import time
    print time.ctime()
    box = {}
    for r,row in enumerate(m):
    for c,e in enumerate(row):
    if box.has_key(e):
    params = box[e]
    box[e] = ( min(c, params[0]), min(r, params[1]),
    max(c, params[2]), max(r, params[3] ) )
    else:
    box[e] = [c, r, c, r]
    print time.ctime()

    Can some one tell me what exactly is taking more time here. Is it
    because I am using dictionaries or what is the problem. Can some one
    help me improve this .Is there a better way to write this.
     
    PyPK, Jun 2, 2005
    #1
    1. Advertising

  2. PyPK

    John Machin Guest

    PyPK wrote:
    > I was testing this piece of code and it takes about 24-30 seconds to do
    > a look up on a list(m) of size 1000x1000
    > m -> list of size 1000x1000
    > import time
    > print time.ctime()
    > box = {}
    > for r,row in enumerate(m):
    > for c,e in enumerate(row):
    > if box.has_key(e):
    > params = box[e]
    > box[e] = ( min(c, params[0]), min(r, params[1]),
    > max(c, params[2]), max(r, params[3] ) )
    > else:
    > box[e] = [c, r, c, r]
    > print time.ctime()
    >
    > Can some one tell me what exactly is taking more time here. Is it
    > because I am using dictionaries or what is the problem. Can some one
    > help me improve this .Is there a better way to write this.
    >


    Without gross changes to the algorithm, and in no particular order:
    0. Stop worrying. Find something else to do during the 30 seconds.
    1. Use psyco, if your [unspecified] platform supports it.
    2. Why has_key()? Try "if e in box:" instead, if your [unspecified]
    version of Python supports it. If it doesn't, (a) consider upgrading (b)
    try doing box_has_key = box.has_key outside the loops and using the
    result inside the loops.
    3. Ensure this code is inside a function, not at global level in a
    script [not apparent from your post].
    4. Outside the loops, put "local_min = min", ditto for max. Then use
    box[e] = (local_min(c, etc etc
    5. Upgrade your [unspecified] hardware.
    6. Use try/except, if indicated by the [unspecified] percentage of times
    "e in box" is true. Hint: printing len(box) at the end might be useful.
    7. Consider using pyrex.
    8. Consider using numarray/mumeric.

    Info req'd for discussing algorithm changes:
    1. How much free memory do you have?
    2. What can you tell us about the distribution of "e"?
    3. Is "m" a rectangle, or are the rows of variable length?
    4. Do you have control over "m" i.e. are you stuck with that data
    structure, or can we design a different one?
     
    John Machin, Jun 2, 2005
    #2
    1. Advertising

  3. PyPK

    PyPK Guest

    >Info req'd for discussing algorithm changes:
    >1. How much free memory do you have?


    - Memory is not a problem but I am working on some timing
    constraints.Thats is the reason I am a bit concerned abt these 30
    seconds

    > 2. What can you tell us about the distribution of "e"?


    - the distribution of e is not fixed.It changes based on what comes
    first in the scan of the list.

    > 3. Is "m" a rectangle, or are the rows of variable length?


    - m is not a restnagle .its of varied dimensions.

    > 4. Do you have control over "m" i.e. are you stuck with that data

    structure, or can we design a different one?

    -I can use 'm' as an array or a list. Other than that I am stuck.

    I am suspecious abput dictionary approach here. Is this slower than if
    i do it with arrays or lists with index of array as my key ?
     
    PyPK, Jun 2, 2005
    #3
  4. PyPK

    John Machin Guest

    PyPK wrote:

    >
    > I am suspecious abput dictionary approach here. Is this slower than if
    > i do it with arrays or lists with index of array as my key ?
    >


    "with index of array as my key" is either redundant -- indicating that
    your answer to my question about "e" should have included something like
    "e is an integer that can range from blah0 to blah_max" -- or not
    understandable.

    In any case, seeing that the code in question is only a few lines, why
    don't *you* write an alternative version and see how fast it runs? And
    let us know?
     
    John Machin, Jun 2, 2005
    #4
  5. PyPK

    Peter Otten Guest

    PyPK wrote:

    > I was testing this piece of code and it takes about 24-30 seconds to do
    > a look up on a list(m) of size 1000x1000
    > m -> list of size 1000x1000
    > import time
    > print time.ctime()
    > box = {}
    > for r,row in enumerate(m):
    > for c,e in enumerate(row):
    > if box.has_key(e):
    > params = box[e]
    > box[e] = ( min(c, params[0]), min(r, params[1]),
    > max(c, params[2]), max(r, params[3] ) )
    > else:
    > box[e] = [c, r, c, r]
    > print time.ctime()
    >
    > Can some one tell me what exactly is taking more time here. Is it
    > because I am using dictionaries or what is the problem. Can some one
    > help me improve this .Is there a better way to write this.


    AFAICT the row index 'r' can only grow. Therefore one min() and one max()
    should be superfluous (untested):

    def make_box(m):
    box = {}
    for r, row in enumerate(m):
    for c, e in enumerate(row):
    if e in box:
    left, top, right, bottom = box[e]
    box[e] = (min(c, left), top, max(c, right), r)
    else:
    box[e] = (c, r, c, r)
    return box

    Peter
     
    Peter Otten, Jun 2, 2005
    #5
  6. PyPK

    PyPK Guest

    Yep that improved the speed by about 50% now it takes about 10 secs
    instead of 24 seconds..Thanks much. I guess that is the best we could
    do right.It would be really helpful if I could get it less than 5
    seconds. Any suggestions on that??
     
    PyPK, Jun 2, 2005
    #6
  7. PyPK wrote:
    > Yep that improved the speed by about 50% now it takes about 10 secs
    > instead of 24 seconds..Thanks much. I guess that is the best we could
    > do right.It would be really helpful if I could get it less than 5
    > seconds. Any suggestions on that??
    >

    Things to try:
    * in-lining the min and max expressions
    * depending on the distribution of e, it may be faster to catch KeyErrors

    def search1(m):
    box = {}
    for r,row in enumerate(m):
    for c,e in enumerate(row):
    try:
    minc, minr, maxc, maxr = box[e]
    box[e] = ( c < minc and c or minc,
    r < minr and r or minr,
    c > maxc and c or maxc,
    r > maxr and r or maxr)
    except KeyError:
    box[e] = (c, r, c, r)
    return box

    Michael
     
    Michael Spencer, Jun 2, 2005
    #7
  8. Michael Spencer wrote:
    > def search1(m):
    > box = {}
    > for r,row in enumerate(m):
    > for c,e in enumerate(row):
    > try:
    > minc, minr, maxc, maxr = box[e]
    > box[e] = ( c < minc and c or minc,
    > r < minr and r or minr,
    > c > maxc and c or maxc,
    > r > maxr and r or maxr)
    > except KeyError:
    > box[e] = (c, r, c, r)
    > return box
    >
    > Michael
    >


    Sorry, my earlier solution was incorrect since:
    c < minc and c or minc # evaluates to minc if c == 0
    Two of the tests are unnecessary, as pointed out by Peter
    The remaining test:
    c > maxc and c or maxc
    does not suffer from the same problem, since c cannot both be 0 and greater than
    maxc

    The following version, still untested ;-) has the correction:

    def search2(m):
    box = {}
    for r,row in enumerate(m):
    for c,e in enumerate(row):
    try: # use this form if e will be found 90%+ of the time
    # otherwise, use: if e in box:
    minc, minr, maxc, maxr = box[e]
    # note correction for c == 0
    # also Peter's simplification
    box[e] = ( c and (c < minc and c or minc),
    minr,
    c > maxc and c or maxc,
    r)

    except KeyError:
    box[e] = (c, r, c, r)
    return box

    Michael
     
    Michael Spencer, Jun 2, 2005
    #8
  9. PyPK

    John Machin Guest

    Michael Spencer wrote:

    > minc, minr, maxc, maxr = box[e]
    > # note correction for c == 0
    > # also Peter's simplification
    > box[e] = ( c and (c < minc and c or minc),
    > minr,
    > c > maxc and c or maxc,
    > r)
    >


    This may be slightly faster (when c > 0), and slightly less opaque:

    minc, minr, maxc, maxr = box[e]
    # note correction for c == 0
    # also Peter's simplification
    if c < minc:
    minc = c
    box[e] = ( minc,
    minr,
    c > maxc and c or maxc,
    r)
     
    John Machin, Jun 2, 2005
    #9
  10. PyPK

    John Machin Guest

    Michael Spencer wrote:

    > minc, minr, maxc, maxr = box[e]
    > # note correction for c == 0
    > # also Peter's simplification
    > box[e] = ( c and (c < minc and c or minc),
    > minr,
    > c > maxc and c or maxc,
    > r)
    >


    This may be slightly faster (when c > 0), and slightly less opaque:

    minc, minr, maxc, maxr = box[e]
    # note correction for c == 0
    # also Peter's simplification
    if c < minc:
    minc = c
    box[e] = ( minc,
    minr,
    c > maxc and c or maxc,
    r)
     
    John Machin, Jun 2, 2005
    #10
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. JMaelstrom

    IIS 6 vs IIS 5 ASP.NET Performance Issues

    JMaelstrom, Dec 9, 2003, in forum: ASP .Net
    Replies:
    2
    Views:
    4,674
    shan420
    Apr 30, 2010
  2. John Q. Smith

    SQLServer session state performance issues

    John Q. Smith, Jan 12, 2004, in forum: ASP .Net
    Replies:
    4
    Views:
    3,460
    =?Utf-8?B?Sm9obiBRIFNtaXRo?=
    Jan 16, 2004
  3. Replies:
    4
    Views:
    530
    Chris Uppal
    May 5, 2005
  4. KK
    Replies:
    2
    Views:
    622
    Big Brian
    Oct 14, 2003
  5. MuZZy
    Replies:
    7
    Views:
    1,779
    Mike Hewson
    Jan 7, 2005
Loading...

Share This Page