fastest data structure for retrieving objects identified by (x,y) tuple?

Discussion in 'Python' started by Benjamin Jessup, Oct 3, 2012.

  1. I have a group of objects identified by unique (x,y) pairs and I want to
    find out an object's "neighbors" in a matrix of size 2400 x 2400.
    #############
    #obj#   #   #
    #############
    #   #obj#   #   3 x 3 Example
    #############
    #   #   #   #
    #############
    There is either a neighbor, or a null value. I always know the (x,y)
    pair to check the neighbors of, so is doing,
    >> obj = grid[x][y] #lists, doesn't scale with num of objects

    or,
    >> obj = grid.get((x,y),None) #dictionary, scales with num of objects

    the fastest? I can't seem to find a conclusion by testing each alone,
    then in the full environment. Is it that, depending on the number of
    objects, each has an advantage?
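
    In case it helps, the neighbor check itself amounts to something like
    this (a simplified sketch; the function names are just for
    illustration):

    # offsets of the 8 surrounding cells
    OFFSETS = [(-1, -1), (-1, 0), (-1, 1),
               ( 0, -1),          ( 0, 1),
               ( 1, -1), ( 1, 0), ( 1, 1)]

    def neighbors_from_dict(grid, x, y):
        # grid maps (x, y) -> object; missing keys simply come back as None
        return [grid.get((x + dx, y + dy)) for dx, dy in OFFSETS]

    def neighbors_from_lists(grid, x, y, size=2400):
        # grid is a list of lists; skip coordinates that fall off the matrix
        return [grid[x + dx][y + dy] for dx, dy in OFFSETS
                if 0 <= x + dx < size and 0 <= y + dy < size]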

    I know the fastest way to retrieve them would be to have them store
    pointers to their neighbors, then use those for retrieval. When large
    numbers of objects are changing their (x,y) pairs, rebuilding the
    pointers is too slow.
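
    Roughly, that caching scheme looks like this (a simplified,
    hypothetical sketch; the class and function names are just
    illustrative):

    class Cell(object):
        def __init__(self, x, y):
            self.x = x
            self.y = y
            self.neighbors = []   # direct references, filled in below

    def rebuild_neighbors(cells):
        # cells maps (x, y) -> Cell; has to be re-run whenever objects move
        offsets = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
                   (0, 1), (1, -1), (1, 0), (1, 1)]
        for (x, y), cell in cells.items():
            cell.neighbors = [cells[(x + dx, y + dy)]
                              for dx, dy in offsets
                              if (x + dx, y + dy) in cells]

    Retrieval is then just cell.neighbors, but when a lot of objects move,
    rebuild_neighbors() has to touch every object again, and that is the
    part that is too slow.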
    Benjamin Jessup, Oct 3, 2012
    #1

  2. Re: fastest data structure for retrieving objects identified by (x,y) tuple?

    On Wed, 03 Oct 2012 18:30:24 -0400, Benjamin Jessup wrote:

    > I have a group of objects identified by unique (x,y) pairs and I want to
    > find out an object's "neighbors" in a matrix of size 2400 x 2400.

    [...]
    > There is either a neighbor, or a null value. I always know the (x,y)
    > pair to check the neighbors of, so is doing,
    > >> obj = grid[x][y] #lists, doesn't scale with num of objects

    > or,
    > >> obj = grid.get((x,y),None) #dictionary, scales with num of objects

    > the fastest? I can't seem to find a conclusion by testing each alone,
    > then in the full environment. Is it that, depending on the number of
    > objects, each has an advantage?


    Almost certainly not. Since you don't show how you test each, I have no
    idea why you can't find a conclusion.

    Let's start off with the question you don't ask: which uses less memory?
    The answer is, the dict solution by far. Here is a test:

    # populate a random matrix using both dict and list
    adict = {}
    alist = [[None]*2400 for i in range(2400)]
    from random import randrange
    for i in range(1000):
        x = randrange(2400)
        y = randrange(2400)
        adict[(x, y)] = "something"
        alist[x][y] = "something"

    import sys
    print(sys.getsizeof(adict))
    print(sys.getsizeof(alist) + sum(sys.getsizeof(L) for L in alist))


    The actual sizes printed will depend on how sparse the matrices are, but
    for the same above (approximately half full), using Python 2.7, the
    figures I get are:

    adict: 24712
    alist: 23127324


    Now let's test the speed:

    # randomly select matrix coordinates to look-up
    test_coords = []
    for i in range(1000):
        x = randrange(2400)
        y = randrange(2400)
        test_coords.append((x, y))

    # build some test code
    from timeit import Timer
    setup = "from __main__ import adict, alist, test_coords"
    t1 = Timer("for p in test_coords: obj = adict.get(p)", setup)
    t2 = Timer("for p in test_coords: obj = alist[p[0]][p[1]]", setup)

    # run the test code
    print(min(t1.repeat(number=10000, repeat=7)))
    print(min(t2.repeat(number=10000, repeat=7)))


    Again, on my system using Python 2.7, I get:
    3.13823986053
    2.97539305687

    which shows that the two are very close, but the list solution is about
    6% faster. So in this situation, a list of lists uses about 100 times
    more memory than a dict, but look-ups are about 6% faster.

    I would be very surprised if the timing results depended on the number of
    objects in the matrices.

    In case you are not familiar with timeit, let me explain what I have done:

    * I pre-select 1000 random coordinates.
    * I write some test code inside a Timer object that looks up each of
    those coordinates, plus some setup code.
    * timeit runs the setup code once.
    * Then it runs my test code 10000 times as a single trial, timing
    it as accurately as possible.
    * It does seven trials, and I report the best of the seven.

    The time printed is measured in seconds. In this case, I get 3 seconds
    per trial, or 3e-7 seconds = 0.3 microseconds per lookup.
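
    Spelled out, the arithmetic is (using the rounded 3 second figure):

    total_seconds = 3.0             # best of the seven trials
    lookups = 10000 * 1000          # 10000 runs of 1000 look-ups each
    print(total_seconds / lookups)  # 3e-07 s, i.e. about 0.3 microseconds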


    > I know the fastest way to retrieve them would be to have them store
    > pointers to their neighbors, then use those for retrieval.


    How do you know that?

    No offence, but if you can't even work out whether lookups in a dict or a
    list are faster, I can't imagine why you think you can intuit what the
    fastest way to retrieve the nearest neighbours would be.



    --
    Steven
    Steven D'Aprano, Oct 4, 2012
    #2

  3. Re: fastest data structure for retrieving objects identified by (x,y) tuple?

    On Thu, 04 Oct 2012 01:58:16 +0000, Steven D'Aprano wrote:

    > adict: 24712
    > alist: 23127324

    [...]
    > So in this situation, a list of lists uses about 100 times
    > more memory than a dict, but look-ups are about 6% faster.


    Correction: about 1000 times more memory. Sorry for the typo.



    --
    Steven
    Steven D'Aprano, Oct 4, 2012
    #3
  4. Am 04.10.2012 03:58 schrieb Steven D'Aprano:
    > alist = [[None]*2400 for i in range(2400)]
    > from random import randrange
    > for i in range(1000):
    >     x = randrange(2400)
    >     y = randrange(2400)
    >     adict[(x, y)] = "something"
    >     alist[x][y] = "something"


    > The actual sizes printed will depend on how sparse the matrices are, but
    > for the same above (approximately half full),


    I wouldn't consider 1000 of 5760000 "half full"...


    Thomas
    Thomas Rachel, Oct 4, 2012
    #4
  5. Re: fastest data structure for retrieving objects identified by (x,y) tuple?

    On Thu, 04 Oct 2012 18:11:28 +0200, Thomas Rachel wrote:

    > Am 04.10.2012 03:58 schrieb Steven D'Aprano:
    >> alist = [[None]*2400 for i in range(2400)]
    >> from random import randrange
    >> for i in range(1000):
    >>     x = randrange(2400)
    >>     y = randrange(2400)
    >>     adict[(x, y)] = "something"
    >>     alist[x][y] = "something"

    >
    >> The actual sizes printed will depend on how sparse the matrices are,
    >> but for the sample above (approximately half full),

    >
    > I wouldn't consider 1000 of 5760000 "half full"...


    Doh! I obviously can't multiply...

    I mean, well done, that was a deliberate test to see who was paying
    attention, and you passed!

    :)


    --
    Steven
    Steven D'Aprano, Oct 5, 2012
    #5
