fastest data structure for retrieving objects identified by (x,y) tuple?

Discussion in 'Python' started by Benjamin Jessup, Oct 3, 2012.

  1. I have a group of objects identified by unique (x,y) pairs and I want to
    find out an object's "neighbors" in a matrix of size 2400 x 2400.
    #############
    #obj#   #   #
    #############
    #   #obj#   #   3 x 3 Example
    #############
    #   #   #   #
    #############
    There is either a neighbor, or a null value. I always know the (x,y)
    pair to check the neighbors of, so is doing,
    >> obj = grid[x][y] #lists, doesn't scale with num of objects

    or,
    >> obj = grid.get((x,y),None) #dictionary, scales with num of objects

    the fastest? I can't seem to find a conclusion by testing each alone,
    then in the full environment. Is it that, depending on the number of
    objects, each has an advantage?
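
    In case it helps, the neighbor check itself amounts to something like
    this (a simplified sketch; the function names are just for
    illustration):

    # offsets of the 8 surrounding cells
    OFFSETS = [(-1, -1), (-1, 0), (-1, 1),
               ( 0, -1),          ( 0, 1),
               ( 1, -1), ( 1, 0), ( 1, 1)]

    def neighbors_from_dict(grid, x, y):
        # grid maps (x, y) -> object; missing keys simply come back as None
        return [grid.get((x + dx, y + dy)) for dx, dy in OFFSETS]

    def neighbors_from_lists(grid, x, y, size=2400):
        # grid is a list of lists; skip coordinates that fall off the matrix
        return [grid[x + dx][y + dy] for dx, dy in OFFSETS
                if 0 <= x + dx < size and 0 <= y + dy < size]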

    I know the fastest way to retrieve them would be to have them store
    pointers to their neighbors, then use those for retrieval. When large
    numbers of objects are changing their (x,y) pairs, rebuilding the
    pointers is too slow.
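
    Roughly, that caching scheme looks like this (a simplified,
    hypothetical sketch; the class and function names are just
    illustrative):

    class Cell(object):
        def __init__(self, x, y):
            self.x = x
            self.y = y
            self.neighbors = []   # direct references, filled in below

    def rebuild_neighbors(cells):
        # cells maps (x, y) -> Cell; has to be re-run whenever objects move
        offsets = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
                   (0, 1), (1, -1), (1, 0), (1, 1)]
        for (x, y), cell in cells.items():
            cell.neighbors = [cells[(x + dx, y + dy)]
                              for dx, dy in offsets
                              if (x + dx, y + dy) in cells]

    Retrieval is then just cell.neighbors, but when a lot of objects move,
    rebuild_neighbors() has to touch every object again, and that is the
    part that is too slow.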
    Benjamin Jessup, Oct 3, 2012
    #1

  2. Re: fastest data structure for retrieving objects identified by (x,y) tuple?

    On Wed, 03 Oct 2012 18:30:24 -0400, Benjamin Jessup wrote:

    > I have a group of objects identified by unique (x,y) pairs and I want to
    > find out an object's "neighbors" in a matrix of size 2400 x 2400.

    [...]
    > There is either a neighbor, or a null value. I always know the (x,y)
    > pair to check the neighbors of, so is doing,
    > >> obj = grid[x][y] #lists, doesn't scale with num of objects

    > or,
    > >> obj = grid.get((x,y),None) #dictionary, scales with num of objects

    > the fastest? I can't seem to find a conclusion by testing each alone,
    > then in the full environment. Is it that, depending on the number of
    > objects, each has an advantage?


    Almost certainly not. Since you don't show how you test each, I have no
    idea why you can't find a conclusion.

    Let's start off with the question you don't ask: which uses less memory?
    The answer is, the dict solution by far. Here is a test:

    # populate a random matrix using both dict and list
    adict = {}
    alist = [[None]*2400 for i in range(2400)]
    from random import randrange
    for i in range(1000):
        x = randrange(2400)
        y = randrange(2400)
        adict[(x, y)] = "something"
        alist[x][y] = "something"

    import sys
    print(sys.getsizeof(adict))
    print(sys.getsizeof(alist) + sum(sys.getsizeof(L) for L in alist))


    The actual sizes printed will depend on how sparse the matrices are, but
    for the same above (approximately half full), using Python 2.7, the
    figures I get are:

    adict: 24712
    alist: 23127324


    Now let's test the speed:

    # randomly select matrix coordinates to look-up
    test_coords = []
    for i in range(1000):
        x = randrange(2400)
        y = randrange(2400)
        test_coords.append((x, y))

    # build some test code
    from timeit import Timer
    setup = "from __main__ import adict, alist, test_coords"
    t1 = Timer("for p in test_coords: obj = adict.get(p)", setup)
    t2 = Timer("for p in test_coords: obj = alist[p[0]][p[1]]", setup)

    # run the test code
    print(min(t1.repeat(number=10000, repeat=7)))
    print(min(t2.repeat(number=10000, repeat=7)))


    Again, on my system using Python 2.7, I get:
    3.13823986053
    2.97539305687

    which shows that the two are very close, but the list solution is about
    6% faster. So in this situation, a list of lists uses about 100 times
    more memory than a dict, but look-ups are about 6% faster.

    I would be very surprised if the timing results depended on the number of
    objects in the matrices.

    In case you are not familiar with timeit, let me explain what I have done:

    * I pre-select 1000 random coordinates.
    * I write some test code inside a Timer object that looks up each of
    those coordinates, plus some setup code.
    * timeit runs the setup code once.
    * Then it runs my test code 10000 times as a single trial, timing
    it as accurately as possible.
    * It does seven trials, and I report the best of the seven.

    The time printed is measured in seconds. In this case, I get 3 seconds
    per trial, or 3e-7 seconds = 0.3 microseconds per lookup.
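
    Spelled out, the arithmetic is (using the rounded 3 second figure):

    total_seconds = 3.0             # best of the seven trials
    lookups = 10000 * 1000          # 10000 runs of 1000 look-ups each
    print(total_seconds / lookups)  # 3e-07 s, i.e. about 0.3 microseconds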


    > I know the fastest way to retrieve them would be to have them store
    > pointers to their neighbors, then use those for retrieval.


    How do you know that?

    No offence, but if you can't even work out whether lookups in a dict or a
    list are faster, I can't imagine why you think you can intuit what the
    fastest way to retrieve the nearest neighbours would be.



    --
    Steven
    Steven D'Aprano, Oct 4, 2012
    #2

  3. Re: fastest data structure for retrieving objects identified by (x,y) tuple?

    On Thu, 04 Oct 2012 01:58:16 +0000, Steven D'Aprano wrote:

    > adict: 24712
    > alist: 23127324

    [...]
    > So in this situation, a list of lists uses about 100 times
    > more memory than a dict, but look-ups are about 6% faster.


    Correction: about 1000 times more memory. Sorry for the typo.



    --
    Steven
    Steven D'Aprano, Oct 4, 2012
    #3
  4. Am 04.10.2012 03:58 schrieb Steven D'Aprano:
    > alist = [[None]*2400 for i in range(2400)]
    > from random import randrange
    > for i in range(1000):
    >     x = randrange(2400)
    >     y = randrange(2400)
    >     adict[(x, y)] = "something"
    >     alist[x][y] = "something"


    > The actual sizes printed will depend on how sparse the matrices are, but
    > for the same above (approximately half full),


    I wouldn't consider 1000 of 5760000 "half full"...


    Thomas
    Thomas Rachel, Oct 4, 2012
    #4
  5. Re: fastest data structure for retrieving objects identified by (x,y) tuple?

    On Thu, 04 Oct 2012 18:11:28 +0200, Thomas Rachel wrote:

    > Am 04.10.2012 03:58 schrieb Steven D'Aprano:
    >> alist = [[None]*2400 for i in range(2400)]
    >> from random import randrange
    >> for i in range(1000):
    >>     x = randrange(2400)
    >>     y = randrange(2400)
    >>     adict[(x, y)] = "something"
    >>     alist[x][y] = "something"

    >
    >> The actual sizes printed will depend on how sparse the matrices are,
    >> but for the sample above (approximately half full),

    >
    > I wouldn't consider 1000 of 5760000 "half full"...


    Doh! I obviously can't multiply...

    I mean, well done, that was a deliberate test to see who was paying
    attention, and you passed!

    :)


    --
    Steven
    Steven D'Aprano, Oct 5, 2012
    #5
