fastest data structure for retrieving objects identified by (x,y) tuple?


Benjamin Jessup

I have a group of objects identified by unique (x,y) pairs and I want to
find out an object's "neighbors" in a matrix of size 2400 x 2400.
#############
#obj#   #   #
#############
#   #obj#   #     3 x 3 Example
#############
#   #   #   #
#############
There is either a neighbor, or a null value. I always know the (x,y)
pair whose neighbors I want to check, so is doing
obj = grid[x][y]               # nested lists; doesn't scale with number of objects
or
obj = grid.get((x, y), None)   # dictionary; scales with number of objects
the fastest? I can't seem to find a conclusion by testing each alone,
then in the full environment. Is it that, depending on the number of
objects, each has an advantage?
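
For concreteness, a minimal sketch of what the neighbour check could look
like with the dictionary approach (the helper neighbours_of and the toy
grid below are hypothetical, not code from this thread):

# Illustrative sketch: look up the eight cells surrounding (x, y) in a
# dict keyed by (x, y) tuples; empty cells come back as None, matching
# the "neighbor or null value" description above.
def neighbours_of(grid, x, y):
    found = []
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            if dx == 0 and dy == 0:
                continue                               # skip the centre cell
            found.append(grid.get((x + dx, y + dy)))   # None if no object there
    return found

grid = {(0, 0): "obj", (1, 1): "obj"}
print(neighbours_of(grid, 0, 0))    # one "obj" (from (1, 1)) and seven Nones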

I know the fastest way to retrieve them would be to have them store
pointers to their neighbors, then use those for retrieval. When large
numbers of objects are changing their (x,y) pairs, rebuilding the
pointers is too slow.
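
A rough sketch of that pointer-caching idea, assuming each object is a
simple cell class (all names here are hypothetical): retrieval becomes a
plain attribute access, but every batch of moves forces a rebuild that
touches every object:

# Hypothetical sketch of caching neighbour references on each object.
class Cell(object):
    def __init__(self, x, y):
        self.x, self.y = x, y
        self.neighbours = []       # direct references to neighbouring objects

def rebuild_neighbours(objects):
    # objects: dict mapping (x, y) -> Cell; called after objects move,
    # which is the step that becomes too slow when many objects change.
    for cell in objects.values():
        cell.neighbours = [objects.get((cell.x + dx, cell.y + dy))
                           for dx in (-1, 0, 1) for dy in (-1, 0, 1)
                           if (dx, dy) != (0, 0)]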
 

Steven D'Aprano

I have a group of objects identified by unique (x,y) pairs and I want to
find out an object's "neighbors" in a matrix of size 2400 x 2400. [...]
There is either a neighbor, or a null value. I always know the (x,y)
pair whose neighbors I want to check, so is doing
obj = grid[x][y]               # nested lists; doesn't scale with number of objects
or
obj = grid.get((x, y), None)   # dictionary; scales with number of objects
the fastest? I can't seem to find a conclusion by testing each alone,
then in the full environment. Is it that, depending on the number of
objects, each has an advantage?

Almost certainly not. Since you don't show how you test each, I have no
idea why you can't find a conclusion.

Let's start off with the question you don't ask: which uses less memory?
The answer is, the dict solution by far. Here is a test:

# populate a random matrix using both dict and list
adict = {}
alist = [[None]*2400 for i in range(2400)]
from random import randrange
for i in range(1000):
    x = randrange(2400)
    y = randrange(2400)
    adict[(x, y)] = "something"
    alist[x][y] = "something"

import sys
print(sys.getsizeof(adict))
print(sys.getsizeof(alist) + sum(sys.getsizeof(L) for L in alist))


The actual sizes printed will depend on how sparse the matrices are, but
for the sample above (approximately half full), using Python 2.7, the
figures I get are:

adict: 24712
alist: 23127324


Now let's test the speed:

# randomly select matrix coordinates to look-up
test_coords = []
for i in range(1000):
    x = randrange(2400)
    y = randrange(2400)
    test_coords.append((x, y))

# build some test code
from timeit import Timer
setup = "from __main__ import adict, alist, test_coords"
t1 = Timer("for p in test_coords: obj = adict.get(p)", setup)
t2 = Timer("for p in test_coords: obj = alist[p[0]][p[1]]", setup)

# run the test code
print(min(t1.repeat(number=10000, repeat=7)))
print(min(t2.repeat(number=10000, repeat=7)))


Again, on my system using Python 2.7, I get:
3.13823986053
2.97539305687

which shows that the two are very close, but the list solution is about
6% faster. So in this situation, a list of lists uses about 100 times
more memory than a dict, but look-ups are about 6% faster.

I would be very surprised if the timing results depended on the number of
objects in the matrices.

In case you are not familiar with timeit, let me explain what I have done:

* I pre-select 1000 random coordinates.
* I write some test code inside a Timer object that looks up each of
those coordinates, plus some setup code.
* timeit runs the setup code once.
* Then it runs my test code 10000 times as a single trial, timing
it as accurately as possible.
* It does seven trials, and I report the best of the seven.

The time printed is measured in seconds. In this case, I get about
3 seconds per trial; each trial does 10,000 runs over 1,000 coordinates,
i.e. ten million lookups, which works out to 3e-7 seconds = 0.3
microseconds per lookup.
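
One rough way to check the claim that the number of stored objects
shouldn't matter (a sketch only; the population sizes below are
arbitrary, not part of the original measurements) is to repeat the dict
timing with different numbers of objects:

# Hypothetical follow-up: rerun the same dict lookup timing while
# varying how many objects are stored in the grid.
from random import randrange
from timeit import Timer

for n_objects in (100, 10000, 1000000):
    adict = {}
    for i in range(n_objects):
        adict[(randrange(2400), randrange(2400))] = "something"
    test_coords = [(randrange(2400), randrange(2400)) for i in range(1000)]
    t = Timer("for p in test_coords: obj = adict.get(p)",
              "from __main__ import adict, test_coords")
    print("%d objects: %.3f seconds" % (n_objects,
                                        min(t.repeat(number=1000, repeat=3))))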

I know the fastest way to retrieve them would be to have them store
pointers to their neighbors, then use those for retrieval.

How do you know that?

No offence, but if you can't even work out whether lookups in a dict or a
list are faster, I can't imagine why you think you can intuit what the
fastest way to retrieve the nearest neighbours would be.
 

Steven D'Aprano

adict: 24712
alist: 23127324 [...]
So in this situation, a list of lists uses about 100 times
more memory than a dict, but look-ups are about 6% faster.

Correction: about 1000 times more memory (23127324 / 24712 is roughly
940). Sorry for the typo.
 

Thomas Rachel

On 04.10.2012 03:58, Steven D'Aprano wrote:
alist = [[None]*2400 for i in range(2400)]
from random import randrange
for i in range(1000):
    x = randrange(2400)
    y = randrange(2400)
    adict[(x, y)] = "something"
    alist[x][y] = "something"
The actual sizes printed will depend on how sparse the matrices are, but
for the sample above (approximately half full),

I wouldn't consider 1000 of 5760000 "half full"...


Thomas
 

Steven D'Aprano

On 04.10.2012 03:58, Steven D'Aprano wrote:
alist = [[None]*2400 for i in range(2400)]
from random import randrange
for i in range(1000):
    x = randrange(2400)
    y = randrange(2400)
    adict[(x, y)] = "something"
    alist[x][y] = "something"
The actual sizes printed will depend on how sparse the matrices are,
but for the sample above (approximately half full),

I wouldn't consider 1000 of 5760000 "half full"...

Doh! I obviously can't multiply...

I mean, well done, that was a deliberate test to see who was paying
attention, and you passed!

:)
 
