# fastest data structure for retrieving objects identified by (x,y) tuple?

Discussion in 'Python' started by Benjamin Jessup, Oct 3, 2012.

1. ### Benjamin Jessup (Guest)

I have a group of objects identified by unique (x,y) pairs and I want to
find out an object's "neighbors" in a matrix of size 2400 x 2400.
    #############
    #obj#   #   #
    #############
    #   #obj#   #   3 x 3 Example
    #############
    #   #   #   #
    #############
There is either a neighbor, or a null value. I always know the (x,y)
pair to check the neighbors of, so is doing,
>> obj = grid[x][y] #lists, doesn't scale with num of objects

or,
>> obj = grid.get((x,y),None) #dictionary, scales with num of objects

the fastest? I can't seem to find a conclusion by testing each alone,
then in the full environment. Is it that, depending on the number of
objects, each has an advantage?

I know the fastest way to retrieve them would be to have them store
pointers to their neighbors, then use those for retrieval. When large
numbers of objects are changing their (x,y) pairs, rebuilding the
pointers is too slow.
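For context, here is a minimal sketch of the dict-based lookup I mean (the names `neighbours` and `GRID_SIZE` are just for illustration, not from any library): each cell's up-to-8 neighbours are fetched with `grid.get`, so empty cells simply come back as `None`.

```python
# Sketch: look up the (up to 8) neighbours of (x, y) in a sparse,
# dict-backed grid keyed by (x, y) tuples.
GRID_SIZE = 2400

def neighbours(grid, x, y):
    """Return the objects in the 8 cells adjacent to (x, y), skipping empties."""
    found = []
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            if dx == 0 and dy == 0:
                continue  # skip the cell itself
            nx, ny = x + dx, y + dy
            if 0 <= nx < GRID_SIZE and 0 <= ny < GRID_SIZE:
                obj = grid.get((nx, ny))  # None if the cell is empty
                if obj is not None:
                    found.append(obj)
    return found

grid = {(5, 5): "a", (5, 6): "b", (7, 7): "c"}
print(neighbours(grid, 5, 5))  # ['b'] -- only (5, 6) is adjacent
```

When objects move, you only update the dict entries for the old and new cells, rather than rebuilding any per-object pointer lists.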

Benjamin Jessup, Oct 3, 2012

2. ### Steven D'Aprano (Guest)

Re: fastest data structure for retrieving objects identified by (x,y) tuple?

On Wed, 03 Oct 2012 18:30:24 -0400, Benjamin Jessup wrote:

> I have a group of objects identified by unique (x,y) pairs and I want to
> find out an object's "neighbors" in a matrix of size 2400 x 2400.

[...]
> There is either a neighbor, or a null value. I always know the (x,y)
> pair to check the neighbors of, so is doing,
> >> obj = grid[x][y] #lists, doesn't scale with num of objects

> or,
> >> obj = grid.get((x,y),None) #dictionary, scales with num of objects

> the fastest? I can't seem to find a conclusion by testing each alone,
> then in the full environment. Is it that, depending on the number of
> objects, each has an advantage?

Almost certainly not. Since you don't show how you test each, I have no
idea why you can't find a conclusion.

Let's start off with the question you don't ask: which uses less memory?
The answer is, the dict solution by far. Here is a test:

    # populate a random matrix using both dict and list
    alist = [[None]*2400 for i in range(2400)]
    adict = {}
    from random import randrange
    for i in range(1000):
        x = randrange(2400)
        y = randrange(2400)
        alist[x][y] = "something"
        adict[(x, y)] = "something"

    import sys
    print(sys.getsizeof(alist) + sum(sys.getsizeof(L) for L in alist))
    print(sys.getsizeof(adict))

The actual sizes printed will depend on how sparse the matrices are, but
for the sample above (approximately half full), using Python 2.7, the
figures I get are:

alist: 23127324

Now let's test the speed:

    # randomly select matrix coordinates to look-up
    test_coords = []
    for i in range(1000):
        x = randrange(2400)
        y = randrange(2400)
        test_coords.append((x, y))

    # build some test code
    from timeit import Timer
    setup = "from __main__ import adict, alist, test_coords"
    t1 = Timer("for p in test_coords: obj = adict.get(p)", setup)
    t2 = Timer("for p in test_coords: obj = alist[p[0]][p[1]]", setup)

    # run the test code
    print(min(t1.repeat(number=10000, repeat=7)))
    print(min(t2.repeat(number=10000, repeat=7)))

Again, on my system using Python 2.7, I get:
3.13823986053
2.97539305687

which shows that the two are very close, but the list solution is about
6% faster. So in this situation, a list of lists uses about 100 times
more memory than a dict, but look-ups are about 6% faster.

I would be very surprised if the timing results depended on the number of
objects in the matrices.

In case you are not familiar with timeit, let me explain what I have done:

* I pre-select 1000 random coordinates.
* I write some test code inside a Timer object that looks up each of
those coordinates, plus some setup code.
* timeit runs the setup code once.
* Then it runs my test code 10000 times as a single trial, timing
it as accurately as possible.
* It does seven trials, and I report the best of the seven.

The time printed is measured in seconds. In this case, I get 3 seconds
per trial, or 3e-7 seconds = 0.3 microseconds per lookup.
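In case anyone wants to reproduce this, here is a condensed, self-contained version of the whole benchmark (the 2400-cell grid and 1000 objects are from above; `number` and `repeat` are scaled down so it finishes quickly, and `sys.getsizeof` only measures the containers themselves, not the stored objects). The exact numbers will of course vary by machine and Python version:

```python
# Self-contained rerun of the benchmark above; figures will differ per machine.
import sys
import timeit
from random import randrange, seed

seed(42)  # reproducible coordinates
SIZE = 2400

alist = [[None] * SIZE for _ in range(SIZE)]
adict = {}
for _ in range(1000):
    x, y = randrange(SIZE), randrange(SIZE)
    alist[x][y] = "something"
    adict[(x, y)] = "something"

# Memory: the list of lists always allocates SIZE*SIZE slots,
# while the dict only holds the occupied cells (shallow sizes only).
list_bytes = sys.getsizeof(alist) + sum(sys.getsizeof(row) for row in alist)
dict_bytes = sys.getsizeof(adict)
print("list bytes:", list_bytes, "dict bytes:", dict_bytes)

# Speed: time 1000 random look-ups against each structure.
coords = [(randrange(SIZE), randrange(SIZE)) for _ in range(1000)]
t_dict = min(timeit.repeat(
    "for p in coords: adict.get(p)",
    globals={"adict": adict, "coords": coords}, number=1000, repeat=5))
t_list = min(timeit.repeat(
    "for p in coords: alist[p[0]][p[1]]",
    globals={"alist": alist, "coords": coords}, number=1000, repeat=5))
print("dict: %.3fs  list: %.3fs" % (t_dict, t_list))
```
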

> I know the fastest way to retrieve them would be to have them store
> pointers to their neighbors, then use those for retrieval.

How do you know that?

No offence, but if you can't even work out whether lookups in a dict or a
list are faster, I can't imagine why you think you can intuit what the
fastest way to retrieve the nearest neighbours would be.

--
Steven

Steven D'Aprano, Oct 4, 2012

3. ### Steven D'Aprano (Guest)

Re: fastest data structure for retrieving objects identified by (x,y) tuple?

On Thu, 04 Oct 2012 01:58:16 +0000, Steven D'Aprano wrote:

> alist: 23127324

[...]
> So in this situation, a list of lists uses about 100 times
> more memory than a dict, but look-ups are about 6% faster.

Correction: about 1000 times more memory. Sorry for the typo.

--
Steven

Steven D'Aprano, Oct 4, 2012
4. ### Thomas Rachel (Guest)

On 04.10.2012 at 03:58, Steven D'Aprano wrote:
> alist = [[None]*2400 for i in range(2400)]
> from random import randrange
> for i in range(1000):
>     x = randrange(2400)
>     y = randrange(2400)
>     alist[x][y] = "something"

> The actual sizes printed will depend on how sparse the matrices are, but
> for the same above (approximately half full),

I wouldn't consider 1000 of 5760000 "half full"...

Thomas

Thomas Rachel, Oct 4, 2012
5. ### Steven D'Aprano (Guest)

Re: fastest data structure for retrieving objects identified by (x,y) tuple?

On Thu, 04 Oct 2012 18:11:28 +0200, Thomas Rachel wrote:

> On 04.10.2012 at 03:58, Steven D'Aprano wrote:
>> alist = [[None]*2400 for i in range(2400)]
>> from random import randrange
>> for i in range(1000):
>>     x = randrange(2400)
>>     y = randrange(2400)
>>     alist[x][y] = "something"

>
>> The actual sizes printed will depend on how sparse the matrices are,
>> but for the same above (approximately half full),

>
> I wouldn't consider 1000 of 5760000 "half full"...

Doh! I obviously can't multiply...

I mean, well done, that was a deliberate test to see who was paying
attention, and you passed!

--
Steven

Steven D'Aprano, Oct 5, 2012