nearest neighbor in 2D

Discussion in 'Python' started by John Hunter, Nov 3, 2003.

  1. John Hunter

    John Hunter Guest

    I have a list of two tuples containing x and y coord

    (x0, y0)
    (x1, y1)
    ...
    (xn, yn)

    Given a new point x,y, I would like to find the point in the list
    closest to x,y. I have to do this a lot, in an inner loop, and then I
    add each new point x,y to the list. I know the range of x and y in
    advance.

    One solution that comes to mind is to partition to space into
    quadrants and store the elements by quadrant. When a new element
    comes in, identify it's quadrant and only search the appropriate
    quadrant for nearest neighbor. This could be done recursively, a 2D
    binary search of sorts....

    Can anyone point me to some code or module that provides the
    appropriate data structures and algorithms to handle this task
    efficiently? The size of the list will likely be in the range of
    10-1000 elements.

    Thanks,
    John Hunter
    John Hunter, Nov 3, 2003
    #1
    1. Advertising

  2. John Hunter

    Isaac To Guest

    >>>>> "John" == John Hunter <> writes:

    John> Given a new point x,y, I would like to find the point in the list
    John> closest to x,y. I have to do this a lot, in an inner loop, and
    John> then I add each new point x,y to the list. I know the range of x
    John> and y in advance.

    John> One solution that comes to mind is to partition to space into
    John> quadrants and store the elements by quadrant. When a new element
    John> comes in, identify it's quadrant and only search the appropriate
    John> quadrant for nearest neighbor. This could be done recursively, a
    John> 2D binary search of sorts....

    By recursion your solution would work in O(log n) time. The construction
    would take O(n log n) time. Unluckily, it can return the wrong point, as
    the nearest point within the nearest quadrant might not be the nearest
    point.

    The problem is a well-studied basic computational geometry problem, although
    I don't really know any Python code that actually do it. Try to look at the
    web for "Voronoi diagrams" and "radial triangulation" to understand how to
    solve it properly in the above mentioned (perhaps randomized) time
    complexity.

    Regards,
    Isaac.
    Isaac To, Nov 3, 2003
    #2
    1. Advertising

  3. John Hunter

    Noen Guest

    John Hunter wrote:

    > I have a list of two tuples containing x and y coord
    >
    > (x0, y0)
    > (x1, y1)
    > ...
    > (xn, yn)
    >
    > Given a new point x,y, I would like to find the point in the list
    > closest to x,y. I have to do this a lot, in an inner loop, and then I
    > add each new point x,y to the list. I know the range of x and y in
    > advance.
    >
    > One solution that comes to mind is to partition to space into
    > quadrants and store the elements by quadrant. When a new element
    > comes in, identify it's quadrant and only search the appropriate
    > quadrant for nearest neighbor. This could be done recursively, a 2D
    > binary search of sorts....
    >
    > Can anyone point me to some code or module that provides the
    > appropriate data structures and algorithms to handle this task
    > efficiently? The size of the list will likely be in the range of
    > 10-1000 elements.
    >
    > Thanks,
    > John Hunter
    >
    >

    You could to a for loop, and inside that loop you will have a variable
    lessest_distance. I dont know much geometric mathematics, but Im pretty
    sure you can use pytagoras stuff to find the lenght from (Xn,Yn) to
    (X,Y) using sinus cosinus and such.

    And when the function is finished, you should return lessest_distance
    Noen, Nov 3, 2003
    #3
  4. John Hunter

    Graham Lee Guest

    John Hunter wrote:

    >
    > One solution that comes to mind is to partition to space into
    > quadrants and store the elements by quadrant. When a new element
    > comes in, identify it's quadrant and only search the appropriate
    > quadrant for nearest neighbor. This could be done recursively, a 2D
    > binary search of sorts....
    >


    What happens when you put a particle in near/at the boundary of a quadrant
    though? It's possible for the nearest neighbour to be in the nearest
    neighbour quadrant...although you could search over these as well.
    However, the number of independent areas implied by use of the word
    'quadrant' suggests that this would be the same as iterating over all
    space.... :)
    --
    Graham Lee
    Wadham College
    Oxford
    Graham Lee, Nov 3, 2003
    #4
  5. John Hunter

    Ron Adam Guest

    On Sun, 02 Nov 2003 22:12:47 -0600, John Hunter
    <> wrote:

    >
    >I have a list of two tuples containing x and y coord
    >
    > (x0, y0)
    > (x1, y1)
    > ...
    > (xn, yn)
    >
    >Given a new point x,y, I would like to find the point in the list
    >closest to x,y. I have to do this a lot, in an inner loop, and then I
    >add each new point x,y to the list. I know the range of x and y in
    >advance.
    >
    >One solution that comes to mind is to partition to space into
    >quadrants and store the elements by quadrant. When a new element
    >comes in, identify it's quadrant and only search the appropriate
    >quadrant for nearest neighbor. This could be done recursively, a 2D
    >binary search of sorts....
    >
    >Can anyone point me to some code or module that provides the
    >appropriate data structures and algorithms to handle this task
    >efficiently? The size of the list will likely be in the range of
    >10-1000 elements.
    >
    >Thanks,
    >John Hunter
    >



    This is how I would do it. Maybe it's how you are already doing it?


    import math
    import random

    n_points = 1000
    max_x = 1000
    max_y = 1000
    closest_distance = 10000
    closest_point = (max_x,max_y)

    p = []
    for i in xrange(n_points):
    x = round(max_x*random.random())
    y = round(max_y*random.random())
    p.append((x, y))

    new_point = (round(max_x*random.random()), \
    round(max_y*random.random()))

    for point in p:
    distance = math.sqrt((new_point[0]-point[0])**2 \
    +(new_point[1]-point[1])**2)
    if distance < closest_distance:
    closest_distance = distance
    closest_point = point

    print 'new_point:', new_point
    print 'closest_point:', closest_point,' \
    out of',n_points,'points.'


    I really don't know how you can make this faster. There might be a
    library that has a distance between two points function that could
    speed it up.

    Ronald R. Adam
    Ron Adam, Nov 4, 2003
    #5
  6. > >I have a list of two tuples containing x and y coord
    > >
    > > (x0, y0)
    > > (x1, y1)
    > > ...
    > > (xn, yn)
    > >
    > >Given a new point x,y, I would like to find the point in the list
    > >closest to x,y. I have to do this a lot, in an inner loop, and then I
    > >add each new point x,y to the list. I know the range of x and y in
    > >advance.


    Here's some not-very-heavily-tested code for doing this using a kD-tree.

    Worst case efficiency is still linear per point or quadratic total
    (unlike some other more sophisticated data structures) but in practice
    if your points are reasonably well behaved this should be pretty good;
    e.g. I tried it with 10000 random points (each queried then added) and
    it made only 302144 recursive calls to nearestNeighbor.

    Also note that only the test code at the end restricts to two
    dimensions, everything else works in arbitrary numbers of dimensions.

    def dist2(p,q):
    """Squared distance between p and q."""
    d = 0
    for i in range(len(p)):
    d += (p-q)**2
    return d

    class kdtree:
    def __init__(self,dim=2,index=0):
    self.dim = dim
    self.index = index
    self.split = None

    def addPoint(self,p):
    """Include another point in the kD-tree."""
    if self.split is None:
    self.split = p
    self.left = kdtree(self.dim, (self.index + 1) % self.dim)
    self.right = kdtree(self.dim, (self.index + 1) % self.dim)
    elif self.split[self.index] < p[self.index]:
    self.left.addPoint(p)
    else:
    self.right.addPoint(p)

    def nearestNeighbor(self,q,maxdist2):
    """Find pair (d,p) where p is nearest neighbor and d is squared
    distance to p. Returned distance must be within maxdist2; if
    not, no point itself is returned.
    """
    solution = (maxdist2+1,None)
    if self.split is not None:
    solution = min(solution, (dist2(self.split,q),self.split))
    d2split = (self.split[self.index] - q[self.index])**2
    if self.split[self.index] < p[self.index]:
    solution = min(solution,
    self.left.nearestNeighbor(q,solution[0]))
    if d2split < solution[0]:
    solution = min(solution,
    self.right.nearestNeighbor(q,solution[0]))
    else:
    solution = min(solution,
    self.right.nearestNeighbor(q,solution[0]))
    if d2split < solution[0]:
    solution = min(solution,
    self.left.nearestNeighbor(q,solution[0]))
    return solution

    if __name__ == "__main__":
    import math
    import random

    n_points = 50
    max_x = 1000
    max_y = 1000
    max_dist2 = max_x**2 + max_y**2

    k = kdtree()
    for i in range(n_points):
    x = round(max_x*random.random())
    y = round(max_y*random.random())
    p = (x,y)

    if i == 0:
    print 'new point',p
    else:
    d,q = k.nearestNeighbor(p,max_dist2)
    print 'new point', p, 'has neighbor',
    print q, 'at distance', math.sqrt(d)

    k.addPoint(p)

    --
    David Eppstein http://www.ics.uci.edu/~eppstein/
    Univ. of California, Irvine, School of Information & Computer Science
    David Eppstein, Nov 4, 2003
    #6
  7. On Sun, 02 Nov 2003 22:12:47 -0600, John Hunter <> wrote:

    >
    >I have a list of two tuples containing x and y coord
    >
    > (x0, y0)
    > (x1, y1)
    > ...
    > (xn, yn)
    >
    >Given a new point x,y, I would like to find the point in the list
    >closest to x,y. I have to do this a lot, in an inner loop, and then I
    >add each new point x,y to the list. I know the range of x and y in
    >advance.


    Are you trying to find closest location to a mouse cursor as it moves,
    and then adding a point when there's a click? I.e., your particular use case
    might suggest a strategy that's different from, e.g., what you'd do if
    each new point's coordinates where read from file or came from a generator,
    and you had exactly one search leading to exactly one update of the set.
    And also what you wanted to do with the completed set.

    >
    >One solution that comes to mind is to partition to space into
    >quadrants and store the elements by quadrant. When a new element
    >comes in, identify it's quadrant and only search the appropriate
    >quadrant for nearest neighbor. This could be done recursively, a 2D
    >binary search of sorts....


    This might be a way of pruning, but you'd have to take into account that
    nearest square doesn't guarantee nearest diagonal distance. Just blathering
    off the top of my head, ... I think I would try dividing x and y into maybe a
    16*16 grid of squares. A new point will fall into one of those, and then if
    you find some existing points in that square, you could brute force find the
    closest (remembering that comparing squared radial distances works as well as
    comparing their square roots ;-) and then see if that shortest distance can
    reach into any adjacent squares, and search those too if so, since there could be
    a point just the other side of the border, or diagonally across an adjacent
    corner that could be closer than your currently determined distance.

    You could keep info about points in a square in lists or dicts (16*16 might
    be sparsely populated, best in a dict of squares accessed by grid coordinates
    (i.e., 4 bits apiece, maybe as tuple or combined as a single number (but then
    you could use a list pre-populated with None's instead of a dict, so either way).

    I guess in the extreme you could compute a complete table of nearest point
    coordinates for every possible x,y point, so you'd have a raster map of voronoi
    regions, with each region colored by the coordinates of its nearest point. The more
    points you had, the less info would have to be updated for each new point.
    I wonder when the crossover would occur ;-)
    >
    >Can anyone point me to some code or module that provides the
    >appropriate data structures and algorithms to handle this task
    >efficiently? The size of the list will likely be in the range of
    >10-1000 elements.
    >

    What are the ranges of x and y?

    Regards,
    Bengt Richter
    Bengt Richter, Nov 4, 2003
    #7
  8. John Hunter

    John Hunter Guest

    >>>>> "Bengt" == Bengt Richter <> writes:

    Bengt> Are you trying to find closest location to a mouse cursor
    Bengt> as it moves, and then adding a point when there's a click?
    Bengt> I.e., your particular use case might suggest a strategy
    Bengt> that's different from, e.g., what you'd do if each new
    Bengt> point's coordinates where read from file or came from a
    Bengt> generator, and you had exactly one search leading to
    Bengt> exactly one update of the set. And also what you wanted to
    Bengt> do with the completed set.

    I had two use cases just yesterday. The one that prompted the
    question arose in making a contour plot. I'm defining a contour as an
    ordered sequence of values over a 2D MxN matrix where the values
    differ from some target value by at most some tolerance. I maintain a
    list of i,j indices into the matrix for a given contour value, and
    follow the contour from a given i,j location by examining its
    neighbors. In order to close the loop (eg, when the contour finder
    has passed once around a level curve of a mountain, I want to test
    whether a given point i,j is close to a previously discovered point
    k,l. Since I have a list of these 2 tuple coordinates, I want to find
    the nearest neighbor in the list and terminate the contour when the
    nearest neighbor falls within some minimum distance

    3 4 5
    2 6
    13 1 7
    12 8
    11 10 9

    In the example above, I am traversing a contour identifying points in
    the order 1,2,3...; as above each point represents an i,j tuple which
    is an index into the matrix I am contouring. I would like to
    terminate the search at 13 rather than spiral around the existing
    contour 1-12. Each time I add a new point to the contour, I would like
    to query the existing list (excluding the most recently added points
    which are close by construction) of locations and find the minimum
    distance. If I'm not too close to the preexisting contour, I add the
    new point and proceed.

    As I write this I realize there is an important efficiency. Since
    from an existing point I add the closest neighbor, the biggest step I
    can make is 1,1. If on the last nearest neighbor query I find a
    minimum distance of d, it will take me d minimum steps to approach the
    existing contour. So I don't need to check the distance again for at
    least d steps. So the algorithm can proceed 1) obtain the distance d
    from the existing contour to the most recently obtained point 2) make
    d steps adding points that meet the value criteria 3) repeat.

    The second use case arose with gdmodule, which can only allocate 256
    colors, which I cache as a dict from rgb tuples (eg, 0.0, 0.05, 1.0)
    to color. When the total number of color allocations is made, and a
    new rgb request comes into the color manager, I pick the already
    allocated point in rgb space closest to the requested point.

    I'll try David Eppstein's approach tomorrow and see how this fares.

    Thanks to all for suggestions,
    John Hunter
    John Hunter, Nov 4, 2003
    #8
  9. John Hunter

    Andrew Dalke Guest

    Ron Adam
    > for point in p:
    > distance = math.sqrt((new_point[0]-point[0])**2 \
    > +(new_point[1]-point[1])**2)


    > I really don't know how you can make this faster. There might be a
    > library that has a distance between two points function that could
    > speed it up.


    An easy way is to move the math.sqrt call outside the loop, since
    sqrt(d1) < sqrt(d2) iff d1 < d2 (when d1,d2>=0)

    Andrew
    Andrew Dalke, Nov 4, 2003
    #9
  10. John Hunter

    G.J.Giezeman Guest

    Isaac To wrote:

    >>>>>>"John" == John Hunter <> writes:

    >
    >
    > John> Given a new point x,y, I would like to find the point in the list
    > John> closest to x,y. I have to do this a lot, in an inner loop, and
    > John> then I add each new point x,y to the list. I know the range of x
    > John> and y in advance.
    >
    > John> One solution that comes to mind is to partition to space into
    > John> quadrants and store the elements by quadrant. When a new element
    > John> comes in, identify it's quadrant and only search the appropriate
    > John> quadrant for nearest neighbor. This could be done recursively, a
    > John> 2D binary search of sorts....
    >
    > By recursion your solution would work in O(log n) time. The construction
    > would take O(n log n) time. Unluckily, it can return the wrong point, as
    > the nearest point within the nearest quadrant might not be the nearest
    > point.
    >
    > The problem is a well-studied basic computational geometry problem, although
    > I don't really know any Python code that actually do it. Try to look at the
    > web for "Voronoi diagrams" and "radial triangulation" to understand how to
    > solve it properly in the above mentioned (perhaps randomized) time
    > complexity.
    >
    > Regards,
    > Isaac.


    A solution in C++ is using the CGAL-library (www.cgal.org). Look in the
    index of the basic library and search for 'nearest'. It will point you
    to Delaunay triangulations, which, together with a triangulation
    hierarchy, will give O(log n) time complexity, except in pathological
    cases. You can call C++ code from python.
    B.t.w., there will be a new release of the CGAL library very soon
    (probably this week).
    G.J.Giezeman, Nov 4, 2003
    #10
  11. Andrew Dalke wrote:

    > Ron Adam
    >> for point in p:
    >> distance = math.sqrt((new_point[0]-point[0])**2 \
    >> +(new_point[1]-point[1])**2)

    >
    >> I really don't know how you can make this faster. There might be a


    Hmmm, that's what math.hypot is for, isn't it...?

    [alex@lancelot Lib]$ timeit.py -c -s'import math; p=1.6,2.5; np=2.4,1.3'
    'math.sqrt((np[0]-p[0])**2 + (np[1]-p[1])**2)'
    100000 loops, best of 3: 3 usec per loop

    [alex@lancelot Lib]$ timeit.py -c -s'import math; p=1.6,2.5; np=2.4,1.3'
    'math.hypot(np[0]-p[0], np[1]-p[1])'
    100000 loops, best of 3: 1.9 usec per loop


    >> library that has a distance between two points function that could
    >> speed it up.

    >
    > An easy way is to move the math.sqrt call outside the loop, since
    > sqrt(d1) < sqrt(d2) iff d1 < d2 (when d1,d2>=0)


    Yes, omitting the math.sqrt gives the same speed as calling math.hypot,
    and it's the classic solution to speed up minimum-distance problems.

    I vaguely suspect you could shave some further fraction of a microsecond
    by saving those differences as dx and dy and then computing dx*dx+dy*dy --
    since another classic tip is that a**2 is slower than a*2. Let's see...:

    [alex@lancelot Lib]$ timeit.py -c -s'import math; p=1.6,2.5; np=2.4,1.3'
    'dx=np[0]-p[0]; dy=np[1]-p[1]; disq=dx*dx+dy*dy'
    1000000 loops, best of 3: 1.39 usec per loop

    ....yep, another small enhancement. Ain't measuring _FUN_?-)


    Alex
    Alex Martelli, Nov 4, 2003
    #11
  12. John Hunter

    Peter Otten Guest

    Alex Martelli wrote:

    > Andrew Dalke wrote:
    >
    >> Ron Adam
    >>> for point in p:
    >>> distance = math.sqrt((new_point[0]-point[0])**2 \
    >>> +(new_point[1]-point[1])**2)

    >>
    >>> I really don't know how you can make this faster. There might be a

    >
    > Hmmm, that's what math.hypot is for, isn't it...?
    >
    > [alex@lancelot Lib]$ timeit.py -c -s'import math; p=1.6,2.5; np=2.4,1.3'
    > 'math.sqrt((np[0]-p[0])**2 + (np[1]-p[1])**2)'
    > 100000 loops, best of 3: 3 usec per loop
    >
    > [alex@lancelot Lib]$ timeit.py -c -s'import math; p=1.6,2.5; np=2.4,1.3'
    > 'math.hypot(np[0]-p[0], np[1]-p[1])'
    > 100000 loops, best of 3: 1.9 usec per loop
    >
    >
    >>> library that has a distance between two points function that could
    >>> speed it up.

    >>
    >> An easy way is to move the math.sqrt call outside the loop, since
    >> sqrt(d1) < sqrt(d2) iff d1 < d2 (when d1,d2>=0)

    >
    > Yes, omitting the math.sqrt gives the same speed as calling math.hypot,
    > and it's the classic solution to speed up minimum-distance problems.
    >
    > I vaguely suspect you could shave some further fraction of a microsecond
    > by saving those differences as dx and dy and then computing dx*dx+dy*dy --
    > since another classic tip is that a**2 is slower than a*2. Let's see...:
    >
    > [alex@lancelot Lib]$ timeit.py -c -s'import math; p=1.6,2.5; np=2.4,1.3'
    > 'dx=np[0]-p[0]; dy=np[1]-p[1]; disq=dx*dx+dy*dy'
    > 1000000 loops, best of 3: 1.39 usec per loop
    >
    > ...yep, another small enhancement. Ain't measuring _FUN_?-)


    Finally found an application for complex numbers:

    ....> timeit.py -s"p= 1.6+2.5j; np=2.4+1.3j" "d=abs(p-np)"
    1000000 loops, best of 3: 0.436 usec per loop

    ....> timeit.py -s"p= 1.6,2.5; np=2.4,1.3" "dx=np[0]-p[0];
    dy=np[1]-p[1];d=dx*dx+dy*dy"
    1000000 loops, best of 3: 1.15 usec per loop

    This is of course all premature optimization as the most promising approach
    is to try hard to reduce the number of candidate points, as David Eppstein
    seems to have done. But then, he could use complex numbers, too.

    Peter
    Peter Otten, Nov 4, 2003
    #12
  13. In article <bo8j4v$tqt$03$-online.com>,
    Peter Otten <> wrote:

    > This is of course all premature optimization as the most promising approach
    > is to try hard to reduce the number of candidate points, as David Eppstein
    > seems to have done. But then, he could use complex numbers, too.


    Well, yes, but then my code wouldn't work very well in dimensions higher
    than two...

    --
    David Eppstein http://www.ics.uci.edu/~eppstein/
    Univ. of California, Irvine, School of Information & Computer Science
    David Eppstein, Nov 4, 2003
    #13
  14. John Hunter

    Ron Levine Guest

    On Tue, 04 Nov 2003 00:03:31 GMT, Ron Adam <>
    wrote:

    >On Sun, 02 Nov 2003 22:12:47 -0600, John Hunter
    ><> wrote:
    >
    >>
    >>I have a list of two tuples containing x and y coord
    >>
    >> (x0, y0)
    >> (x1, y1)
    >> ...
    >> (xn, yn)
    >>
    >>Given a new point x,y, I would like to find the point in the list
    >>closest to x,y. I have to do this a lot, in an inner loop, and then I
    >>add each new point x,y to the list. I know the range of x and y in
    >>advance.
    >>
    >>One solution that comes to mind is to partition to space into
    >>quadrants and store the elements by quadrant. When a new element
    >>comes in, identify it's quadrant and only search the appropriate
    >>quadrant for nearest neighbor.
    >>This could be done recursively, a 2D
    >>binary search of sorts....
    >>
    >>Can anyone point me to some code or module that provides the
    >>appropriate data structures and algorithms to handle this task
    >>efficiently? The size of the list will likely be in the range of
    >>10-1000 elements.
    >>
    >>Thanks,
    >>John Hunter
    >>

    >
    >
    >This is how I would do it. Maybe it's how you are already doing it?
    >
    >
    >import math
    >import random
    >
    >n_points = 1000
    >max_x = 1000
    >max_y = 1000
    >closest_distance = 10000
    >closest_point = (max_x,max_y)
    >
    >p = []
    >for i in xrange(n_points):
    > x = round(max_x*random.random())
    > y = round(max_y*random.random())
    > p.append((x, y))
    >
    >new_point = (round(max_x*random.random()), \
    > round(max_y*random.random()))
    >
    >for point in p:
    > distance = math.sqrt((new_point[0]-point[0])**2 \
    > +(new_point[1]-point[1])**2)
    > if distance < closest_distance:
    > closest_distance = distance
    > closest_point = point
    >
    >print 'new_point:', new_point
    >print 'closest_point:', closest_point,' \
    > out of',n_points,'points.'
    >
    >
    >I really don't know how you can make this faster. There might be a


    For one thing, you do not have to evaluate the square root. The
    minimum distance occurs for the same point as the minimum distance
    squared.
    Ron Levine, Nov 4, 2003
    #14
  15. John Hunter

    John Hunter Guest

    >>>>> "Ron" == Ron Adam <> writes:

    Ron> I really don't know how you can make this faster. There
    Ron> might be a library that has a distance between two points
    Ron> function that could speed it up.

    If you only had to compare one point to all the other points, then the
    brute force approach -- check every point -- will work great. This is
    O(N) and I don't think you can beat it. The idea is that I will be
    repeatedly checking and adding points to the list, so it is worthwhile
    at the outset to set up a data structure which allows a more efficient
    search.

    The analogy is a binary search in 1D. If you plan to repeatedly
    search a (possibly growing) list of numbers to see whether it contains
    some number or find the nearest neighbor to a number, it is worthwhile
    at the outset to put them in a data structure that allows a binary
    search. Setting up the initial data structure costs you some time,
    but subsequent searches are O(log2(N)). See google for 'binary
    search' and the python module bisect.

    So roughly, for a list with 1,000,000 elements, your brute force
    approach requires a million comparisons per search. If the data is
    setup for binary search, on average only 13-14 comparisons will be
    required. Well worth the effort if you need to do a lot of searches,
    as I do.

    John Hunter
    John Hunter, Nov 4, 2003
    #15
  16. At 2003-11-03T04:12:47Z, John Hunter <> writes:

    > One solution that comes to mind is to partition to space into quadrants
    > and store the elements by quadrant. When a new element comes in, identify
    > it's quadrant and only search the appropriate quadrant for nearest
    > neighbor.


    Erm, no. Imagine that your new point is in one corner of a quadrant. The
    other point in the quadrant is in the opposite corner. There is a point in
    the adjacent quadrant that is infinitessimaly close to your new point.
    That's where your algorithm breaks down.
    --
    Kirk Strauser
    The Day Companies
    Kirk Strauser, Nov 4, 2003
    #16
  17. On Tue, Nov 04, 2003 at 01:02:36PM -0600, John Hunter wrote:
    > >>>>> "Ron" == Ron Adam <> writes:

    >
    > Ron> I really don't know how you can make this faster. There
    > Ron> might be a library that has a distance between two points
    > Ron> function that could speed it up.
    >
    > If you only had to compare one point to all the other points, then the
    > brute force approach -- check every point -- will work great. This is
    > O(N) and I don't think you can beat it. The idea is that I will be
    > repeatedly checking and adding points to the list, so it is worthwhile
    > at the outset to set up a data structure which allows a more efficient
    > search.
    >
    > The analogy is a binary search in 1D. If you plan to repeatedly
    > search a (possibly growing) list of numbers to see whether it contains
    > some number or find the nearest neighbor to a number, it is worthwhile
    > at the outset to put them in a data structure that allows a binary
    > search. Setting up the initial data structure costs you some time,
    > but subsequent searches are O(log2(N)). See google for 'binary
    > search' and the python module bisect.
    >
    > So roughly, for a list with 1,000,000 elements, your brute force
    > approach requires a million comparisons per search. If the data is
    > setup for binary search, on average only 13-14 comparisons will be
    > required. Well worth the effort if you need to do a lot of searches,
    > as I do.
    >
    > John Hunter


    Breaking into the thread late, I've been busy enough to put down c.l.py
    for a couple weeks.

    If you only need to compare 10-1000 points, try this approach below. I
    wrote it for the ICFP programming contest where I was sorting lots and
    lots of points lots and lots of times. It sorts a list of points for
    their manhattan distance from a particular point. I tweaked it until
    it was just fast enough to do what I wanted. I won't pretend it is
    optimal for all N, just that it was good enough for _my_ N. The speed
    trick is that the point we are sorting around is stored in an object that
    is created only once, and then we make the object's __call__ be the
    comparison function.

    Since it uses the list.sort() method we don't have to do anything more
    clever than write our comparison function in C. I trust the Timbot
    has written a better list.sort() than I ever will, so let's use it.

    it is used thusly:

    import icfp

    l = [your big list of point tuples like (1, 10)]
    p = (99, 22) # find nearest point in l to p

    sort_ob = icfp.manhattan_sort(p) # creates a callable Manhat object that remembers p

    l.sort(sort_ob) # sorts around p
    print "closest", l[0]
    print "farthest", l[-1]

    The below code is from my CVS tree, so it should probably work. It was
    written in a rush for a contest (JDH, you may recognize parts that are
    cut-n-pasted from "probstat") so YMMV.

    -jack

    NB, if there is a boost or pyrex guy listening you might want to
    throw in an easilly derived class that makes this [ab]use of __call__ easy.
    It was a big performance win for me for very little effort.


    """ icfp_module.c """

    #include "Python.h"
    #include <stdio.h>
    #include <stdlib.h>

    /*
    * stats module interface
    */

    static PyObject *ErrorObject;

    PyObject *manhat_new(PyObject *self, PyObject *args);

    static PyTypeObject Manhat_Type;

    static PyObject *
    heur1(PyObject *self, PyObject *args)
    {
    PyObject *tup1;
    PyObject *tup2;
    long x,y;

    if (!PyArg_ParseTuple(args, "O!O!", &PyTuple_Type, &tup1, &PyTuple_Type, &tup2)) {
    return NULL;
    }
    x = PyInt_AsLong(PyTuple_GET_ITEM(tup1,0)) - PyInt_AsLong(PyTuple_GET_ITEM(tup2,0));
    y = PyInt_AsLong(PyTuple_GET_ITEM(tup1,1)) - PyInt_AsLong(PyTuple_GET_ITEM(tup2,1));
    return PyInt_FromLong(labs(x * x) + labs(y * y));
    }

    static PyMethodDef stats_methods[] = {
    {"manhattan", heur1, METH_VARARGS},
    {"manhattan_sort", manhat_new, METH_VARARGS},
    {NULL, NULL} /* sentinel */
    };

    DL_EXPORT(void)
    initicfp(void)
    {
    PyObject *m, *d;

    PyPQueue_Type.ob_type = &PyType_Type;
    Manhat_Type.ob_type = &PyType_Type;

    /* Create the module and add the functions */
    m = Py_InitModule("icfp", stats_methods);

    /* Add some symbolic constants to the module */
    d = PyModule_GetDict(m);
    ErrorObject = PyErr_NewException("icfp.error", NULL, NULL);
    }

    """ manhattan.c """
    #include "Python.h"
    #include <stdio.h>

    #define ManhatObject_Check(v) ((v)->ob_type == &Manhat_Type)

    staticforward PyTypeObject Manhat_Type;

    typedef struct {
    PyObject_HEAD
    long x;
    long y;
    } ManhatObject;


    static void
    Manhat_dealloc(ManhatObject *self) {
    PyObject_Del(self);
    }

    static PyObject *
    Manhat_call(ManhatObject *self, PyObject *args)
    {
    PyObject *tup1;
    PyObject *tup2;
    long a;
    long b;
    long x;
    long y;

    if (!PyArg_ParseTuple(args, "O!O!", &PyTuple_Type, &tup1, &PyTuple_Type, &tup2)) {
    return NULL;
    }
    x = PyInt_AsLong(PyTuple_GET_ITEM(tup1,0)) - self->x;
    y = PyInt_AsLong(PyTuple_GET_ITEM(tup1,1)) - self->y;
    a = labs(x * x) + labs(y * y);

    x = PyInt_AsLong(PyTuple_GET_ITEM(tup2,0)) - self->x;
    y = PyInt_AsLong(PyTuple_GET_ITEM(tup2,1)) - self->y;
    b = labs(x * x) + labs(y * y);

    if (a == b)
    return PyInt_FromLong(0);
    else if (a < b)
    return PyInt_FromLong(-1);
    else
    return PyInt_FromLong(1);
    }

    static PyMethodDef Manhat_methods[] = {
    {NULL, NULL} /* sentinel */
    };

    statichere PyTypeObject Manhat_Type = {
    /* The ob_type field must be initialized in the module init function
    * to be portable to Windows without using C++. */
    PyObject_HEAD_INIT(&PyType_Type)
    0, /*ob_size*/
    "Manhat", /*tp_name*/
    sizeof(ManhatObject), /*tp_basicsize*/
    0, /*tp_itemsize*/
    /* methods */
    (destructor)Manhat_dealloc, /*tp_dealloc*/
    0, /*tp_print*/
    0, /*tp_getattr*/
    0, //(setattrfunc)Permute_setattr, /*tp_setattr*/
    0, /*tp_compare*/
    0, /*tp_repr*/
    0, /*tp_as_number*/
    0, /*tp_as_sequence*/
    0, /*tp_as_mapping*/
    0, /*tp_hash*/
    (ternaryfunc)Manhat_call, /*tp_call*/
    0, /*tp_str*/
    PyObject_GenericGetAttr, /*tp_getattro*/
    0, /*tp_setattro*/
    0, /*tp_as_buffer*/
    Py_TPFLAGS_DEFAULT, /*tp_flags*/
    0, /*tp_doc*/
    0, /*tp_traverse*/
    0, /*tp_clear*/
    0, /*tp_richcompare*/
    0, /*tp_weaklistoffset*/
    0, /*tp_iter*/
    0, /*tp_iternext*/
    Manhat_methods, /*tp_methods*/
    0, /*tp_members*/
    0, /*tp_getset*/
    0, /*tp_base*/
    0, /*tp_dict*/
    0, /*tp_descr_get*/
    0, /*tp_descr_set*/
    0, /*tp_dictoffset*/
    0, /*tp_init*/
    0, /*tp_alloc*/
    0, /*tp_new*/
    0, /*tp_free*/
    0, /*tp_is_gc*/
    };

    // not static so ifcp_module.c can see it
    PyObject *
    manhat_new(PyObject *self, PyObject *args)
    {
    ManhatObject *mh;
    PyObject *tup1;

    if (!PyArg_ParseTuple(args, "O!", &PyTuple_Type, &tup1)) {
    return NULL;
    }
    // call object create function and return
    mh = PyObject_New(ManhatObject, &Manhat_Type);
    if (mh == NULL)
    return NULL;

    mh->x = PyInt_AsLong(PyTuple_GET_ITEM(tup1,0));
    mh->y = PyInt_AsLong(PyTuple_GET_ITEM(tup1,1));
    return (PyObject *)mh;
    }

    """ setup.py """

    from distutils.core import setup, Extension

    files = [
    'manhattan.c',
    'icfp_module.c',
    ]

    libraries = []
    #libraries = ["efence"] # uncomment to use ElectricFence
    includes = []
    if (__name__ == '__main__'):
    setup(name = "icfp", version = "0.2",
    ext_modules = [Extension("icfp", files,
    libraries = libraries,
    include_dirs = includes,
    )
    ]
    )
    Jack Diederich, Nov 4, 2003
    #17
  18. John Hunter

    Ron Adam Guest

    On Tue, 04 Nov 2003 08:25:36 -0800, David Eppstein
    <> wrote:

    >In article <bo8j4v$tqt$03$-online.com>,
    > Peter Otten <> wrote:
    >
    >> This is of course all premature optimization as the most promising approach
    >> is to try hard to reduce the number of candidate points, as David Eppstein
    >> seems to have done. But then, he could use complex numbers, too.

    >
    >Well, yes, but then my code wouldn't work very well in dimensions higher
    >than two...



    I rearranged my first example to match the output of yours and used a
    random number seed to get identical results.

    Moving the square root to the return line of the find shortest
    distance function increased the speed of my routine about 20%. Then
    using the p*p form instead of p**2 added anouther 4%.

    With printing turned there is only a very small difference. Of course
    printing is the bottle neck. Turning off printing resulted in the
    following. All are best of 3 runs.

    1000 points:
    Standard loop: 0:00:00.958192
    Kdtree: 0:00:00.248096

    Quite a difference. I'm not quite sure how kdtree's work. (yet) But
    they can be worth while when working with large data sets.

    The standard loop seems to be better for small lists of < 100 points.

    100 points:
    Standard loop: 0:00:00.009966
    kdtree 0:00:00.015247

    But for really large lists.

    10000 points:
    Standard loop: 0:01:39.246454
    kdtree 0:00:03.424873

    Hehe.... no comparison.

    The number of calculations the standard loop does:

    100 new points, 4950 distance calculations.
    1000 new points, 499500 distance calculations.
    10000 new points, 49995000 distance calculations.

    And I don't know how to figure it for kdtree. But we can estimate it
    by using the ratio of the speeds.

    1000 points:
    kdtree (3.42/99.25)*49995000 = 1722749.62 est. dist. calculations.

    There's probably a better way to do that. Python is fun to do this
    stuff with. Playing around like this with other languages is just too
    much trouble.

    _Ron
    Ron Adam, Nov 4, 2003
    #18
  19. On Tue, 04 Nov 2003 17:13:58 +0100,
    Peter Otten <> wrote:
    > Alex Martelli wrote:
    >
    >> Andrew Dalke wrote:
    >>
    >>> Ron Adam
    >>>> for point in p:
    >>>> distance = math.sqrt((new_point[0]-point[0])**2 \
    >>>> +(new_point[1]-point[1])**2)
    >>>
    >>>> I really don't know how you can make this faster. There might be a

    >>
    >> Hmmm, that's what math.hypot is for, isn't it...?
    >>
    >> [alex@lancelot Lib]$ timeit.py -c -s'import math; p=1.6,2.5; np=2.4,1.3'
    >> 'math.sqrt((np[0]-p[0])**2 + (np[1]-p[1])**2)'
    >> 100000 loops, best of 3: 3 usec per loop
    >>
    >> [alex@lancelot Lib]$ timeit.py -c -s'import math; p=1.6,2.5; np=2.4,1.3'
    >> 'math.hypot(np[0]-p[0], np[1]-p[1])'
    >> 100000 loops, best of 3: 1.9 usec per loop
    >>
    >>
    >>>> library that has a distance between two points function that could
    >>>> speed it up.
    >>>
    >>> An easy way is to move the math.sqrt call outside the loop, since
    >>> sqrt(d1) < sqrt(d2) iff d1 < d2 (when d1,d2>=0)

    >>
    >> Yes, omitting the math.sqrt gives the same speed as calling math.hypot,
    >> and it's the classic solution to speed up minimum-distance problems.
    >>
    >> I vaguely suspect you could shave some further fraction of a microsecond
    >> by saving those differences as dx and dy and then computing dx*dx+dy*dy --
    >> since another classic tip is that a**2 is slower than a*2. Let's see...:
    >>
    >> [alex@lancelot Lib]$ timeit.py -c -s'import math; p=1.6,2.5; np=2.4,1.3'
    >> 'dx=np[0]-p[0]; dy=np[1]-p[1]; disq=dx*dx+dy*dy'
    >> 1000000 loops, best of 3: 1.39 usec per loop
    >>
    >> ...yep, another small enhancement. Ain't measuring _FUN_?-)

    >
    > Finally found an application for complex numbers:
    >
    > ...> timeit.py -s"p= 1.6+2.5j; np=2.4+1.3j" "d=abs(p-np)"
    > 1000000 loops, best of 3: 0.436 usec per loop
    >
    > ...> timeit.py -s"p= 1.6,2.5; np=2.4,1.3" "dx=np[0]-p[0];
    > dy=np[1]-p[1];d=dx*dx+dy*dy"
    > 1000000 loops, best of 3: 1.15 usec per loop
    >
    > This is of course all premature optimization as the most promising approach
    > is to try hard to reduce the number of candidate points, as David Eppstein
    > seems to have done. But then, he could use complex numbers, too.
    >
    > Peter
    >



    I am new to timeit.py, but this is odd.

    jim@grendel:~$ /usr/lib/python2.3/timeit.py -c ' p=1.6+2.5j;np=2.4+1.3j; d=abs(p-np)'
    100000 loops, best of 3: 3.1 usec per loop

    vs

    jim@grendel:~$ /usr/lib/python2.3/timeit.py -c -s'import math; p=1.6+2.5j;np=2.4+1.3j; d=abs(p-np)'
    10000000 loops, best of 3: 0.141 usec per loop

    Is it because the builtin math functions are much slower?


    --
    Jim Richardson http://www.eskimo.com/~warlock
    Televangelists: The Pro Wrestlers of Religion
    Jim Richardson, Nov 5, 2003
    #19
  20. John Hunter

    Ron Adam Guest

    On Mon, 03 Nov 2003 22:15:34 -0600, John Hunter
    <> wrote:

    >I had two use cases just yesterday. The one that prompted the
    >question arose in making a contour plot. I'm defining a contour as an
    >ordered sequence of values over a 2D MxN matrix where the values
    >differ from some target value by at most some tolerance. I maintain a
    >list of i,j indices into the matrix for a given contour value, and
    >follow the contour from a given i,j location by examining its
    >neighbors. In order to close the loop (eg, when the contour finder
    >has passed once around a level curve of a mountain, I want to test
    >whether a given point i,j is close to a previously discovered point
    >k,l. Since I have a list of these 2 tuple coordinates, I want to find
    >the nearest neighbor in the list and terminate the contour when the
    >nearest neighbor falls within some minimum distance
    >
    > 3 4 5
    > 2 6
    >13 1 7
    >12 8
    > 11 10 9
    >
    >In the example above, I am traversing a contour identifying points in
    >the order 1,2,3...; as above each point represents an i,j tuple which
    >is an index into the matrix I am contouring. I would like to
    >terminate the search at 13 rather than spiral around the existing
    >contour 1-12. Each time I add a new point to the contour, I would like
    >to query the existing list (excluding the most recently added points
    >which are close by construction) of locations and find the minimum
    >distance. If I'm not too close to the preexisting contour, I add the
    >new point and proceed.
    >
    >As I write this I realize there is an important efficiency. Since
    >from an existing point I add the closest neighbor, the biggest step I
    >can make is 1,1. If on the last nearest neighbor query I find a
    >minimum distance of d, it will take me d minimum steps to approach the
    >existing contour. So I don't need to check the distance again for at
    >least d steps. So the algorithm can proceed 1) obtain the distance d
    >from the existing contour to the most recently obtained point 2) make
    >d steps adding points that meet the value criteria 3) repeat.
    >
    >The second use case arose with gdmodule, which can only allocate 256
    >colors, which I cache as a dict from rgb tuples (eg, 0.0, 0.05, 1.0)
    >to color. When the total number of color allocations is made, and a
    >new rgb request comes into the color manager, I pick the already
    >allocated point in rgb space closest to the requested point.
    >
    >I'll try David Eppstein's approach tomorrow and see how this fares.
    >
    >Thanks to all for suggestions,
    >John Hunter



    Ah, a contour map. Maybe something like this?


    """
    Where pointA and pointB are constitutive points in a list, and pointC
    is a new point from a list of new points.

    For each pointC in a list of new points.
    For each consecutive 2 points in a list of sequential points.
    If lineAC < lineAB and lineBC < lineAB
    Insert pointC between pointA and pointB

    If pointC was not placed in list of sequential points.
    Where pointA and pointB are the beginning and
    end points of the list.

    IF lineAC < lineBC
    Add pointC to beginning of list.
    Else add pointC to end of list.

    When done copy point from beginning of list to end of list
    to complete polygon.
    """


    Just knowing the closest point isn't quite enough because it doesn't
    tell you weather to put it in front or behind the point in the list.
    Storing the distance to the next point along with each point might
    make it work faster. This method has an advantage in that it doesn't
    have to go through the whole list. You could start from the closest
    end of the list and maybe make it quicker.

    One catch is you need to know in advance that the set of points are
    not divided by a hill or valley.

    I'm not sure what this would do with a list of random points. Maybe a
    long squiggly line where the beginning to end segment cuts across
    them. I don't think you will have that problem.

    _Ron
    Ron Adam, Nov 5, 2003
    #20
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Steve
    Replies:
    5
    Views:
    42,648
    Steve
    May 17, 2004
  2. Fred
    Replies:
    3
    Views:
    63,470
  3. nkunapa

    XPATH Nearest Node

    nkunapa, Apr 5, 2005, in forum: XML
    Replies:
    2
    Views:
    860
    nkunapa
    Apr 11, 2005
  4. Greg Young [MVP]

    Re: rounding values to the nearest 5 or 10

    Greg Young [MVP], Apr 27, 2006, in forum: ASP .Net
    Replies:
    0
    Views:
    464
    Greg Young [MVP]
    Apr 27, 2006
  5. chaz
    Replies:
    0
    Views:
    374
Loading...

Share This Page