Fast kNN from python

J

Janto Dreijer

Hi!

I am looking for a Python implementation or bindings to a library that
can quickly find k-Nearest Neighbors given an arbitrary distance
metric between objects. Specifically, I have an "edit distance"
between objects that is written in Python.

I haven't looked at the methods in detail but I think I'm looking for
one of the data structures listed on http://en.wikipedia.org/wiki/Metric_trees
(i.e. vp-trees, cover trees, m-trees or bk trees). But I might be
wrong. An approximate kNN would also work.

If there doesn't exist such an implementation yet, any advice on a
library I can wrap myself would also be appreciated.

Thanks!
Janto

PS Before anyone suggests it, I can't use the library at
http://www.cs.umd.edu/~mount/ANN/ as it assumes Minkowski distance
functions.
 
T

Tim Churches

Janto said:
I am looking for a Python implementation or bindings to a library that
can quickly find k-Nearest Neighbors given an arbitrary distance
metric between objects. Specifically, I have an "edit distance"
between objects that is written in Python.

Orange? See http://www.ailab.si/orange/ - not sure about speed, but
quite a few parts of it are written in C, and it does kNN.

Tim C
 
J

Janto Dreijer

Orange? Seehttp://www.ailab.si/orange/- not sure about speed, but
quite a few parts of it are written in C, and it does kNN.

Tim C

Thanks, but I'm actually thinking of "fast" as in computer science
terms. e.g. O(log(n)) lookup time. I'll probably have tens of
thousands of objects to search and the distance calculation is
relatively slow.
From the documentation it looks like Orange does a brute force search
for the nearest k items.
 
M

Miki

S

Sean Davis

Hi!

I am looking for a Python implementation or bindings to a library that
can quickly find k-Nearest Neighbors given an arbitrary distance
metric between objects. Specifically, I have an "edit distance"
between objects that is written in Python.

I haven't looked at the methods in detail but I think I'm looking for
one of the data structures listed onhttp://en.wikipedia.org/wiki/Metric_trees
(i.e. vp-trees, cover trees, m-trees or bk trees). But I might be
wrong. An approximate kNN would also work.

If there doesn't exist such an implementation yet, any advice on a
library I can wrap myself would also be appreciated.

Thanks!
Janto

Have you looked at using Rpy and R? There are probably several knn
implementations that then become accessible to you (although I haven't
checked recently).

Sean
 
J

Janto Dreijer

Hello,


First Google search for "k-Nearest Neighbors python", yieldedhttp://people.revoledu.com/kardi/tutorial/KNN/resources.html which
pointed tohttp://biopython.org/DIST/docs/api/public/Bio.kNN-module.html

HTH,

Thanks. Indeed, I did see that page. Unfortunately biopython's knn
does a brute force search for the nearest k and is therefore way too
slow.

Janto
 
J

Janto Dreijer

Have you looked at using Rpy and R? There are probably severalknn
implementations that then become accessible to you (although I haven't
checked recently).

Sean

Interesting. I have not looked at that. I can't really find an R
package that does what I want, so any suggestions are appreciated.

Janto
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,773
Messages
2,569,594
Members
45,125
Latest member
VinayKumar Nevatia_
Top