Fuzzy Searching a Database

T

TomRossi7

How would you recommend implementing something like a fuzzy search with
Ruby and a SQL database. For example, I want to search a field in a
table for anything that is similar to "Frank Smithe". I would want an
array of matches with some sort of score telling me how good of a match
the item received ("frank smithe" 100%, "frank smith" 90%).

Am I making sense? Is there a third party app I would need?

Thanks,
Tom
 
R

Rich Morin

How would you recommend implementing something like a fuzzy search with
Ruby and a SQL database. For example, I want to search a field in a
table for anything that is similar to "Frank Smithe". I would want an
array of matches with some sort of score telling me how good of a match
the item received ("frank smithe" 100%, "frank smith" 90%).

Soundex is a phonetic algorithm for indexing names by their sound when
pronounced in English. The basic aim is for names with the same
pronunciation to be encoded to the same string so that matching can occur
despite minor differences in spelling. Soundex is the most widely known of
all phonetic algorithms and is often used (incorrectly) as a synonym for
"phonetic algorithm".

-- http://en.wikipedia.org/wiki/Soundex

If you process your database's names into a column of Soundex encodings,
you should be able to search this for an encoded target word. You could
then post-process the results to get the "score" for each one. Note that
you may want to look around for alternative encodings (eg, Metaphone).

-r
 
H

Hampton

Just make Kit screen all of the queries. Get her a blackberry and email
them to her.
 
L

Lou Vanek

this was discussed on both this ML and the Rails ML about a week ago. The
threads have the word 'fuzzy' in the subject.
 
S

Stephen Waits

If you process your database's names into a column of Soundex
encodings,
you should be able to search this for an encoded target word.

Also worth mentioning is that many database systems have Soundex
support built-in.

--Steve
 
T

Tom Reilly

Lou said:
this was discussed on both this ML and the Rails ML about a week ago. The
threads have the word 'fuzzy' in the subject.
I needed to do something similar with my office database. I used the
Levenshtein distance. The routine was also
quite useful in finding name differences between my office data base and
the billing company's.
#------------------------------------------------------
# Determine Levenshtein distance of two strings

def Ld(s,t)

#s string #1
#t string #2

n = s.size
m = t.size
a = Array.new

if n != 0 && m != 0

#2 create array
r = Array.new
rz = Array.new

0.upto(m) {|x| r.push(0)}

0.upto(n) {|x|a.push(r.dup)}
a.each_index {|x| a[x][0] = x}
0.upto(m) {|x| a[0][x] = x}

cost = 0
1.upto(n) do |i|
1.upto(m) do |j|
if s == t[j]
cost =0
else
cost = 1
end
a[j] = [a[ i- 1][j] +1,a[j - 1] + 1,a[i - 1][j -
1] + cost].min
end
end
a[n][m]
else
0
end
end
 
P

Paul

see fulltext searching at mysql.com.

from http://dev.mysql.com/doc/refman/5.1/en/fulltext-search.html
.....
The following example is more complex. The query returns the relevance
values and it also sorts the rows in order of decreasing relevance. To
achieve this result, you should specify MATCH() twice: once in the
SELECT list and once in the WHERE clause. This causes no additional
overhead, because the MySQL optimizer notices that the two MATCH()
calls are identical and invokes the full-text search code only once.

mysql> SELECT id, body, MATCH (title,body) AGAINST
-> ('Security implications of running MySQL as root') AS score
-> FROM articles WHERE MATCH (title,body) AGAINST
-> ('Security implications of running MySQL as root');
+----+-------------------------------------
+-----------------+
| id | body |
score |
+----+-------------------------------------
+-----------------+
| 4 | 1. Never run mysqld as root. 2. ... |
1.5219271183014 |
| 6 | When configured properly, MySQL ... | 1.3114095926285 |
+----+-------------------------------------
+-----------------+
2 rows in set (0.00 sec)
.....

http://dev.mysql.com/doc/refman/5.1/en/fulltext-search.html
 
B

Ben Stroud

/How would you recommend implementing something like a fuzzy search with
Ruby and a SQL database. For example, I want to search a field in a
table for anything that is similar to "Frank Smithe". I would want an
array of matches with some sort of score telling me how good of a match
the item received ("frank smithe" 100%, "frank smith" 90%)./


Less direct, but maybe useful...
The pyLucene project is the apache java lucene library implemented in C
and SWIG'd to python (i think).
You might be able to create ruby bindings for the full text search C
library used there.


http://pylucene.osafoundation.org/

http://lucene.apache.org/java/docs
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
474,266
Messages
2,571,075
Members
48,772
Latest member
Backspace Studios

Latest Threads

Top