Detecting similar strings

D

Dylan Markow

Is there a way to take two strings, and decide if they are "similar."
I'm creating a contact system in Rails, and am having a large problem
with my users punching in duplicate entries with the last names spelled
slightly different.

Is there a way to check if 2 strings are "identical" up to a certain
percentage, such as only having 1 or 2 characters different?
 
F

Farrel Lifson

Is there a way to take two strings, and decide if they are "similar."
I'm creating a contact system in Rails, and am having a large problem
with my users punching in duplicate entries with the last names spelled
slightly different.

Is there a way to check if 2 strings are "identical" up to a certain
percentage, such as only having 1 or 2 characters different?

Here's a Perl module that does something similar. You might try
porting it over to Ruby.
http://search.cpan.org/~mlehmann/String-Similarity-1.02/Similarity.pm

Farrel
 
A

Alexandru E. Ungur

sender: "Dylan Markow" date: "Tue, Aug 01, 2006 at 03:25:59PM +0900" <<<EOQ
Is there a way to take two strings, and decide if they are "similar."
I'm creating a contact system in Rails, and am having a large problem
with my users punching in duplicate entries with the last names spelled
slightly different.

Is there a way to check if 2 strings are "identical" up to a certain
percentage, such as only having 1 or 2 characters different?
Looks like there is a soundex implementation for Ruby:
http://raa.ruby-lang.org/search.rhtml?search=soundex

If by any chance you are using MySQL you could use the soundex function
builtin into it as well.

Good luck,
Alex
 
B

benjohn

sender: "Dylan Markow" date: "Tue, Aug 01, 2006 at 03:25:59PM +0900"
Looks like there is a soundex implementation for Ruby:
http://raa.ruby-lang.org/search.rhtml?search=soundex

If by any chance you are using MySQL you could use the soundex function
builtin into it as well.

If you check on Google under "ruby soundex module", you'll find a few
likely hits:
http://tinyurl.com/rch5g

I remember a friend saying that there are superior algorithms to soundex
that are no more complex, so you might want to explore about a little.
 
M

Michal Suchanek

If you check on Google under "ruby soundex module", you'll find a few
likely hits:
http://tinyurl.com/rch5g

I remember a friend saying that there are superior algorithms to soundex
that are no more complex, so you might want to explore about a little.

If the names might be from different languages I would rather use
Levenstein than soundex. Levenstein is probably good at describing
typos as "very close" but soundex might be somewhat language specific.

But I haven't tried so I might be wrong.

Thanks

Michal
 
T

Tom Reilly

Here is an implementation of levenstein distance. It seems to me that
there is another in ruby gems somewhere but I don't remember where.

# Determine Levenshtein distance of two strings

def Ld(s,t)

#s string #1
#t string #2

n = s.size
m = t.size
a = Array.new

if n != 0 && m != 0

#2 create array
r = Array.new
rz = Array.new

0.upto(m) {|x| r.push(0)}

0.upto(n) {|x|a.push(r.dup)}
a.each_index {|x| a[x][0] = x}
0.upto(m) {|x| a[0][x] = x}

#a.each {|x| p x}

cost = 0
1.upto(n) do |i|
1.upto(m) do |j|
if s == t[j]
cost =0
else
cost = 1
end
a[j] = [a[ i- 1][j] +1,a[j - 1] + 1,a[i - 1][j -
1] + cost].min
end
end
a[n][m]
#a.each {|x| p x}
else
0
end
end
 
G

Gene Tani

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,769
Messages
2,569,580
Members
45,054
Latest member
TrimKetoBoost

Latest Threads

Top