J
Jeremy
I wrote the method below by copying the algorithm from
http://en.wikipedia.org/wiki/Damerau-Levenshtein_distance (and matrix is
a really simple 2d array implementation). But the problem is that it
slows wat down as the string size gets bigger. At string length of about
150 it takes 1s, at 500 10s. Is there any way to recode this to get
better performance without rewriting it in C, and would rewrting it in C
even help or is this just a slow algorithm?
def self.distance(string1, string2)
string1 = string1.unpack('C*')
string2 = string2.unpack('C*')
s1n = string1.length
s2n = string2.length
m = DLDiff::Matrix.new(s1n+1, s2n+1)
cost = 0
(0..s1n).each {|i| m[i,0] = i}
(1..s2n).each {|j| m[0,j] = j}
(1..s1n).each do |i|
(1..s2n).each do |j|
cost = string1 == string2[j] ? 0 : 1
m[i, j] = [ m[i-1, j] + 1,
m[i, j-1] + 1,
m[i-1, j-1] + cost ].min
m[i, j] = [ m[i,j], m[i-2,j-2] + cost].min if(i > 1 && j > 1 &&
string1 == string2[j-1] && string1[i-1] == string2[j])
end
end
m[s1n, s2n]
end
class Matrix
def initialize(columns, rows)
ac = Array.new(columns, 0)
@am = Array.new(rows, 0)
@am = @am.collect{|r| ac.dup}
end
def [](c, r)
@am[r][c]
end
def []=(c,r,value)
@am[r][c] = value
end
def inspect
@am.collect{|a| a.inspect}.join("\n")
end
def to_s
@am.to_s
end
end
http://en.wikipedia.org/wiki/Damerau-Levenshtein_distance (and matrix is
a really simple 2d array implementation). But the problem is that it
slows wat down as the string size gets bigger. At string length of about
150 it takes 1s, at 500 10s. Is there any way to recode this to get
better performance without rewriting it in C, and would rewrting it in C
even help or is this just a slow algorithm?
def self.distance(string1, string2)
string1 = string1.unpack('C*')
string2 = string2.unpack('C*')
s1n = string1.length
s2n = string2.length
m = DLDiff::Matrix.new(s1n+1, s2n+1)
cost = 0
(0..s1n).each {|i| m[i,0] = i}
(1..s2n).each {|j| m[0,j] = j}
(1..s1n).each do |i|
(1..s2n).each do |j|
cost = string1 == string2[j] ? 0 : 1
m[i, j] = [ m[i-1, j] + 1,
m[i, j-1] + 1,
m[i-1, j-1] + cost ].min
m[i, j] = [ m[i,j], m[i-2,j-2] + cost].min if(i > 1 && j > 1 &&
string1 == string2[j-1] && string1[i-1] == string2[j])
end
end
m[s1n, s2n]
end
class Matrix
def initialize(columns, rows)
ac = Array.new(columns, 0)
@am = Array.new(rows, 0)
@am = @am.collect{|r| ac.dup}
end
def [](c, r)
@am[r][c]
end
def []=(c,r,value)
@am[r][c] = value
end
def inspect
@am.collect{|a| a.inspect}.join("\n")
end
def to_s
@am.to_s
end
end