DJ Jazzy Linefeed
Sup, fools?
This is the Levenshtein function I'm gankin' for my file comparison
project (see "40 million comparison..." thread):
# Levenshtein calculator
# Author: Paul Battley ([email protected])
# Modified slightly by John Perkins:
# -- removed $KCODE call
def distance(str1, str2)
  unpack_rule = 'C*' # compare byte by byte
  s = str1.unpack(unpack_rule)
  t = str2.unpack(unpack_rule)
  n = s.length
  m = t.length
  return m if (0 == n) # stop the madness if either string is empty
  return n if (0 == m)
  d = (0..m).to_a # previous row of the distance matrix
  x = nil
  (0...n).each do |i|
    e = i + 1
    (0...m).each do |j|
      cost = (s[i] == t[j]) ? 0 : 1
      x = [
        d[j + 1] + 1, # insertion
        e + 1,        # deletion
        d[j] + cost   # substitution
      ].min
      d[j] = e
      e = x
    end
    d[m] = x
  end
  return x
end
When I ran this with test data in Ruby 1.8 the output was 969, but
when I ran it on a 1.9 install the output was 1011. I'm aware that
some of the rules have changed, especially with arrays. Does anyone
see where the discrepancy lies? Because I sure as heck don't. The
files didn't change, so the distance shouldn't either. Thanks in
advance for all your help.
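
One way to narrow this down might be to run the function on a few tiny inputs whose distances are known, on both interpreters. The sketch below is only an illustration: byte_distance is a hypothetical name for a self-contained copy of the routine above (with the comparison written as s[i] == t[j]), and the sample strings are not from my test data. If both versions agree on these, the difference is more likely in how the files get read than in the loop itself, but that is a guess.

```ruby
# Sanity-check sketch: byte-wise Levenshtein distance, same approach as the
# routine above. byte_distance is a hypothetical name used for illustration.
def byte_distance(str1, str2)
  s = str1.unpack('C*') # array of byte values
  t = str2.unpack('C*')
  n = s.length
  m = t.length
  return m if n.zero?
  return n if m.zero?

  d = (0..m).to_a # previous row of the distance matrix
  x = nil
  n.times do |i|
    e = i + 1 # value to the left in the current row
    m.times do |j|
      cost = s[i] == t[j] ? 0 : 1
      x = [d[j + 1] + 1, e + 1, d[j] + cost].min
      d[j] = e
      e = x
    end
    d[m] = x
  end
  x
end

# Small pairs with well-known distances.
checks = {
  ['kitten', 'sitting'] => 3,
  ['flaw', 'lawn'] => 2,
  ['abc', 'abc'] => 0,
  ['', 'abc'] => 3
}
checks.each do |(a, b), expected|
  actual = byte_distance(a, b)
  puts "#{a.inspect} vs #{b.inspect}: got #{actual}, expected #{expected}"
end
```

If any of these differ between 1.8 and 1.9, that points at the function; if they all match, comparing the byte counts of the strings read from the files on each version would be the next thing I'd look at.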