Damerau-Levenshtein_distance

Discussion in 'Ruby' started by Jeremy, May 18, 2007.

  1. Jeremy

    Jeremy Guest

    I wrote the method below by copying the algorithm from
    http://en.wikipedia.org/wiki/Damerau-Levenshtein_distance (and matrix is
    a really simple 2d array implementation). But the problem is that it
    slows wat down as the string size gets bigger. At string length of about
    150 it takes 1s, at 500 10s. Is there any way to recode this to get
    better performance without rewriting it in C, and would rewrting it in C
    even help or is this just a slow algorithm?

    def self.distance(string1, string2)
    string1 = string1.unpack('C*')
    string2 = string2.unpack('C*')
    s1n = string1.length
    s2n = string2.length

    m = DLDiff::Matrix.new(s1n+1, s2n+1)
    cost = 0

    (0..s1n).each {|i| m[i,0] = i}
    (1..s2n).each {|j| m[0,j] = j}

    (1..s1n).each do |i|
    (1..s2n).each do |j|
    cost = string1 == string2[j] ? 0 : 1
    m[i, j] = [ m[i-1, j] + 1,
    m[i, j-1] + 1,
    m[i-1, j-1] + cost ].min
    m[i, j] = [ m[i,j], m[i-2,j-2] + cost].min if(i > 1 && j > 1 &&
    string1 == string2[j-1] && string1[i-1] == string2[j])
    end
    end

    m[s1n, s2n]
    end

    class Matrix
    def initialize(columns, rows)
    ac = Array.new(columns, 0)
    @am = Array.new(rows, 0)
    @am = @am.collect{|r| ac.dup}
    end

    def [](c, r)
    @am[r][c]
    end

    def []=(c,r,value)
    @am[r][c] = value
    end

    def inspect
    @am.collect{|a| a.inspect}.join("\n")
    end

    def to_s
    @am.to_s
    end
    end
     
    Jeremy, May 18, 2007
    #1
    1. Advertising

  2. Jeremy <> writes:

    > I wrote the method below by copying the algorithm from
    > http://en.wikipedia.org/wiki/Damerau-Levenshtein_distance (and matrix
    > is a really simple 2d array implementation). But the problem is that
    > it slows wat down as the string size gets bigger. At string length of
    > about 150 it takes 1s, at 500 10s. Is there any way to recode this to
    > get better performance without rewriting it in C, and would rewrting
    > it in C even help or is this just a slow algorithm?


    Well, the algorithm itself is O(n*m), where n and m are the size of
    the strings involved, so on large strings it's going to get slow.

    I was able to shave about 40% off the time for your method, and fix a
    bug.

    The bug was caused because the wikipedia article indexes the strings
    starting from 1, but indexes the array starting form 0. In ruby both
    start at 0, of course. To fix this, I added a fake element at the
    start of both strings.

    Anyway, here's my slightly faster version:

    def self.distance(string1, string2)
    string1 = string1.unpack('C*')
    string2 = string2.unpack('C*')
    s1n = string1.length
    s2n = string2.length
    string1.unshift -1
    string2.unshift -1
    m = Array.new(s1n+1){Array.new(s2n+1,0)}
    cost = 0
    a = nil; b = nil
    (0..s1n).each {|i| m[0] = i}
    (1..s2n).each {|j| m[0][j] = j}
    (1..s1n).each do |i|
    (1..s2n).each do |j|
    cost = string1 == string2[j] ? 0 : 1

    a = m[i-1][j] + 1
    a = b if ((b = m[j-1]+1) < a)
    a = b if ((b = m[i-1][j-1]+cost) < a)

    if(i > 1 && j > 1 &&
    string1 == string2[j-1] &&
    string1[i-1] == string2[j]) then
    a = b if ((b = m[i-2][j-2] + 1) < a)
    end
    m[j] = a
    end
    end
    m[s1n][s2n]
    end


    --
    s=%q( Daniel Martin --
    puts "s=%q(#{s})",s.to_a.last )
    puts "s=%q(#{s})",s.to_a.last
     
    Daniel Martin, May 18, 2007
    #2
    1. Advertising

  3. Jeremy

    Jeremy Guest

    Daniel Martin wrote:
    > Well, the algorithm itself is O(n*m), where n and m are the size of
    > the strings involved, so on large strings it's going to get slow.
    >
    > I was able to shave about 40% off the time for your method, and fix a
    > bug.
    >
    >

    Thanks, thats really useful. I guess that now the data i'm comparing
    has grown it would be better to use a diff type method than this.
     
    Jeremy, May 18, 2007
    #3
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Ams Lo
    Replies:
    11
    Views:
    224
    Ken Bloom
    May 1, 2008
Loading...

Share This Page