Levenshtein_distance and recreate the string

Discussion in 'Ruby' started by Ams Lo, Apr 30, 2008.

  1. Ams Lo

    Ams Lo Guest

    Hi -

    A general computer science question-

    Given the levenshtein distance between two strings and one of the
    strings S1, is it possible to re-create the second string.

    For example -

    S1 = "RUBY"

    S2 = "BRUY"

    lev_distance = 3

    Given 3 and S2, is it possible to recreate S1??

    Many thanks,
    --
    Posted via http://www.ruby-forum.com/.
     
    Ams Lo, Apr 30, 2008
    #1
    1. Advertising

  2. -----BEGIN PGP SIGNED MESSAGE-----
    Hash: SHA1

    Ams Lo wrote:
    | Hi -
    |
    | A general computer science question-
    |
    | Given the levenshtein distance between two strings and one of the
    | strings S1, is it possible to re-create the second string.
    |
    | For example -
    |
    | S1 = "RUBY"
    |
    | S2 = "BRUY"
    |
    | lev_distance = 3
    |
    | Given 3 and S2, is it possible to recreate S1??

    I see no reason why it shouldn't.

    Looking at the Wikiality for the algorithm[0], the algorithm works,
    essentially, on a matrix for the strings. Juxtaposing the axes should
    solve that, in a naive, uneducated way, anyway.

    After all, the Levenshtein distance is the same for RUBY -> BRUBY and
    BRUBY -> RUBY.


    [0] http://en.wikipedia.org/wiki/Levenshtein_distance

    - --
    Phillip Gawlowski
    Twitter: twitter.com/cynicalryan
    Blog: http://justarubyist.blogspot.com

    ~ "That's the whole problem with science. You've got a bunch of
    ~ empiricists trying to describe things of unimaginable wonder."
    ~ --- Calvin
    -----BEGIN PGP SIGNATURE-----
    Version: GnuPG v1.4.8 (MingW32)
    Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

    iEYEARECAAYFAkgXzAkACgkQbtAgaoJTgL+1SwCfeSqOFABxce96rWU0YRLl9zHN
    maMAnAz7tdnSBMZGVKWTj6EQ6rutDoL0
    =RsJI
    -----END PGP SIGNATURE-----
     
    Phillip Gawlowski, Apr 30, 2008
    #2
    1. Advertising

  3. On Tue, Apr 29, 2008 at 9:13 PM, Ams Lo <> wrote:
    > Hi -
    >
    > A general computer science question-
    >
    > Given the levenshtein distance between two strings and one of the
    > strings S1, is it possible to re-create the second string.
    >
    > For example -
    >
    > S1 = "RUBY"
    >
    > S2 = "BRUY"
    >
    > lev_distance = 3
    >
    > Given 3 and S2, is it possible to recreate S1??


    Hmmm, smells like a homework assignment.

    Hint, do you suppose that strings other than "RUBY" might have a
    levenshtein distance of 3 from "BRUY"?

    --
    Rick DeNatale

    My blog on Ruby
    http://talklikeaduck.denhaven2.com/
     
    Rick DeNatale, Apr 30, 2008
    #3
  4. Hi --

    On Wed, 30 Apr 2008, Phillip Gawlowski wrote:

    > -----BEGIN PGP SIGNED MESSAGE-----
    > Hash: SHA1
    >
    > Ams Lo wrote:
    > | Hi -
    > |
    > | A general computer science question-
    > |
    > | Given the levenshtein distance between two strings and one of the
    > | strings S1, is it possible to re-create the second string.
    > |
    > | For example -
    > |
    > | S1 = "RUBY"
    > |
    > | S2 = "BRUY"
    > |
    > | lev_distance = 3
    > |
    > | Given 3 and S2, is it possible to recreate S1??
    >
    > I see no reason why it shouldn't.
    >
    > Looking at the Wikiality for the algorithm[0], the algorithm works,
    > essentially, on a matrix for the strings. Juxtaposing the axes should
    > solve that, in a naive, uneducated way, anyway.
    >
    > After all, the Levenshtein distance is the same for RUBY -> BRUBY and
    > BRUBY -> RUBY.


    But isn't it the same for RUBY -> RUBE and RUBA -> RUBE ? In which
    case, if you had RUBE and the L. distance, you could not recreate
    RUBY; you'd have a whole set of strings that were exactly that
    distance from RUBE.


    David

    --
    Rails training from David A. Black and Ruby Power and Light:
    INTRO TO RAILS June 9-12 Berlin
    ADVANCING WITH RAILS June 16-19 Berlin
    INTRO TO RAILS June 24-27 London (Skills Matter)
    See http://www.rubypal.com for details and updates!
     
    David A. Black, Apr 30, 2008
    #4
  5. -----BEGIN PGP SIGNED MESSAGE-----
    Hash: SHA1

    David A. Black wrote:
    |
    | But isn't it the same for RUBY -> RUBE and RUBA -> RUBE ? In which
    | case, if you had RUBE and the L. distance, you could not recreate
    | RUBY; you'd have a whole set of strings that were exactly that
    | distance from RUBE.

    Good point. I don't know if the L. distance is unique for each
    possibility or not.

    Considering how the algorithm looks, the solutions probably aren't unique.

    But then again, you can use weighted probabilities to achieve the
    correct result, I guess.

    N.B.: I'm a bit out of my depth here, having no comp.sci. background
    (which I regret..).

    - --
    Phillip Gawlowski
    Twitter: twitter.com/cynicalryan
    Blog: http://justarubyist.blogspot.com

    ~ - You know you've been hacking too long when...
    ...you wake up and desperately try to start a compiler so you can use
    the 15 minute waiting period to sleep some more.
    -----BEGIN PGP SIGNATURE-----
    Version: GnuPG v1.4.8 (MingW32)
    Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

    iEYEARECAAYFAkgX0csACgkQbtAgaoJTgL9Z2wCfcXY3WwdwtlgRjvFb0luHMTry
    X6EAniZeT0iOJjL/kiWw5vnrFSpKNzmj
    =Er07
    -----END PGP SIGNATURE-----
     
    Phillip Gawlowski, Apr 30, 2008
    #5
  6. On Apr 30, 11:13 am, Ams Lo <> wrote:
    > Hi -
    >
    > A general computer science question-
    >
    > Given the levenshtein distance between two strings and one of the
    > strings S1, is it possible to re-create the second string.
    >
    > For example -
    >
    > S1 = "RUBY"
    >
    > S2 = "BRUY"
    >
    > lev_distance = 3
    >
    > Given 3 and S2, is it possible to recreate S1??
    >
    > Many thanks,
    > --
    > Posted viahttp://www.ruby-forum.com/.


    would S1 be BOAT (BRUT, BRAT, BOAT), GRAN (BRAY, GRAY, GRAN), GOUT
    (BOUY, BOUT, GOUT) or ....?
    ie I think you need to know where you're going in order to know how to
    get there.
    What S1 are we recreating?


    cheers,


    --
    Mark
     
    Mark Woodward, Apr 30, 2008
    #6
  7. Hi --

    On Wed, 30 Apr 2008, Phillip Gawlowski wrote:

    > -----BEGIN PGP SIGNED MESSAGE-----
    > Hash: SHA1
    >
    > David A. Black wrote:
    > |
    > | But isn't it the same for RUBY -> RUBE and RUBA -> RUBE ? In which
    > | case, if you had RUBE and the L. distance, you could not recreate
    > | RUBY; you'd have a whole set of strings that were exactly that
    > | distance from RUBE.
    >
    > Good point. I don't know if the L. distance is unique for each
    > possibility or not.
    >
    > Considering how the algorithm looks, the solutions probably aren't unique.
    >
    > But then again, you can use weighted probabilities to achieve the
    > correct result, I guess.
    >
    > N.B.: I'm a bit out of my depth here, having no comp.sci. background
    > (which I regret..).


    Me too. I leapt on this thread partly to give myself an excuse to get
    a more than anecdotal acquaintance with the algorithm. I can't claim
    that it's *much* more than that, but every little bit helps :)


    David

    --
    Rails training from David A. Black and Ruby Power and Light:
    INTRO TO RAILS June 9-12 Berlin
    ADVANCING WITH RAILS June 16-19 Berlin
    INTRO TO RAILS June 24-27 London (Skills Matter)
    See http://www.rubypal.com for details and updates!
     
    David A. Black, Apr 30, 2008
    #7
  8. Ams Lo

    Axel Etzold Guest

    Dear all,

    maybe what the OP wants is not so much Levenstein distance but
    the McIlroy-Hunt longest common subsequence (LCS) algorithm, an algorithm
    which not only tells you how far apart two strings are (how
    many changes you have to make to get from string A to string B),
    but also where they have to be made.
    There's a Ruby implementation by Austin Ziegler here:

    http://raa.ruby-lang.org/project/diff-lcs/

    If you want to know how many strings have Levenstein distance n from
    a given string A, you'll probably have to create all possible
    combinations of letters from the alphabet of lengths length(A)-n through length(A)+n, and
    throw away those that don't have the correct Levenstein distance.
    That's going to be an enormous amount of words in any case ...
    Even if only words from a dictionary are allowed, in most
    cases, you'll still have many words that are "close" to each other ;-(

    Best regards,

    Axel




    --
    Psst! Geheimtipp: Online Games kostenlos spielen bei den GMX Free Games!
    http://games.entertainment.gmx.net/de/entertainment/games/free
     
    Axel Etzold, Apr 30, 2008
    #8
  9. On 30 Apr 2008, at 02:13, Ams Lo wrote:
    > Hi -
    >
    > A general computer science question-
    >
    > Given the levenshtein distance between two strings and one of the
    > strings S1, is it possible to re-create the second string.
    >
    > For example -
    >
    > S1 = "RUBY"
    >
    > S2 = "BRUY"
    >
    > lev_distance = 3
    >
    > Given 3 and S2, is it possible to recreate S1??
    >
    > Many thanks,


    Simple answer: no.

    Each step that exists between S1 and S2 represents the choosing of one
    change out of a set of changes the same size as your token space (so
    for uppercase letters only that would be 26) and therefore to reverse
    that change you would generate that many valid words. Selecting which
    of those is the S1 word is impossible without additional constraints.

    For the S1 -> S3 -> S1 case the actual search space (still assuming
    capital letters only) is therefore 26 ** 3, or 17576 equally plausible
    S1 candidates. Of course that assumes a naive search strategy, whereas
    it's quite possible the space could be restricted based upon known
    dictionary characteristics: certain tokens may be impossible at
    certain locations; genetic algorithms may grow 'good' solutions faster
    than exhaustive search; etc.

    Ellie

    Eleanor McHugh
    Games With Brains
    http://slides.games-with-brains.net
    ----
    raise ArgumentError unless @reality.responds_to? :reason
     
    Eleanor McHugh, Apr 30, 2008
    #9
  10. On Apr 30, 2:13 am, Ams Lo <> wrote:
    >
    > S1 = "RUBY"
    >
    > S2 = "BRUY"
    >
    > lev_distance = 3
    >
    > Given 3 and S2, is it possible to recreate S1??


    Several people have given great answers as to why this isn't possible,
    but let me give you another reason why, just
    because I feel like it, and it's a useful thing to keep in mind.

    Whenever you get a problem like this, a good way of approaching it if
    you don't understand the algorithm is to think:

    - If it _was_ possible, would that allow you to make impossibly
    efficient compression algorithms?

    In this case, the answer is that yes it would: You could pick a fixed
    S2, for example an empty string, calculate the
    Levensthein distance from S1 to the fixed S2. The distance would be
    proportional to the length of S1. In fact, if
    you choose S2 to be the empty string, then the distance would be equal
    to the length of S1 (it'd take one deletion
    per position in S1 to end up with an empty string)

    In other words, you'd "compress" your test string "BRUY" down to the
    number 4, but more importantly that "compression"
    method would compress any string of any length, which is obviously
    impossible (the reason this is impossible is that
    if it was possible you could apply it to it's own output over and over
    until you had compressed an arbitrary input
    down to a bit).

    In this case it's not a very hard problem to solve, but I find that
    for a large number of questions about data
    transformations, it's helpful to think about it in terms of
    compression because the answers often become blatantly
    obvious once you restate them that way.

    Vidar
     
    Vidar Hokstad, Apr 30, 2008
    #10
  11. On Wed, Apr 30, 2008 at 8:25 AM, Vidar Hokstad <> wrote:

    > Several people have given great answers as to why this isn't possible,
    > but let me give you another reason why, just
    > because I feel like it, and it's a useful thing to keep in mind.
    >
    > Whenever you get a problem like this, a good way of approaching it if
    > you don't understand the algorithm is to think:
    >
    > - If it _was_ possible, would that allow you to make impossibly
    > efficient compression algorithms?
    >
    > In this case, the answer is that yes it would: You could pick a fixed
    > S2, for example an empty string, calculate the
    > Levensthein distance from S1 to the fixed S2. The distance would be
    > proportional to the length of S1. In fact, if
    > you choose S2 to be the empty string, then the distance would be equal
    > to the length of S1 (it'd take one deletion
    > per position in S1 to end up with an empty string)
    >
    > In other words, you'd "compress" your test string "BRUY" down to the
    > number 4, but more importantly that "compression"
    > method would compress any string of any length, which is obviously
    > impossible (the reason this is impossible is that
    > if it was possible you could apply it to it's own output over and over
    > until you had compressed an arbitrary input
    > down to a bit).
    >
    > In this case it's not a very hard problem to solve, but I find that
    > for a large number of questions about data
    > transformations, it's helpful to think about it in terms of
    > compression because the answers often become blatantly
    > obvious once you restate them that way.


    Another thing to consider is why it's called a distance rather than,
    say, a difference.

    Consider this analogy.

    Given that the highway distance between Raleigh, NC, and Washington,
    DC is 250 miles. Given Washington, DC, and 250 miles, can I uniquely
    identify Raleigh, NC.

    No since the highway distance between Washington, DC and New York, NY
    (as well as many other places) is also 250 miles.

    --
    Rick DeNatale

    My blog on Ruby
    http://talklikeaduck.denhaven2.com/
     
    Rick DeNatale, Apr 30, 2008
    #11
  12. Ams Lo

    Ken Bloom Guest

    On Tue, 29 Apr 2008 20:56:30 -0500, Phillip Gawlowski wrote:

    > -----BEGIN PGP SIGNED MESSAGE-----
    > Hash: SHA1
    >
    > David A. Black wrote:
    > |
    > | But isn't it the same for RUBY -> RUBE and RUBA -> RUBE ? In which |
    > case, if you had RUBE and the L. distance, you could not recreate |
    > RUBY; you'd have a whole set of strings that were exactly that |
    > distance from RUBE.
    >
    > Good point. I don't know if the L. distance is unique for each
    > possibility or not.


    It's not unique. Any single deletion has edit distance 1, any single
    insertion has edit distance 1.

    Thus, the strings that have edit distance 1 from "RUBY" include (but are
    not limited to):

    #delete a letter
    RUB
    RUY
    RBY
    UBY
    #add the letter A
    ARUBY
    RAUBY
    RUABY
    RUBAY
    RUBYA
    #add the letter B
    BRUBY
    RBUBY
    RUBBY
    RUBYB
    ....

    That's not to say there aren't reasons for recreating all of these. For
    Mengmeng Du[1], recreating all of these and checking for their existance
    in a Bloom filter was orders of magnitude faster than computing the edit
    distance between a query and every entry in a 200,000 entry list of last
    names. (Just a paper I happen to be reading this week.)

    --Ken

    [1] Du, Mengmeng (2005) "Approximate Name Matching" Master's Degree
    Project, Royal Institute of Technology, Stockholm, Sweden

    --
    Ken (Chanoch) Bloom. PhD candidate. Linguistic Cognition Laboratory.
    Department of Computer Science. Illinois Institute of Technology.
    http://www.iit.edu/~kbloom1/
     
    Ken Bloom, May 1, 2008
    #12
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Carl Prothman [MVP]

    Re: how to recreate the default ASPNET account ?

    Carl Prothman [MVP], Aug 9, 2003, in forum: ASP .Net
    Replies:
    0
    Views:
    3,378
    Carl Prothman [MVP]
    Aug 9, 2003
  2. Mark
    Replies:
    2
    Views:
    374
    Chris Jackson
    Feb 23, 2004
  3. kevinsaucier
    Replies:
    0
    Views:
    1,496
    kevinsaucier
    May 12, 2005
  4. Replies:
    4
    Views:
    957
    M.E.Farmer
    Feb 13, 2005
  5. Jeremy

    Damerau-Levenshtein_distance

    Jeremy, May 18, 2007, in forum: Ruby
    Replies:
    2
    Views:
    128
    Jeremy
    May 18, 2007
Loading...

Share This Page