is this a good way to find anagrams?

Discussion in 'Ruby' started by travis laduke, Dec 8, 2005.

  1. it seems to me, with computers these days, this should finish
    instantly, not take like 20 seconds.
    also, please help me make my ruby more ruby-like. i'm new to ruby,
    not that i know any other language.


    ##these are here from when i was first testing
    secret_word = "spine"
    rack = "spine"

    secret_word = secret_word.split(//)
    rack = rack.split(//)
    test_rack = rack.to_s

    ##are there enough of the right letters in the rack to spell the word
    from the dictionary?
    ##i think i'm going to use funny spanish names for my methods instead
    of descriptive names.
    def rodolfo(secret_word, rack)
    secret_word.each do |x|
    rack = rack.to_s
    if rack.include? x
    rack = rack.sub(x, '')
    ##p rack, x
    else
    ##puts x, ' rack don\'t work'
    break
    end
    end
    end


    puts "reading dictionary"
    dict = IO.readlines('/usr/share/dict/words')

    while rack
    puts "enter rack"
    rack = gets.chomp.split(//)

    dict.each do |secret_word|

    if rodolfo(secret_word.chomp.split(//), rack)
    puts secret_word.to_s
    end
    end
    end
     
    travis laduke, Dec 8, 2005
    #1
    1. Advertisements

  2. travis laduke

    JB Eriksson Guest

    ------=_Part_12356_13524078.1134027118090
    Content-Type: text/plain; charset=ISO-8859-1
    Content-Transfer-Encoding: quoted-printable
    Content-Disposition: inline

    umm... what's the dictionary for?

    also, I think there's an easier way to do this. with a bit of spacing and
    stuff, I wrote up a 10 line solution. ;)
    I also did the splitting string into array thing. Then there's two instance
    methods for Array, named "sort" and "=3D=3D" respectively.

    An anagram is when two words contain the exact same characters, right? so i=
    f
    the two splitted strings are the same when sorted, they should be anagrams.

    ------=_Part_12356_13524078.1134027118090--
     
    JB Eriksson, Dec 8, 2005
    #2
    1. Advertisements

  3. travis laduke

    batkins57 Guest

    One thing you could try is inverting your algorithm. My dict/words
    file has 234,000+ entries in it. Since you're looping through the dict
    file and then calling rodolfo on each iteration, you have to pay the
    cost of calling rodolfo 234,000 times, even though in general only a
    couple of entries in the words file are relevant.

    So what you could do is first find all the possible permuations of the
    letters in your given word. Then loop through the word file once. If
    a word matches one of the permutations in your list, remove it from the
    list of words to be checked and push it onto a separate list. This
    should be significantly faster, since you calculate the combinations up
    front, and then do one scan through the list.

    You could also call out to a program like aspell or ispell to check
    your words for you. Those programs will do something at least as fast
    as a binary search, which will take much, much less time than scanning
    through the whole thing - even in the worst case, a 234,000 entry file
    will be checked in 18 string comparisons, instead of 234,000.

    hth,
    Bill
     
    batkins57, Dec 8, 2005
    #3
  4. --nextPart2139708.tvCfTPD2iy
    Content-Type: text/plain;
    charset="iso-8859-1"
    Content-Transfer-Encoding: quoted-printable
    Content-Disposition: inline

    Hey travis,

    you gave me a good time figuring this one out... after some thought i came =
    up=20
    with something that takes roughly 1 second to search 9,6k words for a=20
    anagram.
    of course this will go up when you add more words (a task you still can thi=
    nk=20
    about)
    all in all, 5 LoC... but i'm sure there are even better ways...

    word =3D "easy".split(//).sort
    anagrams =3D IO.readlines('/usr/share/dict/british-english').partition {|l|
    l.strip!
    (l.size =3D=3D word.size && l.split(//).sort =3D=3D word)
    }[0]
    anagrams.each{|x| puts x}

    ##### gives you:
    manveru@lambda:~/ruby$ time ruby anagramma.rb
    ayes
    easy
    yeas

    real 0m1.097s
    user 0m0.936s
    sys 0m0.105s


    Am Donnerstag, 8. Dezember 2005 06:52 schrieb travis laduke:
    --nextPart2139708.tvCfTPD2iy
    Content-Type: application/pgp-signature

    -----BEGIN PGP SIGNATURE-----
    Version: GnuPG v1.4.1 (GNU/Linux)

    iD8DBQBDl+qRMdQeL6eBxhIRAkCdAKDT1Zzg/c7Ihy87ACnbWCsRy+p52QCfWl5M
    xD+ZNzI3TItJawMxk2l0k4Y=
    =c3rI
    -----END PGP SIGNATURE-----

    --nextPart2139708.tvCfTPD2iy--
     
    Michael Fellinger, Dec 8, 2005
    #4
  5. travis laduke

    batkins57 Guest

    I went ahead and implemented my own suggestions, and just to give you
    an idea of how fast you can get this, my version can find all valid
    anagrams of a five-letter word ("truck") given a 274,000 word
    dictionary in 0.231 seconds....real-time.

    :D
     
    batkins57, Dec 8, 2005
    #5
  6. Even faster - sort the list of permutations, then do an incremental
    search against the dictionary: binary search to find the first word in the
    list, then repeat, setting the lower bound of your search to the
    position immediately after each match when you do the next one.
    O(m!*log n) rather than O(m log m * n), which for small m and large n
    can lead to a speedup. It might be even faster to search for the middle
    word in your permutation list, then do that recursively with the left
    and right sublists, passing in the upper and lower bounds as arguments.
    All this assumes an in-memory dictionary with O(1) position indexing, of
    course.

    martin
     
    Martin DeMello, Dec 8, 2005
    #6
  7. if you want to find lots of anagrams against the same database it
    would make sense to create an index. Make a hash that is indexed by
    the sorted letters and let it point to a list of words that yield this
    letter. After you have done this once, you can lookup anagrams in O(1)

    cheers,

    Brian
     
    Brian Schröder, Dec 9, 2005
    #7
  8. Here comes a short implementation:

    hash =3D
    File.read('/usr/share/dict/words').
    downcase.
    split(/\n/).
    inject(Hash.new { | h, k | h[k] =3D [] }) { | h, w |
    h[w.split(//).sort] << w; h }


    ARGV.each do | word |
    puts "Anagrams for #{word}: #{hash[word.split(//).sort].join(' ')}"
    end
    Anagrams for ruby: ruby bury ruby
    Anagrams for duck: duck
    Anagrams for type: type
    Anagrams for truck: truck
    Anagrams for brian: brain brian rabin brain
    Anagrams for sort: rots sort tors
    Anagrams for index: index nixed

    Here are the timings:

    The calculation of the index took 3.34749s

    Lookup of 7 words took 0.000195s

    And lookup of the words scales linearly, so for certain applications
    (building an anagram-server e.g.) this may be the best possibility.

    Cheers,

    Brian
     
    Brian Schröder, Dec 9, 2005
    #8
  9. I think there's an algorithm for finding anagrams using libgmp, for
    which there are Ruby bindings (on the RAA). I'll bet that's fast. :)

    Dan
     
    Daniel Berger, Dec 10, 2005
    #9
    1. Advertisements

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments (here). After that, you can post your question and our members will help you out.