is this a good way to find anagrams?

travis laduke · Dec 8, 2005

it seems to me, with computers these days, this should finish
instantly, not take like 20 seconds.
also, please help me make my ruby more ruby-like. i'm new to ruby,
not that i know any other language.

##these are here from when i was first testing
secret_word = "spine"
rack = "spine"

secret_word = secret_word.split(//)
rack = rack.split(//)
test_rack = rack.to_s

##are there enough of the right letters in the rack to spell the word
from the dictionary?
##i think i'm going to use funny spanish names for my methods instead
of descriptive names.
def rodolfo(secret_word, rack)
secret_word.each do |x|
rack = rack.to_s
if rack.include? x
rack = rack.sub(x, '')
##p rack, x
else
##puts x, ' rack don\'t work'
break
end
end
end

puts "reading dictionary"
dict = IO.readlines('/usr/share/dict/words')

while rack
puts "enter rack"
rack = gets.chomp.split(//)

dict.each do |secret_word|

if rodolfo(secret_word.chomp.split(//), rack)
puts secret_word.to_s
end
end
end

JB Eriksson · Dec 8, 2005

------=_Part_12356_13524078.1134027118090
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
Content-Disposition: inline

umm... what's the dictionary for?

also, I think there's an easier way to do this. with a bit of spacing and
stuff, I wrote up a 10 line solution.

I also did the splitting string into array thing. Then there's two instance
methods for Array, named "sort" and "=3D=3D" respectively.

An anagram is when two words contain the exact same characters, right? so i=
f
the two splitted strings are the same when sorted, they should be anagrams.

it seems to me, with computers these days, this should finish
instantly, not take like 20 seconds.
also, please help me make my ruby more ruby-like. i'm new to ruby,
not that i know any other language.

##these are here from when i was first testing
secret_word =3D "spine"
rack =3D "spine"

secret_word =3D secret_word.split(//)
rack =3D rack.split(//)
test_rack =3D rack.to_s

##are there enough of the right letters in the rack to spell the word
from the dictionary?
##i think i'm going to use funny spanish names for my methods instead
of descriptive names.
def rodolfo(secret_word, rack)
secret_word.each do |x|
rack =3D rack.to_s
if rack.include? x
rack =3D rack.sub(x, '')
##p rack, x
else
##puts x, ' rack don\'t work'
break
end
end
end

puts "reading dictionary"
dict =3D IO.readlines('/usr/share/dict/words')

while rack
puts "enter rack"
rack =3D gets.chomp.split(//)

dict.each do |secret_word|

if rodolfo(secret_word.chomp.split(//), rack)
puts secret_word.to_s
end
end
end

------=_Part_12356_13524078.1134027118090--

batkins57 · Dec 8, 2005

One thing you could try is inverting your algorithm. My dict/words
file has 234,000+ entries in it. Since you're looping through the dict
file and then calling rodolfo on each iteration, you have to pay the
cost of calling rodolfo 234,000 times, even though in general only a
couple of entries in the words file are relevant.

So what you could do is first find all the possible permuations of the
letters in your given word. Then loop through the word file once. If
a word matches one of the permutations in your list, remove it from the
list of words to be checked and push it onto a separate list. This
should be significantly faster, since you calculate the combinations up
front, and then do one scan through the list.

You could also call out to a program like aspell or ispell to check
your words for you. Those programs will do something at least as fast
as a binary search, which will take much, much less time than scanning
through the whole thing - even in the worst case, a 234,000 entry file
will be checked in 18 string comparisons, instead of 234,000.

hth,
Bill

Michael Fellinger · Dec 8, 2005

--nextPart2139708.tvCfTPD2iy
Content-Type: text/plain;
charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable
Content-Disposition: inline

Hey travis,

you gave me a good time figuring this one out... after some thought i came =
up=20
with something that takes roughly 1 second to search 9,6k words for a=20
anagram.
of course this will go up when you add more words (a task you still can thi=
nk=20
about)
all in all, 5 LoC... but i'm sure there are even better ways...

word =3D "easy".split(//).sort
anagrams =3D IO.readlines('/usr/share/dict/british-english').partition {|l|
l.strip!
(l.size =3D=3D word.size && l.split(//).sort =3D=3D word)
}[0]
anagrams.each{|x| puts x}

##### gives you:
manveru@lambda:~/ruby$ time ruby anagramma.rb
ayes
easy
yeas

real 0m1.097s
user 0m0.936s
sys 0m0.105s

Am Donnerstag, 8. Dezember 2005 06:52 schrieb travis laduke:

it seems to me, with computers these days, this should finish
instantly, not take like 20 seconds.
also, please help me make my ruby more ruby-like. i'm new to ruby,
not that i know any other language.

##these are here from when i was first testing
secret_word =3D "spine"
rack =3D "spine"

secret_word =3D secret_word.split(//)
rack =3D rack.split(//)
test_rack =3D rack.to_s

##are there enough of the right letters in the rack to spell the word
from the dictionary?
##i think i'm going to use funny spanish names for my methods instead
of descriptive names.
def rodolfo(secret_word, rack)
secret_word.each do |x|
rack =3D rack.to_s
if rack.include? x
rack =3D rack.sub(x, '')
##p rack, x
else
##puts x, ' rack don\'t work'
break
end
end
end

puts "reading dictionary"
dict =3D IO.readlines('/usr/share/dict/words')

while rack
puts "enter rack"
rack =3D gets.chomp.split(//)

dict.each do |secret_word|

if rodolfo(secret_word.chomp.split(//), rack)
puts secret_word.to_s
end
end
end

--nextPart2139708.tvCfTPD2iy
Content-Type: application/pgp-signature

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (GNU/Linux)

iD8DBQBDl+qRMdQeL6eBxhIRAkCdAKDT1Zzg/c7Ihy87ACnbWCsRy+p52QCfWl5M
xD+ZNzI3TItJawMxk2l0k4Y=
=c3rI
-----END PGP SIGNATURE-----

--nextPart2139708.tvCfTPD2iy--

batkins57 · Dec 8, 2005

I went ahead and implemented my own suggestions, and just to give you
an idea of how fast you can get this, my version can find all valid
anagrams of a five-letter word ("truck") given a 274,000 word
dictionary in 0.231 seconds....real-time.

Martin DeMello · Dec 8, 2005

So what you could do is first find all the possible permuations of the
letters in your given word. Then loop through the word file once. If
a word matches one of the permutations in your list, remove it from the
list of words to be checked and push it onto a separate list. This
should be significantly faster, since you calculate the combinations up
front, and then do one scan through the list.

Even faster - sort the list of permutations, then do an incremental
search against the dictionary: binary search to find the first word in the
list, then repeat, setting the lower bound of your search to the
position immediately after each match when you do the next one.
O(m!*log n) rather than O(m log m * n), which for small m and large n
can lead to a speedup. It might be even faster to search for the middle
word in your permutation list, then do that recursively with the left
and right sublists, passing in the upper and lower bounds as arguments.
All this assumes an in-memory dictionary with O(1) position indexing, of
course.

martin

Brian Schröder · Dec 9, 2005

it seems to me, with computers these days, this should finish
instantly, not take like 20 seconds.
also, please help me make my ruby more ruby-like. i'm new to ruby,
not that i know any other language.

[snip]

if you want to find lots of anagrams against the same database it
would make sense to create an index. Make a hash that is indexed by
the sorted letters and let it point to a list of words that yield this
letter. After you have done this once, you can lookup anagrams in O(1)

cheers,

Brian

Brian Schröder · Dec 9, 2005

it seems to me, with computers these days, this should finish
instantly, not take like 20 seconds.
also, please help me make my ruby more ruby-like. i'm new to ruby,
not that i know any other language.

[snip]

Click to expand...

if you want to find lots of anagrams against the same database it
would make sense to create an index. Make a hash that is indexed by
the sorted letters and let it point to a list of words that yield this
letter. After you have done this once, you can lookup anagrams in O(1)

cheers,

Brian

Here comes a short implementation:

hash =3D
File.read('/usr/share/dict/words').
downcase.
split(/\n/).
inject(Hash.new { | h, k | h[k] =3D [] }) { | h, w |
h[w.split(//).sort] << w; h }

ARGV.each do | word |
puts "Anagrams for #{word}: #{hash[word.split(//).sort].join(' ')}"
end
Anagrams for ruby: ruby bury ruby
Anagrams for duck: duck
Anagrams for type: type
Anagrams for truck: truck
Anagrams for brian: brain brian rabin brain
Anagrams for sort: rots sort tors
Anagrams for index: index nixed

Here are the timings:

The calculation of the index took 3.34749s

Lookup of 7 words took 0.000195s

And lookup of the words scales linearly, so for certain applications
(building an anagram-server e.g.) this may be the best possibility.

Cheers,

Brian

Daniel Berger · Dec 10, 2005

I went ahead and implemented my own suggestions, and just to give you
an idea of how fast you can get this, my version can find all valid
anagrams of a five-letter word ("truck") given a 274,000 word
dictionary in 0.231 seconds....real-time.

I think there's an algorithm for finding anagrams using libgmp, for
which there are Ruby bindings (on the RAA). I'll bet that's fast.

Dan

Is there a faster way? using Find to update repositories	1	Feb 13, 2008
Is there a simple way to find a method definition?	11	Jul 23, 2008
Is there a way to have instance_eval cannot call private/protected method?	2	Jan 26, 2010
Is there a better way to do this?	18	May 3, 2007
New to ruby... is this bad code? Is there a more canon way to do this?	3	Apr 10, 2006
Caching method calls - Is this of interest to anyone?	2	Oct 29, 2006
Is this a good time to start learning python?	16	Mar 31, 2008
What's the most ruby-ish way to write this python code?	8	May 9, 2007

is this a good way to find anagrams?

travis laduke

JB Eriksson

batkins57

Michael Fellinger

batkins57

Martin DeMello

Brian Schröder

Brian Schröder

Daniel Berger

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads