is this a good way to find anagrams?

T

travis laduke

it seems to me, with computers these days, this should finish
instantly, not take like 20 seconds.
also, please help me make my ruby more ruby-like. i'm new to ruby,
not that i know any other language.


##these are here from when i was first testing
secret_word = "spine"
rack = "spine"

secret_word = secret_word.split(//)
rack = rack.split(//)
test_rack = rack.to_s

##are there enough of the right letters in the rack to spell the word
from the dictionary?
##i think i'm going to use funny spanish names for my methods instead
of descriptive names.
def rodolfo(secret_word, rack)
secret_word.each do |x|
rack = rack.to_s
if rack.include? x
rack = rack.sub(x, '')
##p rack, x
else
##puts x, ' rack don\'t work'
break
end
end
end


puts "reading dictionary"
dict = IO.readlines('/usr/share/dict/words')

while rack
puts "enter rack"
rack = gets.chomp.split(//)

dict.each do |secret_word|

if rodolfo(secret_word.chomp.split(//), rack)
puts secret_word.to_s
end
end
end
 
J

JB Eriksson

------=_Part_12356_13524078.1134027118090
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
Content-Disposition: inline

umm... what's the dictionary for?

also, I think there's an easier way to do this. with a bit of spacing and
stuff, I wrote up a 10 line solution. ;)
I also did the splitting string into array thing. Then there's two instance
methods for Array, named "sort" and "=3D=3D" respectively.

An anagram is when two words contain the exact same characters, right? so i=
f
the two splitted strings are the same when sorted, they should be anagrams.

it seems to me, with computers these days, this should finish
instantly, not take like 20 seconds.
also, please help me make my ruby more ruby-like. i'm new to ruby,
not that i know any other language.


##these are here from when i was first testing
secret_word =3D "spine"
rack =3D "spine"

secret_word =3D secret_word.split(//)
rack =3D rack.split(//)
test_rack =3D rack.to_s

##are there enough of the right letters in the rack to spell the word
from the dictionary?
##i think i'm going to use funny spanish names for my methods instead
of descriptive names.
def rodolfo(secret_word, rack)
secret_word.each do |x|
rack =3D rack.to_s
if rack.include? x
rack =3D rack.sub(x, '')
##p rack, x
else
##puts x, ' rack don\'t work'
break
end
end
end


puts "reading dictionary"
dict =3D IO.readlines('/usr/share/dict/words')

while rack
puts "enter rack"
rack =3D gets.chomp.split(//)

dict.each do |secret_word|

if rodolfo(secret_word.chomp.split(//), rack)
puts secret_word.to_s
end
end
end

------=_Part_12356_13524078.1134027118090--
 
B

batkins57

One thing you could try is inverting your algorithm. My dict/words
file has 234,000+ entries in it. Since you're looping through the dict
file and then calling rodolfo on each iteration, you have to pay the
cost of calling rodolfo 234,000 times, even though in general only a
couple of entries in the words file are relevant.

So what you could do is first find all the possible permuations of the
letters in your given word. Then loop through the word file once. If
a word matches one of the permutations in your list, remove it from the
list of words to be checked and push it onto a separate list. This
should be significantly faster, since you calculate the combinations up
front, and then do one scan through the list.

You could also call out to a program like aspell or ispell to check
your words for you. Those programs will do something at least as fast
as a binary search, which will take much, much less time than scanning
through the whole thing - even in the worst case, a 234,000 entry file
will be checked in 18 string comparisons, instead of 234,000.

hth,
Bill
 
M

Michael Fellinger

--nextPart2139708.tvCfTPD2iy
Content-Type: text/plain;
charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable
Content-Disposition: inline

Hey travis,

you gave me a good time figuring this one out... after some thought i came =
up=20
with something that takes roughly 1 second to search 9,6k words for a=20
anagram.
of course this will go up when you add more words (a task you still can thi=
nk=20
about)
all in all, 5 LoC... but i'm sure there are even better ways...

word =3D "easy".split(//).sort
anagrams =3D IO.readlines('/usr/share/dict/british-english').partition {|l|
l.strip!
(l.size =3D=3D word.size && l.split(//).sort =3D=3D word)
}[0]
anagrams.each{|x| puts x}

##### gives you:
manveru@lambda:~/ruby$ time ruby anagramma.rb
ayes
easy
yeas

real 0m1.097s
user 0m0.936s
sys 0m0.105s


Am Donnerstag, 8. Dezember 2005 06:52 schrieb travis laduke:
it seems to me, with computers these days, this should finish
instantly, not take like 20 seconds.
also, please help me make my ruby more ruby-like. i'm new to ruby,
not that i know any other language.


##these are here from when i was first testing
secret_word =3D "spine"
rack =3D "spine"

secret_word =3D secret_word.split(//)
rack =3D rack.split(//)
test_rack =3D rack.to_s

##are there enough of the right letters in the rack to spell the word
from the dictionary?
##i think i'm going to use funny spanish names for my methods instead
of descriptive names.
def rodolfo(secret_word, rack)
secret_word.each do |x|
rack =3D rack.to_s
if rack.include? x
rack =3D rack.sub(x, '')
##p rack, x
else
##puts x, ' rack don\'t work'
break
end
end
end


puts "reading dictionary"
dict =3D IO.readlines('/usr/share/dict/words')

while rack
puts "enter rack"
rack =3D gets.chomp.split(//)

dict.each do |secret_word|

if rodolfo(secret_word.chomp.split(//), rack)
puts secret_word.to_s
end
end
end

--nextPart2139708.tvCfTPD2iy
Content-Type: application/pgp-signature

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (GNU/Linux)

iD8DBQBDl+qRMdQeL6eBxhIRAkCdAKDT1Zzg/c7Ihy87ACnbWCsRy+p52QCfWl5M
xD+ZNzI3TItJawMxk2l0k4Y=
=c3rI
-----END PGP SIGNATURE-----

--nextPart2139708.tvCfTPD2iy--
 
B

batkins57

I went ahead and implemented my own suggestions, and just to give you
an idea of how fast you can get this, my version can find all valid
anagrams of a five-letter word ("truck") given a 274,000 word
dictionary in 0.231 seconds....real-time.

:D
 
M

Martin DeMello

So what you could do is first find all the possible permuations of the
letters in your given word. Then loop through the word file once. If
a word matches one of the permutations in your list, remove it from the
list of words to be checked and push it onto a separate list. This
should be significantly faster, since you calculate the combinations up
front, and then do one scan through the list.

Even faster - sort the list of permutations, then do an incremental
search against the dictionary: binary search to find the first word in the
list, then repeat, setting the lower bound of your search to the
position immediately after each match when you do the next one.
O(m!*log n) rather than O(m log m * n), which for small m and large n
can lead to a speedup. It might be even faster to search for the middle
word in your permutation list, then do that recursively with the left
and right sublists, passing in the upper and lower bounds as arguments.
All this assumes an in-memory dictionary with O(1) position indexing, of
course.

martin
 
B

Brian Schröder

it seems to me, with computers these days, this should finish
instantly, not take like 20 seconds.
also, please help me make my ruby more ruby-like. i'm new to ruby,
not that i know any other language.

[snip]

if you want to find lots of anagrams against the same database it
would make sense to create an index. Make a hash that is indexed by
the sorted letters and let it point to a list of words that yield this
letter. After you have done this once, you can lookup anagrams in O(1)

cheers,

Brian
 
B

Brian Schröder

it seems to me, with computers these days, this should finish
instantly, not take like 20 seconds.
also, please help me make my ruby more ruby-like. i'm new to ruby,
not that i know any other language.

[snip]

if you want to find lots of anagrams against the same database it
would make sense to create an index. Make a hash that is indexed by
the sorted letters and let it point to a list of words that yield this
letter. After you have done this once, you can lookup anagrams in O(1)

cheers,

Brian

Here comes a short implementation:

hash =3D
File.read('/usr/share/dict/words').
downcase.
split(/\n/).
inject(Hash.new { | h, k | h[k] =3D [] }) { | h, w |
h[w.split(//).sort] << w; h }


ARGV.each do | word |
puts "Anagrams for #{word}: #{hash[word.split(//).sort].join(' ')}"
end
Anagrams for ruby: ruby bury ruby
Anagrams for duck: duck
Anagrams for type: type
Anagrams for truck: truck
Anagrams for brian: brain brian rabin brain
Anagrams for sort: rots sort tors
Anagrams for index: index nixed

Here are the timings:

The calculation of the index took 3.34749s

Lookup of 7 words took 0.000195s

And lookup of the words scales linearly, so for certain applications
(building an anagram-server e.g.) this may be the best possibility.

Cheers,

Brian
 
D

Daniel Berger

I went ahead and implemented my own suggestions, and just to give you
an idea of how fast you can get this, my version can find all valid
anagrams of a five-letter word ("truck") given a 274,000 word
dictionary in 0.231 seconds....real-time.

I think there's an algorithm for finding anagrams using libgmp, for
which there are Ruby bindings (on the RAA). I'll bet that's fast. :)

Dan
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,755
Messages
2,569,536
Members
45,009
Latest member
GidgetGamb

Latest Threads

Top