[QUIZ] Word Chains (#44)

Joel VanderWerf · Aug 28, 2005

Dominik said:
There is nothing in the stdlib currently, but there is rbtree and a RCR
to include it in the stdlib:

http://www.rcrchive.net/rcr/show/306

Here's one version using rbtree:

http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-talk/133202

Joel VanderWerf · Aug 28, 2005

Levin said:
Dominik Bathon said:

There is nothing in the stdlib currently, but there is rbtree and a
RCR to include it in the stdlib:

http://www.rcrchive.net/rcr/show/306

Click to expand...

I tried to build it but it fails the unit tests

The check for "rb_block_proc" and the later warnings seems to suggest
that there is something wrong with my system. Any hints?

Thank You,
Levin

$ ruby -v
ruby 1.8.3 (2005-06-23) [i486-linux]
$ ruby extconf.rb
checking for allocation framework... yes
checking for rb_obj_init_copy()... no
checking for rb_block_proc()... no
checking for rb_yield_values()... no
checking for rb_marshal_dump()... no
checking for rb_marshal_load()... no
checking for inline keyword... __inline
creating Makefile

Looks like your ruby headers or libs aren't where extconf.rb expects
them. Did you build ruby from source? It might help to look at

ruby -r rbconfig -e 'p Config::CONFIG'

and see if the paths listed there are where the ruby installed files
are. (Sorry to be so vague...)

Gavin Kistner · Aug 28, 2005

An interesting chain to stress-test your algorithms is "sahara" <->
"cloudy" (46 links)
(in my dictionary at least)

Mine is missing many of the words in that chain. What dictionary is
yours?

Dominik Bathon · Aug 28, 2005

Here is my solution.

It uses breadth first search (unidirectional) and is quite fast.
I use a trick to find all possible steps fast: While reading the words I =
=20
store them in a hash of arrays, e.g. when adding the word "dog", I =20
register "dog" as step for "\0og", "d\0g" and "do\0", later while look fo=
r =20
a step for "dot" I will use all words registered for "\0ot", "d\0t" or =20
"do\0". So reading the dictionary takes some time (and memory), but =20
afterwards finding the shortest chains is really fast. I also use symbols=
=20
instead of strings in some places for faster hash-lookup.

Of course I have also implemented my extra bonus challenge, to find a =20
longest shortest chain and Simon's suggestion to allow words with =20
different length.

Some examples:

$ time ruby word_chain.rb duck ruby
duck
luck
luce
lube
rube
ruby

real 0m1.414s
user 0m1.326s
sys 0m0.028s
$ time ruby word_chain.rb ape human
ape
are
arm
aum
hum
huma
human

real 0m2.604s
user 0m2.484s
sys 0m0.052s
$ time ruby word_chain.rb computer a
computer
compute
compote
compose
compos
compo
campo
camp
cam
aam
aa
a

real 0m12.550s
user 0m11.890s
sys 0m0.232s

The examples with words of different length take longer because much more=
=20
words from the dictionary have to be added to the hash.

$ time ruby longest_shortest.rb 3
edh
edo
ado
aho
pho
phi
poi
poa
pya
mya

real 0m4.840s
user 0m4.615s
sys 0m0.055s
$ time ruby longest_shortest.rb 6
zounds
wounds
woundy
[49 lines]
pantry
paltry
peltry

real 1m12.614s
user 1m10.113s
sys 0m0.655s
$ time ruby longest_shortest.rb 9
unpausing
unpassing
unpasting
unwasting
unwaiting
unwailing
unfailing
unfalling
unfilling
untilling
untelling
unselling
unsealing
unhealing
unhearing
unwearing
unweaving
unreaving
unreeving
unreeling
unfeeling
unfeeding
unheeding

real 0m9.252s
user 0m8.836s
sys 0m0.158s
$ time ruby longest_shortest.rb 12
modification
codification
conification
sonification
sinification
vinification
vilification
bilification

real 0m7.492s
user 0m7.127s
sys 0m0.138s

Dominik

The code:

# word_chain.rb
####################

DEFAULT_DICTIONARY =3D "/usr/share/dict/words"

# Data structure that efficiently stores words from a dictionary in a way=
, =20
that
# it is easy to find all words that differ from a given word only at one
# letter (words that could be the next step in a word chain).
# Example: when adding the word "dog", add_word will register "dog" as =20
step for
# "\0og", "d\0g" and "do\0", later each_possible_step("cat") will yield =20
all words
# registered for "\0at", "c\0t" or "ca\0".
class WordSteps
def initialize
@steps =3D Hash.new { |h, k| h[k] =3D [] }
@words =3D {}
end

# yields all words (as strings) that were added with add_word
def each_word(&block)
@words.each_key(&block)
end

# add all steps for word (a string) to the steps
def add_word(word)
sym =3D word.to_sym
wdup =3D word.dup
for i in 0...word.length
wdup =3D 0
@steps[wdup] << sym
wdup =3D word
end
@words[word] =3D sym # for allow_shorter and each_word
end

# yields each possible next step for word (a string) as symbol, some
# possible steps might be yielded multiple times
# if allow_shorter is true, word[0..-2].to_sym will also be yielded =
if
# available
# if allow_longer is true, all words that match /#{word}./ will be =20
yielded
def each_possible_step(word, allow_shorter =3D false, allow_longer =3D=
=20
false)
wdup =3D word.dup
for i in 0...word.length
wdup =3D 0
if @steps.has_key?(wdup)
@steps[wdup].each { |step| yield step }
end
wdup =3D word
end
if allow_shorter && @words.has_key?(tmp =3D word[0..-2])
yield @words[tmp]
end
if allow_longer && @steps.has_key?(tmp =3D word + "\0")
@steps[tmp].each { |step| yield step }
end
end

# tries to find a word chain between word1 and word2 (strings) using=
=20
all
# available steps
# returns the chain as array of symbols or nil, if no chain is found
# shorter/longer determines if shorter or longer words are allowed i=
n =20
the
# chain
def build_word_chain(word1, word2, shorter =3D false, longer =3D fal=
se)
# build chain with simple breadth first search
current =3D [word1.to_sym]
pre =3D { current[0] =3D> nil } # will contain the predecessors
target =3D word2.to_sym
catchdone) do
until current.empty?
next_step =3D []
current.each do |csym|
each_possible_step(csym.to_s, shorter, longer) do =20
|ssym|
# have we seen this word before?
unless pre.has_key? ssym
pre[ssym] =3D csym
throwdone) if ssym =3D=3D target
next_step << ssym
end
end
end
current =3D next_step
end
return nil # no chain found
end
# build the chain (in reverse order)
chain =3D [target]
chain << target while target =3D pre[target]
chain.reverse
end

# builds and returns a WordSteps instance "containing" all words wit=
h
# length in length_range from the file file_name
def self.load_from_file(file_name, length_range)
word_steps =3D new
IO.foreach(file_name) do |line|
# only load words with correct length
if length_range =3D=3D=3D (word =3D line.strip).length
word_steps.add_word(word.downcase)
end
end
word_steps
end
end

if $0 =3D=3D __FILE__
dictionary =3D DEFAULT_DICTIONARY

# parse arguments
if ARGV[0] =3D=3D "-d"
ARGV.shift
dictionary =3D ARGV.shift
end
unless ARGV.size =3D=3D 2
puts "usage: #$0 [-d path/to/dictionary] word1 word2"
exit 1
end
word1, word2 =3D ARGV[0].strip.downcase, ARGV[1].strip.downcase

shorter =3D word1.length > word2.length
longer =3D word1.length < word2.length
length_range =3D if longer
word1.length..word2.length
else
word2.length..word1.length
end

# read dictionary
warn "Loading dictionary..." if $DEBUG
word_steps =3D WordSteps.load_from_file(dictionary, length_range)
word_steps.add_word(word2) # if it is not in dictionary

# build chain
warn "Building chain..." if $DEBUG
chain =3D word_steps.build_word_chain(word1, word2, shorter, longer)

# print result
puts chain || "No chain found!"
end

# longest_shortest.rb
####################

require "word_chain"

class WordSteps
# builds a longest shortest chain starting with word and returns it =
as
# array of symbols
# if the found chain is shorter than old_max, then it is possible to
# determine other words, whose longest shortest chain will also be n=
ot
# longer than old_max, those words will be added to not_longer, that=
=20
way
# the caller can avoid searching a longest chain for them
def longest_word_chain(word, not_longer =3D {}, old_max =3D 0)
# build chain with simple breadth first search until no more ste=
ps =20
are
# possible, one of the last steps is then a longest shortest cha=
in
current =3D [word.to_sym]
pre =3D { current[0] =3D> nil } # will contain the predecessors
target =3D nil
iterations =3D []
loop do
next_step =3D []
iterations << current
current.each do |csym|
each_possible_step(csym.to_s) do |ssym|
unless pre.has_key? ssym
pre[ssym] =3D csym
next_step << ssym
end
end
end
if next_step.empty?
target =3D current[0]
break
else
current =3D next_step
end
end

# build the chain (in reverse order)
chain =3D [target]
chain << target while target =3D pre[target]

# add words to not_longer if possible (see above)
if chain.size < old_max
(0...([old_max+1-chain.size, iterations.size].min)).each do =
|i|
iterations.each { |w| not_longer[w] =3D true }
end
end
chain.reverse
end
end

if $0 =3D=3D __FILE__
dictionary =3D DEFAULT_DICTIONARY

# parse arguments
if ARGV[0] =3D=3D "-d"
ARGV.shift
dictionary =3D ARGV.shift
end
word_length =3D [ARGV.shift.to_i, 1].max

# read dictionary
warn "Loading dictionary..." if $DEBUG
word_steps =3D WordSteps.load_from_file(dictionary, word_length)

# search chain
warn "Searching longest chain..." if $DEBUG
max =3D 0
chain =3D nil
not_longer =3D {}
word_steps.each_word { |w|
next if not_longer[w.to_sym]
cur =3D word_steps.longest_word_chain(w, not_longer, max)
if cur.size > max
chain =3D cur
max =3D cur.size
warn chain.inspect if $DEBUG
end
}

# print result
puts chain || "No chain found!"
end

Linus Sellberg · Aug 28, 2005

things up. I hence post my new solution. It's still a unidirectional
depth-first search, but it keeps track of visitation depth on nodes,

Did you consider making it into an iterative deepening search instead?
That would get rid of the need to test with several search depths

Levin Alexander · Aug 29, 2005

Gavin Kistner said:
Mine is missing many of the words in that chain. What dictionary is =20
yours?

"/usr/share/dict/british-english" from Ubuntu Hoary.

You can get the Package from =20
<http://packages.debian.org/unstable/text/wbritish>

-Levin

horndude77 · Aug 29, 2005

My solution uses the same method of others here: bi-directional
breadth-first-search. This is usually the first thought when looking
for a shortest path in a non-weighted graph. The only difference is
that after I load the dictionary and set up all of the graph edges (My
algorithm right now is pretty slow for doing this), I use Marshal to
save it off to disk. So I only have to load the dictionary and set up
the graph once. When I do the actual graph search it works pretty
quick. I haven't found an instance where it takes more than half a
second. Since I'm doing this I also had it check for the word lengths
and then decide which dictionary to use instead of the -d option.

dictionary.rb
==========

class Dictionary
# This takes in a straight array of words. (It is assumed that they
# are all the same length, but I should add better error checking
# on this at some point.)
def initialize(words)
@words = Hash.new
words.each do |word|
wordmaskarr = []
word.length.times { |i| wordmaskarr <<
"#{word[0...i]}.#{word[i+1...word.length]}"}
#puts "#{word}:\t/#{wordmaskarr.join('|')}/"
wordmask = Regexp.new(wordmaskarr.join('|'))

@words[word] ||= []
words.each do |otherword|
if(otherword =~ wordmask && otherword != word) then
@words[otherword] ||= []
#puts "\t\tfound match: #{otherword}"
@words[word] |= [otherword]
@words[otherword] |= [word]
end
end
end
end

def chain(from, to)
if(!@words[from]) then
puts "#{from} not in dictionary"
return []
elsif(!@words[to]) then
puts "#{to} not in dictionary"
return []
elsif(from==to)
return [from]
end
linknode = nil
#these hashes are used to keep links back to where they came from
fromedges = {from => ""}
toedges = {to => ""}
#these are queues used for the breadth first search
fromqueue = [from]
toqueue = [to]
while(toqueue.length>0 && fromqueue.length>0)
fromnode = fromqueue.shift
tonode = toqueue.shift
if(toedges[fromnode] || fromnode==to) then
linknode = fromnode
break
elsif(fromedges[tonode] || tonode==from) then
linknode = tonode
break
end

@words[fromnode].each do |i|
if(!fromedges) then
fromedges = fromnode
fromqueue.push(i)
end
end

@words[tonode].each do |i|
if(!toedges) then
toedges = tonode
toqueue.push(i)
end
end
end
if(linknode == nil) then
return nil
end
chain = []
currnode = linknode
while(fromedges[currnode] != "")
chain.unshift(currnode)
currnode = fromedges[currnode]
end
currnode = toedges[linknode]
while(toedges[currnode] != "")
chain.push(currnode)
currnode = toedges[currnode]
end
return [from]+chain+[to]
end
end

makedicts.rb
==========
require 'dictionary'
dicts = Hash.new
File.open("WORD.LST") do |file|
file.each_line do |line|
line.chomp!.downcase!
(dicts[line.length] ||= []) << line
end
end

dicts.each do |len,dict|
dict.sort!
File.open("dict#{0 if len<10}#{len}.txt", "w") do |file|
file.write(dict.join("\n"))
end
end

dicts.each do |len, dict|
if len<50 then
puts "Making dictionary graph for #{len} letter words"
currdict = Dictionary.new(dict)
File.open("dict#{0 if len<10}#{len}.dump", "w") do |file|
Marshal.dump(currdict, file)
end
end
end

wordchain.rb
==========
require 'dictionary'
if(ARGV.length != 2) then puts "Two words required"; exit(1) end
from = ARGV[0]
to = ARGV[1]
if(from.length != to.length) then puts "Word are different lengths";
exit(1) end
dict = nil
File.open("dict#{0 if from.length<10}#{from.length}.dump") do |file|
dict = Marshal.load(file)
chain = dict.chain(from, to)
if(chain) then
puts chain.join("\n")
else
puts "No link found"
end
end

Example runs:
$ time ruby wordchain.rb duck ruby
duck
duce
luce
lube
rube
ruby

real 0m0.196s
user 0m0.186s
sys 0m0.030s

$ time ruby wordchain.rb kitten rabies
kitten
kitted
kilted
killed
gilled
galled
gabled
gables
gabies
rabies

real 0m0.347s
user 0m0.343s
sys 0m0.030s

Daniel Amelang · Aug 29, 2005

Just the other day I needed 'time' for windows, and wrote a 3 line
ruby script I called 'rtime' Really trivial stuff.

start_time =3D Time.now
system ARGV.join(" ")
$stderr.puts "#{Time.now - start_time} seconds"

Of course, it just gives you the real time it took, nothing about user
time, etc. The main point is how very easy it was.

Dan

Adam Shelly · Aug 29, 2005

Here's my solution,
It's another bidirectional breadth first search
I coded it up before I saw all the others, but I think it ended up
being pretty similar.
-Adam
---
# Ruby Quiz Word Chain Solution
# Adam Shelly
# Late August, 2005

# Usage wordchain [-d dictionary] [-l] word word ...
# finds chains between every pair of words on the command line.
# uses dictionary file specified by -d, or /usr/share/dict/words by defau=
lt.
# if -l is specified finds the longest chain from each word.

class String
def shift
self.slice(1,length-1)
end
def parent=3D p
@parent =3D p=20
end
def parent
@parent
end
end

class WCSolver
#Loads only the words with a given length. =20
#didn't add much speed, but saves memory.
def initialize(wordlist, length)
@t =3D Hash.new
@Len =3D length=20
add_words(wordlist)
end
=20
def add_words(wordlist)
w2=3Dnil
wordlist.each {|w| @t[w2]=3Dtrue if (w2=3Dw.strip).length =3D=3D @l=
en }
end

def neighbors word
list =3D []
word.size.times do |i|
w =3D word.dup
w.parent =3D word
?a.upto(?z) {
|c|w=3Dc=20
=09list << w.dup if (@t[w] )
}
end
list.uniq
end

#bidirectional breadth first search
def solve first, last
puts "Connecting #{first} -> #{last}"
return nil if @Len && (first.length !=3D @Len || last.length !=3D @=
len) =20
seenhead, seentail, match =3D [],[],[]
head,tail =3D [first],[last]
while match =3D=3D []
r =3D head.collect{|w| neighbors(w)}.flatten - seenhead
=09 return nil if r =3D=3D []
seenhead +=3D head =3D r
match =3D head & tail
break unless (match =3D=3D [])
=09 r=3D tail.collect{|w| neighbors(w)}.flatten - seentail
=09 return nil if r =3D=3D []
=09 seentail +=3D tail =3D r
=09 match =3D head & tail
end
r =3D [ head.find {|w| w=3D=3Dmatch[0]}]
while r[-1].parent do r << r[-1].parent end
r.reverse!
r[-1]=3Dtail.find {|w| w=3D=3Dmatch[0]}
while r[-1].parent do r << r[-1].parent end
r
end

def longest from
puts "Longest Chain from #{from}"
seen,head =3D [],[from]
while true
r=3D head.collect{|w| neighbors(w)}.flatten - seen
break if r =3D=3D []
seen +=3D head =3D r
end
l =3D [head[0]]
while l[-1].parent do
l << l[-1].parent
end
l.reverse!
end
end
=20
if $0 =3D=3D __FILE__
puts "Wordchain Finder\n"

file =3D ARGV.slice!(ARGV.index('-d'),2)[1] if ARGV.include?'-d'
file ||=3D "/usr/share/dict/words"=20

showlong =3D ARGV.slice!(ARGV.index('-l'),1) if ARGV.include?'-l'
length =3D ARGV[0].length if ARGV[0]

wordlist =3D File.new(file, "r")
if (wordlist)
s =3D WCSolver.new(wordlist,length)
while (w =3D ARGV.shift)
unless (showlong)
puts s.solve(w, ARGV[0]) if ARGV[0]
else
puts s.longest(w)=20
end
end
end
end

Will Thimbleby · Aug 29, 2005

My last email seemed to disappear in the ether, so here it is again.

Hi, one very simple (and short .. and slow) solution. --will

require 'set'

#read in every word into a hash
def read_dictionary(filename, length)
words = IO.readlines(filename).collect{ |l|
l.strip.downcase }.reject{ |word| word.length != length }

Set.new(words)
end

#computes all possible neighbours of a word
def get_neighbours(word)
allneigbours = Set.new

for i in 0...word.length
for c in 97..122
neighbour = word.dup
neighbour = c
allneigbours.add(neighbour)
end
end

allneigbours
end

#performs the word chain search
def compute(a, b, dict)
raise "words are different lengths" if a.length != b.length

#setup
dictionary = read_dictionary(dict, a.length)

raise b+" not in dictionary" if !dictionary.include?(b)

queue = [ [a] ]
seen = Set.new([a])

#very simple breadth first search
while queue.length > 0
solution = queue.delete_at(0)

break if solution.last == b

#get neighbours
neighbours = get_neighbours(solution.last)
neighbours &= dictionary
neighbours -= seen

seen += neighbours

#add onto queue
neighbours.each_with_index { |obj, i|
queue.push( solution.dup.push(obj) )
}
end

raise "cannot create word chain" if solution.last != b

puts solution
end

#parse options very simply
dict = "/usr/share/dict/words"

a = ARGV.shift

if a == "-d"
dict = ARGV.shift
a = ARGV.shift
end

b = ARGV.shift

#do it
compute(a, b, dict)

Gavin Kistner · Aug 29, 2005

--Apple-Mail-1-982118735
Content-Transfer-Encoding: 7bit
Content-Type: text/plain;
charset=US-ASCII;
delsp=yes;
format=flowed

"/usr/share/dict/british-english" from Ubuntu Hoary.

You can get the Package from <http://packages.debian.org/unstable/
text/wbritish>

Thanks...took me a bit to figure out how unpack a .deb file and get
the dictionary out of it. My computer/algorithm is not as fast as
yours, as it takes 210 seconds with a maximum chain limit of 100, or
60 seconds with a chain limit of exactly 46

I also had to modify my program, since it does not normally accept
capitalized words as an acceptable variation, which were necessary
for that chain (including the first link). My Scrabble roots showing
through, I suppose.

I would not have suspected that there could be a minimum-length chain
that long for such short words. Very interesting

--Apple-Mail-1-982118735--

James Edward Gray II · Aug 29, 2005

Now I actually feel like my code has a chance of competing with the
'big boys'. Yay!
James, could you run some speed comparisons of various solutions in
your summary analysis?

I will try, yes.

James Edward Gray II

William James · Aug 30, 2005

Short:

D=IO.read('dict').split($/).grep(/^.{#{$*[0].size}}$/)
def rate(new,old,goal,l)
t=0; v=0; new.size.times{|i|t+=1 if new!=old
v+=1 if new==goal }
[ 1==t && nil==l.index(new), v ]
end
def m(a,b,l)
l << a.dup
if a==b then p l; exit; end
D.inject([]){|w,x| y,v = rate(x,a,b,l)
w << [v,x] if y ; w
}.sort.reverse_each{|v,x| m(x,b,l) }
end
m($*[0], $*[1], [])

Simon Kröger · Aug 30, 2005

William said:
Short:

D=IO.read('dict').split($/).grep(/^.{#{$*[0].size}}$/)
def rate(new,old,goal,l)
t=0; v=0; new.size.times{|i|t+=1 if new!=old
v+=1 if new==goal }
[ 1==t && nil==l.index(new), v ]
end
def m(a,b,l)
l << a.dup
if a==b then p l; exit; end
D.inject([]){|w,x| y,v = rate(x,a,b,l)
w << [v,x] if y ; w
}.sort.reverse_each{|v,x| m(x,b,l) }
end
m($*[0], $*[1], [])

Using Dominik's nice idea i boiled it down to:
(the dictionary is the optional third parameter, no -d)

-----------------------------------------------------------
dict, len = Hash.new{|h,k|h[k] = []}, ARGV[0].size
IO.foreach(ARGV[2] || 'words.txt') do |w| w.chomp!
if w.size != len then next else s = w.dup end
(0...w.size).each{|i|s=?.; dict << w; s=w}
end
t, known = {ARGV[1] => 0}, {}
while !known.merge!(t).include?(ARGV[0])
t = t.keys.inject({}){|h, w|(0...w.size).each{|i|
s=w.dup; s=?.; dict.each{|l|h[l] = w if !known[l]}};h}
warn 'no way!' or exit if t.empty?
end
puts w = ARGV[0]; puts w while (w = known[w]) != 0
-----------------------------------------------------------

It's still quite fast.

BTW:
Is there a nice way to construct a hash, like #map does?
like:
array = (1..10).map{|i| i}
hash = (1..10).map_to_hash{|i| i => i*2}

(i always use inject in a 'not so nice' way)

cheers

Simon

[SUMMARY] Word Chains (#44)	12	Sep 1, 2005
[QUIZ] Word Chains (#65)	6	Feb 3, 2006
[SOLUTION] Word Chains (#44)	0	Aug 30, 2005
[QUIZ] URL Shenanigans (#220)	1	Sep 14, 2009
[QUIZ] Morse Code (#121)	25	Apr 20, 2007
[QUIZ] Word Blender (#108)	30	Jan 5, 2007
[QUIZ] Word Loop (#149)	44	Dec 7, 2007
[QUIZ] Word Search Generator (#159, again)	3	Apr 11, 2008

[QUIZ] Word Chains (#44)

Joel VanderWerf

Joel VanderWerf

Gavin Kistner

Dominik Bathon

Linus Sellberg

Levin Alexander

horndude77

Daniel Amelang

Adam Shelly

Will Thimbleby

Gavin Kistner

James Edward Gray II

William James

Simon Kröger

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads