[QUIZ] Crossword Solver (#132)

R

Ruby Quiz

The three rules of Ruby Quiz:

1. Please do not post any solutions or spoiler discussion for this quiz until
48 hours have passed from the time on this message.

2. Support Ruby Quiz by submitting ideas as often as you can:

http://www.rubyquiz.com/

3. Enjoy!

Suggestion: A [QUIZ] in the subject of emails about the problem helps everyone
on Ruby Talk follow the discussion. Please reply to the original quiz message,
if you can.

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

by Eugene Kalenkovich

Write a Ruby crossword solver. Randomly fill a crossword template with words
from a dictionary. Use any dictionary you want (/usr/share/dict/words, text of
pickaxe, etc.).

Template is provided in a file formatted very similarly to one in quiz #10
(http://www.rubyquiz.com/quiz10.html), with "X" substituted with "#". Simple
template example:

_ _ _ _ _

_ # _ # _

_ _ _ _ _

_ # _ # _

_ _ _ _ _

Format output any way you want, just make it readable. Each run of the program
with big enough dictionary should give a different solution. Example results for
a simple template:

F U G U E

U U R

D E I G N

G D S

E J E C T



R E S T S

I K T

N I E C E

D W E

S U S A N

Bonus 1 (simple). Allow partially pre-filled templates:

# # _ # # # # M

_ _ _ _ _ _ # A

# # _ # # _ # T

R U B Y Q U I Z

U # _ # # _ # #

B # _ _ _ _ _ _

Y # # # # _ # #

One of result variants:

U M

P A S C A L A

A O T

R U B Y Q U I Z

U L N

B Y E A G E R

Y E

Bonus 2: Avoid combinatorial explosion. Fill a big template within reasonable
time:

_ _ _ _ # _ _ _ _ _ _ _ _ _ # _ _ _ _

_ # _ # _ # _ # _ # _ # _ # _ # _ # _

_ _ _ _ _ _ _ _ _ # _ _ _ _ _ _ _ _ _

_ # _ # _ # _ # _ _ _ # _ # _ # _ # _

# _ _ _ _ # _ # _ # _ # _ # _ _ _ _ #

_ # _ # # # _ _ _ _ _ _ _ # # # _ # _

_ _ _ _ _ _ # # _ # _ # # _ _ _ _ _ _

_ # _ # # _ # _ _ _ _ _ # _ # # _ # _

_ _ _ _ _ _ _ _ # _ # _ _ _ _ _ _ _ _

_ # # _ # _ # _ _ _ _ _ # _ # _ # # _

_ _ _ _ _ _ _ _ # _ # _ _ _ _ _ _ _ _

_ # _ # # _ # _ _ _ _ _ # _ # # _ # _

_ _ _ _ _ _ # # _ # _ # # _ _ _ _ _ _

_ # _ # # # _ _ _ _ _ _ _ # # # _ # _

# _ _ _ _ # _ # _ # _ # _ # _ _ _ _ #

_ # _ # _ # _ # _ _ _ # _ # _ # _ # _

_ _ _ _ _ _ _ _ _ # _ _ _ _ _ _ _ _ _

_ # _ # _ # _ # _ # _ # _ # _ # _ # _

_ _ _ _ # _ _ _ _ _ _ _ _ _ # _ _ _ _



C H A O L E N I E N T L Y C O I L

Y V L M D O E O K A

S H A R E A B L E O E D I P A L L Y

T I A R N U N G E A S

F L I P Y T T E C O H N

I A O L I V I E R O I

R E B U F F F M A S H M A N

R L A B Y T E S D A T

I N E R T I A L Y I S S U A N C E

T U R A S P E N O D N

A L T E R E R S E U P R O O T E D

T O S E A S E S B O I

E X T A N T B N S U N T A N

S T U T R E C H T A G

W E E P N A L R D E L L

P R U T M A O U A L M

R E I N S E R T S S O M E W H E R E

O N H U O E A N R A

S A G S B E R N A D I N E U S E D
 
L

Lloyd Linklater

Where can I get a file with the dictionary list? I would rather not
have to make my own.
 
L

Lloyd Linklater

Ball said:
http://wordlist.sourceforge.net/ is a good resource, especially if
you're on Windows and don't have /usr/share/dict/words

Perfect! I got this, then chose the 6 files that were straight
wordlists. I wrote a program to read each file, change it to uppercase,
get rid of non alpha characters (which are no good for the puzzle), get
rid of all duplicates and write the new (sorted) list to a new file. It
has something like 108k words. nice!

Thanks all!
 
A

Andreas Launila

Ruby said:
Write a Ruby crossword solver. Randomly fill a crossword template with words
from a dictionary. Use any dictionary you want (/usr/share/dict/words, text of
pickaxe, etc.).

This solution requires Gecode/R ( http://gecoder.rubyforge.org/ ).

== Overview

The problem is modelled as a matrix of cells where each cell can be
assigned a number representing the letters a..z and #. The letters are
converted to integers in order to use integer variables to represent
them. Similarly each word is converted to a base 26 number with the
converted letters as digits. Hence allowing the words to be represented
as integers as well.

The constraint that sequences of cells longer than one letter must form
words is then added by constraining that the linear combination of the
involved cells forming the corresponding base 26 number, with each
variable as one digit, must be equal to one of the words converted to a
number.

== Weaknesses

It has two major problems
* The words converted to numbers have to fit in the range of an integer
variable, which only allows words of up to 6 characters in my case. This
can probably be helped by using multiple numbers to represent each word.
* There's no randomness in the exploration. Gecode itself does support
custom branching (which would allow such randomness), but Gecode/R does not.

In some cases the exploration will take a long time. I'm unsure whether
it's because of flaws in the model or because of the problem's difficulty.

== Code

require 'enumerator'
require 'rubygems'
require 'gecoder'

# The base we use when converting words to and from numbers.
BASE = ('a'..'z').to_a.size
# The offset of characters compared to digits in word-numbers.
OFFSET = 'a'[0]
# The range of integers that we allow converted words to be in. We are
# only using the unsigned half, we could use both halves, but it would
complicate
# things without giving a larger allowed word length.
ALLOWED_INT_RANGE = 0..Gecode::Raw::Limits::Int::INT_MAX
# The maximum length of a word allowed.
MAX_WORD_LENGTH = (Math.log(ALLOWED_INT_RANGE.last) /
Math.log(BASE)).floor

# Describes an immutable dictionary which represents all contained words
# as numbers of base BASE where each digit is the corresponding letter
# itself converted to a number of base BASE.
class Dictionary
# Creates a dictionary from the contents of the specified dictionary
# file which is assumed to contain one word per line and be sorted.
def initialize(dictionary_location)
@word_arrays = []
File.open(dictionary_location) do |dict|
previous_word = nil
dict.each_line do |line|
word = line.chomp.downcase
# Only allow words that only contain the characters a-z and are
# short enough.
next if previous_word == word or word.size > MAX_WORD_LENGTH or
word =~ /[^a-z]/
(@word_arrays[word.length] ||= []) << self.class.s_to_i(word)
previous_word = word
end
end
end

# Gets an enumeration containing all numbers representing word of the
# specified length.
def words_of_size(n)
@word_arrays[n] || []
end

# Converts a string to a number of base BASE (inverse of #i_to_s ).
def self.s_to_i(string)
string.downcase.unpack('C*').map{ |x| x - OFFSET }.to_number(BASE)
end

# Converts a number of base BASE back to the corresponding string
# (inverse of #s_to_i ).
def self.i_to_s(int)
res = []
loop do
digit = int % BASE
res << digit
int /= BASE
break if int.zero?
end
res.reverse.map{ |x| x + OFFSET }.pack('C*')
end
end

class Array
# Computes a number of the specified base using the array's elements
# as digits.
def to_number(base = 10)
inject{ |result, variable| variable + result * base }
end
end

# Models the solution to a partially completed crossword.
class Crossword < Gecode::Model
# The template should take the format described in RubyQuiz #132 . The
# words used are selected from the specified dictionary.
def initialize(template, dictionary)
@dictionary = dictionary

# Break down the template and create a corresponding square matrix.
# We let each square be represented by integer variable with domain
# -1...BASE where -1 signifies # and the rest signify letters.
squares = template.split(/\n\s*\n/).map!{ |line| line.split(' ') }
@letters = int_var_matrix(squares.size, squares.first.size,
-1...BASE)

# Do an initial pass, filling in the prefilled squares.
squares.each_with_index do |row, i|
row.each_with_index do |letter, j|
unless letter == '_'
# Prefilled letter.
@letters[i,j].must == self.class.s_to_i(letter)
end
end
end

# Add the constraint that sequences longer than one letter must form
# words. @words will accumelate all word variables created.
@words = []
# Left to right pass.
left_to_right_pass(squares, @letters)
# Top to bottom pass.
left_to_right_pass(squares.transpose, @letters.transpose)

branch_on wrap_enum(@words), :variable => :largest_degree,
:value => :min
end

# Displays the solved crossword in the same format as shown in the
# quiz examples.
def to_s
output = []
@letters.values.each_slice(@letters.column_size) do |row|
output << row.map{ |x| self.class.i_to_s(x) }.join(' ')
end
output.join("\n\n").upcase.gsub('#', ' ')
end

private

# Parses the template from left to right, line for line, constraining
# sequences of two or more subsequent squares to form a word in the
# dictionary.
def left_to_right_pass(template, variables)
template.each_with_index do |row, i|
letters = []
row.each_with_index do |letter, j|
if letter == '#'
must_form_word(letters) if letters.size > 1
letters = []
else
letters << variables[i,j]
end
end
must_form_word(letters) if letters.size > 1
end
end

# Converts a word from integer form to string form, including the #.
def self.i_to_s(int)
if int == -1
return '#'
else
Dictionary.i_to_s(int)
end
end

# Converts a word from string form to integer form, including the #.
def self.s_to_i(string)
if string == '#'
return -1
else
Dictionary.s_to_i(string)
end
end

# Constrains the specified variables to form a word contained in the
# dictionary.
def must_form_word(letter_vars)
raise 'The word is too long.' if letter_vars.size > MAX_WORD_LENGTH
# Create a variable for the word with the dictionary's words as
# domain and add the constraint.
word = int_var @dictionary.words_of_size(letter_vars.size)
letter_vars.to_number(BASE).must == word
@words << word
end
end

puts 'Reading the dictionary...'
dictionary = Dictionary.new(ARGV.shift || '/usr/share/dict/words')
puts 'Please enter the template (end with ^D)'
template = ''
loop do
line = $stdin.gets
break if line.nil?
template << line
end
puts 'Building the model...'
model = Crossword.new(template, dictionary)
puts 'Searching for a solution...'
puts((model.solve! || 'Failed').to_s)

__END__
 
E

Eugene Kalenkovich

Ruby Quiz said:
Write a Ruby crossword solver. Randomly fill a crossword template with
words
from a dictionary. Use any dictionary you want (/usr/share/dict/words,
text of
pickaxe, etc.).

My solution, slightly optimized for performance version of the code that was
used originally to generate examples for this quiz:

class CrossTempl
def initialize(templ)
@tmpl=templ.collect { |line| line.split(//) }
@words=[]
@tmpl.each_index do |i|
@tmpl.each_with_index do |char, j|
if char!='#'
if (j==0||@tmpl[j-1]=='#') && !@tmpl[j+1].nil? &&
@tmpl[j+1]!='#'
@words << [i,j,0] if self.template(i,j,0).include?('_')
end
if (i==0||@tmpl[i-1][j]=='#') && !@tmpl[i+1].nil? &&
@tmpl[i+1][j]!='#'
@words << [i,j,1] if self.template(i,j,1).include?('_')
end
end
end
end
@words.sort! {|a,b| a[0]+a[1]<=>b[0]+b[1]}
end

def template(i,j,dir)
str=''
chr=@tmpl[j]
while !chr.nil? && chr!='#'
str<<chr
i+=dir
j+=1-dir
break if @tmpl.nil?
chr=@tmpl[j]
end
str
end

def [](idx)
ar=@words[idx]
return nil if ar.nil?
template(ar[0],ar[1],ar[2])
end

def []=(idx,word)
ar=@words[idx]
i=ar[0]; j=ar[1];
word.split(//).each do |chr|
@tmpl[j]=chr
i+=ar[2]
j+=1-ar[2]
end
end

def to_s
@tmpl.each do |line|
lstr=line.to_s
puts lstr.gsub(/#/,' ').gsub(/(.)/) {"#{$&} "}
end
end

end

$words=[]
File.open("/usr/share/dict/words") {|f| f.readlines}.each do |word|
w=word.upcase.delete("^A-Z")
l=w.length
$words[l]||=[]
$words[l]<<w
end
$words.each {|ar| ar.uniq! if ar }

tfile=ARGV[0]
if tfile.nil?
STDERR.puts "please provide a template file"
exit 1
end

tmpl=File.read(tfile).split(/\n/).collect{|line| line.gsub(/\s/,'').upcase}
$ct=CrossTempl.new(tmpl)

def findWord(i)
pattern=$ct
return true if pattern.nil?
choices=$words[pattern.length].grep(/\A#{pattern.tr("_", ".")}\Z/).sort_by
{ rand }
until choices.empty?
guess=choices.pop
$ct=guess
puts $ct
return true if findWord(i+1)
break if i>0 && rand(2)==1 # fighting incorrect word sorting
# by un-ballancing search
end
$ct=pattern
return false
end

if findWord(0)
puts $ct
else
puts "O-ops"
end
 
A

Andreas Launila

Andreas said:
This solution requires Gecode/R ( http://gecoder.rubyforge.org/ ).

I thought I would update this solution to use the recently introduced
tuple constraints. The updated version no longer has a limitation on
word-lengths and seems to scale well with both the template size and
dictionary size.

The model is almost the same as before. Each square is represented by an
integer variable that can be assigned a number representing one of the
letters a..z and #. The difference is that a different type of
constraint is used to force sequences of cells, longer than one letter,
to form words. The model now uses tuple constraints, which each
constrain an enumeration of integer variables (e.g. letters that should
form words) to take the value of one of many tuples (e.g. all tuples
representing words of the correct length).

The code requires Gecode/R 0.8.1 to run, which can be installed, along
with the Gecode 2.1.1 dependency, using

gem install gecoder-with-gecode


== Code

require 'enumerator'
require 'rubygems'
require 'gecoder'

# The alphabet used.
ALPHABET = ('a'..'z').to_a

# Describes an immutable dictionary which represents all contained words
# as an array of integers where each integer represents a letter. Each
# integer used is between 0 and ALPHABET.size - 1.
class Dictionary
# Creates a dictionary from the contents of the specified dictionary
# file which is assumed to contain one word per line and be sorted.
def initialize(dictionary_location)
@word_arrays = []
File.open(dictionary_location) do |dict|
previous_word = nil
# A regexp that matches strings not in our alphabet.
not_in_alphabet = /[^#{ALPHABET.join}]/
dict.each_line do |line|
word = line.chomp.downcase
# Only allow words that are in our alphabet.
next if previous_word == word or not_in_alphabet.match(word)
(@word_arrays[word.length] ||= []) <<
self.class.s_to_i_array(word)
previous_word = word
end
end
end

# Gets an enumeration containing all arrays of integers representing
# word of the specified length.
def words_of_size(n)
@word_arrays[n] || []
end

# Converts a string to an array of integers (inverse
# of #i_array_to_s ).
def self.s_to_i_array(string)
if @c_to_i_map.nil?
@c_to_i_map =
Hash[*ALPHABET.zip((0...ALPHABET.size).to_a).flatten]
end
@c_to_i_map.values_at *string.scan(/./)
end

# Converts an array of integers back to the corresponding string
# (inverse of #s_to_i_array ).
def self.i_array_to_s(int_array)
if @i_to_c_map.nil?
@i_to_c_map = ALPHABET
end
@i_to_c_map.values_at *int_array
end
end

# Models the solution to a partially completed crossword.
class Crossword < Gecode::Model
# The template should take the format described in RubyQuiz #132 . The
# words used are selected from the specified dictionary.
def initialize(template, dictionary)
@dictionary = dictionary

# Break down the template and create a corresponding square matrix.
# We let each square be represented by an integer variable with
# domain -1...BASE where -1 signify # and the rest signify letters.
squares = template.split(/\n\s*\n/).map!{ |line| line.split(' ') }
@letters =
int_var_matrix(squares.size, squares.first.size,
-1...ALPHABET.size)

# Do an initial pass, filling in the prefilled squares.
squares.each_with_index do |row, i|
row.each_with_index do |letter, j|
unless letter == '_'
# Prefilled letter.
@letters[i,j].must == self.class.s_to_i_array(letter).first
end
end
end

# Add the constraint that sequences longer than one letter must form
# words.
# Left to right pass.
left_to_right_pass(squares, @letters)
# Top to bottom pass.
left_to_right_pass(squares.transpose, @letters.transpose)

# Branch on intersections first.
branch_on @letters, :variable => :largest_degree, :value => :min
end

# Displays the solved crossword in the same format as shown in the
# quiz examples.
def to_s
output = []
@letters.values.each_slice(@letters.column_size) do |row|
output << row.map{ |x| self.class.i_array_to_s([x]) }.join(' ')
end
output.join("\n\n").upcase.gsub('#', ' ')
end

private

# Parses the template from left to right, line for line, constraining
# sequences of two or more subsequent squares to form a word in the
# dictionary.
def left_to_right_pass(template, variables)
template.each_with_index do |row, i|
letters = []
row.each_with_index do |letter, j|
if letter == '#'
must_form_word(letters) if letters.size > 1
letters = []
else
letters << variables[i,j]
end
end
must_form_word(letters) if letters.size > 1
end
end

# Converts a word from integer form to string form, including the #.
def self.i_array_to_s(int_array)
if int_array == [-1]
return '#'
else
Dictionary.i_array_to_s(int_array)
end
end

# Converts a word from string form to integer form, including the #.
def self.s_to_i_array(string)
if string == '#'
return [-1]
else
Dictionary.s_to_i_array(string)
end
end

# Constrains the specified variables to form a word contained in the
# dictionary.
def must_form_word(letters)
words_of_right_size = @dictionary.words_of_size(letters.size)
if words_of_right_size.empty?
raise RuntimeError, "There are no words of size #{letters.size}."
end
# Add the tuple constraint. Use propagation kind :memory if you run
# out of memory.
wrap_enum(letters).must_be.in words_of_right_size, :kind => :speed
end
end

puts 'Reading the dictionary'
dictionary = Dictionary.new(ARGV.shift || '/usr/share/dict/words')
puts 'Enter the template (end with ^D)'
template = ''
loop do
line = $stdin.gets
break if line.nil?
template << line
end
puts 'Building the model'
model = Crossword.new(template, dictionary)
puts 'Searching for a solution'
puts((model.solve! || 'Failed').to_s)

__END__
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Similar Threads

Blue J Ciphertext Program 2
My Status, Ciphertext 2
ChatGPT will make us Job(Home)less 3
Crossword 2
How to play corresponding sound? 2
Can't solve problems! please Help 0
Crossword 14
Python code problem 2

Members online

No members online now.

Forum statistics

Threads
473,766
Messages
2,569,569
Members
45,042
Latest member
icassiem

Latest Threads

Top