[SUMMARY] Crossword Solver (#132)

Discussion in 'Ruby' started by Ruby Quiz, Aug 2, 2007.

    Many of these solutions got pretty sluggish when run on large crosswords. The
    reason is simple: the search space for this problem is quite large. You have
    to try a significant number of words in each position before you will find a
    good fit for the board.

    The good news is that this summer Google has paid to bring a very powerful
    search tool to Ruby. I've been lucky enough to have a ring-side seat for this
    process and now I want to show you a little about this new tool.

    You've probably seen Andreas Launila solving a fair number of the recent quizzes
    using his Gecode/R library. Gecode/R is a wrapper over the C++ Gecode library for
    constraint programming. Constraint programming is a technique for describing
    problems in a way that a tool like Gecode can then use to search the solution
    space and find answers for you. Gecode is a very smart little searcher, though,
    and will heavily prune the search space based on your constraints. This leads
    to some mighty quick results, as you can see:

    $ time ruby solve_crossword.rb /path/to/scowl-6/final/english-words.20 \
        < /path/to/test_board.txt
    Reading the dictionary...
    Please enter the template (end with ^D)
    Building the model...
    Searching for a solution...
    A B I D E

    N   C   A

    G H O S T

    E   N   E

    L A S E R

    real 0m0.430s
    user 0m0.360s
    sys 0m0.068s

    Let's have a look at Andreas's code to see how it sets things up for Gecode.
    Here's the start of the code:

    require 'enumerator'
    require 'rubygems'
    require 'gecoder'

    # The base we use when converting words to and from numbers.
    BASE = ('a'..'z').to_a.size
    # The offset of characters compared to digits in word-numbers. On Ruby 1.8
    # 'a'[0] is the character code 97; Ruby 1.9 and later need 'a'.ord instead.
    OFFSET = 'a'[0]
    # The range of integers that we allow converted words to be in. We are
    # only using the unsigned half, we could use both halves, but it would
    # complicate things without giving a larger allowed word length.
    ALLOWED_INT_RANGE = 0..Gecode::Raw::Limits::Int::INT_MAX
    # The maximum length of a word allowed.
    MAX_WORD_LENGTH = (Math.log(ALLOWED_INT_RANGE.last) /
                       Math.log(BASE)).floor

    # ...

    You can see that Andreas loads Enumerator and Gecode/R here and sets up some
    constants. The constants relate to how this code will model words in the
    dictionary. The plan is to represent each word as a number made up of base-26
    digits, one digit per letter of the alphabet. This allows Andreas to use
    Gecode's integer variables to model the problem. The downside is that word
    length is limited by the maximum size of an integer, so this solution has a
    weakness in that it can't be used to solve the larger puzzles.
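
    To make that size limit concrete, here is a small sketch of the arithmetic
    in plain Ruby. It assumes an upper bound of 2**31 - 1, the largest 32-bit
    signed value; ASSUMED_INT_MAX and max_word_length() are hypothetical names
    for illustration, and the limit Gecode actually reports may differ.

```ruby
# Assumed upper bound for an integer variable: the largest 32-bit signed
# value. This is an illustrative assumption, not a constant read from Gecode.
ASSUMED_INT_MAX = 2**31 - 1
BASE = 26

# Longest word length n for which BASE**n stays within int_max, computed
# with the same formula as MAX_WORD_LENGTH.
def max_word_length(int_max, base)
  (Math.log(int_max) / Math.log(base)).floor
end

longest = max_word_length(ASSUMED_INT_MAX, BASE)
puts longest                                # 6
puts BASE**longest                          # 308915776, still below the bound
puts BASE**(longest + 1) > ASSUMED_INT_MAX  # true: a seventh letter overflows
```

    Under that assumption only words of up to six letters can be encoded, which
    is why boards with longer slots are out of reach for this approach.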

    You can see the conversion between words and numbers in the Dictionary class:

    # ...

    # Describes an immutable dictionary which represents all contained words
    # as numbers of base BASE where each digit is the corresponding letter
    # itself converted to a number of base BASE.
    class Dictionary
      # Creates a dictionary from the contents of the specified dictionary
      # file which is assumed to contain one word per line and be sorted.
      def initialize(dictionary_location)
        @word_arrays = []
        File.open(dictionary_location) do |dict|
          previous_word = nil
          dict.each_line do |line|
            word = line.chomp.downcase
            # Only allow words that only contain the characters a-z and are
            # short enough.
            next if previous_word == word or word.size > MAX_WORD_LENGTH or
                    word =~ /[^a-z]/
            (@word_arrays[word.length] ||= []) << self.class.s_to_i(word)
            previous_word = word
          end
        end
      end

      # Gets an enumeration containing all numbers representing words of the
      # specified length.
      def words_of_size(n)
        @word_arrays[n] || []
      end

      # Converts a string to a number of base BASE (inverse of #i_to_s).
      def self.s_to_i(string)
        string.downcase.unpack('C*').map{ |x| x - OFFSET }.to_number(BASE)
      end

      # Converts a number of base BASE back to the corresponding string
      # (inverse of #s_to_i).
      def self.i_to_s(int)
        res = []
        loop do
          digit = int % BASE
          res << digit
          int /= BASE
          break if int.zero?
        end
        res.reverse.map{ |x| x + OFFSET }.pack('C*')
      end
    end

    # ...

    We've already talked about the number representation which is the majority of
    the code here. Do have a look at initialize() and words_of_size() though, to
    see how words are being stored. An Array is created where indices represent
    word lengths and the values at those indices are nested Arrays of words with
    that length. This makes getting a list of words that could work in a given slot
    of the puzzle easy and fast.
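
    The length-indexed storage can be tried out without a dictionary file. The
    sketch below is a simplified stand-in, not the Dictionary class itself: it
    keeps plain strings instead of encoded integers, and WORDS, BUCKETS and
    bucket_by_length() are hypothetical names chosen for the example.

```ruby
# Hypothetical in-memory word list standing in for the dictionary file.
WORDS = %w[ghost laser abide angel icons eater cat cab].freeze

# Build an Array whose index n holds every word of length n.
# Lengths that never occur are left as nil.
def bucket_by_length(words)
  buckets = []
  words.each { |word| (buckets[word.length] ||= []) << word }
  buckets
end

BUCKETS = bucket_by_length(WORDS)

# Like words_of_size: an unseen length yields an empty list, not nil.
def words_of_size(buckets, n)
  buckets[n] || []
end

p words_of_size(BUCKETS, 3)       # ["cat", "cab"]
p words_of_size(BUCKETS, 5).size  # 6
p words_of_size(BUCKETS, 9)       # []
```

    Looking up the candidates for a slot is then a single Array index, no
    matter how large the dictionary is.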

    The s_to_i() method above relies on a helper method added to Array, which is
    simply this:

    class Array
      # Computes a number of the specified base using the array's elements
      # as digits.
      def to_number(base = 10)
        inject{ |result, variable| variable + result * base }
      end
    end

    Again, this is just another piece of the conversion I explained earlier.
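
    Here is a worked example of that conversion in present-day Ruby, written
    as standalone methods instead of an Array extension. The method names are
    hypothetical, and number_to_word() deliberately differs from i_to_s() by
    taking the word length as a parameter, so treat it as a sketch of the idea
    and not as Andreas's code.

```ruby
BASE = 26
OFFSET = 'a'.ord   # 97

# Horner's rule: fold the digits left to right, multiplying by the base.
def digits_to_number(digits, base)
  digits.inject { |result, digit| digit + result * base }
end

# Encode a lowercase word as a base-26 integer (a = 0, ..., z = 25).
def word_to_number(word)
  digits_to_number(word.bytes.map { |byte| byte - OFFSET }, BASE)
end

# Decode a number into a word of the given length, restoring leading 'a's.
def number_to_word(number, length)
  digits = []
  length.times do
    digits.unshift(number % BASE)
    number /= BASE
  end
  digits.map { |digit| (digit + OFFSET).chr }.join
end

p digits_to_number([1, 2, 3], 10)   # 123
p word_to_number('cab')             # 2 * 676 + 0 * 26 + 1 = 1353
p number_to_word(1353, 3)           # "cab"
p word_to_number('abide') == word_to_number('bide')   # true
```

    The last line shows a property of the encoding worth knowing: 'a' is the
    digit zero, so a leading 'a' does not change the number. As far as I can
    tell this does not hurt the solution, since words are only ever compared
    with other words of the same length and the board is printed one square at
    a time.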

    With a Dictionary created, it's time to model the problem in Gecode constraints:

    # Models the solution to a partially completed crossword.
    class Crossword < Gecode::Model
      # The template should take the format described in RubyQuiz #132. The
      # words used are selected from the specified dictionary.
      def initialize(template, dictionary)
        @dictionary = dictionary

        # Break down the template and create a corresponding square matrix.
        # We let each square be represented by an integer variable with domain
        # -1...BASE where -1 signifies # and the rest signify letters.
        squares = template.split(/\n\s*\n/).map!{ |line| line.split(' ') }
        @letters = int_var_matrix(squares.size, squares.first.size,
                                  -1...BASE)

        # Do an initial pass, filling in the prefilled squares.
        squares.each_with_index do |row, i|
          row.each_with_index do |letter, j|
            unless letter == '_'
              # Prefilled letter.
              @letters[i,j].must == self.class.s_to_i(letter)
            end
          end
        end

        # Add the constraint that sequences longer than one letter must form
        # words. @words will accumulate all word variables created.
        @words = []
        # Left to right pass.
        left_to_right_pass(squares, @letters)
        # Top to bottom pass.
        left_to_right_pass(squares.transpose, @letters.transpose)

        branch_on wrap_enum(@words), :variable => :largest_degree,
                                     :value => :min
      end

      # Displays the solved crossword in the same format as shown in the
      # quiz examples.
      def to_s
        output = []
        @letters.values.each_slice(@letters.column_size) do |row|
          output << row.map{ |x| self.class.i_to_s(x) }.join(' ')
        end
        output.join("\n\n").upcase.gsub('#', ' ')
      end

    # ...

    After storing the dictionary, this code breaks the crossword template down into
    an integer matrix created using the Gecode/R helper method int_var_matrix().
    This will be our puzzle of words Gecode is expected to fill in.

    The next two sections of the initialize() method build up the constraints.
    These are the rules that must be satisfied when we have found a correct
    solution.

    The first of these chunks of code sets rules for any letters that were given to
    us in the template. This code tells Gecode that the number in that position of
    the matrix must equal the value of the provided letter. Take a good look at
    this RSpec-like syntax, because Andreas has spent considerable effort on making
    it easy to express your constraints in a natural way, and I hope you will
    agree with me that the end result is quite nice.
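
    The template handling can be followed without Gecode installed. This
    sketch reuses the same split logic as initialize() and then lists which
    squares would be pinned to a fixed number. The 3x3 TEMPLATE and the
    parse_template() and fixed_squares() helpers are hypothetical, and the
    plain Hash stands in for the must == constraints.

```ruby
# Hypothetical 3x3 template in the format the parsing code expects: rows
# separated by blank lines, squares by spaces, '_' open, '#' blocked.
TEMPLATE = "C _ T\n\n_ # _\n\n_ _ #\n"

# Same split logic as Crossword#initialize, without Gecode.
def parse_template(template)
  template.split(/\n\s*\n/).map { |line| line.split(' ') }
end

# Map every square that is not '_' to the number it is pinned to:
# -1 for '#', otherwise the letter's position in the alphabet (a = 0).
def fixed_squares(grid)
  fixed = {}
  grid.each_with_index do |row, i|
    row.each_with_index do |square, j|
      next if square == '_'
      fixed[[i, j]] = square == '#' ? -1 : square.downcase.ord - 'a'.ord
    end
  end
  fixed
end

grid = parse_template(TEMPLATE)
p grid
p fixed_squares(grid)
```

    For this template the pinned squares are the C (2) and T (19) in the first
    row plus the two blocked squares (-1). Every '_' square is left free for
    the solver to fill.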

    The other chunk of constraints is defined using a helper method we will examine
    in just a moment. The result of this code, though, is to ensure that the numbers
    selected represent letters that form actual words.

    The final step in describing the problem to Gecode is to choose a branching
    strategy. This tells Gecode which variables it will need to make guesses about
    in order to find a solution as well as selecting a heuristic to use when guesses
    must be made. In this case, words will be selected based on how much of the
    overall puzzle they affect, hopefully ruling out large sections of the search
    space quickly.

    The problem model we just examined is pretty much always how constraint
    programming goes. You just need to remember the three steps: create some
    variables for Gecode to fill in, define the rules for the values you want Gecode
    to find, and select a strategy for Gecode to use in solving the problem.
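
    To see those three steps in isolation, here is a toy solver in plain Ruby.
    It is emphatically not how Gecode works: it is a naive backtracking search
    that only checks the rules once every variable has a value, with none of
    the pruning Gecode performs. The slots, word lists, and solve() method are
    all made up for the illustration.

```ruby
# Step 1: variables and their domains (candidate words for each slot).
# The slots and word lists are hypothetical illustration data.
DOMAINS = {
  across: %w[cat dog sun],
  down:   %w[owl sky den]
}.freeze

# Step 2: constraints. The two slots cross at their first letters.
CONSTRAINTS = [
  ->(assignment) { assignment[:across][0] == assignment[:down][0] }
].freeze

# Step 3: search strategy. Pick the unassigned variable with the fewest
# candidates, try its values in order, and backtrack when a complete
# assignment violates a constraint.
def solve(domains, constraints, assignment = {})
  if assignment.size == domains.size
    return constraints.all? { |rule| rule.call(assignment) } ? assignment : nil
  end
  variable = (domains.keys - assignment.keys).min_by { |key| domains[key].size }
  domains[variable].each do |value|
    result = solve(domains, constraints, assignment.merge(variable => value))
    return result if result
  end
  nil
end

p solve(DOMAINS, CONSTRAINTS)
```

    The search settles on "dog" across and "den" down, the only pair sharing a
    first letter. A real constraint solver would discard "cat" and "sun"
    before guessing anything, which is where the speed comes from.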

    The to_s() method above just creates the output used in the quiz examples for
    display to the user.

    Let's have a look at the helper methods used in the model definition now:

    # ...

      private

      # Parses the template from left to right, line for line, constraining
      # sequences of two or more subsequent squares to form a word in the
      # dictionary.
      def left_to_right_pass(template, variables)
        template.each_with_index do |row, i|
          letters = []
          row.each_with_index do |letter, j|
            if letter == '#'
              must_form_word(letters) if letters.size > 1
              letters = []
            else
              letters << variables[i,j]
            end
          end
          must_form_word(letters) if letters.size > 1
        end
      end

      # Converts a word from integer form to string form, including the #.
      def self.i_to_s(int)
        if int == -1
          return '#'
        else
          Dictionary.i_to_s(int)
        end
      end

      # Converts a word from string form to integer form, including the #.
      def self.s_to_i(string)
        if string == '#'
          return -1
        else
          Dictionary.s_to_i(string)
        end
      end

      # Constrains the specified variables to form a word contained in the
      # dictionary.
      def must_form_word(letter_vars)
        raise 'The word is too long.' if letter_vars.size > MAX_WORD_LENGTH
        # Create a variable for the word with the dictionary's words as
        # domain and add the constraint.
        word = int_var @dictionary.words_of_size(letter_vars.size)
        letter_vars.to_number(BASE).must == word
        @words << word
      end
    end

    # ...

    The i_to_s() and s_to_i() methods here are mostly just wrappers over the
    Dictionary counterparts we examined earlier. The real interest is the other two
    methods that together define the word constraints.

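    First, left_to_right_pass() is used to walk the puzzle looking for runs of two
    or more letters that will need to become words in the Dictionary. Each time it
    finds one, a hand-off is made to must_form_word(), which builds the actual
    constraint: it creates an integer variable whose domain is the set of
    dictionary words of that length, then requires the base-26 number formed by
    the run's letter variables to equal it.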
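
    The run detection in left_to_right_pass() is easy to try on its own. The
    sketch below follows the same loop but collects [row, column] coordinates
    instead of Gecode variables, so it needs to swap coordinates back after
    transposing (the original avoids that by transposing the variable matrix
    too). The runs_in_rows() method and the GRID are hypothetical.

```ruby
# Return every run of two or more non-'#' cells in each row, as lists of
# [row, column] coordinates.
def runs_in_rows(grid)
  runs = []
  grid.each_with_index do |row, i|
    current = []
    row.each_with_index do |cell, j|
      if cell == '#'
        runs << current if current.size > 1
        current = []
      else
        current << [i, j]
      end
    end
    runs << current if current.size > 1   # a run may end at the row's edge
  end
  runs
end

# Hypothetical 3x3 template: '_' is an open square, '#' a blocked one.
GRID = [
  %w[_ _ _],
  %w[_ # _],
  %w[_ _ #]
].freeze

across = runs_in_rows(GRID)
# Transposing turns columns into rows; swap the coordinates back afterwards.
down = runs_in_rows(GRID.transpose).map { |run| run.map { |i, j| [j, i] } }

p across   # [[[0, 0], [0, 1], [0, 2]], [[2, 0], [2, 1]]]
p down     # [[[0, 0], [1, 0], [2, 0]], [[0, 2], [1, 2]]]
```

    Single squares hemmed in by blocks, like the ones in the middle row here,
    are skipped, which matches the letters.size > 1 checks in the original.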

    With the problem modeled, it takes just a touch more code to turn this into a
    full solution:

    # ...

    puts 'Reading the dictionary...'
    dictionary = Dictionary.new(ARGV.shift || '/usr/share/dict/words')
    puts 'Please enter the template (end with ^D)'
    template = ''
    loop do
      line = $stdin.gets
      break if line.nil?
      template << line
    end
    puts 'Building the model...'
    model = Crossword.new(template, dictionary)
    puts 'Searching for a solution...'
    puts((model.solve! || 'Failed').to_s)

    You can see that this application code is just reading in the dictionary and
    template, then constructing a model and pulling a solution with the solve!()
    method. That's all it really takes to get answers after describing your problem
    to Gecode.

    If you want to continue exploring Andreas's Gecode/R wrapper, drop by the Web
    site, which has links to many useful resources:

    http://gecoder.rubyforge.org/

    My thanks to all who put a good deal of effort into a hard search problem.

    Tomorrow we will take another peek at our dictionary, this time to see what
    numbers are hiding in there...
     