[SUMMARY] Posix Pangrams (#97)

R

Ruby Quiz

This problem turns out to be pretty tough. Finding the best pangram for a given
constraint can be NP-Hard, as explained by Ken Bloom:

http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-talk/219003

Given that, we're just interested in code that gets reasonably close to ideal
results in a decent amount of time. All of the submissions did that quite well,
but I want to show off Ezwan Aizat Bin Abdullah Faiz's code below. I've
Rubyified it just a bit, but the code still works the same. For reference, it
finds this pangram in under a second on my box:

String: zcat jobs newgrp mv qhold tty uux mkfifo
Panagram? true
Words? 8
Length? 33
Dupes? 7

Let's start with the easy stuff:

#!/usr/bin/ruby

# removed c99 fort77 m4
words = %w[ admin alias ar asa at awk basename batch bc bg cal cat cd
cflow chgrp chmod chown cksum cmp comm command compress cp
crontab csplit ctags cut cxref date dd delta df diff dirname
du echo ed env ex expand expr false fc fg file find fold
fuser gencat get getconf getopts grep hash head iconv id
ipcrm ipcs jobs join kill lex link ln locale localedef logger
logname lp ls mailx make man mesg mkdir mkfifo more mv newgrp
nice nl nm nohup od paste patch pathchk pax pr printf prs ps
pwd qalter qdel qhold qmove qmsg qrerun qrls qselect qsig
qstat qsub read renice rm rmdel rmdir sact sccs sed sh sleep
sort split strings strip stty tabs tail talk tee test time
touch tput tr true tsort tty type ulimit umask unalias uname
uncompress unexpand unget uniq unlink uucp uudecode uuencode
uustat uux val vi wait wc what who write xargs yacc zcat ]
words_line = words.join(" ")

# ...

This code begins by filling an Array with the Posix utilities, except for the
three that contain digits in their names. Everyone dropped these to keep things
simple. Note that the words are also joined to form one long line of utility
names.

Next we need to enhance String a bit with some capabilities we will need to do
our work:

# ...

class String
def letters(&block)
scan(/[a-z]/, &block)
end

def pangram?
letters.uniq.length == 26
end

def duplicated_letters
seen = Hash.new { |found, char| found[char] = 1; 0 }
letters.inject(0) { |sum, l| sum + seen[l] }
end
end

# ...

We will need easy access to an Array of letters (without the spaces), so a
simple wrapper is created over scan(). From there we can check pangram status
by ensuring that we have 26 unique letters. Finally, when printing statistics
it would be nice to be able to see how many letters are duplicates, so we add a
method for that too.

Now the code builds up some handy constants in preparation for the search:

# ...

OCCURRENCE = Hash.new { |counts, char| counts[char] = 0 }
words_line.letters { |char| OCCURRENCE[char] += 1 }

WORDS_LENGTH = words_line.delete(" ").length

# ...

Here we see the OCCURRENCE Hash being filled. It will be the key to the search
algorithm. The Hash is simply a count of how many times each letter is present
in all of the names. "e" appears the most at a count of 62 times and "z" is
present only once.

Another constant is filled with the overall character count.

Here's a method used to print statistics when we have our answer:

# ...

def stats(line)
puts "String: #{line}"
puts "Pangram? #{line.pangram?}"
puts "Words? #{line.split.length}"
puts "Length? #{line.delete(' ').length}"
puts "Dupes? #{line.duplicated_letters}"
end

# ...

I think that one is pretty self documenting.

We're finally ready for the primary search algorithm and here it comes:

# ...

=begin
Suitability should be determined by
* least number of letters takes out the length of the computer
* no used letters
* no duplicates
=end
def suitability(value, line)
amount, used = 0, ""

value.letters do |char|
amount += if used.include?(char) || line.include?(char)
-OCCURRENCE[char]
else
WORDS_LENGTH / OCCURRENCE[char]
end
used << char
end

amount
end

# ...

This method grades the passed value for inclusion in the line. Each letter is
scored, creating an overall suitability rating for this value.

If the letter is already in the line or the letters we have seen so far from
this value, the OCCURRENCE value for that letter is tacked onto the score as a
penalty. This ensures the duplicates, especially of common letters, are costly
so the code will try to avoid them.

If this is the first time the letter has been encountered, the score is the
OCCURRENCE for that letter divided into the total letter count. This makes it
valuable for the code to place letters like the "z" early on, since we don't
have any choices on that one and thus will also need to accept the letters it
comes with.

Here's the final piece of the puzzle, where we see suitability() put to work:

# ...

line = ""

until line.pangram?
words.sort! do |x, y|
suitability(y, line) <=> suitability(x, line)
end

line += "#{words.shift} "
end

stats line

The process is easy to follow here. First an empty line is created. Then we
loop until the line has become a pangram. At each step, the utility list is
sorted by suitability() and the best word added to the line. Just before exit,
the program dumps the statistics as the result.

My thanks to all you pangram hunters. As always you taught me clever new tricks
in optimization.

Tomorrow we'll play with the king of pathfinding algorithms...
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,764
Messages
2,569,567
Members
45,041
Latest member
RomeoFarnh

Latest Threads

Top