[QUIZ] Cryptograms (#13)

R

Ruby Quiz

The three rules of Ruby Quiz:

1. Please do not post any solutions or spoiler discussion for this quiz until
48 hours have passed from the time on this message.

2. Support Ruby Quiz by submitting ideas as often as you can:

http://www.grayproductions.net/ruby_quiz/

3. Enjoy!

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

by Glenn Parker

GOAL: Given a cryptogram and a dictionary of known words, find the best
possible solution(s) to the crytogram. Extra points for speed. Coding
a brute force solution is relatively trivial, but there are many
opportunities for the clever optimizer.

A cryptogram is piece of text that has been passed through a simple
cipher that maps all instances of one letter to a different letter. The
familiar rot13 encoding is a trivial example.

A solution to a cryptogram is a one-to-one mapping between two sets of
(up to) 26 letters, such that applying the map to the cryptogram yields
the greatest possible number from words in the dictionary.

Both the dictionary and the crytogram are presented as a set of words,
one per line. The script should output one or more solutions and the
full or partial mapping for each solution. An example cryptogram and
solution are provided, below.

Three unsolved cryptograms given. Each cryptogram uses a different
mapping. The cryptograms may contain a few words that are not in the
dictionary, e.g. an author's name is commonly appended to quoted text
in cryptograms. Many published cryptograms also contain punctuation in
plaintext as a clue to the solver. The cryptograms below contain no
punctuation, since it just confuses dictionary-based searches.

The dictionary I used was 2of4brif.txt, available as part of the
12Dicts package at
http://prdownloads.sourceforge.net/wordlist/12dicts-4.0.zip. Given the
size of the file (~ 1Mb), it is not included in this message, but I do
think it would be best if participants all used the same dictionary.

EXAMPLE:

gebo
tev
e
cwaack
cegn
gsatkb
ussyk

SOLUTION:

mary
had
a
little
lamb
mother
goose

in: abcdefghijklmnopqrstuvwxyz
out: trl.a.m...e..by...ohgdi.s.

Note: dots in the "out" side of the mapping indicate unused input
letters.

CRYPTOGRAMS:

-----
crypto1.txt
-----
zfsbhd
bd
lsf
xfe
ofsr
bsdxbejrbls
sbsfra
sbsf
xfe
ofsr
xfedxbejrbls
rqlujd
jvwj
fpbdls

-----
crypto2.txt
-----
mkr
ideerqruhr
nrmsrru
mkr
ozgcym
qdakm
scqi
oui
mkr
qdakm
scqi
dy
mkr
ideerqruhr
nrmsrru
mkr
zdakmudua
nja
oui
mkr
zdakmudua
goqb
msodu

-----
crypto3.txt
-----
ftyw
uwmb
yw
ilwwv
qvb
bjtvi
fupxiu
t
dqvi
tv
yj
huqtvd
mtrw
fuw
dwq
bjmqv
fupyqd
 
G

Glenn Parker

Michael said:
According to the site: "The only goal we try to follow with the quizzes
is to keep them pretty short (30 minutes to an hour to solve for the
average coder). This is obviously vague, so use you're best judgment.
We're interested in fun little diversions, not a massive time sink."

My apologies, I was not aware of the suggested time constraint. I
responded to the e-mailed request for quiz ideas with a problem that a
group of my friends had previously attacked in C, Perl, and Python. We
all solved it in our own way and had a fine time discussing our various
optimizations. Nobody spent more than a week's worth of spare time on
it, so it seemed to be within bounds for a weekly quiz. I also think
the recent tic-tac-toe problem was of comparable difficulty.

The cryptogram problem is definitely more than an hour's work, if only
because test runs can be time consuming. For me, a Ruby neophyte, the
Ruby version consumed over 10 hours of keyboard time, and that was after
I had developed a Python solution previously.

For those who only want a quickie 10 minute "golf" challenge, try
writing a cryptogram generator.
Speak for yourself! :)

I said "relatively", as in relatively trivial compared to a fast and
efficient solution. :)
Yes, there are. Thank the gods, because the brute force solution is
miserably slow on a standard home PC. I got bored watching and never did
make certain that my brute force algorithm actually works on the
cryptograms in the Quiz (I did make sure it could solve shorter grams
with much smaller dict files).

Another optional challenge is to come up with a way to measure your
progress as you chew through the very large solution space. If you get
clever about pruning, the math can get tricky.
Now , thanks to crypto3.txt, I will add to the list of ways to foil
solvers: write messages that are nearly unsolvable in the first
place. :)

Again, my apologies if that puzzle proved too difficult for some. I
assure that there is at least one solution. I created these puzzles
myself to avoid copyright issues, and it's hard to evaluate their
difficulty. I do know that lots of medium-length words in which no
letters are re-used makes it harder, and crypto3.txt is a fine example
of that.

Sometimes, looking at the problem from a more practical standpoint, a
complete computer-generated solution is overkill. If you scan through
your algorithm's partial results, you may get enough insight to finish
the solution on your own. This is especially true when your algorithm
is stumbling on a proper noun, e.g. an author's name, that is not in
your dictionary.
 
F

Florian Gross

Glenn said:
For those who only want a quickie 10 minute "golf" challenge, try
writing a cryptogram generator.

Would this do the job? (42 characters)

l="a".."z";$><<[*l]<<$/<<l.sort_by{rand}<<$/
 
G

Glenn Parker

Florian said:
Glenn said:
For those who only want a quickie 10 minute "golf" challenge, try
writing a cryptogram generator.


Would this do the job? (42 characters)

l="a".."z";$><<[*l]<<$/<<l.sort_by{rand}<<$/

That produces a cryptogram mapping, but a cryptogram consists of
ciphertext that has been created using such a mapping, so I think you
have to also read a plaintext and display the resulting ciphertext.
 
F

Florian Gross

Glenn said:
Would this do the job? (42 characters)

l="a".."z";$><<[*l]<<$/<<l.sort_by{rand}<<$/

That produces a cryptogram mapping, but a cryptogram consists of
ciphertext that has been created using such a mapping, so I think you
have to also read a plaintext and display the resulting ciphertext.

Like this? (75 chars)
 
G

Glenn Parker

Florian said:
Like this? (75 chars)

Getting warmer, but look carefully at the map display. This is the
"forwards" map that produced the cryptogram, but what you really want to
show is the "reverse" map that solves the cryptogram.

m=*"a".."z";n=m.sort{rand};$><<$<.read.tr(m.to_s,n.to_s);warn
m.to_s+$/+m.sort_by{|i|n[i[0]-?a]}.to_s
 
F

Florian Gross

Glenn said:
Getting warmer, but look carefully at the map display. This is the
"forwards" map that produced the cryptogram, but what you really want to
show is the "reverse" map that solves the cryptogram.

Would not be fixed by changing m+$/+n to n+$/+m?
 
G

Glenn Parker

Florian said:
Would not be fixed by changing m+$/+n to n+$/+m?

In a way, yes, but it's much harder to actually use it if the top line
is not sorted.
 
G

Glenn Parker

--------------030403030206060304010407
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit

Here's my "solution". I must admit that it does not perform very well
on crypto3.txt, although based on copious diagnostic output I think it
would eventually come up with the correct solution after a few days.

The comments in my code may help illuminate some of my thinking, which
was very similar to Michael C. Libby's, although the implementation
details are quite different.

I'll provide a summary of all (two, so far) solutions and a discussion
of my approach on Thursday.

--
Glenn Parker | glenn.parker-AT-comcast.net | <http://www.tetrafoil.com/>

--------------030403030206060304010407
Content-Type: text/plain;
name="csolve.rb"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline;
filename="csolve.rb"

#!ruby

STDOUT.sync = true

# A utility class to read files containing words one-per-line
class WordReader
include Enumerable
def initialize(filename)
@filename = filename
end
def each
File.open(@filename) do |file|
file.each do |word|
word.chomp!
next if word.length == 0
yield word
end
end
end
end

# A copy of the dictionary, with words grouped by "signature".
# A signature simplifies a word to its repeating letter patterns.
# The signature for "cat" is 1.2.3 because each successive letter
# in cat is unique. The signature for "banana" is 1.2.3.2.3.2,
# where letter 2, "a", is repeated three times and letter 3, "n"
# is repeated twice.
class Dictionary
def initialize(filename)
@all = {}
@sigs = {}
@max_sig = nil
@max_sig_len = nil

WordReader.new(filename).each { |word|
word.downcase!
word.gsub!(/[^a-z]/, '')
next if word.empty?
@all[word] = true
sig = signature(word)
@sigs[sig] ||= []
@sigs[sig].push(word)
}
self.freeze
end

def lookup(word)
@all[word]
end

def candidates(cipher)
@sigs[signature(cipher)]
end

private

def signature(word)
seen = {}
u = 0
sig = []
word.each_byte do |b|
if not seen
u += 1
seen = u
end
sig.push(seen)
end
sig.join('.')
end
end

# CMap maintains the mapping from ciphertext to plaintext and the
# some state related to the solution. @map is the actual mapping.
# @solved is just a string with all the solved words. @shelved
# is an array of ciphertext words that cannot be solved because
# the current mapping resolves all their letters and the result
# is not found in the dictionary.
class CMap
attr_reader :map, :solved, :shelved

def initialize(arg = nil, newmap = nil, dword = nil)
case
when arg.kind_of?(String)
@map = arg.dup
when arg.kind_of?(CMap)
@map = newmap || arg.map.dup
@solved = arg.solved.dup
@shelved = arg.shelved.dup
append_solved(dword) if dword
else
@map = '.' * 26
@solved = ''
@shelved = []
end
end

def dup
CMap.new(self)
end

# Attempt to update the map to include all letter combinations
# needed to map cword into dword. Return nil if a conflict is found.
def learn(cword, dword)
newmap = @map.dup
(0...cword.length).each do |i|
c = cword - ?a
p = newmap[c]
# check for correct mapping
next if p == dword
# check for incorrect mapping
return nil if (p != ?.) || newmap.include?(dword)
# create new mapping
newmap[c] = dword
end
CMap.new(self, newmap, dword)
end

def append_solved(dword)
@solved += ' ' unless @solved.empty?
@solved += dword
end

def shelve(cword)
@shelved << cword
end

def convert(cword)
pattern = ''
cword.each_byte do |c|
pattern << @map[c - ?a]
end
pattern
end
end

# Pruner keeps track of maps that have already been tried.
# Some maps are subsets of previously tried maps, allowing us to skip them.
# This may consume more RAM than it is worth.
class Pruner
def initialize(list)
@pruned = {}
letter = 'A'
@abbrev = {}
list.each {|w| @abbrev[w] = letter.clone; letter.succ!}
end

def genkey(list)
list.collect{|w|@abbrev[w]}.sort.join('.')
end

def test(list, cmap)
key = genkey(list)
return false unless prunelist = @pruned[key]
prunelist.each do |prunemap|
if /#{prunemap}/ =~ cmap.map
return true
end
end
return false
end

def store(list, cmap)
key = genkey(list)
#puts "Pruner.store #{cmap.map} #{key}"
@pruned[key] ||= []
@pruned[key] = @pruned[key].reject do |prunemap|
/#{cmap.map}/ =~ prunemap
end << cmap.map
end
end

class Cryptogram
def initialize(filename, dict)
@dict = dict
@words = WordReader.new(filename).to_a
# clist is the input cipher with no duplicated words
# and no unrecognized input characters
@clist = []
@words.each do |word|
word.downcase!
word.gsub!(/[^a-z]/, '')
next if word.empty? || @clist.include?(word)
@clist.push(word)
end
# Sort by increasing size of candidate list
@clist = @clist.sort_by {|w| @dict.candidates(w).length}
end

def solve(max_unsolved = 0, stop_on_first = true)
@max_unsolved = max_unsolved
@stop_on_first = stop_on_first
@checks = 0
@solutions = {}
@partials = {}
# @pruner = Pruner.new(@clist)
solve_p(@clist, CMap.new, 0)
end

def solve_p(list, cmap, depth)
# Simplify list if possible
list = prescreen(list, cmap)
return if check_solution(list, cmap)
# return if @pruner.test(list, cmap)
solve_r(list, cmap, depth)
return if done?
# @pruner.store(list, cmap)
end#solve_p

def solve_r(start_list, start_cmap, depth)
for i in (0...start_list.length)
# Pull a cword out of start_list
list = start_list.dup
cword = list.delete_at(i)

pattern = start_cmap.convert(cword)
search(cword, pattern) do |dword|
# Try to make a new cmap by learning dword for cword
next unless cmap = start_cmap.learn(cword, dword)
# Recurse on remaining words
solve_p(list, cmap, depth + 1)
return if done?
end#search
end#for
end#solve_r

def done?
@stop_on_first && @solutions.length > 0
end

# Return the subset of cwords in list that are not fully solved by cmap.
# Update cmap with learned and shelved words.
def prescreen(list, cmap)
start_list = []
list.each do |cword|
pattern = cmap.convert(cword)
if pattern.include?(?.)
# cword was not fully resolved.
start_list << cword
elsif @dict.lookup(pattern)
# cword was resolved and is a known word.
cmap.learn(cword, pattern)
else
# cword cannot be solved.
cmap.shelve(cword)
end
end
start_list
end

# Generate dictionary words matching the pattern
def search(cword, pattern)
# the pattern will normally have at least one unknown character
if pattern.include? ?.
re = Regexp.new("^#{pattern}$")
@dict.candidates(cword).each do |dword|
yield dword if re =~ dword
end
# otherwise, just check that the pattern is actually a known word.
elsif @dict.lookup(pattern)
yield pattern
end
end

def check_solution(list, cmap)
@checks += 1
unsolved = list.length + cmap.shelved.length

# Did we get lucky?
if unsolved == 0
if not @solutions.has_key?(cmap.map)
@solutions[cmap.map] = true
if not @stop_on_first
puts "\nfound complete solution \##{@solutions.length}"
puts "performed #{@checks} checks"
show_cmap(cmap)
end
end
return true
end

# Give up if too many words cannot be solved
if cmap.shelved.length > @max_unsolved
return true
end

# Check for satisfactory partial solution
if unsolved <= @max_unsolved
if not @partials.has_key?(cmap.map)
@partials[cmap.map] = true
puts "\nfound partial \##{@partials.length} with #{unsolved} unsolved"
puts "performed #{@checks} checks"
puts Time.now
show_cmap(cmap)
end
end
return false
end

def show
puts "Performed #{@checks} checks"
puts "Found #{@solutions.length} solutions"
@solutions.each_key do |sol|
show_cmap(CMap.new(sol))
end
puts
puts "Found #{@partials.length} partial solutions"
@partials.each_key do |sol|
show_cmap(CMap.new(sol))
end
end

def show_cmap(cmap)
puts(('a'..'z').to_a.join(''))
puts cmap.map
puts
@words.each do |word|
pattern = cmap.convert(word)
printf "%-20s %s %-20s\n",
word, (@dict.lookup(pattern) ? ' ' : '*'), pattern
end
puts '-' * 42
end
end

DICTFILE = ARGV[0]
PARTIAL = ARGV[1].to_i

puts "Reading dictionary #{DICTFILE}"
dict = Dictionary.new(DICTFILE)

ARGV[2..-1].each do |filename|
puts "Solving cryptogram #{filename} allowing #{PARTIAL} unknowns", Time.now
cryp = Cryptogram.new(filename, dict)
cryp.solve PARTIAL
puts "Cryptogram solution", Time.now
cryp.show
end

--------------030403030206060304010407--
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,769
Messages
2,569,580
Members
45,054
Latest member
TrimKetoBoost

Latest Threads

Top