# [SUMMARY] Cryptograms (#13)

Discussion in 'Ruby' started by Ruby Quiz, Jan 6, 2005.

1. ### Ruby QuizGuest

by Glenn Parker

Solving a cryptogram by brute force is prohibitively expensive. The maximum
number of possible solutions is 26!, or roughly 4*10^26, so the first challenge
is to pare down the search to something manageable.

Both my solution and Michael's begin with the insight that any word, either a
regular dictionary word or a cryptographic token, can be viewed as a pattern of
repeated and non-repeated characters. For example, "banana" has the pattern [1
2 3 2 3 2], where the first letter is used exactly once and the second letter
is used three times and the third letter is used twice. These patterns group
all known words into families. The word, banana, belongs to the same family as
the word, rococo.

All words in a dictionary can be grouped into families according to their
patterns, and each cryptographic token has its own pattern that corresponds
(with any luck), to one of the families from the dictionary. If a token has no
matching family, then it cannot be solved with the given dictionary, so we
won't worry about that case too much.

We start by assuming that one of the cryptographic tokens corresponds to one of
the words in its family. This pairing produces a partial map of input to
output characters. So, if we examine the token, xyzyzy, we might assume that
it is really the word, banana. The partial map that results is x->b y->a z->n,
or

abcdefghijklmnopqrstuvwxyz
.......................ban

Note that this mapping will affect all other cryptographic tokens that share
the letters x, y, and z. In fact, it may even solve some of them completely,
e.g. "zyx" becomes "nab". Or, the map may convert another token into a word
that is not in the dictionary, e.g. "zyxxyz" becomes "nabban", which is not in
my dictionary. This is a useful trick that will reduce the size of the search.

Next we assume that another token can be mapped into a dictionary word from its
family, which produces another partial map that must be combined with the first
map. There are two ways this combination can fail. First, the new map may
have a previously mapped input letter going to a different output letter, so if
we mapped "uvwxyz" to "monkey", the result would be a map where "x" mapped to
both "b" and "k". Second, the new map may have a previously unused input
letter going to an output letter that was already used, so if we mapped
"abcdef" to "monkey", the result would map both "c" and "z" to "n". Failed
mappings also serve to reduce the size of the search.

For my solution, I used a depth-first search, working through the tokens trying
every word in its family. The tokens are ordered according increasing family
size, so the tokens with the fewest possible solutions are examined first. At
each level of the recursion, all the words for a token are applied in sequence
to the current map. If the resulting map is valid, I recurse and the new map
is applied to the remaining unsolved tokens to see if they are already solved
or unsolvable. Solved tokens are ignored for the rest of this branch of the
search, and unsolvable tokens are shelved. Then I start working on the next
token with the new map.

The recursion terminates when a complete map is found, or the number of shelved
(unsolvable) tokens exceeds a limit, or every family word has been used for the
last token.

We are interested in maps that do not yield dictionary words for every token.
This is because cryptograms often contain non-dictionary words, and so we may
be satisfied by a partial solution even when a full solution is impossible.
Finding partial solutions is more expensive than finding only full solutions,
since the search space can be significantly larger. Aside from the trick of
shelving unsolvable words, partial solutions require us to selectively ignore
tokens that may be "spoiling" the search even though they produce valid maps.
Neither Michael's solution nor my own fully implements this as far as I can
tell.

Michael represents each map as a set of character pairs. Every possible
partial map for each individual token is calculated and forms a set of maps for
that token. Tokens are ordered by the increasing size of their set of maps, on
the assumption that smaller sets are more constraining and thus simplify the
search.

Some clever set operations are used to progressively combine each token's set
of maps into a master set of maps. When mixing in a token's set of maps
produces an empty set of maps, that token is ignored. After all the sets have
been combined, you are left with a set of maps that solve some or all of the
cryptogram.

The weakness in both mine and Michael's approach is that tokens are always
mixed in to the solution using a single pre-defined order. But the tokens that
are mixed in first can have an overwhelming influence on the final maps that
result. In the worst case, the first token to be mapped can make it impossible
to add any other tokens to the map.

The only solution I know is to add another wrapper around the entire search
process that mutates the order of the token mixing. I've implemented this in
Python, but not in the Ruby version, and I think Michael's solution might be a
better place to start from when persuing this.

Ruby Quiz, Jan 6, 2005

2. ### Cedric FollGuest

Hi,

my solution!

Two minuts later .

I've just finished it (started yesterday evening)

The general idea was to construct a dictionnary structure which
containes all the dictionary.
It's a kind of true with a value '@final' with false or true depends if
the node is the end of a word.

The I have a method (exist) which is able to find if a word exist on it
very quickly and using '.' for unknow characters.

The example is found in few minuts on my laptop.

Regards.

#!/usr/bin/ruby

require 'pp'

class Array
def flop
self.reverse!
char = self.pop
self.reverse!
return char
end
end

class Dico
def initialize(tab)
tab = tab.split('') if tab.class == String
@char = Hash.new
@final = false
if tab.length == 0
@final = true
else
char = tab.flop
@char[char] = Dico.new(tab)
end
@tested = Hash.new
end
tab = tab.split('') if tab.class == String
if tab.length == 0
@final = true
else
char = tab.flop
if @char[char]
else
@char[char] = Dico.new(tab)
end
end

end
def has(tab)
tab = tab.split('') if tab.class == String
return @final if tab == []
char = tab.flop
if !@char[char]
return false
else
return @char[char].has(tab)
end
end
def exist(tmp,without=[])
if @tested[tmp*'']
return @tested[tmp*'']
end
tab = tmp.dup
tab = tab.split('') if tab.class == String
return @final if tab == []
char = tab.flop
if char != "."
if !@char[char]
return false
else
return @char[char].exist(tab)
end
else
for i in @char.keys
next if without.index(i)
if @char.exist(tab)
@tested[tmp*'']=true
return true
end
end
return false
end
end
end

def calc(words,letters)
return words.map {|word| word.split('').map {|l| letters[l]}}
end

def print_res(letters)
for i in 'a' .. 'z'
print i
end
puts ''
for i in 'a' .. 'z'
print letters if letters
print '.' if !letters
end
puts ''
end

\$i = 0

def test_combi(words, letters, dico)
new_words = words.map {|word| word.split('').map {|l| letters[l]}}
\$i += 1
print_res(letters) if \$i % 1000 == 0
p new_words if \$i % 1000 == 0
new_words.each {|word|
return false if not dico.exist(word,letters.values)
}
if new_words.flatten.index('.')
return true
else
print_res(letters)
p new_words
exit
end
end

mydico = Dico.new('')

i = 0
\$stdout.sync=true
while \$stdin.gets
i += 1
print '.' if i%1000 == 0
end
puts ''

words = ARGV
letters = Hash.new
for i in words.join('').split('').sort.uniq
letters='.'
end

for a in 'a' .. 'z'
next if letters.values.include?(a)
letters['a'] = a if letters['a']
next if !test_combi(words,letters,mydico)

for b in 'a' .. 'z'
next if letters.values.include?(b)
letters['b'] = b if letters['b']
next if !test_combi(words,letters,mydico)

for c in 'a' .. 'z'
next if letters.values.include?(c)
letters['c'] = c if letters['c']
next if !test_combi(words,letters,mydico)

for d in 'a' .. 'z'
next if letters.values.include?(d)
letters['d'] = d if letters['d']
next if !test_combi(words,letters,mydico)

for e in 'a' .. 'z'
next if letters.values.include?(e)
letters['e'] = e if letters['e']
next if !test_combi(words,letters,mydico)

for f in 'a' .. 'z'
next if letters.values.include?(f)
letters['f'] = f if letters['f']
next if !test_combi(words,letters,mydico)

for g in 'a' .. 'z'
next if letters.values.include?(g)
letters['g'] = g if letters['g']
next if !test_combi(words,letters,mydico)

for h in 'a' .. 'z'
next if letters.values.include?(h)
letters['h'] = h if letters['h']
next if !test_combi(words,letters,mydico)

for i in 'a' .. 'z'
next if letters.values.include?(i)
letters['i'] = i if letters['i']
next if !test_combi(words,letters,mydico)

for j in 'a' .. 'z'
next if letters.values.include?(j)
letters['j'] = j if letters['j']
next if !test_combi(words,letters,mydico)

for k in 'a' .. 'z'
next if letters.values.include?(k)
letters['k'] = k if letters['k']
next if !test_combi(words,letters,mydico)

for l in 'a' .. 'z'
next if letters.values.include?(l)
letters['l'] = l if letters['l']
next if !test_combi(words,letters,mydico)

for m in 'a' .. 'z'
next if letters.values.include?(m)
letters['m'] = m if letters['m']
next if !test_combi(words,letters,mydico)

for n in 'a' .. 'z'
next if letters.values.include?(n)
letters['n'] = n if letters['n']
next if !test_combi(words,letters,mydico)

for o in 'a' .. 'z'
next if letters.values.include?(o)
letters['o'] = o if letters['o']
next if !test_combi(words,letters,mydico)

for p in 'a' .. 'z'
next if letters.values.include?(p)
letters['p'] = p if letters['p']
next if !test_combi(words,letters,mydico)

for q in 'a' .. 'z'
next if letters.values.include?(q)
letters['q'] = q if letters['q']
next if !test_combi(words,letters,mydico)

for r in 'a' .. 'z'
next if letters.values.include?(r)
letters['r'] = r if letters['r']
next if !test_combi(words,letters,mydico)

for s in 'a' .. 'z'
next if letters.values.include?(s)
letters['s'] = s if letters['s']
next if !test_combi(words,letters,mydico)

for t in 'a' .. 'z'
next if letters.values.include?(t)
letters['t'] = t if letters['t']
next if !test_combi(words,letters,mydico)

for u in 'a' .. 'z'
next if letters.values.include?(u)
letters['u'] = u if letters['u']
next if !test_combi(words,letters,mydico)

for v in 'a' .. 'z'
next if letters.values.include?(v)
letters['v'] = v if letters['v']
next if !test_combi(words,letters,mydico)

for w in 'a' .. 'z'
next if letters.values.include?(w)
letters['w'] = w if letters['w']
next if !test_combi(words,letters,mydico)

for x in 'a' .. 'z'
next if letters.values.include?(x)
letters['x'] = x if letters['x']
next if !test_combi(words,letters,mydico)

for y in 'a' .. 'z'
next if letters.values.include?(y)
letters['y'] = y if letters['y']
next if !test_combi(words,letters,mydico)

for z in 'a' .. 'z'
next if letters.values.include?(z)
letters['z'] = z if letters['z']
next if !test_combi(words,letters,mydico)

break if !letters['z']
end
letters['z']='.' if letters['z']
break if !letters['y']
end
letters['y']='.' if letters['y']
break if !letters['x']
end
letters['x']='.' if letters['x']
break if !letters['w']
end
letters['w']='.' if letters['w']
break if !letters['v']
end
letters['v']='.' if letters['v']
break if !letters['u']
end
letters['u']='.' if letters['u']
break if !letters['t']
end
letters['t']='.' if letters['t']
break if !letters['s']
end
letters['s']='.' if letters['s']
break if !letters['r']
end
letters['r']='.' if letters['r']
break if !letters['q']
end
letters['q']='.' if letters['q']
break if !letters['p']
end
letters['p']='.' if letters['p']
break if !letters['o']
end
letters['o']='.' if letters['o']
break if !letters['n']
end
letters['n']='.' if letters['n']
break if !letters['m']
end
letters['m']='.' if letters['m']
break if !letters['l']
end
letters['l']='.' if letters['l']
break if !letters['k']
end
letters['k']='.' if letters['k']
break if !letters['j']
end
letters['j']='.' if letters['n']
break if !letters['i']
end
letters['i']='.' if letters['i']
break if !letters['h']
end
letters['h']='.' if letters['h']
break if !letters['g']
end
letters['g']='.' if letters['g']
break if !letters['f']
end
letters['f']='.' if letters['f']
break if !letters['e']
end
letters['e']='.' if letters['e']
break if !letters['d']
end
letters['d']='.' if letters['d']
break if !letters['c']
end
letters['c']='.' if letters['c']
break if !letters['b']
end
letters['b']='.' if letters['b']
break if !letters['a']
end

Cedric Foll, Jan 6, 2005

3. ### James Edward Gray IIGuest

On Jan 7, 2005, at 3:02 AM, Vance A Heron wrote:

> (can I buy a vowel?

As I told Glenn, I was insanely busy this week and couldn't quite steal
the time to try this quiz. Which is a real shame because I think it
was a great problem.

However, one of my ideas for pruning was to toss out mappings that
didn't put "reasonable" vowels in every word. I'm not sure how well
that works in practice though, since I didn't code it up. Just a
thought.

James Edward Gray II

James Edward Gray II, Jan 7, 2005
4. ### Dave BurtGuest

> Solving a cryptogram by brute force is prohibitively expensive. The
> maximum
> number of possible solutions is 26!, or roughly 4*10^26, so the first
> challenge
> is to pare down the search to something manageable.
>
> Both my solution and Michael's begin with the insight that any word,
> either a
> regular dictionary word or a cryptographic token, can be viewed as a
> pattern of
> repeated and non-repeated characters. For example, "banana" has the
> pattern [1
> 2 3 2 3 2], where the first letter is used exactly once and the second
> letter
> is used three times and the third letter is used twice. These patterns
> group
> all known words into families. The word, banana, belongs to the same
> family as
> the word, rococo.

I'm sorry I didn't have time to give this a go.

You looked at matching words. I'm wondering if you could get as good a go
with reasonable-sized ciphertexts using a letter frequencies method? (i.e.
the most common letters in an English text are E, then T, etc.) Perhaps you
could apply letter frequencies first before moving on to match common words,
even.

Cheers,
Dave

Dave Burt, Jan 9, 2005