[SUMMARY] Cryptograms (#13)

Discussion in 'Ruby' started by Ruby Quiz, Jan 6, 2005.

  1. Ruby Quiz

    Ruby Quiz Guest

    by Glenn Parker

    Solving a cryptogram by brute force is prohibitively expensive. The maximum
    number of possible solutions is 26!, or roughly 4*10^26, so the first challenge
    is to pare down the search to something manageable.

    Both my solution and Michael's begin with the insight that any word, either a
    regular dictionary word or a cryptographic token, can be viewed as a pattern of
    repeated and non-repeated characters. For example, "banana" has the pattern [1
    2 3 2 3 2], where the first letter is used exactly once and the second letter
    is used three times and the third letter is used twice. These patterns group
    all known words into families. The word, banana, belongs to the same family as
    the word, rococo.

    All words in a dictionary can be grouped into families according to their
    patterns, and each cryptographic token has its own pattern that corresponds
    (with any luck), to one of the families from the dictionary. If a token has no
    matching family, then it cannot be solved with the given dictionary, so we
    won't worry about that case too much.

    We start by assuming that one of the cryptographic tokens corresponds to one of
    the words in its family. This pairing produces a partial map of input to
    output characters. So, if we examine the token, xyzyzy, we might assume that
    it is really the word, banana. The partial map that results is x->b y->a z->n,
    or

    abcdefghijklmnopqrstuvwxyz
    .......................ban

    Note that this mapping will affect all other cryptographic tokens that share
    the letters x, y, and z. In fact, it may even solve some of them completely,
    e.g. "zyx" becomes "nab". Or, the map may convert another token into a word
    that is not in the dictionary, e.g. "zyxxyz" becomes "nabban", which is not in
    my dictionary. This is a useful trick that will reduce the size of the search.

    Next we assume that another token can be mapped into a dictionary word from its
    family, which produces another partial map that must be combined with the first
    map. There are two ways this combination can fail. First, the new map may
    have a previously mapped input letter going to a different output letter, so if
    we mapped "uvwxyz" to "monkey", the result would be a map where "x" mapped to
    both "b" and "k". Second, the new map may have a previously unused input
    letter going to an output letter that was already used, so if we mapped
    "abcdef" to "monkey", the result would map both "c" and "z" to "n". Failed
    mappings also serve to reduce the size of the search.

    For my solution, I used a depth-first search, working through the tokens trying
    every word in its family. The tokens are ordered according increasing family
    size, so the tokens with the fewest possible solutions are examined first. At
    each level of the recursion, all the words for a token are applied in sequence
    to the current map. If the resulting map is valid, I recurse and the new map
    is applied to the remaining unsolved tokens to see if they are already solved
    or unsolvable. Solved tokens are ignored for the rest of this branch of the
    search, and unsolvable tokens are shelved. Then I start working on the next
    token with the new map.

    The recursion terminates when a complete map is found, or the number of shelved
    (unsolvable) tokens exceeds a limit, or every family word has been used for the
    last token.

    We are interested in maps that do not yield dictionary words for every token.
    This is because cryptograms often contain non-dictionary words, and so we may
    be satisfied by a partial solution even when a full solution is impossible.
    Finding partial solutions is more expensive than finding only full solutions,
    since the search space can be significantly larger. Aside from the trick of
    shelving unsolvable words, partial solutions require us to selectively ignore
    tokens that may be "spoiling" the search even though they produce valid maps.
    Neither Michael's solution nor my own fully implements this as far as I can
    tell.

    Michael represents each map as a set of character pairs. Every possible
    partial map for each individual token is calculated and forms a set of maps for
    that token. Tokens are ordered by the increasing size of their set of maps, on
    the assumption that smaller sets are more constraining and thus simplify the
    search.

    Some clever set operations are used to progressively combine each token's set
    of maps into a master set of maps. When mixing in a token's set of maps
    produces an empty set of maps, that token is ignored. After all the sets have
    been combined, you are left with a set of maps that solve some or all of the
    cryptogram.

    The weakness in both mine and Michael's approach is that tokens are always
    mixed in to the solution using a single pre-defined order. But the tokens that
    are mixed in first can have an overwhelming influence on the final maps that
    result. In the worst case, the first token to be mapped can make it impossible
    to add any other tokens to the map.

    The only solution I know is to add another wrapper around the entire search
    process that mutates the order of the token mixing. I've implemented this in
    Python, but not in the Ruby version, and I think Michael's solution might be a
    better place to start from when persuing this.
     
    Ruby Quiz, Jan 6, 2005
    #1
    1. Advertising

  2. Ruby Quiz

    Cedric Foll Guest

    Hi,


    my solution!

    Two minuts later :(.

    I've just finished it (started yesterday evening)

    The general idea was to construct a dictionnary structure which
    containes all the dictionary.
    It's a kind of true with a value '@final' with false or true depends if
    the node is the end of a word.

    The I have a method (exist) which is able to find if a word exist on it
    very quickly and using '.' for unknow characters.

    The example is found in few minuts on my laptop.

    Regards.

    #!/usr/bin/ruby

    require 'pp'

    class Array
    def flop
    self.reverse!
    char = self.pop
    self.reverse!
    return char
    end
    end

    class Dico
    def initialize(tab)
    tab = tab.split('') if tab.class == String
    @char = Hash.new
    @final = false
    if tab.length == 0
    @final = true
    else
    char = tab.flop
    @char[char] = Dico.new(tab)
    end
    @tested = Hash.new
    end
    def add(tab)
    tab = tab.split('') if tab.class == String
    if tab.length == 0
    @final = true
    else
    char = tab.flop
    if @char[char]
    @char[char].add(tab)
    else
    @char[char] = Dico.new(tab)
    end
    end

    end
    def has(tab)
    tab = tab.split('') if tab.class == String
    return @final if tab == []
    char = tab.flop
    if !@char[char]
    return false
    else
    return @char[char].has(tab)
    end
    end
    def exist(tmp,without=[])
    if @tested[tmp*'']
    return @tested[tmp*'']
    end
    tab = tmp.dup
    tab = tab.split('') if tab.class == String
    return @final if tab == []
    char = tab.flop
    if char != "."
    if !@char[char]
    return false
    else
    return @char[char].exist(tab)
    end
    else
    for i in @char.keys
    next if without.index(i)
    if @char.exist(tab)
    @tested[tmp*'']=true
    return true
    end
    end
    return false
    end
    end
    end

    def calc(words,letters)
    return words.map {|word| word.split('').map {|l| letters[l]}}
    end

    def print_res(letters)
    for i in 'a' .. 'z'
    print i
    end
    puts ''
    for i in 'a' .. 'z'
    print letters if letters
    print '.' if !letters
    end
    puts ''
    end

    $i = 0

    def test_combi(words, letters, dico)
    new_words = words.map {|word| word.split('').map {|l| letters[l]}}
    $i += 1
    print_res(letters) if $i % 1000 == 0
    p new_words if $i % 1000 == 0
    new_words.each {|word|
    return false if not dico.exist(word,letters.values)
    }
    if new_words.flatten.index('.')
    return true
    else
    print_res(letters)
    p new_words
    exit
    end
    end

    mydico = Dico.new('')

    i = 0
    $stdout.sync=true
    while $stdin.gets
    mydico.add($_.chomp)
    i += 1
    print '.' if i%1000 == 0
    end
    puts ''

    words = ARGV
    letters = Hash.new
    for i in words.join('').split('').sort.uniq
    letters='.'
    end


    for a in 'a' .. 'z'
    next if letters.values.include?(a)
    letters['a'] = a if letters['a']
    next if !test_combi(words,letters,mydico)

    for b in 'a' .. 'z'
    next if letters.values.include?(b)
    letters['b'] = b if letters['b']
    next if !test_combi(words,letters,mydico)

    for c in 'a' .. 'z'
    next if letters.values.include?(c)
    letters['c'] = c if letters['c']
    next if !test_combi(words,letters,mydico)

    for d in 'a' .. 'z'
    next if letters.values.include?(d)
    letters['d'] = d if letters['d']
    next if !test_combi(words,letters,mydico)

    for e in 'a' .. 'z'
    next if letters.values.include?(e)
    letters['e'] = e if letters['e']
    next if !test_combi(words,letters,mydico)

    for f in 'a' .. 'z'
    next if letters.values.include?(f)
    letters['f'] = f if letters['f']
    next if !test_combi(words,letters,mydico)

    for g in 'a' .. 'z'
    next if letters.values.include?(g)
    letters['g'] = g if letters['g']
    next if !test_combi(words,letters,mydico)

    for h in 'a' .. 'z'
    next if letters.values.include?(h)
    letters['h'] = h if letters['h']
    next if !test_combi(words,letters,mydico)

    for i in 'a' .. 'z'
    next if letters.values.include?(i)
    letters['i'] = i if letters['i']
    next if !test_combi(words,letters,mydico)

    for j in 'a' .. 'z'
    next if letters.values.include?(j)
    letters['j'] = j if letters['j']
    next if !test_combi(words,letters,mydico)

    for k in 'a' .. 'z'
    next if letters.values.include?(k)
    letters['k'] = k if letters['k']
    next if !test_combi(words,letters,mydico)

    for l in 'a' .. 'z'
    next if letters.values.include?(l)
    letters['l'] = l if letters['l']
    next if !test_combi(words,letters,mydico)

    for m in 'a' .. 'z'
    next if letters.values.include?(m)
    letters['m'] = m if letters['m']
    next if !test_combi(words,letters,mydico)

    for n in 'a' .. 'z'
    next if letters.values.include?(n)
    letters['n'] = n if letters['n']
    next if !test_combi(words,letters,mydico)

    for o in 'a' .. 'z'
    next if letters.values.include?(o)
    letters['o'] = o if letters['o']
    next if !test_combi(words,letters,mydico)

    for p in 'a' .. 'z'
    next if letters.values.include?(p)
    letters['p'] = p if letters['p']
    next if !test_combi(words,letters,mydico)

    for q in 'a' .. 'z'
    next if letters.values.include?(q)
    letters['q'] = q if letters['q']
    next if !test_combi(words,letters,mydico)

    for r in 'a' .. 'z'
    next if letters.values.include?(r)
    letters['r'] = r if letters['r']
    next if !test_combi(words,letters,mydico)

    for s in 'a' .. 'z'
    next if letters.values.include?(s)
    letters['s'] = s if letters['s']
    next if !test_combi(words,letters,mydico)

    for t in 'a' .. 'z'
    next if letters.values.include?(t)
    letters['t'] = t if letters['t']
    next if !test_combi(words,letters,mydico)

    for u in 'a' .. 'z'
    next if letters.values.include?(u)
    letters['u'] = u if letters['u']
    next if !test_combi(words,letters,mydico)

    for v in 'a' .. 'z'
    next if letters.values.include?(v)
    letters['v'] = v if letters['v']
    next if !test_combi(words,letters,mydico)

    for w in 'a' .. 'z'
    next if letters.values.include?(w)
    letters['w'] = w if letters['w']
    next if !test_combi(words,letters,mydico)

    for x in 'a' .. 'z'
    next if letters.values.include?(x)
    letters['x'] = x if letters['x']
    next if !test_combi(words,letters,mydico)

    for y in 'a' .. 'z'
    next if letters.values.include?(y)
    letters['y'] = y if letters['y']
    next if !test_combi(words,letters,mydico)

    for z in 'a' .. 'z'
    next if letters.values.include?(z)
    letters['z'] = z if letters['z']
    next if !test_combi(words,letters,mydico)

    break if !letters['z']
    end
    letters['z']='.' if letters['z']
    break if !letters['y']
    end
    letters['y']='.' if letters['y']
    break if !letters['x']
    end
    letters['x']='.' if letters['x']
    break if !letters['w']
    end
    letters['w']='.' if letters['w']
    break if !letters['v']
    end
    letters['v']='.' if letters['v']
    break if !letters['u']
    end
    letters['u']='.' if letters['u']
    break if !letters['t']
    end
    letters['t']='.' if letters['t']
    break if !letters['s']
    end
    letters['s']='.' if letters['s']
    break if !letters['r']
    end
    letters['r']='.' if letters['r']
    break if !letters['q']
    end
    letters['q']='.' if letters['q']
    break if !letters['p']
    end
    letters['p']='.' if letters['p']
    break if !letters['o']
    end
    letters['o']='.' if letters['o']
    break if !letters['n']
    end
    letters['n']='.' if letters['n']
    break if !letters['m']
    end
    letters['m']='.' if letters['m']
    break if !letters['l']
    end
    letters['l']='.' if letters['l']
    break if !letters['k']
    end
    letters['k']='.' if letters['k']
    break if !letters['j']
    end
    letters['j']='.' if letters['n']
    break if !letters['i']
    end
    letters['i']='.' if letters['i']
    break if !letters['h']
    end
    letters['h']='.' if letters['h']
    break if !letters['g']
    end
    letters['g']='.' if letters['g']
    break if !letters['f']
    end
    letters['f']='.' if letters['f']
    break if !letters['e']
    end
    letters['e']='.' if letters['e']
    break if !letters['d']
    end
    letters['d']='.' if letters['d']
    break if !letters['c']
    end
    letters['c']='.' if letters['c']
    break if !letters['b']
    end
    letters['b']='.' if letters['b']
    break if !letters['a']
    end
     
    Cedric Foll, Jan 6, 2005
    #2
    1. Advertising

  3. On Jan 7, 2005, at 3:02 AM, Vance A Heron wrote:

    > (can I buy a vowel? :)


    As I told Glenn, I was insanely busy this week and couldn't quite steal
    the time to try this quiz. Which is a real shame because I think it
    was a great problem.

    However, one of my ideas for pruning was to toss out mappings that
    didn't put "reasonable" vowels in every word. I'm not sure how well
    that works in practice though, since I didn't code it up. Just a
    thought.

    James Edward Gray II
     
    James Edward Gray II, Jan 7, 2005
    #3
  4. Ruby Quiz

    Dave Burt Guest

    > Solving a cryptogram by brute force is prohibitively expensive. The
    > maximum
    > number of possible solutions is 26!, or roughly 4*10^26, so the first
    > challenge
    > is to pare down the search to something manageable.
    >
    > Both my solution and Michael's begin with the insight that any word,
    > either a
    > regular dictionary word or a cryptographic token, can be viewed as a
    > pattern of
    > repeated and non-repeated characters. For example, "banana" has the
    > pattern [1
    > 2 3 2 3 2], where the first letter is used exactly once and the second
    > letter
    > is used three times and the third letter is used twice. These patterns
    > group
    > all known words into families. The word, banana, belongs to the same
    > family as
    > the word, rococo.


    I'm sorry I didn't have time to give this a go.

    You looked at matching words. I'm wondering if you could get as good a go
    with reasonable-sized ciphertexts using a letter frequencies method? (i.e.
    the most common letters in an English text are E, then T, etc.) Perhaps you
    could apply letter frequencies first before moving on to match common words,
    even.

    Cheers,
    Dave
     
    Dave Burt, Jan 9, 2005
    #4
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Adam Barr
    Replies:
    1
    Views:
    519
  2. Libs
    Replies:
    0
    Views:
    1,561
  3. zPaul

    Validation Summary

    zPaul, Jun 26, 2003, in forum: ASP .Net
    Replies:
    0
    Views:
    476
    zPaul
    Jun 26, 2003
  4. Thomas Connolly

    Validation Summary not showing error messages

    Thomas Connolly, Jul 9, 2003, in forum: ASP .Net
    Replies:
    0
    Views:
    694
    Thomas Connolly
    Jul 9, 2003
  5. Ruby Quiz

    [QUIZ] Cryptograms (#13)

    Ruby Quiz, Dec 31, 2004, in forum: Ruby
    Replies:
    8
    Views:
    190
    Glenn Parker
    Jan 5, 2005
Loading...

Share This Page