[SOLUTION][QUIZ] Huffman Encoder (#123)

Discussion in 'Ruby' started by Jesse Merriman, May 13, 2007.

  1. I based my solution on the algorithm described here:

    http://www.siggraph.org/education/materials/HyperGraph/video/mpeg/mpegfaq/huffman_tutorial.html

    I use the rubytree gem for the tree, and subclassed its TreeNode into the class
    FreqNode, which holds a frequency, possibly a letter (just the leaves), and a
    side (either :left or :right). I made FreqNodes compare primarily based on
    their frequencies so I could build a SortedSet of them and repeatedly take the
    two lowest to combine. The actual tree is built in the build_tree method.

    It just takes an input string, builds the tree, and outputs the compressed
    data (which is slightly different from the example given in the original quiz
    message because we treat ties differently. I'm pretty sure this doesn't
    matter though). No extra credit, though it shouldn't be too hard to add on some
    tree serialization.

    irb> tree = build_tree('ABC')
    irb> tree.printTree
    Node ID: 16 Content: 3 Parent: ROOT Children: 2 Total Nodes: 5 Side:
    Node ID: 14 Content: 1C Parent: 16 Children: 0 Total Nodes: 1 Side: left
    Node ID: 15 Content: 2 Parent: 16 Children: 2 Total Nodes: 3 Side: right
    Node ID: 12 Content: 1A Parent: 15 Children: 0 Total Nodes: 1 Side: left
    Node ID: 13 Content: 1B Parent: 15 Children: 0 Total Nodes: 1 Side: right
    irb> encode 'ABC', tree
    => "10110"

    $ ./huffman_encode.rb ABRRKBAARAA
    Encoded:
    01011111 10010100 1100
    Encoded bytes: 3

    Original:
    ABRRKBAARAA
    Original bytes: 11

    Compressed: 73%


    #!/usr/bin/env ruby

    require 'rubygems'
    require 'tree'
    require 'set'

    class Tree::TreeNode
    # Yield once for each leaf node.
    def each_leaf
    self.each { |node| yield(node) if not node.hasChildren? }
    end

    # Both assume exactly 2 children
    def left_child; children.first; end
    def right_child; children.last; end
    end

    # FreqNodes are TreeNodes that:
    # 1) Have generated, numeric names to ensure they're all unique.
    # 2) Compare based primarily on their contents.
    # 3) Have as contents an array containing a frequency, and possibly a letter
    # (for the leaves).
    # 4) Have a @side attribute that should be set to either :left or :right for
    # non-root nodes.
    # 5) Have a few other nicities, like letter().
    class FreqNode < Tree::TreeNode
    attr_accessor :side

    def initialize(content = nil)
    super(FreqNode.next_name, content)
    end

    def <=>(node)
    content_diff = @content <=> node.content
    return content_diff if not content_diff.zero?
    super(node)
    end

    # Assumes this is a leaf.
    def letter; @content[1]; end

    def to_s; super() + " Side: #{@side}"; end

    # Generate unique names because we can't add two nodes with the same name as
    # children of the same parent node.
    @next_name_num = -1
    def FreqNode.next_name
    (@next_name_num += 1).to_s
    end
    end

    class Set
    def shift
    elem = self.min
    self.delete elem
    elem
    end
    end

    # Build a SortedSet containing pairs of [freq, letter] for the given string.
    def build_freqs str
    # Build hash of byte => count
    counts = Hash.new 0
    str.each_byte { |byte| counts[byte.chr] += 1 }

    # Build SortedSet of [freq, byte] pairs (lower freqs first).
    freqs = SortedSet[]
    counts.each { |bc| freqs << bc.reverse }
    freqs
    end

    # Build the tree for the given input string, and return the root node.
    def build_tree str
    nodes = build_freqs(str).map! { |pair| FreqNode.new(pair) }

    while nodes.size > 1
    child1, child2 = nodes.shift, nodes.shift
    parent = FreqNode.new([child1.content.first + child2.content.first])
    child1.side, child2.side = :left, :right
    parent << child1
    parent << child2
    nodes << parent
    end

    nodes.min
    end

    # Encode the given letter using the given tree of FreqNodes.
    def encode_letter letter, tree
    enc = ''

    # Find leaf with the right byte value
    node = nil
    tree.each_leaf do |leaf|
    (node = leaf; break) if leaf.letter == letter
    end

    while not node.isRoot?
    node.side == :left ? enc << '0' : enc << '1'
    node = node.parent
    end

    enc.reverse
    end

    # Build a tree for the given string, and encode it using that tree.
    def encode str, tree = build_tree(str)
    #tree = build_tree(str)
    encoded = ''
    str.each_byte { |byte| encoded << encode_letter(byte.chr, tree) }
    encoded
    end

    # Decode the given string (which should consist of '0's and '1's) using the
    # given tree.
    def decode str, tree
    dec = ''
    i = 0
    node = tree

    str.each_byte do |byte|
    node = (byte.chr == '0' ? node.left_child : node.right_child)
    if not node.hasChildren?
    dec << node.byte_value.chr
    node = tree
    end
    end

    dec
    end

    class String
    # Split up the string by inserting sep between len-length chunks.
    def split_up! len, sep
    (((length / len.to_f).ceil - 1) * len).step(len, -len) do |i|
    insert i, sep
    end
    self
    end
    end

    # Output a binary string, splitting it up for easier reading.
    def puts_binary str
    str = str.dup
    str.split_up! 40, "\n"
    str.each { |line| puts line.split_up!(8, ' ') }
    end

    if __FILE__ == $0
    input = ARGV.join ' '
    input_bytes = input.length
    enc = encode(input)
    enc_bytes = (enc.length / 8.0).ceil
    compressed = ((1 - enc_bytes.to_f/input_bytes) * 100).round

    puts 'Encoded:'
    puts_binary(enc)
    puts "Encoded bytes: #{enc_bytes}\n\n"

    puts "Original:\n#{input}"
    puts "Original bytes: #{input_bytes}\n\n"

    puts "Compressed: #{compressed}%"
    end


    --
    Jesse Merriman

    http://www.jessemerriman.com/
     
    Jesse Merriman, May 13, 2007
    #1
    1. Advertising

  2. Jesse Merriman

    Charles Lowe Guest

    Re: [QUIZ] Huffman Encoder (#123)

    Very simplistic solution. recursively partitions sorted frequencies to
    build the huffman tree:

    % ./huffman.rb I want this message to get encoded!
    Encoded:
    11001000 11111000 01010010 00001011 01111100 01100011 10100101 10111000
    01001001 00001010 11000100 10010100 00001101 01101010 11100010 01100011
    1000

    35 => 17 (51%)


    Code is rather straight forward: http://pastie.caboo.se/61225

    --
    Posted via http://www.ruby-forum.com/.
     
    Charles Lowe, May 13, 2007
    #2
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. huffman encoder

    , Dec 21, 2005, in forum: C Programming
    Replies:
    16
    Views:
    818
    Chuck F.
    Dec 23, 2005
  2. David Tran
    Replies:
    9
    Views:
    208
    David Tran
    Jan 21, 2005
  3. Ruby Quiz

    [QUIZ] Huffman Encoder (#123)

    Ruby Quiz, May 11, 2007, in forum: Ruby
    Replies:
    11
    Views:
    319
    Drew Olson
    May 16, 2007
  4. Kyle Schmitt

    Late huffman (quiz 123)

    Kyle Schmitt, May 16, 2007, in forum: Ruby
    Replies:
    0
    Views:
    92
    Kyle Schmitt
    May 16, 2007
  5. Ruby Quiz

    [SUMMARY] Huffman Encoder (#123)

    Ruby Quiz, May 17, 2007, in forum: Ruby
    Replies:
    0
    Views:
    144
    Ruby Quiz
    May 17, 2007
Loading...

Share This Page