[SUMMARY] Huffman Encoder (#123)

Discussion in 'Ruby' started by Ruby Quiz, May 17, 2007.

  1. Ruby Quiz

    Ruby Quiz Guest

    With the extra credits, this problem is a little involved and some people did
    write a lot of code for it. Building the tree was our main interest in this
    problem though.

    The quiz didn't detail that process too much, but several submitters found
    write-ups like the one at Wikipedia. The trick is generally to use two queues.
    The first starts with all of the letters queued lowest frequency to the highest
    and the second starts empty. While there is more than one node in the combined
    queues, you dequeue the two with the lowest weights, build a new node with them
    as children and a combined weight, then enqueue that in the second queue. When
    you get down to just one node, you are finished. That single node is the root
    of the tree.

    A variation on this strategy is to use a single priority queue. When working
    this way you can always just pull the two lowest entries, since the queue will
    keep them coming in the proper order.

    Drew Olson has some pretty easy to follow code using the priority queue
    approach, so let's look into that now. First, Drew had to build a priority
    queue since one doesn't ship with Ruby:

    # priority queue for nodes
    class NodeQueue
    def initialize
    @queue = []
    end

    def enqueue node
    @queue << node
    @queue = @queue.sort_by{|x|[-x.weight,x.val.size]}
    end

    def dequeue
    @queue.pop
    end

    def size
    @queue.size
    end
    end

    This is a trivial implementation that just resorts the queue after each new
    entry. Note that the sort is on the opposite of the weights to put the lowest
    entries at the front.

    This is not ideal, of course, but likely to be reasonably quick if you are just
    encoding simple text. That's because the sort is largely in C. For a better
    priority queue, have a peek at Daniel Martin's code.

    Drew also used a trivial class to represent nodes in the tree:

    # class to hold nodes in the huffman tree
    class Node
    attr_accessor :val,:weight,:left,:right

    def initialize(val="",weight=0)
    @val,@weight = val,weight
    end

    def children?
    return @left || @right
    end
    end

    As you can see, Nodes are pretty much just a Struct for tracking value, weight,
    and children. The additional method just checks to see if this node is a
    branch, meaning that it has at least one child node.

    With those tools to build on, Drew is now ready to create a HuffmanTree:

    # HuffmanTree represents the tree with which we perform
    # the encoding
    class HuffmanTree

    # initialize the tree based on data
    def initialize data
    @freqs = build_frequencies(data)
    @root = build_tree
    end

    #encode the given data
    def encode data
    data.downcase.split(//).inject("") do |code,char|
    code + encode_char(char)
    end
    end

    def decode data
    node = @root

    if !@root.children?
    return @root.val
    end

    data.split(//).inject("") do |phrase,digit|
    if digit == "0"
    node = node.left
    else
    node = node.right
    end
    if !node.children?
    phrase += node.val
    node = @root
    end
    phrase
    end
    end

    # ...

    These three methods define the external interface for this class. First, you
    create HuffmanTree objects by passing in the data a tree should be constructed
    from. Frequencies are counted for the characters in the data and a tree is
    built from those counts.

    The encode() method takes the data you wish to apply the encoding to and returns
    a String of ones and zeros representing the data. This implementation just
    iterates over the characters, using a helper method to translate them. Note
    that Drew's implementation normalizes case which results in smaller encodings,
    but this step needs to be removed if you want lossless compression.

    The decode method is the most complex in the set, but still not too hard to
    grasp. It starts at the root node and iterates over each one and zero,
    selecting the correct child node. Each time it reaches a leaf node (no
    children), that character value is added to the translation and the search
    resets to the root node.

    Now we just need to see the helper methods used in those methods. This first
    one is the reverse of the decoder we just examined:

    # ...

    private

    # this method encodes a given character based on our
    # tree representation
    def encode_char char
    node = @root
    coding = ""

    # encode to 0 if only one character
    if !@root.children?
    return "0"
    end

    # we do a binary search, building the representation
    # of the character based on which branch we follow
    while node.val != char
    if node.right.val.include? char
    node = node.right
    coding += "1"
    else
    node = node.left
    coding += "0"
    end
    end
    coding
    end

    # ...

    Again, the search begins with the root node and advances down the tree branches.
    This time the search is for nodes containing the character and we can stop as
    soon as we reach a leaf. The encoding is the path of one and zero branches that
    lead to the character.

    These last two methods handle tree construction:

    # ...

    # get word frequencies in a given phrase
    def build_frequencies phrase
    phrase.downcase.split(//).inject(Hash.new(0)) do |hash,item|
    hash[item] += 1
    hash
    end
    end

    # build huffmantree using the priority queue method
    def build_tree
    queue = NodeQueue.new

    # build a node for each character and place in pqueue
    @freqs.keys.each do |char|
    queue.enqueue(Node.new(char,@freqs[char]))
    end

    while !queue.size.zero?

    # if only one node exists, it is the root. return it
    return queue.dequeue if queue.size == 1

    # dequeue two lightest nodes, create parent,
    # add children and enqueue newly created node
    node = Node.new
    node.right = queue.dequeue
    node.left = queue.dequeue
    node.val = node.left.val+node.right.val
    node.weight = node.left.weight+node.right.weight
    queue.enqueue node
    end
    end
    end

    The first method, build_frequencies(), is just a character counter. The counts
    are returned in a Hash keyed by the character for a given count.

    The main work is done in build_tree(). It begins by creating a priority queue
    and queuing each of the characters from the frequency count. After that, the
    while loop is a direct translation of the process I described at the beginning
    of this summary.

    The final bit of code puts the tree to work creating Drew's solution:

    # get command lines args, build tree and encode data
    if __FILE__ == $0
    require 'enumerator'

    data = ARGV.join(" ")
    tree = HuffmanTree.new data

    # get encoded data and split into bits
    code = tree.encode(data)
    encoded_bits = code.scan(/\d{1,8}/)

    # output
    puts
    puts "Original"
    puts data
    puts "#{data.size} bytes"
    puts
    puts "Encoded"
    encoded_bits.each_slice(5) do |slice|
    puts slice.join(" ")
    end
    puts "#{encoded_bits.size} bytes"
    puts
    puts "%d percent compression" %
    (100.0 - (encoded_bits.size.to_f/data.size.to_f)*100.0)
    puts
    puts "Decoded"
    puts tree.decode(code)
    end

    The first few chunks of this code just run the interface methods we have been
    examining. The last big chunk is simply the output of results using some
    straightforward printing logic.

    My thanks to all who took on this challenge. Several of you wrote library
    quality solutions. It was impressive to see.

    Tomorrow we will try some magical math, as quick as we can...
    Ruby Quiz, May 17, 2007
    #1
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Eddy Soeparmin

    Phone Format (770) 123-1234

    Eddy Soeparmin, Jul 28, 2003, in forum: ASP .Net
    Replies:
    3
    Views:
    3,189
    Swanand Mokashi
    Jul 28, 2003
  2. huffman encoder

    , Dec 21, 2005, in forum: C Programming
    Replies:
    16
    Views:
    810
    Chuck F.
    Dec 23, 2005
  3. Jesse Merriman

    [SOLUTION][QUIZ] Huffman Encoder (#123)

    Jesse Merriman, May 13, 2007, in forum: Ruby
    Replies:
    1
    Views:
    167
    Charles Lowe
    May 13, 2007
  4. Ruby Quiz

    [QUIZ] Huffman Encoder (#123)

    Ruby Quiz, May 11, 2007, in forum: Ruby
    Replies:
    11
    Views:
    313
    Drew Olson
    May 16, 2007
  5. Kyle Schmitt

    Late huffman (quiz 123)

    Kyle Schmitt, May 16, 2007, in forum: Ruby
    Replies:
    0
    Views:
    87
    Kyle Schmitt
    May 16, 2007
Loading...

Share This Page