Where are Huffman encoding applications?

Discussion in 'VHDL' started by Weng Tianxiang, Aug 1, 2006.

  1. Hi,
    I want to know in how many fields Huffman encoding theory is used.

    As far as I can count:
    1. In the Windows operating system, a similar encoding is used to
    compress text files;
    2. Text compression in ZIP files;
    3. JPEG frame compression;
    4. MPEG format compression? Which one uses it?

    Wireless communication?

    Any other applications?

    Thank you.

    Weng
     
    Weng Tianxiang, Aug 1, 2006
    #1

  2. Weng Tianxiang

    Guest

    Weng Tianxiang wrote:
    > Hi,
    > I want to know in how many fields Huffman encoding theory is used.
    >
    > As far as I can count:
    > 1. In the Windows operating system, a similar encoding is used to
    > compress text files;
    > 2. Text compression in ZIP files;
    > 3. JPEG frame compression;
    > 4. MPEG format compression? Which one uses it?
    >
    > Wireless communication?
    >
    > Any other applications?
    >
    > Thank you.
    >
    > Weng


    I am sure it is used in JPEG and MPEG.
     
    , Aug 1, 2006
    #2

  3. Weng Tianxiang

    Jack Klein Guest

    On 1 Aug 2006 04:02:31 -0700, "Weng Tianxiang" <>
    wrote in comp.programming:

    > Hi,
    > I want to know in how many fields Huffman encoding theory is used.
    >
    > As far as I can count:
    > 1. In the Windows operating system, a similar encoding is used to
    > compress text files;
    > 2. Text compression in ZIP files;
    > 3. JPEG frame compression;
    > 4. MPEG format compression? Which one uses it?
    >
    > Wireless communication?
    >
    > Any other applications?
    >
    > Thank you.
    >
    > Weng


    FAX transmission, although the Huffman codes are predefined and not
    generated dynamically from the data.

    --
    Jack Klein
    Home: http://JK-Technology.Com
    FAQs for
    comp.lang.c http://c-faq.com/
    comp.lang.c++ http://www.parashift.com/c++-faq-lite/
    alt.comp.lang.learn.c-c++
    http://www.contrib.andrew.cmu.edu/~ajo/docs/FAQ-acllc.html
     
    Jack Klein, Aug 1, 2006
    #3
  4. Hi Jack,
    The fax application cannot be counted as full Huffman encoding,
    because the character statistics are not dynamically collected.

    Thank you.

    Weng

    Jack Klein wrote:
    > On 1 Aug 2006 04:02:31 -0700, "Weng Tianxiang" <>
    > wrote in comp.programming:
    >
    > > Hi,
    > > I want to know in how many fields Huffman encoding theory is used.
    > >
    > > As far as I can count:
    > > 1. In the Windows operating system, a similar encoding is used to
    > > compress text files;
    > > 2. Text compression in ZIP files;
    > > 3. JPEG frame compression;
    > > 4. MPEG format compression? Which one uses it?
    > >
    > > Wireless communication?
    > >
    > > Any other applications?
    > >
    > > Thank you.
    > >
    > > Weng

    >
    > FAX transmission, although the Huffman codes are predefined and not
    > generated dynamically from the data.
    >
    > --
    > Jack Klein
    > Home: http://JK-Technology.Com
    > FAQs for
    > comp.lang.c http://c-faq.com/
    > comp.lang.c++ http://www.parashift.com/c++-faq-lite/
    > alt.comp.lang.learn.c-c++
    > http://www.contrib.andrew.cmu.edu/~ajo/docs/FAQ-acllc.html
     
    Weng Tianxiang, Aug 1, 2006
    #4
  5. > The fax application cannot be counted as full Huffman encoding,
    > because the character statistics are not dynamically collected.


    Also for JPEG, etc., in most cases the predefined default Huffman tables are
    used (although the file format does support custom tables). I think the
    term "Huffman" is also widely used for such "static" applications.

    Regards,

    Thomas

    P.S.: I am wondering what the reason for your question is?

    www.entner-electronics.com
     
    Thomas Entner, Aug 2, 2006
    #5
  6. Hi Thomas,
    When the Huffman frequency table is available, the decoding procedure is
    straightforward.

    I know many electronic books use the Huffman encoding method to compress
    the book's full text.

    Usually, in the first pass over the full text, the Huffman frequency table
    is established; in the second pass, a hash function is used to locate each
    word's entry, which is then encoded based on the Huffman frequency table
    and output.

    I want to know how the hash function is used in the second pass.

    Any books or tips?

    Thank you.

    Weng


    Thomas Entner wrote:
    > > The fax application cannot be counted as full Huffman encoding,
    > > because the character statistics are not dynamically collected.

    >
    > Also for JPEG, etc., in most cases the predefined default Huffman tables are
    > used (although the file format does support custom tables). I think the
    > term "Huffman" is also widely used for such "static" applications.
    >
    > Regards,
    >
    > Thomas
    >
    > P.S.: I am wondering what the reason for your question is?
    >
    > www.entner-electronics.com
     
    Weng Tianxiang, Aug 2, 2006
    #6
  7. Weng Tianxiang

    Andy Ray Guest

    Weng Tianxiang wrote:
    > Hi Thomas,
    > When the Huffman frequency table is available, the decoding procedure is
    > straightforward.
    >
    > I know many electronic books use the Huffman encoding method to compress
    > the book's full text.
    >
    > Usually, in the first pass over the full text, the Huffman frequency table
    > is established; in the second pass, a hash function is used to locate each
    > word's entry, which is then encoded based on the Huffman frequency table
    > and output.
    >
    > I want to know how the hash function is used in the second pass.
    >



    I think you are asking: how do I generate the huffman codes given the
    frequency of each symbol. If so look at:

    http://en.wikipedia.org/wiki/Huffman_coding

    in the section "basic technique".
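
    For reference, the tree-building step described on that page can be sketched
    in Python. This is a minimal sketch, not anyone's production code; the
    function name and tie-breaking counter are illustrative, and leaf symbols
    are assumed not to be tuples:

```python
import heapq

def huffman_codes(freqs):
    """Build a prefix-free code from a {symbol: frequency} dict."""
    # Heap entries are (frequency, tiebreak, node); a node is either a
    # symbol (a leaf) or a (left, right) tuple (an internal node).
    heap = [(f, i, sym) for i, (sym, f) in enumerate(freqs.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)    # pop the two least-frequent
        f2, _, right = heapq.heappop(heap)   # nodes...
        heapq.heappush(heap, (f1 + f2, tiebreak, (left, right)))  # ...merge
        tiebreak += 1
    codes = {}
    def walk(node, prefix):
        if isinstance(node, tuple):          # internal node: recurse
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:                                # leaf: record its code word
            codes[node] = prefix or "0"
    walk(heap[0][2], "")
    return codes
```

    For example, huffman_codes({"a": 5, "b": 2, "c": 1, "d": 1}) gives the
    most frequent symbol "a" a 1-bit code and the rare "c" and "d" 3-bit codes.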

    Cheers,

    Andy.
     
    Andy Ray, Aug 3, 2006
    #7
  8. Andy,
    No. I want to know the hash function.

    For example, I have a Huffman encoding table as follows:

    name   freq   code
    ....
    Andy   15     "011"
    Ray    20     "110"
    ....

    Now a text string "Andy Ray wrote" comes in, and I want to know how to
    match the incoming "Andy" with the table entry "Andy".

    One must establish a one-to-one relationship between the incoming word and
    the table entry.

    What I can imagine doing in software is to create a hash function; then,
    if more than one word maps to the same entry, a further string match
    must be made.

    I am wondering whether or not a hardware design can do it.
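
    The software approach described here (hash each word to a bucket, then
    confirm with a full string compare) might be sketched like this; the hash
    function, class name, and table size are arbitrary choices for illustration,
    not part of any standard:

```python
def word_hash(word, table_size):
    # A simple multiplicative string hash (illustrative, not canonical).
    h = 0
    for ch in word:
        h = (h * 31 + ord(ch)) % table_size
    return h

class HuffmanCodeTable:
    def __init__(self, size=256):
        self.size = size
        self.buckets = [[] for _ in range(size)]

    def insert(self, word, code):
        self.buckets[word_hash(word, self.size)].append((word, code))

    def lookup(self, word):
        # Several words may hash to the same bucket, so a full string
        # compare resolves which entry actually matches.
        for entry_word, code in self.buckets[word_hash(word, self.size)]:
            if entry_word == word:
                return code
        return None
```

    With table.insert("Andy", "011") and table.insert("Ray", "110"),
    table.lookup("Andy") returns "011" even when both words collide in
    one bucket.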

    Thank you.

    Weng

    Andy Ray wrote:
    > Weng Tianxiang wrote:
    > > Hi Thomas,
    > > When the Huffman frequency table is available, the decoding procedure is
    > > straightforward.
    > >
    > > I know many electronic books use the Huffman encoding method to compress
    > > the book's full text.
    > >
    > > Usually, in the first pass over the full text, the Huffman frequency table
    > > is established; in the second pass, a hash function is used to locate each
    > > word's entry, which is then encoded based on the Huffman frequency table
    > > and output.
    > >
    > > I want to know how the hash function is used in the second pass.
    > >

    >
    >
    > I think you are asking: how do I generate the huffman codes given the
    > frequency of each symbol. If so look at:
    >
    > http://en.wikipedia.org/wiki/Huffman_coding
    >
    > in the section "basic technique".
    >
    > Cheers,
    >
    > Andy.
     
    Weng Tianxiang, Aug 3, 2006
    #8
  9. Weng Tianxiang

    Logan Shaw Guest

    Weng Tianxiang wrote:
    > Andy,
    > No. I want to know the hash function.
    >
    > For example, I have a Huffman encoding table as follows:
    >
    > name   freq   code
    > ...
    > Andy   15     "011"
    > Ray    20     "110"
    > ...
    >
    > Now a text string "Andy Ray wrote" comes in, and I want to know how to
    > match the incoming "Andy" with the table entry "Andy".
    >
    > One must establish a one-to-one relationship between the incoming word and
    > the table entry.
    >
    > What I can imagine doing in software is to create a hash function; then,
    > if more than one word maps to the same entry, a further string match
    > must be made.


    A trie might be an option. I once had a conversation with someone who
    worked at a company that needed to match strings quickly in hardware
    and I believe they might've used a trie for that, although I could be
    remembering wrong since it was a few years ago.

    If you haven't encountered tries, think of them as deterministic
    finite-state automata shaped like a tree, where the root is the
    start state, and transitions (edges) from one node to the next
    correspond to symbols on your input.

    For example, if you want to recognize the strings "car", "cat",
    and "cup", your trie would look like this:


    (start)---c--->( )-+---u--->( )---p--->(end)
                       |
                       +---a--->( )-+---r--->(end)
                                    |
                                    +---t--->(end)

    Basically, you start at the start state, absorb a token and follow
    the edge associated with it, and repeat until you either don't have
    a matching edge or hit an end state.

    The nice thing about a trie is that you don't have to worry about
    collisions or worst-case behavior. The time to match is a linear
    function of the length of the string you're matching.

    A straightforward trie where every input symbol is a character (hence,
    probably 8 bits or maybe even larger) might be a little wasteful of
    space. But there are ways around that. The most obvious one is to
    say that the input is a string of bits and each bit is an input
    symbol. This makes your tree 8 times as deep as with 8-bit characters
    as input symbols, but it has the advantage that each node can have
    no more than 2 children (as compared to 256). A nice middle ground
    might be to make your input symbols pairs of bits or sequences of
    4 bits.
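
    In software, the trie described above might be sketched as nested
    dictionaries; the end-of-word sentinel and the function names are
    arbitrary choices for this sketch:

```python
END = object()  # sentinel marking "a complete word ends here"

def trie_insert(root, word):
    node = root
    for ch in word:
        node = node.setdefault(ch, {})  # follow or create the edge for ch
    node[END] = True                    # mark the end state

def trie_contains(root, word):
    node = root
    for ch in word:
        if ch not in node:              # no matching edge: not in the trie
            return False
        node = node[ch]
    return END in node                  # matched only if we reach an end state

# Build the trie from the example above.
root = {}
for w in ("car", "cat", "cup"):
    trie_insert(root, w)
```

    trie_contains(root, "car") is True, while trie_contains(root, "ca") is
    False, since "ca" stops short of an end state.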

    - Logan
     
    Logan Shaw, Aug 3, 2006
    #9
  10. Hi Logan,
    Your method is interesting, but it is far from what I am looking for.

    It is too slow. My target is a 64-bit input on every clock, and the design
    must find its entry within 1-2 clocks and resolve any entry conflicts at
    the same time.

    This design, if it succeeds, could be widely used for any dynamic Huffman
    encoding.

    I think there is no such hardware design in the world so far, which is why
    I found that almost all Huffman encoding applications are static, as in
    FAX, electronic books, ZIP files, JPEG, MPEG and so on.

    Thank you.

    Weng

    Logan Shaw wrote:
    > Weng Tianxiang wrote:
    > > Andy,
    > > No. I want to know the hash function.
    > >
    > > For example, I have a Huffman encoding table as follows:
    > >
    > > name   freq   code
    > > ...
    > > Andy   15     "011"
    > > Ray    20     "110"
    > > ...
    > >
    > > Now a text string "Andy Ray wrote" comes in, and I want to know how to
    > > match the incoming "Andy" with the table entry "Andy".
    > >
    > > One must establish a one-to-one relationship between the incoming word and
    > > the table entry.
    > >
    > > What I can imagine doing in software is to create a hash function; then,
    > > if more than one word maps to the same entry, a further string match
    > > must be made.

    >
    > A trie might be an option. I once had a conversation with someone who
    > worked at a company that needed to match strings quickly in hardware
    > and I believe they might've used a trie for that, although I could be
    > remembering wrong since it was a few years ago.
    >
    > If you haven't encountered tries, think of them as deterministic
    > finite-state automata shaped like a tree, where the root is the
    > start state, and transitions (edges) from one node to the next
    > correspond to symbols on your input.
    >
    > For example, if you want to recognize the strings "car", "cat",
    > and "cup", your trie would look like this:
    >
    >
    > (start)---c--->( )-+---u--->( )---p--->(end)
    >                    |
    >                    +---a--->( )-+---r--->(end)
    >                                 |
    >                                 +---t--->(end)
    >
    > Basically, you start at the start state, absorb a token and follow
    > the edge associated with it, and repeat until you either don't have
    > a matching edge or hit an end state.
    >
    > The nice thing about a trie is that you don't have to worry about
    > collisions or worst-case behavior. The time to match is a linear
    > function of the length of the string you're matching.
    >
    > A straightforward trie where every input symbol is a character (hence,
    > probably 8 bits or maybe even larger) might be a little wasteful of
    > space. But there are ways around that. The most obvious one is to
    > say that the input is a string of bits and each bit is an input
    > symbol. This makes your tree 8 times as deep as with 8-bit characters
    > as input symbols, but it has the advantage that each node can have
    > no more than 2 children (as compared to 256). A nice middle ground
    > might be to make your input symbols pairs of bits or sequences of
    > 4 bits.
    >
    > - Logan
     
    Weng Tianxiang, Aug 3, 2006
    #10
  11. Weng Tianxiang

    Andy Ray Guest

    Weng Tianxiang wrote:
    > Andy,
    > No. I want to know the hash function.
    >
    > For example, I have a Huffman encoding table as follows:
    >
    > name   freq   code
    > ...
    > Andy   15     "011"
    > Ray    20     "110"
    > ...
    >
    > Now a text string "Andy Ray wrote" comes in, and I want to know how to
    > match the incoming "Andy" with the table entry "Andy".
    >
    > One must establish a one-to-one relationship between the incoming word and
    > the table entry.
    >
    > What I can imagine doing in software is to create a hash function; then,
    > if more than one word maps to the same entry, a further string match
    > must be made.
    >
    > I am wondering whether or not a hardware design can do it.
    >
    >



    Ok - you're asking about decoding, not encoding.

    I don't think there's a hash function for huffman codes, though I'd love
    to be proven wrong.

    You can, however, decode huffman codes with a binary search tree encoded
    into a rom/ram. This works because no code is a prefix of another. You
    can alter the tree to take more than one input bit at a time at the
    expense of hardware.

    A fully parallel hardware solution needs CAM (with don't cares), but
    that's really expensive to implement.
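
    The tree-walk decoder Andy describes can be sketched in software; here the
    ROM contents are stood in for by nested tuples, with index 0 the "0" branch
    and index 1 the "1" branch, and the code table is an invented example:

```python
# Example prefix code: a="0", b="10", c="110", d="111".
# Internal nodes are (zero_branch, one_branch) tuples; leaves are symbols.
TREE = ("a", ("b", ("c", "d")))

def decode(bits, root):
    out = []
    node = root
    for bit in bits:
        node = node[int(bit)]        # follow one edge per input bit
        if not isinstance(node, tuple):
            out.append(node)         # reached a leaf: emit, restart at root
            node = root
    return "".join(out)
```

    decode("010110111", TREE) returns "abcd". Because no code is a prefix of
    another, the walk never needs to backtrack, which is what makes the
    ROM-based version work.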

    Cheers,

    Andy.
     
    Andy Ray, Aug 3, 2006
    #11
  12. Hi Andy,
    "Ok - you're asking about decoding, not encoding."

    No, it is encoding!

    Weng
     
    Weng Tianxiang, Aug 3, 2006
    #12
  13. Weng Tianxiang wrote:
    > Your method is interesting, but is far from what I am looking for.
    > It is too slow.
    > I found that almost all Huffman encoding applications are static as in
    > FAX, electronic books, ZIP files, JPEG, MPEG and so on.


    You are wrong there. ZIP files and most EBooks use some variant of LZ
    encoding using a suffix tree (which might be feasible in hardware),
    followed by Huffman coding - static *or* dynamic - on the resultant
    symbols. Zip in particular uses the deflate algorithm, which can emit
    either static or dynamically-generated Huffman data on each block.
    But AFAIK, the Huffman table is *not* updated with each symbol. There
    was a kid on sci.crypt a while back banging on about a per-symbol
    dynamic Huffman technique he'd invented, perhaps look there.
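
    The two-stage structure of deflate (LZ77 string matching followed by
    Huffman coding) can be observed with Python's zlib binding: the
    Z_HUFFMAN_ONLY strategy disables the string-matching stage, leaving only
    the Huffman stage, so repetitive input compresses far less. The sample
    data is invented for illustration:

```python
import zlib

data = b"Andy Ray wrote: the quick brown fox. " * 50

# Full deflate: LZ77 matching plus per-block static/dynamic Huffman coding.
full = zlib.compress(data, 9)

# Huffman stage only: string matching disabled, so repeats are not exploited.
co = zlib.compressobj(level=9, strategy=zlib.Z_HUFFMAN_ONLY)
huffman_only = co.compress(data) + co.flush()

assert zlib.decompress(full) == data
assert len(full) < len(huffman_only) < len(data)
```

    On this highly repetitive input, full deflate is dramatically smaller than
    the Huffman-only stream, showing how much of the gain comes from the LZ
    stage rather than the entropy coder.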

    JPEG and MPEG use a DCT (and optionally, in MPEG, motion compensation)
    before applying further compression. The DCT is *not* Huffman by
    another name; it's a discrete cosine transform.

    Also, look for information on suffix trees (used in Lempel-Ziv
    encoding), and you might find it can be done in a limited way in
    hardware. The tree is built incrementally, and looks kinda like
    Logan's trie, except that the first symbol of a sequence is at
    the leaf, with the last symbol at the root. Kinda...

    In general, though Huffman coding was thought to be "optimal" two
    decades ago, newer techniques have been shown to be better. For
    example, arithmetic coding (needs two passes over the data though),
    and the Burrows-Wheeler transform (used in bzip).
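
    For a feel of the Burrows-Wheeler transform mentioned here, a naive
    version fits in a few lines; this quadratic sketch is for illustration
    only, since real implementations (like bzip's) use suffix-array methods:

```python
def bwt(s):
    """Burrows-Wheeler transform with an explicit end-of-string sentinel."""
    s += "\0"  # sentinel sorts before all ordinary text characters
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)

def inverse_bwt(last_column):
    """Invert by repeatedly prepending and re-sorting (naive and slow)."""
    n = len(last_column)
    table = [""] * n
    for _ in range(n):
        table = sorted(last_column[i] + table[i] for i in range(n))
    original = next(row for row in table if row.endswith("\0"))
    return original[:-1]  # drop the sentinel
```

    The transform clusters equal characters together (bwt("banana") puts the
    three "a"s and two "n"s into runs), which is what makes its output easy
    for a later entropy coder to compress, and inverse_bwt recovers the
    original string exactly.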

    There should be enough terms for you to go googling and perhaps find
    a more suitable algorithm than Huffman. Don't forget - compression is
    simply the process of removing whatever is predictable from a data
    stream. The better you understand your data, the better an algorithm
    can predict it, and the better your compressor can be.

    Good luck!

    Clifford Heath.
     
    Clifford Heath, Aug 3, 2006
    #13
  14. Hi Clifford,
    I appreciate your response very much; you are an experienced expert
    in the field.

    I am a newbie in the field and would really like to learn more, and I am
    determined to learn all the things you have mentioned.

    What books can you suggest I buy and read on these fields?

    I have the book "The Data Compression Book" by Mark Nelson.

    I want to learn the arithmetic coding algorithm too.

    Thank you.

    Weng



    Clifford Heath wrote:
    > Weng Tianxiang wrote:
    > > Your method is interesting, but is far from what I am looking for.
    > > It is too slow.
    > > I found that almost all Huffman encoding applications are static as in
    > > FAX, electronic books, ZIP files, JPEG, MPEG and so on.

    >
    > You are wrong there. ZIP files and most EBooks use some variant of LZ
    > encoding using a suffix tree (which might be feasible in hardware),
    > followed by Huffman coding - static *or* dynamic - on the resultant
    > symbols. Zip in particular uses the deflate algorithm, which can emit
    > either static or dynamically-generated Huffman data on each block.
    > But AFAIK, the Huffman table is *not* updated with each symbol. There
    > was a kid on sci.crypt a while back banging on about a per-symbol
    > dynamic Huffman technique he'd invented, perhaps look there.
    >
    > JPEG and MPEG use a DCT (and optionally, in MPEG, motion compensation)
    > before applying further compression. The DCT is *not* Huffman by
    > another name; it's a discrete cosine transform.
    >
    > Also, look for information on suffix trees (used in Lempel-Ziv
    > encoding), and you might find it can be done in a limited way in
    > hardware. The tree is built incrementally, and looks kinda like
    > Logan's trie, except that the first symbol of a sequence is at
    > the leaf, with the last symbol at the root. Kinda...
    >
    > In general, though Huffman coding was thought to be "optimal" two
    > decades ago, newer techniques have been shown to be better. For
    > example, arithmetic coding (needs two passes over the data though),
    > and the Burrows-Wheeler transform (used in bzip).
    >
    > There should be enough terms for you to go googling and perhaps find
    > a more suitable algorithm than Huffman. Don't forget - compression is
    > simply the process of removing whatever is predictable from a data
    > stream. The better you understand your data, the better an algorithm
    > can predict it, and the better your compressor can be.
    >
    > Good luck!
    >
    > Clifford Heath.
     
    Weng Tianxiang, Aug 3, 2006
    #14
  15. Weng Tianxiang wrote:
    > I am a newbie in the field and would really like to learn more, and I am
    > determined to learn all the things you have mentioned.
    > What books can you suggest I buy and read on these fields?


    Hmm. Not sure I'd recommend books - they tend to be too theoretical.
    If I were you I'd enter some of the search words into wikipedia and
    google and read any technical papers you find that look authoritative.
    <http://en.wikipedia.org/wiki/Data_compression> is a good start.
    Code can also be useful, though often very cryptic - people who write
    compressors are often very bad at writing readable code.

    What's the source of your data? Are the 64 bits broken into fields,
    say 4x16-bit fields? What patterns might you expect? Do the values
    tend to cluster around a certain range, or tend to be close to
    previous readings? All these things affect the predictability of your
    data and so are intimately connected with the compression approach
    you must choose.
     
    Clifford Heath, Aug 4, 2006
    #15
  16. Weng Tianxiang

    CBFalconer Guest

    Weng Tianxiang wrote:
    >
    > Your method is interesting, but it is far from what I am looking for.
    >
    > It is too slow. My target is a 64-bit input on every clock, and the design
    > must find its entry within 1-2 clocks and resolve any entry conflicts at
    > the same time.
    >
    > This design, if it succeeds, could be widely used for any dynamic Huffman
    > encoding.
    >
    > I think there is no such hardware design in the world so far, which is why
    > I found that almost all Huffman encoding applications are static, as in
    > FAX, electronic books, ZIP files, JPEG, MPEG and so on.


    Please do not top-post. Your reply belongs after, or intermixed
    with, the *snipped* material to which you reply.

    Huffman encoding is rarely used today. Adaptive Huffman is more
    common. For a complete review see: "The Data Compression Book" by Mark
    Nelson and Jean-Loup Gailly, M & T Books, ISBN 1-55851-434-1.

    --
    Chuck F () ()
    Available for consulting/temporary embedded and systems.
    <http://cbfalconer.home.att.net> USE maineline address!
     
    CBFalconer, Aug 5, 2006
    #16
  17. CBFalconer wrote:
    > Weng Tianxiang wrote:
    > >
    > > Your method is interesting, but it is far from what I am looking for.
    > >
    > > It is too slow. My target is a 64-bit input on every clock, and the design
    > > must find its entry within 1-2 clocks and resolve any entry conflicts at
    > > the same time.
    > >
    > > This design, if it succeeds, could be widely used for any dynamic Huffman
    > > encoding.
    > >
    > > I think there is no such hardware design in the world so far, which is why
    > > I found that almost all Huffman encoding applications are static, as in
    > > FAX, electronic books, ZIP files, JPEG, MPEG and so on.

    >
    > Please do not top-post. Your reply belongs after, or intermixed
    > with, the *snipped* material to which you reply.
    >
    > Huffman encoding is rarely used today. Adaptive Huffman is more
    > common. For a complete review see: "The Data Compression Book" by Mark
    > Nelson and Jean-Loup Gailly, M & T Books, ISBN 1-55851-434-1.
    >
    > --
    > Chuck F () ()
    > Available for consulting/temporary embedded and systems.
    > <http://cbfalconer.home.att.net> USE maineline address!


    Hi Chuck,
    Thank you for your advice.

    I have bought the book "The Data Compression Book".

    Weng
     
    Weng Tianxiang, Aug 5, 2006
    #17
