Re: Run-length encoding (RLE) of stream segments ...

Discussion in 'Java' started by Arne Vajhøj, Dec 26, 2010.

  1. Arne Vajhøj

    Arne Vajhøj Guest

    On 26-12-2010 12:23, lbrt chx _ gemale kom wrote:
    > RLE of stream segments ...
    > ~
    > I know this is not a newsgroup about algorithms, but you may know of a java library or some API doing this.
    > ~
    > There are versions of character-based Run length encoding (RLE)
    > ~
    > http://en.wikipedia.org/wiki/Run-length_encoding
    > ~
    > http://rosettacode.org/wiki/Run-length_encoding
    > ~
    > But say instead of basing the encoding on single characters you consider sequences. Let's take as an example the one used and cannibalized on the web:
    > ~
    > ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~
    > ~
    > RLE is a very simple form of data compression in which runs of data (that is, sequences in which the same data value occurs in many consecutive data elements) are stored as a single data value and count, rather than as the original run. This is most useful on data that contains many such runs: for example, relatively simple graphic images such as icons, line drawings, and animations. It is not recommended for use with files that don't have many runs as it could potentially double the file size.
    > For example, consider a screen containing plain black text on a solid white background. There will be many long runs of white pixels in the blank space, and many short runs of black pixels within the text. Let us take a hypothetical single scan line, with B representing a black pixel and W representing white:
    > WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWWWWBWWWWWWWWWWWWWW
    > If we apply the run-length encoding (RLE) data compression algorithm to the above hypothetical scan line, we get the following:
    > 12W1B12W3B24W1B14W
    > Interpret this as twelve W's, one B, twelve W's, three B's, etc.
    > While: WBWBWBWBWBWBWB would be: 1W1B1W1B1W1B1W1B1W1B1W1B1W1B
    > ~
    > ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~
    > ~
    > The encoding data is actually longer! It could simple be 7[WB]
    > ~
    > Also, the longest non overlapping segments should be prioritized over shorter repeated ones:
    > ~
    > WBXAWBXWBXQWBPWBQ
    > ~
    > Instead of encoding that sequence as (something like ;-)):
    > ~
    > {0,4,7,11,14}[WB],{2,6,9}X,{3}A,{10,16}Q,{13}P
    > ~
    > It should be encoded like:
    > ~
    > {0,4,7}[WBX],{3}A,{10,16}Q,{11,14},WB,{13}P
    > ~
    > Any theories on these kinds of encodings? Ideally implemented in Java?


    RLE performs so poorly compared to modern compression algorithms, that
    I don't think there is much interest in them today.

    Why don't you just use regular zip (deflate) algorithm, which is
    build into Java.

    Arne
     
    Arne Vajhøj, Dec 26, 2010
    #1
    1. Advertising

  2. Arne Vajhøj

    Arne Vajhøj Guest

    On 27-12-2010 18:43, Wanja Gayk wrote:
    > RLE is quite easy to implement in Java.
    > You might look into a more sophisticated version using controlbytes or
    > control sequences.
    > You chose the byte that is the least likely to occur in the data stream
    > as control byte and encode runs like follows ("#" being the
    > controlbyte):
    > #12WB#12W#3B#24WB#14W
    > instead of: 12W1B12W3B24W1B14W
    > That approach will waste some bytes for each run, but it will save on
    > random data, because you'd get:
    > WBWBWBWBWBWBWB
    > Instead of: 1W1B1W1B1W1B1W1B1W1B1W1B1W1B
    > Controlbytes itself can be encoded like: "#1").
    > An even shorter method is to substitute the controlbytes by using the
    > first two characters of a run to act as control sequence, so you'd get:
    > WW10BWW10BB1WW22BWW12
    > Any time your decoder finds two subsequent characters, it assumes the
    > next byte is a count for the same characters to follow.
    >
    > Controls bytes offer more flexibility though, because for 2-byte-runs
    > you might introduce a second controlbyte for another coding scheme.
    > Imagine 2-byte runs:
    > $7WB
    > Anyway, since it's hard to know the optimum chunk-size, which may be
    > locally different, you're usually better off with using an LZSS
    > (Lempel,Ziv,Storer,Szymanski variant of LZ77 with controlbyte)
    > for anything else but simple runs.
    > LZSS encodes using references to previous ocurrences, using a
    > controlbyte:
    > The whiskymixer mixes whisky in a whiskymixer.
    > becomes
    > The whiskymixer $6,4s$18,7 in a$30,12.
    > where
    > $6,4
    > reads: copy from cursor position minus 6, 4 characters.
    >
    > Any sequence smaller than 3 bytes gets ignored, any controlbyte gets
    > encoded as $1.
    >
    > The end of file can be denoted by a zero offset or a zero run length:
    > #0 or $0 respectively.
    >
    > Please note that all numbers in the schemes are raw values, no text, and
    > that the comma I used in the LZSS sequences is just a separator for the
    > human eye.
    >
    > So you could imagine a hybrid approach using different controlbytes (or
    > "escapes", as they are also called) for different encodings:
    > #=RLE (run length, byte)
    > $=RLE2 (run length, 2 bytes)
    > %=LZSS (displacement, n)
    > &=LZSS+run (displacement, n, run length)
    >
    > There have been several compressors which use a hybrid approach and more
    > sophisticated encoding of the encoding sequence, for example with Elias
    > Gamma codes, which encode near sequences with less bits than far
    > sequences.
    > Be warned that LZSS is not quite the fastest way to compress data, but
    > it's rather easy to implement and decompression is tivial.
    > This could be an interesting read on a hybrid approach, there is:
    > http://www.cs.tut.fi/~albert/Dev/pucrunch/
    > http://www.cs.tut.fi/~albert/Dev/pucrunch/packing.html
    > The latter being a link with good explainations to different algorithms.


    Why LZSS and not LZH?

    I would expect LZH to always give better compression and LZH comes
    with Java so no need for a third party library.

    (OK - deflate is a not a simple LZH but has some tweaks, but those
    tweaks should not be a problem)

    Arne
     
    Arne Vajhøj, Dec 28, 2010
    #2
    1. Advertising

  3. Arne Vajhøj

    Arne Vajhøj Guest

    On 28-12-2010 06:48, Wanja Gayk wrote:
    > In article<4d1928ec$0$23755$>,
    > says...
    >> Why LZSS and not LZH?
    >>
    >> I would expect LZH to always give better compression and LZH comes
    >> with Java so no need for a third party library.
    >>
    >> (OK - deflate is a not a simple LZH but has some tweaks, but those
    >> tweaks should not be a problem)

    >
    > LZH ist, as far as I know, a hybrid of LZSS and adaptive Huffman coding,
    > which means that the escape sequence, the reference and the message
    > itself are decoded from an apadptive huffman tree. Since I guess the
    > original poster is quite new to the topic, I would not recommend to try
    > this for a start.
    >
    > LZH is, as far as I know, implemented in Winzip. Java already knows the
    > ZipOutputStream and the ZipInputStream. No need to reinvent the wheel.


    Exactly.

    Even a new to the topic could use it.

    Arne
     
    Arne Vajhøj, Dec 28, 2010
    #3
  4. Arne Vajhøj

    Arne Vajhøj Guest

    On 31-12-2010 12:46, Wanja Gayk wrote:
    > In article<4d1a19d9$0$23753$>,
    > says...
    >>>> Why LZSS and not LZH?

    > [..]
    >>> LZH is, as far as I know, implemented in Winzip. Java already knows the
    >>> ZipOutputStream and the ZipInputStream. No need to reinvent the wheel.

    >>
    >> Exactly.
    >>
    >> Even a new to the topic could use it.

    >
    > Well, but that's boring isn't it? If you like to get into the topic, why
    > not start with something simple?
    > I find teh compression stuff really interesting!


    I did too. 20 years ago. Implementing LZW and LZSS in Fortran
    is so fun.

    :)

    Arne
     
    Arne Vajhøj, Jan 1, 2011
    #4
  5. Arne Vajhøj

    Eric Sosman Guest

    On 12/31/2010 12:46 PM, Wanja Gayk wrote:
    > [...]
    > I find teh compression stuff really interesting!


    Perhaps comp.compression or comp.compression.research would be more
    useful to you than comp.lang.java.programmer or alt.undersea.baking ...

    --
    Eric Sosman
    lid
     
    Eric Sosman, Jan 1, 2011
    #5
  6. Arne Vajhøj

    Eric Sosman Guest

    On 1/7/2011 9:03 PM, Wanja Gayk wrote:
    > In article<ifo3bv$gmm$-september.org>, esosman@ieee-dot-
    > org.invalid says...
    >> On 12/31/2010 12:46 PM, Wanja Gayk wrote:
    >>> [...]
    >>> I find teh compression stuff really interesting!

    >>
    >> Perhaps comp.compression or comp.compression.research would be more
    >> useful to you than comp.lang.java.programmer or alt.undersea.baking ...

    >
    > I know comp.compression, no worries.


    Then what are you doing here? No, don't explain: Just go away.

    --
    Eric Sosman
    lid
     
    Eric Sosman, Jan 8, 2011
    #6
  7. Arne Vajhøj

    Arne Vajhøj Guest

    On 07-01-2011 21:26, Eric Sosman wrote:
    > On 1/7/2011 9:03 PM, Wanja Gayk wrote:
    >> In article<ifo3bv$gmm$-september.org>, esosman@ieee-dot-
    >> org.invalid says...
    >>> On 12/31/2010 12:46 PM, Wanja Gayk wrote:
    >>>> [...]
    >>>> I find teh compression stuff really interesting!
    >>>
    >>> Perhaps comp.compression or comp.compression.research would be more
    >>> useful to you than comp.lang.java.programmer or alt.undersea.baking ...

    >>
    >> I know comp.compression, no worries.

    >
    > Then what are you doing here? No, don't explain: Just go away.


    I believe that he answered a question that somebody posted
    at cljp.

    An answer that did talk about Java.

    The thread then drifted a little bit off topic, but that
    is seen before.

    Arne
     
    Arne Vajhøj, Jan 8, 2011
    #7
  8. Arne Vajhøj

    Lew Guest

    Eric Sosman wrote:
    >> Then what are you doing here? No, don't explain: Just go away.


    Chris Uppal wrote:
    > Eric, have you been taking Lew's pills ?!


    I have no beef with Wanja. Nor would I say that algorithm discussions are off
    topic here. Don't Java programmers need to know about algorithms?

    I am a little disappointed in you, Chris.

    --
    Lew
    Ceci n'est pas une pipe.
     
    Lew, Jan 8, 2011
    #8
  9. Arne Vajhøj

    Arne Vajhøj Guest

    Re: Run-length encoding (RLE) of stream segments ... [PLONK]

    On 09-01-2011 12:15, Wanja Gayk wrote:
    > In article<ig8i07$9dh$-september.org>, esosman@ieee-dot-
    > org.invalid says...
    >>>>> [...]
    >>>>> I find teh compression stuff really interesting!
    >>>>
    >>>> Perhaps comp.compression or comp.compression.research would be more
    >>>> useful to you than comp.lang.java.programmer or alt.undersea.baking ...
    >>>
    >>> I know comp.compression, no worries.

    >>
    >> Then what are you doing here? No, don't explain: Just go away.

    >
    > Well, Eric, I guess other readers of this newsgroup will judge better
    > whose manners are inappropriate here.
    > As for me, I've had enough of you. "Thank you" for wasting my time.
    >
    > *plonk*


    Mistake.

    Eric is not like that in general and often provides
    good information.

    I suspect that he got you mixed up with somebody else.

    Arne
     
    Arne Vajhøj, Jan 9, 2011
    #9
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. PC
    Replies:
    2
    Views:
    3,953
    Marc Guardiani
    Nov 12, 2003
  2. nullptr

    RLE code review

    nullptr, Nov 8, 2003, in forum: C Programming
    Replies:
    14
    Views:
    644
    Jonathan2s6
    Nov 12, 2003
  3. Gurjant

    4 BITS RLE IN BMP

    Gurjant, Feb 6, 2008, in forum: C Programming
    Replies:
    1
    Views:
    441
    Jack Klein
    Feb 6, 2008
  4. Arne Vajhøj
    Replies:
    0
    Views:
    573
    Arne Vajhøj
    Dec 27, 2010
  5. Lew
    Replies:
    2
    Views:
    373
Loading...

Share This Page