Algorithm that achieves maximum compression of completely diffused data.

jonas.thornvall

I am searching for the program or algorithm that best compresses completely diffused data (random noise), and I wonder what the state of the art in compression is.

I understand this is not the correct forum, but since I think I have an algorithm that can do this very well, and I do not know where to turn with such a question, I thought I would start here.

It is, of course, lossless compression I am speaking of.
 
jonas.thornvall

On Wednesday, 30 October 2013 at 19:53:59 UTC+1, Mark Lawrence wrote:

I can't help with compression, but I can help with a marvellous source of the opposite, expansion: it's Google Groups.

--
Python is the second best programming language in the world.
But the best has yet to be invented. Christian Tismer

Mark Lawrence

And you're still a stupid monkey; I dare you to go test your IQ.
 
jonas.thornvall

On Wednesday, 30 October 2013 at 20:18:30 UTC+1, Mark Lawrence wrote:

It's "you're" as in "you are" and not "your" as in "belongs to me".

I have no intention of getting my IQ tested, but I do know that it's a minimum of 120, as that was required for me to pass the old UK 11+ examination. Given that I spent my time at a grammar school in the top stream I'd guess that my IQ is actually higher, but there you go.

Not that that really matters. What does is that I'm smart enough to be able to follow a set of instructions when requested to do so; for example, I could probably follow these: https://wiki.python.org/moin/GoogleGroupsPython if I needed to. I'm therefore assuming that you're not bright enough to follow those instructions and so have annoyed thousands of people with your double-spaced crap, which I've again snipped.

--
Python is the second best programming language in the world.
But the best has yet to be invented. Christian Tismer

Mark Lawrence

I do not follow instructions; I make them accessible to anyone. And you just following them is a clear example of your lack of IQ.
 
jonas.thornvall

On Wednesday, 30 October 2013 at 20:18:30 UTC+1, Mark Lawrence wrote:

[snip]

Mark Lawrence

What I am actually saying is that you are indeed... an anal code monkey that never ever had a self-sustained thought of your own.

Think about it.
 
Antoon Pardon

On 30-10-13 20:01, (e-mail address removed) wrote:

On Wednesday, 30 October 2013 at 19:53:59 UTC+1, Mark Lawrence wrote:

And you're still a stupid monkey; I dare you to go test your IQ.

Are you sure it is this game you want to play? Please consider carefully: for what purpose did you come to this group? Is your behaviour accomplishing that purpose?

You were asked to take responsibility for the detrimental effect your choice of newsreader software has on this newsgroup. Your answer boiled down to a very clear "I don't care about the detrimental effect I cause." Well, until you start caring and adapt your behaviour, people will tend not to care about your questions/problems, and the most likely responses you will get are people pointing to your anti-social behaviour.

You may feel very righteous in your response to Mark, but you will just further alienate the regulars. If that is your goal you can continue as you did before, and soon you will be in a lot of kill files; if you hope for some cooperation from the regulars, you'd better show you can be cooperative too.
 
Dan Stromberg

xz compression is pretty hard, if a little bit slow. Also, if you want
really stellar compression ratios and you don't care about time to
compress, you might check out one of the many paq implementations.

I have a module that does xz compression in 4 different ways:
http://stromberg.dnsalias.org/svn/backshift/tags/1.20/xz_mod.py
It's only for smallish chunks in the ctypes version, because that was all I
needed. The others should be able to handle relatively large inputs.
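For a rough sense of what xz does on the kind of data being discussed, here is a minimal illustrative sketch using only the standard-library lzma module (not the xz_mod.py linked above): it compresses a repetitive buffer and an os.urandom() buffer of the same size and prints the resulting sizes.

import lzma
import os

structured = b"hello world " * 10000        # highly repetitive input
random_data = os.urandom(len(structured))   # "completely diffused" input

for label, data in (("structured", structured), ("random", random_data)):
    compressed = lzma.compress(data, preset=9)
    print(f"{label}: {len(data)} -> {len(compressed)} bytes")

On a typical run the repetitive buffer shrinks to a small fraction of its size, while the random buffer comes out slightly larger than it went in.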
 
jonas.thornvall

On Wednesday, 30 October 2013 at 20:35:59 UTC+1, Tim Delaney wrote:

I am searching for the program or algorithm that best compresses completely diffused data (random noise), and I wonder what the state of the art in compression is.

I understand this is not the correct forum, but since I think I have an algorithm that can do this very well, and I do not know where to turn with such a question, I thought I would start here.

It is, of course, lossless compression I am speaking of.

This is not an appropriate forum for this question. If you know it's an inappropriate forum (as you stated) then do not post the question here. Do a search with your preferred search engine and look up lossless compression on Wikipedia. And read and understand the following link:

http://www.catb.org/esr/faqs/smart-questions.html

paying special attention to the following parts:

http://www.catb.org/esr/faqs/smart-questions.html#forum
http://www.catb.org/esr/faqs/smart-questions.html#prune
http://www.catb.org/esr/faqs/smart-questions.html#courtesy
http://www.catb.org/esr/faqs/smart-questions.html#keepcool
http://www.catb.org/esr/faqs/smart-questions.html#classic

If you have *Python* code implementing this algorithm and want help, post the parts you want help with (and preferably post the entire algorithm in a repository).

However, having just seen the following from you in a reply to Mark ("I do not follow instructions; I make them accessible to anyone"), I am not going to give a second chance - fail to learn from the above advice and you'll meet my spam filter.

If the data is truly completely random noise, then there is very little that lossless compression can do. On any individual truly random data set you might get a lot of compression, a small amount of compression, or even expansion, depending on what patterns have randomly occurred in the data set. But there is no current lossless compression algorithm that can take truly random data and systematically compress it to be smaller than the original.

If you think you have an algorithm that can do this on truly random data, you're probably wrong - either your data has patterns the algorithm can exploit, or you've simply been lucky with the randomness of your data so far.

Tim Delaney

No, I am not wrong. End of story.
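Tim's point about individual random data sets is easy to check empirically. The following is a hedged illustration using only the standard-library zlib and lzma modules (nothing from this thread): it compresses many independent 4 KiB blocks of os.urandom() data and counts how often the output is actually smaller than the input.

import lzma
import os
import zlib

BLOCK = 4096
TRIALS = 1000

for name, compress in (("zlib", zlib.compress), ("lzma", lzma.compress)):
    smaller = 0
    for _ in range(TRIALS):
        data = os.urandom(BLOCK)
        if len(compress(data)) < len(data):
            smaller += 1
    print(f"{name}: {smaller} of {TRIALS} random blocks got smaller")

# Expected outcome: essentially no blocks shrink; a general-purpose lossless
# compressor pays a small fixed overhead on incompressible input instead.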
 
jonas.thornvall

On Wednesday, 30 October 2013 at 20:46:57 UTC+1, Modulok wrote:

I am searching for the program or algorithm that best compresses completely diffused data (random noise), and I wonder what the state of the art in compression is.

I understand this is not the correct forum, but since I think I have an algorithm that can do this very well, and I do not know where to turn with such a question, I thought I would start here.

It is, of course, lossless compression I am speaking of.

--
https://mail.python.org/mailman/listinfo/python-list

None. If the data to be compressed is truly homogeneous, random noise as you describe (for example a 100 MB file read from a cryptographically secure random bit generator such as /dev/random on *nix systems), the state-of-the-art lossless compression gain is zero and will remain that way for the foreseeable future.

There is no lossless algorithm that will reduce truly random (high-entropy) data by any significant margin. In classical information theory, such an algorithm can never be invented. See: Kolmogorov complexity.

Real-world data is rarely completely random. You would have to test various algorithms on the data set in question. Small things such as non-obvious statistical clumping can make a big difference in the compression ratio from one algorithm to another. Data that might look "random" might not actually be random in the entropy sense of the word.

Not to sound like a downer, but I would wager that the data you're testing your algorithm on is not as truly random as you imply, or is not a large enough body of test data to draw such conclusions from. It's akin to inventing a perpetual motion machine or an inertial propulsion engine or any other classically impossible solution. (This only applies to truly random data.)

-Modulok-

Well, then, I have news for you.
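The Kolmogorov-complexity point Modulok cites rests on a simple counting bound: there are 2**n bit strings of length n, but only 2**n - 1 strings shorter than n bits, so most n-bit strings cannot be losslessly encoded by anything much shorter. A purely illustrative sketch of the arithmetic:

# For n-bit strings, bound the fraction that could even in principle be
# mapped one-to-one to strings at least k bits shorter.
def compressible_fraction(n: int, k: int) -> float:
    inputs = 2 ** n                        # distinct n-bit strings
    short_outputs = 2 ** (n - k + 1) - 1   # distinct strings of length <= n - k
    return min(1.0, short_outputs / inputs)

for k in (8, 16, 32):
    print(f"at most {compressible_fraction(64, k):.2e} of 64-bit strings "
          f"can be compressed by {k} or more bits")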
 
jonas.thornvall

On Wednesday, 30 October 2013 at 20:46:57 UTC+1, Modulok wrote:

[snip]

-Modulok-

My algorithm will compress data from any random data source.
 
Gene Heskett

On Wednesday, 30 October 2013 at 20:46:57 UTC+1, Modulok wrote:

Well, then, I have news for you.

Congratulations, Jonas. My kill file for this list used to have only one name, but now has two. Unfortunately I will still see the backscatter of others trying to tell you how to post and interact with this list. But for now, this person with, since we are quoting IQs here, a tested 147, says goodbye.

Cheers, Gene
--
"There are four boxes to be used in defense of liberty:
soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)

BOFH excuse #275: Bit rot

A pen in the hand of this president is far more dangerous than 200 million guns in the hands of law-abiding citizens.
 
Grant Edwards

I am searching for the program or algorithm that best compresses completely diffused data (random noise), and I wonder what the state of the art in compression is.
[...]

It is, of course, lossless compression I am speaking of.

For completely random noise, the CAT compression algorithm will achieve the maximum theoretical result. It's been available on Unix systems for decades via the "cat" command.

It's also trivial to implement in your own code if you desire.
 
Mark Janssen

I am searching for the program or algorithm that best compresses completely diffused data (random noise), and I wonder what the state of the art in compression is.

Is this an April Fools' joke? A key property of "completely random" data is
that you *can't* compress it.
 
Joshua Landau

It's you're as in you are and not your as in belongs to me.

I have no intention of getting my IQ tested, but I do know that it's a
minimum of 120 as that was required for me to pass the old UK 11+
examination. Given that I spent my time at a grammar school in the top
stream I'd guess that my IQ is actually higher, but there you go.

What I'm confounded about is this list's inability to recognise a
troll when it slaps it vocally in the face.

This isn't like Nikos. There's no "troll vs. incompetent" debate to be
had. This guy started talking about compressing *random data* and
immediately descended into insults. Your IQ, whatever it may be, is
going to waste ;).
 
Tim Chase

started talking about compressing *random data*

If it's truly random bytes, as long as you don't need *the same*
random data, you can compress it quite easily. Lossy compression is
acceptable for images, so why not random files? :)

import os

inname = "random.txt"
namez = inname + '.rnz'

# "compress" the file: record nothing but its size
with open(namez, 'w') as f:
    f.write(str(os.stat(inname).st_size))

# "uncompress" the file: regenerate that many fresh random bytes
with open(namez) as f:
    size = int(f.read())
with open('/dev/random', 'rb') as rnd, open(inname, 'wb') as out:
    for i in range(size):
        out.write(rnd.read(1))


There are optimizations that can be made, and I didn't make it run
on Win32, but I leave those as exercises for the reader. That
said, this compresses *remarkably* well for large files ;-)

-tkc
 
Chris Angelico

If it's truly random bytes, as long as you don't need *the same*
random data, you can compress it quite easily. Lossy compression is
acceptable for images, so why not random files? :)

Maybe. But what if it's not truly random, but only pseudo-random?

# create a file full of random data
import random
seed = random.getrandbits(32)
length = random.getrandbits(16)  # in four-byte units
random.seed(seed)
inname = "random.txt"
namez = inname + '.rnz'
with open(inname, "wb") as bigfile:
    for _ in range(length):
        bigfile.write(random.getrandbits(32).to_bytes(4, "big"))

# compress that file
with open(namez, "wb") as smallfile:
    smallfile.write(seed.to_bytes(4, "big"))
    smallfile.write(length.to_bytes(4, "big"))

# uncompress it
with open(namez, "rb") as f:
    seed = int.from_bytes(f.read(4), "big")
    length = int.from_bytes(f.read(4), "big")
random.seed(seed)
with open("out_" + inname, "wb") as bigfile:
    for _ in range(length):
        bigfile.write(random.getrandbits(32).to_bytes(4, "big"))

Voila! Very impressive compression ratio, and exploits the very
randomness of the data!

ChrisA
 
Dave Angel

I am searching for the program or algorithm that best compresses completely diffused data (random noise), and I wonder what the state of the art in compression is.

I understand this is not the correct forum, but since I think I have an algorithm that can do this very well, and I do not know where to turn with such a question, I thought I would start here.

It is, of course, lossless compression I am speaking of.

See http://gailly.net/05533051.html for a discussion of a patent on a
similarly ludicrous "algorithm."

Maybe you can hoodwink the patent office into granting you one as well.
 
Tim Roberts

Well, then, I have news for you.

Well, then, why don't you share it?

Let me try to get you to understand WHY what you say is impossible. Let's
say you do have a function f(x) that can produce a compressed output y for
any given x, such that y is always smaller than x. If that were true, then
I could call f() recursively:
f(f(...f(f(f(f(f(x)))))...))
and eventually the result gets down to a single bit. I hope it is clear
that there's no way to restore a single bit back into different source
texts.

Here's another way to look at it. If f(x) is smaller than x for every x,
that means there MUST be multiple values of x that produce the same f(x).
Do you see? If x is three bits and f(x) is two bits, that means there are
8 possible values for x but only 4 values for f(x). So, given an f(x), you
cannot tell which value of x it came from. You have lost information.
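Tim's recursive argument is easy to watch in action. As a small hedged sketch (standard-library zlib only, not anything posted in this thread), repeatedly feeding a compressor its own output stops shrinking almost immediately and then starts growing, because the first pass already removed the exploitable redundancy.

import zlib

data = b"the quick brown fox jumps over the lazy dog. " * 2000
print(f"pass 0: {len(data)} bytes")

# Apply the compressor to its own output a few times.
for i in range(1, 6):
    data = zlib.compress(data, 9)
    print(f"pass {i}: {len(data)} bytes")

# Typical output: a large drop on pass 1, then the size creeps back up,
# since already-compressed data is close to random and cannot shrink further.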
 
Mark Janssen

Let me try to get you to understand WHY what you say is impossible. Let's
say you do have a function f(x) that can produce a compressed output y for
any given x, such that y is always smaller than x. If that were true, then
I could call f() recursively:
f(f(...f(f(f(f(f(x)))))...))
and eventually the result gets down to a single bit. I hope it is clear
that there's no way to restore a single bit back into different source
texts.

Hey, that's a nice proof!

Cheers,

Mark Janssen
 
Steven D'Aprano

Well, then, why don't you share it?

Let me try to get you to understand WHY what you say is impossible.
[snip reasons]

Expanding on Tim's post... the first scenario Tim gives, applying the
compression repeatedly to its own output until you eventually get a
single byte, can be overcome if there are data sets that are unchanged by
compression. That is, if f() is the compression function:

f(f(f(data))) == f(f(data)) == f(data) == data

for some values of data. So if you start with some other data, and
compress it, then compress it again, and again, eventually you end up
with one of these attractors, after which repeated compression doesn't
give you any more benefit.

[Aside: the attractors aren't necessarily a single point. The attractor
could be a cycle of two or more points, f(x) == y and f(y) == x. Or you
could even have a "strange attractor", which brings us to chaos theory.]

But your second reason, better known as the pigeonhole principle,
demonstrates that for any lossless compression method, there must be data
sets that actually expand the data. It doesn't matter how cleverly you
compress the data, you can't fit 20kg of potatoes in a 10kg bag, so to
speak. Suppose your compression algorithm compresses a single byte into a
nybble (four bits). There are 256 different input data sets (0x00,
0x01, ... 0xFF) and only 16 different outputs (0x0, 0x1, ... 0xF). There
is no way for 256 pigeons to fit in 16 pigeon holes unless you put two or
more pigeons in at least one hole. Ergo, if the compression algorithm is
lossless, *some* data must be expanded rather than compressed.

Alternatively, the compression may be lossy. Or both!

Obviously data isn't compressed a byte at a time, that would be silly.
But the principle still applies.

There is a way to apparently get around these limits: store data
externally, perhaps inside the compression application itself. Then, if
you just look at the compressed file (the "data.zip" equivalent, although
I stress that zip compression is *not* like this), you might think it has
shrunk quite a lot. But when you include the hidden data, the compression
is not quite so impressive...
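The pigeonhole argument above can also be checked by brute force on a small scale. A minimal sketch (again standard-library zlib, purely illustrative): compress every possible 1-byte and 2-byte input and count how many shrink versus grow.

import itertools
import zlib

for n in (1, 2):
    inputs = [bytes(t) for t in itertools.product(range(256), repeat=n)]
    shrunk = sum(1 for b in inputs if len(zlib.compress(b)) < len(b))
    grew = sum(1 for b in inputs if len(zlib.compress(b)) > len(b))
    print(f"{n}-byte inputs: {len(inputs)} total, {shrunk} shrank, {grew} grew")

# Every tiny input grows here (zlib's header alone exceeds the payload); the
# pigeonhole principle guarantees that some inputs must grow for any lossless
# compressor, however cleverly it is designed.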
 
