inaccuracies in compression algorithm

cornelis van der bent · Jul 23, 2009

I have a limited range of positive integer values that occur all over
my data to be compressed. After decompression these value may have
changed a little, +/- a fixed percentage. So we're talking about
lossy [de]compression here.

My idea is/was to compress a value by taking its log() and scaling
this up by a factor. The factor being just large enough to get an
integer compressed value with which the original value can be
calculated, give/take the allowed error percentage. I also subtract
an offset from the log() result to let my compressed range start at 0;
this to increase compression even further. The offset is log(<lowest-
value-in-range>).

By just including the above mentioned 'factor' and 'offset' in the
data, I can decompress these values.

Here's my code (I've copy pasted most from working source but typed
the test() routine here, so please forgive typos):

static double factor;
static double offset;

void test(int minValue, int maxValue, int errorPercentage)
{
double resulution;
int n;

resolution = log(maxValue * (1.0 + (errorPercentage / 100.0))) -
log(maxValue * (1.0 - (errorPercentage / 100.0)));

factor = 1 / resolution;

offset = log((double)minValue);

for (n = minValue; n < maxValue; n++)
{
int down = scaler->scaleDown(n);
int up = scaler->scaleUp(down);

printf("%4d => %d => %d => %.2f\n", n, down, up, ((100.0 * (up
- n)) / (double)n ));
}
}

int scaleDown(int originalValue)
{
return (int)round((log((double)originalValue) - offset) * factor);
}

int scaleUp(int scaledValue)
{
return (int)round(exp(((double)scaledValue / factor) + offset));
}

When running test(200, 2000, 20), I see that the error is asymetrical:
Given a certain compressed value, lowest input values have > 20%
error, while on the other side the highest input values have < 20%
error.
1060 => 12 => 1297 => 22.36
....
1589 => 12 => 1297 => -18.38

Does anyone have an idea what causes this and how to fix it?

Thanks for listening!

Ben Bacarisse · Jul 23, 2009

cornelis van der bent said:
I have a limited range of positive integer values that occur all over
my data to be compressed. After decompression these value may have
changed a little, +/- a fixed percentage. So we're talking about
lossy [de]compression here.

My idea is/was to compress a value by taking its log() and scaling
this up by a factor. The factor being just large enough to get an
integer compressed value with which the original value can be
calculated, give/take the allowed error percentage. I also subtract
an offset from the log() result to let my compressed range start at 0;
this to increase compression even further. The offset is log(<lowest-
value-in-range>).

By just including the above mentioned 'factor' and 'offset' in the
data, I can decompress these values.

Here's my code (I've copy pasted most from working source but typed
the test() routine here, so please forgive typos):

static double factor;
static double offset;

void test(int minValue, int maxValue, int errorPercentage)
{
double resulution;
int n;

resolution = log(maxValue * (1.0 + (errorPercentage / 100.0))) -
log(maxValue * (1.0 - (errorPercentage / 100.0)));

factor = 1 / resolution;

offset = log((double)minValue);

for (n = minValue; n < maxValue; n++)
{
int down = scaler->scaleDown(n);
int up = scaler->scaleUp(down);

This is not C. Of course, the problem is not caused by this being C++
but then you don't have a C question either (see below).

printf("%4d => %d => %d => %.2f\n", n, down, up, ((100.0 * (up
- n)) / (double)n ));
}
}

int scaleDown(int originalValue)
{
return (int)round((log((double)originalValue) - offset) * factor);
}

int scaleUp(int scaledValue)
{
return (int)round(exp(((double)scaledValue / factor) + offset));
}

When running test(200, 2000, 20), I see that the error is asymetrical:
Given a certain compressed value, lowest input values have > 20%
error, while on the other side the highest input values have < 20%
error.
1060 => 12 => 1297 => 22.36
...
1589 => 12 => 1297 => -18.38

Does anyone have an idea what causes this and how to fix it?

Two different input values map to the same scaled one (12) and this
maps to the same "uncompressed" output so, of course, the error is not
the same. There is really no answer to "what causes this" other than
the computation causes it. To fix it, use a different computation.

This is not a C question. You will get much better answers in a group
about compression or, failing that, a general programming group like
comp.programming.

cornelis van der bent · Jul 23, 2009

Two different input values map to the same scaled one (12) and this
maps to the same "uncompressed" output so, of course, the error is not
the same. There is really no answer to "what causes this" other than
the computation causes it. To fix it, use a different computation.

This is not a C question. You will get much better answers in a group
about compression or, failing that, a general programming group like
comp.programming.

You are right Ben! I appologise for the inconvenience. Removed my
message.

Boon · Jul 23, 2009

cornelis said:
I have a limited range of positive integer values that occur all over
my data to be compressed. After decompression these value may have
changed a little, +/- a fixed percentage. So we're talking about
lossy [de]compression here.

Try comp.compression ;-)

Ben Bacarisse · Jul 23, 2009

Richard Heathfield said:
Ben Bacarisse said:

From reading the OP's reply to you, I see that you're right (as
usual), but on this occasion I'm not sure that you deserved to be.
There is nothing in the above function that is not legal C, as far
as I can tell. We don't have a definition of scaler, but its type
could easily be defined along these lines:

struct S
{
int (*scaleDown)(int);
int (*scaleUp(int);
/* other stuff here */
};

All we need now for test() to be legal C is struct S scaler; and a
couple of function pointer assignments elsecode. Since we weren't
given the whole code, without further info from the OP we have no
reason to suppose that this isn't the case in the OP's program.

You are quite right. It was just a guess and if I was correct it was
by accident. Of course, if this had been C there is a good chance it
would have been written:

int down = scaler->scaleDown(scaler, n);

but that is no reason to say the code was not C.

Keith Thompson · Jul 23, 2009

cornelis van der bent said:
You are right Ben! I appologise for the inconvenience. Removed my
message.

FYI, removing a message rarely works. Because cancel messages
(the mechanism by which messages are removed) are often abused,
most servers ignore them. Once you post something, it's probably
there for good.

Lew Pitcher · Jul 23, 2009

Ben Bacarisse said:

From reading the OP's reply to you, I see that you're right (as
usual), but on this occasion I'm not sure that you deserved to be.
There is nothing in the above function that is not legal C, as far
as I can tell. We don't have a definition of scaler, but its type
could easily be defined along these lines:

struct S
{
int (*scaleDown)(int);
int (*scaleUp(int);
/* other stuff here */
};

All we need now for test() to be legal C is struct S scaler; and a
couple of function pointer assignments elsecode. Since we weren't
given the whole code, without further info from the OP we have no
reason to suppose that this isn't the case in the OP's program.

Had we interpreted the OPs source code as C, I have a question...

The initializer value assigned to the OP's
int up
*depends* on the initializer value assigned to the OPs
int down

The OPs code depends on the order of initialization of two independant
declarations. I can't find anything in 9989-1999 that guarantees that
declarations will be 'executed' (i.e. storage allocated, initializations
applied) in the order that they were declared in.

If the C language does not make such a guarantee, then the OP's code (when
viewed as /C/ code) takes on "undefined behaviour" at the very least, as it
depends on the order of initialization to follow the order imposed in the
source code.

Am I interpreting this correctly? Can anyone point me to the place in
9989-1999 which guarantees that declarations will be 'executed' (in the way
I noted above) in order?

Thanks
--
Lew Pitcher

Master Codewright & JOAT-in-training | Registered Linux User #112576
http://pitcher.digitalfreehold.ca/ | GPG public key available by request
---------- Slackware - Because I know what I'm doing. ------

Morris Keesan · Jul 23, 2009

Ben Bacarisse said:

From reading the OP's reply to you, I see that you're right (as
usual), but on this occasion I'm not sure that you deserved to be.
There is nothing in the above function that is not legal C, as far
as I can tell. We don't have a definition of scaler, but its type
could easily be defined along these lines:

struct S
{
int (*scaleDown)(int);
int (*scaleUp(int);
/* other stuff here */
};

All we need now for test() to be legal C is struct S scaler; and a
couple of function pointer assignments elsecode.

Of course, you meant struct S *scaler, along with creation of a (struct S)
and assignment of its address to scaler.

Phil Carmody · Jul 23, 2009

Lew Pitcher said:
Had we interpreted the OPs source code as C, I have a question...

The initializer value assigned to the OP's
int up
*depends* on the initializer value assigned to the OPs
int down

The OPs code depends on the order of initialization of two independant
declarations. I can't find anything in 9989-1999 that guarantees that
declarations will be 'executed' (i.e. storage allocated, initializations
applied) in the order that they were declared in.

Sequence points. The initialisations are ordered.

Phil

Lew Pitcher · Jul 23, 2009

Sequence points. The initialisations are ordered.

I /knew/ that there was something obvious that I was missing. Thanks

--
Lew Pitcher

Master Codewright & JOAT-in-training | Registered Linux User #112576
http://pitcher.digitalfreehold.ca/ | GPG public key available by request
---------- Slackware - Because I know what I'm doing. ------

jameskuyper · Jul 23, 2009

Lew said:
Had we interpreted the OPs source code as C, I have a question...

The initializer value assigned to the OP's
int up
*depends* on the initializer value assigned to the OPs
int down

The OPs code depends on the order of initialization of two independant
declarations. I can't find anything in 9989-1999 that guarantees that
declarations will be 'executed' (i.e. storage allocated, initializations
applied) in the order that they were declared in.

If the C language does not make such a guarantee, then the OP's code (when
viewed as /C/ code) takes on "undefined behaviour" at the very least, as it
depends on the order of initialization to follow the order imposed in the
source code.

Am I interpreting this correctly? Can anyone point me to the place in
9989-1999 which guarantees that declarations will be 'executed' (in the way
I noted above) in order?

"A block allows a set of declarations and statements to be grouped
into one syntactic unit.
The initializers of objects that have automatic storage duration, and
the variable length
array declarators of ordinary identifiers with block scope, are
evaluated and the values are
stored in the objects (including storing an indeterminate value in
objects without an
initializer) each time the declaration is reached in the order of
execution, as if it were a
statement, and within each declaration in the order that declarators
appear." (6.8p3)

The phrase "as if it were a statement" takes us back to 6.8p2:
"A statement specifies an action to be performed. Except as indicated,
statements are
executed in sequence."

Note also that there's a sequence point at the end of each full
declarator (6.7.5p3).

Ben Bacarisse · Jul 23, 2009

Lew Pitcher said:
I /knew/ that there was something obvious that I was missing. Thanks

Just to round this off, even the following is well-defined:

int down = scaler->scaleDown(n), up = scaler->scaleUp(down);

due to 'down = scaler->scaleDown(n)' being a full declarator which has
a sequence point at its end. Not that it is a good idea, just a
permitted one.

jameskuyper · Jul 23, 2009

Phil said:
Sequence points. The initialisations are ordered.

Sequence points in themselves only ensure that the expressions they
separate are ordered rather than overlapping. They do not, in
themselves, impose a particular order. In each case where a sequence
point is specified by the standard, there's separate wording that also
tells you the order. For example:

6.5.13p4: "the && operator guarantees left-to-right evaluation;"

It could have just as said "right-to-left evaluation" without
violating anything that the standard specifies about the meaning of
sequence points.

cornelis van der bent · Jul 23, 2009

You are quite right. It was just a guess and if I was correct it was
by accident. Of course, if this had been C there is a good chance it
would have been written:

int down = scaler->scaleDown(scaler, n);

but that is no reason to say the code was not C.

Here speak the OP. Yes, this code was taken from a C++ source; I just
forgot to remove scaler->. The globals were of course member
variables. Still, C is my mother tongue and never fell in love with C+
+.

It's kinda fun to see a number of familiar names still here. 'I was
here' sometimes in the early/mid 90's and onwards. But wtf happened,
isn't there anything that can be done against all this spam (I'm
looking at this group from google.groups.com).

Cheers,

Kees

Ben Bacarisse · Jul 23, 2009

cornelis van der bent said:
'I was
here' sometimes in the early/mid 90's and onwards.

Welcome back.

But wtf happened,
isn't there anything that can be done against all this spam (I'm
looking at this group from google.groups.com).

A good news feed and a filtering news reader can make it tolerable.

Eric Sosman · Jul 24, 2009

jameskuyper said:
Sequence points in themselves only ensure that the expressions they
separate are ordered rather than overlapping. They do not, in
themselves, impose a particular order. [...]

Did you mean something other than what you said? I think
what you said implies that in

int answer = 42;
++answer;
++answer;
printf ("answer = %d\n", ++answer);

.... the output could be any of 42, 43, 44, or 45, because the
sequence points do not impose an ordering. (In a non-overlapping
but otherwise unordered execution, any or all of the increments
might occur before or after the body of printf(), so long as
they don't overlap it or each other.) But in C as I understand it,
the output cannot be anything other than 45: All three increments
must occur before the body of printf() begins its execution.

See 5.1.2.3p3:

[...] At certain specified points in the execution
sequence called /sequence points,/ all side effects
of previous evaluations shall be complete and no
side effects of subsequent evaluations shall have
taken place. [...]

The words "previous" and "subsequent" describe an ordering,
not mere separation.

James Kuyper · Jul 24, 2009

Eric said:
jameskuyper said:

Sequence points in themselves only ensure that the expressions they
separate are ordered rather than overlapping. They do not, in
themselves, impose a particular order. [...]

Click to expand...

Did you mean something other than what you said? I think
what you said implies that in

int answer = 42;
++answer;
++answer;
printf ("answer = %d\n", ++answer);

... the output could be any of 42, 43, 44, or 45, because the
sequence points do not impose an ordering.

No, that is not what I said, and most of those results are not permitted
by the standard, DESPITE the fact that the sequence points don't impose
an order. That's because something else does impose an order. In this
case, it's section 6.8p2: "Except as indicated, statements are executed
in sequence." The "Except as indicated" (which really should say "except
as otherwise indicated") is particularly relevant: it is the exception
that keeps 6.8p2 from conflicting with other sections of the standard
which specify non-sequential evaluation of statements as a result of
selection, iteration and jump statements. For example:

A;
if(B)
C;
D;

Strict sequential order would require that those statements be evaluated
in the order A, B, C, and then D. The special rules for if() statements
allow for them to be executed in the order A, B, D, if the value of B
happens to compare equal to 0.

... (In a non-overlapping
but otherwise unordered execution, any or all of the increments
might occur before or after the body of printf(), so long as
they don't overlap it or each other.) But in C as I understand it,
the output cannot be anything other than 45: All three increments
must occur before the body of printf() begins its execution.

See 5.1.2.3p3:

[...] At certain specified points in the execution
sequence called /sequence points,/ all side effects
of previous evaluations shall be complete and no
side effects of subsequent evaluations shall have
taken place. [...]

The words "previous" and "subsequent" describe an ordering,
not mere separation.

They refer to an ordering, but do not define it - they don't tell you
which evaluations are "prior" and which ones are "subsequent". It's
other statements in the standard that allow you to identify the actual
order.

Keith Thompson · Jul 24, 2009

cornelis van der bent said:
It's kinda fun to see a number of familiar names still here. 'I was
here' sometimes in the early/mid 90's and onwards. But wtf happened,
isn't there anything that can be done against all this spam (I'm
looking at this group from google.groups.com).

I use news.eternal-september.org and a real newsreader; I don't see
much spam here.

Keith Thompson · Jul 24, 2009

Eric Sosman said:
jameskuyper said:

Sequence points in themselves only ensure that the expressions they
separate are ordered rather than overlapping. They do not, in
themselves, impose a particular order. [...]

Click to expand...

Did you mean something other than what you said? I think
what you said implies that in

int answer = 42;
++answer;
++answer;
printf ("answer = %d\n", ++answer);

... the output could be any of 42, 43, 44, or 45, because the
sequence points do not impose an ordering. (In a non-overlapping
but otherwise unordered execution, any or all of the increments
might occur before or after the body of printf(), so long as
they don't overlap it or each other.) But in C as I understand it,
the output cannot be anything other than 45: All three increments
must occur before the body of printf() begins its execution.

No, what James said doesn't imply that the output could be anything
other than 45. Sequence points *in themselves* don't necessarily
impose a particular order. You also need C99 6.8p2:

Except as indicated, statements are executed in sequence.

See 5.1.2.3p3:

[...] At certain specified points in the execution
sequence called /sequence points,/ all side effects
of previous evaluations shall be complete and no
side effects of subsequent evaluations shall have
taken place. [...]

The words "previous" and "subsequent" describe an ordering,
not mere separation.

Yes, but *what* ordering?

Then again, "statements are executed in sequence" doesn't tell you
what the sequence is.

I suppose that, once the standard says that statements are executed
"in sequence", it's considered so obvious that the sequence of
execution matches the sequence in the source code that it doesn't need
to be stated. And, similarly, one could argue that once it's stated
that sequence points impose an ordering, it's obvious what that
ordering is.

[C language] Issue in the Lotka-Volterra model.	0	Jun 28, 2023
Weird Behavior with Rays in C and OpenGL	4	Feb 13, 2024
How to try a range of hex values in C# code ?	0	Nov 19, 2022
very simple Genetic Algorithm completed	4	Jan 31, 2008
'Random' does not contain a definition for 'Next'	1	Sep 13, 2006
I Need Help with making a function that draws in a canvas using location data.	1	Dec 17, 2021
Python point location of intersect between two lines	0	Feb 28, 2018
algorithm, optimization, or other problem?	2	Feb 21, 2006

inaccuracies in compression algorithm

cornelis van der bent

Ben Bacarisse

cornelis van der bent

Boon

Ben Bacarisse

Keith Thompson

Lew Pitcher

Morris Keesan

Phil Carmody

Lew Pitcher

jameskuyper

Ben Bacarisse

jameskuyper

cornelis van der bent

Ben Bacarisse

Eric Sosman

James Kuyper

Keith Thompson

Keith Thompson

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads