inaccuracies in compression algorithm

  • Thread starter cornelis van der bent
  • Start date
C

cornelis van der bent

I have a limited range of positive integer values that occur all over
my data to be compressed. After decompression these value may have
changed a little, +/- a fixed percentage. So we're talking about
lossy [de]compression here.

My idea is/was to compress a value by taking its log() and scaling
this up by a factor. The factor being just large enough to get an
integer compressed value with which the original value can be
calculated, give/take the allowed error percentage. I also subtract
an offset from the log() result to let my compressed range start at 0;
this to increase compression even further. The offset is log(<lowest-
value-in-range>).

By just including the above mentioned 'factor' and 'offset' in the
data, I can decompress these values.

Here's my code (I've copy pasted most from working source but typed
the test() routine here, so please forgive typos):

static double factor;
static double offset;

void test(int minValue, int maxValue, int errorPercentage)
{
double resulution;
int n;

resolution = log(maxValue * (1.0 + (errorPercentage / 100.0))) -
log(maxValue * (1.0 - (errorPercentage / 100.0)));

factor = 1 / resolution;

offset = log((double)minValue);

for (n = minValue; n < maxValue; n++)
{
int down = scaler->scaleDown(n);
int up = scaler->scaleUp(down);

printf("%4d => %d => %d => %.2f\n", n, down, up, ((100.0 * (up
- n)) / (double)n ));
}
}

int scaleDown(int originalValue)
{
return (int)round((log((double)originalValue) - offset) * factor);
}

int scaleUp(int scaledValue)
{
return (int)round(exp(((double)scaledValue / factor) + offset));
}

When running test(200, 2000, 20), I see that the error is asymetrical:
Given a certain compressed value, lowest input values have > 20%
error, while on the other side the highest input values have < 20%
error.
1060 => 12 => 1297 => 22.36
....
1589 => 12 => 1297 => -18.38

Does anyone have an idea what causes this and how to fix it?

Thanks for listening!
 
B

Ben Bacarisse

cornelis van der bent said:
I have a limited range of positive integer values that occur all over
my data to be compressed. After decompression these value may have
changed a little, +/- a fixed percentage. So we're talking about
lossy [de]compression here.

My idea is/was to compress a value by taking its log() and scaling
this up by a factor. The factor being just large enough to get an
integer compressed value with which the original value can be
calculated, give/take the allowed error percentage. I also subtract
an offset from the log() result to let my compressed range start at 0;
this to increase compression even further. The offset is log(<lowest-
value-in-range>).

By just including the above mentioned 'factor' and 'offset' in the
data, I can decompress these values.

Here's my code (I've copy pasted most from working source but typed
the test() routine here, so please forgive typos):

static double factor;
static double offset;

void test(int minValue, int maxValue, int errorPercentage)
{
double resulution;
int n;

resolution = log(maxValue * (1.0 + (errorPercentage / 100.0))) -
log(maxValue * (1.0 - (errorPercentage / 100.0)));

factor = 1 / resolution;

offset = log((double)minValue);

for (n = minValue; n < maxValue; n++)
{
int down = scaler->scaleDown(n);
int up = scaler->scaleUp(down);

This is not C. Of course, the problem is not caused by this being C++
but then you don't have a C question either (see below).
printf("%4d => %d => %d => %.2f\n", n, down, up, ((100.0 * (up
- n)) / (double)n ));
}
}

int scaleDown(int originalValue)
{
return (int)round((log((double)originalValue) - offset) * factor);
}

int scaleUp(int scaledValue)
{
return (int)round(exp(((double)scaledValue / factor) + offset));
}

When running test(200, 2000, 20), I see that the error is asymetrical:
Given a certain compressed value, lowest input values have > 20%
error, while on the other side the highest input values have < 20%
error.
1060 => 12 => 1297 => 22.36
...
1589 => 12 => 1297 => -18.38

Does anyone have an idea what causes this and how to fix it?

Two different input values map to the same scaled one (12) and this
maps to the same "uncompressed" output so, of course, the error is not
the same. There is really no answer to "what causes this" other than
the computation causes it. To fix it, use a different computation.

This is not a C question. You will get much better answers in a group
about compression or, failing that, a general programming group like
comp.programming.
 
C

cornelis van der bent

Two different input values map to the same scaled one (12) and this
maps to the same "uncompressed" output so, of course, the error is not
the same.  There is really no answer to "what causes this" other than
the computation causes it.  To fix it, use a different computation.

This is not a C question.  You will get much better answers in a group
about compression or, failing that, a general programming group like
comp.programming.

You are right Ben! I appologise for the inconvenience. Removed my
message.
 
B

Boon

cornelis said:
I have a limited range of positive integer values that occur all over
my data to be compressed. After decompression these value may have
changed a little, +/- a fixed percentage. So we're talking about
lossy [de]compression here.

Try comp.compression ;-)
 
B

Ben Bacarisse

Richard Heathfield said:
Ben Bacarisse said:


From reading the OP's reply to you, I see that you're right (as
usual), but on this occasion I'm not sure that you deserved to be.
There is nothing in the above function that is not legal C, as far
as I can tell. We don't have a definition of scaler, but its type
could easily be defined along these lines:

struct S
{
int (*scaleDown)(int);
int (*scaleUp(int);
/* other stuff here */
};

All we need now for test() to be legal C is struct S scaler; and a
couple of function pointer assignments elsecode. Since we weren't
given the whole code, without further info from the OP we have no
reason to suppose that this isn't the case in the OP's program.

You are quite right. It was just a guess and if I was correct it was
by accident. Of course, if this had been C there is a good chance it
would have been written:

int down = scaler->scaleDown(scaler, n);

but that is no reason to say the code was not C.
 
K

Keith Thompson

cornelis van der bent said:
You are right Ben! I appologise for the inconvenience. Removed my
message.

FYI, removing a message rarely works. Because cancel messages
(the mechanism by which messages are removed) are often abused,
most servers ignore them. Once you post something, it's probably
there for good.
 
L

Lew Pitcher

Ben Bacarisse said:


From reading the OP's reply to you, I see that you're right (as
usual), but on this occasion I'm not sure that you deserved to be.
There is nothing in the above function that is not legal C, as far
as I can tell. We don't have a definition of scaler, but its type
could easily be defined along these lines:

struct S
{
int (*scaleDown)(int);
int (*scaleUp(int);
/* other stuff here */
};

All we need now for test() to be legal C is struct S scaler; and a
couple of function pointer assignments elsecode. Since we weren't
given the whole code, without further info from the OP we have no
reason to suppose that this isn't the case in the OP's program.

Had we interpreted the OPs source code as C, I have a question...

The initializer value assigned to the OP's
int up
*depends* on the initializer value assigned to the OPs
int down

The OPs code depends on the order of initialization of two independant
declarations. I can't find anything in 9989-1999 that guarantees that
declarations will be 'executed' (i.e. storage allocated, initializations
applied) in the order that they were declared in.

If the C language does not make such a guarantee, then the OP's code (when
viewed as /C/ code) takes on "undefined behaviour" at the very least, as it
depends on the order of initialization to follow the order imposed in the
source code.

Am I interpreting this correctly? Can anyone point me to the place in
9989-1999 which guarantees that declarations will be 'executed' (in the way
I noted above) in order?

Thanks
--
Lew Pitcher

Master Codewright & JOAT-in-training | Registered Linux User #112576
http://pitcher.digitalfreehold.ca/ | GPG public key available by request
---------- Slackware - Because I know what I'm doing. ------
 
M

Morris Keesan

Ben Bacarisse said:


From reading the OP's reply to you, I see that you're right (as
usual), but on this occasion I'm not sure that you deserved to be.
There is nothing in the above function that is not legal C, as far
as I can tell. We don't have a definition of scaler, but its type
could easily be defined along these lines:

struct S
{
int (*scaleDown)(int);
int (*scaleUp(int);
/* other stuff here */
};

All we need now for test() to be legal C is struct S scaler; and a
couple of function pointer assignments elsecode.

Of course, you meant struct S *scaler, along with creation of a (struct S)
and assignment of its address to scaler.
 
P

Phil Carmody

Lew Pitcher said:
Had we interpreted the OPs source code as C, I have a question...

The initializer value assigned to the OP's
int up
*depends* on the initializer value assigned to the OPs
int down

The OPs code depends on the order of initialization of two independant
declarations. I can't find anything in 9989-1999 that guarantees that
declarations will be 'executed' (i.e. storage allocated, initializations
applied) in the order that they were declared in.

Sequence points. The initialisations are ordered.

Phil
 
L

Lew Pitcher

Sequence points. The initialisations are ordered.

I /knew/ that there was something obvious that I was missing. Thanks

--
Lew Pitcher

Master Codewright & JOAT-in-training | Registered Linux User #112576
http://pitcher.digitalfreehold.ca/ | GPG public key available by request
---------- Slackware - Because I know what I'm doing. ------
 
J

jameskuyper

Lew said:
Had we interpreted the OPs source code as C, I have a question...

The initializer value assigned to the OP's
int up
*depends* on the initializer value assigned to the OPs
int down

The OPs code depends on the order of initialization of two independant
declarations. I can't find anything in 9989-1999 that guarantees that
declarations will be 'executed' (i.e. storage allocated, initializations
applied) in the order that they were declared in.

If the C language does not make such a guarantee, then the OP's code (when
viewed as /C/ code) takes on "undefined behaviour" at the very least, as it
depends on the order of initialization to follow the order imposed in the
source code.

Am I interpreting this correctly? Can anyone point me to the place in
9989-1999 which guarantees that declarations will be 'executed' (in the way
I noted above) in order?

"A block allows a set of declarations and statements to be grouped
into one syntactic unit.
The initializers of objects that have automatic storage duration, and
the variable length
array declarators of ordinary identifiers with block scope, are
evaluated and the values are
stored in the objects (including storing an indeterminate value in
objects without an
initializer) each time the declaration is reached in the order of
execution, as if it were a
statement, and within each declaration in the order that declarators
appear." (6.8p3)

The phrase "as if it were a statement" takes us back to 6.8p2:
"A statement specifies an action to be performed. Except as indicated,
statements are
executed in sequence."

Note also that there's a sequence point at the end of each full
declarator (6.7.5p3).
 
B

Ben Bacarisse

Lew Pitcher said:
I /knew/ that there was something obvious that I was missing. Thanks

Just to round this off, even the following is well-defined:

int down = scaler->scaleDown(n), up = scaler->scaleUp(down);

due to 'down = scaler->scaleDown(n)' being a full declarator which has
a sequence point at its end. Not that it is a good idea, just a
permitted one.
 
J

jameskuyper

Phil said:
Sequence points. The initialisations are ordered.

Sequence points in themselves only ensure that the expressions they
separate are ordered rather than overlapping. They do not, in
themselves, impose a particular order. In each case where a sequence
point is specified by the standard, there's separate wording that also
tells you the order. For example:

6.5.13p4: "the && operator guarantees left-to-right evaluation;"


It could have just as said "right-to-left evaluation" without
violating anything that the standard specifies about the meaning of
sequence points.
 
C

cornelis van der bent

You are quite right.  It was just a guess and if I was correct it was
by accident.  Of course, if this had been C there is a good chance it
would have been written:

   int down = scaler->scaleDown(scaler, n);

but that is no reason to say the code was not C.

Here speak the OP. Yes, this code was taken from a C++ source; I just
forgot to remove scaler->. The globals were of course member
variables. Still, C is my mother tongue and never fell in love with C+
+.

It's kinda fun to see a number of familiar names still here. 'I was
here' sometimes in the early/mid 90's and onwards. But wtf happened,
isn't there anything that can be done against all this spam (I'm
looking at this group from google.groups.com).

Cheers,

Kees
 
B

Ben Bacarisse

cornelis van der bent said:
'I was
here' sometimes in the early/mid 90's and onwards.

Welcome back.
But wtf happened,
isn't there anything that can be done against all this spam (I'm
looking at this group from google.groups.com).

A good news feed and a filtering news reader can make it tolerable.
 
E

Eric Sosman

jameskuyper said:
Sequence points in themselves only ensure that the expressions they
separate are ordered rather than overlapping. They do not, in
themselves, impose a particular order. [...]

Did you mean something other than what you said? I think
what you said implies that in

int answer = 42;
++answer;
++answer;
printf ("answer = %d\n", ++answer);

.... the output could be any of 42, 43, 44, or 45, because the
sequence points do not impose an ordering. (In a non-overlapping
but otherwise unordered execution, any or all of the increments
might occur before or after the body of printf(), so long as
they don't overlap it or each other.) But in C as I understand it,
the output cannot be anything other than 45: All three increments
must occur before the body of printf() begins its execution.

See 5.1.2.3p3:

[...] At certain specified points in the execution
sequence called /sequence points,/ all side effects
of previous evaluations shall be complete and no
side effects of subsequent evaluations shall have
taken place. [...]

The words "previous" and "subsequent" describe an ordering,
not mere separation.
 
J

James Kuyper

Eric said:
jameskuyper said:
Sequence points in themselves only ensure that the expressions they
separate are ordered rather than overlapping. They do not, in
themselves, impose a particular order. [...]

Did you mean something other than what you said? I think
what you said implies that in

int answer = 42;
++answer;
++answer;
printf ("answer = %d\n", ++answer);

... the output could be any of 42, 43, 44, or 45, because the
sequence points do not impose an ordering.

No, that is not what I said, and most of those results are not permitted
by the standard, DESPITE the fact that the sequence points don't impose
an order. That's because something else does impose an order. In this
case, it's section 6.8p2: "Except as indicated, statements are executed
in sequence." The "Except as indicated" (which really should say "except
as otherwise indicated") is particularly relevant: it is the exception
that keeps 6.8p2 from conflicting with other sections of the standard
which specify non-sequential evaluation of statements as a result of
selection, iteration and jump statements. For example:

A;
if(B)
C;
D;

Strict sequential order would require that those statements be evaluated
in the order A, B, C, and then D. The special rules for if() statements
allow for them to be executed in the order A, B, D, if the value of B
happens to compare equal to 0.
... (In a non-overlapping
but otherwise unordered execution, any or all of the increments
might occur before or after the body of printf(), so long as
they don't overlap it or each other.) But in C as I understand it,
the output cannot be anything other than 45: All three increments
must occur before the body of printf() begins its execution.

See 5.1.2.3p3:

[...] At certain specified points in the execution
sequence called /sequence points,/ all side effects
of previous evaluations shall be complete and no
side effects of subsequent evaluations shall have
taken place. [...]

The words "previous" and "subsequent" describe an ordering,
not mere separation.

They refer to an ordering, but do not define it - they don't tell you
which evaluations are "prior" and which ones are "subsequent". It's
other statements in the standard that allow you to identify the actual
order.
 
K

Keith Thompson

cornelis van der bent said:
It's kinda fun to see a number of familiar names still here. 'I was
here' sometimes in the early/mid 90's and onwards. But wtf happened,
isn't there anything that can be done against all this spam (I'm
looking at this group from google.groups.com).

I use news.eternal-september.org and a real newsreader; I don't see
much spam here.
 
K

Keith Thompson

Eric Sosman said:
jameskuyper said:
Sequence points in themselves only ensure that the expressions they
separate are ordered rather than overlapping. They do not, in
themselves, impose a particular order. [...]

Did you mean something other than what you said? I think
what you said implies that in

int answer = 42;
++answer;
++answer;
printf ("answer = %d\n", ++answer);

... the output could be any of 42, 43, 44, or 45, because the
sequence points do not impose an ordering. (In a non-overlapping
but otherwise unordered execution, any or all of the increments
might occur before or after the body of printf(), so long as
they don't overlap it or each other.) But in C as I understand it,
the output cannot be anything other than 45: All three increments
must occur before the body of printf() begins its execution.

No, what James said doesn't imply that the output could be anything
other than 45. Sequence points *in themselves* don't necessarily
impose a particular order. You also need C99 6.8p2:

Except as indicated, statements are executed in sequence.

See 5.1.2.3p3:

[...] At certain specified points in the execution
sequence called /sequence points,/ all side effects
of previous evaluations shall be complete and no
side effects of subsequent evaluations shall have
taken place. [...]

The words "previous" and "subsequent" describe an ordering,
not mere separation.

Yes, but *what* ordering?

Then again, "statements are executed in sequence" doesn't tell you
what the sequence is.

I suppose that, once the standard says that statements are executed
"in sequence", it's considered so obvious that the sequence of
execution matches the sequence in the source code that it doesn't need
to be stated. And, similarly, one could argue that once it's stated
that sequence points impose an ordering, it's obvious what that
ordering is.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,755
Messages
2,569,536
Members
45,020
Latest member
GenesisGai

Latest Threads

Top