# inaccuracies in compression algorithm

Discussion in 'C Programming' started by cornelis van der bent, Jul 23, 2009.

1. ### cornelis van der bentGuest

I have a limited range of positive integer values that occur all over
my data to be compressed. After decompression these value may have
changed a little, +/- a fixed percentage. So we're talking about
lossy [de]compression here.

My idea is/was to compress a value by taking its log() and scaling
this up by a factor. The factor being just large enough to get an
integer compressed value with which the original value can be
calculated, give/take the allowed error percentage. I also subtract
an offset from the log() result to let my compressed range start at 0;
this to increase compression even further. The offset is log(<lowest-
value-in-range>).

By just including the above mentioned 'factor' and 'offset' in the
data, I can decompress these values.

Here's my code (I've copy pasted most from working source but typed
the test() routine here, so please forgive typos):

static double factor;
static double offset;

void test(int minValue, int maxValue, int errorPercentage)
{
double resulution;
int n;

resolution = log(maxValue * (1.0 + (errorPercentage / 100.0))) -
log(maxValue * (1.0 - (errorPercentage / 100.0)));

factor = 1 / resolution;

offset = log((double)minValue);

for (n = minValue; n < maxValue; n++)
{
int down = scaler->scaleDown(n);
int up = scaler->scaleUp(down);

printf("%4d => %d => %d => %.2f\n", n, down, up, ((100.0 * (up
- n)) / (double)n ));
}
}

int scaleDown(int originalValue)
{
return (int)round((log((double)originalValue) - offset) * factor);
}

int scaleUp(int scaledValue)
{
return (int)round(exp(((double)scaledValue / factor) + offset));
}

When running test(200, 2000, 20), I see that the error is asymetrical:
Given a certain compressed value, lowest input values have > 20%
error, while on the other side the highest input values have < 20%
error.
1060 => 12 => 1297 => 22.36
....
1589 => 12 => 1297 => -18.38

Does anyone have an idea what causes this and how to fix it?

Thanks for listening!

cornelis van der bent, Jul 23, 2009

2. ### Ben BacarisseGuest

This is not C. Of course, the problem is not caused by this being C++
but then you don't have a C question either (see below).
Two different input values map to the same scaled one (12) and this
maps to the same "uncompressed" output so, of course, the error is not
the same. There is really no answer to "what causes this" other than
the computation causes it. To fix it, use a different computation.

This is not a C question. You will get much better answers in a group
about compression or, failing that, a general programming group like
comp.programming.

Ben Bacarisse, Jul 23, 2009

3. ### cornelis van der bentGuest

You are right Ben! I appologise for the inconvenience. Removed my
message.

cornelis van der bent, Jul 23, 2009
4. ### BoonGuest

Try comp.compression ;-)

Boon, Jul 23, 2009
5. ### Ben BacarisseGuest

You are quite right. It was just a guess and if I was correct it was
by accident. Of course, if this had been C there is a good chance it
would have been written:

int down = scaler->scaleDown(scaler, n);

but that is no reason to say the code was not C.

Ben Bacarisse, Jul 23, 2009
6. ### Keith ThompsonGuest

FYI, removing a message rarely works. Because cancel messages
(the mechanism by which messages are removed) are often abused,
most servers ignore them. Once you post something, it's probably
there for good.

Keith Thompson, Jul 23, 2009
7. ### Lew PitcherGuest

Had we interpreted the OPs source code as C, I have a question...

The initializer value assigned to the OP's
int up
*depends* on the initializer value assigned to the OPs
int down

The OPs code depends on the order of initialization of two independant
declarations. I can't find anything in 9989-1999 that guarantees that
declarations will be 'executed' (i.e. storage allocated, initializations
applied) in the order that they were declared in.

If the C language does not make such a guarantee, then the OP's code (when
viewed as /C/ code) takes on "undefined behaviour" at the very least, as it
depends on the order of initialization to follow the order imposed in the
source code.

Am I interpreting this correctly? Can anyone point me to the place in
9989-1999 which guarantees that declarations will be 'executed' (in the way
I noted above) in order?

Thanks
--
Lew Pitcher

Master Codewright & JOAT-in-training | Registered Linux User #112576
http://pitcher.digitalfreehold.ca/ | GPG public key available by request
---------- Slackware - Because I know what I'm doing. ------

Lew Pitcher, Jul 23, 2009
8. ### Morris KeesanGuest

Of course, you meant struct S *scaler, along with creation of a (struct S)
and assignment of its address to scaler.

Morris Keesan, Jul 23, 2009
9. ### Phil CarmodyGuest

Sequence points. The initialisations are ordered.

Phil

Phil Carmody, Jul 23, 2009
10. ### Lew PitcherGuest

I /knew/ that there was something obvious that I was missing. Thanks

--
Lew Pitcher

Master Codewright & JOAT-in-training | Registered Linux User #112576
http://pitcher.digitalfreehold.ca/ | GPG public key available by request
---------- Slackware - Because I know what I'm doing. ------

Lew Pitcher, Jul 23, 2009
11. ### jameskuyperGuest

"A block allows a set of declarations and statements to be grouped
into one syntactic unit.
The initializers of objects that have automatic storage duration, and
the variable length
array declarators of ordinary identifiers with block scope, are
evaluated and the values are
stored in the objects (including storing an indeterminate value in
objects without an
initializer) each time the declaration is reached in the order of
execution, as if it were a
statement, and within each declaration in the order that declarators
appear." (6.8p3)

The phrase "as if it were a statement" takes us back to 6.8p2:
"A statement specifies an action to be performed. Except as indicated,
statements are
executed in sequence."

Note also that there's a sequence point at the end of each full
declarator (6.7.5p3).

jameskuyper, Jul 23, 2009
12. ### Ben BacarisseGuest

Just to round this off, even the following is well-defined:

int down = scaler->scaleDown(n), up = scaler->scaleUp(down);

due to 'down = scaler->scaleDown(n)' being a full declarator which has
a sequence point at its end. Not that it is a good idea, just a
permitted one.

Ben Bacarisse, Jul 23, 2009
13. ### jameskuyperGuest

Sequence points in themselves only ensure that the expressions they
separate are ordered rather than overlapping. They do not, in
themselves, impose a particular order. In each case where a sequence
point is specified by the standard, there's separate wording that also
tells you the order. For example:

6.5.13p4: "the && operator guarantees left-to-right evaluation;"

It could have just as said "right-to-left evaluation" without
violating anything that the standard specifies about the meaning of
sequence points.

jameskuyper, Jul 23, 2009
14. ### cornelis van der bentGuest

Here speak the OP. Yes, this code was taken from a C++ source; I just
forgot to remove scaler->. The globals were of course member
variables. Still, C is my mother tongue and never fell in love with C+
+.

It's kinda fun to see a number of familiar names still here. 'I was
here' sometimes in the early/mid 90's and onwards. But wtf happened,
isn't there anything that can be done against all this spam (I'm
looking at this group from google.groups.com).

Cheers,

Kees

cornelis van der bent, Jul 23, 2009
15. ### Ben BacarisseGuest

Welcome back.
A good news feed and a filtering news reader can make it tolerable.

Ben Bacarisse, Jul 23, 2009
16. ### Eric SosmanGuest

Did you mean something other than what you said? I think
what you said implies that in

.... the output could be any of 42, 43, 44, or 45, because the
sequence points do not impose an ordering. (In a non-overlapping
but otherwise unordered execution, any or all of the increments
might occur before or after the body of printf(), so long as
they don't overlap it or each other.) But in C as I understand it,
the output cannot be anything other than 45: All three increments
must occur before the body of printf() begins its execution.

See 5.1.2.3p3:

[...] At certain specified points in the execution
sequence called /sequence points,/ all side effects
of previous evaluations shall be complete and no
side effects of subsequent evaluations shall have
taken place. [...]

The words "previous" and "subsequent" describe an ordering,
not mere separation.

Eric Sosman, Jul 24, 2009
17. ### James KuyperGuest

No, that is not what I said, and most of those results are not permitted
by the standard, DESPITE the fact that the sequence points don't impose
an order. That's because something else does impose an order. In this
case, it's section 6.8p2: "Except as indicated, statements are executed
in sequence." The "Except as indicated" (which really should say "except
as otherwise indicated") is particularly relevant: it is the exception
that keeps 6.8p2 from conflicting with other sections of the standard
which specify non-sequential evaluation of statements as a result of
selection, iteration and jump statements. For example:

A;
if(B)
C;
D;

Strict sequential order would require that those statements be evaluated
in the order A, B, C, and then D. The special rules for if() statements
allow for them to be executed in the order A, B, D, if the value of B
happens to compare equal to 0.
They refer to an ordering, but do not define it - they don't tell you
which evaluations are "prior" and which ones are "subsequent". It's
other statements in the standard that allow you to identify the actual
order.

James Kuyper, Jul 24, 2009
18. ### Keith ThompsonGuest

I use news.eternal-september.org and a real newsreader; I don't see
much spam here.

Keith Thompson, Jul 24, 2009
19. ### Keith ThompsonGuest

No, what James said doesn't imply that the output could be anything
other than 45. Sequence points *in themselves* don't necessarily
impose a particular order. You also need C99 6.8p2:

Except as indicated, statements are executed in sequence.

Yes, but *what* ordering?

Then again, "statements are executed in sequence" doesn't tell you
what the sequence is.

I suppose that, once the standard says that statements are executed
"in sequence", it's considered so obvious that the sequence of
execution matches the sequence in the source code that it doesn't need
to be stated. And, similarly, one could argue that once it's stated
that sequence points impose an ordering, it's obvious what that
ordering is.

Keith Thompson, Jul 24, 2009