inaccuracies in compression algorithm

Discussion in 'C Programming' started by cornelis van der bent, Jul 23, 2009.

  1. I have a limited range of positive integer values that occur all over
    my data to be compressed. After decompression these value may have
    changed a little, +/- a fixed percentage. So we're talking about
    lossy [de]compression here.

    My idea is/was to compress a value by taking its log() and scaling
    this up by a factor. The factor being just large enough to get an
    integer compressed value with which the original value can be
    calculated, give/take the allowed error percentage. I also subtract
    an offset from the log() result to let my compressed range start at 0;
    this to increase compression even further. The offset is log(<lowest-
    value-in-range>).

    By just including the above mentioned 'factor' and 'offset' in the
    data, I can decompress these values.

    Here's my code (I've copy pasted most from working source but typed
    the test() routine here, so please forgive typos):

    static double factor;
    static double offset;

    void test(int minValue, int maxValue, int errorPercentage)
    {
    double resulution;
    int n;

    resolution = log(maxValue * (1.0 + (errorPercentage / 100.0))) -
    log(maxValue * (1.0 - (errorPercentage / 100.0)));

    factor = 1 / resolution;

    offset = log((double)minValue);

    for (n = minValue; n < maxValue; n++)
    {
    int down = scaler->scaleDown(n);
    int up = scaler->scaleUp(down);

    printf("%4d => %d => %d => %.2f\n", n, down, up, ((100.0 * (up
    - n)) / (double)n ));
    }
    }

    int scaleDown(int originalValue)
    {
    return (int)round((log((double)originalValue) - offset) * factor);
    }

    int scaleUp(int scaledValue)
    {
    return (int)round(exp(((double)scaledValue / factor) + offset));
    }

    When running test(200, 2000, 20), I see that the error is asymetrical:
    Given a certain compressed value, lowest input values have > 20%
    error, while on the other side the highest input values have < 20%
    error.
    1060 => 12 => 1297 => 22.36
    ....
    1589 => 12 => 1297 => -18.38

    Does anyone have an idea what causes this and how to fix it?

    Thanks for listening!
     
    cornelis van der bent, Jul 23, 2009
    #1
    1. Advertisements

  2. This is not C. Of course, the problem is not caused by this being C++
    but then you don't have a C question either (see below).
    Two different input values map to the same scaled one (12) and this
    maps to the same "uncompressed" output so, of course, the error is not
    the same. There is really no answer to "what causes this" other than
    the computation causes it. To fix it, use a different computation.

    This is not a C question. You will get much better answers in a group
    about compression or, failing that, a general programming group like
    comp.programming.
     
    Ben Bacarisse, Jul 23, 2009
    #2
    1. Advertisements

  3. You are right Ben! I appologise for the inconvenience. Removed my
    message.
     
    cornelis van der bent, Jul 23, 2009
    #3
  4. cornelis van der bent

    Boon Guest

    Try comp.compression ;-)
     
    Boon, Jul 23, 2009
    #4
  5. You are quite right. It was just a guess and if I was correct it was
    by accident. Of course, if this had been C there is a good chance it
    would have been written:

    int down = scaler->scaleDown(scaler, n);

    but that is no reason to say the code was not C.
     
    Ben Bacarisse, Jul 23, 2009
    #5
  6. FYI, removing a message rarely works. Because cancel messages
    (the mechanism by which messages are removed) are often abused,
    most servers ignore them. Once you post something, it's probably
    there for good.
     
    Keith Thompson, Jul 23, 2009
    #6
  7. cornelis van der bent

    Lew Pitcher Guest

    Had we interpreted the OPs source code as C, I have a question...

    The initializer value assigned to the OP's
    int up
    *depends* on the initializer value assigned to the OPs
    int down

    The OPs code depends on the order of initialization of two independant
    declarations. I can't find anything in 9989-1999 that guarantees that
    declarations will be 'executed' (i.e. storage allocated, initializations
    applied) in the order that they were declared in.

    If the C language does not make such a guarantee, then the OP's code (when
    viewed as /C/ code) takes on "undefined behaviour" at the very least, as it
    depends on the order of initialization to follow the order imposed in the
    source code.

    Am I interpreting this correctly? Can anyone point me to the place in
    9989-1999 which guarantees that declarations will be 'executed' (in the way
    I noted above) in order?

    Thanks
    --
    Lew Pitcher

    Master Codewright & JOAT-in-training | Registered Linux User #112576
    http://pitcher.digitalfreehold.ca/ | GPG public key available by request
    ---------- Slackware - Because I know what I'm doing. ------
     
    Lew Pitcher, Jul 23, 2009
    #7
  8. Of course, you meant struct S *scaler, along with creation of a (struct S)
    and assignment of its address to scaler.
     
    Morris Keesan, Jul 23, 2009
    #8
  9. cornelis van der bent

    Phil Carmody Guest

    Sequence points. The initialisations are ordered.

    Phil
     
    Phil Carmody, Jul 23, 2009
    #9
  10. cornelis van der bent

    Lew Pitcher Guest

    I /knew/ that there was something obvious that I was missing. Thanks

    --
    Lew Pitcher

    Master Codewright & JOAT-in-training | Registered Linux User #112576
    http://pitcher.digitalfreehold.ca/ | GPG public key available by request
    ---------- Slackware - Because I know what I'm doing. ------
     
    Lew Pitcher, Jul 23, 2009
    #10
  11. cornelis van der bent

    jameskuyper Guest

    "A block allows a set of declarations and statements to be grouped
    into one syntactic unit.
    The initializers of objects that have automatic storage duration, and
    the variable length
    array declarators of ordinary identifiers with block scope, are
    evaluated and the values are
    stored in the objects (including storing an indeterminate value in
    objects without an
    initializer) each time the declaration is reached in the order of
    execution, as if it were a
    statement, and within each declaration in the order that declarators
    appear." (6.8p3)

    The phrase "as if it were a statement" takes us back to 6.8p2:
    "A statement specifies an action to be performed. Except as indicated,
    statements are
    executed in sequence."

    Note also that there's a sequence point at the end of each full
    declarator (6.7.5p3).
     
    jameskuyper, Jul 23, 2009
    #11
  12. Just to round this off, even the following is well-defined:

    int down = scaler->scaleDown(n), up = scaler->scaleUp(down);

    due to 'down = scaler->scaleDown(n)' being a full declarator which has
    a sequence point at its end. Not that it is a good idea, just a
    permitted one.
     
    Ben Bacarisse, Jul 23, 2009
    #12
  13. cornelis van der bent

    jameskuyper Guest

    Sequence points in themselves only ensure that the expressions they
    separate are ordered rather than overlapping. They do not, in
    themselves, impose a particular order. In each case where a sequence
    point is specified by the standard, there's separate wording that also
    tells you the order. For example:

    6.5.13p4: "the && operator guarantees left-to-right evaluation;"


    It could have just as said "right-to-left evaluation" without
    violating anything that the standard specifies about the meaning of
    sequence points.
     
    jameskuyper, Jul 23, 2009
    #13
  14. Here speak the OP. Yes, this code was taken from a C++ source; I just
    forgot to remove scaler->. The globals were of course member
    variables. Still, C is my mother tongue and never fell in love with C+
    +.

    It's kinda fun to see a number of familiar names still here. 'I was
    here' sometimes in the early/mid 90's and onwards. But wtf happened,
    isn't there anything that can be done against all this spam (I'm
    looking at this group from google.groups.com).

    Cheers,

    Kees
     
    cornelis van der bent, Jul 23, 2009
    #14
  15. Welcome back.
    A good news feed and a filtering news reader can make it tolerable.
     
    Ben Bacarisse, Jul 23, 2009
    #15
  16. cornelis van der bent

    Eric Sosman Guest

    Did you mean something other than what you said? I think
    what you said implies that in

    int answer = 42;
    ++answer;
    ++answer;
    printf ("answer = %d\n", ++answer);

    .... the output could be any of 42, 43, 44, or 45, because the
    sequence points do not impose an ordering. (In a non-overlapping
    but otherwise unordered execution, any or all of the increments
    might occur before or after the body of printf(), so long as
    they don't overlap it or each other.) But in C as I understand it,
    the output cannot be anything other than 45: All three increments
    must occur before the body of printf() begins its execution.

    See 5.1.2.3p3:

    [...] At certain specified points in the execution
    sequence called /sequence points,/ all side effects
    of previous evaluations shall be complete and no
    side effects of subsequent evaluations shall have
    taken place. [...]

    The words "previous" and "subsequent" describe an ordering,
    not mere separation.
     
    Eric Sosman, Jul 24, 2009
    #16
  17. cornelis van der bent

    James Kuyper Guest

    No, that is not what I said, and most of those results are not permitted
    by the standard, DESPITE the fact that the sequence points don't impose
    an order. That's because something else does impose an order. In this
    case, it's section 6.8p2: "Except as indicated, statements are executed
    in sequence." The "Except as indicated" (which really should say "except
    as otherwise indicated") is particularly relevant: it is the exception
    that keeps 6.8p2 from conflicting with other sections of the standard
    which specify non-sequential evaluation of statements as a result of
    selection, iteration and jump statements. For example:

    A;
    if(B)
    C;
    D;

    Strict sequential order would require that those statements be evaluated
    in the order A, B, C, and then D. The special rules for if() statements
    allow for them to be executed in the order A, B, D, if the value of B
    happens to compare equal to 0.
    They refer to an ordering, but do not define it - they don't tell you
    which evaluations are "prior" and which ones are "subsequent". It's
    other statements in the standard that allow you to identify the actual
    order.
     
    James Kuyper, Jul 24, 2009
    #17
  18. I use news.eternal-september.org and a real newsreader; I don't see
    much spam here.
     
    Keith Thompson, Jul 24, 2009
    #18
  19. No, what James said doesn't imply that the output could be anything
    other than 45. Sequence points *in themselves* don't necessarily
    impose a particular order. You also need C99 6.8p2:

    Except as indicated, statements are executed in sequence.

    Yes, but *what* ordering?

    Then again, "statements are executed in sequence" doesn't tell you
    what the sequence is.

    I suppose that, once the standard says that statements are executed
    "in sequence", it's considered so obvious that the sequence of
    execution matches the sequence in the source code that it doesn't need
    to be stated. And, similarly, one could argue that once it's stated
    that sequence points impose an ordering, it's obvious what that
    ordering is.
     
    Keith Thompson, Jul 24, 2009
    #19
    1. Advertisements

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments (here). After that, you can post your question and our members will help you out.