Re: Data compression, date range

Discussion in 'C Programming' started by Michael Foukarakis, Feb 22, 2010.

  1. On Feb 22, 11:56 am, Branimir Maksimovic <> wrote:
    > given
    >
    > const uint64_t maxDateValue = 6074000998ull;
    >
    > write decode/encode date range functions
    >
    > uint64_t encodeDate(uint64_t start, uint64_t stop);
    > void decodeDate(uint64_t blob, uint64_t* start, uint64_t* stop);
    >
    > Interesting exercise ;)


    Where's the exercise? In trying to figure out what the exercise is?
    Michael Foukarakis, Feb 22, 2010
    #1
    1. Advertising

  2. Michael Foukarakis

    Eric Sosman Guest

    On 2/22/2010 10:36 AM, Branimir Maksimovic wrote:
    > Michael Foukarakis wrote:
    >> On Feb 22, 11:56 am, Branimir Maksimovic <> wrote:
    >>> given
    >>>
    >>> const uint64_t maxDateValue = 6074000998ull;
    >>>
    >>> write decode/encode date range functions
    >>>
    >>> uint64_t encodeDate(uint64_t start, uint64_t stop);
    >>> void decodeDate(uint64_t blob, uint64_t* start, uint64_t* stop);
    >>>
    >>> Interesting exercise ;)

    >>
    >> Where's the exercise? In trying to figure out what the exercise is?

    > Exercise is to store from / to date in unsigned 64 bit integer,
    > and retrieve it back. Constant is maximum number of ranges,
    > that is distance from / to , that can be stored in uint64_t.
    > Actually, largest range that can be stored is 0-6074000997ull.
    > I just wanted to simplify spec as in original spec that constant wasn't
    > given.
    > I have solution, but will post it next Monday ;)
    > Date is eg number of seconds or integer mapped to date, anything...


    Are we allowed to assume anything about the values of
    start and stop? If 0 <= start <= stop < maxDateValue, for
    example, the problem is trivial. With other conditions it
    might be harder. (With only 0 <= start < maxDateValue and
    0 <= stop < maxDateValue, there's no reversible encoding.)

    --
    Eric Sosman
    lid
    Eric Sosman, Feb 22, 2010
    #2
    1. Advertising

  3. In article <>, Michael Foukarakis <> writes:
    > On Feb 22, 11:56=A0am, Branimir Maksimovic <> wrote:
    >> given
    >>
    >> const uint64_t maxDateValue =3D 6074000998ull;
    >>
    >> write decode/encode date range functions
    >>
    >> uint64_t encodeDate(uint64_t start, uint64_t stop);
    >> void decodeDate(uint64_t blob, uint64_t* start, uint64_t* stop);
    >>
    >> Interesting exercise ;)

    >
    > Where's the exercise? In trying to figure out what the exercise is?



    I guess the exercise is:

    The following inequalities hold for the "start" and "stop" params of
    encodeDate():

    0 <= start <= stop <= 6074000998

    Compress the (start, stop) pair into a single uint64_t value, so that
    decodeDate() can reproduce the individual components later.


    --o--


    6074000998 is 1,01101010,00001001,11100110,01100110 in binary. It
    contains around 32.5 bits of information. If we wanted to store two such
    *unrelated* numbers, we couldn't do that. However, we can throw away one
    bit of information, because we can restore the ordering (a yes/no
    decision for two numbers) without transmitting it explicitly.

    Thus 32.5 + 31.5 = 64.


    --o--


    Let base B be decimal 3,037,000,499. The greatest base-B number
    representable as an uint64_t is

    2 | 3 | 2,746,052,116

    (digits written in decimal).

    Both start and stop are not greater than "2 | 0" in base B -- see
    maxDateValue.

    start = t | u (base B)
    stop = v | w (base B)

    The value sent over the wire will look this way in base-B:

    case | w | u

    Cases:

    stop start w < u implies this case

    v|w t|u

    2|0 2|0 2
    2|0 1|u * 2
    2|0 0|u * 1
    1|w 1|u 1
    1|w 0|u * 0
    0|w 0|u 0


    Notice that where "w < u" does NOT hold, there the case digit equals
    both t and v. Also notice that where the case digit is 2, there "w" is
    0, thus "case | w | u" (base-B) will not exceed the limit described
    above.

    Implementation:


    static const uint64_t base = UINT64_C(3037000499);

    uint64_t
    encodeDate(uint64_t start, uint64_t stop)
    {
    uint64_t
    t = start / base,
    u = start % base,
    v = stop / base,
    w = stop % base;

    return
    w < u
    ? (
    2u == v
    ? base * base * (t + 1u) + u /* t+1 | 0 | u */
    : w * base + u /* 0 | w | u */
    )
    : base * base * t /* t | w | u */
    + base * w
    + u;
    }


    void
    decodeDate(uint64_t blob, uint64_t* start, uint64_t* stop)
    {
    uint64_t case = blob / (base * base);

    /* Initialize final values with low base-B digits. */
    *stop = blob / base % base; /* w */
    *start = blob % base; /* u */

    /* Set high base-B digit. */
    if (*stop < *start) {
    switch (case) {
    case 2:
    *start += base;

    case 1:
    *stop += base;

    case 0:
    *stop += base;
    }
    }
    else {
    *start += case * base;
    *stop += case * base;
    }
    }


    Cheers,
    lacos
    Ersek, Laszlo, Feb 22, 2010
    #3
  4. (Ersek, Laszlo) writes:
    [...]
    > 6074000998 is 1,01101010,00001001,11100110,01100110 in binary. It
    > contains around 32.5 bits of information. If we wanted to store two such
    > *unrelated* numbers, we couldn't do that. However, we can throw away one
    > bit of information, because we can restore the ordering (a yes/no
    > decision for two numbers) without transmitting it explicitly.
    >
    > Thus 32.5 + 31.5 = 64.

    [...]

    Now that I understand the problem, it becomes clear that the value
    6074000998 was chosen very carefully so it's very close to 2**32.5
    (or, equivalently, 2**32 * sqrt(2)). Of course I'm using "**"
    as an exponentiation operator and being sloppy with types; these
    are mathematical expressions, not C expressions.

    2.0**32.5 is approximately 6074000999.9521.
    log2(6074000998) is approximately 32.49999999953634.

    --
    Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
    Nokia
    "We must do something. This is something. Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
    Keith Thompson, Feb 22, 2010
    #4
  5. In article <>, Keith Thompson <> writes:
    > (Ersek, Laszlo) writes:
    > [...]
    >> 6074000998 is 1,01101010,00001001,11100110,01100110 in binary. It
    >> contains around 32.5 bits of information. If we wanted to store two such
    >> *unrelated* numbers, we couldn't do that. However, we can throw away one
    >> bit of information, because we can restore the ordering (a yes/no
    >> decision for two numbers) without transmitting it explicitly.
    >>
    >> Thus 32.5 + 31.5 = 64.

    > [...]
    >
    > Now that I understand the problem, it becomes clear that the value
    > 6074000998 was chosen very carefully so it's very close to 2**32.5
    > (or, equivalently, 2**32 * sqrt(2)). Of course I'm using "**"
    > as an exponentiation operator and being sloppy with types; these
    > are mathematical expressions, not C expressions.
    >
    > 2.0**32.5 is approximately 6074000999.9521.
    > log2(6074000998) is approximately 32.49999999953634.


    Yes, I'm also curious where this problem comes from.

    Branimir, was it a school / academic problem, or was the integer
    6074000998 a hint from an already solved, bigger industrial problem (ie.
    "what is the greatest integer maxDateValue, so that...")?

    Thanks,
    lacos
    Ersek, Laszlo, Feb 22, 2010
    #5
  6. Michael Foukarakis

    Eric Sosman Guest

    On 2/22/2010 5:28 PM, Branimir Maksimovic wrote:
    > Ersek, Laszlo wrote:
    >> In article <>, Keith Thompson
    >> <> writes:
    >>> (Ersek, Laszlo) writes:
    >>> [...]
    >>>> 6074000998 is 1,01101010,00001001,11100110,01100110 in binary. It
    >>>> contains around 32.5 bits of information. If we wanted to store two
    >>>> such
    >>>> *unrelated* numbers, we couldn't do that. However, we can throw away
    >>>> one
    >>>> bit of information, because we can restore the ordering (a yes/no
    >>>> decision for two numbers) without transmitting it explicitly.
    >>>>
    >>>> Thus 32.5 + 31.5 = 64.
    >>> [...]
    >>>
    >>> Now that I understand the problem, it becomes clear that the value
    >>> 6074000998 was chosen very carefully so it's very close to 2**32.5
    >>> (or, equivalently, 2**32 * sqrt(2)). Of course I'm using "**"
    >>> as an exponentiation operator and being sloppy with types; these
    >>> are mathematical expressions, not C expressions.
    >>>
    >>> 2.0**32.5 is approximately 6074000999.9521.
    >>> log2(6074000998) is approximately 32.49999999953634.

    >>
    >> Yes, I'm also curious where this problem comes from.
    >>
    >> Branimir, was it a school / academic problem, or was the integer
    >> 6074000998 a hint from an already solved, bigger industrial problem (ie.
    >> "what is the greatest integer maxDateValue, so that...")?
    >>

    >
    > I don;t know somebody asked question on local programming forum,
    > and lot of us tried to solve it. I was best with O(1),
    > solution with second place guy logn solution.
    > It was interesting that one guy invented complex
    > compression formula which worked like 10 minutes to
    > pack single range ;)


    Unbelievable! There are obviously N * (N+1) / 2 integers
    X,Y satisfying 0 <= X <= Y < N. Could nobody think of an easy
    way to make the transformation?

    See also "triangular matrix." Don't the kids learn anything
    these days?

    --
    Eric Sosman
    lid
    Eric Sosman, Feb 22, 2010
    #6
  7. Michael Foukarakis

    0m Guest

    On Feb 22, 5:22 pm, (Ersek, Laszlo) wrote:

    > 6074000998 is 1,01101010,00001001,11100110,01100110 in binary. It
    > contains around 32.5 bits of information. If we wanted to store two such
    > *unrelated* numbers, we couldn't do that. However, we can throw away one
    > bit of information, because we can restore the ordering (a yes/no
    > decision for two numbers) without transmitting it explicitly.
    >


    can't understand, how can one manage to restore bits one by one
    properly. which bit belongs to x and which belongs to y? your code
    doesn't show this procedure:) thanks.
    0m, Feb 23, 2010
    #7
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Peter Grison

    Date, date date date....

    Peter Grison, May 28, 2004, in forum: Java
    Replies:
    10
    Views:
    3,229
    Michael Borgwardt
    May 30, 2004
  2. Replies:
    46
    Views:
    949
    Antoon Pardon
    Jul 25, 2006
  3. Lambda
    Replies:
    2
    Views:
    382
    James Kanze
    Jul 16, 2008
  4. Ersek, Laszlo

    Re: Data compression, date range

    Ersek, Laszlo, Feb 23, 2010, in forum: C Programming
    Replies:
    3
    Views:
    286
    Keith Thompson
    Feb 24, 2010
  5. SimonDev
    Replies:
    0
    Views:
    856
    SimonDev
    Nov 12, 2009
Loading...

Share This Page