Cross-platform way to pack (int + flags) to unsigned int

Discussion in 'C Programming' started by Alex J, Jun 10, 2013.

  1. Alex J

    Alex J Guest

    Hi all,

    Given two values: int X and int F and assuming that
    (X << 2) >> 2 == X and F is a two-bit value write cross-platform code to "pack" X and F to unsigned int.

    I've solved this as follows.
    Packing:
    unsigned int P = (unsigned int) ((X << 2) | F);

    Unpacking:
    int F = ((int) P) & 3;
    int A = ((int) P) >> 2;

    I know that both packing and unpacking code is not valid from ISO C point of view but still the question is - should I care about it?

    The only compiler I care about is GCC ver.>4 and its targets, e.g. windows, linux, mac os x.

    How this problem can be solved in a truly cross-platform way? Especially assuming the packed data will be read on the multiple platforms and sizeof(int) == sizeof(unsigned int) == 32 on all of these platforms.

    P.S.:
    #include <stdio.h>
    #include <stdlib.h>

    void check_int(int a, int f) {
    int a1;
    int f1;
    unsigned int p = (unsigned int) (a << 2) | f;

    a1 = ((int) p) >> 2;
    f1 = ((int) p) & 3;

    if ((a1 != a) || (f1 != f)) {
    fprintf(stderr, "a1(%d) != a(%d) || f1(%d) != f(%d)\n", a1, a, f1, f);
    exit(-1);
    } else {
    fprintf(stdout, "OK: %d, %d\n", a, f);
    }
    }

    int main() {
    check_int(1, 0);
    check_int(144555666, 3);
    check_int(-1, 2);
    check_int(-222333444, 1);
    return 0;
    }
     
    Alex J, Jun 10, 2013
    #1

  2. Alex J

    Paul N Guest

    On Jun 10, 8:27 pm, Alex J <> wrote:
    > Hi all,
    >
    > Given two values: int X and int F and assuming that
    > (X << 2) >> 2 == X and F is a two-bit value write cross-platform code to "pack" X and F to unsigned int.
    >
    > I've solved this as follows.
    > Packing:
    > unsigned int P = (unsigned int) ((X << 2) | F);
    >
    > Unpacking:
    > int F = ((int) P) & 3;
    > int A = ((int) P) >> 2;
    >
    > I know that both packing and unpacking code is not valid from ISO C point of view but still the question is - should I care about it?


    I'm not an expert, but I thought that messing around with unsigned
    values is perfectly safe. So I would be inclined to write:

    Packing:
    unsigned int P = (( (unsigned int) X << 2) | (unsigned int) F );

    Unpacking:
    int F = (int) (P & 3);
    int A = (int) (P >> 2);

    But there are experts in this group who can give a more authoritative
    reply.
     
    Paul N, Jun 10, 2013
    #2

  3. On Mon, 10 Jun 2013 12:27:43 -0700 (PDT), Alex J <>
    wrote:

    >Hi all,
    >
    >Given two values: int X and int F and assuming that
    >(X << 2) >> 2 == X and F is a two-bit value write cross-platform code to "pack" X and F to unsigned int.
    >
    >I've solved this as follows.
    >Packing:
    >unsigned int P = (unsigned int) ((X << 2) | F);
    >
    >Unpacking:
    >int F = ((int) P) & 3;
    >int A = ((int) P) >> 2;
    >
    >I know that both packing and unpacking code is not valid from ISO C point of view but still the question is - should I care about it?


    In what way do you think it is invalid?

    >
    >The only compiler I care about is GCC ver.>4 and its targets, e.g. windows, linux, mac os x.
    >
    >How this problem can be solved in a truly cross-platform way? Especially assuming the packed data will be read on the multiple platforms and sizeof(int) == sizeof(unsigned int) == 32 on all of these platforms.


    How is the data being transferred between platforms, as binary or
    text? If text, you should not have a problem.

    If binary, will all the system always have the same endianness? If
    one is big-endian, then f will be stored in the last two bits of the
    fourth byte. When read on a little endian system, it will look for f
    in the last two bits of the first byte.

    --
    Remove del for email
     
    Barry Schwarz, Jun 10, 2013
    #3
  4. Alex J

    Lew Pitcher Guest

    On Monday 10 June 2013 16:59, in comp.lang.c, wrote:

    > On Mon, 10 Jun 2013 12:27:43 -0700 (PDT), Alex J <>
    > wrote:
    >
    >>Hi all,
    >>
    >>Given two values: int X and int F and assuming that
    >>(X << 2) >> 2 == X and F is a two-bit value write cross-platform code to
    >>"pack" X and F to unsigned int.
    >>
    >>I've solved this as follows.
    >>Packing:
    >>unsigned int P = (unsigned int) ((X << 2) | F);
    >>
    >>Unpacking:
    >>int F = ((int) P) & 3;
    >>int A = ((int) P) >> 2;
    >>
    >>I know that both packing and unpacking code is not valid from ISO C point
    >>of view but still the question is - should I care about it?

    >
    > In what way do you think it is invalid?
    >
    >>
    >>The only compiler I care about is GCC ver.>4 and its targets, e.g.
    >>windows, linux, mac os x.
    >>
    >>How this problem can be solved in a truly cross-platform way? Especially
    >>assuming the packed data will be read on the multiple platforms and
    >>sizeof(int) == sizeof(unsigned int) == 32 on all of these platforms.

    >
    > How is the data being transferred between platforms, as binary or
    > text? If text, you should not have a problem.


    Caveat: Text shouldn't be a problem IF both writer and reader agree on the
    characterset of the data, the native line termination format of the data,
    the block data format of the data (i.e., for IBM mainframe, is the data
    Fixed Blocked, Variable Blocked, Variable Blocked Spanned, Undefined or
    something else), etc.

    > If binary, will all the system always have the same endianness? If
    > one is big-endian, then f will be stored in the last two bits of the
    > fourth byte. When read on a little endian system, it will look for f
    > in the last two bits of the first byte.


    In other words, a common data exchange format must be decided upon and
    agreed to by all writers and readers /before/ the programs are developed,
    coded, and tested.

    --
    Lew Pitcher
    "In Skills, We Trust"
     
    Lew Pitcher, Jun 10, 2013
    #4
  5. Alex J

    Eric Sosman Guest

    On 6/10/2013 3:27 PM, Alex J wrote:
    > Hi all,
    >
    > Given two values: int X and int F and assuming that
    > (X << 2) >> 2 == X and F is a two-bit value write cross-platform code to "pack" X and F to unsigned int.


    Let's pause a moment to study the X condition more closely.
    If it's meant as a portable ("cross-platform") statement, it
    implies 0 <= X && X <= INT_MAX/4 (otherwise, simply evaluating
    X<<2 would yield undefined behavior). On the 32-bit systems you
    mention below, this means 0 <= X && X <= 0x1FFFFFFF. Keep the
    range restriction in mind for what follows.
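
    As a minimal sketch of that range restriction (the helper name
    shift2_is_defined is made up for illustration and is not part of the
    original code), a check like this could guard the shift:

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: true when x << 2 is well defined for a
   signed int, i.e. when 0 <= x && x <= INT_MAX / 4. */
static bool shift2_is_defined(int x)
{
    return x >= 0 && x <= INT_MAX / 4;
}
```

    Any X that fails this test needs a different encoding before it can
    be shifted portably.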

    > I've solved this as follows.
    > Packing:
    > unsigned int P = (unsigned int) ((X << 2) | F);


    Okay; the cast is unnecessary but harmless.

    > Unpacking:
    > int F = ((int) P) & 3;
    > int A = ((int) P) >> 2;


    Okay; again, the casts are unnecessary but harmless. (But
    see below for a lame excuse that semi-justifies the second one.)

    > I know that both packing and unpacking code is not valid from ISO C point of view but still the question is - should I care about it?


    As long as the range restrictions hold, there's nothing invalid
    about what you've shown. What invalidity are you worried about?

    > The only compiler I care about is GCC ver.>4 and its targets, e.g. windows, linux, mac os x.
    >
    > How this problem can be solved in a truly cross-platform way? Especially assuming the packed data will be read on the multiple platforms and sizeof(int) == sizeof(unsigned int) == 32 on all of these platforms.


    Depends what you mean by "truly cross-platform." The assumption
    of a 32-bit int excludes a few platforms right away, and there may be
    a few more where gcc isn't available. Still, a large majority of
    "mainstream" platforms meet your requirements.

    There's another problem lurking, though: Endianness. If the 32-bit
    int is composed of four 8-bit bytes, there are 4! = 24 ways to arrange
    those bytes;[*] two arrangements ("Big-Endian" and "Little-Endian") are
    popular today, and at least one more ("Middle-Endian") has been seen
    in the past. If X and F are one million and one, respectively, you'll
    pack them as the value 4000001, forming the four bytes 00 3D 09 01 (in
    hex). A Big-Endian machine would transmit or store these with the 00
    first and the 01 last, but a Little-Endian machine would do things the
    other way around. So if one of them writes the value (in its native
    order) and the other reads it (in *its* native order), 4000001 will
    be misinterpreted as 17382656, from which you'll extract X = 4345664
    and F = 0. The fidelity leaves a little to be desired!

    That's not to say these problems can't be dealt with, just that
    you'd better give them some thought. See the FAQ.

    [*] Actually, there are 32! ~= 2.6E35 ways to arrange the bits.
    Consider yourself fortunate that nobody's quite that perverse. Yet.
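
    The arithmetic in the endianness example can be reproduced with a
    small sketch. It assumes <stdint.h> is available; swap32 and
    pack_nonneg are illustrative names, not anything from the original
    post, and the byte reversal merely simulates reading the value on a
    machine of the opposite byte order:

```c
#include <stdint.h>

/* Reverse the four bytes of a 32-bit value (illustrative helper). */
static uint32_t swap32(uint32_t v)
{
    return ((v & 0x000000FFu) << 24) |
           ((v & 0x0000FF00u) << 8)  |
           ((v & 0x00FF0000u) >> 8)  |
           ((v & 0xFF000000u) >> 24);
}

/* Pack a non-negative x (below 2^30) and a two-bit f. */
static uint32_t pack_nonneg(uint32_t x, uint32_t f)
{
    return (x << 2) | (f & 3u);
}
```

    With X = 1000000 and F = 1, pack_nonneg gives 4000001 (bytes
    00 3D 09 01); swap32 of that is 17382656, which unpacks to
    X = 4345664 and F = 0, matching the figures above.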

    > P.S.:
    > #include <stdio.h>
    > #include <stdlib.h>
    >
    > void check_int(int a, int f) {
    > int a1;
    > int f1;
    > unsigned int p = (unsigned int) (a << 2) | f;
    >
    > a1 = ((int) p) >> 2;
    > f1 = ((int) p) & 3;
    >
    > if ((a1 != a) || (f1 != f)) {
    > fprintf(stderr, "a1(%d) != a(%d) || f1(%d) != f(%d)\n", a1, a, f1, f);
    > exit(-1);


    Don't Do That. Use EXIT_FAILURE. (Does anybody *know* where
    this exit(-1) meme got started? Does anybody know of *any* system
    on which a -1 exit status survives unchanged all the way to the point
    where an invoker could examine it? I think it *might* have worked
    on VMS -- but it would have meant "success" if it did.)

    > } else {
    > fprintf(stdout, "OK: %d, %d\n", a, f);
    > }
    > }
    >
    > int main() {
    > check_int(1, 0);
    > check_int(144555666, 3);
    > check_int(-1, 2);
    > check_int(-222333444, 1);


    The final two tests are on shaky ground, as they violate the
    range restrictions mentioned earlier. (On the other hand, they
    also -- sort of -- justify some of the casts you've written: If
    X<<2 with X negative doesn't explode *and* (int)p with p outside
    the range of int doesn't explode *and* (int)p>>2 with (int)p
    negative doesn't explode, then the cast *might* save the day.
    But I wouldn't call that "truly cross-platform.")

    > return 0;
    > }
    >


    Another approach might be to use a struct with bit-fields:

    struct packed {
    int X : 30;
    unsigned int F : 2;
    };

    This doesn't solve the representation issues -- if anything, it
    makes them trickier -- but it relaxes the range restriction to
    permit negative X'es.
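
    A rough sketch of how such a struct might be used. One assumption is
    made loudly here: the field is declared "signed int" rather than
    plain "int", because whether a plain int bit-field is signed is
    implementation-defined. The names packed_sketch and make_packed are
    illustrative only, and int is assumed to be at least 30 bits wide:

```c
/* Variant of the struct above with an explicitly signed X. */
struct packed_sketch {
    signed int X : 30;
    unsigned int F : 2;
};

/* Illustrative helper: x must fit in 30 signed bits. */
static struct packed_sketch make_packed(int x, unsigned int f)
{
    struct packed_sketch p;
    p.X = x;
    p.F = f & 3u;   /* keep only the two flag bits */
    return p;
}
```

    This round-trips negative values in memory, but says nothing about
    how the bits are laid out when the struct is written to a file.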

    --
    Eric Sosman
     
    Eric Sosman, Jun 10, 2013
    #5
  6. Alex J

    Alex J Guest

    On Tuesday, June 11, 2013 1:33:21 AM UTC+4, Eric Sosman wrote:
    > On 6/10/2013 3:27 PM, Alex J wrote:
    >
    > snip...
    > Let's pause a moment to study the X condition more closely.
    > If it's meant as a portable ("cross-platform") statement, it
    > implies 0 <= X && X <= INT_MAX/4 (otherwise, simply evaluating
    > X<<2 would yield undefined behavior). On the 32-bit systems you
    > mention below, this means 0 <= X && X <= 0x1FFFFFFF. Keep the
    > range restriction in mind for what follows.


    Yes, you're absolutely right. But I need signed types too.

    > > I've solved this as follows.
    > > Packing:
    > > unsigned int P = (unsigned int) ((X << 2) | F);
    >
    > Okay; the cast is unnecessary but harmless.
    >
    > > Unpacking:
    > > int F = ((int) P) & 3;
    > > int A = ((int) P) >> 2;
    >
    > Okay; again, the casts are unnecessary but harmless. (But
    > see below for a lame excuse that semi-justifies the second one.)
    >
    > > I know that both packing and unpacking code is not valid from ISO C point of view but still the question is - should I care about it?
    >
    > As long as the range restrictions hold, there's nothing invalid
    > about what you've shown. What invalidity are you worried about?


    ISO C99 (6.5.7/4) - undefined behavior for left shift for negative value.

    > > The only compiler I care about is GCC ver.>4 and its targets, e.g. windows, linux, mac os x.
    > >
    > > How this problem can be solved in a truly cross-platform way? Especially assuming the packed data will be read on the multiple platforms and sizeof(int) == sizeof(unsigned int) == 32 on all of these platforms.
    >
    > Depends what you mean by "truly cross-platform." The assumption
    > of a 32-bit int excludes a few platforms right away, and there may be
    > a few more where gcc isn't available. Still, a large majority of
    > "mainstream" platforms meet your requirements.
    >
    > There's another problem lurking, though: Endianness.


    Yes, you're right. I am aware of it and I planned to "document" a little-endian representation of the transmitted binary data (as on x86).

    > That's not to say these problems can't be dealt with, just that
    > you'd better give them some thought. See the FAQ.
    >
    > [*] Actually, there are 32! ~= 2.6E35 ways to arrange the bits.
    > Consider yourself fortunate that nobody's quite that perverse. Yet.
    >
    > > P.S.:
    > > #include <stdio.h>
    > > #include <stdlib.h>
    > >
    > > void check_int(int a, int f) {
    > > int a1;
    > > int f1;
    > > unsigned int p = (unsigned int) (a << 2) | f;
    > >
    > > a1 = ((int) p) >> 2;
    > > f1 = ((int) p) & 3;
    > >
    > > if ((a1 != a) || (f1 != f)) {
    > > fprintf(stderr, "a1(%d) != a(%d) || f1(%d) != f(%d)\n", a1, a, f1, f);
    > > exit(-1);
    >
    > Don't Do That. Use EXIT_FAILURE. (Does anybody *know* where
    > this exit(-1) meme got started? Does anybody know of *any* system
    > on which a -1 exit status survives unchanged all the way to the point
    > where an invoker could examine it? I think it *might* have worked
    > on VMS -- but it would have meant "success" if it did.)


    Thanks for pointing that out.

    > > } else {
    > > fprintf(stdout, "OK: %d, %d\n", a, f);
    > > }
    > > }
    > >
    > > int main() {
    > > check_int(1, 0);
    > > check_int(144555666, 3);
    > > check_int(-1, 2);
    > > check_int(-222333444, 1);
    >
    > The final two tests are on shaky ground, as they violate the
    > range restrictions mentioned earlier. (On the other hand, they
    > also -- sort of -- justify some of the casts you've written: If
    > X<<2 with X negative doesn't explode *and* (int)p with p outside
    > the range of int doesn't explode *and* (int)p>>2 with (int)p
    > negative doesn't explode, then the cast *might* save the day.
    > But I wouldn't call that "truly cross-platform.")


    Yes, you're right. But I need signed integers.
    Maybe there is a reliable way to transform to/from a packed binary number representation - i.e. flags + number (e.g. network byte order)?
    After quick googling I did not find any and now I believe I shouldn't do it.

    I need quick loading and saving of big packs of binary data on the same platform (writing a big array of unsigned ints), so I believe I should provide a special converter for big-endian platforms. Conversion will be a rare though theoretically possible case, so I do not care about its speed and memory consumption.

    Is it sufficient to have two converters: one little->big endian format converter and one big->little endian format converter?

    AFAIK all the known 32-bit platforms (well, better to say platforms with 32-bit ints) with the same endianness share the *same* binary representation of ints, and all the bitwise operations on integer numbers have the same effect? Oh, I forgot to mention that at the moment I care about GCC only, but support for the other modern compilers - msvc, icc - would be nice.

    If that is true, at least I can rely on the packing/unpacking operations I specified above for both big- and little-endian platforms and write converters that are aware of endianness. Of course, endianness information will be encoded in the header of the transmitted binary representation.

    > > return 0;
    > > }
    > >
    >
    > Another approach might be to use a struct with bit-fields:
    >
    > struct packed {
    > int X : 30;
    > unsigned int F : 2;
    > };
    >
    > This doesn't solve the representation issues -- if anything, it
    > makes them trickier -- but it relaxes the range restriction to
    > permit negative X'es.


    Thank you and all who answered.

    > --
    > Eric Sosman
     
    Alex J, Jun 11, 2013
    #6
  7. Alex J

    Alex J Guest

    On Tuesday, June 11, 2013 1:33:21 AM UTC+4, Eric Sosman wrote:
    > [snip]
    > Another approach might be to use a struct with bit-fields:
    >
    > struct packed {
    > int X : 30;
    > unsigned int F : 2;
    > };
    >
    > This doesn't solve the representation issues -- if anything, it
    > makes them trickier -- but it relaxes the range restriction to
    > permit negative X'es.


    I heard that bit fields are non-portable and there is no guarantee that compiler will not apply some alignment to the struct that's why I didn't use it.

    I am probably wrong but even with pragma pack(1) struct is not guaranteed to be 32-bit size or simply said sizeof(struct packed) will not always be 4. Yet I'm not sure on that.

    Please correct me if I'm wrong.

    >
    > --
    > Eric Sosman
     
    Alex J, Jun 11, 2013
    #7
  8. Alex J

    James Kuyper Guest

    On 06/11/2013 06:24 AM, Alex J wrote:
    > On Tuesday, June 11, 2013 1:33:21 AM UTC+4, Eric Sosman wrote:
    >> [snip]
    >> Another approach might be to use a struct with bit-fields:
    >>
    >> struct packed {
    >> int X : 30;
    >> unsigned int F : 2;
    >> };
    >>
    >> This doesn't solve the representation issues -- if anything, it
    >> makes them trickier -- but it relaxes the range restriction to
    >> permit negative X'es.

    >
    > I heard that bit fields are non-portable and there is no guarantee that compiler will not apply some alignment to the struct that's why I didn't use it.


    That's what he meant when he said "it makes them trickier".

    > I am probably wrong but even with pragma pack(1) struct is not guaranteed to be 32-bit size or simply said sizeof(struct packed) will not always be 4. Yet I'm not sure on that.


    #pragma pack itself is not standard, so the standard guarantees nothing
    about how it works on those implementations which support it - and the
    ones that do support it do so with several different incompatible
    syntaxes for specifying the way the structures are packed.

    To avoid undefined behavior during packing, you'll have to transform
    valid values for X into non-negative numbers, convert to unsigned, and
    then perform the left shift. For unpacking, you need to perform the
    inverse operations in the opposite order. There are several different
    ways to make the numbers non-negative. One of the simplest is:

    #define INT30_MIN (-(1<<29))

    // Packing
    p = (unsigned)(x-INT30_MIN) << 2 | f;

    // Unpacking
    f = p & 3;
    x = (int)(p>>2)+INT30_MIN;

    The code would have to be a bit more complicated if you want it to work
    on systems where int and unsigned int are not both 32 bit types. You'll
    still have to deal with byte ordering when reading or writing the
    packed values.
    --
    James Kuyper
     
    James Kuyper, Jun 11, 2013
    #8
  9. Alex J

    Eric Sosman Guest

    On 6/11/2013 4:35 AM, Alex J wrote:
    > On Tuesday, June 11, 2013 1:33:21 AM UTC+4, Eric Sosman wrote:
    >> [...]
    >> As long as the range restrictions hold, there's nothing invalid
    >> about what you've shown. What invalidity are you worried about?

    >
    > ISO C99 (6.5.7/4) - undefined behavior for left shift for negative value.


    You began with

    Given two values: int X and int F and assuming that
    (X << 2) >> 2 == X and F is a two-bit value

    ... which means either that X is non-negative and not too large,
    or that you're *not* worried about 6.5.7p4! If 6.5.7p4 is in
    fact a concern, you'll need to revise your assumption about X.

    > [...]
    > May be there is a reliable way to transform to/from a packed binary number representation - i.e. flags + number (e.g. network byte order)?
    > After quick googling I did not find any and now I believe I shouldn't do it.


    One fully-portable approach would be to add a suitable offset
    to X before encoding, ensuring that what's shifted is non-negative:

    unsigned int encoded = (X + OFFSET) << 2 | F;

    Then you subtract the same offset when extracting:

    int decodedX = (encoded >> 2) - OFFSET;
    int decodedF = encoded & 3;

    > I need a quick loading and saving the big packs of binary data on the same platform (writing a big array of the unsigned ints) so I believe I should provide a special converter for big endian platforms. Convertation will be the rare though theoretically possible case so I do not care about its speed and memory consumption.


    You've confused me. When you say "on the same platform," it seems
    that you want code that will work everywhere, but that packing and
    extracting all happen on the same system; in this case, endianness is
    not an issue. But when you talk about a "converter for big endian
    platforms," it seems that data exchange between variegated platforms
    is in fact needed ...

    Either way, it's easy to read and write the data in a consistent
    "wire format" regardless of the host platform's endianness. Here's
    how you could write a four-byte value in Little-Endian order:

    // Error-checking omitted for brevity
    unsigned int value = ...;
    putc(value & 0xFF, stream);
    putc((value >> 8) & 0xFF, stream);
    putc((value >> 16) & 0xFF, stream);
    putc((value >> 24) & 0xFF, stream);

    If you're certain of 32-bitness you could omit the final &0xFF, but
    any speedup would surely be negligible compared to the I/O. Then
    you can read the bytes back the same way:

    unsigned int b0 = getc(stream);
    unsigned int b1 = getc(stream);
    unsigned int b2 = getc(stream);
    unsigned int b3 = getc(stream);
    unsigned int value = (b3 << 24) + (b2 << 16) + (b1 << 8) + b0;

    ... or even

    int X = (b3 << 22) + (b2 << 14) + (b1 << 6) + (b0 >> 2)
    - OFFSET;
    int F = b0 & 3;
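
    The same little-endian "wire format" idea can be sketched against a
    byte buffer instead of a stream, which makes it easy to inspect. This
    is only an illustration assuming <stdint.h>; store_le32 and load_le32
    are made-up names:

```c
#include <stdint.h>

/* Store a 32-bit value as four bytes, least significant byte first. */
static void store_le32(unsigned char out[4], uint32_t value)
{
    out[0] = (unsigned char)(value & 0xFFu);
    out[1] = (unsigned char)((value >> 8) & 0xFFu);
    out[2] = (unsigned char)((value >> 16) & 0xFFu);
    out[3] = (unsigned char)((value >> 24) & 0xFFu);
}

/* Rebuild the value from the same four bytes, whatever the host order. */
static uint32_t load_le32(const unsigned char in[4])
{
    return (uint32_t)in[0] |
           ((uint32_t)in[1] << 8) |
           ((uint32_t)in[2] << 16) |
           ((uint32_t)in[3] << 24);
}
```

    Because only shifts and masks on the value are used, the result does
    not depend on how the host stores an unsigned int in memory. When
    reading from a real stream, each getc result would still need an EOF
    check before being used as a byte.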

    > Is it sufficient to have two converters: one for little->big endian format converter and big->little endian format converter?


    As illustrated above, I think it suffices to have zero converters.

    > AFAIK all the known 32-bit platform (well, better to say platforms with 32-bit ints) with same endianess share the *same* binary representation of ints and all the bitwise operations on integer numbers has the same effect? Oh, I forgot to mention that at the moment I care of GCC only but support for the other modern compilers - msvc, icc would be nice.


    This sounds like a digression; I'm not sure what you're driving at.
    Negative numbers, maybe? You need to avoid them anyhow, because even if
    all the platforms use two's complement (it's been years since I saw one
    that didn't) you still need to worry about getting the sign right when
    extracting. Right-shifting a negative int gives an implementation-defined
    result; in practice, some platforms duplicate the sign bit while others
    introduce zeros (giving a non-negative result).
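
    One way to sidestep shifting a negative int at all is to do every
    shift on an unsigned value and restore the sign by hand. The sketch
    below is an assumption-laden illustration, not code from this thread:
    it uses the fixed-width types from <stdint.h>, stores X as a 30-bit
    two's complement field, and the names pack_tc and unpack_tc_x are
    invented:

```c
#include <stdint.h>

/* Pack: converting a negative int32_t to uint32_t is well defined
   (it wraps modulo 2^32), so the shift happens on an unsigned value.
   x must lie in [-2^29, 2^29 - 1]. */
static uint32_t pack_tc(int32_t x, uint32_t f)
{
    return (uint32_t)((uint32_t)x << 2) | (f & 3u);
}

/* Unpack: sign-extend the 30-bit field without shifting a signed int. */
static int32_t unpack_tc_x(uint32_t p)
{
    uint32_t v = p >> 2;                      /* 0 .. 2^30 - 1 */
    const uint32_t half = UINT32_C(1) << 29;  /* 2^29 */
    if (v < half)
        return (int32_t)v;                    /* non-negative value */
    return (int32_t)(v - half) - (int32_t)half;  /* equals v - 2^30 */
}
```

    The flags come back with a plain p & 3, and no step depends on what
    the platform does when a negative int is shifted right.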

    > If it is true at least I can rely on the packing/unpacking operations I specified above for both big and little endian platforms and write converters that aware about endianess. Of course endianess information will be encoded in the header of the transmitted binary representation.


    Or just read and write the same "wire format" everywhere.

    --
    Eric Sosman
     
    Eric Sosman, Jun 11, 2013
    #9
  10. Alex J

    Eric Sosman Guest

    On 6/11/2013 6:24 AM, Alex J wrote:
    > On Tuesday, June 11, 2013 1:33:21 AM UTC+4, Eric Sosman wrote:
    >> [snip]
    >> Another approach might be to use a struct with bit-fields:
    >>
    >> struct packed {
    >> int X : 30;
    >> unsigned int F : 2;
    >> };
    >>
    >> This doesn't solve the representation issues -- if anything, it
    >> makes them trickier -- but it relaxes the range restriction to
    >> permit negative X'es.

    >
    > I heard that bit fields are non-portable and there is no guarantee that compiler will not apply some alignment to the struct that's why I didn't use it.


    Like much of C, bit-fields are portable within limits. Every C
    compiler supports bit-fields, with widths up to at least the width
    of an int -- Since you're assuming 32-bit ints, the :30 bit-field is
    fine. The compiler has a lot of freedom in how it chooses to store
    the bits (which is why I said the representation issues get trickier),
    but if you're only worried about intra-machine storage that's not a
    problem.

    > I am probably wrong but even with pragma pack(1) struct is not guaranteed to be 32-bit size or simply said sizeof(struct packed) will not always be 4. Yet I'm not sure on that.


    Correct: As I said, the compiler has a lot of freedom. As for
    #pragma pack(1) -- Well, once you've uttered a non-Standard #pragma,
    *nothing* is guaranteed by the C language.

    --
    Eric Sosman
     
    Eric Sosman, Jun 11, 2013
    #10
  11. Alex J

    Tim Rentsch Guest

    Alex J <> writes:

    > On Tuesday, June 11, 2013 1:33:21 AM UTC+4, Eric Sosman wrote:
    >> On 6/10/2013 3:27 PM, Alex J wrote:
    >>
    >> snip...
    >> Let's pause a moment to study the X condition more closely.
    >> If it's meant as a portable ("cross-platform") statement, it
    >> implies 0 <= X && X <= INT_MAX/4 (otherwise, simply evaluating
    >> X<<2 would yield undefined behavior). On the 32-bit systems you
    >> mention below, this means 0 <= X && X <= 0x1FFFFFFF. Keep the
    >> range restriction in mind for what follows.

    >
    > Yes, you're absolutely right. But I need signed types too. [snip]


    The code examples suggested in other responses have bugs
    in them. Here is an easy and portable way to do what
    you want to do (disclaimer: typed in, not compiled):

    #include <limits.h>

    #if UINT_MAX <= INT_MAX
    # error sorry, this platform is screwy.
    #endif

    #define OFFSET (INT_MAX/4 + 1)

    unsigned
    pack( int x, int f ){
    return (unsigned)(x+OFFSET) << 2 | f & 0x3u;
    }

    void
    unpack( unsigned u, int *x, int *f ){
    *x = (int)(u>>2) - OFFSET, *f = u & 0x3;
    }

    This doesn't address the issue of how to transmit the
    unsigned value reliably, but it looks like you know
    what you're going to do about that.
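
    For anyone who wants to try the offset approach, here is a possible
    round-trip check. It repeats pack and unpack so the fragment stands
    alone, with extra parentheses and casts added for readability; the
    round_trips helper is hypothetical and not part of the code above:

```c
#include <limits.h>

#if UINT_MAX <= INT_MAX
# error unsigned int must be able to hold values above INT_MAX
#endif

#define OFFSET (INT_MAX/4 + 1)

/* x must lie in [-OFFSET, OFFSET - 1]; only the low two bits of f are kept. */
static unsigned pack(int x, int f)
{
    return ((unsigned)(x + OFFSET) << 2) | ((unsigned)f & 0x3u);
}

static void unpack(unsigned u, int *x, int *f)
{
    *x = (int)(u >> 2) - OFFSET;
    *f = (int)(u & 0x3u);
}

/* Hypothetical helper: returns 1 when (x, f) survives a round trip. */
static int round_trips(int x, int f)
{
    int x1, f1;
    unpack(pack(x, f), &x1, &f1);
    return x1 == x && f1 == f;
}
```

    A test driver could call round_trips on the four original test pairs
    and call exit(EXIT_FAILURE) if any of them returns 0.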
     
    Tim Rentsch, Jun 11, 2013
    #11
  12. Alex J

    Alex J Guest

    On Tuesday, June 11, 2013 7:53:11 PM UTC+4, Eric Sosman wrote:
    > On 6/11/2013 4:35 AM, Alex J wrote:
    > [snip]
    > You've confused me. When you say "on the same platform," it seems
    > that you want code that will work everywhere, but that packing and
    > extracting all happen on the same system; in this case, endianness is
    > not an issue. But when you talk about a "converter for big endian
    > platforms," it seems that data exchange between variegated platforms
    > is in fact needed ...


    I'm sorry for not being clear. Priority one for me is code that behaves in the expected way on all the target platforms, but I didn't mean the *same* binary representation of ints on all the target platforms (so that the serialized data may be freely transmitted to the other platform via some remote protocol and then processed as is).

    > [snip]
    > --
    >
    > Eric Sosman
     
    Alex J, Jun 11, 2013
    #12
  13. Alex J

    Alex J Guest

    On Tuesday, June 11, 2013 9:37:16 PM UTC+4, Tim Rentsch wrote:
    > [snip]
    > #include <limits.h>
    >
    > #if UINT_MAX <= INT_MAX
    > # error sorry, this platform is screwy.
    > #endif
    >
    > #define OFFSET (INT_MAX/4 + 1)
    >
    > unsigned
    > pack( int x, int f ){
    > return (unsigned)(x+OFFSET) << 2 | f & 0x3u;
    > }
    >
    > void
    > unpack( unsigned u, int *x, int *f ){
    > *x = (int)(u>>2) - OFFSET, *f = u & 0x3;
    > }
    >


    Thank you for the sample.

    >
    >
    > This doesn't address the issue of how to transmit the
    > unsigned value reliably, but it looks like you know
    > what you're going to do about that.


    Yep, that's right. Thank you and thanks others who pointed to the offset trick.
     
    Alex J, Jun 11, 2013
    #13
  14. Tim Rentsch <> wrote:
    > Alex J <> writes:


    (snip)
    >> Yes, you're absolutely right. But I need signed types too. [snip]


    > The code examples suggested in other responses have bugs
    > in them. Here is an easy and portable way to do what
    > you want to do (disclaimer: typed in, not compiled):


    (snip)
    > #define OFFSET (INT_MAX/4 + 1)


    > unsigned
    > pack( int x, int f ){
    > return (unsigned)(x+OFFSET) << 2 | f & 0x3u;


    I suppose that is true, but for values in range, and the
    appropriate arithmetic shift operations, shouldn't it also
    work with signed int, arithmetic shift of those signed int
    values, and the appropriate inverse?

    The OP assured us that (x<<2)>>2==x, which should be true for
    in range x and arithmetic shift on sign magnitude, ones
    complement, and twos complement machines, not that he is likely
    to run into the first two.

    -- glen
     
    glen herrmannsfeldt, Jun 11, 2013
    #14
  15. Alex J

    James Kuyper Guest

    On 06/11/2013 04:20 PM, glen herrmannsfeldt wrote:
    > Tim Rentsch <> wrote:
    >> Alex J <> writes:

    >
    > (snip)
    >>> Yes, you're absolutely right. But I need signed types too. [snip]

    >
    >> The code examples suggested in other responses have bugs
    >> in them. Here is an easy and portable way to do what
    >> you want to do (disclaimer: typed in, not compiled):

    >
    > (snip)
    >> #define OFFSET (INT_MAX/4 + 1)

    >
    >> unsigned
    >> pack( int x, int f ){
    >> return (unsigned)(x+OFFSET) << 2 | f & 0x3u;

    >
    > I suppose that is true, but for values in range, and the
    > appropriate arithmetic shift operations, shouldn't it also
    > work with signed int, arithmetic shift of those signed int
    > values, and the appropriate inverse?


    No, the OP told us that some values of x for which x<0 are in range, and
    the behavior of x<<2 is undefined for such values.

    > The OP assured us that (x<<2)>>2==x, which should be true for


    He said that he was assuming this, not that he had verified it. Since
    the behavior is undefined, no matter how many tests he might have
    performed to verify that relationship, a fully conforming implementation
    is free to generate code that produces very different results on the
    very next test.

    More to the point, because the behavior is undefined, a fully conforming
    implementation is free to generate code for

    temp = x<<2, temp>>2 == x

    which fails for x<0, even though the "equivalent" code with no explicit
    temporary variable apparently "succeeded". It could, for instance,
    perform optimizations that invalidated the supposed equivalence of the
    two pieces of code; so long as it's invalidated only for those values
    where the behavior is undefined.

    > in range x and arithmetic shift on sign magnitude, ones
    > complement, and twos complement machines, not that he is likely
    > to run into the first two.


    Because the behavior of x<<2 for x<0 is undefined, I've never bothered
    finding out what actual behavior occurs when such code is executed.
    You're making some assumptions about that behavior, and your assumptions
    might be right for the implementations you're familiar with, and maybe
    even for the ones that Alex J needs his code to work on. However, I
    doubt that the Committee would have made the behavior undefined if that
    were universally true.
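
    For reference, a minimal sketch of the offset approach quoted above, written so that no negative value is ever shifted. The quoted fragment only shows pack(); the two unpack helpers below are hypothetical, and the sketch assumes x lies in [-OFFSET, OFFSET - 1] and that unsigned has at least one more value bit than int:

```c
#include <assert.h>
#include <limits.h>

/* Assumption: x is restricted to [-OFFSET, OFFSET - 1], so x + OFFSET is
   non-negative and still fits in unsigned after shifting left by 2. */
#define OFFSET (INT_MAX / 4 + 1)

/* Pack: bias x into a non-negative range, then shift as unsigned. */
unsigned pack(int x, int f)
{
    assert(x >= -OFFSET && x <= OFFSET - 1);
    return ((unsigned)(x + OFFSET) << 2) | ((unsigned)f & 0x3u);
}

/* Hypothetical inverse (not part of the quoted fragment): undo the shift
   on the unsigned value, then remove the bias in signed arithmetic. */
int unpack_x(unsigned p)
{
    return (int)(p >> 2) - OFFSET;
}

/* Hypothetical helper: the two flag bits are the low bits of p. */
int unpack_f(unsigned p)
{
    return (int)(p & 0x3u);
}
```

    With these, unpack_x(pack(-1, 2)) gives back -1 and unpack_f(pack(-1, 2)) gives back 2, using only unsigned shifts and in-range signed arithmetic.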
     
    James Kuyper, Jun 11, 2013
    #15
  16. Alex J

    Eric Sosman Guest

    On 6/11/2013 4:44 PM, James Kuyper wrote:
    >[...]
    > Because the behavior of x<<2 for x<0 is undefined, I've never bothered
    > finding out what actual behavior occurs when such code is executed.
    > You're making some assumptions about that behavior, and your assumptions
    > might be right for the implementations you're familiar with, and maybe
    > even for the ones that Alex J needs his code to work on. However, I
    > doubt that the Committee would have made the behavior undefined if that
    > were universally true.


    Right. On every platform I've used (well, except perhaps
    one from Long Ago and before C was invented), the left-shift
    would simply have "lost" the extra copies of the sign bit. A
    bigger problem arises on the inverse: When right-shifting to
    extract the negative value, some right-shifts would fill with
    copies of the sign bit (preserving negativity) while others
    would fill with zeroes (exhibiting a positive attitude). Both
    operations are formally undefined by C; I think it's the latter
    that poses the greater practical problem.

    Moral: Don't Do That.

    --
    Eric Sosman
     
    Eric Sosman, Jun 12, 2013
    #16
  17. Alex J

    Philip Lantz Guest

    Eric Sosman wrote:
    > Right. On every platform I've used (well, except perhaps
    > one from Long Ago and before C was invented), the left-shift
    > would simply have "lost" the extra copies of the sign bit. A
    > bigger problem arises on the inverse: When right-shifting to
    > extract the negative value, some right-shifts would fill with
    > copies of the sign bit (preserving negativity) while others
    > would fill with zeroes (exhibiting a positive attitude). Both
    > operations are formally undefined by C.


    Actually, the result of a right shift of a negative value is
    implementation defined; a left shift of a negative value is undefined
    behavior. (A distinction without a difference, I know.)
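
    For the right shift, the implementation-defined result can be sidestepped: where an arithmetic shift is wanted, floor division gives the same answer using only operations that ISO C defines for negative values. A minimal sketch (the name floor_div4 is made up for illustration):

```c
#include <assert.h>

/* Floor-divide by 4 without right-shifting a negative int.
   The result equals x >> 2 on platforms where >> is an arithmetic shift. */
int floor_div4(int x)
{
    int q;

    if (x >= 0)
        return x / 4;

    /* -(x + 1) is non-negative and cannot overflow, even for INT_MIN. */
    q = -(-(x + 1) / 4) - 1;

    /* Floor division of a negative x always yields a negative quotient. */
    assert(q < 0);
    return q;
}
```

    For example, floor_div4(-5) is -2, whereas plain -5 / 4 truncates toward zero and gives -1.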
     
    Philip Lantz, Jun 12, 2013
    #17
  18. Alex J

    Eric Sosman Guest

    On 6/12/2013 1:44 AM, Philip Lantz wrote:
    > Eric Sosman wrote:
    >> Right. On every platform I've used (well, except perhaps
    >> one from Long Ago and before C was invented), the left-shift
    >> would simply have "lost" the extra copies of the sign bit. A
    >> bigger problem arises on the inverse: When right-shifting to
    >> extract the negative value, some right-shifts would fill with
    >> copies of the sign bit (preserving negativity) while others
    >> would fill with zeroes (exhibiting a positive attitude). Both
    >> operations are formally undefined by C.

    >
    > Actually, the result of a right shift of a negative value is
    > implementation defined; a left shift of a negative value is undefined
    > behavior. (A distinction without a difference, I know.)


    Oops! Thanks for the correction.

    --
    Eric Sosman
     
    Eric Sosman, Jun 12, 2013
    #18
  19. Alex J

    Phil Carmody Guest

    Barry Schwarz <> writes:
    > On Mon, 10 Jun 2013 12:27:43 -0700 (PDT), Alex J <>
    > wrote:
    >
    > >Hi all,
    > >
    > >Given two values: int X and int F and assuming that
    > >(X << 2) >> 2 == X and F is a two-bit value write cross-platform code to "pack" X and F to unsigned int.
    > >
    > >I've solved this as follows.
    > >Packing:
    > >unsigned int P = (unsigned int) ((X << 2) | F);
    > >
    > >Unpacking:
    > >int F = ((int) P) & 3;
    > >int A = ((int) P) >> 2;
    > >
    > >I know that both packing and unpacking code is not valid from ISO C point of view but still the question is - should I care about it?

    >
    > In what way do you think it is invalid?


    Well, if validity is measured in terms of portability, then the right
    shift of (signed) int, being non-portable, makes it invalid. It's a
    valid criterion - not working on some architectures is a pretty good
    reason to call the code invalid.
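
    If code is going to rely on the right shift anyway, one option is to state that assumption at compile time, so that a platform which fills with zeroes fails to build instead of silently misbehaving. A sketch assuming a C11 compiler (static_assert from <assert.h>); high_bits is a hypothetical helper name:

```c
#include <assert.h>

/* Compile-time guard: refuse to build where >> on a negative int
   does not sign-extend (the result is implementation-defined). */
static_assert((-1 >> 1) == -1, "int >> must be an arithmetic shift");
static_assert((-8 >> 2) == -2, "int >> must be an arithmetic shift");

/* With the guard in place, this relies on behavior the implementation
   has been checked to provide rather than on an unstated assumption. */
int high_bits(int packed)
{
    return packed >> 2;
}
```

    This does not make the code portable; it only turns the non-portability into a build error on platforms that behave differently.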

    Phil
    --
    If "law-abiding citizens have nothing to fear" from privacy-invading
    technologies and policies, then law-abiding governments should have
    nothing to fear from whistleblowers.
     
    Phil Carmody, Jun 12, 2013
    #19
  20. Alex J

    Tim Rentsch Guest

    Philip Lantz <> writes:

    > Eric Sosman wrote:
    >> Right. On every platform I've used (well, except perhaps
    >> one from Long Ago and before C was invented), the left-shift
    >> would simply have "lost" the extra copies of the sign bit. A
    >> bigger problem arises on the inverse: When right-shifting to
    >> extract the negative value, some right-shifts would fill with
    >> copies of the sign bit (preserving negativity) while others
    >> would fill with zeroes (exhibiting a positive attitude). Both
    >> operations are formally undefined by C.

    >
    > Actually, the result of a right shift of a negative value is
    > implementation defined; a left shift of a negative value is undefined
    > behavior. (A distinction without a difference, I know.)


    Actually there's a big difference. It may be rare that the
    difference has a significant effect, but it can. Doing a left
    shift of a negative value can easily produce a trap representation
    (obviously only in implementations that have trap representations
    for signed integers). This may not occur often, but certainly it
    is not unheard of. So the 'undefined behavior' consequences are
    not just imaginary. By contrast, a right shift of a negative
    value must produce some valid value -- it can't just blow up the
    way a left shift of negative values can.
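
    One way to keep the left shift out of undefined territory is to test the operand first. A minimal sketch, with hypothetical helper names, that accepts only the values for which ISO C defines x << 2:

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* x << 2 is defined only when x is non-negative and x * 4 is
   representable in int. */
bool shl2_is_defined(int x)
{
    return x >= 0 && x <= INT_MAX / 4;
}

/* Shift only after checking the precondition; the assert documents it
   and catches violations in debug builds. */
int shl2(int x)
{
    assert(shl2_is_defined(x));
    return x << 2;
}
```

    Negative operands are rejected up front here, so a caller has to handle them some other way (for instance by biasing them into the non-negative range) before shifting.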
     
    Tim Rentsch, Jun 15, 2013
    #20