Bit and Byte Order Portability

hmoulding

This may be a FAQ, in which case you all may (probably will) yell at
me.
I haven't coded in plain C for almost 20 years, so I hope the following
code is actually done right.

Suppose the following declaration:

union {
unsigned int uvalue;
unsigned nybbles:4[8];
unsigned bits:1[32];
struct {
unsigned nybble0:4;
unsigned nybble1:4;
unsigned nybble2:4;
unsigned nybble3:4;
unsigned nybble4:4;
unsigned nybble5:4;
unsigned nybble6:4;
unsigned nybble7:4;
} nybblepack;
struct {
unsigned bit0:1;
unsigned bit1:1;
unsigned bit2:1;
unsigned bit3:1;
unsigned bit4:1;
unsigned bit5:1;
unsigned bit6:1;
unsigned bit7:1;
// ...
} bitpack;
} testorder;

and the following code

testorder.uvalue = 0x12345678;
for ( i=0; i<8; i++ ) printf("nybbles[%i] = %i", i, testorder.nybbles[i]);
for ( i=0; i<32; i++ ) printf("bits[%i] = %i", i, testorder.bits[i]);
printf("nybble0 = %i", testorder.nybblepack.nybble0);
printf("nybble1 = %i", testorder.nybblepack.nybble1);
printf("nybble2 = %i", testorder.nybblepack.nybble2);
printf("nybble3 = %i", testorder.nybblepack.nybble3);
printf("nybble4 = %i", testorder.nybblepack.nybble4);
printf("nybble5 = %i", testorder.nybblepack.nybble5);
printf("nybble6 = %i", testorder.nybblepack.nybble6);
printf("nybble7 = %i", testorder.nybblepack.nybble7);
printf("bit0 = %i", testorder.bitpack.bit0);
printf("bit1 = %i", testorder.bitpack.bit1);
printf("bit2 = %i", testorder.bitpack.bit2);
printf("bit3 = %i", testorder.bitpack.bit3);
printf("bit4 = %i", testorder.bitpack.bit4);
printf("bit5 = %i", testorder.bitpack.bit5);
printf("bit6 = %i", testorder.bitpack.bit6);
printf("bit7 = %i", testorder.bitpack.bit7);
//...

Assuming a standards compliant compiler, would it produce the same
output on all machines? (I suspect not, if for no other reason than
that an int may be larger than what's necessary to store 2^32-1.)

Would it produce the same output on the same machine with different
compliant compilers?

Can anyone refer me to the relevant section in the standard?
 
Eric Sosman

This may be a FAQ, in which case you all may (probably will) yell at
me.
I haven't coded in plain C for almost 20 years, so I hope the following
code is actually done right.

Suppose the following declaration:

union {
unsigned int uvalue;
unsigned nybbles:4[8];
unsigned bits:1[32];

Bzzt! The compiler rejects this line, because you
can't make arrays of bit-fields. (... because you can't
make a pointer to a bit-field, and C array indexing is
defined in terms of pointer arithmetic.) Let's just
pretend this line and the related loop below are deleted.

struct {
unsigned nybble0:4;
unsigned nybble1:4;
unsigned nybble2:4;
unsigned nybble3:4;
unsigned nybble4:4;
unsigned nybble5:4;
unsigned nybble6:4;
unsigned nybble7:4;
} nybblepack;
struct {
unsigned bit0:1;
unsigned bit1:1;
unsigned bit2:1;
unsigned bit3:1;
unsigned bit4:1;
unsigned bit5:1;
unsigned bit6:1;
unsigned bit7:1;
// ...
} bitpack;
} testorder;

and the following code

testorder.uvalue = 0x12345678;
for ( i=0; i<8; i++ ) printf("nybbles[%i] = %i", i, testorder.nybbles[i]);
for ( i=0; i<32; i++ ) printf("bits[%i] = %i", i, testorder.bits[i]);


We're pretending this line is deleted, right?

printf("nybble0 = %i", testorder.nybblepack.nybble0);
printf("nybble1 = %i", testorder.nybblepack.nybble1);
printf("nybble2 = %i", testorder.nybblepack.nybble2);
printf("nybble3 = %i", testorder.nybblepack.nybble3);
printf("nybble4 = %i", testorder.nybblepack.nybble4);
printf("nybble5 = %i", testorder.nybblepack.nybble5);
printf("nybble6 = %i", testorder.nybblepack.nybble6);
printf("nybble7 = %i", testorder.nybblepack.nybble7);
printf("bit0 = %i", testorder.bitpack.bit0);
printf("bit1 = %i", testorder.bitpack.bit1);
printf("bit2 = %i", testorder.bitpack.bit2);
printf("bit3 = %i", testorder.bitpack.bit3);
printf("bit4 = %i", testorder.bitpack.bit4);
printf("bit5 = %i", testorder.bitpack.bit5);
printf("bit6 = %i", testorder.bitpack.bit6);
printf("bit7 = %i", testorder.bitpack.bit7);
//...

Assuming a standards compliant compiler, would it produce the same
output on all machines? (I suspect not, if for no other reason than
that an int may be larger than what's necessary to store 2^32-1.)

Or smaller, for that matter. And, no: You will not
necessarily get the same output on all implementations.
In theory, at least, you might not get any output at all:
storing into one member of a union and then reading from
a different member produces undefined behavior.

Even if "all goes well" you won't always get the same
output. There are "endianness" issues, there's the question
of how the compiler decides to arrange the bit-fields (it
has quite a lot of freedom), there's possible padding within
the structs that are members ...
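
If the goal is simply to get at the nybbles and bits of a value, shifting and masking avoids all of those layout questions, because the operators work on the value rather than on its storage. A minimal sketch, assuming the value fits in 32 bits and that index 0 means the least significant end (the names get_nybble, get_bit and print_nybbles are made up for illustration and are not from the original post):

```c
#include <assert.h>
#include <stdio.h>

/* Nybble number `index` of `value`, where index 0 is the least
   significant nybble. unsigned long is guaranteed to hold 32 bits. */
unsigned int get_nybble(unsigned long value, unsigned int index)
{
    assert(index < 8);
    return (unsigned int)((value >> (4 * index)) & 0xFUL);
}

/* Bit number `index` of `value`, where index 0 is the least
   significant bit. */
unsigned int get_bit(unsigned long value, unsigned int index)
{
    assert(index < 32);
    return (unsigned int)((value >> index) & 1UL);
}

/* Print all eight nybbles of a 32-bit value, lowest first. */
void print_nybbles(unsigned long value)
{
    unsigned int i;
    for (i = 0; i < 8; i++)
        printf("nybble%u = %u\n", i, get_nybble(value, i));
}
```

Calling print_nybbles(0x12345678UL) should report the same digits on any conforming implementation, since nothing here depends on byte order or on how the compiler lays out bit-fields.
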
Would it produce the same output on the same machine with different
compliant compilers?

The Standard speaks only of "the implementation," and
doesn't divide it into separate components. We may think of
the compiler, the library, the O/S, and the hardware as distinct
pieces of the implementation, but as far as the Standard is
concerned changing any of these pieces gives a different
"implementation" altogether. The Standard does not require
different implementations to agree on such details.

Can anyone refer me to the relevant section in the standard?

You've asked about things that are spread fairly widely
through the Standard, so this list is incomplete:

6.2.5 Types
6.2.6 Representations of types
6.7.2.1 Structure and union specifiers
 
George Wicks

Emmmm.. bit fields are pretty much machine-dependent..
From "C: A Reference Manual" by Samuel Harbison and Guy Steele:

"Bit fields are typically used in machine-dependent programs that
must force a data structure to correspond to a fixed hardware
representation.."

Now, your printf() output would probably look the same across
different platforms, but if you were to create a binary file on one
machine using a bitfield structure, it might not be read back the
same way on a different machine platform.
 
Helge Moulding

Eric said:
unsigned bits:1[32];
Bzzt! The compiler rejects this line, because you
can't make arrays of bit-fields. (... because you can't
make a pointer to a bit-field, and C array indexing is
defined in terms of pointer arithmetic.)

I knew that. I mean, I knew that the standard doesn't define
pointers to bit fields (I think implementations may have them),
but I didn't think through the necessary implications for
arrays.

And, no: You will not necessarily get the same output on all
implementations.

OK, that's what I thought I remembered. Each implementation
has to have its own hardware specific stuff if you want to
mess around at the hardware level.

In theory, at least, you might not get any output at all:
storing into one member of a union and then reading from
a different member produces undefined behavior.

I suppose that makes sense, but it's difficult in that case
to explain the use of unions in the first place, isn't it?

You've asked about things that are spread fairly widely
through the Standard, so this list is incomplete:
6.2.5 Types
6.2.6 Representations of types
6.7.2.1 Structure and union specifiers

All the same, this is more than I was able to figure out
after poking around with Google for an hour. Thanks a bunch!
 
Eric Sosman

Helge said:
I suppose that makes sense, but it's difficult in that case
to explain the use of unions in the first place, isn't it?

Unions are mostly space-savers, like "variant records"
in Pascal:

struct shape {
enum { RECTANGLE, SQUARE, ELLIPSE, CIRCLE } type;
union {
struct { double wide, tall; } rectangle;
struct { double side; } square;
struct { double major, minor; } ellipse;
struct { double radius; } circle;
} data;
};
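
The space-saving use stays well defined as long as the code only reads the member that was most recently stored, with the type tag recording which one that is. A sketch of how such a struct might be consumed, repeating the declaration so the block stands alone (shape_area is a hypothetical helper, and treating major and minor as semi-axes is an assumption made here, not something stated above):

```c
#include <assert.h>

struct shape {
    enum { RECTANGLE, SQUARE, ELLIPSE, CIRCLE } type;
    union {
        struct { double wide, tall; } rectangle;
        struct { double side; } square;
        struct { double major, minor; } ellipse;
        struct { double radius; } circle;
    } data;
};

/* Area of a shape. The tag decides which union member is read, so
   only the member that was last stored is ever accessed. */
double shape_area(const struct shape *s)
{
    const double pi = 3.14159265358979323846;

    assert(s != 0);
    switch (s->type) {
    case RECTANGLE:
        return s->data.rectangle.wide * s->data.rectangle.tall;
    case SQUARE:
        return s->data.square.side * s->data.square.side;
    case ELLIPSE:
        /* Assumes major and minor are semi-axis lengths. */
        return pi * s->data.ellipse.major * s->data.ellipse.minor;
    case CIRCLE:
        return pi * s->data.circle.radius * s->data.circle.radius;
    default:
        assert(!"unknown shape type");
        return 0.0;
    }
}
```

The caller sets type and the matching member together, for example type = SQUARE along with data.square.side, and never reads any other member until the tag changes.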

If you really want to peek and poke at the representations
of data objects (C programmers seem to indulge in this far
more often than they actually need to), the sanctioned way
is to use an `unsigned char*' to inspect/adjust the bytes
individually:

/* How is an `unsigned int' arranged? */
unsigned int data = 0x12345678;
unsigned char *p = (unsigned char*) &data;
printf ("0x%08X =>", data);
while (p < (unsigned char*)(&data + 1))
printf (" %02X", (unsigned int)*p++);
printf ("\n");

(There are some non-portable assumptions built into this
code, but if they're wrong you'll just get ugly output, not
anything "really bad.")
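
The same byte-by-byte inspection can be turned into a small byte-order probe. This is only a sketch: it assumes unsigned int has no padding bits, and the enum and function names are invented for illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

enum byte_order { ORDER_LITTLE, ORDER_BIG, ORDER_OTHER };

/* Classify how this implementation stores an unsigned int. */
enum byte_order detect_byte_order(void)
{
    unsigned int probe = 0;
    unsigned char bytes[sizeof probe];
    size_t n = sizeof probe;
    size_t i;
    int little = 1;
    int big = 1;

    if (n == 1)
        return ORDER_OTHER;   /* a one-byte int has no byte order */

    /* Store the value i+1 in the i-th least significant byte.
       Assumes unsigned int has no padding bits. */
    for (i = 0; i < n; i++)
        probe |= (unsigned int)(i + 1) << (i * CHAR_BIT);

    /* Copy out the object representation, lowest address first. */
    memcpy(bytes, &probe, n);

    for (i = 0; i < n; i++) {
        /* Under the no-padding assumption each byte is one of 1..n. */
        assert(bytes[i] >= 1 && bytes[i] <= n);
        if (bytes[i] != i + 1)
            little = 0;       /* not lowest-byte-first */
        if (bytes[i] != n - i)
            big = 0;          /* not highest-byte-first */
    }

    if (little)
        return ORDER_LITTLE;
    if (big)
        return ORDER_BIG;
    return ORDER_OTHER;       /* some mixed ("middle-endian") order */
}
```

A result like this is best used for diagnostics only; code that needs a fixed external format is usually better off never depending on the native order at all.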
 
Chris Croughton

The following article about byte alignment and ordering should help:

http://www.eventhelix.com/RealtimeMantra/ByteAlignmentAndOrdering.htm

It's a good illustration. It does however miss one thing -- apart from
char, the integer types in C aren't specified as to size (except that
short is at least as big as char, int is at least as big as short, and
long is at least as big as int; and short and int are at least 16 bits
and long is at least 32 bits). So specifying a structure as:

struct packet
{
long ll;
short ss;
char cc;
};

is still not defined even if you know that the byte ordering and
alignment are correct, because on one machine long might be 32 bits and
on another 64 bits (or even 36, 40, 48 or even stranger sizes). For
that matter a char may be anything at least 8 bits (9, 12, 24...).
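
To see what a particular implementation actually provides, the storage sizes can be printed from sizeof and CHAR_BIT. A small sketch (the macro and function names are made up for illustration; storage size can include padding bits, so it is an upper bound on the number of value bits):

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

/* Storage size of a type in bits, padding bits included. */
#define STORAGE_BITS(type) ((unsigned long)(sizeof(type) * CHAR_BIT))

/* The minimums described above; these hold on any conforming
   implementation, so the asserts should never fire. */
void check_minimum_sizes(void)
{
    assert(CHAR_BIT >= 8);
    assert(STORAGE_BITS(short) >= 16);
    assert(STORAGE_BITS(int) >= 16);
    assert(STORAGE_BITS(long) >= 32);
}

/* Report what this implementation uses for each type. */
void print_integer_sizes(void)
{
    printf("char : %lu bits\n", STORAGE_BITS(char));
    printf("short: %lu bits\n", STORAGE_BITS(short));
    printf("int  : %lu bits\n", STORAGE_BITS(int));
    printf("long : %lu bits\n", STORAGE_BITS(long));
}
```

The numbers printed will differ between implementations, which is exactly why a struct written out as raw bytes is not a portable file or wire format.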

The only way to deal with structures portably is to convert them "by
hand". I usually have routines:

void pack_1_octet(unsigned char *buffer, unsigned char val);
void pack_2_octet(unsigned char *buffer, unsigned int val);
void pack_4_octet(unsigned char *buffer, unsigned long val);

unsigned char unpack_1_octet(unsigned char *buffer);
unsigned int unpack_2_octet(unsigned char *buffer);
unsigned long unpack_4_octet(unsigned char *buffer);

or something like that, so that the byte order and size remains
constant whatever the compiler and target.
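
One possible shape for the 2- and 4-octet routines, as a sketch only: the most-significant-octet-first ("network") order is an assumption made here, since the post does not say which order is used, and the unpack functions take a const pointer, which differs slightly from the prototypes above.

```c
#include <assert.h>
#include <stddef.h>

/* Store the low 16 bits of val as two octets, most significant first. */
void pack_2_octet(unsigned char *buffer, unsigned int val)
{
    assert(buffer != NULL);
    buffer[0] = (unsigned char)((val >> 8) & 0xFFu);
    buffer[1] = (unsigned char)(val & 0xFFu);
}

/* Store the low 32 bits of val as four octets, most significant first. */
void pack_4_octet(unsigned char *buffer, unsigned long val)
{
    assert(buffer != NULL);
    buffer[0] = (unsigned char)((val >> 24) & 0xFFul);
    buffer[1] = (unsigned char)((val >> 16) & 0xFFul);
    buffer[2] = (unsigned char)((val >> 8) & 0xFFul);
    buffer[3] = (unsigned char)(val & 0xFFul);
}

/* Rebuild a 16-bit value. The masks keep only 8 bits per element in
   case unsigned char is wider than an octet on this implementation. */
unsigned int unpack_2_octet(const unsigned char *buffer)
{
    assert(buffer != NULL);
    return ((unsigned int)(buffer[0] & 0xFFu) << 8)
         | (unsigned int)(buffer[1] & 0xFFu);
}

/* Rebuild a 32-bit value from four octets. */
unsigned long unpack_4_octet(const unsigned char *buffer)
{
    assert(buffer != NULL);
    return ((unsigned long)(buffer[0] & 0xFFu) << 24)
         | ((unsigned long)(buffer[1] & 0xFFu) << 16)
         | ((unsigned long)(buffer[2] & 0xFFu) << 8)
         | (unsigned long)(buffer[3] & 0xFFu);
}
```

Because the octets are produced by shifting the value, the buffer contents are the same whatever the host's byte order or integer sizes, and a struct such as packet can be written field by field at fixed offsets.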

Chris C
 
