Portability issues (union, bitfields)

Noob · Nov 4, 2009

Hello,

I'm dealing with a library whose author seems to have relied implicitly
on several non-portable features. I'm trying to expose every assumption.
(In the context of C89).

typedef struct {
    unsigned int aa : 1;
    unsigned int bb : 1;
    unsigned int cc : 1;
    unsigned int dd : 1;
    unsigned int ee : 1;
    unsigned int ff : 1;
    unsigned int gg : 1;
    unsigned int hh : 1;
    unsigned int reserved : 24;
} bitmap_t;

typedef union {
    unsigned char uc;
    bitmap_t map;
} foo_t;

unsigned frob(unsigned u)
{
    foo_t foo;
    foo.map.aa = (u >> 0) & 1;
    foo.map.bb = (u >> 1) & 1;
    foo.map.cc = (u >> 2) & 1;
    foo.map.dd = (u >> 3) & 1;
    foo.map.ee = (u >> 4) & 1;
    foo.map.ff = (u >> 5) & 1;
    foo.map.gg = (u >> 6) & 1;
    foo.map.hh = (u >> 7) & 1;
    /* garbage in foo.map.reserved */
    return foo.uc;
}

int main(void)
{
    unsigned res;
    /* my tests */
    res = frob(1);    /* 0000 0001 */
    res = frob(0x40); /* 0100 0000 */
    res = frob(0xAC); /* 1010 1100 */
    res = frob(0x35); /* 0011 0101 */
    return 0;
}

1. The definition of bitmap_t seems to imply that the author thinks
(unsigned int) is at least (or exactly) 32 bits wide.
I think we get UB on platforms where (unsigned int) is only 16 bits
wide? Even if the "reserved" field is never accessed?

2. (I'm not sure about this one.) foo is written to using the map field,
then read from using a different field. This specific instance might be
OK, because uc's type is unsigned char?

3. The code seems to assume field "aa" maps to the least-significant bit
of the parameter (bit 0), "bb" maps to bit 1, etc.
Consider frob(1);
foo.map.aa <- 1
all other fields <- 0
reserved is left undefined

The assumption seems to be:
For any 0 <= u <= 255, frob(u) == u

Moreover, "uc" will typically be only 8 bits wide, while "map" is
32 bits wide. Is there any guarantee as to which bits of "uc" and
"map" overlap? (May depend on endianness?)

Did I miss any more assumptions?

Regards.

Eric Sosman · Nov 4, 2009

Noob said:
1. The definition of bitmap_t seems to imply that the author thinks
(unsigned int) is at least (or exactly) 32 bits wide.

I don't believe so. He assumes that an int is at least 24
bits wide, but I don't see an assumption of any specific >=24-bit
width.

I think we get UB on platforms where (unsigned int) is only 16 bits
wide? Even if the "reserved" field is never accessed?

Since a bit-field cannot (portably) be wider than an int,
and since an int can be as narrow as sixteen bits, it's possible
that the attempt to declare a 24-bit field may fail.

2. (I'm not sure about this one.) foo is written to using the map field,
then read from using a different field. This specific instance might be
OK, because uc's type is unsigned char?

The value of foo.uc is unspecified. Since unsigned char has no
trap representations you won't get UB by fetching it, but there's
no telling what value you'll get.

3. The code seems to assume field "aa" maps to the least-significant bit
of the parameter (bit 0), "bb" maps to bit 1, etc.

The assumption is unwarranted, but I'm not sure the code
really makes the assumption. We'd have to know something about
the expected/intended inputs and outputs to know what the
assumptions are. Maybe this code is used to discover something
about the way a particular compiler lays out bit-fields?

Consider frob(1);
foo.map.aa <- 1
all other fields <- 0
reserved is left undefined

The assumption seems to be:
For any 0 <= u <= 255, frob(u) == u

As above, I can't tell what's being assumed. If that's the
desired transformation, there are portable (and easier!) ways
to achieve it.

Moreover, "uc" will typically be only 8 bits wide, while "map" is
32 bits wide. Is there any guarantee as to which bits of "uc" and
"map" overlap? (May depend on endianness?)

No guarantees, or at any rate very few. All we know is:

1) The bit-fields are packed into "addressable storage
   units" of a size that the compiler chooses (but is not
   required to document, as far as I know).

2) Since an ASU is at least one byte long and aa...hh will
   all fit in one byte, they will occupy the same ASU.

3) If the ASU has at least 32 bits (and if int has at least
   24), reserved will occupy the same ASU as aa...hh.

3a) Otherwise, reserved may occupy an ASU of its own, or
    may "straddle" multiple adjacent ASUs, possibly using
    part of the ASU containing aa...hh.

4) We know that the ASU(s) containing reserved will not
   precede the ASU containing aa...hh.

We don't know the size of the ASU (I think the ASUs for
different bit-fields may even have different sizes), we
don't know which of an ASU's bits are used for which fields,
we don't know whether the fields are "tightly" or "loosely"
packed, and we don't know what values any "slack" bits in
the ASUs might take.
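
Since the layout is up to the implementation, one way to find out what a particular compiler does is to probe it. The following is a rough sketch, not code from the library: probe_t and probe_field() are names invented here, and the struct leaves out the 24-bit "reserved" member so that it still compiles where int has only 16 bits.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical probe type: the eight 1-bit fields of bitmap_t,
   without the 24-bit "reserved" member. */
typedef struct {
    unsigned int aa : 1;
    unsigned int bb : 1;
    unsigned int cc : 1;
    unsigned int dd : 1;
    unsigned int ee : 1;
    unsigned int ff : 1;
    unsigned int gg : 1;
    unsigned int hh : 1;
} probe_t;

/* Zero a probe_t, set exactly one field (0 = aa ... 7 = hh), and copy
   the resulting bytes into out. Returns the number of bytes copied. */
size_t probe_field(int which, unsigned char *out, size_t out_size)
{
    probe_t p;

    assert(which >= 0 && which <= 7);
    assert(out != NULL && out_size >= sizeof p);

    memset(&p, 0, sizeof p);
    switch (which) {
    case 0: p.aa = 1; break;
    case 1: p.bb = 1; break;
    case 2: p.cc = 1; break;
    case 3: p.dd = 1; break;
    case 4: p.ee = 1; break;
    case 5: p.ff = 1; break;
    case 6: p.gg = 1; break;
    default: p.hh = 1; break;
    }
    /* Reading the object representation as unsigned char is always allowed. */
    memcpy(out, &p, sizeof p);
    return sizeof p;
}
```

Calling probe_field() for each index from 0 to 7 and printing the bytes in hex shows which bit of which byte each field lands in. The answer describes one compiler with one set of options and nothing more.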

Did I miss any more assumptions?

Hard to tell. It would be helpful to know what the code
is trying to do, or thinks it's trying to do.

Noob · Nov 5, 2009

Eric said:
I don't believe so. He assumes that an int is at least 24
bits wide, but I don't see an assumption of any specific >=24-bit
width.

You are correct when you say the code only assumes width >= 24, but IMO,
it is clear that, in the author's mind, width == 32 and he is padding
the struct "by hand" to fill the 32 bits.

I don't think it has ever crossed the author's mind that an int could be
24 bits wide. (IMHO, not many people who call themselves "C programmers"
are aware that an int could be 24 bits wide.)

Since a bit-field cannot (portably) be wider than an int,
and since an int can be as narrow as sixteen bits, it's possible
that the attempt to declare a 24-bit field may fail.

What does it mean for a declaration to fail?
Compiler warning then UB?

The value of foo.uc is unspecified. Since unsigned char has no
trap representations you won't get UB by fetching it, but there's
no telling what value you'll get.

Do you agree that, on some platforms, the bits in the union will map to
the bits of foo.uc? Do you say its value is unspecified because this
might not be the case (point 3) or for some other reason?

The assumption is unwarranted, but I'm not sure the code
really makes the assumption. We'd have to know something about
the expected/intended inputs and outputs to know what the
assumptions are. Maybe this code is used to discover something
about the way a particular compiler lays out bit-fields?

You're right, I did leave out some critical piece of information (a
comment) which stated:

/* Bit 0 : aa is used for X
   Bit 1 : bb is used for Y
   ... */

The bit mask is used by the library user to request specific features.

For example, if the user wants features aa and cc, then he calls

frob(1<<0 | 1<<2);

Then frob does the little dance with the bit field, but needs to pass a
bit mask down the chain. The obvious answer would be to never introduce
any bit fields, and to work with the masks all the way (as was done
before), but (IMO) the author is convinced that the new code is easier
to maintain because the meaning of each bit is spelled out in the
field's name. This happens to work because GCC packs the bits
least-significant-bit-first, but it will break with a vengeance if we ever
move to a different compiler, or if GCC suddenly changes the bit order
(though that seems rather unlikely).

As above, I can't tell what's being assumed. If that's the
desired transformation, there are portable (and easier!) ways
to achieve it.

Yes, wrapper macros seem to nicely solve the problem of portability and
maintainability. I was told that bit fields are "nicer" to debug. (They
may have a point, but nicer at the cost of hell breaking loose when we
change compilers seems like a hefty price to pay.)
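
The mask-only approach with wrapper macros could look roughly like this. It is a minimal sketch under assumptions, not the library's code: the FEATURE_* names, the three helper macros, and frob_mask() are hypothetical, and only the bit positions (aa is bit 0, bb is bit 1, and so on up to hh at bit 7) come from the thread.

```c
#include <assert.h>

/* Hypothetical names; one mask per feature bit described in the thread. */
#define FEATURE_AA  (1u << 0)
#define FEATURE_BB  (1u << 1)
#define FEATURE_CC  (1u << 2)
#define FEATURE_DD  (1u << 3)
#define FEATURE_EE  (1u << 4)
#define FEATURE_FF  (1u << 5)
#define FEATURE_GG  (1u << 6)
#define FEATURE_HH  (1u << 7)
#define FEATURE_ALL 0xFFu

/* Wrapper macros that keep the meaning of each bit visible at the call site. */
#define FEATURE_IS_SET(mask, f) (((mask) & (f)) != 0)
#define FEATURE_SET(mask, f)    ((mask) | (f))
#define FEATURE_CLEAR(mask, f)  ((mask) & ~(f))

/* Portable counterpart of frob(): keep the eight defined bits, drop the rest.
   No bit-fields and no union, so the result does not depend on layout. */
unsigned frob_mask(unsigned u)
{
    unsigned m = u & FEATURE_ALL;

    assert(m <= 0xFFu); /* the thread's expectation: result fits in 8 bits */
    return m;
}
```

A caller asking for features aa and cc would write frob_mask(FEATURE_AA | FEATURE_CC), which reads about as clearly as the field names do.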

Eric Sosman · Nov 5, 2009

Noob said:
Eric said:

Noob said:

[... bit-fields in a struct, union-punned with unsigned char ...]
1. The definition of bitmap_t seems to imply that the author thinks
(unsigned int) is at least (or exactly) 32 bits wide.

I don't believe so. He assumes that an int is at least 24
bits wide, but I don't see an assumption of any specific >=24-bit
width.

You are correct when you say the code only assumes width >= 24, but IMO,
it is clear that, in the author's mind, width == 32 and he is padding
the struct "by hand" to fill the 32 bits.

Yes, the author almost certainly had "thirty-two" somewhere
in the back of his brain. But since the code makes no use of the
24-bit padding field, I'm still not sure that the assumption of
a 32-bit int is really relevant.

I don't think it has ever crossed the author's mind that an int could be
24 bits wide. (IMHO, not many people who call themselves "C programmers"
are aware that an int could be 24 bits wide.)

It has been many years since I used a machine with 24-bit
words (four six-bit characters per word). I doubt we'll see
such things again in general-purpose machines. (Special-purpose
hardware may be a different story.)

What does it mean for a declaration to fail?
Compiler warning then UB?

With a 16-bit int, say, the `unsigned int reserved : 24;'
struct member would be an invalid declaration. It would "fail"
in the same way that `int array[-42];' would "fail."

The exact requirement is that the specified width shall not
exceed the width of the bit-field's base type, and it's in a
Constraints section (6.7.2.1p3) so a diagnostic is required for
violations. 6.7.2.1p4 goes on to list the allowable base types:
_Bool (C99 only), the two flavors of int, and "some other
implementation-defined type." No diagnostic is required for
the use of base types beyond the required three -- but the
implementation is not obliged to accept them, either.

As far as I can see, there is no 100% portable way to
specify a 24-bit bit-field. The widest 100% portable base type
is int, and int could be as narrow as 16 bits, and you're stuck.
You could specify `unsigned int reserved : 24;' and hope int is
wide enough, or you could write `unsigned long reserved : 24;'
and hope the implementation accepts long (necessarily >=32 bits),
but your hopes might be dashed either way.
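
If the 24-bit member has to stay, the width constraint can at least be checked before the compiler reaches the declaration. The sketch below is one possible arrangement, assuming <limits.h> is available as in any hosted C89 implementation; UINT_HOLDS_24_BITS, guarded_bitmap_t, uint_value_bits() and check_width_guard() are names made up for this example.

```c
#include <assert.h>
#include <limits.h>

/* 1 if unsigned int can hold a 24-bit field, 0 otherwise.
   0xFFFFFF is the largest 24-bit value. */
#if UINT_MAX >= 0xFFFFFFUL
#define UINT_HOLDS_24_BITS 1
#else
#define UINT_HOLDS_24_BITS 0
#endif

typedef struct {
    unsigned int aa : 1;
    unsigned int bb : 1;
    unsigned int cc : 1;
    unsigned int dd : 1;
    unsigned int ee : 1;
    unsigned int ff : 1;
    unsigned int gg : 1;
    unsigned int hh : 1;
#if UINT_HOLDS_24_BITS
    unsigned int reserved : 24; /* declared only where the constraint holds */
#endif
} guarded_bitmap_t;

/* Count the value bits of unsigned int at run time. */
int uint_value_bits(void)
{
    unsigned int v = UINT_MAX;
    int bits = 0;

    while (v != 0) {
        bits++;
        v >>= 1;
    }
    return bits;
}

/* Cross-check: the preprocessor test and the run-time count must agree. */
void check_width_guard(void)
{
    assert((uint_value_bits() >= 24) == (UINT_HOLDS_24_BITS != 0));
}
```

Replacing the #else branch with an #error directive would reject the build outright instead of silently dropping the member. Which of the two is preferable depends on whether anything relies on the struct's size.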

Do you agree that, on some platforms, the bits in the union will map to
the bits of foo.uc? Do you say its value is unspecified because this
might not be the case (point 3) or for some other reason?

To the first, yes. To the second, I'm relying on 6.2.6.1p7:

When a value is stored in a member of an object of union
type, the bytes of the object representation that do not
correspond to that member but do correspond to other
members take unspecified values, [...]

In the case at hand, values are stored in the struct member of
a union, and then an unsigned char member is fetched. We know
that the struct's eight 1-bit bit-fields occupy the first
"addressable storage unit" in the struct, and that the ASU is
the first thing in that struct and hence the first thing in the
union. We also know that the unsigned char is the first thing
in the union -- but we don't know how big the ASU is, nor which
of its bits hold the bit-fields. The union's first byte -- the
unsigned char fetched at the end -- might be in an unused part
of the ASU, and the values of the bits that correspond to no
member of the struct are unspecified.

That's how I understand it, anyhow.

You're right, I did leave out some critical piece of information (a
comment) which stated:

/* Bit 0 : aa is used for X
   Bit 1 : bb is used for Y
   ... */

The bit mask is used by the library user to request specific features.

For example, if the user wants features aa and cc, then he calls

frob(1<<0 | 1<<2);

Then frob does the little dance with the bit field, but needs to pass a
bit mask down the chain. The obvious answer would be to never introduce
any bit fields, and to work with the masks all the way (as was done
before), but (IMO) the author is convinced that the new code is easier
to maintain because the meaning of each bit is spelled out in the
field's name. This happens to work because GCC packs the bits
least-significant-bit-first, but it will break with a vengeance if we ever
move to a different compiler, or if GCC suddenly changes the bit order
(though that seems rather unlikely).

He could keep using bit-fields (aside from the problematic
24-bit field, which he doesn't seem to need anyhow). It's the
type-punning that makes the trouble: He goes to all this trouble
to set and clear the bit-fields without assuming anything about
their order, and then he messes it up with a different assumption.
Why not just pass the struct and its bit-fields around? (And why
not jettison that 24-bit element, if it's not used?)
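
Where a mask really is needed further down the chain, it can be built from the fields explicitly instead of through the union. This is only a sketch of that idea: features_t, features_from_mask(), features_to_mask() and frob_portable() are hypothetical names, and the unused 24-bit member is dropped as suggested above.

```c
#include <assert.h>
#include <stddef.h>

/* The eight named flags, without the unused 24-bit member. */
typedef struct {
    unsigned int aa : 1;
    unsigned int bb : 1;
    unsigned int cc : 1;
    unsigned int dd : 1;
    unsigned int ee : 1;
    unsigned int ff : 1;
    unsigned int gg : 1;
    unsigned int hh : 1;
} features_t;

/* Unpack bits 0..7 of u into the named fields. */
features_t features_from_mask(unsigned u)
{
    features_t f;

    f.aa = (u >> 0) & 1u;
    f.bb = (u >> 1) & 1u;
    f.cc = (u >> 2) & 1u;
    f.dd = (u >> 3) & 1u;
    f.ee = (u >> 4) & 1u;
    f.ff = (u >> 5) & 1u;
    f.gg = (u >> 6) & 1u;
    f.hh = (u >> 7) & 1u;
    return f;
}

/* Rebuild the mask from field values, never from the object's bytes,
   so the compiler's bit-field layout does not matter. */
unsigned features_to_mask(const features_t *f)
{
    assert(f != NULL);
    return ((unsigned)f->aa << 0) | ((unsigned)f->bb << 1)
         | ((unsigned)f->cc << 2) | ((unsigned)f->dd << 3)
         | ((unsigned)f->ee << 4) | ((unsigned)f->ff << 5)
         | ((unsigned)f->gg << 6) | ((unsigned)f->hh << 7);
}

/* Round trip: for 0 <= u <= 255 this returns u on any conforming compiler. */
unsigned frob_portable(unsigned u)
{
    features_t f;

    f = features_from_mask(u);
    return features_to_mask(&f);
}
```

Code that only needs to test a feature can take a features_t and read f.aa directly, which keeps the readable field names without the type-punning.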

John Temples · Nov 5, 2009

Eric Sosman said:
It has been many years since I used a machine with 24-bit
words (four six-bit characters per word). I doubt we'll see
such things again in general-purpose machines. (Special-purpose
hardware may be a different story.)

Some compilers for 8-bit platforms support a 24-bit integer type via
an extension such as "short long". But I'm not aware of any that
allow "int" to become 24 bits with a compile-time option.

Seebs · Nov 5, 2009

John Temples said:
Some compilers for 8-bit platforms support a 24-bit integer type via
an extension such as "short long". But I'm not aware of any that
allow "int" to become 24 bits with a compile-time option.

I'd doubt you'd see them outside custom DSP work, but I've almost
certainly got a machine in my basement containing a 24-bit DSP.

-s

Walter Banks · Nov 5, 2009

Seebs said:
I'd doubt you'd see them outside custom DSP work, but I've almost
certainly got a machine in my basement containing a 24-bit DSP.

There are quite a few 24-bit processors with an int of 24 bits.

We have 24-bit int data type support on some 8-bit embedded
system compilers. The type is defined as a size-specific int rather
than some weird combination of long and short. 24-bit ints
fit the needs of many applications and significantly reduce cycle
counts and RAM requirements on 8-bit processors.

Walter..

John Temples · Nov 6, 2009

Seebs said:
I'd doubt you'd see them outside custom DSP work,

No, just conventional 8-bit processors. On an 8-bit CPU, working with
24 bits generates less code for math, argument passing, etc., than
working with 32 bits.
