Question about size and memory layout of a Union.

dspfun · Oct 15, 2007

Hi!

I have stumbled on the following C question and I am wondering whether
the answer is implementation-specific or if the answer is always
valid, especially for question b).

BRs!

----------------------------------------------------
The following C code runs on a "little-endian" machine that has 32-bit
word alignment for memory access. An unsigned int is 32 bits long.
struct A
{
unsigned int a1;
char a2;
};

union B
{
unsigned char d[10];
struct A a;
};
union B b;

memset((void*)&b, 0xff, sizeof(b));
memset((void*)&(b.a), 0, sizeof(b.a));
b.a.a1 = 0x01020304;
b.a.a2 = 0x5;

a) What is the size of variable b?
Answer: 12
b) Write down the memory dump of b in the form of hex numbers,
starting from lower address.
Answer: 0x4 0x3 0x2 0x1 0x5 0x0 0x0 0x0 0xff 0xff 0xff 0xff
----------------------------------------------------

Richard Heathfield · Oct 15, 2007

dspfun said:

Hi!

I have stumbled on the following C question and I am wondering whether
the answer is implementation-specific or if the answer is always
valid, especially for question b).

It's implementation-specific in both cases.

The following C code runs on a "little-endian" machine that has 32-bit
word alignment for memory access. An unsigned int is 32 bits long.
struct A
{
unsigned int a1;
char a2;
};

The implementation can put an arbitrary number of padding bytes after any
structure member. There's nothing to stop the implementation putting 32
bits of padding between a1 and a2 if it likes.

union B
{
unsigned char d[10];
struct A a;
};
union B b;

memset((void*)&b, 0xff, sizeof(b));
memset((void*)&(b.a), 0, sizeof(b.a));

The cast is unnecessary in both cases.

b.a.a1 = 0x01020304;
b.a.a2 = 0x5;

a) What is the size of variable b?

sizeof b

Answer: 12

b) Write down the memory dump of b in the form of hex numbers,
starting from lower address.

unsigned char *p = (unsigned char *)*b;
size_t n = sizeof b;
while(n--)
{
printf(" 0x%x", *p++);
}
putchar('\n');

Answer: 0x4 0x3 0x2 0x1 0x5 0x0 0x0 0x0 0xff 0xff 0xff 0xff

Maybe, but maybe not. The Standard does not guarantee this.

dspfun · Oct 15, 2007

It's implementation-specific in both cases.

Thank you for great answers!

Just to make sure:
The question itself specifies that the machine is little-endian, has
32-bit word alignment for memory access, and has a 32-bit unsigned int.
Does that make the answers explicit/valid, or are the answers still
implementation-specific?

BRs!

Richard Heathfield · Oct 15, 2007

dspfun said:

Thank you for great answers!

Just to make sure:
The question itself specifies that the machine is little-endian, has
32-bit word alignment for memory access, and has a 32-bit unsigned int.
Does that make the answers explicit/valid, or are the answers still
implementation-specific?

Well, your answers demonstrate one reasonable response to the code by an
implementation on such a platform. Possibly the *most* reasonable
response. Unfortunately for the purpose of your question, the C Standard
does not require that implementations must be reasonable; it requires only
that they act in accordance with the Standard. And it is certainly
possible - even easy - to imagine answers different to yours that are
still in accordance with the Standard, even on such hardware as you
specify.

Let's just say that I'd be - um... aha! - surprised to learn that an
implementation that targets your platform does so in a way different to
what you clearly expect, unless specifically instructed so to do.

Martien Verbruggen · Oct 15, 2007

dspfun said:

union B
{
unsigned char d[10];
struct A a;
};
union B b;
b) Write down the memory dump of b in the form of hex numbers,
starting from lower address.

unsigned char *p = (unsigned char *)*b;

ITYM

unsigned char *p = (unsigned char *)&b;

Martien

Richard Heathfield · Oct 15, 2007

Martien Verbruggen said:

dspfun said:

union B
{
unsigned char d[10];
struct A a;
};
union B b;
b) Write down the memory dump of b in the form of hex numbers,
starting from lower address.

unsigned char *p = (unsigned char *)*b;

ITYM

unsigned char *p = (unsigned char *)&b;

Er, yes, I do, don't I?

dspfun · Oct 18, 2007

Just to make sure:

Well, your answers demonstrate one reasonable response to the code by an
implementation on such a platform. Possibly the *most* reasonable
response. Unfortunately for the purpose of your question, the C Standard
does not require that implementations must be reasonable; it requires only
that they act in accordance with the Standard. And it is certainly
possible - even easy - to imagine answers different to yours that are
still in accordance with the Standard, even on such hardware as you
specify.

Thanks again for your answers!

Could you give some "easy" examples of how the answers could be
different while still running on the specified hardware?

BRs!

Mark McIntyre · Oct 18, 2007

Thanks again for your answers!

Could you give some "easy" examples of how the answers could be
different while still running on the specified hardware?

Many compilers offer a pragma "pack" which alters how memory is laid
out in unions and structs.
A 16-bit compiler might give a different answer from a 32-bit compiler
on the same hardware (ISTR that MSVC 1.5 and MSVC 4.0 managed this).

Further discussion is probably offtopic here.

--
Mark McIntyre

"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it."
--Brian Kernighan

J. J. Farrell · Oct 19, 2007

Thanks again for your answers!

Could you give some "easy" examples of how the answers could be
different while still running on the specified hardware?

The compiler might put 37 bytes of padding between the two members of
struct A. That would be an odd thing for it to do, but it would be
perfectly valid and fully compliant with the C Standard.

dspfun · Oct 19, 2007

Thank you for your answers!

To try to conclude by referring to the sections of the C Standard, I
believe question a) above is implementation-defined due to the
following text in the C Standard:
---------------------------------------------
6.5.3.4 The sizeof operator
4. The value of the result is implementation-defined, and its type (an
unsigned integer type) is size_t, defined in <stddef.h> (and other
headers).
---------------------------------------------

Question b) is implementation defined due to the following text in the
C-standard:
----------------------------------------------
6.2.6.1 General
6 When a value is stored in an object of structure or union type,
including in a member object, the bytes of the object representation
that correspond to any padding bytes take unspecified values.42) The
value of a structure or union object is never a trap representation,
even though the value of a member of the structure or union object may
be a trap representation.

7 When a value is stored in a member of an object of union type, the
bytes of the object representation that do not correspond to that
member but do correspond to other members take unspecified values.

6.7.2.1 Structure and union specifiers
15. There may be unnamed padding at the end of a structure or union.

James Kuyper Jr. · Oct 19, 2007

dspfun said:
Thank you for your answers!

To try to conclude by referring to the sections of the C Standard, I
believe question a) above is implementation-defined due to the
following text in the C Standard:
---------------------------------------------
6.5.3.4 The sizeof operator
4. The value of the result is implementation-defined, and its type (an
unsigned integer type) is size_t, defined in <stddef.h> (and other
headers).
---------------------------------------------

More importantly:
6.7.2.1p13: "... There may be unnamed padding within a structure object, ..."

Question b) is implementation defined due to the following text in the
C-standard:
----------------------------------------------
6.2.6.1 General
6 When a value is stored in an object of structure or union type,
including in a member object, the bytes of the object representation
that correspond to any padding bytes take unspecified values.42) The
value of a structure or union object is never a trap representation,
even though the value of a member of the structure or union object may
be a trap representation.

7 When a value is stored in a member of an object of union type, the
bytes of the object representation that do not correspond to that
member but do correspond to other members take unspecified values.

6.7.2.1 Structure and union specifiers
15. There may be unnamed padding at the end of a structure or union.

Well, you've left out the most important one:
6.2.6.1p1: "The representations of all types are unspecified except as
stated in this subclause."

While the standard specifies many features of the representation of
integer types, it always refers to bits in terms of their value, not in
terms of their physical locations within the integer object. In
principle, a conforming implementation could store the bits that make up
a 32-bit integer in any order it wants, just so long as it implements
the bit-wise operators to handle those bits in that same order. The
standard doesn't ordain any particular connections between the ordering
used in an integer type and the ordering used in unsigned char.

There are 32! possible orderings of 32 bits, though only a small number
of those orderings are actually used. However, I've read that of the 12
possible orderings of the 4 bytes underlying a 4-byte integer, 8 of them
are in actual use. The simplistic dichotomy between little- and
big-endian architectures doesn't cover all of the possibilities (though
it does cover most of the popular ones).

dspfun · Oct 19, 2007

More importantly:
6.7.2.1p13: "... There may be unnamed padding within a structure object, ..."

Well, you've left out the most important one:
6.2.6.1p1: "The representations of all types are unspecified except as
stated in this subclause."

While the standard specifies many features of the representation of
integer types, it always refers to bits in terms of their value, not in
terms of their physical locations within the integer object. In
principle, a conforming implementation could store the bits that make up
a 32-bit integer in any order it wants, just so long as it implements
the bit-wise operators to handle those bits in that same order. The
standard doesn't ordain any particular connections between the ordering
used in an integer type and the ordering used in unsigned char.

There are 32! possible orderings of 32 bits, though only a small number
of those orderings are actually used. However, I've read that of the 12
possible orderings of the 4 bytes underlying a 4-byte integer, 8 of them
are in actual use. The simplistic dichotomy between little- and
big-endian architectures doesn't cover all of the possibilities (though
it does cover most of the popular ones).

Thank you for your help!

Peter Pichler · Oct 19, 2007

James said:
There are 32! possible orderings of 32 bits, though only a small number
of those orderings are actually used. However, I've read that of the 12
possible orderings of the 4 bytes underlying a 4-byte integer, 8 of them
are in actual use.

ITYM 24 possible orderings. Just like there are 32! possibilities with
32 bits, there are 4! possibilities with 4 bytes. 4! = 24.

jameskuyper · Oct 19, 2007

Peter said:
ITYM 24 possible orderings. Just like there are 32! possibilities with
32 bits, there are 4! possibilities with 4 bytes. 4! = 24.

You're right. Also, I'm not too sure about the number that are in
actual use; I've seen what claimed to be a definitive list, and I
think the number of entries in that list was 8, but I'm not sure where
to find that list, and I'm not willing to swear on the accuracy of my
memory. The important point, of course, is that it's a lot larger than
2.

Keith Thompson · Oct 19, 2007

You're right. Also, I'm not too sure about the number that are in
actual use; I've seen what claimed to be a definitive list, and I
think the number of entries in that list was 8, but I'm not sure where
to find that list, and I'm not willing to swear on the accuracy of my
memory. The important point, of course, is that it's a lot larger than
2.

Interesting. I think the only ones I've ever heard in actual use are
big-endian, little-endian, and PDP-11-endian (the latter joins two
little-endian 16-bit words into a big-endian 32-bit longword). Does
anyone know of other examples in real life?

Gordon Burditt · Oct 20, 2007

There are 32! possible orderings of 32 bits, though only a small number

ITYM 24 possible orderings. Just like there are 32! possibilities with
32 bits, there are 4! possibilities with 4 bytes. 4! = 24.

What in the standard prohibits the use of different bit-orderings to
represent integers, such that, say, the most significant bit and the
least significant bit are in the *same* byte (and next to each other)?

(Not that I expect anyone to actually take advantage of this if it is allowed).

Justin Spahr-Summers · Oct 20, 2007

What in the standard prohibits the use of different bit-orderings to
represent integers, such that, say, the most significant bit and the
least significant bit are in the *same* byte (and next to each other)?

I'm not able to look up whether there is such a clause or not, but it
doesn't really matter. Bitwise operations are defined in terms of
arithmetic, and you shouldn't need to know the representation for
portable code anyhow.

Justin Spahr-Summers · Oct 20, 2007

I'm not able to look up whether there is such a clause or not, but it
doesn't really matter. Bitwise operations are defined in terms of
arithmetic, and you shouldn't need to know the representation for
portable code anyhow.

I spoke much too soon. I found a draft of the standard real quick and
realized my mistake. Only bitwise shift operations are defined in
terms of arithmetic, apparently. I don't know how much that changes,
however.

pete · Oct 20, 2007

Gordon Burditt wrote:

What in the standard prohibits the use of different bit-orderings to
represent integers, such that, say, the most significant bit and the
least significant bit are in the *same* byte (and next to each other)?

Nothing does.

"next to each other" doesn't mean anything.
Bits in the same byte are only distinguishable
by the values that they represent.

James Kuyper Jr. · Oct 20, 2007

Gordon said:
What in the standard prohibits the use of different bit-orderings to
represent integers, such that, say, the most significant bit and the
least significant bit are in the *same* byte (and next to each other)?

Nothing - that is how Gordon and I both calculated that there are 32!
possible bit orderings for a 32-bit integer.
