Bug in gcc4 initialisers suspected

S.Tobias · May 23, 2005

[Newsgroups restricted to c.l.c. only]

In comp.lang.c glen herrmannsfeldt said:
Marcel van Kervinck wrote:

Dear all,

I would like to confirm my suspicion of a compiler
bug in gcc4 for MacOSX. The code example below expects
that the initializer of 'v' sets all elements in
v.u.bytes[] to zero, as specified by the C99 standard:

(snip)

typedef struct {
    union {
        unsigned char bytes[2];
    } u;
} vector_t;

It seems that none of the posts so far have mentioned the union.

The question was about initialization of a (deeply) nested member.

As I understand, though I could be wrong, the standard sets the
value of the variable to zero, not just bytes. (The distinction
is important when using memset()).

I don't understand what you're trying to say. Everything is
about values, not bytes or representations.

As some variable types might have non-zero bit representation when
the value is zero, and a union could have more than one type of
variable, it seems slightly possible that the restriction is lifted
for unions.

Initializers of aggregates or unions always initialize (recursively and
in order) their members, not "union as a whole". The representation
of a union is that of the last stored to member, plus some unspecified
bytes of other members and padding.

I might even wonder about using initializers when you
can't specify which of the union variables is being initialized.

An initializer for a union always initializes the first member.

Still, on most machines float, double, and pointers should initialize
with all bits zero, and most systems do just that.

Maybe. Maybe not.

Consider:

typedef struct {
    union {
        double d;
        unsigned char bytes[2];
    } u;
} vector_t;

int main(void)
{
    int i;
    for (i=0; i<2; i++) {
        vector_t v = {{{1}}};

v.u.d is initialized with `1'.

        assert(v.u.bytes[0] == 1);

This is unspecified, even if we know the representation of the `double' type.

        v.u.bytes[1]++;
    }
}
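
The point above can be checked with a small sketch. The function names are hypothetical (they are not from the thread), and the braces around the scalar are dropped to avoid a compiler warning; the triple-brace form in the post initializes the same member. The value read back through `bytes` depends on how the implementation represents `double`, so no particular result is claimed for it.

```c
typedef struct {
    union {
        double d;                 /* first member: the one the initializer sets */
        unsigned char bytes[2];
    } u;
} vector_t;

/* Returns 1 if the initializer stored the value 1 in the first member, v.u.d. */
int first_member_is_initialized(void)
{
    vector_t v = { { 1 } };       /* the int 1 is converted to double */
    return v.u.d == 1.0;
}

/* Returns the first byte of the representation of 1.0 as seen through
 * the other union member. The result depends on the implementation's
 * double format and byte order, so it must not be assumed to be 1. */
unsigned char first_byte_of_one(void)
{
    vector_t v = { { 1 } };
    return v.u.bytes[0];
}
```

Calling first_member_is_initialized() should give 1 on any conforming implementation, while first_byte_of_one() is only meaningful once the representation of `double' is known.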

Douglas A. Gwyn · Jun 2, 2005

glen said:
Marcel said:

I would like to confirm my suspicion of a compiler
bug in gcc4 for MacOSX. The code example below expects
that the initializer of 'v' sets all elements in
v.u.bytes[] to zero, as specified by the C99 standard: (snip)
typedef struct {
    union {
        unsigned char bytes[2];
    } u;
} vector_t;

...
As I understand, though I could be wrong, the standard sets the
value of the variable to zero, not just bytes. (The distinction
is important when using memset()).

Default initialization gives zero values as if by assignment
(i.e. null pointer values for pointer types). There is a
requirement, spelled out in C99, that all-0 bits (including
padding) be a valid representation of zero for all integer
types. (Not for floating types or pointers, however.)

As some variable types might have non-zero bit representation when
the value is zero, and a union could have more than one type of
variable, it seems slightly possible that the restriction is lifted
for unions. I might even wonder about using initializers when you
can't specify which of the union variables is being initialized.

Default initialization applies to the first member of the
union. In C99 one can explicitly specify which member is
initialized, using a "designated initializer".
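
As an illustration of the two forms mentioned here, the following is a minimal sketch. The union type and function names are hypothetical and chosen only for this example.

```c
union number {
    int    i;   /* first member: a plain initializer applies to this one */
    double d;
};

/* Without a designator, the initializer applies to the first member. */
int plain_initializer_sets_first(void)
{
    union number n = { 3 };          /* initializes n.i */
    return n.i == 3;
}

/* With a C99 designator, the named member is initialized instead. */
int designated_initializer_sets_named(void)
{
    union number n = { .d = 2.5 };   /* initializes n.d */
    return n.d == 2.5;
}
```

Both functions should return 1 when compiled as C99. A C89 compiler would be expected to reject the `.d =' designator.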

Keith Thompson · Jun 2, 2005

Douglas A. Gwyn said:
Default initialization gives zero values as if by assignment
(i.e. null pointer values for pointer types). There is a
requirement, spelled out in C99, that all-0 bits (including
padding) be a valid representation of zero for all integer
types. (Not for floating types or pointers, however.)

But it's still possible, I think, for some representation other than
all-bits-zero to be a valid representation of 0 for an integer type.
In other words (assuming the necessary #includes et al):

int all_bits_zero;
int numerically_zero;

memset(&all_bits_zero, 0, sizeof(int));
numerically_zero = 0;

The language requires (all_bits_zero == numerically_zero) to evaluate
to 1, but memcmp(&all_bits_zero, &numerically_zero, sizeof(int)) may
or may not return 0.

For the memcmp() to return a non-0 value, type int would have to have
padding bits, and one or more of them would have to be set to 1 by the
initialization. I wouldn't expect any system other than the DS9K to
actually behave this way.

Am I correct?
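
The fragment above can be wrapped into a compilable sketch as follows. The function names are hypothetical. Only the value comparison is guaranteed by the requirement discussed in this thread; the result of the representation comparison is left to the implementation.

```c
#include <string.h>

/* Compares the two objects by value. With all-bits-zero being a valid
 * representation of zero for integer types, this is expected to be 1. */
int values_equal(void)
{
    int all_bits_zero;
    int numerically_zero;

    memset(&all_bits_zero, 0, sizeof(int));
    numerically_zero = 0;
    return all_bits_zero == numerically_zero;
}

/* Compares the two objects byte by byte. This returns 1 where int has
 * no padding bits; with padding bits the result is not guaranteed. */
int representations_equal(void)
{
    int all_bits_zero;
    int numerically_zero;

    memset(&all_bits_zero, 0, sizeof(int));
    numerically_zero = 0;
    return memcmp(&all_bits_zero, &numerically_zero, sizeof(int)) == 0;
}
```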

Christian Bau · Jun 2, 2005

Keith Thompson said:
But it's still possible, I think, for some representation other than
all-bits-zero to be a valid representation of 0 for an integer type.
In other words (assuming the necessary #includes et al):

int all_bits_zero;
int numerically_zero;

memset(&all_bits_zero, 0, sizeof(int));
numerically_zero = 0;

The language requires (all_bits_zero == numerically_zero) to evaluate
to 1, but memcmp(&all_bits_zero, &numerically_zero, sizeof(int)) may
or may not return 0.

For the memcmp() to return a non-0 value, type int would have to have
padding bits, and one or more of them would have to be set to 1 by the
initialization. I wouldn't expect any system other than the DS9K to
actually behave this way.

I think there is one DSP made by Texas Instruments that uses 32-bit ints
and 40-bit longs, stored in 64 bits. Registers are 32 bits, but the
processor implements 40-bit operations which use pairs of registers
(it seems some people want just a little more than 32 bits of precision,
at little extra cost).

Let's say I write

long x = 0x5311111153;

It seems possible that an optimising compiler would generate machine
code like

"load 0x11111153 into register reg0"
"store reg0 into lower 32 bit of x"
"store reg0 into higher 32 bit of x"

setting the padding bits in x to 0x111111. Unless the compiler actively
makes sure that the padding bits are always zero, I could imagine that
the padding bits become non-zero when storing zero. Maybe if I write

unsigned long long x = 0xffffff0000000000;
unsigned long y = (unsigned long) x;

If x is not used after the assignment, the compiler could use the same
memory for x and y, and treat the cast as a no-op because all the
non-padding bits are unchanged. y equals 0, but memcmp would not return
0. (All assuming unsigned long is 40 bits + 24 padding bits).
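
Whether a given implementation has padding bits in unsigned long can be estimated with a sketch like the one below. The function names are hypothetical. It counts the value bits by shifting ULONG_MAX and compares that with the storage size; on the hypothetical 40 + 24 layout described above it would report 40 value bits and 24 padding bits.

```c
#include <limits.h>

/* Number of value bits in unsigned long, found by counting how many
 * one-bit right shifts it takes to reduce ULONG_MAX to zero. */
unsigned value_bits_ulong(void)
{
    unsigned long max = ULONG_MAX;
    unsigned bits = 0;

    while (max != 0) {
        bits++;
        max >>= 1;
    }
    return bits;
}

/* Storage bits minus value bits: zero where the type has no padding. */
unsigned padding_bits_ulong(void)
{
    return (unsigned)(sizeof(unsigned long) * CHAR_BIT) - value_bits_ulong();
}
```

Only when padding_bits_ulong() is non-zero can two objects with equal values differ under memcmp() for this type.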

pete · Jun 3, 2005

Keith said:
But it's still possible, I think, for some representation other than
all-bits-zero to be a valid representation of 0 for an integer type.
In other words (assuming the necessary #includes et al):

int all_bits_zero;
int numerically_zero;

memset(&all_bits_zero, 0, sizeof(int));
numerically_zero = 0;

The language requires (all_bits_zero == numerically_zero) to evaluate
to 1, but memcmp(&all_bits_zero, &numerically_zero, sizeof(int)) may
or may not return 0.

For the memcmp() to return a non-0 value, type int would have to have
padding bits, and one or more of them would have to be set to 1 by the
initialization. I wouldn't expect any system other than the DS9K to
actually behave this way.

Am I correct?

If numerically_zero contains a bit pattern for negative zero,
it will compare equal to zero
and also have a different bit pattern from all_bits_zero.

Keith Thompson · Jun 3, 2005

Christian Bau said:
Maybe if I write

unsigned long long x = 0xffffff0000000000;
unsigned long y = (unsigned long) x;

If x is not used after the assignment, the compiler could use the same
memory for x and y, and treat the cast as a no-op because all the
non-padding bits are unchanged. y equals 0, but memcmp would not return
0. (All assuming unsigned long is 40 bits + 24 padding bits).

More plausibly, if x is not used after the assignment, the whole thing
could be replaced by the equivalent of

unsigned long y = 0;

But if the initialization of x isn't constant, I suppose what you
describe might be possible.

Keith Thompson · Jun 3, 2005

pete said:
If numerically_zero contains a bit pattern for negative zero,
it will compare equal to zero
and also have a different bit pattern from all_bits_zero.

Is the integer constant 0 allowed to evaluate to negative zero?
(I'll check this later, when my copy of the standard is handy.)

Ben Pfaff · Jun 3, 2005

Keith Thompson said:
Is the integer constant 0 allowed to evaluate to negative zero?

No. C99 6.2.6.2:

3 If the implementation supports negative zeros, they shall be
generated only by:

- the &, |, ^, ~, <<, and >> operators with arguments that
produce such a value;

- the +, -, *, /, and % operators where one argument is a
negative zero and the result is zero;

- compound assignment operators based on the above cases.

Douglas A. Gwyn · Jun 11, 2005

Keith said:
But it's still possible, I think, for some representation other than
all-bits-zero to be a valid representation of 0 for an integer type.

Yes, in fact every ones-complement and every sign/magnitude
representation scheme provides two representations for the
value zero. Integer types (other than unsigned char) may
also contain "padding" bits that might be non-0. As a
rule, for a given type the value is uniquely determined by
the representation, but not vice versa.

Douglas A. Gwyn · Jun 11, 2005

Keith said:
Is the integer constant 0 allowed to evaluate to negative zero?

That doesn't make sense. The integer constant 0 has a value of 0.
Any integer "negative zero" representation also has a value of 0.

Keith Thompson · Jun 11, 2005

Douglas A. Gwyn said:
That doesn't make sense. The integer constant 0 has a value of 0.
Any integer "negative zero" representation also has a value of 0.

Is it legal for the following:

int x;
x = 0;

to result in a "negative zero" representation being stored in x?

kuyper · Jun 11, 2005

Keith said:
Is it legal for the following:

int x;
x = 0;

to result in a "negative zero" representation being stored in x?

No. Ben Pfaff has already (June 3) quoted the relevant clause:
6.2.6.2p3.

Marcel van Kervinck · Jun 15, 2005

In comp.std.c Michael Mair said:
This is wrong.
gcc has the default setting "-std=gnu89" which is neither C89
nor C99 and may do as GNU wants.
So, unless you can confirm that you compiled in C89 mode or the
C99-like mode _including_ "-pedantic", there is nothing to be
added.

Gcc may do what it wants with invalid programs.
My man pages say that a valid ISO C program should compile
properly without requiring -pedantic, -std or -ansi. (Except for
some rare exceptions that are not applicable here.)

And yes, also with your suggested flags the bug appeared.
No difference. The suggestion is still appreciated.

In the meantime the vendor simply went ahead and fixed
the problem. They reported to me today that in the Xcode
download of June 6 the problem is solved. I checked and
it is ok now. The gcc version is still the same (4.0.0),
so perhaps it was a bug in their build. Great response
from Apple!!

Best regards,

Marcel

#include <assert.h>

typedef struct {
    union {
        unsigned char bytes[2];
    } u;
} vector_t;

int main(void)
{
    int i;
    for (i=0; i<2; i++) {
        vector_t v = {{{0,}}};
        assert(v.u.bytes[1] == 0);
        v.u.bytes[1]++;
    }
}


Marcel van Kervinck · Jun 15, 2005

S.Tobias said:
I would like to confirm my suspicion of a compiler
bug in gcc4 for MacOSX. The code example below expects
that the initializer of 'v' sets all elements in
v.u.bytes[] to zero, as specified by the C99 standard:
typedef struct {
    union {
        unsigned char bytes[2];
    } u;
} vector_t;

It seems that none of the posts so far have mentioned the union.

The question was about initialization of a (deeply) nested member.

For the record: the question mentioned that the union was 'relevant',
indeed essential, in producing the observed behavior. Read:

kuyper · Jun 15, 2005

Marcel said:
Gcc may do what it wants with invalid programs.

Whether or not this program is invalid depends upon which standard
you're judging it against. Your first message on this group mentioned
the C99 standard, under which it is valid.

My man pages say that a valid ISO C program should compile
properly without requiring -pedantic, -std or -ansi. (Except for
some rare exceptions that are not applicable here.)

On the man page on our system, that comment is made only about
-pedantic, not about -std or -ansi. The -std=c99 option is essential
if your code makes any use of C99-specific features. Arguably, code
that uses those features might qualify as rare.

However, the statement about -pedantic is basically correct, and also
applies to -ansi. They are seldom needed to make a valid ISO C program
compile properly. Their primary purpose is to make sure that an invalid
ISO C program fails to compile, as it should.
