Bug in gcc4 initialisers suspected

  • Thread starter Marcel van Kervinck

S.Tobias

[Newsgroups restricted to c.l.c. only]

In comp.lang.c glen herrmannsfeldt said:
Marcel van Kervinck wrote:
Dear all,

I would like to confirm my suspicion of a compiler
bug in gcc4 for MacOSX. The code example below expects
that the initializer of 'v' sets all elements in
v.u.bytes[] to zero, as specified by the C99 standard:
typedef struct {
    union {
        unsigned char bytes[2];
    } u;
} vector_t;
It seems that none of the posts so far have mentioned the union.

The question was about initialization of a (deeply) nested member.
As I understand, though I could be wrong, the standard sets the
value of the variable to zero, not just bytes. (The distinction
is important when using memset()).

I don't understand what you're trying to say. Everything is
about values, not bytes or representations.
As some variable types might have non-zero bit representation when
the value is zero, and a union could have more than one type of
variable, it seems slightly possible that the restriction is lifted
for unions.

Initializers of aggregates or unions always initialize (recursively and
in order) their members, not "union as a whole". The representation
of a union is that of the last stored to member, plus some unspecified
bytes of other members and padding.
I might even wonder about using initializers when you
can't specify which of the union variables is being initialized.

Initializer for a union always initializes the first member.
Still, on most machines float, double, and pointers should initialize
with all bits zero, and most systems do just that.

Maybe. Maybe not.
Consider:
#include <assert.h>

typedef struct {
    union {
        double d;
        unsigned char bytes[2];
    } u;
} vector_t;

int main(void)
{
    int i;
    for (i = 0; i < 2; i++) {
        vector_t v = {{{1}}};
        /* v.u.d is initialized with `1'. */
        assert(v.u.bytes[0] == 1);
        /* This is unspecified, even if we know the representation
           of the `double' type. */
        v.u.bytes[1]++;
    }
}
 
Douglas A. Gwyn

glen said:
Marcel said:
I would like to confirm my suspicion of a compiler
bug in gcc4 for MacOSX. The code example below expects
that the initializer of 'v' sets all elements in
v.u.bytes[] to zero, as specified by the C99 standard: (snip)
typedef struct {
    union {
        unsigned char bytes[2];
    } u;
} vector_t;
...
As I understand, though I could be wrong, the standard sets the
value of the variable to zero, not just bytes. (The distinction
is important when using memset()).

Default initialization gives zero values as if by assignment
(i.e. null pointer values for pointer types). There is a
requirement, spelled out in C99, that all-0 bits (including
padding) be a valid representation of zero for all integer
types. (Not for floating types or pointers, however.)
As some variable types might have non-zero bit representation when
the value is zero, and a union could have more than one type of
variable, it seems slightly possible that the restriction is lifted
for unions. I might even wonder about using initializers when you
can't specify which of the union variables is being initialized.

Default initialization applies to the first member of the
union. In C99 one can explicitly specify which member is
initialized, using a "designated initializer".
 
Keith Thompson

Douglas A. Gwyn said:
Default initialization gives zero values as if by assignment
(i.e. null pointer values for pointer types). There is a
requirement, spelled out in C99, that all-0 bits (including
padding) be a valid representation of zero for all integer
types. (Not for floating types or pointers, however.)

But it's still possible, I think, for some representation other than
all-bits-zero to be a valid representation of 0 for an integer type.
In other words (assuming the necessary #includes et al):

int all_bits_zero;
int numerically_zero;

memset(&all_bits_zero, 0, sizeof(int));
numerically_zero = 0;

The language requires (all_bits_zero == numerically_zero) to evaluate
to 1, but memcmp(&all_bits_zero, &numerically_zero, sizeof(int)) may
or may not return 0.

For the memcmp() to return a non-0 value, type int would have to have
padding bits, and one or more of them would have to be set to 1 by the
initialization. I wouldn't expect any system other than the DS9K to
actually behave this way.

Am I correct?
 
Christian Bau

Keith Thompson said:
But it's still possible, I think, for some representation other than
all-bits-zero to be a valid representation of 0 for an integer type.
In other words (assuming the necessary #includes et al):

int all_bits_zero;
int numerically_zero;

memset(&all_bits_zero, 0, sizeof(int));
numerically_zero = 0;

The language requires (all_bits_zero == numerically_zero) to evaluate
to 1, but memcmp(&all_bits_zero, &numerically_zero, sizeof(int)) may
or may not return 0.

For the memcmp() to return a non-0 value, type int would have to have
padding bits, and one or more of them would have to be set to 1 by the
initialization. I wouldn't expect any system other than the DS9K to
actually behave this way.

I think there is one DSP made by Texas Instruments that uses 32 bit int,
and 40 bit longs, stored in 64 bits. Registers are 32 bits, but the
processor implements 40 bit operations which use pairs of registers
(seems some people want just a little bit more than 32 bit of precision,
with little extra cost).

Let's say I write

long x = 0x5311111153;

It seems possible that an optimising compiler would generate machine
code like

"load 0x11111153 into register reg0"
"store reg0 into lower 32 bit of x"
"store reg0 into higher 32 bit of x"

setting the padding bits in x to 0x111111. Unless the compiler actively
makes sure that the padding bits are always zero, I could imagine that
the padding bits become non-zero when storing zero. Maybe if I write

unsigned long long x = 0xffffff0000000000;
unsigned long y = (unsigned long) x;

If x is not used after the assignment, the compiler could use the same
memory for x and y, and treat the cast as a no-op because all the
non-padding bits are unchanged. y equals 0, but memcmp would not return
0. (All assuming unsigned long is 40 bits + 24 padding bits).
 
pete

Keith said:
But it's still possible, I think, for some representation other than
all-bits-zero to be a valid representation of 0 for an integer type.
In other words (assuming the necessary #includes et al):

int all_bits_zero;
int numerically_zero;

memset(&all_bits_zero, 0, sizeof(int));
numerically_zero = 0;

The language requires (all_bits_zero == numerically_zero) to evaluate
to 1, but memcmp(&all_bits_zero, &numerically_zero, sizeof(int)) may
or may not return 0.

For the memcmp() to return a non-0 value, type int would have to have
padding bits, and one or more of them would have to be set to 1 by the
initialization. I wouldn't expect any system other than the DS9K to
actually behave this way.

Am I correct?

If numerically_zero contains a bit pattern for negative zero,
it will compare equal to zero
and also have a different bit pattern from all_bits_zero.
 
Keith Thompson

Christian Bau said:
Maybe if I write

unsigned long long x = 0xffffff0000000000;
unsigned long y = (unsigned long) x;

If x is not used after the assignment, the compiler could use the same
memory for x and y, and treat the cast as a no-op because all the
non-padding bits are unchanged. y equals 0, but memcmp would not return
0. (All assuming unsigned long is 40 bits + 24 padding bits).

More plausibly, if x is not used after the assignment, the whole thing
could be replaced by the equivalent of

unsigned long y = 0;

But if the initialization of x isn't constant, I suppose what you
describe might be possible.
 
Keith Thompson

pete said:
If numerically_zero contains a bit pattern for negative zero,
it will compare equal to zero
and also have a different bit pattern from all_bits_zero.

Is the integer constant 0 allowed to evaluate to negative zero?
(I'll check this later, when my copy of the standard is handy.)
 
Ben Pfaff

Keith Thompson said:
Is the integer constant 0 allowed to evaluate to negative zero?

No. C99 6.2.6.2:

3 If the implementation supports negative zeros, they shall be
generated only by:

- the &, |, ^, ~, <<, and >> operators with arguments that
produce such a value;

- the +, -, *, /, and % operators where one argument is a
negative zero and the result is zero;

- compound assignment operators based on the above cases.
 
Douglas A. Gwyn

Keith said:
But it's still possible, I think, for some representation other than
all-bits-zero to be a valid representation of 0 for an integer type.

Yes, in fact every ones-complement and every sign/magnitude
representation scheme provides two representations for the
value zero. Integer types (other than unsigned char) may
also contain "padding" bits that might be non-0. As a
rule, for a given type the value is uniquely determined by
the representation, but not vice versa.
 
Douglas A. Gwyn

Keith said:
Is the integer constant 0 allowed to evaluate to negative zero?

That doesn't make sense. The integer constant 0 has a value of 0.
Any integer "negative zero" representation also has a value of 0.
 
Keith Thompson

Douglas A. Gwyn said:
That doesn't make sense. The integer constant 0 has a value of 0.
Any integer "negative zero" representation also has a value of 0.

Is it legal for the following:

int x;
x = 0;

to result in a "negative zero" representation being stored in x?
 
kuyper

Keith said:
Is it legal for the following:

int x;
x = 0;

to result in a "negative zero" representation being stored in x?

No. Ben Pfaff has already (June 3) quoted the relevant clause:
6.2.6.2p3.
 
Marcel van Kervinck

In comp.std.c Michael Mair said:
This is wrong.
gcc has the default setting "-std=gnu89" which is neither C89
nor C99 and may do as GNU wants.
So, unless you can confirm that you compiled in C89 mode or the
C99-like mode _including_ "-pedantic", there is nothing to be
added.

Gcc may do what it wants with invalid programs.
My man pages say that a valid ISO C program should compile
properly without requiring -pedantic, -std or -ansi. (Expect
some rare exceptions that are not applicable here.)

And yes, also with your suggested flags the bug appeared.
No difference. The suggestion is still appreciated.

In the meantime the vendor simply went ahead and fixed
the problem. They reported to me today that in the Xcode
download of June 6 the problem is solved. I checked and
it is ok now. The gcc version is still the same (4.0.0),
so perhaps it was a bug in their build. Great response
from Apple!!

Best regards,

Marcel
#include <assert.h>

typedef struct {
    union {
        unsigned char bytes[2];
    } u;
} vector_t;

int main(void)
{
    int i;
    for (i=0; i<2; i++) {
        vector_t v = {{{0,}}};
        assert(v.u.bytes[1] == 0);
        v.u.bytes[1]++;
    }
}
 
Marcel van Kervinck

S.Tobias said:
I would like to confirm my suspicion of a compiler
bug in gcc4 for MacOSX. The code example below expects
that the initializer of 'v' sets all elements in
v.u.bytes[] to zero, as specified by the C99 standard:
typedef struct {
    union {
        unsigned char bytes[2];
    } u;
} vector_t;
It seems that none of the posts so far have mentioned the union.
The question was about initialization of a (deeply) nested member.


For the record. The question mentioned that the union was 'relevant',
yes even essential, in producing the observed behavior. Read:
 
kuyper

Marcel said:
Gcc may do what it wants with invalid programs.

Whether or not this program is invalid depends upon which standard
you're judging it against. Your first message on this group mentioned
the C99 standard, under which it is valid.
My man pages say that a valid ISO C program should compile
properly without requiring -pedantic, -std or -ansi. (Expect
some rare exceptions that are not applicable here.)

On the man page on our system, that comment is made only about
-pedantic, not about -std or -ansi. The -std=c99 option is essential
if your code makes any use of C99-specific features. Arguably, code
that uses those features might qualify as rare.

However, the statement about -pedantic is basically correct, and also
applies to -ansi. They are seldom needed to make a valid ISO C program
compile properly. Their primary purpose is to make sure that an invalid
ISO C program fails to compile, as it should.
 
