List of undefined behaviour and other sneaky bugs


John Reye

1)
a[i] = i++;

http://c-faq.com/expr/evalorder1.html

So this attempt at optimisation is wrong:

int i;
int a[10];
for (i = 0; i < sizeof(a); )
    a[i] = i++; /* set a to {1, 2, 3, ...} */

Correctly optimized, it should be:
for (i = 0; i < sizeof(a); )
    i = a[i] = i+1; /* set a to {1, 2, 3, ...} */



2)

int i;
char b[10];
*(int *)b = i;

Alignment problem. There is no guarantee that b is aligned such that
its address satisfies the alignment requirements of an int pointer
(such as an even address, or maybe "address divisible by 4", or
whatever it happens to be).


3)

struct
{
char a;
int b;
} mystruct;

// set byte at lowest address within b to 0x12;
char *p = (char *)&mystruct;
p[1] = 0x12;


Bug: struct padding was forgotten. There will probably be some padding
between a and b, so that b is aligned for good memory-access.

Rather use:
*((char *)&mystruct.b) = 0x12;


Or:
struct
{
char a;
union {
int b;
char c;
};
} mystruct;
mystruct.c = 0x12;

Would this union work on every platform??




Which undefined behaviour (or other bugs) do you think is interesting
to know about? :)
Thanks.
 

Jens Thoms Toerring

John Reye said:
1)
a[i] = i++;

So this attempt at optimisation is wrong:

int i;
int a[10];
for (i = 0; i < sizeof(a); )
    a[i] = i++; /* set a to {1, 2, 3, ...} */

Correctly optimized, it should be:
for (i = 0; i < sizeof(a); )
    i = a[i] = i+1; /* set a to {1, 2, 3, ...} */


I don't know what this has to do with "optimisation" but, yes, the
second version is more correct. But there's still a problem:
your use of sizeof(a) as the value for the end of the loop.
sizeof(a) is the number of bytes in that array, not the number
of elements (and these are typically different, except for char
arrays). Use "sizeof a / sizeof *a" instead (and make sure
that 'a' is an array and not merely a pointer).
2)
int i;
char b[10];
*(int *)b = i;
Alignment problem. There is no guarantee that b is aligned such that
its address satisfies the alignment requirements of an int pointer
(such as an even address, or maybe "address divisible by 4", or
whatever it happens to be).

True. If, for some reason, you must do that, use memcpy().
3)
struct
{
char a;
int b;
} mystruct;
// set byte at lowest address within b to 0x12;
char *p = &mystruct;
p[1] = 0x12;
Bug: struct padding was forgotten. There will probably be some padding
between a and b, so that b is aligned for good memory-access.
Rather use:
*((char *)&mystruct.b) = 0x12;

Correct but also don't do that unless you have very good reasons
- why would you set just one of the bytes of an int? Depending
on the endianness etc. of your machine the result when the 'b'
member then is used as an int will be quite different.
Or:
struct
{
char a;
union {
int b;
char c;
};
} mystruct;
mystruct.c = 0x12;
Would this union work on every platform??

Yes. But then you won't be able to use the 'b' element of
the union, since reading a different member than the one that
was last set also invokes undefined behavior.
Which undefined behaviour (or other bugs) do you think is interesting
to know about? :)

All of them;-) There's a complete list at the end of the C
standard (at least in C89, see A.6.2, and C99, see Annex J.2).

Regards, Jens
 

John Reye

Thanks for the warning about sizeof!! I missed that.

Yes. But then you won't be able to use the 'b' element of
the union since reading a different member than has been
set the last time round also invokes undefined behavior.

Oh no, please no. Why on earth would that be undefined behaviour?

mystruct.c = 0x12;
int tmp = mystruct.b;
f(tmp);

Do you really mean that the code above does not guarantee that the
byte at the lowest address of tmp is set to 0x12??
I simply cannot believe that. If it is so, then please suggest to me
how I can solve this.
 

Ben Pfaff

John Reye said:
Thanks for the warning about sizeof!! I missed that.



Oh no, please no. Why on earth would that be undefined behaviour?

C99 6.2.6.1 "Language" says:

7 When a value is stored in a member of an object of union
type, the bytes of the object representation that do not
correspond to that member but do correspond to other members
take unspecified values, but the value of the union object
shall not thereby become a trap representation.

So you can't portably rely on the value of 'b' after assigning to
'c'.
 

Jens Thoms Toerring

John Reye said:
Thanks for the warning about sizeof!! I missed that.
Oh no, please no. Why on earth would that be undefined behaviour?
mystruct.c = 0x12;
int tmp = mystruct.b;
f(tmp);
Do you really mean, that the code above does not guarantee that the
lowest byte-address of tmp is set to 0x12??

That's exactly what it does - it sets the byte at the lowest
memory address. The problem is only: what do you get when you
now read the whole int? On a little-endian machine this will
have modified the least significant byte while on a big-endian
machine the most significant byte. Thus the standard isn't able
to define what value you will get when you read 'b' after you
have written something to 'c'. Or a different example: when you
have

union {
double a;
int b;
} my_union;

my_union.b = 3;
printf( "%f\n", my_union.a );

What would you expect to get? Or how could doing something
like that be defined properly?
Regards, Jens
 

John Reye

C99 6.2.6.1 "Language" says:

7    When a value is stored in a member of an object of union
     type, the bytes of the object representation that do not
     correspond to that member but do correspond to other members
     take unspecified values, but the value of the union object
     shall not thereby become a trap representation.

So you can't portably rely on the value of 'b' after assigning to
'c'.

Thanks for the exact C-standard reference.
Still: this is very unsettling.

Would this solve the issue ?? ->

struct
{
char a;
union {
int b;
struct {
char byte0;
char byte1;
char byte2;
char byte3;
};
};
} mystruct;
mystruct.byte0 = 0x12;
int tmp = mystruct.b;
f(tmp);

If it does solve the issue, and the lowest-address-byte of tmp is
0x12, then:
Ahhh it is not portable. Other ints have only 2 bytes.
Is there a portable way of fixing this??
 

Ben Pfaff

John Reye said:
On May 3, 8:09 pm, Ben Pfaff wrote:
Would this solve the issue ?? ->

struct
{
char a;
union {
int b;
struct {
char byte0;
char byte1;
char byte2;
char byte3;
};
};
} mystruct;
mystruct.byte0 = 0x12;
int tmp = mystruct.b;
f(tmp);

If it does solve the issue, and the lowest-address-byte of tmp is
0x12, then:
Ahhh it is not portable. Other ints have only 2 bytes.
Is there a portable way of fixing this??

Can you tell us about your goal, as opposed to telling us about
your proposed solution? I can think of multiple solutions, but I
don't understand your problem.
 

John Reye

That's exactly what it does - it sets the byte at the lowest
memory address.
OK, so then there is no problem with reading b, right.
I always did say: lowest byte-address.


What you write there is a different issue:
The problem is only: what do you get when you
now read the whole int? On a little-endian machine this will
have modified the least significant byte while on a big-endian
machine the most significant byte. Thus the standard isn't able
to define  what value you will get when you read 'b' after you
have written something to 'c'.
Well yes, because there are machines with different endianness.

But I can get around that quite easily like this:

#include <stdio.h>

#define LITTLE_ENDIAN /* undef it, if big endian!! */

#ifdef LITTLE_ENDIAN

struct byte4 {
unsigned char byte0;
unsigned char byte1;
unsigned char byte2;
unsigned char byte3;
};
#else

struct byte4 {
unsigned char byte3;
unsigned char byte2;
unsigned char byte1;
unsigned char byte0;
};

#endif

int main(void)
{
struct
{
char a;
union {
int b;
struct byte4 b4;
};
} mystruct;
mystruct.b4.byte0 = 0x12; // this sets least signif. byte!
printf("%x\n", mystruct.b);
}
 

John Reye

Can you tell us about your goal, as opposed to telling us about
your proposed solution?  I can think of multiple solutions, but I
don't understand your problem.


Hang on. I think I just understood the following:


Yes. But the you won't be able to use the 'b' element of
the union since reading a different member than has been
set the last time round also invokes undefined behavior.

So you can't portably rely on the value of 'b' after assigning to
'c'.

I thought this means that after writing c
mystruct.c = 0x12;
there is no guarantee that the lowest-byte address of mystruct.b is
0x12.

This is what got me all confused.
So it was a misunderstanding, from my side.

Ultimately: we cannot rely on the VALUE of mystruct.b, if the VALUE is
interpreted as integer.
However the lowest-byte-address within mystruct.b is guaranteed to be
0x12.
 

John Reye

To lessen the confusion.
Note: My last 2 posts are unrelated.

AAA)
In the post that has the code
#ifdef LITTLE_ENDIAN
I set the LEAST SIGNIFICANT BYTE of mystruct.b to 0x12.

BBB)
In all other posts above, I set the LOWEST-ADDRESSED-BYTE WITHIN
mystruct.b to 0x12.


AAA) and BBB) are not the same because of the endianness issue:
little endian is not big endian.
 

Ben Pfaff

John Reye said:
I thought this means that after writing c
mystruct.c = 0x12;
there is no guarantee that the lowest-byte address of mystruct.b is
0x12.

This is what got me all confused.
So it was a misunderstanding, from my side.

Ultimately: we cannot rely on the VALUE of mystruct.b, if the VALUE is
interpreted as integer.
However the lowest-byte-address within mystruct.b is guaranteed to be
0x12.

I believe that that statement is correct. That is, if you read
mystruct.b as an array of bytes, then the first byte will be
0x12. But the value of mystruct.b as a whole is unspecified, and
it might even be a trap representation (though I do not know of
any implementation where that would happen).
 

Jens Thoms Toerring

John Reye said:
OK, so then there is no problem with reading b, right.
I always did say: lowest byte-address.

What you write there is a different issue:
Well yes, because there are machines with different endianness.
But I can get around that quite easily like this:
#include <stdio.h>
#define LITTLE_ENDIAN /* undef it, if big endian!! */
#ifdef LITTLE_ENDIAN
struct byte4 {
unsigned char byte0;
unsigned char byte1;
unsigned char byte2;
unsigned char byte3;
};
#else
struct byte4 {
unsigned char byte3;
unsigned char byte2;
unsigned char byte1;
unsigned char byte0;
};
#endif

int main(void)
{
struct
{
char a;
union {
int b;
struct byte4 b4;
};
} mystruct;
mystruct.b4.byte0 = 0x12; // this sets least signif. byte!
printf("%x\n", mystruct.b);
}

Yes, if a) you can safely determine whether your program is compiled
on a little- or big-endian machine (and there could, at least in
principle, exist machines with weirder byte orders) and b) you
restrict yourself to machines where sizeof(int) is 4 - but in
principle sizeof(int) can be any number not less than 1.

The question already asked by someone else is: what do you
achieve with this? There is a much simpler way to modify
the least significant byte of an int (well, assuming
that a byte consists of 8 bits only, which also isn't a
given, otherwise you'd have to use CHAR_BIT in some clever
way):

b = ( b & ~ 0xFFU ) | 0x12;

This will work independent of endianness and sizeof(int)
since it doesn't make assumptions about the way the
bits and bytes are ordered - the compiler has to do all
that boring work.
Regards, Jens
 

Keith Thompson

John Reye said:
For the wildest undefined behaviour of all (in my opinion of course):

char a;
char *p = &a;

a = 1;
*p = 2;
printf("%d\n", a);


It seems that the C standard does not guarantee, that the value 2 will
get printed in the above code!

See
http://groups.google.com/group/comp...d/e02720057d4d406a?hl=en#msg_5fc82888eacd4f95

I believe Jens Gustedt is mistaken. The behavior of the above code is
well defined.

N1570 6.2.4p2:

An object exists, has a constant address, and retains its
last-stored value throughout its lifetime.

The statement

*p = 2;

stores the value 2 in a; that's its "last-stored value", which it
retains through the end of its lifetime or until a new value is stored.
 

Keith Thompson

John Reye said:
struct
{
char a;
union {
int b;
char c;
};
} mystruct;
mystruct.c = 0x12;

Would this union work on every platform??

You have an unnamed union. Some compilers support this as an extension,
but it's not valid in standard C before C11, which added anonymous
structures and unions.

You can fix it like this:

struct {
char a;
union {
int b;
char c;
} u;
} mystruct;
mystruct.u.c = 0x12;

Note that you have to refer to mystruct.u.c, not mystruct.c, since c is
a member of a union which is a member of a struct.

As for whether it "works", that depends on what you want it to do. It
stores the value 0x12 in mystruct.u.c, which is also the first byte of
mystruct.u.b. If you access the first byte of mystruct.u.b,
*(char*)&mystruct.u.b, you'll get 0x12. If you access mystruct.u.b as
an int, you'll get garbage, possibly a trap representation.

I can't think of anything useful you could do with this. If you want to
access the char value you just stored, you can refer to mystruct.u.c.
 

John Reye

There is a much simpler way to modify
the least significant byte of an int (well, assuming
that a byte consists of 8 bits only, which also isn't a
given, otherwise you'd have to use CHAR_BIT in some clever
way):

b = ( b & ~ 0xFFU ) | 0x12;

This will work independent of endianness and sizeof(int)
Ah yes of course.

If you don't want to worry about CHAR_BIT, just use:
b = ( b & ~ ((unsigned char)-1U) ) | 0x12;

About ((unsigned char)-1U)
see here:
http://groups.google.com/group/comp...f/a0a59b00091619de?hl=en#msg_1716f8836ce809a0



By the way: sometimes there might be good reasons to use unions.
Here is an example to set the most significant byte of an unsigned
long long to 0x12U.
I'm pretty sure that nothing will be faster than this ->

typedef unsigned long long my_ull;

union ACCESS_FAST {
my_ull v;
unsigned char ca[sizeof(my_ull)];
};

my_ull var;
((union ACCESS_FAST *) &var)->ca[sizeof(union ACCESS_FAST)-1] =
0x12U;
printf("%llx\n", var);

Is there any problem with this code?

Alternative:

union ACCESS_FAST var2;
var2.ca[sizeof(var2.ca)-1] = 0x12U;
printf("%llx\n", var2.v);
 

John Reye

As for whether it "works", that depends on what you want it to do.  It
stores the value 0x12 in mystruct.u.c, which is also the first byte of
mystruct.u.b.  If you access the first byte of mystruct.u.b,
*(char*)&mystruct.u.b, you'll get 0x12.  If you access mystruct.u.b as
an int, you'll get garbage, possibly a trap representation.
Will I really get garbage? I think not. The lowest-addressed byte is
0x12, and the other bytes of mystruct.u.b are what they were before,
right?
A trap? How can that possibly be justified? Why does the C language
have unions then? What is the intended use of unions?

Thanks
 

Ben Pfaff

John Reye said:
Why does the C language have unions then? What is the intended
use of unions?

"Tagged" unions are useful for, among other purposes,
constructing trees in which different nodes may contain different
data.

enum node_type {
/* Terminal. */
INT,
REAL,

/* Nonterminal. */
ADD,
SUB,
MUL,
DIV
};

struct node {
enum node_type type;
union {
int integer; /* INT */
double real; /* REAL */
struct node *children[2]; /* ADD, SUB, MUL, DIV */
} u;
};
 

John Reye

John Reye said:
Why does the C language have unions then? What is the intended
use of unions?

"Tagged" unions are useful for, among other purposes,
constructing trees in which different nodes may contain different
data.

enum node_type {
        /* Terminal. */
        INT,
        REAL,

        /* Nonterminal. */
        ADD,
        SUB,
        MUL,
        DIV
};

struct node {
        enum node_type type;
        union {
                int integer;                 /* INT */
                double real;                 /* REAL */
                struct node *children[2];    /* ADD, SUB, MUL, DIV */
        } u;
};

So if you use a particular union access member... for a particular
union in memory, then you should always access that particular union-
variable, with the same member?

Is that the only intended use?

In particular: is it a misuse of unions, to use them in order to
enable different memory-accesses (word, byte, etc.) to the same
portion of memory???? I have (mis)used it like this in above posts
quite often.
Is that strictly wrong?

Thanks.
 

Jens Thoms Toerring

Will I really get garbage? I think not. The lowest-addressed byte is
0x12, and the other bytes of mystruct.u.b are what they were before,
right?

As you have seen there are different architectures, having
different ways of representing e.g. an int - it can start
with the most significant byte at the lowest address or
with the least significant byte (or some other byte - the
C standard must be written in a way that it can be used
on all possible architectures). Thus when you change just
the byte at the lowest address the value you get when you
read out the whole int can be different on different
architectures.

What is meant when the C standard says that something is
undefined is that it can't promise a certain outcome on all
possible architectures. Nothing more. Thus, if you want to
write a truly portable program you must avoid all kinds of
undefined behaviour. But if you write platform-dependent
code and the platform defines what the C standard doesn't,
it's completely fine to just use it; you just have to be
aware of that. Actually, without undefined behaviour you
wouldn't be able to write platform-dependent code, so it
would be impossible to e.g. write an operating system or
a device driver.
A trap? How can that possibly be justified?

By the existence of machines with properties that may
seem unusual to you but for which probably good reasons
existed. The C standard can only mandate behaviour that
has a chance to be implemented on as many architectures
as possible - otherwise it would be impossible to write
a C compiler for them. So it's restricted to a lowest
common denominator. And that's exactly one of the primary
reasons why C has been that successful and is still going
strong after 40 years - it's possible to have a C compiler
for a supercomputer and for whatever system runs
a wrist-watch, a phone etc., and a program written in
standard C will run the same way on each of them.
Why does the C language
have unions then? What is the intended use of unions?

So that you can store values of different types in the
same location. But, of course, it hardly makes sense to
expect that after one has written a value of type A into
that location to be able to read something of a different
type from it. Or do you often put a handkerchief
into a drawer and, when you open it again, it's a rabbit?

Regards, Jens
 