Aliasing rules - int and long

Old Wolf

Consider the following program:

#include <stdio.h>

int main(void)
{
/* using malloc to eliminate alignment worries */
unsigned long *p = malloc( sizeof *p );

if ( p && sizeof(long) == sizeof(int) )
{
*p = 30;
printf( "%u\n", *(unsigned int *)p );
}

free(p);
return 0;
}

Is there any undefined behaviour here? The aliasing rules section
in the standard (C99 6.5) does not seem to permit this, but I can't
see how it would fail, since unsigned int and unsigned long are
required to have pure binary representations.

To clarify, I am expecting the above program will either produce
no output, or output 30, no other options.
 
Peter Nilsson

Old Wolf said:
Consider the following program:

#include <stdio.h>

int main(void)
{
/* using malloc to eliminate alignment worries */
unsigned long *p = malloc( sizeof *p );

if ( p && sizeof(long) == sizeof(int) )

Same size does not mean same range or representation.
{
*p = 30;
printf( "%u\n", *(unsigned int *)p );
}

free(p);
return 0;
}

Is there any undefined behaviour here? The aliasing rules section
in the standard (C99 6.5) does not seem to permit this, but I can't
see how it would fail,

There doesn't need to be an existing architecture on which
something fails in order for undefined behaviour to be undefined
behaviour.
since unsigned int and unsigned long are
required to have pure binary representations.

But there's no guarantee that the value bits in unsigned int
correspond precisely to the value bits in an unsigned long.
For example, one might be big endian, the other might be
little endian.
 
christian.bau

Consider the following program:

#include <stdio.h>

int main(void)
{
/* using malloc to eliminate alignment worries */
unsigned long *p = malloc( sizeof *p );

if ( p && sizeof(long) == sizeof(int) )
{
*p = 30;
printf( "%u\n", *(unsigned int *)p );
}

free(p);
return 0;
}

Is there any undefined behaviour here? The aliasing rules section
in the standard (C99 6.5) does not seem to permit this, but I can't
see how it would fail, since unsigned int and unsigned long are
required to have pure binary representations.

To clarify, I am expecting the above program will either produce
no output, or output 30, no other options.

Undefined behavior.

A compiler can assume that whenever an unsigned long is written and an
int is read, both locations are different because otherwise there
would be undefined behavior, and therefore the order of a write and a
read can be reversed under the "as if" rule. Consider the following
example:

int f (int* p, unsigned long* q)
{
*p = 1; *q = 2; return *p;
}

Here the compiler is free to generate code that will always return 1.
If you call it as

unsigned long l;
int i = f ((int *) &l, &l);

then there is undefined behavior.

(And I don't think there is any guarantee that int and unsigned long
have their value bits in the same bit positions. On the Deathstation
8000, int is big-endian and long is little-endian. On the new and
improved Deathstation 9000, all the bits in an int and an unsigned
long are in reversed order, so your example would yield 0x78000000.
They are working on a new version where the value bits in unsigned long
are in a different random permutation every time you start a program.)
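
For comparison, here is a minimal sketch of reading the same bytes without going through a cast pointer: copy them into a real unsigned int with memcpy. The helper name read_as_uint is hypothetical, and the sketch assumes sizeof(unsigned int) <= sizeof(unsigned long). The value obtained still depends on how the two types are represented, so this removes only the aliasing problem, not the representation one.

```c
#include <string.h>

/* Hypothetical helper: reinterpret the leading bytes of an unsigned long
   as an unsigned int by copying them into a declared object, instead of
   reading them through an (unsigned int *) cast.
   Assumes sizeof(unsigned int) <= sizeof(unsigned long). */
unsigned int read_as_uint(const unsigned long *src)
{
    unsigned int out;
    memcpy(&out, src, sizeof out);   /* byte copy, no typed access to *src */
    return out;
}
```

Called with a pointer to an unsigned long holding 30, this gives 30 only where both types share size and bit layout; elsewhere it gives whatever the copied bytes happen to mean as an unsigned int.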
 
Yevgen Muntyan

Peter said:
Same size does not mean same range or representation.


There doesn't need to be an existing architecture on which
something fails in order for undefined behaviour to be undefined
behaviour.


But there's no guarantee that the value bits in unsigned int
correspond precisely to the value bits in an unsigned long.
For example, one might be big endian, the other might be
little endian.

It would only affect the number printed. You can randomly set bits in
an unsigned long, assuming there are no padding bits, and print it.
I.e. the problems with representation here are the same as if you did

unsigned long a = 30;
unsigned int b;
memcpy(&b, &a, sizeof b);
printf("%u", b);

Only the aliasing rules seem to make the original code undefined
(yes, we assume no padding bits here; we can test for that at run time,
and the example can be modified to avoid any problems with it).
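
One way such a run-time test might look, as a sketch only: count the value bits of each type from its maximum value and compare with the object size in bits. The function names value_bits and same_size_no_padding are made up for illustration. Note that this says nothing about whether the value bits of the two types sit in the same positions.

```c
#include <limits.h>

/* Count the value bits of an unsigned type, given its maximum value. */
unsigned value_bits(unsigned long max)
{
    unsigned n = 0;
    while (max != 0) {
        max >>= 1;
        n++;
    }
    return n;
}

/* Returns 1 when unsigned int and unsigned long have the same size and
   neither has padding bits, 0 otherwise. It does not check bit order. */
int same_size_no_padding(void)
{
    return sizeof(unsigned int) == sizeof(unsigned long)
        && value_bits(UINT_MAX) == sizeof(unsigned int) * CHAR_BIT
        && value_bits(ULONG_MAX) == sizeof(unsigned long) * CHAR_BIT;
}
```

The original program could use same_size_no_padding() in place of its sizeof comparison.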

Yevgen
 
Ian Collins

softwindow said:
you didn't use "include <malloc.h>"

is it right?
Context?

<malloc.h> isn't a standard header. malloc and friends are declared in
<stdlib.h>
 
Peter Nilsson

Yevgen Muntyan said:
It would only affect the number printed.

No, both unsigned int and unsigned long can have trap
representations, and Christian Bau has pointed out that
the purpose of aliasing undefined behaviour relates to
optimisation. So a compiler needn't see the intent of
the source.
You can randomly set bits in an unsigned long, assuming there
are no padding bits, and print it.

If you're going to assume a vanilla machine, then there's little
point discussing standard C semantics.

Can I ask you (and the OP): Why is it so important to be able
to alias ints through longs and vice versa? What exactly
is wrong with the normal conversion by value? It has
considerably fewer problems than aliasing.
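
A sketch of the conversion by value meant here, with a hypothetical helper name to_uint. The conversion works on the value, not on the bytes, so padding bits and bit order do not come into it.

```c
/* Hypothetical helper: convert by value instead of aliasing.
   Converting to an unsigned type is always defined: the result is the
   value reduced modulo UINT_MAX + 1 (C99 6.3.1.3). */
unsigned int to_uint(unsigned long v)
{
    return (unsigned int)v;
}
```

In the original program this would be printf("%u\n", to_uint(*p)), which prints 30 whatever the sizes of the two types.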
 
Yevgen Muntyan

Peter said:
No, both unsigned int and unsigned long can have trap
representations,

Not if there are no padding bits.
If you're going to assume a vanilla machine, then there's little
point discussing standard C semantics.

Um, are you saying there is no point in discussing standard C
semantics because on a vanilla machine this stuff just works?
Compilers that follow the letter of the standard and break existing
code are not exactly news, even on those "PC" computers.
The OP's example might not be strict enough, but it can
easily check presence of padding bytes. There is a huge difference
between code which invokes undefined or unspecified behavior on
some implementation and code which doesn't, even if this code is
not strictly conforming.
Can I ask you (and the OP): Why is it so important to be able
to alias ints through longs and vice versa?

You missed an important thing here. That was malloc'ed area,
and its first bytes were set using assignment; there wasn't
a 'real' object declared to have type long or int. Question is:
why is that different from memcpy(), what do aliasing rules
mean.

double *p = malloc (sizeof (double));
*p = 3.14;
printf("%u\n", *((unsigned*)p));

void *p = malloc (sizeof (double));
*((double*)p) = 3.14;
printf("%u\n", *((unsigned*)p));

double d = 3.14;
unsigned u;
void *p = malloc (sizeof (double));
memcpy(p, &d, sizeof d);
memcpy(&u, p, sizeof u);
printf("%u\n", u);

What is permitted and what isn't? Then replace double with
unsigned long from OP example.
What exactly
is wrong with the normal conversion by value? It has
considerably fewer problems than aliasing.

And may have totally different properties from type punning.
Certainly nothing is wrong with normal conversions.

Yevgen
 
Peter Nilsson

Yevgen Muntyan said:
Not if there are no padding bits.

Agreed, but if you need to assume no padding bits, then the
construct isn't useful in a maximal portability sense, is it?
Um, are you saying there is no point in discussing standard C
semantics because on a vanilla machine this stuff just works?

I'm saying if you _have_ to make vanilla assumptions to feel
confident about such code, then you've missed the point of
comp.lang.c.
Compilers that follow the letter of the standard and break existing
code are not exactly news,

You seem to imply that the existing code is correct and the
standard is wrong. In some cases that may well be true, but
I don't see it being the case in the example given.

Fact is, most existing code is just 'lazy', and you seem to
be asking why the standard doesn't allow a particular form
of laziness explicitly. The real question is, why should it?
even on those "PC" computers.
The OP's example might not be strict enough, but it can
easily check presence of padding bytes.

And what would be the point? Why do you think the 'real problem',
i.e. whatever circumstance has backed the OP into the corner
of wanting to alias an int with a long (or whatever), can't be
better solved using less shaky code that is just as efficient
on vanilla machines?

You missed an important thing here.
That was malloc'ed area,
and its first bytes were set using assignment;
there wasn't a 'real' object declared to have type long
or int.

It became a 'real' object with the assignment [cf effective
type.]
Question is: why is that different from memcpy(),
what do aliasing rules mean.

memcpy isn't about type punning, it's about copying memory
_without_ regard to type.

[Aside: I think it would have been better if C had a separate
type for bytes that was independent of char, but that wasn't to be.]
double *p = malloc (sizeof (double));
*p = 3.14;
printf("%u\n", *((unsigned*)p));
What is permitted and what isn't?

Good question, but I don't see how bad examples will help you
find good uses.
Then replace double with unsigned long from OP example.

To my eye, this replacement isn't any more useful than the
previous double example.
And may have totally different properties from type punning.

Yes. For a start it's more likely to be well defined, accurate
and useful.
Certainly nothing is wrong with normal conversions.

To play advocate here, there are some problems with characters
and character types to do with aliasing (e.g. explore the value
of é as a constant and as an input character), but the
non-character examples so far are based on the misguided precept that
it is useful to look at an int as a long. I just don't see the
use.

I think you would do better to explore Christian's optimisation
comments. The standard makes explicit exception for _genuine_
union usages; usages that seem to have been ignored in
preference to pointless discussions on reading longs as ints!
I think you should investigate the possibility of reading ints
as _ints_ through distinct structs sharing common initial
sequences.
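
A rough sketch of that last suggestion, using made-up types (circle, square, shape) and a made-up function shape_kind. Here an int is read as an int, through a union whose members are structs that both begin with an int.

```c
/* Hypothetical types: both structs begin with an int tag, so they
   share a common initial sequence consisting of 'kind'. */
struct circle { int kind; double radius; };
struct square { int kind; int side; };

union shape {
    struct circle c;
    struct square s;
};

/* C99 6.5.2.3p5: where the complete union type is visible, the common
   initial sequence may be inspected through any of the struct members,
   whichever one was last stored. */
int shape_kind(const union shape *sh)
{
    return sh->c.kind;
}
```

Storing through sh.s.kind and then calling shape_kind(&sh) reads the same int back through the other member.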
 
Yevgen Muntyan

Peter said:
Agreed, but if you need to assume no padding bits, then the
construct isn't useful in a maximal portability sense, is it?

I see. Then you should have said this about the original piece of
code, which assumes sizeof(long) == sizeof(int). Somehow I thought
you were fine with "if this and this holds, does the standard guarantee
that this is true?".
I'm saying if you _have_ to make vanilla assumptions to feel
confident about such code, then you've missed the point of
comp.lang.c.

I doubt it. Making certain assumptions and trying to see what
the standard guarantees under those assumptions is not
off-topic here. It depends on the assumptions, of course.
You seem to imply that the existing code is correct and the
standard is wrong.

I don't.
In some cases that may well be true, but
I don't see it being the case in the example given.

Fact is, most existing code is just 'lazy', and you seem to
be asking why the standard doesn't allow a particular form
of laziness explicitly.

Nope. I am asking *whether* it allows certain things.
The real question is, why should it?

It should not. It either does or does not. Do you have problems
with people who want to know what is allowed and why?
And what would be the point? Why do you think the 'real problem',
i.e. whatever circumstance has backed the OP into the corner
of wanting to alias an int with a long (or whatever),

I don't think he asked the question because of some particular
real situation where he actually does that thing.
can't be
better solved using less shaky code that is just as efficient
on vanilla machines?

You missed an important thing here.
That was malloc'ed area,
and its first bytes were set using assignment;
there wasn't a 'real' object declared to have type long
or int.

It became a 'real' object with the assignment [cf effective
type.]
Question is: why is that different from memcpy(),
what do aliasing rules mean.

memcpy isn't about type punning, it's about copying memory
_without_ regard to type.

In fact, it seems 6.5p6 says memcpy() would have the very same effect
as the assignment, the effective type becomes the type of the object
whose value was copied.
[Aside: I think it would have been better if C had a separate
type for bytes that was independent of char, but that wasn't to be.]
double *p = malloc (sizeof (double));
*p = 3.14;
printf("%u\n", *((unsigned*)p));
What is permitted and what isn't?

Good question, but I don't see how bad examples will help you
find good uses.

Suppose you're given a piece of code (e.g. you want to use some
library containing that piece of code). Are you going to throw it
away and rewrite everything to your taste? Or is it fine to
use if you know it's actually correct? What do you do if you
can't say whether it's correct? Say, the person who wrote it might
have been an experienced guy who didn't have any trouble
understanding certain things, and the code is perfectly valid
and everything. Or that person might not have known some things
and the code is crap. What then?
If you think "it's not important because it's useless", then
you are either wrong or you can predict the future.
You know, "Don't do that" is not a 100%-working recipe.
(I won't try to talk about knowledge for the sake of knowledge,
or knowledge which helps you understand the language better and
more deeply, and all those fancy things.)
To my eye, this replacement isn't any more useful than the
previous double example.


Yes. For a start it's more likely to be well defined,

This is your problem: you're fine with "this is most likely defined"
and "this is more likely undefined". I'd like to be sure whether
something is undefined, and then to know why it is undefined.

....
To play advocate here, there are some problems with characters
and character types to do with aliasing (e.g. explore the value
of é as a constant and as an input character), but the
non-character examples so far are based on the misguided precept that
it is useful to look at an int as a long. I just don't see the
use.

I don't see the use either. I still want to know why it's UB.
So that perhaps when I see a thing *similar* to it but disguised
well in code written by a guy who *did* see the use, I could decide
if the code is okay or not.

Yevgen
 
