Function casting - UB?

nroberts

In C++ this would be totally undefined. How about in C? It works on
my machine with my compiler...

If it's UB, is it one of those rules that pretty much has to work
anyway? What I mean is things like the fact that it's UB to assign to
one part of a union and read from another but it generally works and
is a pretty important construct. Is this like that?

#include <stdio.h>

typedef void (*fun)(int, int);

void f(int i) {
    int q = 33;
    printf("%d %d\n", i, q);
}

int main(void) {
    fun funptr = (fun)f;
    funptr(3,2);

    return 0;
}

output => 3 33
 
Eric Sosman

In C++ this would be totally undefined. How about in C? It works on
my machine with my compiler...

If it's UB, is it one of those rules that pretty much has to work
anyway? What I mean is things like the fact that it's UB to assign to
one part of a union and read from another but it generally works and
is a pretty important construct. Is this like that?

#include <stdio.h>

typedef void (*fun)(int, int);

void f(int i) {
int q = 33;
printf("%d %d\n", i, q);
}

int main(void) {
fun funptr = (fun)f;
funptr(3,2);

return 0;
}

output => 3 33

Undefined behavior, because the type of the pointer expression
`funptr' used in the call does not match the type of the called
function `f'. (The fact that the `(fun)' cast was necessary
should alert you to the mismatch.)

It's likely to "work" on a good many systems. For example,
on a system where the first four or so "sufficiently small"
arguments are passed in a few designated registers, `main' will
fill two registers and `f' will use only one of them, and there's
a decent chance that the one `f' uses will be the one `main' put
the 3 into. But it would be a bad idea to generalize from the
apparent success of this simple case to the notion "It works!"
Throw in a floating-point argument, or a union argument, or even
more than MagicNumber plain int arguments, and things may well go
blooey. Make the function struct- or union-valued, and things are
more likely than not to go blooey.

And then, there's wide variety in function linkage mechanisms.
Sometimes the caller takes care of both setting up and disposing
of the argument list, but sometimes the caller sets it up and the
callee disposes -- and if the callee does cleanup for a one-argument
list while the caller expected it to clean up two, it's blooey again.

Does it "pretty much" have to work? That sort of depends on
your definition of "pretty much," but my take would be "No."

Besides: Can you come up with a good reason to want to do such
a perverted thing in the first place? In what way does this merit
being called "a pretty important construct?"
 
Tim Rentsch

nroberts said:
In C++ this would be totally undefined. How about in C? It works on
my machine with my compiler...

The example you show is clearly and explicitly undefined behavior.

If it's UB, is it one of those rules that pretty much has to work
anyway? What I mean is things like the fact that it's UB to assign to
one part of a union and read from another but it generally works and
is a pretty important construct. Is this like that?

No, it's completely different, because it is both allowed to fail
and is likely to fail on some platforms. The other example you give,
assigning to one member of a union and reading from another, is
actually defined behavior, _not_ undefined behavior.

#include <stdio.h>

typedef void (*fun)(int, int);

void f(int i) {
int q = 33;
printf("%d %d\n", i, q);
}

int main(void) {
fun funptr = (fun)f;
funptr(3,2);

return 0;
}

output => 3 33

The call to f() through 'funptr' is undefined behavior because
the type of the function pointer used to call is not compatible
with the type of the function actually being called. And that
isn't just a theoretical problem.
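
For contrast, here is a minimal sketch of what the standard does permit
(C11 6.3.2.3 p8): a function pointer may be converted to another function
pointer type and back again, and the call is defined as long as it is made
through a pointer whose type is compatible with the function's real type.
The names generic_fn, unary_fn, twice and call_after_round_trip are made up
for this illustration.

```c
/* Hypothetical "storage" type: any function pointer type would do here. */
typedef void (*generic_fn)(void);

/* The real type of the function we intend to call. */
typedef int (*unary_fn)(int);

static int twice(int i)
{
    return 2 * i;
}

/* Store the pointer under a different type, then convert it back
   to its real type before calling through it. */
int call_after_round_trip(int arg)
{
    generic_fn stored = (generic_fn)twice;  /* the conversion itself is allowed */
    unary_fn restored = (unary_fn)stored;   /* back to the compatible type */
    return restored(arg);                   /* defined: types match at the call */
}
```

Calling through `stored` directly would be the same kind of mismatch as in
the original post; only the call through `restored` is defined.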
 
Heikki Kallasjoki

....
the 3 into. But it would be a bad idea to generalize from the
apparent success of this simple case to the notion "It works!"

I fully agree that empirical experimentation is not terribly useful
(and/or perhaps interesting), but FWIW, a counter-example:

$ cat tmp.c
#include <windows.h>

int WINAPI f(int a) { return a; }

int g(void) {
    int WINAPI (*p)(int, int) = (int WINAPI (*)(int, int))f;
    return p(0, 1);
}

int main(void) {
    return g();
}

$ i586-mingw32msvc-gcc -o tmp.exe tmp.c -O2 -fomit-frame-pointer
$ wine ./tmp.exe
wine: Unhandled page fault on read access to 0x00000001 at address 0x1
(thread 0022), starting debugger...

(For the "not really comp.lang.c material" details, see below.)

And then, there's wide variety in function linkage mechanisms.
Sometimes the caller takes care of both setting up and disposing
of the argument list, but sometimes the caller sets it up and the
callee disposes -- and if the callee does cleanup for a one-argument
list while the caller expected it to clean up two, it's blooey again.

The above case (involving the "WINAPI"/"stdcall" calling convention) is
an example of the latter, and with the frame pointer omitted, 'g' ends
up using the second provided parameter as its return address.
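
One portable way to avoid depending on the calling convention at all is a
small adapter function with exactly the signature the pointer type demands.
This is only a sketch; binary_fn, f_one, f_one_adapter and call_binary are
hypothetical names, not anything from the code above.

```c
/* The pointer type the caller insists on using. */
typedef int (*binary_fn)(int, int);

/* The function we actually have: it takes a single argument. */
static int f_one(int a)
{
    return a + 1;
}

/* Adapter with the exact signature of binary_fn. The second argument
   is accepted and deliberately ignored. */
static int f_one_adapter(int a, int ignored)
{
    (void)ignored;
    return f_one(a);
}

/* A caller that only knows about binary_fn. */
int call_binary(binary_fn fn, int a, int b)
{
    return fn(a, b);
}
```

`call_binary(f_one_adapter, 3, 2)` needs no cast, so caller and callee agree
on the argument list whichever side is responsible for cleaning it up.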
 
Les Cargill

nroberts said:
In C++ this would be totally undefined. How about in C? It works on
my machine with my compiler...

If it's UB, is it one of those rules that pretty much has to work
anyway? What I mean is things like the fact that it's UB to assign to
one part of a union and read from another but it generally works and
is a pretty important construct. Is this like that?

No. The example given is just wrong, period.
 
Les Cargill

nroberts said:
In C++ this would be totally undefined. How about in C? It works on
my machine with my compiler...

If it's UB, is it one of those rules that pretty much has to work
anyway? What I mean is things like the fact that it's UB to assign to
one part of a union and read from another

I am not sure that that is undefined behavior. That's sort of what
unions are *for*.
 
James Kuyper

nroberts wrote: ....

I am not sure that that is undefined behavior. That's sort of what
unions are *for*.

A footnote in the current version of the standard says that the result
of reading from a different member of a union than the one last written
is that the bit pattern stored in that memory is reinterpreted according
to the type of the member being read; it would therefore have defined
behavior, so long as that bit pattern is a valid one for that type.
Some have claimed that this conclusion can be derived from the normative
text of the standard, but I find the argument supporting that claim
weak. There's certainly no normative text that says so directly.
However, that is how unions were always intended to work, whether or not
the normative text of the standard has ever actually said so.
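
As a sketch of the reinterpretation the footnote describes, the following
reads the bits of a float through a union member and, for comparison, through
memcpy, which nobody disputes. It assumes float and uint32_t have the same
size (checked at compile time); uint32_t has no padding bits, so every bit
pattern is a valid value for it. The function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

_Static_assert(sizeof(float) == sizeof(uint32_t),
               "this sketch assumes a 32-bit float");

/* Write one member, read the other: the stored bytes are
   reinterpreted as a uint32_t. */
uint32_t bits_via_union(float value)
{
    union { float f; uint32_t u; } pun;
    pun.f = value;
    return pun.u;
}

/* The same bytes copied out with memcpy. */
uint32_t bits_via_memcpy(float value)
{
    uint32_t bits;
    memcpy(&bits, &value, sizeof bits);
    return bits;
}
```

Both functions should return the same value for any argument; which value
that is depends on the implementation's representation of float.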
 
Tim Rentsch

christian.bau said:
I had to check that, and you are right (footnote 95 in the N1570
draft). I think there is a problem. Say long and float have the same
size, I have a union containing a long and a float, I write to the
long and read the float, then I am supposed to get a float with
exactly those bits that I stored. That's perfectly fine.

But what if the compiler doesn't know that both are elements of the
same union? If I just have a long*, and a float*, which _might_ point
to members of the same union, but the compiler doesn't know. Does the
rule apply then as well? That would completely destroy what is said in
other places.

This case is different, because it is addressed by different
portions of the effective type rules. In particular, using
the '.' or '->' form of access, the lvalue being accessed
has a declared type, and so those accesses never violate effective
type rules. When access is done using pointers, the rule for
determining effective type is different, so the two accesses
may very well run afoul of the effective type requirements.

I'd prefer if this was said in the standard explicitly, but with the
restriction that the value must be written, then read, using the . or -

Unfortunately the Standard often expresses itself rather obliquely,
and this case certainly falls into that category. However, it should
be easy to see that the two different cases you bring up are covered
under different areas of the effective type rules. See 6.5 p6.
Note especially the first sentence, which applies in the case of
member access (ie, through '.' or '->'), but which does not apply
in the case of pointer access.
 
Johannes Bauer

The other example you give,
assigning to one member of a union and reading from another, is
actually defined behavior, _not_ undefined behavior.

I also was under the impression that writing to member x and reading
member y of a union is UB. Wikipedia says "This is not, however, a safe
use of unions in general.", which is pretty vague (i.e. it's not clear
which cases are safe and which are not).

Could you elaborate on why you think this is well-defined?

Best regards,
Johannes

--
At least not publicly!
Ah, the latest and, to this day, most ingenious stroke of our great
cosmologists: the secret prediction.
- Karl Kaos on Rüdiger Thomas in dsa <[email protected]>
 
Johannes Bauer

I also was under the impression that writing to member x and reading
member y of a union is UB. Wikipedia says "This is not, however, a safe
use of unions in general.", which is pretty vague (i.e. it's not clear
which cases are safe and which are not).

Could you elaborate on why you think this is well-defined?

Ah, I just read James' response further down. Interesting. Really
thought this was undefined. Is this a recent change?

Best regards,
Johannes

--
At least not publicly!
Ah, the latest and, to this day, most ingenious stroke of our great
cosmologists: the secret prediction.
- Karl Kaos on Rüdiger Thomas in dsa <[email protected]>
 
Jens Gustedt

On 28.06.2012 at 09:27, Johannes Bauer wrote:
Ah, I just read James' response further down. Interesting. Really
thought this was undefined. Is this a recent change?

I think this was not considered as a change in contents but given as
more precision on the intent. n1256.pdf has modification marks in this
region so I suppose that these came with TC3. They state
When a value is stored in a member of an object of union type, the
bytes of the object representation that do not correspond to that
member but do correspond to other members take unspecified values.

which in terms of the standard means that it is only UB if these
unspecified values are "forbidden" values for that type, in particular
trap representations.

This means that on most modern architectures, manipulating integer
values (except for _Bool) through unions is completely OK. Floating
point values, _Bool, and pointer types must be treated with more care.
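
A small sketch of the integer case, under the assumption that bytes are
8 bits wide (checked at compile time): unsigned char has no trap
representations, so viewing a uint32_t as bytes through a union is safe.
Summing the bytes keeps the result independent of byte order. The name
byte_sum is made up for this example.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

_Static_assert(CHAR_BIT == 8, "this sketch assumes 8-bit bytes");

/* Store a 32-bit value, then read its bytes through the other member. */
unsigned byte_sum(uint32_t word)
{
    union {
        uint32_t word;
        unsigned char bytes[sizeof(uint32_t)];
    } view;
    unsigned sum = 0;

    view.word = word;
    for (size_t i = 0; i < sizeof view.bytes; i++)
        sum += view.bytes[i];   /* byte order varies, the sum does not */
    return sum;
}
```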

Jens
 
Tim Rentsch

Johannes Bauer said:
I also was under the impression that writing to member x and reading
member y of a union is UB. Wikipedia says "This is not, however, a safe
use of unions in general.", which is pretty vague (i.e. it's not clear
which cases are safe and which are not).

Could you elaborate on why you think this is well-defined?

Besides the particular footnote (which you already mentioned in
your own followup), there is just the normative text pertaining to
types and storage access. If you read through the two main
sections on types (6.2.5 and 6.2.6), and also the description of
what happens on lvalue-to-value conversion, I think it's pretty
easy to see that the definition is there (although I freely admit
it isn't expressed as directly as one might like). Basically, the
same passages that explain how ordinary (ie, non-union-member)
access works also explain how access to union members works; the
only thing that's missing is knowing that the respective memories
overlap, which is stated in 6.2.5. There is another detail having
to do with effective type rules, but that doesn't contribute to
defining the semantics; it just needs to be checked to make sure
the effective type rules don't _un_define the semantics (and they
don't, but if you're interested look at 6.5 p6&7).
 
Tim Rentsch

Jens Gustedt said:
On 28.06.2012 at 09:27, Johannes Bauer wrote:
Ah, I just read James' response further down. Interesting. Really
thought this was undefined. Is this a recent change?

I think this was not considered as a change in contents but given as
more precision on the intent. n1256.pdf has modification marks in this
region so I suppose that these came with TC3. [snip]

Yes, if you read the Defect Report that prompted the change I
think you'll find that the intention was that the behavior
required was supposed to be the same all along (ie, since C90 and
presumably also before that), but changes in wording in other
places raised a concern that this (unchanged) requirement was not
evident enough without the footnote.
 
Joshua Maurice

This case is different, because it is addressed by different
portions of the effective type rules. In particular, using
the '.' or '->' form of access, the lvalue being accessed
has a declared type, and so those accesses never violate effective
type rules. When access is done using pointers, the rule for
determining effective type is different, so the two accesses
may very well run afoul of the effective type requirements.


Unfortunately the Standard often expresses itself rather obliquely,
and this case certainly falls into that category. However, it should
be easy to see that the two different cases you bring up are covered
under different areas of the effective type rules. See 6.5 p6.
Note especially the first sentence, which applies in the case of
member access (ie, through '.' or '->'), but which does not apply
in the case of pointer access.

Sorry. Some silly questions if I may, please? Consider the following
programs:

int main(void)
{
    union { int x; float y; } u;
    u.y = 2;
    u.x = 1;
    return u.x;
}
/* ---- */
int main(void)
{
    union { int x; float y; } u;
    float * y = &u.y;
    *y = 2;
    int * x = &u.x;
    *x = 1;
    return u.x;
}
/* ---- */
int main(void)
{
    union { int x; float y; } u;
    float * y = &u.y;
    int * x = &u.x;
    *y = 2;
    *x = 1;
    return u.x;
}
/* ---- */
void foo(int * x, float * y)
{
    *y = 2;
    *x = 1;
}
int main(void)
{
    union { int x; float y; } u;
    float * y = &u.y;
    int * x = &u.x;
    foo(x, y);
    return u.x;
}
/* ---- */
The last program above, except with foo in a different translation
unit.

Where exactly do you think we cross from defined behavior to undefined
behavior? I would argue that the first example is clearly not UB, and
the last example with foo() in a different translation unit is
probably UB. Specifically, the intent of the effective type rules is
to allow the compiler to do additional aliasing analysis and reorder
reads and writes that are sufficiently differently typed. With foo()
in a different translation unit, we want the compiler to be able to
reorder the writes to x and y in foo() from type aliasing analysis,
but if we do that then we'll change the semantics of the last program
and have it return garbage.

I don't have a strong opinion on this one. It seems that the intent of
the type access rules and the existence of unions is an inherent
contradiction - with several plausible ways out, of course.
 
Barry Schwarz

Sorry. Some silly questions if I may, please? Consider the following
programs:

int main(void)
{
union { int x; float y; } u;
u.y = 2;
u.x = 1;
return u.x;
}

Where exactly do you think we cross from defined behavior to undefined
behavior? I would argue that the first example is clearly not UB, and

None of your examples perform the sequence of operations under
discussion. In every case, you store a value in one member of the
union, store a value in a different member of the union, and then
access the member which was last stored. Accessing the last stored
member never yields undefined behavior.
 
Tim Rentsch

Joshua Maurice said:
Sorry. Some silly questions if I may, please? Consider the following
programs:

int main(void)
{
union { int x; float y; } u;
u.y = 2;
u.x = 1;
return u.x;
}
/* ---- */
int main(void)
{
union { int x; float y; } u;
float * y = &u.y;
*y = 2;
int * x = &u.x;
*x = 1;
return u.x;
}
/* ---- */
int main(void)
{
union { int x; float y; } u;
float * y = &u.y;
int * x = &u.x;
*y = 2;
*x = 1;
return u.x;
}
/* ---- */
void foo(int * x, float * y)
{
*y = 2;
*x = 1;
}
int main(void)
{
union { int x; float y; } u;
float * y = &u.y;
int * x = &u.x;
foo(x, y);
return u.x;
}
/* ---- */
The last program above, except with foo in a different translation
unit.

Where exactly do you think we cross from defined behavior to undefined
behavior? I would argue that the first example is clearly not UB, and
the last example with foo() in a different translation unit is
probably UB. Specifically, the intent of the effective type rules is
to allow the compiler to do additional aliasing analysis and reorder
reads and writes that are sufficiently differently typed. With foo()
in a different translation unit, we want the compiler to be able to
reorder the writes to x and y in foo() from type aliasing analysis,
but if we do that then we'll change the semantics of the last program
and have it return garbage.

I don't have a strong opinion on this one. It seems that the intent of
the type access rules and the existence of unions is an inherent
contradiction - with several plausible ways out, of course.

I'm sorry, I didn't see any silly questions. Is it okay if I
just answer what you asked? (See, there's an example of a silly
question. :)

If we take the effective type rules at face value, I don't think
any of these are undefined behavior. In each case the stores that
are done are consistent with the declared type of the member whose
object is being stored into. Going through the different sequences
(and I admit I haven't checked them as carefully as I might have)
and referring to the effective type rules in each case, I don't see
any violations. That includes the last case where the foo()
function is defined in a different TU, although AFAIK that doesn't
change whether effective type rules are violated.

Of course, this is upsetting, because intuitively we expect that
when it looks like reordering might muck things up then either the
reordering isn't allowed (presumably due to effective type rules
considerations) or the program has crossed over into undefined
behavior (probably because effective type rules have been
violated). None of the obvious alternatives seems appealing, eg,
"no reordering can be done in cases like this" (ick), or "stores
through the x and y pointers can be reordered, and the later access
of u.x just gets one or the other -- ie, unspecified behavior, but
not undefined behavior" (at odds with other parts of the Standard),
or "even though these cases follow the letter of the law, effective
type wise, they violate its spirit, and therefore are undefined
behavior" (lacks evidence to be convincing). Of course, any
sensible developer would instinctively shy away from writing such
code, but that doesn't resolve the question.

I have two principal takeaways to offer.

First, how the effective type rules are phrased is somewhat broken,
or at least incomplete. If these examples are defined behavior,
that has serious negative consequences for code reordering. If
they are supposed to have undefined behavior, the effective type
rules don't express that adequately. Neither of those consequences
is acceptable, I would say, and in either case the Standard needs
to clarify what is meant.

Second, as a practical matter, this kind of pattern (taking
addresses of several members of the same union object, storing
through the resultant pointers, then using . or -> to get the value
of one of those members) is likely to be unspecified behavior as
far as which store occurred last. That behavior is what I think
most seasoned developers would expect, how most actual compilers
will generate code, and (I opine) what the Standard would prescribe
if a suitable way of expressing that presented itself. My feeling
is that cases like this one _should_ be unspecified behavior, and not
undefined behavior, but I also know that finding suitable language
to delimit the boundaries -- clearly, correctly, and exactly --
is not at all an easy task.
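
For comparison, a sketch of the form that sidesteps the question: pass a
pointer to the union itself and do every store with -> so that all accesses
are member accesses. The names int_or_float, store_both and last_stored are
hypothetical.

```c
union int_or_float {
    int x;
    float y;
};

/* Both stores are member accesses through a pointer to the union,
   so the compiler can see that they may overlap. */
void store_both(union int_or_float *u)
{
    u->y = 2;
    u->x = 1;   /* x is the member stored last */
}

/* Reads the member that was stored last. */
int last_stored(void)
{
    union int_or_float u;
    store_both(&u);
    return u.x;
}
```

Here `last_stored()` returns 1, since the read uses the member that was
stored last and no pointer to an individual member is ever formed.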
 
Tim Rentsch

Barry Schwarz said:
None of your examples perform the sequence of operations under
discussion. In every case, you store a value in one member of the
union, store a value in a different member of the union, and then
access the member which was last stored. Accessing the last stored
member never yields undefined behavior.

Only the first example (ie, the only one not snipped) stores into
members. The other examples store into objects that happen to
coincide with memory areas corresponding to members of u, but
that's not the same as storing into members. If nothing else,
which parts of the effective type rules govern the accesses
are different in the two cases.
 
Barry Schwarz

Only the first example (ie, the only one not snipped) stores into
members. The other examples store into objects that happen to
coincide with memory areas corresponding to members of u, but
that's not the same as storing into members. If nothing else,
which parts of the effective type rules govern the accesses
are different in the two cases.

Do I understand correctly that storing into a member and storing into
the memory occupied by that member are somehow different?
 
Joshua Maurice

Do I understand correctly that storing into a member and storing into
the memory occupied by that member are somehow different?

I would hope not! (But maybe.) I agree that the current rules are
unclear.

I think/hope that:
struct foo { int x; };
int main(void)
{
    struct foo f;
    f.x = 1;
}
is definitionally equivalent to:
struct foo { int x; };
int main(void)
{
    struct foo f;
    int * y;
    y = &f.x;
    *y = 1;
}
Any decision that makes "f.x = 1;" somehow different than "y = &f.x;
*y = 1;" is my least preferred alternative.

I'd much rather have rules that require the compiler to limit its type
aliasing optimizations when unions are in scope. Basically, a rule in
the standard somewhere which says something like the following. Please
note that I just whipped this up, and I have no clue if it's actually
"correct". It could very probably/definitely be fixed, improved, etc.
I'm just trying to get the ball rolling. There are closely related
alternative formulations that would also be appealing to me.

Quickie Definition: The "lifetime" of a pointer value is the
contiguous interval of time of the program execution, starting when
the pointer value is "created", and ending when the last "copy" or
"derivation" of the pointer value ceases to exist in an object.
Example:
#include <stdlib.h>
int main(void)
{
    {
        int a[2];
        int * x;
        int * y;
        {
            x = a; /* this statement "creates" a pointer value */
        }
        /* the pointer object "x" exists, and it contains the pointer
           value, so it's still "alive" */
        y = x + 1;
        x = 0;
        /* the pointer value is still "alive" because a "derivation" of
           it exists in the pointer object "y" */
    }
    /* the pointer value is now "dead", and the pointer value lifetime
       has ended */
}

New Rule: For two accesses to two sufficiently differently typed
members of a union, if:
- the accesses are a write and a read, or two writes, to the union
member objects or sub-objects thereof, and
- the pointer value lifetimes of the pointer values used to do the
accesses overlap, and
- both accesses are done in scopes where the union definition is not
visible, then
- the program has undefined behavior.

This approach, as formulated, disallows all aliasing optimization with the
types in a union when the union definition is in scope. Perhaps there
are "nicer" ways to do this without such a substantial penalty.
 
Tim Rentsch

Barry Schwarz said:
Do I understand correctly that storing into a member and storing into
the memory occupied by that member are somehow different?

They are, if for no other reason than because effective type
rules are different for the two cases. Let's look at the
pointer case first:

int
f( int *pi, float *pf ){
    *pi = 1;
    *pf = 2;
    return *pi;
}

If pi and pf point to the same place -- for example, to two
members of the same union object -- this function violates
effective type rules, and therefore transgresses into
undefined behavior. So a call like

union { int i; float f; } u;
...
f( &u.i, &u.f );

would provoke undefined behavior. Now consider a similar
function that accesses the union object 'u' directly, eg,

int
g(){
    u.i = 1;
    u.f = 2;
    return u.i;
}

The function g does not violate effective type rules. Its
behavior is defined, subject to the implementation-defined
representations of the two types involved. That is, it
should obey all the regular access rules, and there are no
'shall' stipulations that it violates (at least, I'm not
aware of any, and I've looked fairly long and hard at
questions like this), and that is enough to define the
behavior (again, subject to how the types are represented).

It makes sense that these two cases would be different. If
they weren't, then everywhere there were pointers to two
different types, those pointers might potentially point to
members of the same union object, which would greatly inhibit
potential code movement. Also, the "special guarantee" of
6.5.2.3 p6 would not be needed, because the possibility of
the two struct types belonging to the same union would (under
the assumption that pointers to objects of members and direct
member access are the same) be enough to guarantee correct
behavior. If that were so, there would be no reason to have
the guarantee of 6.5.2.3 p6.
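
For readers who haven't met it, a sketch of what the 6.5.2.3 p6 guarantee
covers: two structs that begin with the same member types share a "common
initial sequence", and when both are members of a union whose completed type
is visible, that initial part may be inspected through either one. The types
and names below (circle, square, shape, shape_tag, make_square) are
hypothetical.

```c
enum { TAG_CIRCLE = 1, TAG_SQUARE = 2 };

/* Both structs start with an int, their common initial sequence. */
struct circle { int tag; double radius; };
struct square { int tag; double side; };

union shape {
    struct circle c;
    struct square s;
};

/* Reads the tag through the 'c' member whichever struct was stored;
   permitted because the union type is visible here. */
int shape_tag(const union shape *sh)
{
    return sh->c.tag;
}

/* Stores a square, writing the tag through the 's' member. */
union shape make_square(double side)
{
    union shape sh;
    sh.s.tag = TAG_SQUARE;
    sh.s.side = side;
    return sh;
}
```

The guarantee covers only the shared initial members; reading `radius` after
storing a square falls back to the general rules discussed above.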

In footnote 95 (footnote 83 in N1256), the Standard says in
plain English what happens when one member is read when
another has been stored into. But notice the way it says
that:

If the member used to read the contents of a union object
is not the same as the member last used to store a value
in the object, ...

Note: 'the member /used/ to read', and 'the member last /used/ to
store' (my emphasis). The explanation in the footnote applies only
to member access that is done directly, ie, using '.' or '->', and
not just dereferencing a pointer that happens to point to the
member in question. And that distinction is consistent with the
differences in how effective type rules treat the two situations.

Does this help explain my earlier statement?
 
