casting from member to union

wkaras

In my compiler, the following code generates an error:

union U
{ int i; double d; };

U u;
int *ip = &u.i;
U *up = static_cast<U *>(ip); // error

I have to change the cast to reinterpret_cast for the code
to compile. It seems to me that casting from a (top-level)
union member pointer to a pointer to a union is comparably
as dangerous as casting from a base to a derived class
pointer. Doesn't the Standard require static cast to portably
cast from base to derived? I believe there are no guarantees
that reinterpret_cast is ever portable. Since the definition
of a union is that members are at the same address, it
seems like there should be some portable way to
cast a union member pointer to a pointer to the union.
 
Victor Bazarov

> In my compiler, the following code generates an error:
>
> union U
> { int i; double d; };
>
> U u;
> int *ip = &u.i;
> U *up = static_cast<U *>(ip); // error

Why do you think you need to do that?

> I have to change the cast to reinterpret_cast for the code
> to compile.

It doesn't mean you get functional code. Read about "undefined behaviour"
when you have time.

> It seems to me that casting from a (top-level)
> union member pointer to a pointer to a union is comparably
> as dangerous as casting from a base to a derived class
> pointer. Doesn't the Standard require static cast to portably
> cast from base to derived?

Yes, it does. How is that relevant here?

> I believe there are no guarantees
> that reinterpret_cast is ever portable.

There are restrictions imposed on 'reinterpret_cast' and for any other
cases the behaviour is undefined.

> Since the definition
> of a union is that members are at the same address, it
> seems like there should be some portable way to
> cast a union member pointer to a pointer to the union.

Why? I mean, what for?

Unions exist in the language for (a) compatibility with C and (b) saving
memory in rare cases. What are you using it for?

V
 
wkaras

One thing I enjoy about this group is that, for any question of
the form "why doesn't the language do abc?", there is always
at least one person who answers "you just shouldn't want
to do that!". Makes me nostalgic for the Pascal days of
my youth.

This issue came up when I wanted to implement a
free pool of structures. The elements of the free pool
would be a union of a structure instance and a
link pointer (for the free list). When the user
code passed the address of a structure to free,
I wanted to cast this to an address of the union type.
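
A minimal sketch of the free pool described above, under stated assumptions: `Widget`, `Node`, `Pool` and `kSlots` are hypothetical names, and the payload is a trivially copyable struct. The cast in `release()` is the member-to-union cast under discussion. As I read the C++17 wording, a union object and any of its non-static data members are pointer-interconvertible, so `reinterpret_cast` is defined for this case; older Standards give a narrower guarantee for POD types.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical payload type; any trivially copyable struct behaves the same.
struct Widget {
    int id;
    double weight;
};

// One pool slot: either a live Widget or a link in the free list.
union Node {
    Widget item;
    Node* next;
};

class Pool {
public:
    Pool() : free_(nullptr) {
        // Thread every slot onto the free list.
        for (std::size_t i = 0; i < kSlots; ++i) {
            nodes_[i].next = free_;
            free_ = &nodes_[i];
        }
    }

    // Hands out a slot, or nullptr when the pool is exhausted.
    Widget* allocate() {
        if (free_ == nullptr) return nullptr;
        Node* n = free_;
        free_ = n->next;
        n->item = Widget();  // 'item' becomes the active member
        return &n->item;
    }

    // Takes back a pointer previously returned by allocate().
    void release(Widget* w) {
        // Member-to-union cast: relies on the union and its member
        // sharing one address.
        Node* n = reinterpret_cast<Node*>(w);
        n->next = free_;     // 'next' becomes the active member
        free_ = n;
    }

private:
    static constexpr std::size_t kSlots = 4;  // assumed pool size
    Node nodes_[kSlots];
    Node* free_;
};
```

Passing `release()` a pointer that did not come from `allocate()` is outside what this sketch handles; the cast is only meaningful when the `Widget` really lives inside a `Node`.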
 
Tomás

wkaras posted:
> In my compiler, the following code generates an error:
>
> union U
> { int i; double d; };
>
> U u;
> int *ip = &u.i;
> U *up = static_cast<U *>(ip); // error
>
> I have to change the cast to reinterpret_cast for the code
> to compile. It seems to me that casting from a (top-level)
> union member pointer to a pointer to a union is comparably
> as dangerous as casting from a base to a derived class
> pointer.

There's no such thing as danger in programming. Look at it three ways:

1) Undefined Behaviour

This is crazy code that will _never_ work. Something like:

int* p;

*p = 6;


2) Well-defined Code

The code does exactly what you want it to do _all_ of the time:

int r = 4;

int* p = &r;

*p = 6;


3) Well-defined Code which may result in undefined behaviour on certain
platforms. It's the opposite to "portable code", ie. it's non-portable code.
For instance:

int r = 3;

r += 4000000;

On a lot of systems, this is okay because an "int" can hold the value... but
on other systems an "int" is too small, and so you'll have overflow on a
signed integral type, which results in Undefined Behaviour.

Now let's look at your code. This is what you're trying to do:

union PrimateUnion
{
    int monkey;
    double ape;
};

int main()
{
    PrimateUnion a;

    int* p = &a.monkey;

    //Now get back a pointer to the union

    PrimateUnion* p_a = p;
}

As you know, we can make this compile by using reinterpret_cast.

But the question is... what category of code does it fit into. Undefined
Behaviour? Well-defined and Portable? Well-defined and Non-portable?

The system which I'm most familiar with is Microsoft Windows. The code will
run absolutely fine on Microsoft Windows.

But what about portability? Will it run perfectly on _every_ platform? The
answer is to be found in the C++ Standard.

What you need to find out is:

When you have a union object, is the address of any element in the union
object _always_ equal to the address of the union object itself? If the
Standard provides such a guarantee, then the code is Well-defined and fully
portable. If you can find no such guarantee in the Standard, then your code
is not guaranteed to work (even though it may run perfectly on some systems).

So if we get out the Standard and search for "union", we come up with the
following:

"Each data member is allocated as if it were the sole member
of a struct."

That doesn't tell us much... so let's look up "struct" to see what it's
getting at.
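
For what it's worth, the rule for structs in C++03 (9.2/17) says that a pointer to a POD-struct, suitably converted with reinterpret_cast, points to its initial member and vice versa. Read together with the sentence quoted above, that suggests every member of a POD union starts at the address of the union itself. A small sketch that checks this on a given implementation (the function name is made up for illustration):

```cpp
#include <cassert>

union PrimateUnion {
    int monkey;
    double ape;
};

// Returns true when both members start at the address of the union itself.
// Only addresses are compared; no inactive member is read.
bool members_share_union_address(PrimateUnion& u) {
    void* whole = &u;
    void* first = &u.monkey;
    void* second = &u.ape;
    return whole == first && whole == second;
}
```

A passing check on one compiler is not proof of portability by itself; the guarantee has to come from the Standard text.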

> Doesn't the Standard require static cast to portably cast from base to
> derived?

Yes, but the object in question must in fact be a "derived" object. You can
use "static_cast" if you're certain that it is, or "dynamic_cast" if it may
or may not be.

> I believe there are no guarantees that reinterpret_cast is
> ever portable.

Then it wouldn't be in the language.

Open the standard and search for "reinterpret_cast".

> Since the definition of a union is that members are at
> the same address, it seems like there should be some portable way to
> cast a union member pointer to a pointer to the union.


Yes, we call it "reinterpret_cast". Just so you know, there are basically no
limits to what you can do in C++... just don't expect the program to work
exactly as you want on _every_ system.

Let's declare a "double" and then store a value in it as if it were an
"int":

double a;

int& b = *reinterpret_cast< int* > ( &a );

b = 5;


-Tomás
 
Victor Bazarov

[..ranting..]

First of all, please don't top-post. I've rearranged your reply.

The questions "why doesn't the language do blah" belongs to 'comp.std.c++'
where you can inquire about rationales behind different features of the
language. Here, in 'comp.lang.c++', the answer is always too simple to be
of any use: "Because the Standard does not specify it". C++ is a language
with an International Standard, in case you didn't know.

If you are interested in the _extensions_ to the language, then you need
to ask in the newsgroup for your compiler. That's where the discussions



> This issue came up when I wanted to implement a
> free pool of structures. The elements of the free pool
> would be a union of a structure instance and a
> link pointer (for the free list). When the user
> code passed the address of a structure to free,
> I wanted to cast this to an address of the union type.

I'm guessing, the problem you're solving with a cast is "if I have
a pointer to a data member of an object, how do I get the object?". Use
'reinterpret_cast', by all means, if it works for you. The Standard does
*not* guarantee that it would. You're basically in the implementation-
specific and/or platform-specific territory, AFAICT.

V
 
wkaras

Looks like this is another aspect of the Standard that is unnecessarily
fuzzy. The statement (from the final draft):

The mapping performed by reinterpret_cast is implementation-defined.

seems to imply that reinterpret_cast is never guaranteed portable
to all conforming implementations of C++.
But then there are additional clauses that do seem to impose some
portability requirements on reinterpret_cast. Quickly scanning, I didn't
see any that would guarantee the portability of my member-to-union
cast.

I had thought that static_cast meant "this is a conversion without
run-time checking that will be portable if used properly" and
reinterpret_cast meant "this is a conversion which cannot
be guaranteed by the Standard to be portable". But for
some reason, I guess it's more complicated than this.
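
One data point on the static_cast / reinterpret_cast split: since C++11, a reinterpret_cast between object pointer types is specified as the same conversion as two static_casts through void*. The sketch below spells the member-to-union cast both ways; the function names are made up for illustration, and both assume that `ip` really points at the `i` member of a `U`.

```cpp
#include <cassert>

union U {
    int i;
    double d;
};

// Member-to-union conversion spelled with reinterpret_cast.
U* to_union_reinterpret(int* ip) {
    return reinterpret_cast<U*>(ip);
}

// The same conversion spelled as two static_casts through void*.
U* to_union_via_void(int* ip) {
    return static_cast<U*>(static_cast<void*>(ip));
}
```

Neither spelling adds any checking; if `ip` points at a free-standing int, using the result as a `U*` is still undefined.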
 
Victor Bazarov

Tomás said:
> [...]
> 3) Well-defined Code which may result in undefined behaviour on certain
> platforms. It's the opposite to "portable code", ie. it's non-portable code.
> For instance:
>
> int r = 3;
>
> r += 4000000;
>
> On a lot of systems, this is okay because an "int" can hold the value... but
> on other systems an "int" is too small, and so you'll have overflow on a
> signed integral type, which results in Undefined Behaviour.

<nitpick>
Actually, that's "implementation-defined behaviour". See 4.7/3. What you
have here is assignment from 'long' to 'int', which involves an integral
conversion.
</nitpick>

> [...]
>
> -Tomás

V
 
Tomás

> <nitpick>
> Actually, that's "implementation-defined behaviour". See 4.7/3. What
> you have here is assignment from 'long' to 'int', which involves an
> integral conversion.
> </nitpick>

The reason I chose not to use the term "implementation-defined
behaviour" is that implementation-defined behaviour comes in two kinds:

A) The kind that may result in Undefined Behaviour

B) The kind that will just work differently.

For instance, here's a sample of A:

int main()
{
    int b = 3;
    b += 400000000; // <-- May result in UB
}

And here's a sample of B:

int main()
{
    unsigned b = 2;

    b -= 5;

    // "b" may have different values on
    // different platforms.
}

-Tomás
 
Victor Bazarov

Tomás said:
> The reason I chose not to use the term "implementation-defined
> behaviour" is because implementation-defined behaviour is two subsets:

Who taught you that? IDB is defined in the Standard. Look it up.

> A) The kind that may result in Undefined Behaviour

What does that mean? If the Standard says it's IDB, how 'may' it "result
in" UB?

> B) The kind that will just work differently.

Differently from what?

> For instance, here's a sample of A:
>
> int main()
> {
> int b = 3;
> b += 400000000; // <-- May result in UB

What do you mean by "May result in UB"? Do you mean there are some
implementations that can say that our IDB is U?

> }
>
> And here's a sample of B:
>
> int main()
> {
> unsigned b = 2;
>
> b -= 5;
>
> // "b" may have different values on
> // different platforms.

Yes, stemming from the fact that UINT_MAX is different. But that's
pretty damn well defined in the Standard. And I can tell you that

assert(b == UINT_MAX - 2);

should pass. The fact that it wraps around is not implementation-defined.

> }
>
> -Tomás

V
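
To make the wraparound concrete: unsigned arithmetic is reduced modulo UINT_MAX + 1, so 2 - 5 gives -3 modulo that value, which is UINT_MAX - 2. A small sketch (the function name is made up for illustration):

```cpp
#include <cassert>
#include <climits>

// Subtracts 5 from 2 in unsigned arithmetic; the result is reduced
// modulo UINT_MAX + 1.
unsigned wrapped_difference() {
    unsigned b = 2;
    b -= 5;
    return b;
}
```

The numeric value differs between platforms only because UINT_MAX differs; the relation to UINT_MAX is the same everywhere.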
 
