Union and pointer casts?

Jef Driesen

Hi,

Suppose I have two distinct data structures:

typedef struct foo_t {
...
} foo_t;

typedef struct bar_t {
...
} bar_t;

and a function that receives a pointer to such a structure, together with a type
to indicate which structure is being passed:

typedef enum data_type_t {
    DATA_TYPE_FOO,
    DATA_TYPE_BAR
} data_type_t;

void myfunction (data_type_t type, void *data)
{
    foo_t *foo = data;
    bar_t *bar = data;

    switch (type) {
    case DATA_TYPE_FOO:
        /* Use foo here */
        break;
    case DATA_TYPE_BAR:
        /* Use bar here */
        break;
    default:
        return;
    }
}

A typical usage would be like this:

int main(void)
{
    foo_t foo;
    bar_t bar;

    myfunction (DATA_TYPE_FOO, &foo);
    myfunction (DATA_TYPE_BAR, &bar);

    return 0;
}

Is it portable to replace the separate variables and explicit casts with a union?

typedef union foobar_t {
    bar_t bar;
    foo_t foo;
} foobar_t;

void myfunction (data_type_t type, void *data)
{
    foobar_t *foobar = data;

    switch (type) {
    case DATA_TYPE_FOO:
        /* Use foobar->foo here */
        break;
    case DATA_TYPE_BAR:
        /* Use foobar->bar here */
        break;
    default:
        return;
    }
}

I think this is a portable construct, but I'm not 100% sure. Note that it's not
my intent to try to interpret a foo_t as a bar_t. The main purpose of the union
is to improve the readability of the code (my real code has many more foo and
bar structs).

Jef
 
Joel C. Salomon

Jef Driesen said:
Suppose I have two distinct data structures:
and a function that receives a pointer to such a structure, together with a type to indicate which structure is being passed:
Is it portable to replace the separate variables and explicit casts with a union?

That is the main use of unions. You might consider this pattern, called a “tagged union”:

struct foo {…};
struct bar {…};

struct foobar {
    enum {
        T_FOO,
        T_BAR,
    } tag;
    union {
        struct foo foo;
        struct bar bar;
    } data;
};

void myfunction(struct foobar *foobar) {
    switch (foobar->tag) {
    case T_FOO:
        /* use foobar->data.foo here */
        break;
    case T_BAR:
        /* use foobar->data.bar here */
        break;
    default:
        fprintf(stderr, "bad type\n");
        abort();
    }
}

There are extensions to C (MSVC, Plan 9, gcc with -fms-extensions) that allow
you to not name the union and to refer to foobar->foo or foobar->bar directly;
a version of this will be in the C1x standard.
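As a rough sketch of the unnamed-union form described above, here is how it looks with the anonymous unions that C11 ended up standardizing. The member names `a` and `b`, the helper `value_of`, and its `double` return value are hypothetical, chosen only so the example compiles and can be checked.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

struct foo { int a; };      /* hypothetical member */
struct bar { double b; };   /* hypothetical member */

struct foobar {
    enum { T_FOO, T_BAR } tag;
    union {                 /* anonymous union: no member name (C11) */
        struct foo foo;
        struct bar bar;
    };
};

/* Hypothetical helper: reads whichever member the tag says is active. */
static double value_of(const struct foobar *fb)
{
    assert(fb != NULL);
    switch (fb->tag) {
    case T_FOO:
        return fb->foo.a;   /* fb->foo, not fb->data.foo */
    case T_BAR:
        return fb->bar.b;
    default:
        fprintf(stderr, "bad tag\n");
        abort();
    }
}
```

The only change from the named version is that the `data.` step disappears from every access; the tag still has to be kept in sync by hand.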

(Plan 9’s compiler also allowed

typedef struct foo {int bas;} foo;
typedef struct bar {int quux;} bar;

struct foobar {
    foo;
    bar;
};

void func(struct foobar f) {
    assert(f.bas == f.quux);
}

i.e., using the typedef name to declare an anonymous structure. The current
C1x draft allows that, but N1549 makes clear that this was *not* intended and
that it will be removed. Shame, that; it’s a cool & useful feature.)

--Joel

N1549: <http://open-std.org/jtc1/sc22/wg14/www/docs/n1549.pdf>
 
Tim Rentsch

Jef Driesen said:
Is it portable to replace the separate variables and explicit casts with a union?

typedef union foobar_t {
    bar_t bar;
    foo_t foo;
} foobar_t;

void myfunction (data_type_t type, void *data)
{
    foobar_t *foobar = data;

    switch (type) {
    case DATA_TYPE_FOO:
        /* Use foobar->foo here */
        break;
    case DATA_TYPE_BAR:
        /* Use foobar->bar here */
        break;
    default:
        return;
    }
}

I think this is a portable construct, but I'm not 100% sure. Note that
it's not my intent to try to interpret a foo_t as a bar_t. The main
purpose of the union is to improve the readability of the code (my
real code has many more foo and bar structs).

If called from your example main() function above, technically
this last function crosses over into undefined behavior. In
fact the undefined behavior happens even before getting to
the switch() statement.

To see why this is true, remember what we did: we took a
pointer to a foo_t or bar_t, and converted that to a 'void *'.
Okay, nothing wrong with that. But then, in the revised
myfunction(), we took the 'void *' pointer value and converted
it to a pointer to a foobar_t (the union type). The union type
may have (ie, the Standard allows it to have) a more restrictive
alignment requirement than the struct types. Hence, upon doing
the conversion of a struct pointer (in the guise of a 'void *',
but still pointing to one of the structs), we could get a pointer
that is not correctly aligned for access to the union type. The
Standard says clearly that if the resulting pointer value is not
correctly aligned for the target type then the behavior is
undefined.
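To make the alignment argument concrete, here is a small sketch using C11's `alignof`. The struct members are hypothetical stand-ins for the elided ones, and `aligned_for_union` is an illustrative helper, not something from the original code. It relies on `uintptr_t`, which is optional in the Standard and whose mapping from pointers is implementation-defined, so treat the check as a diagnostic aid rather than a portable guarantee.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>

typedef struct foo_t { char name[3]; } foo_t;   /* hypothetical members */
typedef struct bar_t { double value; } bar_t;   /* hypothetical members */

typedef union foobar_t {
    bar_t bar;
    foo_t foo;
} foobar_t;

/* A union has to be aligned suitably for every one of its members,
   so its requirement is at least as strict as the strictest member. */
static_assert(alignof(foobar_t) >= alignof(foo_t), "union vs foo_t");
static_assert(alignof(foobar_t) >= alignof(bar_t), "union vs bar_t");

/* Illustrative helper: nonzero if the address p is a multiple of the
   union's alignment on this implementation. */
static int aligned_for_union(const void *p)
{
    return (uintptr_t)p % alignof(foobar_t) == 0;
}
```

With these members, a lone `foo_t` object may legitimately sit at an address that is fine for `foo_t` but not for `foobar_t`, which is exactly the case where converting its address to `foobar_t *` is undefined.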

If I had to take a bet at even money on this, I would bet that
this code would actually work on a platform chosen at random.
But, if what you're looking for is code that is within the bounds
of what the Standard requires to work portably, this approach isn't it.
 
Jef Driesen

Joel C. Salomon said:
That is the main use of unions. You might consider this pattern, called a “tagged union”:

struct foo {…};
struct bar {…};

struct foobar {
    enum {
        T_FOO,
        T_BAR,
    } tag;
    union {
        struct foo foo;
        struct bar bar;
    } data;
};

void myfunction(struct foobar *foobar) {
    switch (foobar->tag) {
    case T_FOO:
        /* use foobar->data.foo here */
        break;
    case T_BAR:
        /* use foobar->data.bar here */
        break;
    default:
        fprintf(stderr, "bad type\n");
        abort();
    }
}

This is something I don't want to do, because the foo and bar data types are
part of a library where I want to be able to add new data types without breaking
backwards compatibility. But adding new structs to the union may change its size
and hence break backwards compatibility.

If the union is not part of the public api and used only internally, that's not
an issue. The union would just be a convenient way to avoid doing explicit casts.
 
Paul N

As an alternative suggestion, why not have a union consisting of a
foo_t * and a bar_t * ?
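A minimal sketch of that idea, assuming hypothetical struct members and a hypothetical union name `data_ptr_t`. The function returns a `double` here purely so the result can be checked; the original returns `void`.

```c
#include <assert.h>

typedef struct foo_t { int id; } foo_t;          /* hypothetical member */
typedef struct bar_t { double weight; } bar_t;   /* hypothetical member */

typedef enum data_type_t {
    DATA_TYPE_FOO,
    DATA_TYPE_BAR
} data_type_t;

/* A union of pointers rather than a union of structs: every member is
   a properly typed pointer, so no struct pointer is ever converted to
   a type it was not created as. */
typedef union data_ptr_t {
    foo_t *foo;
    bar_t *bar;
} data_ptr_t;

static double myfunction(data_type_t type, data_ptr_t data)
{
    switch (type) {
    case DATA_TYPE_FOO:
        return data.foo->id;
    case DATA_TYPE_BAR:
        return data.bar->weight;
    default:
        return -1.0;
    }
}
```

At the call site the pointer has to be wrapped in the union, for example with a C99 compound literal: `myfunction(DATA_TYPE_FOO, (data_ptr_t){ .foo = &foo });`. Without compound literals the caller needs a named temporary, which is less convenient.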
 
Ben Bacarisse

Your code probably does have explicit casts but they've gone from the
example you posted.
I believe it would be portable. You could reasonably change the second
parameter of myfunction() to a 'foobar_t *', of course.

But that would require a whole lot more casts. The program has no data
that is actually of the union type (it's a figment designed to simplify
the code) so the conversion from the struct pointer to the union pointer
will require a cast (though it may be simply a cast to void *).

To the OP: Have you considered function pointers? Your myfunction
function would reduce to

dispatch[type](data);

and each of the functions in the dispatch table would look like this:

void myfunction_foo(void *data)
{
foo_t *foo = data;
/* whatever the switch case did */
}

It means writing a function per case, but there is not that much more
noise in the functions than there is in the switch statement.
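Filled out, the dispatch-table approach might look roughly like this. The struct members, the `DATA_TYPE_COUNT` enumerator, the `handler_t` typedef, and the bounds check are all assumptions added to make the sketch self-contained; only `dispatch`, `myfunction_foo`, and the overall shape come from the description above.

```c
#include <assert.h>
#include <stddef.h>

typedef struct foo_t { int id; int seen; } foo_t;          /* hypothetical */
typedef struct bar_t { double weight; int seen; } bar_t;   /* hypothetical */

typedef enum data_type_t {
    DATA_TYPE_FOO,
    DATA_TYPE_BAR,
    DATA_TYPE_COUNT     /* hypothetical: number of real types */
} data_type_t;

/* One handler per type; each converts only to its own struct type. */
static void myfunction_foo(void *data)
{
    foo_t *foo = data;
    foo->seen = 1;
}

static void myfunction_bar(void *data)
{
    bar_t *bar = data;
    bar->seen = 1;
}

typedef void (*handler_t)(void *);

static const handler_t dispatch[DATA_TYPE_COUNT] = {
    [DATA_TYPE_FOO] = myfunction_foo,
    [DATA_TYPE_BAR] = myfunction_bar,
};

static void myfunction(data_type_t type, void *data)
{
    /* Mirrors the "default: return;" branch of the switch version. */
    if ((unsigned)type >= DATA_TYPE_COUNT || dispatch[type] == NULL)
        return;
    dispatch[type](data);
}
```

Adding a new type then means adding one enumerator, one handler, and one table entry, with no change to `myfunction` itself.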
 
Jef Driesen

Tim Rentsch said:
If called from your example main() function above, technically
this last function crosses over into undefined behavior. In
fact the undefined behavior happens even before getting to
the switch() statement.

To see why this is true, remember what we did: we took a
pointer to a foo_t or bar_t, and converted that to a 'void *'.
Okay, nothing wrong with that. But then, in the revised
myfunction(), we took the 'void *' pointer value and converted
it to a pointer to a foobar_t (the union type). The union type
may have (ie, the Standard allows it to have) a more restrictive
alignment requirement than the struct types. Hence, upon doing
the conversion of a struct pointer (in the guise of a 'void *',
but still pointing to one of the structs), we could get a pointer
that is not correctly aligned for access to the union type. The
Standard says clearly that if the resulting pointer value is not
correctly aligned for the target type then the behavior is
undefined.

If this is indeed a potential problem, then why doesn't the same logic apply to
my first example too? Here I did cast the data pointer to all possible struct
types, while only one of them will be the correct one:

foo_t *foo = data;
bar_t *bar = data;

Assuming the real type was of type foo_t, the bar variable may now point to a
struct which may have different alignment requirements. Or am I seeing this wrong?

I suppose the correct way would be to cast only to the correct type:

void myfunction (data_type_t type, void *data)
{
    foo_t *foo = NULL;
    bar_t *bar = NULL;

    switch (type) {
    case DATA_TYPE_FOO:
        foo = data;
        /* Use foo here */
        break;
    case DATA_TYPE_BAR:
        bar = data;
        /* Use bar here */
        break;
    default:
        return;
    }
}

or get rid of the foo and bar variables and cast the data pointer everywhere
where it is accessed:

((foo_t *) data)->member

But this is ugly and error-prone, especially when you have to do this cast often.

Tim Rentsch said:
If I had to take a bet at even money on this, I would bet that
this code would actually work on a platform chosen at random.
But, if what you're looking for is code that is within the bounds
of what the Standard requires to work portably, this approach isn't it.

I prefer portable code, but I don't want to take it to the extreme either.
 
Ben Bacarisse

Jef Driesen said:
On 24/02/11 22:02, Tim Rentsch wrote:

If this is indeed a potential problem, then why doesn't the same logic
apply to my first example too?

The key information is in the text I've left quoted: union types may
require stricter alignment than pointer types. All pointers to
structure types have the same alignment requirements as do all pointers
to union types, but they don't have the same alignment requirements as
each other.

I agree with what Tim says (in a part I snipped) that it is a reasonable
bet that this will work but it is not guaranteed.

Here I did cast the data pointer to all
possible struct types, while only one of them will be the correct one:

foo_t *foo = data;
bar_t *bar = data;

Assuming the real type was of type foo_t, the bar variable may now
point to a struct which may have different alignment requirements. Or
am I seeing this wrong?

What you originally had was fine because you can convert the void * to any
structure type without undefined behaviour (they all have the same
alignment requirements after all) provided that you don't access the
"wrong" structure, and your original code ensured that that did not
happen.

I suppose the correct way would be to cast only to the correct type:

void myfunction (data_type_t type, void *data)
{
    foo_t *foo = NULL;
    bar_t *bar = NULL;

    switch (type) {
    case DATA_TYPE_FOO:
        foo = data;
        /* Use foo here */
        break;
    case DATA_TYPE_BAR:
        bar = data;
        /* Use bar here */
        break;
    default:
        return;
    }
}

or get rid of the foo and bar variables and cast the data pointer
everywhere where it is accessed:

((foo_t *) data)->member

But this is ugly and error-prone, especially when you have to do this
cast often.

I prefer portable code, but I don't want to take it to the extreme
either.

That's a tough call. Someone suggested putting the pointers into a
union instead of the structures. That works but it is not very
convenient unless you can use C99's compound literals at the point of
call. Of course, using compound literals has portability implications
too.
 
Joel C. Salomon

Jef said:
I suppose the correct way would be to cast only to the correct type:

void myfunction (data_type_t type, void *data)
{
    foo_t *foo = NULL;
    bar_t *bar = NULL;

    switch (type) {
    case DATA_TYPE_FOO:
        foo = data;
        /* Use foo here */
        break;
    case DATA_TYPE_BAR:
        bar = data;
        /* Use bar here */
        break;
    default:
        return;
    }
}

Why not try this:

void myfunction (data_type_t type, void *data) {
switch (type) {
case DATA_TYPE_FOO:
foo_t *foo = data;
/* use foo here */
break;
case DATA_TYPE_BAR:
bar_t *bar = data;
/* Use bar here */
break;
default:
return;
}
}

i.e., defining `foo` and `bar` only where they are needed.

(Well actually, `foo` is in-scope but uninitialized in the `bar` case. The
compiler might well catch it if you accidentally use it there, or you can add
blocks, e.g.,

…
case DATA_TYPE_BAR: {
    bar_t *bar = data;
    /* Use bar here */
} break;
…

and you can use this technique in a pre-C99 compiler, too.)

--Joel
 
Tim Rentsch

Jef Driesen said:
If this is indeed a potential problem, then why doesn't the same logic
apply to my first example too? Here I did cast the data pointer to all
possible struct types, while only one of them will be the correct one:

foo_t *foo = data;
bar_t *bar = data;

Assuming the real type was of type foo_t, the bar variable may now
point to a struct which may have different alignment requirements. Or
am I seeing this wrong?

You're right, this usage is also undefined behavior. I didn't
notice earlier because I was focused on the question about using
unions.

I suppose the correct way would be to cast only to the correct type:

void myfunction (data_type_t type, void *data)
{
    foo_t *foo = NULL;
    bar_t *bar = NULL;

    switch (type) {
    case DATA_TYPE_FOO:
        foo = data;
        /* Use foo here */
        break;
    case DATA_TYPE_BAR:
        bar = data;
        /* Use bar here */
        break;
    default:
        return;
    }
}

or get rid of the foo and bar variables and cast the data pointer
everywhere where it is accessed:

((foo_t *) data)->member

But this is ugly and error-prone, especially when you have to do this cast often.


I prefer portable code, but I don't want to take it to the extreme either.

How about this way (please excuse a minor reformatting):

void
myfunction( data_type_t type, void *data ){
    switch (type) {

    case DATA_TYPE_FOO: {
        foo_t *foo = data;
        /* Use foo here */
        break;
    }

    case DATA_TYPE_BAR: {
        bar_t *bar = data;
        /* Use bar here */
        break;
    }

    }
}

Not too bad aesthetics-wise, and completely portable (assuming of
course the calls are right).
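For reference, here is a compilable sketch of the per-case conversion with hypothetical struct members and hypothetical work done in each branch. The braces do two jobs: they limit each pointer's scope to its own case, and, before C23, they are what allows a declaration to follow the label at all, since a `case` label had to be followed by a statement and a declaration is not one.

```c
#include <assert.h>

typedef struct foo_t { int id; } foo_t;          /* hypothetical member */
typedef struct bar_t { double weight; } bar_t;   /* hypothetical member */

typedef enum data_type_t {
    DATA_TYPE_FOO,
    DATA_TYPE_BAR
} data_type_t;

static void myfunction(data_type_t type, void *data)
{
    switch (type) {
    case DATA_TYPE_FOO: {
        /* Converted only when data really points to a foo_t. */
        foo_t *foo = data;
        foo->id += 1;            /* hypothetical use of foo */
        break;
    }
    case DATA_TYPE_BAR: {
        /* Converted only when data really points to a bar_t. */
        bar_t *bar = data;
        bar->weight *= 2.0;      /* hypothetical use of bar */
        break;
    }
    default:
        return;
    }
}
```

Because the `void *` is converted back only to the type it originally came from, the alignment question never arises, provided the caller passes a matching tag.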
 
Tim Rentsch

Ben Bacarisse said:
The key information is in the text I've left quoted: union types may
require stricter alignment than pointer types. All pointers to
structure types have the same alignment requirements as do all pointers
to union types, but they don't have the same alignment requirements as
each other.

I agree with what Tim says (in a part I snipped) that it is a reasonable
bet that this will work but it is not guaranteed.


What you originally had was fine because you can convert the void * to any
structure type without undefined behaviour (they all have the same
alignment requirements after all) provided that you don't access the
"wrong" structure, and your original code ensured that that did not
happen.

This isn't right. The two struct pointer _variables_ have the same
alignment requirements, but the pointer _values_ are pointing to
struct types that may have different alignment requirements. The
relevant requirement statement (from 6.3.2.3p7) is

If the resulting pointer is not correctly aligned for the
pointed-to type, the behavior is undefined.

The 'pointed-to' type is a structure type, and the converted
pointer values might not be correctly aligned for a struct
type other than that of the actual argument.
 
Ben Bacarisse

Tim Rentsch said:
This isn't right. The two struct pointer _variables_ have the same
alignment requirements, but the pointer _values_ are pointing to
struct types that may have different alignment requirements. The
relevant requirement statement (from 6.3.2.3p7) is

If the resulting pointer is not correctly aligned for the
pointed-to type, the behavior is undefined.

The 'pointed-to' type is a structure type, and the converted
pointer values might not be correctly aligned for a struct
type other than that of the actual argument.

Yes, you are right. I'd read this:

"[a]ll pointers to structure types shall have the same representation
and alignment requirements as each other" (6.2.5 p27)

as referring to the alignment of the pointed-to object rather than the
pointer object itself. The wording is slightly ambiguous because
whether a pointer is aligned or not *does* refer to the pointed-to type.
Thus a pointer may have its alignment requirements met (as per 6.2.5
p27) and yet not be correctly aligned (as per 6.3.2.3 p7)!
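The two notions of alignment can be separated in a few lines of C11. This is only a sketch: `struct small`, `struct large`, and `pointee_alignments_differ` are hypothetical names, and whether the pointed-to alignments actually differ depends on the implementation.

```c
#include <assert.h>
#include <stdalign.h>

struct small { char c; };     /* hypothetical */
struct large { double d; };   /* hypothetical */

/* 6.2.5 p27 is about the pointer objects themselves: every pointer to
   a structure type has the same representation and alignment. */
static_assert(sizeof(struct small *) == sizeof(struct large *),
              "struct pointers share a representation");
static_assert(alignof(struct small *) == alignof(struct large *),
              "struct pointers share an alignment requirement");

/* 6.3.2.3 p7 is about the pointed-to types, whose alignment
   requirements are allowed to differ from one struct to another. */
static int pointee_alignments_differ(void)
{
    return alignof(struct small) != alignof(struct large);
}
```

On an implementation where `pointee_alignments_differ()` returns nonzero, the address of a `struct small` can be perfectly storable in a `struct large *` variable and still not be correctly aligned for `struct large`.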
 
Jef Driesen

How about the reverse: casting a pointer to union to a pointer to one of the
structs. Like in this code snippet:

int main(void)
{
foobar_t foobar;

foobar.foo = ...;
myfunction (DATA_TYPE_FOO, &foobar);

foobar.bar = ...;
myfunction (DATA_TYPE_BAR, &foobar);

return 0;
}

I think this is a portable construct, although I'm not 100% sure.
 
Ben Bacarisse

Jef Driesen said:
How about the reverse: casting a pointer to union to a pointer to one
of the structs. Like in this code snippet:

int main(void)
{
foobar_t foobar;

foobar.foo = ...;
myfunction (DATA_TYPE_FOO, &foobar);

foobar.bar = ...;
myfunction (DATA_TYPE_BAR, &foobar);

return 0;
}

I think this is a portable construct, although I'm not 100% sure.

Yes that's fine but I'd re-word your description of it. Casts are used
to perform conversions, but a conversion without a cast is just a
conversion.

I'd be tempted to write

myfunction (DATA_TYPE_FOO, &foobar.foo);

just because it is so much more explicit, but I don't think it makes
much difference.
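A small sketch of both call styles, assuming hypothetical struct members. `myfunction` returns a `double` here only so the result can be checked; the original returns `void`.

```c
#include <assert.h>

typedef struct foo_t { int id; } foo_t;          /* hypothetical member */
typedef struct bar_t { double weight; } bar_t;   /* hypothetical member */

typedef union foobar_t {
    bar_t bar;
    foo_t foo;
} foobar_t;

typedef enum data_type_t {
    DATA_TYPE_FOO,
    DATA_TYPE_BAR
} data_type_t;

static double myfunction(data_type_t type, void *data)
{
    switch (type) {
    case DATA_TYPE_FOO: {
        foo_t *foo = data;
        return foo->id;
    }
    case DATA_TYPE_BAR: {
        bar_t *bar = data;
        return bar->weight;
    }
    default:
        return -1.0;
    }
}
```

Both `myfunction(DATA_TYPE_FOO, &foobar)` and `myfunction(DATA_TYPE_FOO, &foobar.foo)` pass the same address, because a union and each of its members start at the same location and the union is aligned at least as strictly as any member. The second form simply states at the call site which member is meant.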
 
