Type information in structs of different types

pinkfloydhomer

Is it well-defined and portable to do something like:

typedef struct
{
    int type;
    char c;
} S1;

typedef struct
{
    int type;
    float f;
} S2;

void f(void* p)
{
    S1* p1 = (S1*) p;
    S2* p2 = (S2*) p;

    if (p1->type == 1)
    {
        printf("%c\n", p1->c);
    }
    else
    {
        printf("%f\n", p2->f);
    }
}

int main(void)
{
    S1 s1;
    s1.type = 1;
    s1.c = 'a';

    S2 s2;
    s2.type = 2;
    s2.f = 42.0;

    f(&s1);
    f(&s2);
}

?

If so, are there any constraints (the "shared" fields of the structs
have to be at the beginning etc.)?

If not, is there any portable way to do something similar?

/David
 
Flash Gordon

Is it well-defined and portable to do something like:

Almost. It is guaranteed that if the structs appear together in a union
then any common *initial* sequence of fields will be laid out the same
and you can use any of the struct types to access them. In practice, it
will work on almost all systems without the union.
typedef struct
{
int type;
char c;
} S1;

typedef struct
{
int type;
float f;
} S2;

union U12 {
S1 s1;
S2 s2;
};
void f(void* p)
{
S1* p1 = (S1*) p;
S2* p2 = (S2*) p;

if (p1->type == 1)
{
printf("%c\n", p1->c);
}
else
{
printf("%f\n", p2->f);
}
}

int main(void)
{
S1 s1;
s1.type = 1;
s1.c = 'a';

S2 s2;
s2.type = 2;
s2.f = 42.0;

f(&s1);
f(&s2);
}

?

If so, are there any constraints (the "shared" fields of the structs
have to be at the beginning etc.)?

If not, is there any portable way to do something similar?

It is a standard technique, although note the standard requirement of
having the union between the structs for maximum portability. Although I
would be surprised if there are any systems where it failed without the
union.
 
Vladimir S. Oka

Is it well-defined and portable to do something like:

I don't see anything wrong with it, but see below as well...
typedef struct
{
int type;
char c;
} S1;

typedef struct
{
int type;
float f;
} S2;

void f(void* p)
{
S1* p1 = (S1*) p;
S2* p2 = (S2*) p;

if (p1->type == 1)
{
printf("%c\n", p1->c);
}
else
{
printf("%f\n", p2->f);
}

This is not precise enough. It'll assume `S2` for any `type` that is not
1. This is probably the biggest issue with tricks like these. You
should take great care to:

a) always initialise the type field properly
b) cater for when a) is not the case
}

int main(void)
{
S1 s1;
s1.type = 1;
s1.c = 'a';

S2 s2;
s2.type = 2;
s2.f = 42.0;

f(&s1);
f(&s2);
}

?

If so, are there any constraints (the "shared" fields of the structs
have to be at the beginning etc.)?

If not, is there any portable way to do something similar?

/David

--
BR, Vladimir

Ginsburg's Law:
At the precise moment you take off your shoe in a shoe store, your
big toe will pop out of your sock to see what's going on.
 
Marc Boyer

On 14-03-2006, pinkfloydhomer wrote:
Is it well-defined and portable to do something like:

typedef struct
{
int type;
char c;
} S1;

typedef struct
{
int type;
float f;
} S2;

void f(void* p)
{
S1* p1 = (S1*) p;
S2* p2 = (S2*) p;

if (p1->type == 1)
{
printf("%c\n", p1->c);
}
else
{
printf("%f\n", p2->f);
}
}

int main(void)
{
S1 s1;
s1.type = 1;
s1.c = 'a';

S2 s2;
s2.type = 2;
s2.f = 42.0;

f(&s1);
f(&s2);
}

?

No, it is not.
See 6.5.2.3/5 and 6.5.2.3/8 for details.

Marc Boyer
 
Chris Torek

Is it well-defined and portable to do something like:

typedef struct
{
int type;
char c;
} S1;

typedef struct
{
int type;
float f;
} S2;

void f(void* p)
{
S1* p1 = (S1*) p;
S2* p2 = (S2*) p;

Someone else already posted section references so I will skip all
that (I assume they are correct).

Practically speaking, you can run into problems here as well. Eric
Sosman noted an example of a compiler in which problems did occur
with a similar construct. To make the problem "more apparent" let
me suggest a slightly different version of S1 and S2:

/* typedef struct S1 S1; */ /* purely for those who like typedefs */
struct S1 { int type; char c; };

/* typedef struct S2 S2; */
struct S2 { int type; char pad[10000]; float f; };

Now in f() we begin with this, which is the same but with the ugly
casts removed, and the beautiful "struct" keyword inserted:

void f(void *p) {
    struct S1 *p1 = p;
    struct S2 *p2 = p;

Clearly, the goal here is that some caller will call f() with
an actual instance of an "S1" object, and f() will be able to
tell that "p" really pointed to an "S1", rather than an "S2", by
the type field. So here is the caller:

void g(void) {
    struct S1 *q = xmalloc(sizeof *q);
    /* xmalloc() just calls malloc and exits if it fails */

    q->type = 1; /* tell f() that this is an S1 */
    f(q);
    ...
}

So now here we are in f(), and we have set both p1 and p2 equal to
p (which is g()'s "q"). Our compiler optimizes, though. We have
suggested to it that p1 and p2 are both valid. It peeks ahead at
the rest of our code and determines that neither p1 nor p2 may be
NULL at this point -- we use p1 without checking for NULL, and if
p1 != NULL then p2 != NULL -- and that there could be a use of
p2->f, which is going to be in a different cache line than p1->type.
So, in order to make the code run fast, it issues a prefetch load
of p2->f *right now* (with signalling NaN traps suppressed, of
course). Then it goes on to compile the rest of the function:

    if (p1->type == 1)
    {
        printf("%c\n", p1->c);
    }
    else
    {
        printf("%f\n", p2->f);
    }
}

Now, when we call f() from g(), we give it a pointer to the last
(newly-allocated) page in our virtual memory space. This page has
size 8192 bytes, but the address p2->f is at offset 10004. The
prefetch causes f() to trap with an invalid address, and the program
crashes.

What went wrong is pretty straightforward: we lied to the compiler,
claiming that p was *both* an S1 *and* an S2. It got its revenge.
What we need to do is avoid lying, and rewrite f() as, say:

void f(void *p) {
    int *ptype = p;

    if (*ptype == 1) {
        struct S1 *p1 = p;

        printf("%c\n", p1->c);
    } else {
        struct S2 *p2 = p;

        printf("%f\n", p2->f);
    }
}

In other words, resist the temptation to use one of the various
types of structures "as if" it were also all the others. Use the
first-element rule to access *only* the first element -- in this
case, the type-selecting integer -- and pick out the correct
structure type. Once you have the correct type, *then* establish
a pointer to the entire struct.
 
Vladimir S. Oka

See where? The standard? I don't have it.

See what? You didn't quote any context. See
<http://cfaj.freeshell.org/google/>.

You can get the Standard in electronic format. Either purchase it from
ISO (reportedly around $18), or download the last public draft (N869) or
the current public draft (N1124). I don't have the links handy, but
searching this newsgroup using Google Groups will help, as they were
posted not so long ago.
 
Rob Arthan

Flash said:
Almost. It is guaranteed that if the structs appear together in a union
then any common *initial* sequence of fields will be laid out the same
and you can use any of the struct types to access them.
...

Almost! I think you are right for C89, but section 6.5.2.3 of the C99
standard restricts this to places where "a declaration of the complete type
of the union is visible" and gives examples of right and wrong usage.

I'd be interested to know what this restriction is for, since I can't see
how the union type declaration helps in general.

Regards,

Rob.
 
Robin Haigh

Rob Arthan said:
Almost! I think you are right for C89, but section 6.5.2.3 of the C99
standard restricts this to places where "a declaration of the complete type
of the union is visible" and gives examples of right and wrong usage.

I'd be interested to know what this restriction is for, since I can't see
how the union type declaration helps in general.


It's impossible for common initial sequences to be laid out differently in
any case where it could matter, with or without the union being visible.

The issue is to do with optimisers making no-alias assumptions. If the
compiler sees two pointers to different types of structs, then with no other
clues it may assume they aren't aliases and optimise accordingly.

The visibility of the union is supposed to plant the idea in the optimiser's
mind that maybe the pointers point to different members of the same union
object, in a case where there would be some point in having two such
pointers.

So the point of the rule is to clear up a possible obscure difficulty in
using unions for type-punning, in a case where it's common practice and
where the desirability and legitimacy of doing it are clear-cut, but an
incautious optimiser might mess it up.

The OP's code is different because the two pointers aren't derived out of
sight and passed in as separate parameters, they're produced within the
function by casting the same void*. For an optimiser to then assume they
aren't aliases would be too ludicrous to be worth considering.
 
Eric Sosman

Robin Haigh wrote On 03/16/06 16:57,:
Flash Gordon wrote:



Almost! I think you are right for C89, but section 6.5.2.3 of the C99
standard restricts this to places where "a declaration of the complete type
of the union is visible" and gives examples of right and wrong usage.

I'd be interested to know what this restriction is for, since I can't see
how the union type declaration helps in general.



It's impossible for common initial sequences to be laid out differently in
any case where it could matter, with or without the union being visible.

The issue is to do with optimisers making no-alias assumptions. If the
compiler sees two pointers to different types of structs, then with no other
clues it may assume they aren't aliases and optimise accordingly.
[...]

"The issue" is too limiting; "An issue" would be better.
I once described a different issue that actually made trouble
on an actual system with an actual compiler; Chris Torek has
improved my description and posted it to this thread. It's
recommended reading ...
 
Barry Schwarz

Almost! I think you are right for C89, but section 6.5.2.3 of the C99
standard restricts this to places where "a declaration of the complete type
of the union is visible" and gives examples of right and wrong usage.

I'd be interested to know what this restriction is for, since I can't see
how the union type declaration helps in general.

From the OP

typedef struct
{
int type;
char c;
} S1;

typedef struct
{
int type;
float f;
} S2;

S1 s1;
S2 s2;

It is entirely possible for S1 and S2 to have different alignment
requirements. For argument's sake, let's assume S2 is more
restrictive than S1. &s1 is passed to a function taking a void* and
in the function it is cast to S2*. If s1 does not meet the alignment
to an S2, this invokes undefined behavior.


 
Me

Barry Schwarz wrote:
It is entirely possible for S1 and S2 to have different alignment
requirements. For argument's sake, let's assume S2 is more
restrictive than S1. &s1 is passed to a function taking a void* and
in the function it is cast to S2*. If s1 does not meet the alignment
to an S2, this invokes undefined behavior.

6.2.5/26 "All pointers to structure types shall have the same
representation and alignment requirements as each other"
 
Keith Thompson

Me said:
Barry Schwarz wrote:


6.2.5/26 "All pointers to structure types shall have the same
representation and alignment requirements as each other"

Does that refer to the alignment of the structure type or to the
alignment of the pointer type?
 
Robin Haigh

Eric Sosman said:
Robin Haigh wrote On 03/16/06 16:57,:
Flash Gordon wrote:


(e-mail address removed) wrote:

Is it well-defined and portable to do something like:

Almost. It is guaranteed that if the structs appear together in a union
then any common *initial* sequence of fields will be laid out the same
and you can use any of the struct types to access them.
...

Almost! I think you are right for C89, but section 6.5.2.3 of the C99
standard restricts this to places where "a declaration of the complete type
of the union is visible" and gives examples of right and wrong usage.

I'd be interested to know what this restriction is for, since I can't see
how the union type declaration helps in general.



It's impossible for common initial sequences to be laid out differently in
any case where it could matter, with or without the union being visible.

The issue is to do with optimisers making no-alias assumptions. If the
compiler sees two pointers to different types of structs, then with no other
clues it may assume they aren't aliases and optimise accordingly.
[...]

"The issue" is too limiting; "An issue" would be better.
I once described a different issue that actually made trouble
on an actual system with an actual compiler; Chris Torek has
improved my description and posted it to this thread. It's
recommended reading ...


Yes. And in that case, using a union would help, but at the cost of every
struct S1 having to occupy the space of a struct S2.

There's also the possibility of forming a union of the pointers:
union u {
struct S1 *pS1;
struct S2 *pS2;
} u;

On the aliasing question, if we now call a function as func(u.pS1, u.pS2),
I'm not clear whether we need the union declaration in scope, or whether it
helps if we have it. But since we're told that struct pointers are
interchangeable as members of unions, it seems we should be able to find out
which union member is "in use" by dereferencing either of them to have a
peek at the tag field of the struct pointed to, and then the pre-fetching
problem does arise.
 
Robin Haigh

Keith Thompson said:
Does that refer to the alignment of the structure type or to the
alignment of the pointer type?


The previous paragraph uses the same wording and footnote for non-pointer
types, so I think it means the pointer object itself. But I think the case
is covered by "same representation", which I would take to mean
(1) a conversion and a reinterpretation produce the same result
(2) any pointer-to-struct object can be used as a generic container for
round-tripping of other pointer-to-struct values
 
Eric Sosman

Robin said:
Robin Haigh wrote On 03/16/06 16:57,:
Flash Gordon wrote:



(e-mail address removed) wrote:


Is it well-defined and portable to do something like:

Almost. It is guaranteed that if the structs appear together in a union
then any common *initial* sequence of fields will be laid out the same
and you can use any of the struct types to access them.
...

Almost! I think you are right for C89, but section 6.5.2.3 of the C99
standard restricts this to places where "a declaration of the complete type
of the union is visible" and gives examples of right and wrong usage.

I'd be interested to know what this restriction is for, since I can't see
how the union type declaration helps in general.

It's impossible for common initial sequences to be laid out differently in
any case where it could matter, with or without the union being visible.

The issue is to do with optimisers making no-alias assumptions. If the
compiler sees two pointers to different types of structs, then with no other
clues it may assume they aren't aliases and optimise accordingly.
[...]

"The issue" is too limiting; "An issue" would be better.
I once described a different issue that actually made trouble
on an actual system with an actual compiler; Chris Torek has
improved my description and posted it to this thread. It's
recommended reading ...



Yes. And in that case, using a union would help, but at the cost of every
struct S1 having to occupy the space of a struct S2.

Chris' post shows one way to avoid that space penalty.
An elaboration (same idea, really, but capable of carrying
more than one piece of information) is to define a struct
type that carries just the type code and whatever additional
information is useful, and make that struct the first element
of every "payload" struct:

struct header {
    enum { S1TYPE, S2TYPE } type;
    unsigned int refcount;
    struct vtable *vtable;
};

struct S1 {
    struct header head;
    char data;
};

struct S2 {
    struct header head;
    char pad[10000];
    float data;
};

You can now gain just a little bit of type-safety by making
the function f() -- and others, presumably -- take a pointer
to a `struct header' instead of a pointer to `void':

void f(struct header *ptr) {
    if (ptr->type == S1TYPE) {
        struct S1 *s1ptr = (struct S1*)ptr;
        printf ("%c\n", s1ptr->data);
    }
    else if (ptr->type == S2TYPE) {
        struct S2 *s2ptr = (struct S2*)ptr;
        printf ("%f\n", s2ptr->data);
    }
    else {
        launch_from(NOSE, "demons");
    }
}

... and you call it in the obvious way:

    struct S1 s1instance = { { S1TYPE, 1, s1vtable }, 'X' };
    struct S2 s2instance = { { S2TYPE, 1, s2vtable }, {0}, 42.0f };
    f (&s1instance.head);
    f (&s2instance.head);

This sort of thing is often found in "poor man's O-O" code,
and works rather well once you're accustomed to it (in PMOO
programs, the initialization of the `head' element is usually
done by a "constructor" or "factory" function).

The important thing (in both Chris' post and this one) is
that you never form a `struct S1*' or `struct S2*' until you
actually know what kind of struct you've been given. It's the
jump-the-gun creation of unvalidated pointers that makes trouble.
 
Keith Thompson

Robin Haigh said:
The previous paragraph uses the same wording and footnote for non-pointer
types, so I think it means the pointer object itself. But I think the case
is covered by "same representation", which I would take to mean
(1) a conversion and a reinterpretation produce the same result
(2) any pointer-to-struct object can be used as a generic container for
round-tripping of other pointer-to-struct values

I don't think so. Two different struct types can have different
alignment requirements, but pointers to those types can still have the
same representation (for example, if all object pointer types have the
same representation).

For example, in at least one compiler the type "struct { char c; }"
only requires one-byte alignment. An object of that type could be
allocated at an odd address; converting a pointer to that object to
some other pointer-to-struct type could cause problems.

Given:
struct tiny { char c; };
struct bigger { char c; int i; };
I think that converting a pointer value from "struct tiny*" to
"struct bigger*" and back again, or vice versa, should be ok, but
you can't necessarily dereference the intermediate pointer.
 
Robin Haigh

Keith Thompson said:
I don't think so. Two different struct types can have different
alignment requirements, but pointers to those types can still have the
same representation (for example, if all object pointer types have the
same representation).

For example, in at least one compiler the type "struct { char c; }"
only requires one-byte alignment. An object of that type could be
allocated at an odd address; converting a pointer to that object to
some other pointer-to-struct type could cause problems.

Given:
struct tiny { char c; };
struct bigger { char c; int i; };
I think that converting a pointer value from "struct tiny*" to
"struct bigger*" and back again, or vice versa, should be ok, but
you can't necessarily dereference the intermediate pointer.


I thought Barry Schwarz's point was about the actual pointer conversion,
(struct S2 *)(void *)&s1. Actually dereferencing the wrong pointer is
obviously a different matter.

"Fetching" a whole struct as in *(struct S2 *)&s1 isn't going to do anything
useful unless a struct S2 matches an initial segment of a struct S1, in
which case its alignment requirements can't be more restrictive.

If we fetch a common member through the wrong pointer type, then even if the
pointer value is misaligned for the type it's been converted to, it must
still be correctly aligned for the type of the common member. That ought to
be enough. Otherwise, the situation is that we can't say ((struct S2
*)&s1)->type, but we can say *(int *)(struct S2 *)&s1, to get the first
member, or play games with unsigned chars and offsetof(), and so portably
reach the same result by cruder methods.

Which is close to the point really. The original requirement can be met
legally, portably and efficiently by mallocing blocks of bytes and laying
out bits of data at calculated offsets. Structs are supposed to be
syntactic sugar to make it more pleasant to write the same code, not an
obstacle course of abstract difficulties, otherwise we're going backwards.
 
Robin Haigh

Eric Sosman said:
[snip]

Chris' post shows one way to avoid that space penalty.
An elaboration (same idea, really, but capable of carrying
more than one piece of information) is to define a struct
type that carries just the type code and whatever additional
information is useful, and make that struct the first element
of every "payload" struct:

struct header {
enum { S1TYPE, S2TYPE } type;
unsigned int refcount;
struct vtable *vtable;
};

struct S1 {
struct header head;
char data;
};

struct S2 {
struct header head;
char pad[10000];
float data;
};

You can now gain just a little bit of type-safety by making
the function f() -- and others, presumably -- take a pointer
to a `struct header' instead of a pointer to `void':


and the equivalent in Chris's example would be to pass an int *. Hmm, an
unexpected proxy, but it works

void f(struct header *ptr) {
if (ptr->type == S1TYPE) {
struct S1 *s1ptr = (struct S1*)ptr;
printf ("%c\n", s1ptr->data);
}
else if (ptr->type == S2TYPE) {
struct S2 *s2ptr = (struct S2*)ptr;
printf ("%f\n", s2ptr->data);
}
else {
launch_from(NOSE, "demons");
}
}

... and you call it in the obvious way:

struct S1 s1instance = { { S1TYPE, 1, s1vtable }, 'X' };
struct S2 s2instance = { { S2TYPE, 1, s2vtable}, {0}, 42.0f };
f (&s1instance.head);
f (&s2instance.head);

This sort of thing is often found in "poor man's O-O" code,
and works rather well once you're accustomed to it (in PMOO
programs, the initialization of the `head' element is usually
done by a "constructor" or "factory" function).

The important thing (in both Chris' post and this one) is
that you never form a `struct S1*' or `struct S2*' until you
actually know what kind of struct you've been given. It's the
jump-the-gun creation of unvalidated pointers that makes trouble.


So what it comes down to is that if a pointer-to-struct type is used to
access even the first member of a struct, the entire struct has to exist, as
allocated space -- a truncated version won't do even though we've no
intention of looking at the missing bit. I don't think it was always thus.
Meanwhile, if we want more flexible structured objects, we can always knock
something up using byte arrays and macros, without these obscure
complications.
 
