Extending unions and ABI?

Jef Driesen

Suppose I have defined a union:

typedef struct foo_t {
    ...
} foo_t;

typedef struct bar_t {
    ...
} bar_t;

typedef union foobar_t {
    foo_t foo;
    bar_t bar;
} foobar_t;

Which is part of the public api of my project. The union is used to pass
different types of data (e.g. either a foo or a bar struct) back to the caller
through a callback function. The type is used to distinguish between the two:

typedef enum foobar_type_t {
    FOO;
    BAR;
} foobar_type_t;

typedef void (*callback_t)(foobar_type_t type, const foobar_t *foobar);

void
dosomething (callback_t callback)
{

    foobar_t foobar;

    foobar.foo = ...
    callback (FOO, &foobar);

    foobar.bar = ...
    callback (BAR, &foobar);
}

Note that I only use pointers to the union in the public api. The union itself
appears only in the implementation (e.g. in the dosomething() function).

Now suppose I need to extend my api to support a third type of data. Is it
portable to just add a third struct to the union? Will it break the ABI rules
and thus backwards compatibility? Is this a violation of the C standard?

The alternative would be to use a void pointer and require the user to cast to
the appropriate data type, but this is less readable:

foobar>foo.member

vs

((foo_t *)foobar)>member

Jef
 
Tom St Denis

If you add things to the union you potentially change the size of it,
so any code compiled against the definition already won't be able to
work with the object properly.

IOW, don't do that.

If you must, pass a void* to your callback prototype and have the
instances of the function deref it appropriately. e.g.

void callback_foo(void *data, ...)
{
    struct foo *foo_data = data;

    foo_data->whatever = 3;
}

similar for bar...

Tom
 
Shao Miller

Suppose I have defined a union:

typedef struct foo_t {
...
} foo_t;

typedef struct bar_t {
...
} bar_t;

typedef union foobar_t {
foo_t foo;
bar_t bar;
} foobar_t;

Which is part of the public api of my project.

As in, these are in a header file and the struct and union definitions
will thus be available within all users' translation units, assuming
they include the header?

The union is used to pass
different types of data (e.g. either a foo or a bar struct) back to the
caller through a callback function. The type is used to distinguish between
the two:

typedef enum foobar_type_t {
FOO;
BAR;
} foobar_type_t;

Did you mean to use commas instead of semi-colons for the two values?

typedef void (*callback_t)(foobar_type_t type, const foobar_t *foobar);

void
dosomething (callback_t callback)
{

foobar_t foobar;

foobar.foo = ...
callback (FOO, &foobar);

foobar.bar = ...
callback (BAR, &foobar);
}

Note that I only use pointers to the union in the public api. The union
itself appears only in the implementation (e.g. in the dosomething()
function).

Do you mean that the union definition you gave closer to the beginning
of your post is only within the translation unit with the
'dosomething()' definition? Or do you mean that the only code which
will use the 'sizeof' or the members of the union will be within the
translation unit with the 'dosomething()' definition?

Now suppose I need to extend my api to support a third type of data. Is
it portable to just add a third struct to the union? Will it break the
ABI rules and thus backwards compatibility. Is this a violation of the C
standard?

From n1256.pdf, section 6.2.5, point 27:

"...All pointers to union types shall have the same representation
and alignment requirements as each other..."

:)
The alternative would be to use a void pointer and require the user to
cast to the appropriate data type,

No need to cast a 'void *' to another pointer-to-object type.

but this is less readable:

foobar>foo.member

vs

((foo_t *)foobar)>member

Are you meaning '->' instead of '>'?
 
Morris Keesan

No need to cast a 'void *' to another pointer-to-object type.

There is if you're going to do what he does here:

Unless you want to declare an extra pointer to hold the converted type:
(foo_t *foo = foobar)->member
 
Jef Driesen

As in, these are in a header file and the struct and union definitions
will thus be available within all users' translation units, assuming
they include the header?

Yes. How would they be able to use it otherwise?

Did you mean to use commas instead of semi-colons for the two values?

Of course.

Do you mean that the union definition you gave closer to the beginning
of your post is only within the translation unit with the
'dosomething()' definition? Or do you mean that the only code which
will use the 'sizeof' or the members of the union will be within the
translation unit with the 'dosomething()' definition?

The dosomething() function would be exposed in the public api, but the
implementation would be internal to the library. A user would have to do
something like this:

#include <dosomething.h>

void mycallback(foobar_type_t type, const foobar_t *foobar)
{
    switch (type) {
    case FOO:
        /* Use foobar->foo.xyz */
        break;
    case BAR:
        /* Use foobar->bar.abc */
        break;
    }
}

int
main (void)
{
    dosomething (mycallback);
}

So the union is allocated in the dosomething() function in the library and
passed back to the caller, not the other way around.

From n1256.pdf, section 6.2.5, point 27:

"...All pointers to union types shall have the same representation
and alignment requirements as each other..."

:)


No need to cast a 'void *' to another pointer-to-object type.

I know, but if it's not assigned to another variable, but converted on-the-fly
like below I don't think you can avoid the cast.

Are you meaning '->' instead of '>'?

Of course.
 
Jens Thoms Toerring

If you add things to the union you potentially change the size of it,
so any code compiled against the definition already won't be able to
work with the object properly.

Well, but the user gets, as far as the OP wrote, only pointers
to it and not copies, so I would think the size of what's pointed
to is irrelevant - the size of the pointer remains the same. The
only information used when compiling a program using the header
file declaring the union when only using pointers is thus what
members there are - and not being aware that there could be
others won't hurt (except that the user can't access them).

IOW, don't do that.

I think it's completely safe to extend the union as long as the
user (i.e. the callback function written by the user) only gets
a pointer to an instance. Even if the user creates his own
instance of the union (using the old version without the new
members) and copies into it from the union pointed to, he's still
safe, since he will perhaps miss a few bytes, but only for members he
can't access anyway because he doesn't know about them.
If you must, pass a void* to your callback prototype and have the
instances of the function deref it appropriately. e.g.
void callback_foo(void *data, ...)
{
struct foo *foo_data = data;

Now you're talking structures, but the OP has a union that
contains different structures. And he's not going to change
the structures but just add another structure to the union.
I don't see how converting in the function that calls the
callback function to a void pointer and then converting back
in the user-written callback function would help.

Regards, Jens
 
Shao Miller

There is if you're going to do what he does here:


Unless you want to declare an extra pointer to hold the converted type:
(foo_t *foo = foobar)->member

Oops. Good point. :) I didn't understand the need for a cast, since
one could use:

... foobar->foo.member ...

Assuming the union definition is available to the users... 'foo_t'
obviously must be.
 
Shao Miller

Yes. How would they be able to use it otherwise?

I didn't (and don't) understand your "the union itself appears only in
the implementation (e.g. in the dosomething() function)". What about it
only appears in the private TUs and not in the public headers?

[ Corrected typographical errors below. Not sure why indenting dies
with this news client. Sorry. ]
The union definition?
---
/* public.h */
...
typedef union foobar_t foobar_t;
...
---
/* private.c */
...
typedef union foobar_t {
foo_t foo;
bar_t bar;
} foobar_t;
...
---

Or is the definition in public.h?

The dosomething() function would be exposed in the public api, but the
implementation would be internally to the library. A user would have to
do something like this:

#include <dosomething.h>

void mycallback(foobar_type_t type, const foobar_t *foobar)
{
switch (type) {
case FOO:
/* Use foobar->foo.xyz */
break;
case BAR:
/* Use foobar->bar.abc */
break;
}
}

int
main (void)
{
dosomething (mycallback);
}

So the union is allocated in the dosomething() function in the library
and passed back to the caller, not the other way around.

Looks good to me. :)

I know, but if it's not assigned to another variable, but converted
on-the-fly like below I don't think you can avoid the cast.

Above, you have already shown:

foobar->foo.xyz

So why cast?

[ Corrected typographical errors below. ]
Really? I prefer the former, personally.

If you come up with version 2.0 of the "public.h" header and it includes
a new structure in the union, that structure could affect the alignment
requirements of the union.

If the only code that allocates such union objects is aware of the
change, the alignment requirements should still satisfy the old union
version, as long as you don't remove the old struct members, right?

I wouldn't expect that other users' code would care... Their foos and
bars are still aligned, and 'foo_t' and 'bar_t' still have the same tag
and same definition[n1256.pdf:6.2.7p1]. I'd view the use of a "newer"
union rather than the "outdated" union as "type-punning", under these
circumstances. I could be mistaken.
 
Shao Miller

Of course, since union version 2.0 could also change the size, hopefully
your old library users aren't using arrays with elements of this union
type! Then they'd be borked. :)
 
Jef Driesen

Yes. How would they be able to use it otherwise?

I didn't (and don't) understand your "the union itself appears only in
the implementation (e.g. in the dosomething() function)". What about it
only appears in the private TUs and not in the public headers?

[ Corrected typographical errors below. Not sure why indenting dies
with this news client. Sorry. ]
The union definition?
---
/* public.h */
...
typedef union foobar_t foobar_t;
...
---
/* private.c */
...
typedef union foobar_t {
foo_t foo;
bar_t bar;
} foobar_t;
...

In public.h, because if not I see no advantage in the union compared to using a
void pointer (instead of pointer to the union) together with the individual structs.

I know, but if it's not assigned to another variable, but converted
on-the-fly like below I don't think you can avoid the cast.

Above, you have already shown:

foobar->foo.xyz

So why cast?

[ Corrected typographical errors below. ]
Really? I prefer the former, personally.

I think there is a little misunderstanding here. I also prefer the first version.
But this is of course only possible with the union (e.g. foobar is a pointer to
the union). Without the union, foobar would be a void pointer and needs to be
cast to a pointer to the appropriate struct (e.g. foo_t). And that's the second
version above.

If you come up with version 2.0 of the "public.h" header and it includes
a new structure in the union, that structure could affect the alignment
requirements of the union.

If the only code that allocates such union objects is aware of the
change, the alignment requirements should still satisfy the old union
version, as long as you don't remove the old struct members, right?

I wouldn't expect that other users' code would care... Their foos and
bars are still aligned, and 'foo_t' and 'bar_t' still have the same tag
and same definition[n1256.pdf:6.2.7p1]. I'd view the use of a "newer"
union rather than the "outdated" union as "type-punning", under these
circumstances. I could be mistaken.

I'm not sure either, that's why I'm asking :)
 
Jef Driesen

Of course, since union version 2.0 could also change the size, hopefully
your old library users aren't using arrays with elements of this union
type! Then they'd be borked. :)

If an application would define an array, that shouldn't cause any problems
because everywhere in that application they will be using one version of the
union. It's only when you have to pass them between the application and the
library that there can be a mismatch in version.

But in my case the data only needs to be passed from the library to the
application, and never the other way around. And I will be passing pointers, not
the structs or unions directly. The reason for the pointers is exactly the
ability to add new structs in the future without breaking backwards
compatibility, which is not possible with a plain union. Using the union would
be for convenience, to avoid the casting of a void pointer to the appropriate
struct.
 
Shao Miller

I'd figured that you'd meant that.

Hopefully a user doesn't do something like make a copy of a 'foobar_t';
they'd be using an outdated size if you changed it library-wise. And
yet fortunately, since they only know about 'foo_t' and 'bar_t', they'd
still copy as many bytes as they care about. :)

Along the lines of the suggestion of "Columbus sailed the ocean China
Blue," you could certainly pass a pointer to a common, first member.
This strategy is useful with such macros as 'container_of()' or
'CONTAINING_RECORD()' and I've used and seen that strategy used quite a bit.

A useful first member of 'foo_t' and 'bar_t' might be (but is obviously
not limited to being):

- A 'size_t' with the intention of expressing the size of the object
(potentially even useful as a make-shift signature if you can guarantee
all 'XXX_t' will have different sizes).

- Your original 'enum' value for distinguishing the type (you had
'foobar_type_t' for this). Woe be to you if the day comes that you
think the order of the 'enum' values would be prettier in some other order.

- A function pointer for pointing to a function whose type is useful
across all 'XXX_t'. While such a "handler" might not immediately seem
as useful as comparing against your 'enum' values, it can be useful as a
way to "query" the object; perhaps for information including... Its
type. Or the operations that are supported on that type/object. Or any
extensions available for that type/object. :)

- A little 'struct' type that you will be happy with for eternity. I
don't care much for this one, since I don't know what the future might
teach, and because it seems to stray from "keeping it simple," and
because it could cost more memory than some of the other options.

- A pointer to a little 'const struct' type that you will be happy with
for eternity. The pointed-to structure can be common across all 'foo_t'
and encompass information about that type.

The following example is also available (with some nice syntax
colouring) at:

http://codepad.org/sMZemHGx

/**** public.h */

/*** Object types */
typedef struct s_animal_ s_animal;
typedef struct s_cat_ s_cat;
typedef struct s_dog_ s_dog;

/*** Function types */
typedef void f_any(void);
typedef f_any * f_handler(s_animal *, f_any * op);
typedef void f_pet(s_animal *);

/*** Struct/union definitions */
struct s_animal_ {
    s_animal * self;
    f_handler * handler;
};
struct s_cat_ {
    s_animal animal;    /* common first member */
    int lives;
    const char * pet_noise;
};
struct s_dog_ {
    s_animal animal;    /* common first member */
    const char * pet_noise;
};

/*** Function declarations */
extern f_pet pet;
extern s_cat * make_cat(void);
extern s_dog * make_dog(void);

/**** private.c */
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include "public.h"

/*** Function definitions */

void pet(s_animal * animal) {
    /* Ask the object's handler which function implements "pet". */
    f_any * pet_func = animal->handler(animal, (f_any *)pet);
    f_pet * pet_func2;
    if (pet_func == (f_any *)0) {
        puts("Huh? (Type \"help\" for help.)");
        return;
    }
    pet_func2 = (f_pet *)pet_func;
    pet_func2(animal);
    return;
}

static f_pet pet_cat;
static void pet_cat(s_animal * animal) {
    /* 'animal' is the first member of an s_cat, so this conversion is valid. */
    s_cat * cat = (void *)animal;
    puts(cat->pet_noise);
    return;
}

static f_handler handle_cat;
static f_any * handle_cat(s_animal * animal, f_any * op) {
    if (op == (f_any *)pet)
        return (f_any *)pet_cat;
    return (f_any *)0;
}

s_cat * make_cat(void) {
    s_cat * cat = malloc(sizeof *cat);
    if (!cat) return NULL;
    cat->animal.self = &cat->animal;
    cat->animal.handler = handle_cat;
    cat->lives = 9;
    cat->pet_noise = "Purr, purr...";
    return cat;
}

static f_pet pet_dog;
static void pet_dog(s_animal * animal) {
    /* 'animal' is the first member of an s_dog, so this conversion is valid. */
    s_dog * dog = (void *)animal;
    puts(dog->pet_noise);
    return;
}

static f_handler handle_dog;
static f_any * handle_dog(s_animal * animal, f_any * op) {
    if (op == (f_any *)pet)
        return (f_any *)pet_dog;
    return (f_any *)0;
}

s_dog * make_dog(void) {
    s_dog * dog = malloc(sizeof *dog);
    if (!dog) return NULL;
    dog->animal.self = &dog->animal;
    dog->animal.handler = handle_dog;
    dog->pet_noise = "Pant, pant...";
    return dog;
}

/**** test.c */
#include <stdlib.h>    /* for free() */
#include "public.h"

int main(void) {
    s_cat * kitty = make_cat();
    s_dog * doggy = make_dog();
    if (kitty) {
        pet(&kitty->animal);
        free(kitty);
    }
    if (doggy) {
        pet(&doggy->animal);
        free(doggy);
    }
    return 0;
}
 
Tim Rentsch

Sorry to be late in responding here.

If, after adding another member to foobar_t, all translation
units that can see these members are recompiled, all is
well. Conversely if some TU's that can see those members
are not recompiled, technically that's undefined behavior.

So, any "public api" project source would need to be
recompiled, _unless_ it can't see the members of the union
type foobar_t. Assuming they don't access such members, but
only pass pointers around, a way to do that would be to just
declare the type

union foobar_t;
typedef union foobar_t foobar_t;

and _not_ define the contents of 'union foobar_t', and use just
that declaration for those "public api" sources/TU's that are
not to be recompiled. After that, if all TU's that _can_ access
the union's members are recompiled whenever its definition
changes, that stays inside the Standard's boundaries and does
not transgress, even technically, into undefined behavior.
 
