Tagged unions

Johan Tibell · Jul 21, 2006

I use a tagged union to represent different expression types in one of
my programs.

struct exp {
    enum {
        LIT,
        VAR
    } type;
    union {
        int lit;
        char *var;
    } form;
};

In my implementation I've put the enum outside of the struct and given
it a name, "exp_type".

enum exp_type { /* ... */ };

struct my_struct {
    enum exp_type type;
    /* ... */
};

What would be the pros and cons of having it unnamed inside the struct
versus named outside the struct respectively? I can think of a few:

Pros:
* Less pollution of the namespace. I currently have two different
structs so I have to prefix my enum type names with "structname_" (e.g.
exp_type).
* Saves me some typing.
* Avoid repetition of the name "type" in the variable declaration
inside the struct (e.g. exp_type type).

Cons:
* Can't create a variable of the enum type since the type can't be
referred to. (Would it even be possible to refer to the enum type if it
was named _and_ declared inside the struct?). This can also be a good
thing if no more variables of the enum type will ever be created but it
can be a bit difficult to predict in advance.

This is probably more of a stylistic question than anything else (and
hence I expect 10^100 replies).

Tak-Shing Chan · Jul 21, 2006

Johan Tibell said:
Cons:
* Can't create a variable of the enum type since the type can't be
referred to. (Would it even be possible to refer to the enum type if it
was named _and_ declared inside the struct?). This can also be a good
thing if no more variables of the enum type will ever be created but it
can be a bit difficult to predict in advance.

Why would you ever need to create such variables? That is
bad programming practice in my book (creating unnecessary
couplings).

My preference is to use anonymous enums in this situation.

Tak-Shing

Johan Tibell · Jul 21, 2006

Tak-Shing Chan said:
Why would you ever need to create such variables? That is
bad programming practice in my book (creating unnecessary
couplings).

In this case I'm representing an abstract syntax tree (AST) created
using lex/yacc which will be passed to a function eval for evaluation.
In functional languages one usually uses an algebraic data type to
represent such ASTs and in OO languages a class hierarchy is often
(always?) used. I assumed that a tagged union would be the
corresponding representation in C. If you know of a better alternative
please let me know, I'm not an experienced C programmer.

Or perhaps I don't quite understand what part of my implementation you
think is bad. Are you referring to the whole tagged union thing?
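A minimal sketch of the kind of eval Johan describes, using the anonymous-enum version of struct exp. It dispatches on the tag with a switch, so in this sketch no separate variable of the enum type is ever needed. The environment table env and the helper lookup are hypothetical stand-ins (a real interpreter would more likely use a hash table), and unbound names simply evaluate to 0 here.

```c
#include <string.h>

/* The tagged union from the original post, with an anonymous enum. */
struct exp {
    enum { LIT, VAR } type;
    union {
        int lit;
        char *var;
    } form;
};

/* Hypothetical variable environment: a fixed table of name/value pairs. */
struct binding {
    const char *name;
    int value;
};

static const struct binding env[] = {
    { "x", 10 },
    { "y", 32 },
};

/* Hypothetical lookup helper; returns 0 for unbound names to keep the
   sketch short. */
static int lookup(const char *name)
{
    for (size_t i = 0; i < sizeof env / sizeof env[0]; i++)
        if (strcmp(env[i].name, name) == 0)
            return env[i].value;
    return 0;
}

/* eval switches on the tag and reads only the matching union member. */
int eval(const struct exp *e)
{
    switch (e->type) {
    case LIT:
        return e->form.lit;
    case VAR:
        return lookup(e->form.var);
    }
    return 0; /* reached only if the tag holds an unexpected value */
}
```

For example, a literal initialized as { LIT, { .lit = 7 } } evaluates to 7, and a variable node { VAR, { .var = "y" } } evaluates to the value bound to "y" in the table.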

Tak-Shing Chan · Jul 21, 2006

[You are quoting me out of context here. By ``such
variables'' I am referring to reused enums, not tagged unions.]

You have misread my post. What I said was, tagged unions
are fine but reused enums are bad (in the context of this
thread). IMHO. YMMV.

Tak-Shing

Ben Pfaff · Jul 21, 2006

Johan Tibell said:
struct exp {
    enum {
        LIT,
        VAR
    } type;
    union {
        int lit;
        char *var;
    } form;
};
...versus...

enum exp_type { /* ... */ };

struct my_struct {
    enum exp_type type;
    /* ... */
};

Cons:
* Can't create a variable of the enum type since the type can't be
referred to. (Would it even be possible to refer to the enum type if it
was named _and_ declared inside the struct?).

(Yes, it would.)

This can also be a good thing if no more variables of the enum
type will ever be created but it can be a bit difficult to
predict in advance.

I often use the former style, where the enum is declared without
a tag inside the struct. If later it becomes necessary to refer
to its type explicitly (which is fairly rare), it's only a
matter of a moment's work to add a tag.
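As an illustration of adding the tag later, here is the same struct with a tag on the embedded enum. In C the tag exp_type stays usable after the struct definition, so it can serve as an ordinary parameter or variable type. The helper type_name is hypothetical.

```c
/* Same layout as the original struct exp, but the enum now carries a
   tag (exp_type) while still being declared inside the struct. */
struct exp {
    enum exp_type { LIT, VAR } type;
    union {
        int lit;
        char *var;
    } form;
};

/* The tag is visible at file scope in C, so it works as a parameter
   type. type_name is a hypothetical helper for illustration. */
const char *type_name(enum exp_type t)
{
    switch (t) {
    case LIT:
        return "LIT";
    case VAR:
        return "VAR";
    }
    return "?"; /* out-of-range value */
}
```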

Rob Thorpe · Jul 21, 2006

Johan said:
I use a tagged union to represent different expression types in one of
my programs.

struct exp {
    enum {
        LIT,
        VAR
    } type;
    union {
        int lit;
        char *var;
    } form;
};

If something like this is for use inside a programming language
implementation, it is likely to be used a lot. In this case I'd
recommend separating the meaning of the data from its structure. You
could, for example, create a set of inline functions that access parts
of the struct (e.g. get_type, set_type, get_lit). That way it becomes
much easier to change the inside of the struct without breaking other
things.

This type of pseudo-OO in C isn't always a good idea, but it's useful
for something like this.
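Rob's accessor idea might look something like the following minimal sketch. get_type and get_lit are names from his post; get_var, set_lit and set_var are hypothetical. Instead of a bare set_type, which would let the tag and the union member drift apart, this sketch uses setters that update the tag and the value together.

```c
#include <assert.h>

/* Tagged union with an enum tag so the accessors can name its type. */
struct exp {
    enum exp_type { LIT, VAR } type;
    union {
        int lit;
        char *var;
    } form;
};

/* Readers: callers ask for the tag or a payload instead of touching
   e->form directly. The asserts catch reads of the wrong member. */
static inline enum exp_type get_type(const struct exp *e)
{
    return e->type;
}

static inline int get_lit(const struct exp *e)
{
    assert(e->type == LIT);
    return e->form.lit;
}

static inline char *get_var(const struct exp *e)
{
    assert(e->type == VAR);
    return e->form.var;
}

/* Writers set the tag and the matching member in one step. */
static inline void set_lit(struct exp *e, int value)
{
    e->type = LIT;
    e->form.lit = value;
}

static inline void set_var(struct exp *e, char *name)
{
    e->type = VAR;
    e->form.var = name;
}
```

If the layout of struct exp later changes, only these functions need to be updated, which is the decoupling Rob describes.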

Johan Tibell · Jul 21, 2006

Tak-Shing Chan said:
[You are quoting me out of context here. By ``such
variables'' I am referring to reused enums, not tagged unions.]

I was a bit unsure about which part you were addressing (since you
almost quoted my entire message) and therefore I included the caveat at
the very end of my message. Thank you for the clarification.

Chris Torek · Jul 22, 2006

[vertically compressed]

struct exp {
    enum { LIT, VAR } type;
    union { int lit; char *var; } form;
};

[vs
enum exp_type { LIT, VAR };
struct exp {
    enum exp_type type;
    union { int lit; char *var; } form;
};
]

What would be the pros and cons of having it unnamed inside the struct
versus named outside the struct respectively? I can think of a few:

Pros:
* Less pollution of the namespace. I currently have two different
structs so I have to prefix my enum type names with "structname_" (e.g.
exp_type).

This is a smaller pro than it may look: enumeration members are
in the ordinary namespace, at the same scope as the overall definition
of the structure type, so LIT and VAR can appear anywhere up to
the end of the current scope and must be unique. That is:

struct expression {
    enum { LIT, VAR } type;
    ...
};
struct fuse {
    enum { UNLIT, LIT } type;
    ...
};

is no good -- the two "LIT"s conflict. (In C++ each struct has its
own little sub-namespace, but C is not C++.)

Thus, the only namespace you avoid polluting is the "tag" namespace.
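One common workaround for the clash described above (a widespread convention, not something the thread prescribes) is to prefix each enumerator with a short form of its struct's name. In this sketch the members value and length are hypothetical stand-ins for the elided "...", and fuse_is_lit is a hypothetical helper.

```c
/* Enumeration constants share the ordinary identifier namespace, so two
   enums in the same scope cannot both define LIT. Prefixing each
   constant with a short form of its struct's name avoids the clash. */
struct expression {
    enum { EXP_LIT, EXP_VAR } type;
    int value;   /* hypothetical stand-in for the elided members */
};

struct fuse {
    enum { FUSE_UNLIT, FUSE_LIT } type;
    int length;  /* hypothetical stand-in for the elided members */
};

/* Hypothetical helper showing a prefixed constant in use. */
int fuse_is_lit(const struct fuse *f)
{
    return f->type == FUSE_LIT;
}
```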

* Saves me some typing.

Not much, since you can also write:

struct exp {
    enum exp_type { LIT, VAR } type;
    union { ... } form;
};

* Avoid repetition of the name "type" in the variable declaration
inside the struct (e.g. exp_type type).

Cons:
* Can't create a variable of the enum type since the type can't be
referred to. (Would it even be possible to refer to the enum type if it
was named _and_ declared inside the struct?).

Easily fixed by adding an enum tag, as above. Yes, you can refer to
"embedded" types afterward in C:

struct foo {
    enum zot { ZOT_A, ZOT_B } zot;
    struct bar {
        int i;
    } bar;
    char *p;
};
enum zot zed;
struct bar bar;

(Again, C++ is different -- another reason not to try to compile C
code with a C++ compiler: valid C code is sometimes invalid, but
sometimes valid yet meaning something else, in C++.)
