Interface design - options with varargs

kid joe

Hi all,

Imagine you're making an object-oriented style library, say one that deals
with "struct myobj" objects. There is a function
int myobj_init(struct myobj *o, ***);
where what comes in the *** part is what I want to discuss.

Obviously the init function will initialize the fields in the o
structure, maybe allocate some memory or associate other resources. The
*** should be options that can be used to initialize some of the fields.

It seems that the most common way of doing this is to replace *** by
a single parameter like a "struct myobj_options *", where this struct is
something like
struct myobj_options {
    size_t initial_size;
    int verbose;
    int debugging;
    /* etc. */
};

Passing a NULL options pointer uses default settings for the options.

If later on the library develops and extra options become available, as
long as the code calling the library creates the struct myobj_options with
calloc and sets some fields, all the extra fields that didn't exist when
the calling code was written will just be zero. Code using the library
needs to be recompiled but not otherwise rewritten.
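For illustration, a minimal caller-side sketch of that pattern (myobj.h, the
field names and the values are placeholders, not a real library):

#include <stdlib.h>
#include "myobj.h"   /* hypothetical header declaring struct myobj,
                        struct myobj_options and myobj_init() */

int make_object(struct myobj *o)
{
    struct myobj_options *opts = calloc(1, sizeof *opts);
    if (opts == NULL)
        return -1;

    opts->initial_size = 1000;   /* set only the fields we care about;      */
    opts->verbose = 1;           /* fields added in later library versions  */
                                 /* stay zero, i.e. take the default        */

    int rc = myobj_init(o, opts);
    free(opts);
    return rc;
}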

What I was thinking was an alternative to this setup, where *** is
replaced by a varargs list:
int myobj_init(struct myobj *o, int n, ...);

Here n is the number of options that are supplied in the varargs. Of
course, the options would have to come in a specified order.

This seems like a nice arrangement to me. Here are some idioms:

myobj_init(&o, 0);           /* use defaults for all arguments */
myobj_init(&o, 2, 1000, 1);  /* specify first two options and use
                                defaults for the others */
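For completeness, a rough sketch of what the library side might look like with
this convention (the option order, types and defaults here are made up for the
example):

#include <stdarg.h>
#include <stddef.h>

struct myobj;   /* the real definition would come from the library header */

int myobj_init(struct myobj *o, int n, ...)
{
    size_t initial_size = 4096;   /* defaults, used when an option is omitted */
    int verbose = 0;
    int debugging = 0;
    va_list ap;

    va_start(ap, n);
    if (n > 0) initial_size = va_arg(ap, int);  /* the caller passes plain ints, */
    if (n > 1) verbose      = va_arg(ap, int);  /* so read ints back; mismatched */
    if (n > 2) debugging    = va_arg(ap, int);  /* types here would be undefined */
    va_end(ap);

    /* ... use initial_size, verbose and debugging to set up *o ... */
    (void)o; (void)initial_size; (void)verbose; (void)debugging;
    return 0;
}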

When new options are added, existing code linking against the library
doesn't even need to be recompiled, unlike in the "options struct" case.

But I don't think I've ever seen this approach used. Is there a reason why
this would be a bad idea?


Cheers,
Joe
 
Stefan Ram

kid joe said:
Code using the library
needs to be recompiled but not otherwise rewritten.

You lose static argument checking this way.

When you have a function

void f( int x )

and change it to

void f( int x, int y )

later, the compiler will complain for every site where it
is called with only one argument.

When you have a function

void f( struct f * parameters )

and extend »struct f« later, the compiler might not warn
about sites where the new fields of the struct are not
initialized properly.
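A tiny sketch of that silent case (all names here are illustrative):

struct f {
    int x;
    int y;          /* field added in a later version of the library */
};

void f(struct f *parameters);

void old_caller(void)
{
    struct f p;
    p.x = 42;       /* y is never set; the compiler typically stays silent */
    f(&p);
}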

There is always a trade-off between static checking and
dynamic flexibility. Eventually you will have to make
a decision about what is deemed more important in a given case.

What I was thinking was an alternative to this setup, where *** is
replaced by a varargs list:

Using varargs, one also gives up some static argument
type checking.
 
kid joe

kid said:
[...]

What I was thinking was an alternative to this setup, where *** is
replaced by a varargs list:
int myobj_init(struct myobj *o, int n, ...); [...]
When new options are added, existing code linking against the library
doesn't even need to be recompiled, unlike in the "options struct" case.

But I don't think I've ever seen this approach used. Is there a reason why
this would be a bad idea?

blargg said:
The varargs approach is a bad idea. A much safer and more friendly
variant is a set of init functions which each take different sets of
arguments, to cover different cases.

Hi Blargg,

That's OK if you've got a language like C++ that supports overloading; not
so nice in C...

Why do you think varargs is a bad idea? It seems to me you're trading off
some type safety (not much of an issue, as most flags will be ints viewed
as booleans), and in exchange you get built-in compatibility with future
versions of the library without needing to recompile the code.

Cheers,
Joe


 
Giacomo Degli Esposti

kid joe said:
This seems like a nice arrangement to me. Here are some idioms:
myobj_init(&o, 0);  /* use defaults for all arguments */
myobj_init(&o, 2, 1000, 1);  /* specify first two options and use
                         defaults for the others */

When new options are added, existing code linking against the library
doesn't even need to be recompiled, unlike in the "options struct" case.

But I don't think I've ever seen this approach used. Is there a reason why
this would be a bad idea?

I used a library where this approach was used. The vararg list was formed
by a list of (name, value) pairs with a terminator at the end. That allows
you to specify only what you want to initialize, without the need to
specify all the values before the one you need. It looked like this:

myobj_init( &obj, PROP_NAME, "my_obj", PROP_X, 100, PROP_Y, 200,
            PROP_END );
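For illustration, the consuming side of such an interface might look roughly
like this (the tag values, their argument types and the struct members are
assumptions, not the actual library):

#include <stdarg.h>

enum { PROP_END = 0, PROP_NAME, PROP_X, PROP_Y };   /* illustrative tags */

struct myobj { const char *name; int x, y; };       /* illustrative only */

int myobj_init(struct myobj *obj, ...)
{
    va_list ap;
    int prop;

    va_start(ap, obj);
    while ((prop = va_arg(ap, int)) != PROP_END) {
        switch (prop) {
        case PROP_NAME: obj->name = va_arg(ap, const char *); break;
        case PROP_X:    obj->x    = va_arg(ap, int);          break;
        case PROP_Y:    obj->y    = va_arg(ap, int);          break;
        default:        va_end(ap); return -1;   /* unknown tag */
        }
    }
    va_end(ap);
    return 0;
}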

ciao
Giacomo
 
Guest

kid joe said:
[...]

If later on the library develops and extra options become available, as
long as the code calling the library creates the struct myobj_options with
calloc and sets some fields, all the extra fields that didn't exist when
the calling code was written will just be zero. Code using the library
needs to be recompiled but not otherwise rewritten.

It seems a bit odd to me that you can add new fields but not change the
code. If there are more data members then I'd have thought that implied a
behavioural change, which sounds like new code to me...

[...]

When new options are added, existing code linking against the library
doesn't even need to be recompiled, unlike in the "options struct" case.

again, I think that's a bit odd; at the very least I'd expect
constructors (xxx_init()) to change.
But I don't think I've ever seen this approach used. Is there a reason why
this would be a bad idea?

people can't count. varargs interfaces are a little fragile.

I've drifted a bit into OO design (which I am not well qualified
to discuss). You might try comp.object though its traffic is rather
low these days. Perhaps you'll awaken them!


--
Nick Keighley

"Object-oriented programming is an exceptionally bad idea
that could only have originated in California."
--Dijkstra
 
kid joe

Eric Sosman said:
If you've got more than a very few options, a variable argument
list becomes error-prone: Not only is the programmer forced to remember
the correct order of all those options, and get all their types right,
but the compiler won't give him a helpful error message if he blunders.
Quickly, now: Spot the error in

myobj_init(&myobj, 5, 42, 18, PALE_YELLOW, 0.33, "Zaphod");

Some improvement can be had by using name/value pairs, leading
to calls like

myobj_init(&myobj,
           OBJ_X, 42,
           OBJ_Y, 18,
           OBJ_HUE, PALE_YELLOW,
           OBJ_PUISSANCE, 0.33,
           OBJ_REFERER, "Zaphod",
           OBJ_END_OF_LIST);

This relieves the programmer of the need to remember a fixed order
and gives him some mnemonics to look at, but getting the types right
is still his problem and the compiler still won't help. Also, the
calls start to get unwieldy, what with passing (about) twice as many
arguments as option values.

Individually-named attribute setters can be of use sometimes:

myobj_init(&myobj);
myobj_setXY(&myobj, 42, 18);
myobj_setHUE(&myobj, PALE_YELLOW);
myobj_setPUISSANCE(&myobj, 0.33);
myobj_setREFERER(&myobj, "Zaphod");

A drawback of this approach is that the code implementing the object
doesn't know when initialization is finished unless you add a
myobj_setITINCONCRETE() call (easily forgotten) or unless the object
can make a distinction between "initialization" calls and "other"
calls and maintain a status flag to indicate which "stage" it's in.
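A rough sketch of the status-flag idea (member and function names are
illustrative):

struct myobj {
    int finalized;          /* 0 while still being configured, 1 afterwards */
    int x, y;
};

int myobj_setXY(struct myobj *o, int x, int y)
{
    if (o->finalized)
        return -1;          /* configuration is over; refuse the change */
    o->x = x;
    o->y = y;
    return 0;
}

void myobj_setITINCONCRETE(struct myobj *o)
{
    o->finalized = 1;       /* from now on the setters reject changes */
}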

There's no Really Wonderful solution to this in C (most macro
assemblers handle it pretty well, but C is, ahem, more advanced).
Of the less-than-wonderful solutions available, two seem reasonable:

- The "options struct" approach, which gives the attributes
recognizable names, allows type-checking, and presents them
all at the same time for validation, and

- The "Keep it simple, stupid!" approach, which questions the
need to pack so much state into one indivisible object.
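For the first of those, if C99 is available, designated initializers make the
options struct a bit friendlier to fill in; a minimal sketch reusing the
hypothetical struct myobj_options from earlier in the thread:

#include "myobj.h"   /* hypothetical header from this thread */

int create_with_options(struct myobj *o)
{
    /* unnamed fields start at 0, so options added in later library
       versions quietly get their default, much as with calloc */
    struct myobj_options opts = {
        .initial_size = 1000,
        .verbose = 1,
        /* .debugging and any newer fields are implicitly 0 */
    };
    return myobj_init(o, &opts);
}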

Hi Eric,

If I use the name/value pairs option, and define an enum joe_options to
hold the names, will it technically be UB in the following situation?

1) Programmer does #include <joe.h> to get the enum definition, then calls
myobj_init(&o, JOE_OPT_DEBUGGING, 1, JOE_END_OPTIONS) and compiles his
code.

2) The library is expanded, and now there are new opts like
JOE_OPT_EXCITING_NEW_FEATURE added at the end of the enum.

3) User runs the program compiled against the old headers (with the
outdated enum), but it gets linked dynamically against the new library.

I guess it will only be an issue if there are so many new options that the
compiler changes the type of the enum to a bigger integer type? So
probably OK in practice?

Cheers,
Joe
 
Keith Thompson

Eric Sosman said:
kid said:
[...]
If I use the name/value pairs option, and define an enum joe_options to
hold the names, will it technically be UB in the following situation?

1) Programmer does #include <joe.h> to get the enum definition, then calls
myobj_init(&o, JOE_OPT_DEBUGGING, 1, JOE_END_OPTIONS) and compiles his
code.

2) The library is expanded, and now there are new opts like
JOE_OPT_EXCITING_NEW_FEATURE added at the end of the enum.

3) User runs the program compiled against the old headers (with the
outdated enum), but it gets linked dynamically against the new library.

I guess it will only be an issue if there are so many new options that the
compiler changes the type of the enum to a bigger integer type? So
probably OK in practice?

OK in practice, and OK by guarantee. The type of the enum
itself might change, but the named enumerated constants are all
of type `int'. The constants are `int' regardless of the type
that underlies the enum: `char', `uintmax_t', or anything in
between.

Note that myobj_init() needs to call
va_arg(ap, int);
and not
va_arg(ap, enum joe_opt);

It can assign the result either to an object of type enum joe_opt or
to an object of type int. For an extra level of checking, you could
assign it to an int and then check that it's within the expected
range. You could even choose unusual values for the enumeration
constants and *probably* catch a few more errors:

enum joe_opt {
    JOE_OPT_FIRST = 12345,
    JOE_OPT_DEBUGGING = JOE_OPT_FIRST,
    JOE_OPT_SOMETHING_ELSE,
    /* ... */
    JOE_END_OPTIONS,
    JOE_OPT_LAST = JOE_END_OPTIONS
};

...

int opt = va_arg(ap, int);
if (opt < JOE_OPT_FIRST || opt > JOE_OPT_LAST) {
    KABOOM("Caller screwed up");
}

I'm not sure whether this level of paranoia is justified.
 
Keith Thompson

Eric Sosman said:
Good points. It occurs to me that there's one more potential
worry: If the struct being initialized looks like

struct joe {
    double trouble;
    enum joe_opt options;
    char broiled;
    ...
};

... then he *is* vulnerable to a change in the enum's width,
because that could change the size of the struct and the layout
of its elements. Instead, the struct should be

struct joe {
    double trouble;
    int options;
    char broiled;
    ...
};

... the idea being that the `int' suffices for all the named
`enum joe_opt' constants, and its width is not influenced by
that of the type underlying an `enum joe_opt'.

Sure, but I wouldn't think that the structure he's trying to
initialize would contain a member of type enum joe_opt anyway. That
type is used for the variadic initialization function. I don't think
it makes much sense to use it for anything else.

Each of the JOE_OPT_* arguments specifies one of, say, "double trouble" or
"char broiled" that's to be initialized by the following argument, but
struct joe has *all* of those members.
 
kid joe

Eric Sosman said:
kid said:
[...]

I'm not sure whether this level of paranoia is justified.
The only things you need to worry about are scrambling the
values of the named constants, and exceeding the limit on the
number of named constants in an enumeration (C99 guarantees at
least 1023 constants). And, of course, the other drawbacks I
mentioned earlier: No enforcement (or conversion) of the "value"
types, no compiler complaint if JOE_END_OPTIONS is overlooked,
double-length argument lists.

Hi Keith,

I would put JOE_END_OPTIONS at the start of the enum (value 0) so that if
the enum gets expanded in a future version of the library then it would
have a consistent value across code compiled with older versions.
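In other words (the option names here are just examples):

enum joe_opt {
    JOE_END_OPTIONS = 0,    /* terminator keeps the value 0 forever */
    JOE_OPT_DEBUGGING,
    JOE_OPT_VERBOSE,
    /* new options get appended here in later library versions */
};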

Why do you say that int should be passed to va_arg() and not enum
joe_opt? Wouldn't it be UB anyway to convert an int outside the range of
enum joe_opt to an enum joe_opt?

Finally, I don't think double-length arg lists are really a problem, as I'd
imagine the init() functions being called only once when the object is
created, so the overhead of extra arguments wouldn't make much difference.

Cheers,
Joe
 
Keith Thompson

kid joe said:
Eric Sosman said:
kid joe wrote:
[...]

I would put JOE_END_OPTIONS at the start of the enum (value 0) so that if
the enum gets expanded in a future version of the library then it would
have a consistent value across code compiled with older versions.

Good point!
Why do you say that int should be passed to va_arg() and not enum
joe_opt? Wouldn't it be UB anyway to convert an int outside the range of
enum joe_opt to an enum joe_opt?

Because the argument is promoted according to the default argument
promotions. If you pass an argument of type enum joe_opt, it will be
promoted to int; the type passed to va_arg() has to match the promoted
type of the parameter.
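A small self-contained illustration of that promotion rule (all names are made
up for the example):

#include <stdarg.h>
#include <stdio.h>

enum joe_opt { JOE_END_OPTIONS, JOE_OPT_DEBUGGING, JOE_OPT_VERBOSE };

/* Options arrive as int after the default argument promotions,
   so they must be read back with va_arg(ap, int). */
static void dump_opts(int first, ...)
{
    va_list ap;
    int opt, value;

    va_start(ap, first);
    for (opt = first; opt != JOE_END_OPTIONS; opt = va_arg(ap, int)) {
        value = va_arg(ap, int);        /* the value paired with this option */
        printf("option %d = %d\n", opt, value);
    }
    va_end(ap);
}

int main(void)
{
    /* the enum joe_opt arguments are promoted to int at the call site */
    dump_opts(JOE_OPT_DEBUGGING, 1, JOE_OPT_VERBOSE, 0, JOE_END_OPTIONS);
    return 0;
}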
Finally, I don't think double-length arg lists are really a problem, as I'd
imagine the init() functions being called only once when the object is
created, so the overhead of extra arguments wouldn't make much difference.

I don't think there's any significant performance problem. The
question is how to minimize the conceptual overhead.

The mechanism we're discussing is fairly elaborate. You should at
least provide some examples in your documentation.
 
