API Design


Thad Smith

jacob said:
http://cacm.acm.org/magazines/2009/5/24646-api-design-matters/fulltext


Communications of the ACM rarely brings something practical.
This is a good exception.

It is a good article about API design, and the problems of bad APIs.

Agreed. I plan to distribute the link to others at work. It makes
explicit the policy-free vs. policy-rich trade-off, which is a useful
insight for me.

Good design of low-level functions has a multiplying effect on
application code.

Followups set to comp.programming.
 

James Harris

http://cacm.acm.org/magazines/2009/5/24646-api-design-matters/fulltext

Communications of the ACM rarely brings something practical.
This is a good exception.

It is a good article about API design, and the problems of bad APIs.

It is a good article though I'd take exception to his criticisms of
the C select function. If it were to do as he suggests and return a
number of fds without overwriting the input fd set it would have to
return newly allocated memory which it would be left to the caller to
free. This could be done but it would be unusual for C functions
wouldn't it? IIRC some of the string functions return new objects but
most other C functions seem to go to lengths to avoid doing so.

- in anticipation that someone will correct me if my impression is
wrong....

James
 

Ian Collins

James said:
It is a good article though I'd take exception to his criticisms of
the C select function. If it were to do as he suggests and return a
number of fds without overwriting the input fd set it would have to
return newly allocated memory which it would be left to the caller to
free. This could be done but it would be unusual for C functions
wouldn't it? IIRC some of the string functions return new objects but
most other C functions seem to go to lengths to avoid doing so.

The Unix select function is pretty clunky, but at least it does provide
a clearly documented means for passing an indefinite timeout.

Note that Unix also provides the poll function, which provides a cleaner
interface that preserves the original socket lists. If I were inventing
a new framework, I'd use poll rather than select.

See http://opengroup.org/onlinepubs/007908799/xsh/poll.html
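
To illustrate the point about poll preserving the caller's list, here is a minimal sketch. The helper names wait_readable and demo_events_survive are hypothetical and exist only for this example; the only real API used is POSIX poll(), which reports results in the separate revents field and leaves the requested events alone.

```c
#include <assert.h>
#include <poll.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: thin wrapper around poll().
 * timeout_ms of -1 waits indefinitely, 0 returns immediately.
 * Returns the number of ready entries, 0 on timeout, -1 on error. */
int wait_readable(struct pollfd *fds, size_t nfds, int timeout_ms)
{
    assert(fds != NULL || nfds == 0);
    return poll(fds, (nfds_t)nfds, timeout_ms);
}

/* Demo: returns 1 if the requested events are still intact after two
 * calls on the same array, i.e. nothing had to be rebuilt in between. */
int demo_events_survive(void)
{
    int p[2];
    if (pipe(p) != 0)
        return -1;

    struct pollfd fds[1] = { { .fd = p[0], .events = POLLIN } };

    int idle  = wait_readable(fds, 1, 0);       /* nothing written yet */
    int wrote = (write(p[1], "x", 1) == 1);
    int ready = wait_readable(fds, 1, 1000);    /* one byte is pending */

    int ok = idle == 0 && wrote && ready == 1
             && (fds[0].revents & POLLIN) != 0
             && fds[0].events == POLLIN;        /* input field untouched */

    close(p[0]);
    close(p[1]);
    return ok;
}
```

With select the fd_set would have to be rebuilt (or copied) before the second call; here the same array is simply passed again.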
 

Pascal J. Bourguignon

James Harris said:
It is a good article though I'd take exception to his criticisms of
the C select function. If it were to do as he suggests and return a
number of fds without overwriting the input fd set it would have to
return newly allocated memory which it would be left to the caller to
free. This could be done but it would be unusual for C functions
wouldn't it? IIRC some of the string functions return new objects but
most other C functions seem to go to lengths to avoid doing so.

- in anticipation that someone will correct me if my impression is
wrong....

You could pass it two vectors, one with the fds to select, and one
with space for the results (times three of course).

Or you could use a garbage collector, but then select couldn't be a
syscall anymore.
 

Jens Thoms Toerring

It is a good article though I'd take exception to his criticisms of
the C select function.

select() isn't a genuine C function (in the sense that it's part
of the C standard) but comes AFAIK from UNIX/POSIX ;-)

If it were to do as he suggests and return a
number of fds without overwriting the input fd set it would have to
return newly allocated memory which it would be left to the caller to
free.

As far as I can see the function could also take a pointer to three
further fd_set's that belong to the caller and just modify those,
couldn't it?
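
A rough sketch of what such an interface might look like, built on top of the existing select(). The names select_preserving, copy_set and demo_input_set_survives are made up for this example; it relies on fd_set being a structure type (as POSIX requires), so it can be copied by assignment.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>
#include <unistd.h>

/* Copy *in to *out and return out, or NULL if either is absent. */
static fd_set *copy_set(const fd_set *in, fd_set *out)
{
    if (in == NULL || out == NULL)
        return NULL;
    *out = *in;
    return out;
}

/* Hypothetical wrapper: the three input sets are never modified; the
 * results are written to three caller-owned output sets instead.
 * No memory is allocated. Returns whatever select() returns. */
int select_preserving(int nfds,
                      const fd_set *rd_in, const fd_set *wr_in,
                      const fd_set *ex_in,
                      fd_set *rd_out, fd_set *wr_out, fd_set *ex_out,
                      struct timeval *timeout)
{
    assert(nfds >= 0 && nfds <= FD_SETSIZE);
    return select(nfds,
                  copy_set(rd_in, rd_out),
                  copy_set(wr_in, wr_out),
                  copy_set(ex_in, ex_out),
                  timeout);
}

/* Demo: returns 1 if the input set still names the descriptor after a
 * call that timed out (and therefore cleared the output set). */
int demo_input_set_survives(void)
{
    int p[2];
    if (pipe(p) != 0)
        return -1;

    fd_set want, ready;
    FD_ZERO(&want);
    FD_SET(p[0], &want);

    struct timeval tv = { 0, 0 };               /* do not block */
    int n = select_preserving(p[0] + 1, &want, NULL, NULL,
                              &ready, NULL, NULL, &tv);

    int ok = (n == 0) && FD_ISSET(p[0], &want) && !FD_ISSET(p[0], &ready);

    close(p[0]);
    close(p[1]);
    return ok;
}
```

The cost is one structure copy per set per call, which may be part of what the original designers wanted to avoid.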

On the other hand, there may also have been other considerations
when the select() function was invented a long time ago, like speed,
memory requirements etc. on machines at the time, so I would be a
bit careful about blaming the original authors for coming up
with a bad API when I don't know all the factors they had to take
into account.

This could be done but it would be unusual for C functions
wouldn't it? IIRC some of the string functions return new objects but
most other C functions seem to go to lengths to avoid doing so.

I can't remember a single standard C string function that does
that, they all seem to operate on user supplied memory. Perhaps
you're thinking about some non-standard-C (but POSIX) functions
like strdup()?

- in anticipation that someone will correct me if my impression is
wrong....

- me too ;-)

Regards, Jens
 

Chris McDonald

I can't remember a single standard C string function that does
that, they all seem to operate on user supplied memory. Perhaps
you're thinking about some non-standard-C (but POSIX) functions
like strdup()?

Not really wishing to steal the thread, but I've often wondered why
strdup() has not appeared in the C standard. Could a reason be because
it *does* return allocated memory?
 

Stefan Ram

Chris McDonald said:
Not really wishing to steal the thread, but I've often wondered why
strdup() has not appeared in the C standard. Could a reason be because
it *does* return allocated memory?

ISO/IEC 9899:1999 (E) has functions that return allocated
memory, viz., calloc, malloc, and realloc. But it does
not /mix/ them with functions that do something else.
It tries to supply low level building blocks and leave
the mixing to higher levels.

A function that works on a client buffer can work with
all kinds of storage (allocated, automatic, and static).
So it's more »policy-free«.

strdup merely combines malloc and strcpy, so when needed
it can be easily written. It also adds a little bit of
a policy.
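
For reference, a minimal sketch of that combination. The name dup_string is hypothetical, chosen so as not to clash with a platform's own strdup.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Returns a malloc'ed copy of s, or NULL if allocation fails.
 * The policy it adds: the result lives on the heap and the caller
 * must release it with free(). */
char *dup_string(const char *s)
{
    assert(s != NULL);
    char *copy = malloc(strlen(s) + 1);   /* +1 for the terminator */
    if (copy != NULL)
        strcpy(copy, s);
    return copy;
}
```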

What I wrote for myself and what also comes in handy
sometimes, is an sprintf-like function that allocates
a sufficient buffer and then prints to that buffer.
This is a generalization of strdup.
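
One possible shape for such a function, as a sketch only (this is not the code referred to above). The name format_alloc is hypothetical. It assumes C99, where vsnprintf with a null buffer and size 0 returns the length the output would need.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Formats into a freshly malloc'ed buffer of exactly the right size.
 * Returns NULL on a formatting or allocation error.
 * The caller releases the result with free(). */
char *format_alloc(const char *fmt, ...)
{
    assert(fmt != NULL);

    va_list ap, ap2;
    va_start(ap, fmt);
    va_copy(ap2, ap);                       /* the list is consumed twice */

    int n = vsnprintf(NULL, 0, fmt, ap);    /* first pass: measure only */
    va_end(ap);

    char *buf = NULL;
    if (n >= 0) {
        buf = malloc((size_t)n + 1);
        if (buf != NULL)
            vsnprintf(buf, (size_t)n + 1, fmt, ap2);   /* second pass */
    }
    va_end(ap2);
    return buf;
}
```

Called as format_alloc("%s", s) it behaves like strdup, which is the sense in which it generalizes it.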
 

Hallvard B Furuseth

Jens said:
I can't remember a single standard C string function that does
that, they all seem to operate on user supplied memory.

Right. Or on static/program-allocated memory - asctime(), getenv(),
strerror().

There is fopen() which might allocate memory, but it has its own
function (fclose) which will free it.
 

BGB / cr88192

jacob navia said:
http://cacm.acm.org/magazines/2009/5/24646-api-design-matters/fulltext


Communications of the ACM rarely brings something practical.
This is a good exception.

It is a good article about API design, and the problems of bad APIs.

yeah, it is difficult to get API design down well...


usually I just make use of lots of "rules of thumb" I have learned from
prior experiences, and will often find an API which works well, and attempt
to use it as a template.


granted, some things are better, and other things are not as good.

for example, at a time I believed strongly in strict opaqueness, and also
really liked the design of OpenGL.


I then used it as the basis of the design of a physics library, and I
suspect that over the past several years, this opaqueness-centric API design
has probably done more harm than good.

for example, I had found that, in the choice of handles, it really does not
make much practical difference if the handles are integers or opaque
pointers, and in fact the opaque-pointer practice may be better in the
general case, mostly since for any integer handle, one needs to know its
context, whereas pointers can be, inherently, self-identifying.

so, for an API based on integers, one may have to first "bind" a context,
and then "bind" an object, and then be able to perform manipulations. this
then makes a further issue, as then one needs to use a thread-local
variable, ... to keep track of these bindings, ...

OTOH, with a pointer, one would only need the pointer and the API call.
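
A small sketch contrasting the two styles. Everything here (World, Body, bind_world, body_weight and so on) is invented for illustration and is not taken from any real physics library; the structs are shown in full only to keep the example in one file, where a real API would keep them opaque.

```c
#include <assert.h>
#include <stddef.h>

enum { MAX_BODIES = 8 };

typedef struct World World;
typedef struct Body  Body;

struct Body  { World *owner; double mass; };
struct World { double gravity; Body bodies[MAX_BODIES]; int count; };

/* Style 1: integer handles, meaningful only relative to a bound context. */
static World *bound_world;      /* would have to be thread-local with threads */
static int    bound_body = -1;

void bind_world(World *w) { bound_world = w; bound_body = -1; }

int body_new_id(double mass)    /* the handle is an index into the bound world */
{
    assert(bound_world != NULL && bound_world->count < MAX_BODIES);
    int id = bound_world->count++;
    bound_world->bodies[id].owner = bound_world;
    bound_world->bodies[id].mass  = mass;
    return id;
}

void bind_body(int id)
{
    assert(bound_world != NULL && id >= 0 && id < bound_world->count);
    bound_body = id;
}

double bound_body_weight(void)
{
    assert(bound_world != NULL && bound_body >= 0);
    return bound_world->bodies[bound_body].mass * bound_world->gravity;
}

/* Style 2: pointer handles, which identify their own context. */
Body *body_new(World *w, double mass)
{
    assert(w != NULL && w->count < MAX_BODIES);
    Body *b = &w->bodies[w->count++];
    b->owner = w;
    b->mass  = mass;
    return b;
}

double body_weight(const Body *b)
{
    assert(b != NULL);
    return b->mass * b->owner->gravity;   /* context found via the handle */
}
```

In the first style a query takes three calls and depends on hidden state (bind_world, bind_body, bound_body_weight); in the second it is a single call with no bindings to track.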

at the time, I had thought, "well, a pointer would be a problem if, say, a
network or disjoint address spaces is involved". though this would seem to
make intuitive sense (since an integer can be passed as-is, a pointer is
inherently address-space specific, ...), I have found that in practice,
this is not such an issue:
marshalling opaque pointers is really not all that big of an issue, so long
as they don't necessarily need to be addressable. in fact, the "pointer" can
simply address an otherwise inaccessible region of memory, and used
essentially as an integer handle internally, ...

but, the gain of the pointer is that the "value" is inherently unique (even
if not the same in the multi-node case...), which in many cases buys far
more usability than it costs (the API can "just know" what context owns a
given object, and so the context need not be supplied).


this is not to say that integer handles are bad and pointers are good, only
that such a subtle issue can be hard to know up-front, and some bad choices
can lead to an API where using it is a lot more painful than it actually
need be.

likewise, shared structs are not quite as evil as I had once believed,
although granted, a shared struct is still something difficult to get right
(and, in general, I am still more a fan of getters/setters than I am of
having direct access to fields, ...).

however, if the struct itself is standardized, and more serves simply as a
way of moving data, it may well still be a better option than API calls with
piles of arguments, or long convoluted chains of getter and setter calls
(and having to worry about some particular getters/setters themselves either
involving expensive computations or otherwise changing state in ways which
impact subsequent calls, ...).


or, potentially, the use of adjustable unit conversions (which just happen
to be part of the global state), ...

so, it can be hard to get decisions right in retrospect even for one's own
use cases sometimes, much less for the general case...


but I have made an observation: often the more effort I put into "designing"
something, the more likely it is to have some notable design flaw later on.
hence, in general, I have found it often works better to try to find a
design that already works well enough, and attempt to "adapt" the design to
work in the new context, than it is to design something clean (and have it
far more likely to turn out badly in the end...). (likewise, simpler is
often better, so better to design for the simple cases first, ...).


although, rigid adherence to a particular set of methodologies for an
otherwise unrelated piece of technology seems to not turn out so well
either, for example, OpenGL is OpenGL, but OpenGL is not necessarily the
perfect design template for things like Physics, GUI code, OO facilities,
....

even though the names and conventions may all look almost the same (one
creating objects and completing tasks via Begin and End pairs, ...), the
more subtle issue, that the task is inherently very different and not
necessarily a good fit for this API design style, may well impede subsequent
usability... (and, it may well make sense to have the "objects", of all
things, be the focus of attention...).

it is almost no better than those API designs where the designer sees a
problem and declares to themselves "it is all a simple matter of organizing
the class hierarchy" (as much as I despise this approach as well). "it is
all abstract properties, getters, and setters" may well not be much
better...

there are no silver bullets it seems...


or such...
 

Flash Gordon

Malcolm said:
That's right. It would make the string library dependent upon the dynamic
memory allocation library. Just for one trivial little function, it wasn't
considered worth it.

I very much doubt that is the issue, since both the string functions and
memory allocation functions are in the *same* library, the standard C
library. As I understand it this was more a matter of philosophy, that
it was decided that the str* and mem* functions do not allocate memory,
and the only functions to allocate memory which needed a call to free
would be the *alloc functions.

Maybe a proposal for stralloc would have been seen more favourably ;-)
 

BGB / cr88192

Malcolm McLean said:
That's right. It would make the string library dependent upon the dynamic
memory allocation library. Just for one trivial little function, it wasn't
considered worth it.

and, oddly, I have a lot of special purpose 'strdup' functions which tend
instead to intern the strings...

I think it is mostly that 'strdup', as such, is too convenient for what
it does.
it makes it really easy to just grab strings and forget, creating memory
leaks with every call.


I tend instead to intern the strings (for various API's, which happen to
include strdup-like functions), so that I can just as well use it as a
trivial "make sure string will not disappear" function.

all is not ideal though, since many of these APIs don't otherwise check or
GC these strings, and hence the memory is not reclaimed (and, hence, these
functions end up "not to be used" for one-off strings or buffers, ...).

granted, absent buffers, typically the memory creep is likely to be small
enough to be ignorable (a string table somewhere which gradually gets larger
with old dead strings never to be seen again).

use cases for things like this are usually in my dynamic linker or compiler
code, usually for dealing with symbol names (variable names, function names,
....), ...

other, more frontend API calls, similarly "merge" strings, but instead use
the GC and a weak hash to dispose of them.


I guess this is mostly part of my own "unique" mindset, as I tend to see
most strings as atomic units on which operations are performed. thinking of
strings as character buffers is a little odd to me, as I tend to treat them
somewhat differently (a character buffer is for doing work, but a 'string'
is an immutable atomic unit).

or such...
 

Stefan Ram

BGB / cr88192 said:
it makes it really easy to just grab strings and forget,
creating memory leaks with every call.

For a program with a small run time and memory usage, this
might be appropriate. For a program with a large or
indeterminate run time, it might show that either the wrong
programmer was chosen or the wrong programming language.
 

Nobody

To me, any memory leak is a big red flag. Not so much for technical reasons,
but because it suggests that the programmer has lost control of his logic.

For typical Unix "commands", it's often not worth tracking memory
allocations. Everything will be freed when the process terminates.

Managing memory is easy when the pointer returned from a function is
*always* allocated by that particular call. But if you consider the case
where the object may already exist as a "shared" value (e.g. interned
strings), you can either:

1. Duplicate the object and require the caller to free() it. This is
wasteful if callers often retain the value for the duration of the process.

2. Return an indication of whether the caller should free it. This
information then has to be passed around with the pointer, complicating
the code.

3. Implement some form of garbage collection.

4. Don't worry about it. Sometimes this will result in more memory being
used than is strictly necessary.
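
As an illustration of option 2, here is a minimal sketch in which the "should I free it?" answer travels with the pointer. StrRef, describe and strref_release are hypothetical names invented for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* A string pointer bundled with its ownership information. */
typedef struct {
    const char *text;
    bool caller_owns;   /* true: release with free(); false: shared */
} StrRef;

/* Hypothetical lookup: returns a shared static string for the key
 * "default" and a fresh heap copy of the key for anything else.
 * text is NULL if allocation fails. */
StrRef describe(const char *key)
{
    static const char shared_default[] = "default";
    StrRef r = { shared_default, false };

    assert(key != NULL);
    if (strcmp(key, "default") != 0) {
        size_t n = strlen(key) + 1;
        char *copy = malloc(n);
        r.text = copy;
        if (copy != NULL) {
            memcpy(copy, key, n);
            r.caller_owns = true;
        }
    }
    return r;
}

/* Frees the string only if this reference owns it. */
void strref_release(StrRef *r)
{
    assert(r != NULL);
    if (r->caller_owns)
        free((void *)r->text);
    r->text = NULL;
    r->caller_owns = false;
}
```

The complication mentioned in option 2 is visible here: every function that stores or forwards the string now has to carry a StrRef rather than a plain char pointer.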

Sometimes, #4 is the rational choice. Particularly, if the allocations are
likely to constitute a fixed (and relatively small) overhead per process,
or can otherwise never amount to more than a small proportion of the
process' total memory consumption.

The situation is different for a long-lived process (e.g. interactive
application, daemon, etc), where even a small leak can eventually grow to
dominate memory consumption (although you also have to worry about heap
fragmentation in that situation).
 

Stefan Ram

Nobody said:
The situation is different for a long-lived process (e.g. interactive
application, daemon, etc), where even a small leak can eventually grow to
dominate memory consumption (although you also have to worry about heap
fragmentation in that situation).

Or for code in a library, when it is unknown, which kind of
process will use it. Since the subject of this thread is
»API Design«, this might be relevant here.

Some of you will already know the following quotations,
because I have already posted them into Usenet before.

»There were two versions of it, one in Lisp and one in
C++. The display subsystem of the Lisp version was faster.
There were various reasons, but an important one was GC:
the C++ code copied a lot of buffers because they got
passed around in fairly complex ways, so it could be quite
difficult to know when one could be deallocated. To avoid
that problem, the C++ programmers just copied. The Lisp
was GCed, so the Lisp programmers never had to worry about
it; they just passed the buffers around, which reduced
both memory use and CPU cycles spent copying.«

<[email protected]>

»A lot of us thought in the 1990s that the big battle would
be between procedural and object oriented programming, and
we thought that object oriented programming would provide
a big boost in programmer productivity. I thought that,
too. Some people still think that. It turns out we were
wrong. Object oriented programming is handy dandy, but
it's not really the productivity booster that was
promised. The real significant productivity advance we've
had in programming has been from languages which manage
memory for you automatically.«

http://www.joelonsoftware.com/articles/APIWar.html

»[A]llocation in modern JVMs is far faster than the best
performing malloc implementations. The common code path
for new Object() in HotSpot 1.4.2 and later is
approximately 10 machine instructions (data provided by
Sun; see Resources), whereas the best performing malloc
implementations in C require on average between 60 and 100
instructions per call (Detlefs, et. al.; see Resources).
And allocation performance is not a trivial component of
overall performance -- benchmarks show that many
real-world C and C++ programs, such as Perl and
Ghostscript, spend 20 to 30 percent of their total
execution time in malloc and free -- far more than the
allocation and garbage collection overhead of a healthy
Java application (Zorn; see Resources).«

http://www-128.ibm.com/developerworks/java/library/j-jtp09275.html?ca=dgr-jw22JavaUrbanLegends

(OK, then garbage collection will take time in addition to
the allocation calls, but when a program only runs for a
short time, garbage collection might never be needed.)
 

jacob navia

Nobody wrote:
3. Implement some form of garbage collection.

The lcc-win compiler system provides a garbage collector in its standard distribution.

I have been arguing this solution for years but not many people here listen. It is the best
solution: keep the money for the cake AND eat it!
 

jacob navia

Stefan Ram wrote:
Or for code in a library, when it is unknown, which kind of
process will use it. Since the subject of this thread is
»API Design«, this might be relevant here.

Some of you will already know the following quotations,
because I have already posted them into Usenet before.

»There were two versions of it, one in Lisp and one in
C++. The display subsystem of the Lisp version was faster.
There were various reasons, but an important one was GC:
the C++ code copied a lot of buffers because they got
passed around in fairly complex ways, so it could be quite
difficult to know when one could be deallocated. To avoid
that problem, the C++ programmers just copied. The Lisp
was GCed, so the Lisp programmers never had to worry about
it; they just passed the buffers around, which reduced
both memory use and CPU cycles spent copying.«

The lcc-win compiler system provides a garbage collector in its standard
distribution.
<[email protected]>

»A lot of us thought in the 1990s that the big battle would
be between procedural and object oriented programming, and
we thought that object oriented programming would provide
a big boost in programmer productivity. I thought that,
too. Some people still think that. It turns out we were
wrong. Object oriented programming is handy dandy, but
it's not really the productivity booster that was
promised. The real significant productivity advance we've
had in programming has been from languages which manage
memory for you automatically.«

Exactly.

http://www.joelonsoftware.com/articles/APIWar.html

»[A]llocation in modern JVMs is far faster than the best
performing malloc implementations. The common code path
for new Object() in HotSpot 1.4.2 and later is
approximately 10 machine instructions (data provided by
Sun; see Resources), whereas the best performing malloc
implementations in C require on average between 60 and 100
instructions per call (Detlefs, et. al.; see Resources).
And allocation performance is not a trivial component of
overall performance -- benchmarks show that many
real-world C and C++ programs, such as Perl and
Ghostscript, spend 20 to 30 percent of their total
execution time in malloc and free -- far more than the
allocation and garbage collection overhead of a healthy
Java application (Zorn; see Resources).«

http://www-128.ibm.com/developerworks/java/library/j-jtp09275.html?ca=dgr-jw22JavaUrbanLegends

(OK, then garbage collection will take time in addition to
the allocation calls, but when a program only runs for a
short time, garbage collection might never be needed.)

Exactly
 

Ian Collins

jacob said:
Nobody wrote:

The lcc-win compiler system provides a garbage collector in its standard
distribution.

I have been arguing this solution for years but not many people here
listen. It is the best
solution: keep the money for the cake AND eat it!

But it isn't practical on the majority of platforms C is used on these days.
 

jacob navia

Ian Collins wrote:
But it isn't practical on the majority of platforms C is used on these
days.

Look, C++ is the best language in the world and C is an old shit.

We know that. You have told us countless times. C is for embedded systems
too small to support C++ and will disappear soon. OK We KNOW that by now, it is
not necessary to repeat it at each message.
 

BGB / cr88192

Stefan Ram said:
For a program with a small run time and memory usage, this
might be appropriate. For a program with a large or
indeterminate run time, it might show that either the wrong
programmer was chosen or the wrong programming language.

well, the big issue is mostly one of convenience:
it is a lot more convenient to simply forget about strings than to worry
about freeing them;
similarly, since strings tend to be fairly small, very often the code can
leak pretty badly and still keep running just fine (since the app will
almost invariably be exited and restarted before the leak becomes too much
of a problem).

but, this is the issue:
the convenience may prompt bad style...


hence, interning strings is at least a little better, because it allows a
similar convenience while generally bounding memory use (only as much memory
will be used as there are unique strings in the working set, which in
practice is typically much smaller than the available memory).

consider, for example, if a string were interned for every word in a mass of
English text documents. once a finite limit is reached (say, maybe 2500 or
3000 unique words), then this inflation will drop to almost nothing
(usually periodic random strings, ...).

whereas naive use of strdup will be unbounded (dependent on the total amount
of text processed, rather than the upper bound on the number of unique words
present).
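
a minimal sketch of the idea, to make the bound concrete. this is not my actual code: intern, intern_count, and the fixed-size chained hash table are all made up for the example, and entries are never freed (matching the "table just gradually grows" case described earlier).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* One table entry: chain link plus the string stored inline (C99). */
typedef struct Entry {
    struct Entry *next;
    char text[];
} Entry;

enum { NBUCKETS = 256 };
static Entry *buckets[NBUCKETS];
static size_t unique_count;

static unsigned hash(const char *s)
{
    unsigned h = 5381;
    while (*s)
        h = h * 33u + (unsigned char)*s++;
    return h % NBUCKETS;
}

/* Returns the one shared copy of s, allocating only the first time a
 * given string is seen. Returns NULL if allocation fails.
 * The result must not be modified or freed by the caller. */
const char *intern(const char *s)
{
    assert(s != NULL);
    unsigned h = hash(s);

    for (Entry *e = buckets[h]; e != NULL; e = e->next)
        if (strcmp(e->text, s) == 0)
            return e->text;                 /* already known: no allocation */

    size_t n = strlen(s) + 1;
    Entry *e = malloc(sizeof *e + n);
    if (e == NULL)
        return NULL;
    memcpy(e->text, s, n);
    e->next = buckets[h];
    buckets[h] = e;
    unique_count++;
    return e->text;
}

/* Number of allocations made so far, i.e. number of unique strings. */
size_t intern_count(void) { return unique_count; }
```

feeding the same word through intern a million times costs one allocation; feeding it through strdup costs a million. as a side effect, two interned strings are equal exactly when their pointers are equal.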

this subtle difference makes a notable difference for "medium length"
running times, especially for higher-activity apps (where a plain memory
leak could kill the app in a matter of minutes, but a more gradual leak may
allow it to last for hours or days or more before crashing...).

granted, it is all far from perfect, but I guess a lot depends on how much
one needs to expect from the code...

for example, what is fine for a command-line tool may not be good enough for
an interactive app, and what is good enough for an interactive app may still
not be good enough if reliability matters. but, then, OTOH, not all apps
need reliability, and for many things it is good enough if the thing only
runs at most a few hours or days at a time...

or, for a command line tool, it may only matter so long as it can process
whatever data it is given.


or (satire), one can make use of newer 64-bit systems as a means for being
even more lazy about dealing with memory leaks...
 
