xmalloc string functions


Kelsey Bjarnason

[snips]

The point is that the string functions are far more usable when they
cannot encounter fail conditions.

You do not guarantee the string functions will not encounter fail
conditions; what you guarantee is that when fail conditions occur, the
program will abort rather than handle them gracefully.
If every call to getline() needs to be
checked, not for file termination but for memory failure, the program is
a mess.

Depends how it's handled, now don't it? If any call to getline risks
having the application terminate with no way to even detect the failure,
then yes, the program is a mess.
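
For the record, keeping the two conditions apart doesn't take much. A
minimal sketch, assuming a hypothetical getline_ex() -- the name, the
status enum and the growth policy here are all invented, not anything
from Malcolm's library:

#include <stdio.h>
#include <stdlib.h>

enum gl_status { GL_OK, GL_EOF, GL_NOMEM };

/* Read one line, reporting end-of-file and allocation failure as
 * distinct results so the caller can choose how to react.  (An I/O
 * error status could be added the same way, via ferror().) */
char *getline_ex(FILE *fp, enum gl_status *status)
{
    size_t cap = 64, len = 0;
    char *buf = malloc(cap);
    int ch;

    if (buf == NULL) { *status = GL_NOMEM; return NULL; }
    while ((ch = getc(fp)) != EOF) {
        if (len + 2 > cap) {                /* room for ch and '\0' */
            char *tmp = realloc(buf, cap * 2);
            if (tmp == NULL) { free(buf); *status = GL_NOMEM; return NULL; }
            buf = tmp;
            cap *= 2;
        }
        buf[len++] = (char)ch;
        if (ch == '\n')
            break;
    }
    if (len == 0) {                         /* nothing read: end of input */
        free(buf);
        *status = GL_EOF;
        return NULL;
    }
    buf[len] = '\0';
    *status = GL_OK;
    return buf;
}

A caller can then treat GL_EOF as the normal loop exit and decide for
itself what GL_NOMEM means -- abort, retry, degrade, whatever fits.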
 

Kelsey Bjarnason

Are you writing performance specifications for Microsoft?

No, I'm reverse-engineering them based on observed performance
characteristics. :)

Actually, we use MSSQL a fair bit around here, because a few apps we need
are sufficiently boneheaded to require it, and it alone. That said, the
only one I know of we have issues with isn't MSSQL's fault; it's on a
dodgy machine.

I prefer other things - MySQL for small stuff, sqlite for really small
stuff, postgres for bigger stuff and so on - but I can't honestly say
MSSQL is a steaming load of dingo kidneys based on direct personal
experience.

'Course, it _does_ require Windows, which is another matter - don't get
me started. :)
 

Malcolm McLean

On Jan 27, 5:42 pm, "Malcolm McLean" <[email protected]> wrote:
int main(void) {
    FILE *fp = fopen("/dev/zero", "r"); /* assume UNIX & fopen doesn't fail */
    char *foo, *bar;
    foo = dup("Hello, World"); /* assume success */
    bar = getline(fp); /* obviously this will fail to allocate enough
                          memory, will quit, and a memory leak will occur
                          because the return value of dup() was not freed */
    free(bar); free(foo);
    return 0;
}
-- snip.c --

I can think of a *lot* more reasons why I would not want my program to
terminate because an allocation failed.

These solutions are horrible; I strongly suggest that you avoid using
them in your own programs.
int main(int argc, char **argv)
{
    FILE *fp;
    char *line;
    int i = 1;

    if(argc != 2)
        exit(EXIT_FAILURE);
    fp = fopen(argv[1], "r");
    if(!fp)
    {
        fprintf(stderr, "Can't open %s\n", argv[1]);
        exit(EXIT_FAILURE);
    }
    while(line = getline(fp))
    {
        printf("%d: %s", i++, line);
        free(line);
    }
    fclose(fp);
    return 0;
}

There's a program to print a file, prepending the line number. See how
simple it is, because we don't have to do any error processing?
 

Richard Heathfield

Malcolm McLean said:

There's a program to print a file,

...which exhibits undefined behaviour because you call a variadic function
without a valid function prototype in scope.
prepending the line number. See how
simple it is, because we don't have to do any error processing?

Correction: see how simple it is, because you didn't *bother* to do any
error processing.

The computer on which I type this is connected to a UPS. Since the power
hardly ever fails, why do I bother? The UPS has a mains lead ending in a
plug with a fuse in it. Since it hardly ever happens that too much power
goes through the lead, why bother? The plug fits into a surge protector.
(In fact, the UPS is surge-protected, too.) The router also plugs into
this surge protector. But why bother, since surges are so rare? And the
surge protector has its own fuse. Why bother with /that/? And if we /are/
bothering with that, why bother to fuse-protect anything that plugs into
it?

You seem to have fundamentally misunderstood the importance of defensive
programming.
 

Malcolm McLean

Richard Heathfield said:
Malcolm McLean said:
There's a program to print a file,
...which exhibits undefined behaviour because you call a variadic function
without a valid function prototype in scope.
prepending the line number. See how
simple it is, because we don't have to do any error processing?
Correction: see how simple it is, because you didn't *bother* to do any
error processing.

The computer on which I type this is connected to a UPS. Since the power
hardly ever fails, why do I bother? The UPS has a mains lead ending in a
plug with a fuse in it. Since it hardly ever happens that too much power
goes through the lead, why bother? The plug fits into a surge protector.
(In fact, the UPS is surge-protected, too.) The router also plugs into
this surge protector. But why bother, since surges are so rare? And the
surge protector has its own fuse. Why bother with /that/? And if we /are/
bothering with that, why bother to fuse-protect anything that plugs into
it?

You seem to have fundamentally misunderstood the importance of
defensive programming.

It depends on the costs.
Here we've taken virtually all the error-processing out and passed it to
xmalloc, which will exit with an error message if anything goes wrong. We
could write the program so that it is not line-based at all. Then it won't
crash out if someone passes it a malformed line. But it would be harder to
read and understand.
The main cost of a program is usually the development cost, and then the
costs of maintenance. That's what you've got to attack. Convoluted logic
for every bit of string processing, because the machine might run out of
memory, isn't helping anyone, unless the application really must never
terminate.
 

Richard Heathfield

Malcolm McLean said:

Here we've taken virtually all the error-processing out and passed it to
xmalloc, which will exit with an error message if anything goes wrong.

If the cabin temperature drops below minimum required level, cut the
engines. If a slat deploys wrongly on the starboard wing, cut the engines.
If there's an oil pressure drop, cut the engines. If the aircraft stalls,
cut the engines. If the CD player on the flight deck fails, cut the
engines. If the co-pilot dozes off, cut the engines.

I will not fly with McLean Airways.
 

santosh

Malcolm said:
It depends on the costs.
[ ... ]
The main cost of a program is usually the development cost, and then
the costs of maintenance.

So the cost to the poor users of an ill-designed program is okay, as
long as its development was easier, is it?
That's what you've got to attack.
Convoluted logic for every bit of string processing, because the
machine might run out of memory, isn't helping anyone, unless the
application really must never terminate.

Typically you don't need to duplicate the logic for error handling at
every potential place of occurrence. Typically you can group all these
potential sites of errors into a few well defined categories, and a
single handler is usually sufficient for each such category. This means
that a moderately complex application will have, say, two or three
different out-of-memory handlers, which, IMO, is not convoluted or
undoable.
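
As a rough illustration of that grouping -- every name below is
invented for the example, not taken from any real library -- the
categories can be encoded once in the allocator instead of at every
call site:

#include <stdio.h>
#include <stdlib.h>

/* Three broad recovery policies, one handler each. */
enum oom_category { OOM_CRITICAL, OOM_DISCARDABLE, OOM_RETRYABLE };

/* Application-specific cache release; a stub for this sketch. */
static void release_caches(void) { }

static void *xmalloc_cat(size_t n, enum oom_category cat)
{
    void *p = malloc(n);
    if (p != NULL)
        return p;

    switch (cat) {
    case OOM_RETRYABLE:            /* drop caches and try once more */
        release_caches();
        return malloc(n);          /* caller still checks for NULL */
    case OOM_DISCARDABLE:          /* the feature degrades gracefully */
        return NULL;
    case OOM_CRITICAL:             /* nothing sensible left to do */
    default:
        fprintf(stderr, "out of memory (%lu bytes)\n", (unsigned long)n);
        exit(EXIT_FAILURE);
    }
}

Two or three policies like these cover most call sites: the handling is
centralised without being reduced to a single unconditional abort.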
 

Army1987

Malcolm said:
Here are six functions implemented on top of xmalloc(). No C programmer
should have any trouble providing the implementations, though replace and
getquote are non-trivial.

char *dup(const char *str);
char *cat(const char *str1, const char *str2);
char *catandkill(char *str1, const char *str2);
char *tok(const char *str, const char *delims, char **end);
char *midstr(const char *str, int idx, int len);
char *replace(const char *str, const char *pattern, const char *rep);
char *getquote(const char *str, char quote, char escape, char **end);
char *getline(FILE *fp);

All return strings allocated with xmalloc().

No xmalloc needed.

#include <string.h>
#include <stdlib.h>

#if SLACK_PROGRAMMER
#define return else exit(EXIT_FAILURE); return
#endif

char *dup(const char *orig)
/* this function has the same name as a POSIX function to duplicate *
 * file descriptors. Anything more imaginative? */
{
    size_t size = strlen(orig) + 1;
    char *new = malloc(size);
    if (new != NULL)
        memcpy(new, orig, size);
    return new;
}

char *cat(const char *str1, const char *str2)
{
    size_t s1 = strlen(str1), s2 = strlen(str2) + 1;
    char *new = malloc(s1 + s2);
    if (new != NULL) {
        memcpy(new, str1, s1);
        memcpy(new + s1, str2, s2);
    }
    return new;
}

char *catandkill(char *s1, const char *s2)
/* requires s1 to point at the beginning of a malloc()ated block */
{
    char *new = s1 ? cat(s1, s2) : dup(s2);
    if (new != NULL)
        free(s1);
    return new;
}

And so on.
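
In the same spirit, a sketch of one of the remaining prototypes,
midstr(), under the same convention -- plain malloc(), NULL on failure.
(The clipping behaviour for an over-long len is an assumption; the
original post never specified it.)

#include <string.h>
#include <stdlib.h>

char *midstr(const char *str, int idx, int len)
{
    size_t slen = strlen(str);
    char *new;

    if (idx < 0 || len < 0 || (size_t)idx > slen)
        return NULL;                        /* reject nonsense arguments */
    if ((size_t)len > slen - (size_t)idx)
        len = (int)(slen - (size_t)idx);    /* clip to end of string */
    new = malloc((size_t)len + 1);
    if (new != NULL) {
        memcpy(new, str + idx, (size_t)len);
        new[len] = '\0';
    }
    return new;
}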
 

Eric Sosman

Malcolm said:
int main(int argc, char **argv)
{
    FILE *fp;
    char *line;
    int i = 1;

    if(argc != 2)
        exit(EXIT_FAILURE);
    fp = fopen(argv[1], "r");
    if(!fp)
    {
        fprintf(stderr, "Can't open %s\n", argv[1]);
        exit(EXIT_FAILURE);
    }
    while(line = getline(fp))
    {
        printf("%d: %s", i++, line);
        free(line);
    }
    fclose(fp);
    return 0;
}

There's a program to print a file, prepending the line number. See how
simple it is, because we don't have to do any error processing?

Why does the program test the value of argc? If it's
acceptable for the program to die abruptly on an allocation
failure, surely it's equally acceptable to die with SIGSEGV
if too few arguments are provided, or to blunder merrily
onwards if there are too many.

Why does the program test the value returned by fopen()?
If it's acceptable for the program to die abruptly on an
allocation failure, surely it's equally acceptable to die
with SIGSEGV when calling getline(NULL).

In other words, why don't you have the same "What? Me worry?"
attitude towards usage errors and I/O errors that you do toward
allocation failures? (And you still haven't described what the
"familiar friend" getline() should do if an I/O error occurs.)

The really simple-minded thing about this program, though,
is that it shouldn't be allocating any dynamic memory at all!
It could be a character-at-a-time loop, just testing for '\n'
as it buzzes merrily along -- but no: Illustrations of how simple
things are must always be more complicated than they need to be.
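
For reference, the character-at-a-time version described above might
look like this -- a sketch, with the usage message added here:

#include <stdio.h>
#include <stdlib.h>

/* Number the lines of a file with no dynamic memory anywhere. */
int main(int argc, char **argv)
{
    FILE *fp;
    int ch, line = 1, at_line_start = 1;

    if (argc != 2 || (fp = fopen(argv[1], "r")) == NULL) {
        fprintf(stderr, "usage: %s file\n", argv[0]);
        return EXIT_FAILURE;
    }
    while ((ch = getc(fp)) != EOF) {
        if (at_line_start)
            printf("%d: ", line++);
        putchar(ch);
        at_line_start = (ch == '\n');
    }
    fclose(fp);
    return 0;
}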
 

Malcolm McLean

Eric Sosman said:
Malcolm said:
int main(int argc, char **argv)
{
    FILE *fp;
    char *line;
    int i = 1;

    if(argc != 2)
        exit(EXIT_FAILURE);
    fp = fopen(argv[1], "r");
    if(!fp)
    {
        fprintf(stderr, "Can't open %s\n", argv[1]);
        exit(EXIT_FAILURE);
    }
    while(line = getline(fp))
    {
        printf("%d: %s", i++, line);
        free(line);
    }
    fclose(fp);
    return 0;
}

There's a program to print a file, prepending the line number. See how
simple it is, because we don't have to do any error processing?

Why does the program test the value of argc? If it's
acceptable for the program to die abruptly on an allocation
failure, surely it's equally acceptable to die with SIGSEGV
if too few arguments are provided, or to blunder merrily
onwards if there are too many.

Why does the program test the value returned by fopen()?
If it's acceptable for the program to die abruptly on an
allocation failure, surely it's equally acceptable to die
with SIGSEGV when calling getline(NULL).

In other words, why don't you have the same "What? Me worry?"
attitude towards usage errors and I/O errors that you do toward
allocation failures? (And you still haven't described what the
"familiar friend" getline() should do if an I/O error occurs.)
The point is that the string functions are far more usable when they cannot
encounter fail conditions. If every call to getline() needs to be checked,
not for file termination but for memory failure, the program is a mess.
Similarly with the other functions.
The really simple-minded thing about this program, though,
is that it shouldn't be allocating any dynamic memory at all!
It could be a character-at-a-time loop, just testing for '\n'
as it buzzes merrily along -- but no: Illustrations of how simple
things are must always be more complicated than they need to be.

In my opinion that's the hacker's answer.
 

Kelsey Bjarnason

Kelsey said:
Malcolm McLean wrote:
Here are six functions implemented on top of xmalloc(). No C
programmer should have any triouble providing the implemetations,
though replace and getquote are non-trivial.
[snip]

I think we've got something quite powerful here, purely because
none of these functions can ever return null for out of memory
conditions. It massively simplifies string handling.

Take a look at glib,
http://library.gnome.org/devel/glib/2.14/glib-Memory-Allocation.html

Oh, good God. They didn't. Tell me they didn't.

One wonders how many applications they've screwed over with that bit of
asinine idiocy.

One wonders why one wonders about that only after he learns about
g_malloc. Perhaps because those applications aren't actually screwed?

Really? What's their recovery mechanism on allocation failure, then?

Oh, right... allocations never fail. Just ask Malcolm.
 

Kelsey Bjarnason

[snips]

When you refer to someone that holds a different opinion than you do as
a "smart arse who knows nothing", how is that any better than calling
someone an "idiot"? I'm curious how you arrived at this distinction, as
well as how you determined what they don't know from afar.

Indeed. Also how he seems to be incapable of distinguishing between
calling a particular design decision "idiocy" and calling the person(s)
involved "idiot". Like we don't _all_ have brain farts and do stupid
things on occasion.
That people have written and deployed applications with glib doesn't
mean that its design is good, or bad on its own. All it means is
somebody typed 'make' and hit the enter key and out popped a binary
which people use.

Indeed, and the fact it uses this strategy on allocation failure is a
pretty strong argument that it is _not_ a good design. It may be
absolutely fantastic in 17,000 other ways, but this one is about as bad
as it gets.
 

Kelsey Bjarnason

[snips]

No idea, ask the developers of that buggy application. Failed fopen() is
not an exceptional condition. Failed malloc() is. Or we are using
different vocabularies.

Apparently so. To me, they are both error conditions, to be handled
appropriately - by the caller. Neither is exceptional.
Then you are just really good. Because it's enormously more typing.

if ( ( file = fopen(...) ) == NULL ) {}
if ( ( ptr = malloc(...) ) == NULL ) {}

Yeah, enormously more, indeed.
And
more than that, it's more design questions too: "what do I do in this
situation, which I can't even possibly test?"

Can't test? Why can't you test an allocation failure? I do it all the
time. It's pretty trivial, actually, if you're using a language which
includes constructs such as if ( condition ) action. You know, like,
say, C.
All this apart from real
problems you have to solve. Yes, *real*. No, g_malloc() aborting an
application is not a real problem. Not for a regular desktop
application.

Except that at least one person *here*, in a comparatively small
community, has reported application crashes *precisely* due to this.

I wish I knew where this notion of "Hey, it's just an application, feel
free to kill it because it's 3:00, or the sky is blue, or whatever other
random event has occurred" has come from. I've been cranking apps for
most of 30 years now, and I have *never* found it acceptable for an
application to simply terminate, unless there is absolutely no other
possible option.
Except you don't open files twenty times in a row in every function in
your application. Memory is quite a different kind of resource.
Different in how you use it, you know.

Different how? Files or memory, each needs to be requested before use,
each can fail on request, each needs to have the request failure dealt
with. If the request is successful, the resource is used then disposed
of by appropriate means.

In terms *relevant to the topic*, there is no difference at all.
Request, cope with possible request failure, use, dispose.
So you click Save button then click Close. The application failed to
process Save click because it failed to allocate memory for the event
structure to put into the event queue, but then it successfully handled
Close because at the same time yet another document was closed and some
memory returned to the malloc pool.

That strikes me as a design flaw in the application. If the user
requested "save and close" and the save failed, what the hell are you
doing processing the close, instead of dealing with the error?

This would be particularly bad since the failure to save was *not*
because a file couldn't be written to, but because a menu event couldn't
be put in a message queue. If you must process the close, at least have
the decency to save the data, possibly in a scratch file which can be
recovered next time around.

Yes, certainly, at some point the options run out. If you can't allocate
space for a message on the queue, you probably also can't allocate
resources for a warning dialog. If you can't create a scratch file *and*
you can't allocate resources for the warning, there may be little you can
do but abort.

That, however, does not excuse the whole notion of "Hey, first thing we
tried failed, so let's just abort."
All allocations are checked. It's what you do when they fail is
different. If malloc(12) failed, then you are screwed because all your
code wants memory.

No, you're not screwed. You have a possible failure condition to deal
with, one which might be an expected condition, one which might not be,
and in either case, there are many possible resolutions to the problem.
No memory => application isn't working.

Or application isn't working optimally. Or _this part_ of the
application isn't working _now_, so try again in five minutes. Or...
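
One concrete shape for "isn't working optimally": ask for a generous
buffer and degrade instead of aborting. A sketch, with invented sizes
and an invented name:

#include <stdlib.h>

#define WANT    (1024u * 1024u)    /* what we'd like to have */
#define MINIMUM 4096u              /* what we can live with */

/* Return the largest buffer available between MINIMUM and WANT,
 * halving the request instead of giving up.  NULL only if even
 * MINIMUM bytes cannot be had. */
static char *get_work_buffer(size_t *got)
{
    size_t n = WANT;
    char *p;

    while ((p = malloc(n)) == NULL && n > MINIMUM)
        n /= 2;                    /* degrade: smaller buffer, slower run */
    *got = (p != NULL) ? n : 0;
    return p;
}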
 

Eric Sosman

Malcolm said:
[...]
The point is that the string functions are far more usable when they
cannot encounter fail conditions.

The point is that "cannot encounter" is not the same thing
as "cannot survive."
In my opinion that's the hacker's answer.

I guess you define "hacker" as "someone whose programs work."
What's your word for "someone whose programs fail when they exhaust
resources they didn't actually need?"
 

Malcolm McLean

Eric Sosman said:
Malcolm said:
[...]
The point is that the string functions are far more usable when they
cannot encounter fail conditions.

The point is that "cannot encounter" is not the same thing
as "cannot survive."
In my opinion that's the hacker's answer.

I guess you define "hacker" as "someone whose programs work."
What's your word for "someone whose programs fail when they exhaust
resources they didn't actually need?"

Hacks work. That's part of the word's definition.
I think what we are looking for is what Dann called "sloppy programming".
I'd call it "loose programming". That is, programs that take resources they
don't strictly need, in order to reduce programming time, and quit when they
strictly could continue, for the same reason. So the opposite of "tight
code".
 

Ben Bacarisse

while(line = getline(fp))
{
    printf("%d: %s", i++, line);
    free(line);
}

I think you've taken the wrong path. You've made the above simple by
putting the error recovery too deep (in the library). Once execution
has gone deep, the information about what to do about the error is
usually lost -- so we are left with your example of a user-called
error function that can't do the one thing the caller might do, which
is to request less memory.

Simple error handling is so common that you can factor it out as a
function if you want your users to have a simple interface:

void *never_null(void *p)
{
    if (!p) exit(EXIT_FAILURE);
    return p;
}

If your library can return a NULL and users don't want one, tell them
to call:

while(line = never_null(getline(fp)))

instead of getline. This way they can have simplicity *or* control.
If you are worried about the loss of type-checking, you can compile
with

#define never_null(p) (p)

to be sure.

[Aside: if your algorithms may request 0 bytes, you must ensure that
malloc(0) never returns NULL or this method will signal an error where
none exists.]
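
One way to honour that aside, for what it's worth (malloc_nz is an
invented name):

#include <stdlib.h>

/* Never pass 0 to malloc, so a zero-byte request cannot yield the
 * NULL that never_null() would misread as an allocation failure. */
void *malloc_nz(size_t n)
{
    return malloc(n ? n : 1);
}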
 

Kelsey Bjarnason

Randy said:
Malcolm McLean wrote:
Here are six functions implemented on top of xmalloc(). No C
programmer should have any triouble providing the implemetations,
though replace and getquote are non-trivial.
[snip]
I think we've got something quite powerful here, purely because
none of these functions can ever return null for out of memory
conditions. It massively simplifies string handling.

Take a look at glib,
http://library.gnome.org/devel/glib/2.14/glib-Memory-Allocation.html

glib is where bad ideas go to die. Now, if somebody just had the nerve
to tell them....

:)

You gdon't glike ghaving gall gyour gvariables gprefixed gwith g?

Why, you don't like the following code?

#include <glib.h>

gint main (gint argc, gchar **argv)
{
    gchar *s = g_strdup ("Hello there!");
    g_print ("%s\n", s);
    g_free (s);
}


G_no, g_I g_don't.
 

Eric Sosman

Malcolm said:
Eric Sosman said:
Malcolm said:
[...]
The point is that the string functions are far more usable when they
cannot encounter fail conditions.

The point is that "cannot encounter" is not the same thing
as "cannot survive."
The really simple-minded thing about this program, though,
is that it shouldn't be allocating any dynamic memory at all!
It could be a character-at-a-time loop, just testing for '\n'
as it buzzes merrily along -- but no: Illustrations of how simple
things are must always be more complicated than they need to be.

In my opinion that's the hacker's answer.

I guess you define "hacker" as "someone whose programs work."
What's your word for "someone whose programs fail when they exhaust
resources they didn't actually need?"

Hacks work. That's part of the word's definition.
I think what we are looking for is what Dann called "sloppy
programming". I'd call it "loose programming". That is, programs that take
resources they don't strictly need, in order to reduce programming time,
and quit when they strictly could continue, for the same reason. So the
opposite of "tight code".

In my opinion that's the spin doctor's answer.
 

Malcolm McLean

Ben Bacarisse said:
while(line = getline(fp))
{
    printf("%d: %s", i++, line);
    free(line);
}

I think you've taken the wrong path. You've made the above simple by
putting the error recovery too deep (in the library). Once execution
has gone deep, the information about what to do about the error is
usually lost -- so we are left with your example of a user-called
error function that can't do the one thing the caller might do, which
is to request less memory.

Simple error handling is so common that you can factor it out as a
function if you want your users to have a simple interface:

void *never_null(void *p)
{
    if (!p) exit(EXIT_FAILURE);
    return p;
}

If your library can return a NULL and users don't want one, tell them
to call:

while(line = never_null(getline(fp)))

instead of getline. This way they can have simplicity *or* control.
If you are worried about the loss of type-checking, you can compile
with

#define never_null(p) (p)

to be sure.

[Aside: if your algorithms may request 0 bytes, you must ensure that
malloc(0) never returns NULL or this method will signal an error where
none exists.]

The snag there is that getline() does return null. On end of input.

Like Perl's

while( $line = <INPUT> )
{
    # processing here
}

This idiom is used in thousands of Perl scripts every day.
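
If you wanted both that idiom and Ben's separation of concerns, a
wrapper could keep NULL meaning end of input and abort only on
allocation failure. A sketch, assuming the thread's getline() leaves
the stream readable on an out-of-memory return, so feof()/ferror() can
stand in for an out-of-band status (and noting, as with dup, that the
name clashes with a POSIX function):

#include <stdio.h>
#include <stdlib.h>

char *getline(FILE *fp);   /* the thread's function, assumed linked in */

char *getline_or_die(FILE *fp)
{
    char *line = getline(fp);
    if (line == NULL && !feof(fp) && !ferror(fp)) {
        /* not end of input, not an I/O error: assume out of memory */
        fputs("out of memory\n", stderr);
        exit(EXIT_FAILURE);
    }
    return line;           /* NULL now reliably means "no more lines" */
}

With that, while(line = getline_or_die(fp)) reads exactly like the Perl
loop, while a caller who cares can still use the raw getline().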
 
