General method for dynamically allocating memory for a string

Mark McIntyre

A product for zero dollars and zero cents?

You're confusing "selling" and "receiving money for". People sell
each other things all day every day, without any money changing hands.
For instance, you can sell your knowledge in return for good will.

--
Mark McIntyre

"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it."
--Brian Kernighan
 
Tak-Shing Chan

Tak-Shing Chan said:



What would? You seem to be talking about hypothetical code hiding
hypothetical errors. The actual posted code seems to have passed you by
completely.

Yes, I was talking about hypothetical errors. But this is
the same type of argument you used against malloc casts---the
/hypothetical/ error of missing <stdlib.h> when an incomplete
program was posted.

Tak-Shing
 
Richard Heathfield

Tak-Shing Chan said:
Yes, I was talking about hypothetical errors.

I wasn't. If you want to start a new thread about hypothetical errors, feel
free.
 
Keith Thompson

Richard Heathfield said:
Richard Bos said:

True enough - you shouldn't *rely* on it, in the sense of arbitrarily
littering your code with calls to free(). Nevertheless, setting pointers to
NULL after you're done with them is a /good/ habit, not a bad one. It's
called "defence in depth".

I'm not going to say that I disagree with you, but I'm going to offer
a counterargument anyway.

Setting a free()d pointer to NULL can prevent certain kinds of errors
-- or rather, it can prevent the *symptoms* of certain kinds of
errors.

For example:

some_type *ptr = malloc(sizeof *ptr);
...
free(ptr);
...
/*
* Now I don't remember whether I called free(ptr) or not.
* I'll free it here, just in case.
*/
free(ptr);

As written, the second call to free() invokes undefined behavior. In
fact, evaluating ptr in preparation for the call invokes UB; the call
to free() doesn't really have anything to do with it.

If the first free() call were replaced with:

free(ptr);
ptr = NULL;

then the second free() call wouldn't invoke UB. But it would *still*
(probably) be a symptom of a logical error, namely the failure to keep
track of whether you've already called free(). Setting ptr to NULL
after the first free() would mask the symptom of the error; leaving it
alone could *potentially* cause the second free() to blow up, making
the error easier to detect. (Or it could do nothing; such is the
nature of undefined behavior.)

On the other hand, you could legitimately have reached the point of
the second free() by any of several paths, some of which have free()d
the pointer and some of which haven't. In that case, *if* all the
paths that free() it then set it to NULL, then free()ing it again is
harmless and sensible.

I'd probably be more comfortable with a program design that invokes
free() exactly once, if and only if it's needed, but if having your
program do a little unnecessary work at run time makes it easier to
develop, that's not always a bad thing.
That's certainly true, but it is also a good idea to recognise that you
might be fallible, and to take precautions against that fallibility. A null
pointer is far more useful than a pointer with an indeterminate value.

In an ideal world, the question would never arise; you wouldn't make
that mistake in the first place, or at least you'd already have
corrected it.

In software intended to be maximally robust (which, I hasten to add,
isn't *always* worth the effort), it probably makes sense to implement
this kind of defense in depth *and* to detect and log cases where
errors were caught by the second or later line of defense.
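
A minimal sketch of that last idea, assuming a hypothetical helper named checked_free() and a hypothetical counter redundant_free_count (neither appears in the thread). It frees through a pointer-to-pointer, resets the caller's pointer to NULL, and logs when it is asked to free something that is already NULL, so the logic error stays visible instead of being silently masked. It is written for int * only to keep the example short.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical counter: how often the second line of defense fired. */
static unsigned long redundant_free_count = 0;

/*
 * Free *pp and reset it to NULL. If *pp is already NULL, the caller has
 * reached a "just in case" free: log it rather than staying silent.
 */
void checked_free(int **pp, const char *where)
{
    assert(pp != NULL);
    if (*pp == NULL) {
        redundant_free_count++;
        fprintf(stderr, "redundant free at %s\n", where);
        return;
    }
    free(*pp);
    *pp = NULL;
}

/* Returns the number of redundant frees logged while it ran. */
unsigned long checked_free_demo(void)
{
    unsigned long before = redundant_free_count;
    int *p = malloc(sizeof *p);

    if (p == NULL)
        return 0;                     /* allocation failed; nothing to show */
    checked_free(&p, "first call");   /* real free, p becomes NULL */
    checked_free(&p, "second call");  /* logged, no undefined behavior */
    return redundant_free_count - before;
}
```

Whether the log goes to stderr, a file, or an assertion in debug builds is a design choice; the point is only that the masked error leaves a trace.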
 
goose

jacob said:
The revised example will not fool the GC since the block can be reached
from q, since it points between the beginning and the end of the block.

Will not fool which GC? The Boehm GC won't get "fooled"
by that only because it's fairly conservative and tends
to err on the side of leakage rather than premature
collection (AIUI, it will compare bits inside objects,
but I'm willing to be convinced otherwise).

Some will definitely get fooled by that (the GC will
track references to objects and won't bother looking
at every possible byte in the object with every
possible alignment - why should it?).
 
goose

Ian Collins wrote:
That's one advantage of the GC library, it imposes minimal overhead and
only logs real or potential leaks.

Or it may not log any at all. Some GC are rather conservative
you know.
 
jacob navia

Keith said:
In software intended to be maximally robust (which, I hasten to add,
isn't *always* worth the effort), it probably makes sense to implement
this kind of defense in depth *and* to detect and log cases where
errors were caught by the second or later line of defense.

This is a very good insight.

Why take risks?

A function call with a NULL pointer is extremely fast
in these days of GHz CPUs.

Setting a pointer to null is a single memory write,
absolutely nothing.
 
Randall

Hello Richard:

We will have to agree to disagree. I agree that my code below if/else
is retarded. I agree that it should have been different. This is what
I SHOULD have coded:

if( str2 != NULL )
printf( "str2: %s\n", str2);

free( str2 );
str2 = 0;
....

It sounds like you have the good fortune to work either by yourself or
in a very small team. I currently work on a 500+ developer contract
here in the United States with millions of lines of code. No "cheap
hack" can keep you safe in such an environment. When dealing with
global data in such an environment the above technique can mean the
difference between mission critical system downtime or a harmless
no-op.

This is not meant to foster an absurd programming practice similar to
this:
if( str2 != NULL )
printf( "str2: %s\n", str2);

free( str2 );
str2 = 0;
....
/* not sure if I really freed the memory */
free( str2 ); // Yea! I'm okay!

It is purely a measure of defensive programming. I have learned these
idioms through working in larger teams with varying levels of expertise
with a lot of little updates to the code base. This has worked well
for my team. I would gladly manage the code if I were the sole
maintainer. In my environment I am not. This idiom works for me. Of
course, your mileage may vary.

-Randall
 
Keith Thompson

Randall said:
Hello Richard:

We will have to agree to disagree. I agree that my code below if/else
is retarded. I agree that it should have been different. This is what
I SHOULD have coded:

if( str2 != NULL )
printf( "str2: %s\n", str2);

free( str2 );
str2 = 0;
...

Please don't top-post. See <http://www.caliburn.nl/topposting.html>
(and most of the articles in this newsgroup) for details.

Out of curiosity, is there some reason you use NULL in the comparison
and 0 in the assignment? I would have used NULL for both.
 
websnarf

Randall said:
We will have to agree to disagree. I agree that my code below if/else
is retarded. I agree that it should have been different. This is what
I SHOULD have coded:

if( str2 != NULL )
printf( "str2: %s\n", str2);

free( str2 );
str2 = 0;
...

[...] It is purely a measure of defensive programming. I have learned these
idioms through working in larger teams with varying levels of expertise
with a lot of little updates to the code base. This has worked well
for my team. [...]

Indeed, it's good practice. But it's repetitive, wordy and perhaps not
as expressive as you might intend. How about:

printf ("str2: %s\n", strfForPrintf (str2));
safeFree (str2);

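where you have the following macros defined:

#define strfForPrintf(str) ((str)?(str):"<NULL>")
#define safeFree(str) do { free (str); str = NULL; } while (0)
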
You're reaching. In some programming environments you *wrap* your
calls to free(), or you use tools like purify etc, and you get status
reports about attempts to free NULL. I.e., you actually get visibility
into potential programming failures, as opposed to just suffering from
a double free which has UB, and can be difficult to track down or trace
without some effort.
[...] That habit will then bite you when you encounter an
exception; e.g., when you make a copy of a pointer and only set one copy
to null, or when you have to work with other people's code which doesn't
nullify any pointers.
The truly good programming habit, in this case, is to do your bleedin'
bookkeeping, and keep in mind which pointers you have free()d, and which
you haven't. [...]

If you follow your own logic for a second, don't you also have to do
everyone else's book keeping too, since you might be using other
people's code that doesn't nullify pointers? Following
standards/patterns and using safety mechanisms like NULLing pointers is
a far more scalable approach in terms of the number of programmers you
can sustain on a single project.
 
Richard Heathfield

Keith Thompson said:

I'm not going to say that I disagree with you, but I'm going to offer
a counterargument anyway.

Good man. :)
Setting a free()d pointer to NULL can prevent certain kinds of errors
-- or rather, it can prevent the *symptoms* of certain kinds of
errors.

Well, it can actually prevent errors. If p is NULL, p's value is not
indeterminate. It's not actually /valid/, but it's testably invalid.
For example:

some_type *ptr = malloc(sizeof *ptr);
...
free(ptr);
...
/*
* Now I don't remember whether I called free(ptr) or not.
* I'll free it here, just in case.
*/
free(ptr);

Sloppy programming, of course. (I typed "Sloopy" on the first attempt, and
was half-tempted to leave it uncorrected.)
As written, the second call to free() invokes undefined behavior. In
fact, evaluating ptr in preparation for the call invokes UB; the call
to free() doesn't really have anything to do with it.

Quite so.
If the first free() call were replaced with:

free(ptr);
ptr = NULL;

then the second free() call wouldn't invoke UB. But it would *still*
(probably) be a symptom of a logical error, namely the failure to keep
track of whether you've already called free().

Sure, but now you have only one error instead of two, and as far as the
compiler is concerned the error is harmless. As far as the programmer's
boss is concerned, however, it may not be.
Setting ptr to NULL
after the first free() would mask the symptom of the error; leaving it
alone could *potentially* cause the second free() to blow up, making
the error easier to detect. (Or it could do nothing; such is the
nature of undefined behavior.)

Or it could destroy an entire continent, which is why I really really don't
like the idea. Better to have a deterministic program that you can debug
predictably, IMHO.
On the other hand, you could legitimately have reached the point of
the second free() by any of several paths, some of which have free()d
the pointer and some of which haven't. In that case, *if* all the
paths that free() it then set it to NULL, then free()ing it again is
harmless and sensible.

In practice, it's very easy to ensure (or rather, *almost* ensure) that all
paths do free it, meaning that there is no need to give it "one for luck".
I mean, of course, using an opaque type. Yes, it's true that some bonehead
can free(p); instead of TDestroy(&p); if he really really insists, which is
why I have to say "almost".
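
A minimal sketch of the opaque-type approach, under stated assumptions: only TDestroy(&p) is named in the post; the type name T, the tag T_, TCreate(), TValue() and the single int member are hypothetical and exist only for illustration. Because callers see an incomplete type, the only sanctioned way to release an object is the destroy function, which also resets the caller's pointer.

```c
#include <assert.h>
#include <stdlib.h>

/* Public view (normally a header): T is incomplete, so callers
 * cannot look inside it or allocate one themselves. */
typedef struct T_ T;

T *TCreate(int value);
int TValue(const T *p);
void TDestroy(T **pp);

/* Private view (normally the one .c file that implements T). */
struct T_ {
    int value;
};

T *TCreate(int value)
{
    T *p = malloc(sizeof *p);

    if (p != NULL)
        p->value = value;
    return p;                 /* NULL on allocation failure */
}

int TValue(const T *p)
{
    assert(p != NULL);
    return p->value;
}

/* Release the object and null the caller's pointer in one step.
 * Calling it again on the same (now NULL) pointer does nothing. */
void TDestroy(T **pp)
{
    if (pp != NULL && *pp != NULL) {
        free(*pp);
        *pp = NULL;
    }
}
```

As the post says, nothing stops a determined caller from writing free(p) directly; the sketch only makes the correct path the easy one.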

In an ideal world, the question would never arise; you wouldn't make
that mistake in the first place, or at least you'd already have
corrected it.

Indeed. But since it's not an ideal world, it makes sense to make your
programs as debuggable as possible, and in my opinion that means keeping
them deterministic!
In software intended to be maximally robust (which, I hasten to add,
isn't *always* worth the effort), it probably makes sense to implement
this kind of defense in depth *and* to detect and log cases where
errors were caught by the second or later line of defense.

Right.
 
Richard Heathfield

(e-mail address removed) said:

How about:

printf ("str2: %s\n", strfForPrintf (str2));
safeFree (str2);

where you have the following macros defined:

#define strfForPrintf(str) ((str)?(str):"<NULL>")
#define safeFree(str) do { free (str); str = NULL; } while (0)

Your second macro is badly named, I think. If it were truly a "safe Free",
it would be okay to invoke it like this: safeFree((void *)rand()); /* ! */
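
For reference, a small sketch of the two quoted macros in use. The macros are as posted apart from extra parentheses around the assignment target; the helper describe() and the function safe_free_demo() are hypothetical. It shows what the macros do guard against (a null pointer reaching printf's %s, and a repeated free of the same variable) and, by omission, what they do not: an arbitrary invalid pointer value is still passed straight to free().

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define strfForPrintf(str) ((str) ? (str) : "<NULL>")
#define safeFree(str) do { free(str); (str) = NULL; } while (0)

/* Format "str2: ..." into out, substituting "<NULL>" for a null pointer. */
int describe(char *out, size_t outsize, const char *str2)
{
    return snprintf(out, outsize, "str2: %s", strfForPrintf(str2));
}

/* Returns 0 if the pointer reads as "<NULL>" after being freed. */
int safe_free_demo(void)
{
    char buf[32];
    char *str2 = malloc(6);

    if (str2 == NULL)
        return -1;                      /* allocation failed */
    strcpy(str2, "hello");
    describe(buf, sizeof buf, str2);    /* buf holds "str2: hello" */

    safeFree(str2);                     /* frees and sets str2 to NULL */
    safeFree(str2);                     /* free(NULL) does nothing */

    describe(buf, sizeof buf, str2);    /* buf holds "str2: <NULL>" */
    return strcmp(buf, "str2: <NULL>");
}
```

Note that safeFree() needs a modifiable lvalue as its argument and only resets that one variable; any copies of the pointer held elsewhere are untouched.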

<snip>
 
Herbert Rosenau

Your rebuttal is worse than the disease, because:

(1) If fwrite or fread fails, any further use of p will
invoke undefined behaviour.

Just as if you had written any other data: you've lost completely.
(2) Even with error checking inserted, you would still be
leaking memory when fread fails.

Writing data to disk is always risky. When you can't read it back,
you're always in trouble. There is no difference between a pointer, a
float or a simple text string. You are clearly saying: avoid writing data to
disk if you need to read it back at some point.
(3) In general, writing pointers to files is always
nonportable.

No, it is always fully portable.

So, I am not sure why you are posting this in a
group that values portability.

What you mean is the case of writing a pointer to a file, exiting the program,
and then expecting the pointer read back from the file to always work.

To cite the twit:

void f(void) {
int *p = GBmalloc(4711);

......

}

GBmalloc does NOT see that the allocated memory has become unused, because p
is not set to NULL, even though

void f(void) {

void *p = GBmalloc(4711);
......

p = NULL; /* invokes magically GBFree(p); */

}

is pure crap.

Passing pointers to other functions in other translation units will have
random effects with that garbage, especially when those other
translation units are called with static arrays as well as
dynamically allocated arrays. It is in the hands of the programmer
to decide when free() should be called, because no GB will ever be
able to free it - except in an interpreter.

GB in C creates more problems than it can ever solve once the program is a
bit more complex than a hello world one.

GB in C++ is erroneous enough. GB in C is completely senseless. It may
be a bit helpful in a C interpreter, but in truly complex applications
it is the main source of UB in every way.

--
Tschau/Bye
Herbert

Visit http://www.ecomstation.de the home of german eComStation
eComStation 1.2 German is here!
 
Herbert Rosenau

Reference, please?

Practice.
My recollection is that the relevant section says only that
when something is written out to a binary file, that the same
binary value will be read back in. When, though, it comes to a pointer,
that doesn't promise that the reconstituted pointer points to anything.

Reference please.

The pointer will point to exactly the same memory location as it did
when it was written. What magic would make the memory unusable
just because the pointer to it gets written to a file? Clearly
the pointer becomes useless when the program holding the data in memory
dies. But as long as the program is active in the same run, the memory
holds its content.
There is a section of the standard that talks about writing out
pointers and reading them back in, but that has to do with using
the %p printf() format, which Richard did not do.

Do you know the difference between binary and text files?

--
Tschau/Bye
Herbert

 
Herbert Rosenau

Yes, this can be a problem with the GC.

There are more problems than solutions with GC in C.
As you may know, I am not selling anything here, and I am not saying
that the GC is a "solve it all" magic bullet.

The algorithm of a conservative GC is that if a block of memory
is *reachable* from any of the roots (and in this case a global pointer
is a root) then that block can't be reused.

Is the GC an interpreter that tracks each and every action on the
pointers it hands out? No? Then you have lost already.
Setting pointers to NULL after you are done with them is a harmless
operation, nothing will be destroyed unless it can be destroyed.

It is MUCH simpler than calling free() only ONCE for each block.

So your GC will either fail to destroy the area, guaranteeing memory leaks,
or it will try to free() static memory. It is easy to
produce that.
If you have several dozen *ALIASES* for a memory block, as you did
in the code above, you must free the block ONLY ONCE, what can be
extremely tricky in a real life situation.

When you are a real programmer and not a stupid hacker, it will always
be easy to have exactly one free() for each malloc(), even in
more complex code. When the lifetime of a memory block ends, free() it
and set the referencing pointer to NULL. You can have a million aliases
of a pointer - but if you are not just a stupid hacker you will
always know how to tell whether a given memory block is in use or
not. No magic needed, no unreliable GC needed.
If you have visibility, as in this example, OK. But if you don't,
i.e. you are passing memory blocks around to all kinds of routines that
may store that pointer in structures, or pass it again around, it
can be extremely tricky to know how many aliases you already have
for that memory block and to invalidate ALL of them.

You are describing perfectly the situation in which your GC will fail
miserably.
Setting a pointer to NULL is such a CHEAP operation in ALL machines
that it is not worth speaking about. It means zeroing a
memory location.

Boah, ey! As you say, it means setting the pointer to NULL, but not
free()ing a memory location that is used somewhere under some conditions hidden
from a magic GC.

When your CRT is able to tell you how much memory it has given out and
not yet gotten back, it is easy during the debug phase to fix forgotten
free()s. When the magic GC gives you the same message at such points,
you are mostly out of luck.

I've written lots of highly complex C programs making intensive use of
malloc(), realloc() and free(), always ending up with no memory leaks. I
had to test lots of less complex C++ programs using GC and always
found lots of memory leaks, which led to weeks of searching for the
causes and resolving them.

GC in C makes simply no sense.

Instead of hacking blindly around, you have to learn to write fail-safe
programs and let them run 24h/366d/year using only exactly the
resources they need for the current work. You'll avoid GC and C++
whenever possible.

--
Tschau/Bye
Herbert

 
Herbert Rosenau

Richard, while you have earned a lot of my respect and admiration through
your posting, I do feel that you are on a little bit of a crusade here. You
seem to be arguing that because garbage collection can't protect against all
memory leaks without care from the user, it is worthless. This is a somewhat
ridiculous argument - after all, manual memory management has precisely the
same property.

No. It is much easier to write error-free programs using
malloc()/free() than with a defective GC. A GC that can't handle
each and every piece of dynamic memory perfectly is unusable by
design.
GCs don't give you a license not to think, but I didn't see anybody arguing
that they do.

GC as it should be designed raises this claim. In C it will always
fail miserably, so it is useless.

--
Tschau/Bye
Herbert

 
Joe Wright

Richard Heathfield wrote:

[ snip all ]

Some time ago now I attacked the GC problem, as I saw it, with what I
called GE or Garbage Elimination.

It has to do with wrapping malloc, calloc, realloc and free such that
they all come to GE first. GE is basically a manager of a linked list of
allocation data, including addresses and sizes.

GE.h adds 'size_t size(void *p);' to the vocabulary, size of the allocation.

User calls to *alloc are simply recorded in the list and then passed to
their libc namesakes. Calls to free() are looked up in the list and if
found result in a call to libc free() and deletion from the list. If the
user calls free(x) and x is not in the list, we simply return, a NOP.

What do you think of it?
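
A minimal sketch of the scheme described above, not the poster's actual GE code: the names ge_malloc(), ge_free(), ge_size() and struct ge_node are assumptions made for illustration, and only the malloc/free pair is wrapped. Each allocation is recorded in a singly linked list; freeing looks the pointer up first and does nothing if it is not found.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* One record per live allocation (hypothetical layout). */
struct ge_node {
    void *ptr;
    size_t size;
    struct ge_node *next;
};

static struct ge_node *ge_list = NULL;

/* Allocate via libc malloc and record the block in the list.
 * A NULL result from malloc (including for size 0) is treated as failure. */
void *ge_malloc(size_t size)
{
    struct ge_node *n = malloc(sizeof *n);

    if (n == NULL)
        return NULL;
    n->ptr = malloc(size);
    if (n->ptr == NULL) {
        free(n);
        return NULL;
    }
    n->size = size;
    n->next = ge_list;
    ge_list = n;
    return n->ptr;
}

/* Size of a tracked allocation, or 0 if p is not in the list. */
size_t ge_size(void *p)
{
    for (struct ge_node *n = ge_list; n != NULL; n = n->next)
        if (n->ptr == p)
            return n->size;
    return 0;
}

/* Free p only if it is tracked. Returns 1 if freed, 0 if p was
 * unknown, in which case nothing is done (the NOP described above). */
int ge_free(void *p)
{
    for (struct ge_node **link = &ge_list; *link != NULL; link = &(*link)->next) {
        if ((*link)->ptr == p) {
            struct ge_node *n = *link;

            *link = n->next;   /* unlink the record */
            free(n->ptr);
            free(n);
            return 1;
        }
    }
    return 0;
}
```

Returning a status rather than void is a choice made here so that a caller can log or abort on an unknown pointer instead of silently continuing. The lookup is linear in the number of live allocations, and a caller holding a pointer that has already been freed is still handling an indeterminate value.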
 
jacob navia

Joe said:
Richard Heathfield wrote:

[ snip all ]

Some time ago now I attacked the GC problem, as I saw it, with what I
called GE or Garbage Elimination.

It has to do with wrapping malloc, calloc, realloc and free such that
they all come to GE first. GE is basically a manager of a linked list of
allocation data, including addresses and sizes.

GE.h adds 'size_t size(void *p);' to the vocabulary, size of the
allocation.

User calls to *alloc are simply recorded in the list and then passed to
their libc namesakes. Calls to free() are looked up in the list and if
found result in a call to libc free() and deletion from the list. If the
user calls free(x) and x is not in the list, we simply return, a NOP.

What do you think of it?

I am afraid you are actually repeating the code of malloc.
The system function malloc has probably been coded with great care to do
exactly that:

Find a free block in the list of free blocks!

You are just adding a layer that does the same.

jacob
 
CBFalconer

Joe said:
Richard Heathfield wrote:

[ snip all ]

Some time ago now I attacked the GC problem, as I saw it, with
what I called GE or Garbage Elimination.

It has to do with wrapping malloc, calloc, realloc and free such
that they all come to GE first. GE is basically a manager of a
linked list of allocation data, including addresses and sizes.

GE.h adds 'size_t size(void *p);' to the vocabulary, size of the
allocation.

User calls to *alloc are simply recorded in the list and then
passed to their libc namesakes. Calls to free() are looked up in
the list and if found result in a call to libc free() and deletion
from the list. If the user calls free(x) and x is not in the list,
we simply return, a NOP.

What do you think of it?

All that is usually available, in some sort of system dependent
manner, in any malloc package. You could examine my nmalloc and
its associated maldbg package in nmalloc.zip, available at:

<http://cbfalconer.home.att.net/download/>

bearing in mind that the package is designed for use with DJGPP,
but can probably be easily ported to many systems.

The problem is that handling the malloced pointers is not enough.
You also have to be able to handle all the pointers created within
any C program by such things as the & operator, and by automatic
conversion of array references. To include these you have to get
into the warp and woof of the complete compiler/code-generator, and
you will have significant efficiency losses since all pointer
references will have to be indirect, through a system table.
 
Flash Gordon

jacob said:
Joe said:
Richard Heathfield wrote:

[ snip all ]

Some time ago now I attacked the GC problem, as I saw it, with what I
called GE or Garbage Elimination.

It has to do with wrapping malloc, calloc, realloc and free such that
they all come to GE first. GE is basically a manager of a linked list
of allocation data, including addresses and sizes.

GE.h adds 'size_t size(void *p);' to the vocabulary, size of the
allocation.

User calls to *alloc are simply recorded in the list and then passed
to their libc namesakes. Calls to free() are looked up in the list and
if found result in a call to libc free() and deletion from the list.
If the user calls free(x) and x is not in the list, we simply return,
a NOP.

What do you think of it?

I think that if an invalid pointer is passed it should abort the program
since something serious has gone wrong and you don't know how much else
has been messed up.
I am afraid you are actually repeating the code of malloc.
The system function malloc has probably been coded with great care to do
exactly that:

Find a free block in the list of free blocks!

You are just adding a layer that does the same.


No, his layer does *not* do the same as the C library. Most
implementations will crash at some point after you have called free with
an invalid pointer. Joe Wright's wrapper will prevent that.
 
