Garbage Collection in C

jacob navia

Abstract
--------
Garbage collection is a method of managing memory by using a "collector"
library. Periodically, or triggered by an allocation request, the
collector looks for unused memory chunks and recycles them.
This memory allocation strategy has been adapted to C (and C++) by the
library written by Hans J Boehm and Alan J Demers.

Why a Garbage Collector?
-----------------------
Standard C knows only the malloc/calloc/realloc/free functions. The
programmer must manage each block of memory he or she allocates, never
forgetting to call the standard function free() for each block. Any error
is potentially fatal but, alas, not with immediate consequences. Many
errors, like freeing a block twice (or more) or forgetting to free an
allocated block, will be discovered much later (if at all). Bugs of this
type are very difficult to find, and a whole industry of software packages
exists just to find them.

The garbage collector presents a viable alternative to the traditional
malloc/free "manual" allocation strategies. Boehm's allocator
tries to find unused memory either when an allocation request is
made or when explicitly invoked by the programmer.

The main advantage of a garbage collector is that the programmer is
freed from the responsibility of deallocating memory. The
programmer requests memory from the GC, and the rest is *automatic*.


Limitations of the GC.
---------------------
The GC needs to see all pointers in a program. Since it periodically
scans memory, it will assume that any block in its block list is
free to reuse when it can't find any pointers to it. This means that the
programmer can't store pointers on disk, or in the "Windows extra
bytes", as was customary under older Windows versions, or
elsewhere.

This is not really a limitation, since most programs do not write
pointers to disk and expect them to be valid later...
Obviously, there are infinitely many ways to hide pointers from the
collector (by XORing them with some constant, for instance).

This is of no practical significance. Pointers aren't XORed in normal
programs, and if you stay within the normal alignment requirements
of the processor, everything works without any problems.

Performance considerations
--------------------------
On modern workstations, the time needed to make a complete sweep in
mid-size projects is very small, measured in milliseconds. In
programs that are not real time the GC time is completely undetectable.
I have used Boehm's GC in the IDE of lcc-win32, especially in the
debugger. Each string I show in the "automatic" window is allocated
using the GC. On slow machines you can sometimes see a pause of
less than a second, completely undetectable unless you know that it is
there and try to find it.

It must also be said that the malloc/free system can be slow, since at
each allocation request malloc may have to go through the list of free
blocks trying to find a suitable one. Memory must be consolidated too, to
avoid fragmentation, and a malloc call can become very expensive, depending
on the implementation and the allocation pattern of the program.


Portability
-----------
Boehm's GC runs under most standard PC and UNIX/Linux platforms. The
collector should work on Linux, *BSD, recent Windows versions, MacOS X,
HP/UX, Solaris, Tru64, Irix and a few other operating systems. Some
ports are more polished than others. There are instructions for porting
the collector to a new platform. Kenjiro Taura, Toshio Endo, and Akinori
Yonezawa have made available a parallel collector.

Conclusions
-----------
The GC is a good alternative to traditional allocation strategies for C
(and C++). The main weakness of the malloc/free system is that it
doesn't scale. It is impossible to be good at doing a mind numbing task
without any error 100% of the time. You can be good at it, you can be
bad at it, but you can NEVER be perfect. It is human nature.

The GC frees you from those problems, and allows you to concentrate on
the problems that really matter, and where you can show your strength
as a software designer. It frees you from the boring task of keeping track
of each memory block you allocate.

jacob
 
Bob Martin

in 700822 20061011 175810 Richard Heathfield said:
jacob navia said:


Quite so. Please move discussions of non-C matters to some other newsgroup
where it is topical.

Are you telling Jacob to stay out of your playpen?
 
William Hughes

jacob navia wrote:

[...]
This is of no practical significance. Pointers aren't XORed in normal
programs, and if you stay within the normal alignment requirements
of the processor, everything works without any problems.

No, but I might well subtract 1 from my pointers to change the
indexing. I might well pass a pointer to a third party library and
forget about it.

In my view, any risk is too much to solve a relatively minor
problem, memory leaks, especially as GC can only defend
against true memory leaks. If I keep allocating memory,
remember it (so in theory I could use it), but don't use it
GC is not going to help (now if GC could do anything about
memory stomps ...) .

- William Hughes
 
jacob navia

William said:
jacob navia wrote:

[...]

This is of no practical significance. Pointers aren't XORed in normal
programs, and if you stay within the normal alignment requirements
of the processor, everything works without any problems.


No, but I might well subtract 1 from my pointers to change the
indexing.

This is allowed of course. Your pointer will be within the bounds
of the pointed-to object and that object will NOT be reclaimed since
there is (at least) one pointer to somewhere in it.

I might well pass a pointer to a third party library and
forget about it.

Who cares?

The GC will see it anyway, because the foreign library is part
of your executable.

In my view, any risk is too much to solve a relatively minor
problem, memory leaks, especially as GC can only defend
against true memory leaks.

No, it defends against double free() too: since you never
call free(), all the bugs associated with not calling it
or calling it more than once disappear...

If I keep allocating memory,
remember it (so in theory I could use it), but don't use it
GC is not going to help (now if GC could do anything about
memory stomps ...) .

Well, if you grab memory and memory and memory and you forget
to use it, if you do not keep any pointers to it nothing will happen:

for (i=0; i<100; i++)
    a = GC_malloc(100);

only the last block will be protected from the GC, since there is a
pointer to it (a). All others will be reclaimed since there are
no pointers to them.
 
jacob navia

Bob said:
Are you telling Jacob to stay out of your playpen?

I do not know what heathfield has against the GC.

Why have this limited view of C, where any deviation from
the holy scriptures is considered a heresy?

This group is about discussions of the C language, and
memory allocation strategies are very important. Why can't we
discuss it here? Because there is no GC in the ISO-Standard?

Nonsense.

jacob
 
Keith Thompson

jacob navia said:
Abstract
--------
Garbage collection is a method of managing memory by using a "collector"
library. Periodically, or triggered by an allocation request, the
collector looks for unused memory chunks and recycles them.
This memory allocation strategy has been adapted to C (and C++) by the
library written by Hans J Boehm and Alan J Demers.

Why a Garbage Collector?

And standard C is what we discuss in this newsgroup.

[...]
Limitations of the GC.
---------------------
The GC needs to see all pointers in a program. Since it periodically
scans memory, it will assume that any block in its block list
is free to reuse when it can't find any pointers to it. This means
that the programmer can't store pointers on disk, or in the "Windows
extra bytes", as was customary under older Windows versions, or
elsewhere.

This is not really a limitation, since most programs do not write
pointers to disk and expect them to be valid later...
Obviously, there are infinitely many ways to hide pointers from the
collector (by XORing them with some constant, for instance).

This is of no practical significance. Pointers aren't XORed in normal
programs, and if you stay within the normal alignment requirements
of the processor, everything works without any problems.

I appreciate the fact that, for a change, you've acknowledged the
limitations of GC.

I suggest, though, that it's up to each programmer to decide whether
these limitations are of any practical significance. It's not
difficult (I would think) to write new code that avoids doing the odd
things with pointer values that can cause GC to fail. It could be
*very* difficult to verify that an existing program is GC-safe, or to
modify one that isn't.

Performance considerations
--------------------------
On modern workstations, the time needed to make a complete sweep in
mid-size projects is very small, measured in milliseconds. In
programs that are not real time the GC time is completely undetectable.
I have used Boehm's GC in the IDE of lcc-win32, especially in the
debugger. Each string I show in the "automatic" window is allocated
using the GC. On slow machines you can sometimes see a pause of
less than a second, completely undetectable unless you know that it is
there and try to find it.

A pause of "less than a second" isn't necessarily going to be a
problem for an interactive program like an IDE. It could be fatal for
more time-sensitive applications. Again, you acknowledge the problem,
but you seem to assume that since it's not an issue for you, it's not
going to be an issue for anyone.

[...]
Portability
-----------
Boehm's GC runs under most standard PC and UNIX/Linux platforms. The
collector should work on Linux, *BSD, recent Windows versions, MacOS X,
HP/UX, Solaris, Tru64, Irix and a few other operating systems. Some
ports are more polished than others. There are instructions for porting
the collector to a new platform. Kenjiro Taura, Toshio Endo, and
Akinori Yonezawa have made available a parallel collector.

So it works on Unix-like systems and Windows. If those are the only
systems you use, it's portable enough *for you*, but that doesn't make
it appropriate for a newsgroup that doesn't deal with specific
platforms.
 
William Hughes

jacob said:
William said:
jacob navia wrote:

[...]

This is of no practical significance. Pointers aren't XORed in normal
programs, and if you stay within the normal alignment requirements
of the processor, everything works without any problems.


No, but I might well subtract 1 from my pointers to change the
indexing.

This is allowed of course. Your pointer will be within the bounds
of the pointed-to object and that object will NOT be reclaimed since
there is (at least) one pointer to somewhere in it.

I subtracted 1 (yes, this is undefined behaviour,
unless I cast to an integer type first in which case it is
implementation defined behaviour). The resulting pointer may or may
not be within the bounds of the pointed-to object. But what if
I decided to start indexing at 1000?

Who cares?

The GC will see it anyway, because the foreign library is part
of your executable.

But I don't know what the foreign library does with the pointer.
What if the guy who wrote the library likes to start indexing from 2K?
What if the gal who wrote the library didn't use C, but wrote
a self-modifying encrypted executable using assembler?

No, it defends against double free() too, since you never
call free() all the bugs associated with not calling it
or calling it more than once disappear...

The double free is even more of a minor problem than the
memory leak.

Well, if you grab memory and memory and memory and you forget
to use it, if you do not keep any pointers to it nothing will happen:

And if, as was explicitly stated, I do keep pointers to it?

for (i=0; i<100; i++)
    a = GC_malloc(100);

only the last block will be protected from the GC, since there is a
pointer to it (a). All others will be reclaimed since there are
no pointers to them.

I reiterate, in my view the putative advantages of adding GC to C do
not justify even a very small risk.

- William Hughes
 
Roland Pibinger

Conclusions

GC is incompatible with C++ (destructors) and inappropriate for the
system programming language C (you didn't even mention the huge memory
overhead of GC).

The main weakness of the malloc/free system is that it
doesn't scale.

It scales when you use appropriate, well-known idioms like symmetric
*alloc and free calls, or high-level solutions like obstacks.

It is impossible to be good at doing a mind numbing task
without any error 100% of the time. You can be good at it, you can be
bad at it, but you can NEVER be perfect. It is human nature.

You have good tools on most platforms to detect the errors, despite
'human nature'.

The GC frees you from those problems, and allows you to concentrate on
the problems that really matter, and where you can show your strength
as software designer. It frees you from the boring task of keeping track
of each memory block you allocate.

GC handles only one resource, memory. Other resources in a program, e.g.
file handles, database connections, locks, etc., still need to be
handled by the programmer. If you prefer GC in C, go for it. But your
code becomes dependent on a GC and therefore non-portable and
non-reusable (without that GC).

Best regards,
Roland Pibinger
 
jacob navia

Roland said:
GC is incompatible with C++ (destructors) and inappropriate for the
system programming language C (you didn't even mention the huge memory
overhead of GC).

Why should the destructors be touched? They just
do not call free() (delete in C++) and that is it.

The rest of C++ goes on like before.

It scales when you use appropriate, well-known idioms like symmetric
*alloc and free calls, or high-level solutions like obstacks.

Those solutions are difficult at best.
You need to pass pointers around and store them, never forgetting to free
them, etc. Or you impose on yourself a HEAVY discipline that cripples your
ability to store pointers freely somewhere to use them as needed.

Example:

Thread A allocates a buffer to display some message. It passes
that message to the thread B, that handles the user interface,
displays strings, asks for input etc. Thread B is not connected
to thread A and calls are asynchronous, using a message passing
interface.

Thread B, then, must free the buffer. But then you must ensure
that all messages are heap-allocated and that no calls are made like:
PostMessageToThreadB("Please enter the file name");
because when thread B attempts to free that string literal it will crash.

Of course YOU know this and YOU will not make this type of call,
but when you were in Arizona at a customer's site, programmer Z
had to add a message to the code because customer Y needed a fix, and
Z DID NOT KNOW about your conventions...

Those are examples of *real* life, and software construction is
like that, as you know very well.

It is easy to say here:
"Just use some discipline", but it is MUCH HARDER to keep that
in real life.

You have good tools on most platforms to detect the errors, despite
'human nature'.




GC handles only one resource, memory. Other resources in a program eg.
file handles, database connections, locks, etc. still need to be
handled by the programmer.

Well, it will not make your coffee anyway :)

There is a feature of the GC that lets you register a "destructor"
function to be called when an object is about to be destroyed. I haven't
talked about it because I consider it dangerous, since it is not really
a destructor: it will not be called when the variable goes out
of scope, but when the GC discovers that it is no longer used, which
can be MUCH later.

If you prefer GC in C, go for it. But your
code becomes dependent on a GC and therefore non-portable and
non-reusable (without that GC).

Best regards,
Roland Pibinger


Well, if you feel like it, you can always add the calls to free(), but
then you would see that it is quite a BORING THING TO DO...
 
Richard Heathfield

jacob navia said:
I do not know what heathfield has against the GC.

What makes you think I have anything against GC? I have no more objection to
Garbage Collection than I have to tuna, Dusty Springfield, skateboarding,
the tiny little sequins you get on ballgowns, or the Metropolitan District
of South Humberside. None of them happens to be my cup of tea, but I have
no objection to their continued existence. Nor do I have any desire to stop
people talking about them. All I ask is that you do it some place where
they are topical. Automatic garbage collection is no more topical in
comp.lang.c than banana custard or the Falkland Islands.

Why have this limited view of C, where any deviation from
the holy scriptures is considered a heresy?

This is nothing to do with holy scripture, nothing to do with heresy, and
everything to do with topicality.

This group is about discussions of the C language, and
memory allocation strategies are very important.

This group is indeed for discussing the C language, and nuclear defence
strategies are very important. That does not make nuclear defence
strategies topical in comp.lang.c.

Why can't we discuss it here?

You *are* discussing it here. I'm asking you not to, because it's not
topical here. You are free to disagree with me, of course. And I'm free to
think of you as an ignorant bozo with no clue and no brain. Ain't freedom
wonderful?

Because there is no GC in the ISO-Standard?

Nonsense.

Really? Okay, I'll bite - show me where the ISO C Standard defines the
behaviour of GC_malloc.
 
jacob navia

Richard said:
You are free to disagree with me, of course. And I'm free to
think of you as an ignorant bozo with no clue and no brain.

Please do not write that heathfield. Mr Thompson will immediately
say that I am insulting you...
 
Richard Heathfield

jacob navia said:
Please do not write that heathfield.

Firstly, I didn't say I *do* think of you as an ignorant bozo with no clue
and no brain. I said I'm free to think of you as an ignorant bozo with no
clue and no brain. Whether I *do* think of you as an ignorant bozo with no
clue and no brain depends very much on how long it takes you to grasp the
concept of topicality.

Secondly, you have ignored other people's requests to you not to write some
stuff here in comp.lang.c, so why should anyone pay any attention to your
requests not to write some stuff here in comp.lang.c? You're the "anything
goes" guy, not me, so you should approve that people are free to write
anything they like here, including statements such as "I'm free to think of
you as an ignorant bozo with no clue and no brain".

Note that I maintain the important logical distinction between "I'm free to
think of you as an ignorant bozo with no clue and no brain" and "you're an
ignorant bozo with no clue and no brain". I have said the former but not
the latter, and the former is merely a statement about freedom, not a claim
that you are an ignorant bozo with no clue and no brain - for such a claim
would be in poor taste at best, and I have no desire to make a claim in
poor taste (such as, for example, the claim that you are an ignorant bozo
with no clue and no brain). That would not be polite.
 
Al Balmer

Whether I *do* think of you as an ignorant bozo with no
clue and no brain depends very much on how long it takes you to grasp the
concept of topicality.

I can guess, based on the fact that Jacob has been posting off-topic
articles here since at least 2000. IMO, that's quite long enough.
 
Flash Gordon

jacob said:
William said:
jacob navia wrote:

[...]
This is of no practical significance. Pointers aren't XORed in normal
programs, and if you stay within the normal alignment requirements
of the processor, everything works without any problems.

No, but I might well subtract 1 from my pointers to change the
indexing.

This is allowed of course. Your pointer will be within the bounds
of the pointed-to object and that object will NOT be reclaimed since
there is (at least) one pointer to somewhere in it.

You missed William's point entirely.

p = malloc(N * sizeof *p) -1; /* I now have an array indexed by 1 */

Non-standard, but people do it. It means that the pointer
no longer points into the object, so the GC will free it.
Who cares?

The GC will see it anyway, because the foreign library is part
of your executable.

How do you know the library won't do any of the things that break GC?
Such as paging information, including your pointer, out to disk? Or
compressing it? Or subtracting 1 as above to use it as an array indexed
from 1? IIRC this last is possible if it is a library simply
implementing stuff out of Numerical Recipes in C, something which is
quite possible.

No, it defends against double free() too, since you never
call free() all the bugs associated with not calling it
or calling it more than once disappear...

Still does not solve all the other problems.

Well, if you grab memory and memory and memory and you forget
to use it, if you do not keep any pointers to it nothing will happen:

for (i=0; i<100; i++)
    a = GC_malloc(100);

only the last block will be protected from the GC, since there is a
pointer to it (a). All others will be reclaimed since there are
no pointers to them.

You missed the point again. How about if it keeps adding stuff to a tree
and never deletes nodes that are no longer needed? Then there is always
a way to reach the pointers so they never get freed.

How about memory getting tight enough that some of the program's memory
is paged out to a swap file? Then the GC kicks in and forces those pages
to be reloaded forcing other pages to be swapped out and potentially
causing disk thrashing.

Finally, this is not the place for discussing GC as you well know. So I
am unlikely to post further in this thread even to correct any further
errors you make.
 
Default User

jacob navia wrote:


[inflammatory off-topic crap]


Ok, that's finally enough for a plonk.




Brian
 
Kenny McCormack

jacob navia wrote:


[inflammatory off-topic crap]


Ok, that's finally enough for a plonk.

About 93% of the "[inflammatory off-topic crap]" was written by
heathfield, so I assume that means you are plonking it. Good show!
Wise choice!
 
