Garbage Collection in C

jacob navia · Oct 16, 2006

jacob navia wrote:

It sounds as though you could have the answer to a question of mine. If
people used to store pointers in the extra bytes, but don't do that any
more, what do they do now?

My situation is that I want to find out, when the user does something
to a window, which object is associated with that window. See my post
"Finding the "owner" of a window" in
comp.os.ms-windows.programmer.win32 for more background.

Of course, my question is off-topic on clc, perhaps you might consider
replying to the above post?

Thanks in advance.
Paul.

All this extra-bytes stuff can be simulated with an associative
list, where you just keep a correspondence between the window handle
and a pointer to the data:

typedef struct tagAssocList {
    struct tagAssocList *Next;
    void *pData;
    void *WindowHandle;
} ASSOCIATIVE_LIST;

And you are all set.
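
A minimal sketch of how such a list might be used, assuming a single list head owned by the caller. The helper names assoc_set and assoc_get are hypothetical and appear nowhere in the original post, and void * stands in for whatever the real window handle type is.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct tagAssocList {
    struct tagAssocList *Next;
    void *pData;
    void *WindowHandle;
} ASSOCIATIVE_LIST;

/* Hypothetical helper: associate pData with WindowHandle.
   Updates an existing entry or prepends a new one.
   Returns 0 on success, -1 if malloc fails. */
static int assoc_set(ASSOCIATIVE_LIST **head, void *WindowHandle, void *pData)
{
    ASSOCIATIVE_LIST *p;

    assert(head != NULL);
    for (p = *head; p != NULL; p = p->Next) {
        if (p->WindowHandle == WindowHandle) {
            p->pData = pData;          /* handle already known: update */
            return 0;
        }
    }
    p = malloc(sizeof *p);
    if (p == NULL)
        return -1;
    p->WindowHandle = WindowHandle;
    p->pData = pData;
    p->Next = *head;                   /* prepend to the list */
    *head = p;
    return 0;
}

/* Hypothetical helper: return the data stored for WindowHandle,
   or NULL if the handle is not in the list. */
static void *assoc_get(const ASSOCIATIVE_LIST *head, void *WindowHandle)
{
    for (; head != NULL; head = head->Next)
        if (head->WindowHandle == WindowHandle)
            return head->pData;
    return NULL;
}
```

A window procedure would call assoc_get with the handle it receives and get back the object registered earlier with assoc_set. Lookup is linear, which is probably fine for a handful of windows; a hash table would be the obvious replacement for many.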

jacob navia · Oct 16, 2006

Richard said:
(e-mail address removed) said:

We store pointers in the extra bytes.

Just because Jacob Navia says something, that don't necessarily make it so.

You are like an 8-year-old...

I see perfectly that you left out the "windows" in your
sentence above...

You lost, Heathfield. Start again.

jacob

Richard Heathfield · Oct 16, 2006

jacob navia said:

You are like an 8-year-old...

How ironic.

I see perfectly that you left out the "windows" in your
sentence above...

So? It was implied.

You lost, Heathfield.

You always say that. One day, it might even be true. But not today.

Richard Bos · Oct 16, 2006

It sounds as though you could have the answer to a question of mine. If
people used to store pointers in the extra bytes, but don't do that any
more, what do they do now?

They store pointers to the pointers. Sometimes they call these "handles"
(which actually are pointers underneath, last time I looked, but of
course this could've changed in the meantime), and sometimes they do
not.
To C, of course, a memcpy() of a pointer works just as well as a
memcpy() of an unsigned long, provided there's memory enough for either.
If you want to add extra limitations to your program by adding a
not-quite-good-enough library, that's the problem of neither ISO C nor
MS Windows.

Richard

Walter Roberson · Oct 16, 2006

jacob navia said:
Note that if the APIs are part of some DLLs the GC will see them anyway.

That statement is vacuously true, since if the number of DLLs that the
GC would pay attention to is precisely zero, that would still be
"some DLLs".

Are you talking about some -particular- DLLs? Are you saying that
the particular GC library you mentioned is able to locate pointers
properly throughout -all- DLLs? Throughout all DLLs specially compiled
with the library??

Your original posting mentioned a number of operating systems, most
of which don't have *any* DLLs. For example, the IRIX operating system
mentioned does not have DLLs (it has DSOs, Dynamic Shared Objects,
but those don't appear to have the same connotations; Windows uses
lots of DLLs, but DSOs are not used particularly often with IRIX,
at least not "Dynamic"-ally.)

Casts to integers do not affect the GC at all.

Amazing. How does the GC deal with the fact that on a number of
systems, pointers are 64 bits, but 'int' is only 32 bits ?

And how does it deal with representation changes that can occur
when casting pointers? About all the C89 standard promises is that

a) any object pointer may be converted to void*

b) void* and char* have the same alignment and representation

c) casting an object pointer to a pointer with a less strict alignment
and back again to the original type is promised to work -- but
representation changes are permitted along the way

d) casting a pointer into an integral type might work (implementation
defined) if the type is wide enough, possibly changing representation
as it goes. Casting an integral type to a pointer might work
(implementation defined). In C89, the result of converting a pointer
to an integral type and back need not compare equal to the original;
in K&R2 and C99 it must compare equal if it is defined at all.

e) function pointers are not object pointers and nothing in the
C89 standard requires that they be convertible to any kind of object
pointer, including no requirement at any level that converting them
to void* will work.

f) function pointers may be converted to a different kind of function
pointer and back again and the result must compare equal to the
original

g) outputting a pointer with a %p format and scanning it back in with %p
format will result in something that compares equal to the original
pointer if the implementation makes %p meaningful at all.

Thus, even just casting an object pointer to char* and storing that
is not certain to result in a bitwise storage identical to the
original object pointer -- and if the garbage collector doesn't know
all the representation conversions that -might- take place and try
them all, it can believe a pointer to be unused when it is still in use.

Keith Thompson · Oct 16, 2006

That statement is vacuously true, since if the number of DLLs that the
GC would pay attention to is precisely zero, that would still be
"some DLLs".

Are you talking about some -particular- DLLs? Are you saying that
the particular GC library you mentioned is able to locate pointers
properly throughout -all- DLLs? Throughout all DLLs specially compiled
with the library??

Your original posting mentioned a number of operating systems, most
of which don't have *any* DLLs. For example, the IRIX operating system
mentioned does not have DLLs (it has DSOs, Dynamic Shared Objects,
but those don't appear to have the same connotations; Windows uses
lots of DLLs, but DSOs are not used particularly often with IRIX,
at least not "Dynamic"-ally.)

I don't know in any detail how DLLs work, so the following is largely
guesswork on my part.

<GUESS>
DLLs provide a way to share executable code between two or more
simultaneously executing processes. But any *data* that's associated
with some process must be stored in the process's own address space,
not in some space associated with the DLL. (Otherwise processes could
step on each other's data, which would be A Bad Thing.)

Amazing. How does the GC deal with the fact that on a number of
systems, pointers are 64 bits, but 'int' is only 32 bits ?

Conversions from 64-bit pointers to 32-bit integers obviously aren't
going to work, regardless of GC. But jacob said "integers"; he didn't
mention the specific type "int". GC shouldn't be affected by
conversions from, say, void* to uintptr_t; the GC code isn't going to
care whether something that looks like an address is stored in an
object of type void*, in an object of type uintptr_t, or (presumably)
in a register used to hold an intermediate result not associated with
any declared object.
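
A minimal sketch of the conversion in question, assuming a C99 implementation that provides the optional type uintptr_t. The function name round_trip is hypothetical. For a valid pointer, C99 requires the result to compare equal to the original; whether the integer has the same bit pattern as the pointer is a separate, implementation-specific matter, and that is the part a conservative collector depends on.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: convert a pointer to an integer and back.
   While the value sits in 'bits' it is just a number to the program;
   a conservative collector would still see an address-like word. */
static void *round_trip(void *p)
{
    uintptr_t bits;

    assert(p != NULL);
    bits = (uintptr_t)p;     /* pointer stored in an integer object */
    return (void *)bits;     /* converted back before use */
}
```

Called as round_trip(&x) for some object x, the result compares equal to &x.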

And how does it deal with representation changes that can occur
when casting pointers? About all the C89 standard promises is that
[snip]

Thus, even just casting an object pointer to char* and storing that
is not certain to result in a bitwise storage identical to the
original object pointer -- and if the garbage collector doesn't know
all the representation conversions that -might- take place and try
them all, it can believe a pointer to be unused when it is still in use.

It's already been established that at least one GC implementation is
not 100% portable; I think jacob said that Boehm GC works only on
Windows and Linux-like systems. My guess is that it works only on
systems where conversions between pointers and integers, or between
pointers and pointers, do not change the representation. (I'm sure it
could be adapted to work on systems where such conversions do cause a
change of representation, but I don't know of any such systems in real
life.)

jacob navia · Oct 16, 2006

Keith said:
That statement is vacuously true, since if the number of DLLs that the
GC would pay attention to is precisely zero, that would still be
"some DLLs".

Are you talking about some -particular- DLLs? Are you saying that
the particular GC library you mentioned is able to locate pointers
properly throughout -all- DLLs? Throughout all DLLs specially compiled
with the library??

Your original posting mentioned a number of operating systems, most
of which don't have *any* DLLs. For example, the IRIX operating system
mentioned does not have DLLs (it has DSOs, Dynamic Shared Objects,
but those don't appear to have the same connotations; Windows uses
lots of DLLs, but DSOs are not used particularly often with IRIX,
at least not "Dynamic"-ally.)

I don't know in any detail how DLLs work, so the following is largely
guesswork on my part.

<GUESS>
DLLs provide a way to share executable code between two or more
simultaneously executing processes. But any *data* that's associated
with some process must be stored in the process's own address space,
not in some space associated with the DLL. (Otherwise processes could
step on each other's data, which would be A Bad Thing.)

Amazing. How does the GC deal with the fact that on a number of
systems, pointers are 64 bits, but 'int' is only 32 bits ?

Conversions from 64-bit pointers to 32-bit integers obviously aren't
going to work, regardless of GC. But jacob said "integers"; he didn't
mention the specific type "int". GC shouldn't be affected by
conversions from, say, void* to uintptr_t; the GC code isn't going to
care whether something that looks like an address is stored in an
object of type void*, in an object of type uintptr_t, or (presumably)
in a register used to hold an intermediate result not associated with
any declared object.

And how does it deal with representation changes that can occur
when casting pointers? About all the C89 standard promises is that

[snip]

Thus, even just casting an object pointer to char* and storing that
is not certain to result in a bitwise storage identical to the
original object pointer -- and if the garbage collector doesn't know
all the representation conversions that -might- take place and try
them all, it can believe a pointer to be unused when it is still in use.

It's already been established that at least one GC implementation is
not 100% portable; I think jacob said that Boehm GC works only on
Windows and Linux-like systems. My guess is that it works only on
systems where conversions between pointers and integers, or between
pointers and pointers, do not change the representation. (I'm sure it
could be adapted to work on systems where such conversions do cause a
change of representation, but I don't know of any such systems in real
life.)

I agree with this.
1) Obviously, if you truncate a pointer (through a cast or otherwise),
its value is gone, and the GC can't do anything about it...
2) When a cast changes the representation, the same as (1) applies.
3) Boehm's GC is portable across Unix and Windows. This is a big
portion of the total "market", but it is not everything. Anyway,
nobody is saying that it is portable to all C systems.

jacob

Keith Thompson · Oct 16, 2006

jacob navia said:
I agree with this.
1) Obviously, if you truncate a pointer (through a cast or otherwise),
its value is gone, and the GC can't do anything about it...
2) When a cast changes the representation, the same as (1) applies.
3) Boehm's GC is portable across Unix and Windows. This is a big
portion of the total "market", but it is not everything. Anyway,
nobody is saying that it is portable to all C systems.

Which is why it's largely off-topic in comp.lang.c.

Much of this discussion, which has been of the form:

Q: Can GC be implemented in 100% portable C?
A: No, and here's why.

has been topical. Advocating GC as a universal solution is not.

Bart · Oct 16, 2006

jacob said:
Note that if the APIs are part of some DLLs the GC will see them anyway.
Casts to integers do not affect the GC at all.

But I don't have any guarantees about what those DLLs do with my
pointers. I'd rather deal with memory leaks than with GC-related bugs
in some DLL I didn't write and for which I don't have any source code.

Besides there's more to resource management than malloc/free. Open
files, handles, connections, matched calls to some APIs. Those all have
the same inherent problems that memory allocation has. So why a GC
specifically to deal with just one problem? I suspect that, for C
programmers who would want something like this, just writing a few C++
wrappers that do some magic in constructors/destructors would probably
be a lot less painful than your GC.

Regards,
Bart.

Ben Pfaff · Oct 16, 2006

Bart said:
Besides there's more to resource management than malloc/free. Open
files, handles, connections, matched calls to some APIs. Those all have
the same inherent problems that memory allocation has.

These can usually be handled nicely through a "pool allocator"
interface.

So why a GC
specifically to deal with just one problem? I suspect that, for C
programmers who would want something like this, just writing a few C++
wrappers that do some magic in constructors/destructors would probably
be a lot less painful than your GC.

I think that moving C programs to C++ is likely to be more
painful than, well, almost any other solution.

SM Ryan · Oct 17, 2006

Walter Roberson wrote:
#
# >Note that if the APIs are part of some DLLs the GC will see them anyway.
#
# That statement is vacuously true, since if the number of DLLs that the
# GC would pay attention to is precisely zero, that would still be
# "some DLLs".

The program must have live pointers in its address space, or
somehow attached to it. For example, writing the pointer to disk
and erasing the in-memory copy will lose the block. If Windows
hides pointers outside the address space, you lose.

# >Casts to integers do not affect the GC at all.
#
# Amazing. How does the GC deal with the fact that on a number of
# systems, pointers are 64 bits, but 'int' is only 32 bits ?

It looks for byte strings subject to certain conditions that
look like heap addresses. If you change a pointer so that it
is no longer recognisable, you lose.
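
A toy sketch of that idea, not Boehm-Demers code: scan a run of words and count those whose value falls inside an assumed heap range. The function name count_candidates and the single contiguous range [heap_lo, heap_hi) are assumptions made for illustration; a real collector has more conditions than a range check.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Count the words in words[0..n-1] whose value lies in the half-open
   range [heap_lo, heap_hi). A conservative collector would treat each
   such word as a possible pointer and keep the block it refers to. */
static size_t count_candidates(const uintptr_t *words, size_t n,
                               uintptr_t heap_lo, uintptr_t heap_hi)
{
    size_t i, hits = 0;

    assert(heap_lo <= heap_hi);
    for (i = 0; i < n; i++)
        if (words[i] >= heap_lo && words[i] < heap_hi)
            hits++;
    return hits;
}
```

With a heap at [0x1000, 0x2000), a word holding 0x1008 counts as a candidate, while the same value stored bit-inverted does not. That is the "no longer recognisable" case.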

# And how does it deal with representation changes that can occur
# when casting pointers? About all the C89 standard promises is that

If the string of bytes making up the address value remains
recognisable, it can mark and sweep. If not, you lose.

Boehm-Demers segregates blocks into pages by block size. From a
pointer you get the page address; from the page, the block size;
then the address of the block's first byte. Interior block pointers
do not cause problems. Converting the addresses to something
unrecognisable causes problems.
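
The pointer-to-page-to-block lookup can be sketched as follows. This is an illustration under stated assumptions, not the collector's actual data structures: the 4096-byte page size, the struct page_info descriptor and the function names are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u   /* assumed page size, a power of two */

/* Hypothetical per-page descriptor: all blocks on a page share one size. */
struct page_info {
    uintptr_t page_start;   /* address of the first byte of the page */
    size_t    block_size;   /* size of every block on this page */
};

/* Page address from any address inside the page. */
static uintptr_t page_of(uintptr_t addr)
{
    return addr & ~(uintptr_t)(SKETCH_PAGE_SIZE - 1);
}

/* Map an address, possibly pointing into the middle of a block,
   to the address of the first byte of that block. */
static uintptr_t block_base(const struct page_info *pg, uintptr_t addr)
{
    uintptr_t offset;

    assert(pg != NULL && pg->block_size > 0);
    assert(addr >= pg->page_start);
    offset = addr - pg->page_start;
    assert(offset < SKETCH_PAGE_SIZE);
    return pg->page_start + (offset / pg->block_size) * pg->block_size;
}
```

On a page starting at 0x10000 with 48-byte blocks, an interior address 0x10000 + 100 maps to the block starting at 0x10000 + 96, which is why interior pointers are harmless in this scheme.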

# e) function pointers are not object pointers and nothing in the
# C89 standard requires that they be convertible to any kind of object
# pointer, including no requirement at any level that converting them
# to void* will work.

Code, own variables, heap, and stack are different parts of the
address space. Most operating systems give you some kind of map
from page addresses to memory area.

# All is vanity. -- Ecclesiastes

We are all bytes in the wind.

Boehm-Demers garbage collection doesn't work for all programs on
all systems. On the systems it has been ported to, it does work (i.e.
it has been verified with the system libraries and compilers) if you
write your programs in the restricted language.

Personally I'm not interested in exotic and/or program-me-if-you-dare
systems. And I do not find the rules difficult to follow.

SM Ryan · Oct 17, 2006

# Besides there's more to resource management than malloc/free. Open
# files, handles, connections, matched calls to some APIs. Those all have
# the same inherent problems that memory allocation has. So why a GC
# specifically to deal with just one problem? I suspect that, for C

Some collectors support an operation called finalisation. You can
attach a procedure to a block which is called when the block becomes
garbage. For example, you can attach a file closer to a file object.
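
A toy model of finalisation, assuming the collector keeps one bookkeeping record per block. Nothing here is a real collector's API: struct tracked_block, sweep and close_file_object are hypothetical names, and the "reachable" flag stands in for the result of a mark phase.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*finalizer_fn)(void *block);

/* Hypothetical record a collector might keep for each managed block. */
struct tracked_block {
    void        *block;      /* the managed memory, NULL once reclaimed */
    int          reachable;  /* nonzero if the mark phase found it */
    finalizer_fn finalizer;  /* called once when the block is garbage; may be NULL */
};

/* Toy sweep: for every unreachable block, run its finalizer (if any)
   and mark the record as reclaimed. Returns the number reclaimed. */
static size_t sweep(struct tracked_block *blocks, size_t n)
{
    size_t i, reclaimed = 0;

    assert(blocks != NULL || n == 0);
    for (i = 0; i < n; i++) {
        if (!blocks[i].reachable && blocks[i].block != NULL) {
            if (blocks[i].finalizer != NULL)
                blocks[i].finalizer(blocks[i].block);
            blocks[i].block = NULL;    /* never finalize twice */
            reclaimed++;
        }
    }
    return reclaimed;
}

/* Example finalizer: "close" a hypothetical file object. */
struct file_object { int is_open; };

static void close_file_object(void *block)
{
    ((struct file_object *)block)->is_open = 0;
}
```

The point of the sketch is the timing: the file object is closed whenever sweep happens to run after the block became unreachable, not at a point the program chooses.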

# programmers who would want something like this, just writing a few C++
# wrappers that do some magic in constructors/destructors would probably
# be a lot less painful than your GC.

Shows how twisted things have become. OO became big with Smalltalk,
which includes garbage collection. C++ didn't have garbage collection
and it was soon discovered that OO programs allocated a lot of
heap memory and freeing it was a major pain. So destructors were
invented to move all this freeing into the object class instead of
the code using the object. This created a new problem in ensuring
the destructor was called at the appropriate time; this gets hard
with throw chains, gotos, long jumps, etc. All this complication was
added because garbage collection was unavailable.

Garbage collection + finalisation allows you everything destructors
do with fewer restrictions in the language and less work by
programmers.

jacob navia · Oct 17, 2006

SM said:
# Besides there's more to resource management than malloc/free. Open
# files, handles, connections, matched calls to some APIs. Those all have
# the same inherent problems that memory allocation has. So why a GC
# specifically to deal with just one problem? I suspect that, for C

Some collectors support an operation called finalisation. You can
attach a procedure to a block which is called when the block becomes
garbage. For example, you can attach a file closer to a file object.

# programmers who would want something like this, just writing a few C++
# wrappers that do some magic in constructors/destructors would probably
# be a lot less painful than your GC.

Shows how twisted things have become. OO became big with Smalltalk,
which includes garbage collection. C++ didn't have garbage collection
and it was soon discovered that OO programs allocated a lot of
heap memory and freeing it was a major pain. So destructors were
invented to move all this freeing into the object class instead of
the code using the object. This created a new problem in ensuring
the destructor was called at the appropriate time; this gets hard
with throw chains, gotos, long jumps, etc. All this complication was
added because garbage collection was unavailable.

Garbage collection + finalisation allows you everything destructors
do with fewer restrictions in the language and less work by
programmers.

EXACTLY.

It is amazing how deep the implications of this go. For instance, it
has been argued that you need constructors/destructors to support
operator overloading in C, since operator overloading needs to destroy
intermediate objects in expressions:
c = (a+b)/(a-b);
when '+' is overloaded, it must create a temporary object. This is
possible using the GC, since you are sure the GC will find the
unneeded object.

An enormous system of tables (DWARF3 tables) has been developed to
handle the throw/catch problem of C++. When you make a throw, the
destructors must be called to free the memory. All this is again
unnecessary using the GC. You can use longjmp/throw/catch without any
fear of causing a memory leak.

(Note that DWARF3 tables can make up 10% or more of the code size in
C++ applications.)

Passing buffers from one thread to another is much easier with the GC.
You do not need to care when your buffer is going to be reclaimed.
There is no need to synchronize threads and wait for them to use the
buffer. You can send the allocated buffer and forget about it, the GC
will reclaim it anyway.

ETC ETC ETC.

The GC simplifies complex programs in a BIG way. It makes many features
of C++ unnecessary. C is a different thing with the GC.

jacob

Richard · Oct 17, 2006

jacob navia said:
EXACTLY.

It is amazing how deep the implications of this go. For instance, it
has been argued that you need constructors/destructors to support
operator overloading in C, since operator overloading needs to destroy
intermediate objects in expressions:
c = (a+b)/(a-b);
when '+' is overloaded, it must create a temporary object. This is
possible using the GC, since you are sure the GC will find the
unneeded object.

Garbage Collection is the invention of the Devil.

It encourages lax design & programming styles.

I hated it in java and I would hate it in C/C++.

We made do with centralised pool allocation/deallocation libraries and
careful matching of alloc/dealloc for years.

Being careful whenever malloc was called also meant there was some
consideration for when to do it - we tended to be tighter with our
memory use.

It suddenly seems very popular to assume anything using malloc is as
leaky as a sieve - it is simply not so.

C is C. Let's keep it that way. A real man's language .....

jacob navia · Oct 17, 2006

Richard said:
Garbage Collection is the invention of the Devil.

It encourages lax design & programming styles.

I hated it in java and I would hate it in C/C++.

We made do with centralised pool allocation/deallocation libraries and
careful matching of alloc/dealloc for years.

Maybe you could enlighten us?

It is the second time you have mentioned these "pool libraries" for
resource management.

What do you mean exactly?

What does the algorithm look like?

Thanks

jacob

Richard · Oct 17, 2006

jacob navia said:
Maybe you could enlighten us?

It is the second time you have mentioned these "pool libraries" for
resource management.

I did? First time I thought, anyway. However you want them to look.

There are laws: if you call them to allocate, you call them to
deallocate. The difference is that during program development you can
log/track accesses and discover the dangling pointers relatively
quickly: one of the main reasons for any level of abstraction. If your
system maintains a pool of runtime objects, allocate them from a
centralised, controlled pool.
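
One possible shape for such a library, as a minimal sketch: a wrapper around malloc/free that counts outstanding blocks so that mismatched alloc/dealloc pairs show up during development. The names struct pool, pool_alloc and pool_free are hypothetical; a real library would likely also record the call site and keep a list of live blocks.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical tracked allocator state. */
struct pool {
    size_t live;    /* blocks allocated and not yet released */
    size_t total;   /* blocks ever allocated */
};

/* Allocate through the pool so the block is counted. */
static void *pool_alloc(struct pool *p, size_t size)
{
    void *block;

    assert(p != NULL);
    block = malloc(size);
    if (block != NULL) {
        p->live++;
        p->total++;
    }
    return block;
}

/* Release through the pool; the assert fires if more blocks are
   released than were ever handed out. */
static void pool_free(struct pool *p, void *block)
{
    assert(p != NULL);
    if (block == NULL)
        return;
    assert(p->live > 0);
    p->live--;
    free(block);
}
```

At shutdown, a nonzero live count points at a leak, and the count can be logged per subsystem by giving each its own struct pool.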

Don't get me wrong: it's not "very" easy. But then neither is good C programming
or good "any" programming, for that matter.

Nothing magic.

Paul Connolly · Oct 20, 2006

Garbage collection + finalisation allows you everything destructors
do with fewer restrictions in the language and less work by
programmers.

everything? so this garbage collection doesn't use any more memory than hand
written malloc/free code, and the application doesn't run slower than hand
written malloc/free code, and there are no relatively long (compared to
free()) pauses while garbage collection goes on?
wow that's phantastic and incredible.

SM Ryan · Oct 20, 2006

# > Garbage collection + finalisation allows you everything destructors
# > do with fewer restrictions in the language and less work by
# > programmers.
#
# everything? so this garbage collection doesn't use any more memory than hand

Then don't use it.

This involves a technical point of language design. Destructors
were added to C++ because it had no garbage collection. The
result is a far more complicated language implementation, which
can still have implicit storage management activated.

jacob navia · Oct 20, 2006

Paul said:
everything? so this garbage collection doesn't use any more memory than hand
written malloc/free code,

Not really. Garbage is collected automatically. You can fine-tune this
by specifying the threshold, or by calling the GC when you think it is
a good moment to do so.

and the application doesn't run slower than hand
written malloc/free code,

No, it doesn't run slower. Allocation is a bit slower, since each
allocation does a bit of GC work (incremental GC).

and there are no relatively long (compared to
free()) pauses while garbage collection goes on?

There are pauses, but in modern workstations this is barely noticeable.

wow that's phantastic and incredible.

Write in assembly. That's a *real* language

Jean-Marc Bourguet · Oct 20, 2006

SM Ryan said:
# > Garbage collection + finalisation allows you everything destructors
# > do with fewer restrictions in the language and less work by
# > programmers.
#
# everything? so this garbage collection doesn't use any more memory than hand

Then don't use it.

This involves a technical point of language design. Destructors
were added to C++ because it had no garbage collection. The
result is a far more complicated language implementation, which
can still have implicit storage management activated.

Destructors do more than release memory. And finalisation is not a
good substitute for these tasks, as finalisation is not synchronous.

Yours,
