Could you please give me some advice on GC in C?


B

byang

Hi,
I am now designing a library in C, and the libary dynamically
allocate much memory, and now I use reference couting to deal with
memory alloc/free. I mean, the client of this libary should call unref
() for many pointer of the data structure of the library. And this ref/
unref interface impose more additional task for client programmers.
But I am wondering is there a better way? So, I goole for "gc for C",
and got a mark-sweep collector (http://www.hpl.hp.com/personal/
Hans_Boehm/gc/). Could anybody here please give me some advice on GC
of the library? Thanks a lot!


Regards!
Bo
 
Ad

Advertisements

G

Guest

Hi,
    I am now designing a library in C, and the libary dynamically
allocate much memory, and now  I use reference couting to deal with
memory alloc/free. I mean, the client of this libary should call unref
() for many pointer of the data structure of the library. And this ref/
unref interface impose more additional task for client programmers.
But I am wondering is there a better way? So, I goole for "gc for C",
and got a mark-sweep collector (http://www.hpl.hp.com/personal/
Hans_Boehm/gc/). Could anybody here please give me some advice on GC
of the library? Thanks a lot!

GC isn't part of the standard. But the Boehm collector is the best
known one. lcc-win which is implemented by a regular on clc has a GC.
It may also use the Boehm collector. I dunno. Ask on an lcc-win
specific
ng or look for their website for more details.
 
J

jacob navia

byang said:
Hi,
I am now designing a library in C, and the libary dynamically
allocate much memory, and now I use reference couting to deal with
memory alloc/free. I mean, the client of this libary should call unref
() for many pointer of the data structure of the library. And this ref/
unref interface impose more additional task for client programmers.
But I am wondering is there a better way? So, I goole for "gc for C",
and got a mark-sweep collector (http://www.hpl.hp.com/personal/
Hans_Boehm/gc/). Could anybody here please give me some advice on GC
of the library? Thanks a lot!


Regards!
Bo
The lcc-win C compiler proposes the GC in the standard distribution.
It is running in the 32 bit version and has been ported in the 64 bit
version.

Under the linux OS there is gc library easily avilable. The software
from Boehm runs also in a wide variety of environments.

Actually it is VERY EASY to use. Just replace all malloc calls by
GC_malloc, and drop all calls to free().

You can download the lcc-win compiler at the URL below
 
C

CBFalconer

George said:
.... snip ...

I have found leaks with these methods in a lot of code. It
sometimes helps to step through the malloc and free calls with a
debugger to see what is going on. Some systems also have better
tools, and there are some decent free tools like Valgrind.

Take a look at nmalloc, which is built for use with DJGPP. It
includes a full set of debuggery, which will show you faults,
unfreed areas, etc. If you can adapt it to your system go ahead.
See:

<http://cbfalconer.home.att.net/download/nmalloc.zip>
 
G

Gene

Hi,
    I am now designing a library in C, and the libary dynamically
allocate much memory, and now  I use reference couting to deal with
memory alloc/free. I mean, the client of this libary should call unref
() for many pointer of the data structure of the library. And this ref/
unref interface impose more additional task for client programmers.
But I am wondering is there a better way? So, I goole for "gc for C",
and got a mark-sweep collector (http://www.hpl.hp.com/personal/
Hans_Boehm/gc/). Could anybody here please give me some advice on GC
of the library? Thanks a lot!

Reference counting is probably the simplest way to gc library-
allocated objects. However, it has the worst performance among gc
algorithms. You could consider implementing your own gc within the
library. It's pretty easy to write a simple mark-and-sweep or even a
2-space copying collector on top of malloc and free. The ugly part is
that your library caller must register/unregister gc roots. This may
be no worse than reference counting, however. To see one way of doing
this, look at the code for emacs.

Boehm's collector is a beautiful idea, but, as others have said, it
has significant practical problems. Code like

char *s = malloc(1000);
scanf("%s", s);
int len = 0;
while (*s++) {
malloc(1 << 27); // induce a gc
len++;
}
printf("%d\n", len);

will fail with Boehm because the induced gc will not find any pointer
to the start of the 1000 char string, and the string will be
collected. Other more subtle problems occur in less contrived code
due to aggressive optimizations by compilers.
 
J

jacob navia

Gene said:
Boehm's collector is a beautiful idea, but, as others have said, it
has significant practical problems. Code like

char *s = malloc(1000);
scanf("%s", s);
int len = 0;
while (*s++) {
malloc(1 << 27); // induce a gc
len++;
}
printf("%d\n", len);

will fail with Boehm because the induced gc will not find any pointer
to the start of the 1000 char string, and the string will be
collected.

This is completely wrong.

I compiled following program:
#include <gc.h>
int main(void)
{
char *s = GC_malloc(1000);
scanf("%s", s);
int len = 0;
while (*s++) {
GC_malloc(1 << 27); // induce a gc
len++;
}
printf("%d\n", len);
}

And the following run happened:
d:\lcc\mc76\test>tgc
abc
3

Any pointer to the inside part of a block will avoid it being collected
by the gc.



Other more subtle problems occur in less contrived code
due to aggressive optimizations by compilers.


Yeah sure.

But maybe you can give an example?
 
Ad

Advertisements

J

jacob navia

George said:
From what I recall, Boehm also mentions problems with his GC in some papers.

This is utterly vague. Which papers please?
There are certain structures that can lead to memory being retained. Boehm
has a paper that suggests some changes he would make to C to improve the
accuracy of a GC for C, but these changes require compiler support.

What paper?
Most C environments aren't designed for GC, and a conservative collector can
easily mistake a random integer or part of an integer as the case may be,
as a pointer.

That will be collected next gc if the integer is no longer in the stack
Moreover, this can happen randomly, so you might have a user
complaining about your program or library retaining a 1 GB allocation, and
you might never be able to duplicate that, unless you duplicate the exact
usage patterns of the user. That means pressing a button or key at a
specific time, and so on. It's really not the way to go for reliable
software, unless you have the entire environment setup to help the GC (some
systems do this though).

The gc software has been running with lcc-win since 4-5 years at least.
Your vague allegations doesn't cut it. Any specific example?
If you also work with memory mapped files (with mmap and munmap used to work
with segments of the file), or send pointers over sockets, you would find
those objects can lose all references and are thus freed for reuse, or back
to the system.

Yes, this is described in the documentation.

Why should use the example of the pointer being written to a file
then retrieved,an example that the regulars here use since ages.

GC doesn't eliminate all unbounded growth patterns. You can still have
unbounded growth from errors in array growth algorithms, and abstract data
structures not having their elements pruned. In other words: GC doesn't
fix the flaws in your program, and sometimes it makes them more complicated
to track down.

GC doesn't fix the flaws of your program. True. And doesn't make you the
coffee. You STILL have to walk to the coffee machine (shudder).
 
I

Ian Collins

jacob said:
The gc software has been running with lcc-win since 4-5 years at least.
Your vague allegations doesn't cut it. Any specific example?
Speaking as one who has used a GC with C (to keep a leaky binary only
application running and to check for memory leaks in better ones), I
agree with Jacob. I've never seen any problems.
 
C

CBFalconer

jacob said:
George Peter Staplin wrote:
.... snip ...


Yes, this is described in the documentation.

Why should use the example of the pointer being written to a file
then retrieved,an example that the regulars here use since ages.


GC doesn't fix the flaws of your program. True. And doesn't make
you the coffee. You STILL have to walk to the coffee machine
(shudder).

#include <stdlib.h>

void longtime(void) {
/* this takes 1/2 hour to execute */
return;
}

int main(void) {
char *ptr;

if (!(ptr = malloc(128))) exit(EXIT_FAILURE);
ptr += 128;
longtime();
ptr -= 128;
*ptr = 'A';
free(ptr);
return 0;
}

So, after coding longtime() to agree with the comment, do you claim
this program is flawed? Admitted, I had to work at it.
 
B

Ben Bacarisse

CBFalconer said:
#include <stdlib.h>

void longtime(void) {
/* this takes 1/2 hour to execute */
return;
}

int main(void) {
char *ptr;

if (!(ptr = malloc(128))) exit(EXIT_FAILURE);
ptr += 128;
longtime();
ptr -= 128;
*ptr = 'A';
free(ptr);
return 0;
}

So, after coding longtime() to agree with the comment, do you claim
this program is flawed? Admitted, I had to work at it.

What is this code meant to show? You might want to pick an example
where all the ptr stuff can't be removed by the optimiser (or was that
part of the point?).

I have a feeling it shows you don't know how GC would work in C.
 
C

CBFalconer

Ben said:
What is this code meant to show? You might want to pick an example
where all the ptr stuff can't be removed by the optimiser (or was that
part of the point?).

I have a feeling it shows you don't know how GC would work in C.

Possibly true. I am assuming the GC system scans memory for a
pointer to the memory area. If it doesn't find one, it frees that
memory block. ptr+128 points just past the block, so is
legitimate. It could well point to the next block assigned. It is
not dereferenced, so that does not create an error. So I expect
that during the execution of longtime() the allocated block will be
freed. The access by *ptr (which is valid) will fail. The free
(which is legitimate) will fail.

Am I wrong in this analysis? If so, where.
 
Ad

Advertisements

F

Flash Gordon

George said:
I believe you're right with a conservative GC. You might need to run some
other allocations to cause the GC to collect. As far as the GC is
concerned though that + 128 is beyond the allocation, so the memory
reference stored by char *ptr would no longer refer to the memory from the
malloc call, or even a part of it. It depends on the implementation of the
GC, and any fudge factors they might use.

I would expect a conservative GC to NOT free a block with a pointer to
one past it for the simple reason that it CAN be a valid pointer for
that block. It may indeed cause it t not free another block. That is why
conservative GCs are called "conservative"!
 
J

James Kuyper

CBFalconer said:
Do you have any idea how big an imbecile that makes you appear?

I agree that it's an imbecilic argument that he's referring to; but it's
your argument, not his. All RH is trying to do is help you understand
why that is the case. That makes him look foolish only insofar as that
seems to be a doomed cause.
 
B

Ben Bacarisse

CBFalconer said:
Possibly true. I am assuming the GC system scans memory for a
pointer to the memory area. If it doesn't find one, it frees that
memory block. ptr+128 points just past the block, so is
legitimate. It could well point to the next block assigned. It is
not dereferenced, so that does not create an error. So I expect
that during the execution of longtime() the allocated block will be
freed. The access by *ptr (which is valid) will fail. The free
(which is legitimate) will fail.

Am I wrong in this analysis? If so, where.

I think it is wrong to criticise the technique by assuming an
incorrect implementation. I can't image a real GC getting this
wrong. It must consider the one-past-the-end pointer to be a valid
pointer and must not free the 128 byte block. Do you have any evidence
that any GC makes such a basic error?

The Boehm GC does not even touch malloc'd memory, BTW.
 
B

Ben Bacarisse

Chris Dollin said:
Are you sure? Because, what else is there to GC?

You allocate GC memory using GC_malloc (or, better, the macro
GC_MALLOC) and, of course, one can arrange that this be a replacement
for malloc but I understood the two heaps were usually separate.

From http://www.hpl.hp.com/personal/Hans_Boehm/gc/simple_example.html

Interaction with the system malloc

It is usually best not to mix garbage-collected allocation with the
system malloc-free. If you do, you need to be careful not to store
pointers to the garbage-collected heap in memory allocated with the
system malloc.
 
Ad

Advertisements

C

CBFalconer

James said:
I agree that it's an imbecilic argument that he's referring to; but it's
your argument, not his. All RH is trying to do is help you understand
why that is the case. That makes him look foolish only insofar as that
seems to be a doomed cause.

No, he deliberately snipped the portion that shows his imbecility,
below:
^^^^^

Which pointed out that the code would require modification.
 
C

CBFalconer

Ben said:
I think it is wrong to criticise the technique by assuming an
incorrect implementation. I can't image a real GC getting this
wrong. It must consider the one-past-the-end pointer to be a valid
pointer and must not free the 128 byte block. Do you have any evidence
that any GC makes such a basic error?

The Boehm GC does not even touch malloc'd memory, BTW.

Where am I assuming even an incorrect implementation? All I am
assuming is a method of deciding whether a pointer to anywhere
within an allocated block exists. I could probably replace the
+128 with writing a hex version of the pointer value, and restoring
it later, which avoids all arguments about how clost the possible
pointer must be.
 
C

CBFalconer

Ben said:
You allocate GC memory using GC_malloc (or, better, the macro
GC_MALLOC) and, of course, one can arrange that this be a
replacement for malloc but I understood the two heaps were
usually separate.

You can't do that, because malloc has to be system sensitive. It
has to know things not directly available, such as where to get
memory, alignment requirements, etc.
 
Ad

Advertisements

J

jameskuyper

CBFalconer said:
No, he deliberately snipped the portion that shows his imbecility,
below:

^^^^^

Which pointed out that the code would require modification.

Comments like that were never effective at getting you to recognize
the existence of non-standard things; why should someone parodying
your attitudes in order to demonstrate how ridiculous they are be any
more reasonable about such things than you are?
 

Top