garbage collection questions

V

vk02720

Hi,

I am trying to implement garbage collection for C/C++ mainly for
learning purpose. I dont know where to start.

For example -
Basic questions - How do I identify the variables that contain pointers
? Where do I look for these variables and how to tell if what is
assigned to them is a pointer ?

Do I need to implement my own API for malloc/free ( or new/delete ) or
can I create a library that can be linked with the given source code
that uses the standard malloc ? Are there any tradeoffs for either
approach ?

Appreciate any pointers to any info on the web or text books etc.

TIA
 
C

Calum Grant

vk02720 said:
Hi,

I am trying to implement garbage collection for C/C++ mainly for
learning purpose. I dont know where to start.

For example -
Basic questions - How do I identify the variables that contain pointers
? Where do I look for these variables and how to tell if what is
assigned to them is a pointer ?

Do I need to implement my own API for malloc/free ( or new/delete ) or
can I create a library that can be linked with the given source code
that uses the standard malloc ? Are there any tradeoffs for either
approach ?

Appreciate any pointers to any info on the web or text books etc.

TIA

You definitely need to be aware of this:

http://www.hpl.hp.com/personal/Hans_Boehm/gc/
 
L

Luke Meyers

vk02720 said:
I am trying to implement garbage collection for C/C++ mainly for
learning purpose. I dont know where to start.

Not to discourage you, but that's a plenty ambitious project. I humbly
suggest that if you have no idea where to start, your probably biting
off significantly more than you have a reasonable chance of being able
to chew.
For example -
Basic questions - How do I identify the variables that contain pointers
? Where do I look for these variables and how to tell if what is
assigned to them is a pointer ?

Do I need to implement my own API for malloc/free ( or new/delete ) or
can I create a library that can be linked with the given source code
that uses the standard malloc ? Are there any tradeoffs for either
approach ?

Garbage collection cannot simply be glued on to the standard memory
model. There's nothing about the contents of the memory which tells
you who's referencing it. A tool such as a profiler can do some
analysis of this sort, but with an overhead that would be unacceptable
outside of performance testing.

So, the place to start is to understand how the standard memory model
works, in excruciating detail, because what you're talking about is
stripping the whole thing out and replacing it. Still sound like a fun
project? (Okay, I'll confess, it sounds very fun, but I still think
you sound like you don't know what you're in for).
Appreciate any pointers to any info on the web or text books etc.

Andrei Alexandrescu's "Modern C++ Design" has a chapter on writing a
custom allocator. I saw him give a talk recently, though, and he now
disfavors the approach therein in favor of a policy-based design. It's
still instructive, of course. However, the first place I'd start is by
reading all of the relevant material in TC++PL (3rd or Special Ed.) by
Stroustrup.

Luke
 
B

benben

vk02720 said:
Hi,

I am trying to implement garbage collection for C/C++ mainly for
learning purpose. I dont know where to start.

For example -
Basic questions - How do I identify the variables that contain pointers
? Where do I look for these variables and how to tell if what is
assigned to them is a pointer ?

What you can do is, provide functions so that the class implementer can
register the pointers to members. Provide also a virtual base class to
register the object itself.
Do I need to implement my own API for malloc/free ( or new/delete ) or
can I create a library that can be linked with the given source code
that uses the standard malloc ? Are there any tradeoffs for either
approach ?

You can but I doubt you have to...if you are doing heap compacting then yes.
Appreciate any pointers to any info on the web or text books etc.

TIA

Ben
 
R

Rod Pemberton

Calum Grant said:
You definitely need to be aware of this:

http://www.hpl.hp.com/personal/Hans_Boehm/gc/

Thanks for the link. That's the first time I've seen garbage collection for
a compiled language. I've only seen it for interpreted languages...

Rod Pemberton

PS. What's the deal with all the _assholes_ who complain about
_top_posting_? I've never seen so many and so frequent complaints.
Although I usually bottom post, I sometimes top post. Are these guys
Socialist? Communist? I'm Capitalist. I _PAY_ for a good computer. I _PAY_
for an OS. I _PAY_ for good Internet access. I _PAY_ for good usenet
access. I _PAY_ for a good newsreader. Drop the
Socialist/Communist/Asynchronous Dogma and _PAY_ for something...you
Commies.
 
P

Pascal Bourguignon

vk02720 said:
Hi,

I am trying to implement garbage collection for C/C++ mainly for
learning purpose. I dont know where to start.

For example -
Basic questions - How do I identify the variables that contain pointers
? Where do I look for these variables and how to tell if what is
assigned to them is a pointer ?

You have basically two ways:

- either you know them, that is, you are the compiler and you know
where you store the pointers, then you can output this information
in the object files. This is what is done with higher level
languages.

- or you don't know them. Then you cannot know them because stuff like:
{ char storage[100]; ((void*)storage)=malloc(4); } is perfectly valid in C.
So the only think you can do, is to be "conservative" (see Boehm GC),
that is, you scan all the allocated memory, and if you see a bit
pattern that looks like a pointer, then you assume it's a pointer.

What looks like a pointer? Any bitfield that points to a zone where
there is some memory allocated.


To find the variables you must identify the root set: all the
potential pointer in the stack and the global memory.

Do I need to implement my own API for malloc/free ( or new/delete ) or
can I create a library that can be linked with the given source code
that uses the standard malloc ? Are there any tradeoffs for either
approach ?

Well, at a minimum, you'd have to replace free by a version that does
nothing, and malloc by a version that calls the garbage collector from
time to time.
 
S

Shark

vk02720 said:
For example -
Basic questions - How do I identify the variables that contain pointers
? Where do I look for these variables and how to tell if what is
assigned to them is a pointer ?

In theory this is very hard because the internals of the implementation
are hidden. The language people may say: new and malloc allocate on the
heap, but the heap can have different meanings, depending on what data
structures your compiler uses to accomplish the functionality.

If you know what your implementation does for memory
allocation/deallocation (e.g. if you know everything about gcc
compiler) you can make such an attempt. If you just wanna change the
world, try writing a garbage collector for
http://fabrice.bellard.free.fr/otcc/ or
http://fabrice.bellard.free.fr/tcc/ to start with and it will hopefully
give you pointer to work with gcc.
 
C

CBFalconer

vk02720 said:
I am trying to implement garbage collection for C/C++ mainly for
learning purpose. I dont know where to start.

You have already made some fatal mistakes. There is no language
C/C++. There is C, and there is C++. Make up your mind. Do not
cross post between the two languages, because they are different
and have different rules and standards.

Never post with cross-posting without setting followups to the
single appropriate newsgroup.

Always post your efforts.

F'ups set.

--
"If you want to post a followup via groups.google.com, don't use
the broken "Reply" link at the bottom of the article. Click on
"show options" at the top of the article, then click on the
"Reply" at the bottom of the article headers." - Keith Thompson
More details at: <http://cfaj.freeshell.org/google/>
 
A

August Karlstrom

Rod said:
Thanks for the link. That's the first time I've seen garbage collection for
a compiled language. I've only seen it for interpreted languages...

Ever heard of Java?
PS. What's the deal with all the _assholes_ who complain about
_top_posting_? I've never seen so many and so frequent complaints.
Although I usually bottom post, I sometimes top post. Are these guys
Socialist? Communist? I'm Capitalist. I _PAY_ for a good computer. I _PAY_
for an OS. I _PAY_ for good Internet access. I _PAY_ for good usenet
access. I _PAY_ for a good newsreader. Drop the
Socialist/Communist/Asynchronous Dogma and _PAY_ for something...you
Commies.
[end of cliché opinion of stupid American]

Yeah, and let's just ignore the developing countries that don't use the
latest technology etc. So how much are we getting payed for our replies?


August
 
G

Giorgos Keramidas

vk02720 said:
I am trying to implement garbage collection for C/C++ mainly for
learning purpose. I dont know where to start.

You have basically two ways:

- either you know them, that is, you are the compiler and you know
where you store the pointers, then you can output this information
in the object files. This is what is done with higher level
languages.

- or you don't know them. Then you cannot know them because stuff like:
{ char storage[100]; ((void*)storage)=malloc(4); } is perfectly valid in C.

Not really, because storage is not an `lvalue', but garbage collection
in cases like this (when "you don't know all the details about all the
variables") is pretty hard. For instance, it cannot help in bugs like
the following:

void function (void)
{
char storage[4];
memset(storage, 0, 100);
}

But this is not really garbage collection related, I guess.
 
P

Pascal Bourguignon

Giorgos Keramidas said:
vk02720 said:
I am trying to implement garbage collection for C/C++ mainly for
learning purpose. I dont know where to start.

You have basically two ways:

- either you know them, that is, you are the compiler and you know
where you store the pointers, then you can output this information
in the object files. This is what is done with higher level
languages.

- or you don't know them. Then you cannot know them because stuff like:
{ char storage[100]; ((void*)storage)=malloc(4); } is perfectly valid in C.

Not really, because storage is not an `lvalue',

Indeed. I remember having done this kind of thing, but perhaps it has
always been with pointers instead of automatically allocated memory.
Anyways, you can just add a pair of stars to trick the compiler:

% cat m.c ; gcc -S -c -o m.s m.c ; cat m.s
#include <stdlib.h>
int main(void){ char storage[100]; (*(void**)storage)=malloc(4); return(0);}

.file "m.c"
.text
..globl main
.type main, @function
main:
pushl %ebp
movl %esp, %ebp
subl $120, %esp ; allocates char storage[100]
andl $-16, %esp
movl $0, %eax
subl %eax, %esp
subl $12, %esp
pushl $4
call malloc
addl $16, %esp
movl %eax, -120(%ebp) ; store the result of malloc to the
; first four bytes of storage.
movl $0, %eax
leave
ret
.size main, .-main
.ident "GCC: (GNU) 3.3 20030226 (prerelease) (SuSE Linux)"


--
__Pascal Bourguignon__ http://www.informatimago.com/

PLEASE NOTE: Some quantum physics theories suggest that when the
consumer is not directly observing this product, it may cease to
exist or will exist only in a vague and undetermined state.
 
R

Rod Pemberton

August Karlstrom said:
Rod said:
PS. What's the deal with all the _assholes_ who complain about
_top_posting_? I've never seen so many and so frequent complaints.
Although I usually bottom post, I sometimes top post. Are these guys
Socialist? Communist? I'm Capitalist. I _PAY_ for a good computer. I _PAY_
for an OS. I _PAY_ for good Internet access. I _PAY_ for good usenet
access. I _PAY_ for a good newsreader. Drop the
Socialist/Communist/Asynchronous Dogma and _PAY_ for something...you
Commies.
[end of cliché opinion of stupid American]

Yeah, and let's just ignore the developing countries that don't use the
latest technology etc. So how much are we getting payed for our replies?

The offenders in question are American. But since you haven't been paying
attention, you wouldn't know that would you stupid foreigner?. I'm just
tired of reading three (Americans) repeatedly bitch about how other people
post. They need to STFU, and let people reply without being scolded.

Rod Pemberton
 
G

Giorgos Keramidas

Giorgos Keramidas said:
I am trying to implement garbage collection for C/C++ mainly for
learning purpose. I dont know where to start.

You have basically two ways:

- either you know them, that is, you are the compiler and you know
where you store the pointers, then you can output this information
in the object files. This is what is done with higher level
languages.

- or you don't know them. Then you cannot know them because stuff like:
{ char storage[100]; ((void*)storage)=malloc(4); } is perfectly valid in C.

Not really, because storage is not an `lvalue',

Indeed. I remember having done this kind of thing, but perhaps it has
always been with pointers instead of automatically allocated memory.
Anyways, you can just add a pair of stars to trick the compiler:

% cat m.c ; gcc -S -c -o m.s m.c ; cat m.s
#include <stdlib.h>
int main(void){ char storage[100]; (*(void**)storage)=malloc(4); return(0);}

Effectively, this ends up allocating at compile time a storage area of
100 (char) values, and then using only sizeof(void *) bytes of that
storage. Using the allocated memory would require again the same
trickery, to obtain the pointer value.

There's probably no way to 'free' the rest of the original `storage'
bytes, unless one knows already too much about the underlying platform
and the way automatic variables are organized on the stack :-(
 
P

Pascal Bourguignon

Giorgos Keramidas said:
Giorgos Keramidas said:
On Mon, 23 Jan 2006 01:25:14 +0100,
I am trying to implement garbage collection for C/C++ mainly for
learning purpose. I dont know where to start.

You have basically two ways:

- either you know them, that is, you are the compiler and you know
where you store the pointers, then you can output this information
in the object files. This is what is done with higher level
languages.

- or you don't know them. Then you cannot know them because stuff like:
{ char storage[100]; ((void*)storage)=malloc(4); } is perfectly valid in C.

Not really, because storage is not an `lvalue',

Indeed. I remember having done this kind of thing, but perhaps it has
always been with pointers instead of automatically allocated memory.
Anyways, you can just add a pair of stars to trick the compiler:

% cat m.c ; gcc -S -c -o m.s m.c ; cat m.s
#include <stdlib.h>
int main(void){ char storage[100]; (*(void**)storage)=malloc(4); return(0);}

Effectively, this ends up allocating at compile time a storage area of
100 (char) values, and then using only sizeof(void *) bytes of that
storage. Using the allocated memory would require again the same
trickery, to obtain the pointer value.

There's probably no way to 'free' the rest of the original `storage'
bytes, unless one knows already too much about the underlying platform
and the way automatic variables are organized on the stack :-(

My point was that here we have a variable that is declared to hold
only characters, but it actually holds a pointer to some memory block
allocated on the heap. The storage memory is on the stack, as an
automatic variable, so it belongs to the root set (as long as its
function is active, which in the case of main is always). But the
memory allocated on the heap could be subject to garbage collecting.

Now there is no conceptual difference between these two programs (and
possibly no difference at all):

int main(void){ char storage[100];
(*(void**)storage)=malloc(4);
/* here */
return(0);}


int main(void){ char storage[100];
malloc(4); /* could be garbage collected right now */
storage[0]=0x04;
storage[1]=0x12;
storage[2]=0x04;
storage[3]=0x10;
/* here */
return(0);}

At the points marked here, in both case, storage may contain the
pointers to the heap, and the correspond heap memory blocks cannot be
garbage collected.

Since the garbage collector cannot distinguish these cases, it must be
"conservative", and mark the memory blocks corresponding to the
"pointers" any storage contains.
 
V

vk02720

Pascal said:
You have basically two ways:

- either you know them, that is, you are the compiler and you know
where you store the pointers, then you can output this information
in the object files. This is what is done with higher level
languages.

That is exactly what I am trying to find out. Where does the compiler (
or kernel after it loads the program in memory ) store these variables.
How do I get a list of all variables in my program ?
Assume this simple program on Linux. To start, how do I get hold of the
data and stack regions and look into these regions and print the values
assigned to these variables ?

#include <stdio.h>
#include <stdlib.h>

void *ptrglobal;

main()
{
void *ptrlocal = malloc(10);
ptrglobal = malloc(20);
// dump_my_program_vars();
ptrlocal = NULL;
// collect_garbage();
}

- or you don't know them. Then you cannot know them because stuff like:
{ char storage[100]; ((void*)storage)=malloc(4); } is perfectly valid in C.
So the only think you can do, is to be "conservative" (see Boehm GC),
that is, you scan all the allocated memory, and if you see a bit
pattern that looks like a pointer, then you assume it's a pointer.

What looks like a pointer? Any bitfield that points to a zone where
there is some memory allocated.


To find the variables you must identify the root set: all the
potential pointer in the stack and the global memory.

How do I access stack and global memory from within the program ?
 
P

Pascal Bourguignon

vk02720 said:
That is exactly what I am trying to find out. Where does the compiler (
or kernel after it loads the program in memory ) store these variables.
How do I get a list of all variables in my program ?

It's useless to seek this metadata in C!
See my example with { char storage[100];... }


If you want to take the knowledge about the variables from the source
code, you must change the language. Leave C and come to a higher level
programming language. Modula-3, Eiffel, Haskell, Lisp, anything but C
or C++.

- or you don't know them. Then you cannot know them because stuff like:
{ char storage[100]; ((void*)storage)=malloc(4); } is perfectly valid in C.
So the only think you can do, is to be "conservative" (see Boehm GC),
that is, you scan all the allocated memory, and if you see a bit
pattern that looks like a pointer, then you assume it's a pointer.

What looks like a pointer? Any bitfield that points to a zone where
there is some memory allocated.


To find the variables you must identify the root set: all the
potential pointer in the stack and the global memory.

How do I access stack and global memory from within the program ?

I don't know. But if you looked at the sources of BoehmGC which has
already been mentionned, you'd find out...


--
__Pascal Bourguignon__ http://www.informatimago.com/

CONSUMER NOTICE: Because of the "uncertainty principle," it is
impossible for the consumer to simultaneously know both the precise
location and velocity of this product.
 
G

Gordon Burditt

- either you know them, that is, you are the compiler and you know
That is exactly what I am trying to find out. Where does the compiler (
or kernel after it loads the program in memory ) store these variables.
How do I get a list of all variables in my program ?

It is quite probable that you can't, unless the compiler cooperates.
And even then, it won't know about how memory allocated by malloc()
is being used. (In particular, a routine that allocates a block
of memory may not have any idea that its caller several levels up
is casting the pointer to a structure pointer for a linked list node,
then using it as one.)
Assume this simple program on Linux. To start, how do I get hold of the
data and stack regions and look into these regions and print the values
assigned to these variables ?

Compile with debugging information turned on, and learn how to
interpret the debugging information. This is extremely system-specific.
Or write a compiler that puts out debugging information in a format
you know.

#include <stdio.h>
#include <stdlib.h>

void *ptrglobal;

main()
{
void *ptrlocal = malloc(10);
ptrglobal = malloc(20);
// dump_my_program_vars();
ptrlocal = NULL;
// collect_garbage();
}

- or you don't know them. Then you cannot know them because stuff like:
{ char storage[100]; ((void*)storage)=malloc(4); } is perfectly valid in C.
So the only think you can do, is to be "conservative" (see Boehm GC),
that is, you scan all the allocated memory, and if you see a bit
pattern that looks like a pointer, then you assume it's a pointer.

What looks like a pointer? Any bitfield that points to a zone where
there is some memory allocated.


To find the variables you must identify the root set: all the
potential pointer in the stack and the global memory.

How do I access stack and global memory from within the program ?

Load a pointer with a value and dereference it.
A pointer into somewhere in the so-called "stack" can be obtained by
taking the address of an auto variable.
A pointer into somewhere in the global memory area can be obtained
by taking the address of a global variable. Sometimes symbols
like _edata delimit the boundaries of memory sections.

Your own API for malloc/free would need to include something about
what the memory will be used for (will it contain pointers, and
where?). How you do this, I'm not sure.

Gordon L. Burditt
 
F

Flash Gordon

vk02720 said:
That is exactly what I am trying to find out. Where does the compiler (
or kernel after it loads the program in memory ) store these variables.
How do I get a list of all variables in my program ?
Assume this simple program on Linux. To start, how do I get hold of the
data and stack regions and look into these regions and print the values
assigned to these variables ?

<snip>

All this is highly implementation specific. I doubt it is the same on
all Posix systems, let alone all systems of all varieties. So you will
probably have ask in a Linux group. The same applies to the rest of your
questions. Continuing this is certainly off topic for both comp.lang.c
and comp.lang.c++, I'm less certain about comp.unix.programmer so I've
set follow ups there.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,787
Messages
2,569,630
Members
45,338
Latest member
41Pearline46

Latest Threads

Top