SGCL - Garbage Collector for C++

J

James Kanze

Sam said:
James Kanze writes: [snip]
Garbage [snip]
collection is essential in Java, because all objects must be
allocated dynamically, and user defined types can't have value
semantics. It's optional in C++, because we do have value
semantics. Optional, but it still saves the programmer some
unnecessary work.
No, it's not "optional" in C++. It does not exist, at all,
in C++. Go look up the "optional" part of ISO/IEC 14882:2003
that defines whatever you think "garbage collection" means
in the C++ context. I eagerly await the results of your
search.
I don't think that is entirely correct.

It's not correct at all. However...
The C++ standard is worded very cautiously with regard to new
and delete to leave a tremendous amount of freedom to the
implementation. I think, a compliant implementation could use
garbage collection. In particular, I do not know of any
language in the C++ standard that would require the following
program to ask the execution environment for an unbounded
amount of memory:
int main ( void ) {
while ( true ) {
new int ( 5 );
}
}

There are a few legal constructs in C++ which do cause problems
with garbage collection. (They don't occur in any reasonable
real programs, but they are formally there.) When I said it was
"optional", I was thinking in terms of the way threading is
optional. The standard doesn't really define it, and you do
need to fudge a few things to make it fit, but in practice, it's
available to anyone that wants it, and it's actually widely used
in C++ applications already. Just like threading.
 
J

James Kanze

[...]
There is no garbage collection in the "C++ object model", and
yes, I plead guilty to failing to understand completely the
incompetence and lack of experience that would require someone
to resort to such an ugly hack in order to accomplish anything
of significance.

How is it that this doesn't surprise me. Is there anything you
do understand, except for spouting off insulting words about
things you don't understand? So far, you've shown that you
don't know C++, that you don't know Java and that you haven't
the slightest knowledge of software engineering, or even what it
means.
No, it's not "optional" in C++. It does not exist, at all, in C++.

Funny. I use it every day in C++. (Just like I use threads,
and a number of other things which are "optional".)

(I've cut the rest, since the author has stooped to outright
lying, instead of just being intentionally obtuse.)
 
I

Ian Collins

Jerry said:
(e-mail address removed)>, (e-mail address removed)
says...

[ ... ]
Yes. In practice, the C++ committee is at a juncture.
Independantly of any technical considerations, if the next
version of the standard doesn't support garbage collection and
threads, C++ is dead in the water---it will end up in the same
state as C, with a lot of older people used to it who continue
to use it, and argue in favor of it, because it's what they
know, but with less and less new applications being implemented
in it. (Alternatively, the major compiler vendors will simply
copy Microsoft, implement CLI, and call it C++. And the ISO
standard becomes irrelevant. And that's not a scenario I want
to see.)

N2287 attempts (but IMO, mostly fails) to do that. Truly supporting
garbage collection is, IMO, extraordinarily difficult. The problem is
that garbage collection is intended to have no externally visible effect
on the program. Since the standard's requirements are written in terms
of externally visible effects, actually requiring garbage collection is
essentially impossible. The closest N2287 comes is a non-normative note:

[Note: For garbage collected programs, a high quality hosted implemen-
tation should attempt to maximize the amount of unreachable memory it
reclaims. —end note ]
It appears to me that this requires only one additional restriction:
that the program not depend in any way on the bit patterns stored in
pointers, such as hashing a pointer and later assuming that a pointer to
the same data will necessarily hash to the same value. We should also
make it undefined behavior to take a pointer value, store it into a non-
pointer type, and convert it back to a pointer, and then dereference the
result. In fact, I'm not sure that's really guaranteed to work now
(since there may not be any other type capable of holding a pointer
value) -- but I think it should be made explicit that it produces
undefined behavior. At the same time, pointers would retain their
original ordering (e.g. with an array, or as defined by std::less for
pointers to independent objects).
One other case I'm not sure about with C++ (or even C) with garbage
collection is closure where a dynamically created member of a class
contains a pointer to it's containing class. My guess is this is one
case where the garbage collection would not be appropriate. A similar
problem blights the life of JavaScript programmers who have the
misfortune of targeting IE6.
 
R

REH

One other case I'm not sure about with C++ (or even C) with garbage
collection is closure where a dynamically created member of a class
contains a pointer to it's containing class. My guess is this is one
case where the garbage collection would not be appropriate. A similar
problem blights the life of JavaScript programmers who have the
misfortune of targeting IE6.

This is easily handled by garbage collectors that do not use reference
counting. I have written one myself that handles this example.

REH
 
J

James Kanze

One other case I'm not sure about with C++ (or even C) with garbage
collection is closure where a dynamically created member of a class
contains a pointer to it's containing class. My guess is this is one
case where the garbage collection would not be appropriate. A similar
problem blights the life of JavaScript programmers who have the
misfortune of targeting IE6.

I'm not sure I understand the problem. What you seem to be
describing is self referencing---an object contains a pointer to
itself. That doesn't cause any problems whatsoever for the
garbage collectors I'm familiar with (although it does cause a
lot of problems with the smart pointers I know).
 
I

Ian Collins

James said:
I'm not sure I understand the problem. What you seem to be
describing is self referencing---an object contains a pointer to
itself. That doesn't cause any problems whatsoever for the
garbage collectors I'm familiar with (although it does cause a
lot of problems with the smart pointers I know).
No, something like

struct Parent
{
struct Child
{
Parent* parent;

Child( Parent* parent ) : parent(parent) {}
};

Child* child;

Parent() : child(new Child(this)) {}
};
 
S

Sam

James said:
"optional", I was thinking in terms of the way threading is
optional. The standard doesn't really define it, and you do
need to fudge a few things to make it fit, but in practice, it's
available to anyone that wants it, and it's actually widely used
in C++ applications already. Just like threading.

Indeed. The hundreds of large, multithreaded, applications that I've worked
with, in a business setting, for the last 10+ years, they also happened to
always use some sophisticated garbage-collection algorithm. What a
coincidence!

Not.



-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (GNU/Linux)

iD8DBQBHzeZ1x9p3GYHlUOIRAo9YAJ9Y+sP4ugJwDs0cCRuZvsx1190SIwCfRbcF
5mP/1BH0E863tukLG6TyH20=
=JB06
-----END PGP SIGNATURE-----
 
S

Sam

James said:
James said:
James Kanze writes:
[...]
There is no garbage collection in the "C++ object model", and
yes, I plead guilty to failing to understand completely the
incompetence and lack of experience that would require someone
to resort to such an ugly hack in order to accomplish anything
of significance.

How is it that this doesn't surprise me. Is there anything you
do understand, except for spouting off insulting words about
things you don't understand? So far, you've shown that you
don't know C++, that you don't know Java and that you haven't
the slightest knowledge of software engineering, or even what it
means.

Yes, I'm going to definitely have trouble sleeping tonight, based on such an
insightful evaluation of my technical skills. I'm willingly pleading guilty
to completely missing the garbage collection part of ISO/IEC 14882:2003,
perhaps you'll be kind enough to share your vast storehouse of knowledge,
and point me to the relevant parts?
Funny. I use it every day in C++. (Just like I use threads,
and a number of other things which are "optional".)

I'm not surprised. Two years ago, I stabilized and cleaned up a rather
horrible C++ app by:

1) Dumping all usage of Boost, and replacing it with STL.

2) Dumping all kind of crappy threaded code, that had more race conditions,
and holes, than Swiss cheese, and was completely unnecessary. That
particular function, that the dope used threads for, got reimplemented in a
much simpler, and cleaner, fashion using an event-driven paradigm.

Your heralding of the use of threads really does not mean anything. Besides,
threads do give you the ability to do things that are otherwise hard to do
in other ways. Unfortunately, the same cannot be said of so-called "garbage
collection".
(I've cut the rest, since the author has stooped to outright
lying, instead of just being intentionally obtuse.)

Your high horse is very tired. Give the poor beast some rest, will ya?


-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (GNU/Linux)

iD8DBQBHzekbx9p3GYHlUOIRAhXLAJ99T56yveC7rSAo1tZyoSC/YdoBRwCeKS0L
KEt6cAn37EUAnsZhWsvLvJY=
=sHae
-----END PGP SIGNATURE-----
 
J

James Kanze

No, something like
struct Parent
{
struct Child
{
Parent* parent;

Child( Parent* parent ) : parent(parent) {}
};
Child* child;
Parent() : child(new Child(this)) {}
};

But that's just an everyday cycle. Why would it cause any
particular problems for a garbage collector? (All modern
garbage collection algorithms handle cycles correctly.)

It is, in fact, an example of the sort of case in which garbage
collection is an important work saver. With boost::shared_ptr,
you'd have to break the cycle somehow, using a boost::weak_ptr
for one of the pointers. Which means you have to take the time
to think about which one; in some cases, the answer might be
trivially obvious, in others no. And that in more complicated
relational networks, it might be very difficult to find a right
answer.
 
I

Ian Collins

James said:
But that's just an everyday cycle. Why would it cause any
particular problems for a garbage collector? (All modern
garbage collection algorithms handle cycles correctly.)
That's all I wanted to know.

Similar constructs in JavaScript defeat the IE6 garbage collector.
 
P

Peter

struct node
{
    node* next;

};

node* root = new node;
node* n = root;

for (int x = 0; x < 1000000; ++x)
{
    n = n->next = new node;

}

How to release memory?


#include <boost/shared_ptr.hpp>


struct CNode
{ boost::shared_ptr<CNode> m_sNext;
};


int main(int argc, char **argv)
{ boost::shared_ptr<CNode> sRoot(new CNode);
CNode *p = sRoot.get();

for (size_t i = 0; i < 100; i++)
p = (p->m_sNext = boost::shared_ptr<CNode>(new CNode)).get();

return 0;

}
 
J

Jon Harrop

James said:
I'm sorry, but it's frustrating. Somewhat upthread, someone
said "recursion", doubtlessly ironically, since I do presume
that everyone would realize that explicitly recursing 1000000
times was a bad idea.

If you think there's something wrong with recursion then there's something
wrong with your chosen language(s) and implementation(s)...
 
J

Jerry Coffin

[ ... ]
No, something like

struct Parent
{
struct Child
{
Parent* parent;

Child( Parent* parent ) : parent(parent) {}
};

Child* child;

Parent() : child(new Child(this)) {}
};

Shouldn't be a problem for anything worthy of being called a garbage
collector at all.

At its most basic, a garbage collector starts from a root set (basically
auto and static variables) and chases any pointers it finds in those,
recursive, until it can't find any pointers it hasn't already chased.
Each object it encounters is marked as being in use.

With a cycle like the above, when there are no more pointers in the root
set to provide access to the struct, it gets collected -- its internal
structure is basically irrelevant at that point (except, possibly, to do
things like finalization on the objects being collected).
 
J

Jerry Coffin

[ ... ]
A non-normative note still creates expectations with regards to
quality of implementation.

In that case I think they should add the note, and enough rules to allow
a garbage collector to work, and forget about the rest of it...
Technically, I think we probably can live without it, although
it seems stupid to do so.

I disagree. Garbage collection enables quite a bit in the design of a
language, but I see _very_ little gain from it in a language that wasn't
designed with GC in mind from the beginning. Yes, there are a few
situations in C++ that would benefit to some minimal degree from GC, but
only a few, and only minimally at that.

Since I've also used (and, for that matter, implemented) a number of
other languages that always have GC (Scheme, Smalltalk, etc.), when I
first found a GC for C++ I was quite enthused. Using it changed my mind
though: I now put a garbage collector for it in about the same class as
a goto. Neither is exactly evil, but well designed code doesn't seem (to
me) to really benefit from either.
I think that non-technical issues
mean that without threads and garbage collection, the language
is doomed to a slow death, much like C today.

I agree about supporting threads -- but I don't see much relationship
between threads and GC.
What do you mean by a _good_ garbage collector?

I probably should have said "decent" or something on that order -- I
wasn't trying to establish what good was, only that we shouldn't start
by restricting it to such a degree that it can't really improve on
what's available right now.

A necessary (but probably not sufficient) condition would be that the GC
is allowed to compact the heap. This at least allows really fast
allocations, and eliminates the possibility of death by fragmentation.
Practically
speaking, the presense of undiscriminated unions means that
perfect garbage collection is probably not in the cards.
(Formally speaking, accessing anything but the last written
member of a union is undefined behavior, and the standard allows
an implementation to maintain a hidden discriminator. In
practice, any implementation which actually did this would break
so much code as to be inviable.) But there are mostly accurate
collections which are used with C++, see for example
http://www.hpl.hp.com/techreports/Compaq-DEC/WRL-88-2.pdf.

I'm not after perfect, or really anything close. I just think if we
restrict the language to allow for garbage collection that we at least
impose enough restrictions that it can be useful.

Looking at it in terms of the classic mark/sweep collector, you seem to
be concerned primarily with the mark phase, and the degree to which
garbage can/will be recognized. I'm more concerned with the sweep phase,
and ensuring that when it's done, it's really done some good. The fact
that there might be some unused blocks still marked as being in use is
nearly a given with GC, and increasing that percentage doesn't bother me
a lot. Having a heap that's still fragmented, and a heap manager that
still has to search through lists of free blocks to do an allocation
bothers me quite a bit more.
That's not guaranteed today, and I've used implementations where
you had to take particular precautions when hashing a pointer,
because different pointers to the same data could have different
bit patterns. (Think of 8086 compilers in model huge, which
only normalized a pointer when necessary.)

Right -- at best it's not clear how much you can really expect today.
I'm pretty sure code that depends on it is vanishingly rare.
It's guaranteed to work today, if there is a sufficiently large
integral type.

Right -- but there's no guarantee that there will be a sufficiently
large integral type, so code that does anything of the sort isn't
portable anyway.
It does mean that a conservative collector is
required. Or a "mostly-accurate" one---the mostly accurate
collectors identify cases where this occurs, and treat them
conseratively, while treating the rest accurately (including
copying, in some cases).

Right -- this is what I was thinking of when I contrasted a GC that's
purely in the library to a GC with compiler support. The compiler needs
to be aware of all the casts from pointer to integer, unions that
include pointers, and so on, making it fairly trivial for it to support
the GC in dealing with those. Implementing the GC entirely in the
library loses that visibility.

[ ... ]
(On another note: I think I'll write up a proposal real quick
for a standard hash function for pointers, since it's something
that a user can't write in an architecture independent manner.)

Putting it into the standard library also makes it trivial to ensure
that it keeps working when/if blocks of memory get moved around. Just
for one obvious possibility, the original address of the memory block
can be stored in a hidden location in the memory block and always used
for the hash, even when/if the block moves.

Of course, that requires that no other block of memory be allocated at
the same address, but (especially with a 64-bit address range) the same
trick that ensures that ordering of addresses also prevents reuse of
addresses (when you copy the data out of a block and start to reuse the
memory, you assign it a new range of addresses. As long as you
consistently either increment or decrement the addresses, you maintain
both order and uniqueness).
 
J

Jerry Coffin

[ ... ]
It would break some of my code which hashes on a pointer.

I would have been surprised if there wasn't any code in the world it
would break -- but I think it's also a pretty easy break to fix. As I
mentioned elsethread, it seems like the obvious thing to do is store the
original address, and always use it for the hash. A hash function in the
standard library could make this entirely transparent.
 
J

James Kanze

There's already one in C++0x. The template std::hash has a
specialization for std::hash<T*>, which gives you a callable type.

Excellent. What about std::hash<type_info>?
 
J

James Kanze

[ ... ]
Technically, I think we probably can live without it, although
it seems stupid to do so.
I disagree. Garbage collection enables quite a bit in the
design of a language, but I see _very_ little gain from it in
a language that wasn't designed with GC in mind from the
beginning. Yes, there are a few situations in C++ that would
benefit to some minimal degree from GC, but only a few, and
only minimally at that.
Since I've also used (and, for that matter, implemented) a
number of other languages that always have GC (Scheme,
Smalltalk, etc.), when I first found a GC for C++ I was quite
enthused. Using it changed my mind though: I now put a garbage
collector for it in about the same class as a goto. Neither is
exactly evil, but well designed code doesn't seem (to me) to
really benefit from either.

You seem to be expecting a lot more from garbage collection than
I'm asking. I don't want to change the basic paradigms of C++.
I want garbage collection for two fairly simple reasons:

-- So that I have less code to write in some specific cases:
dynamic graphs would be an obvious case, where correctly
managing memory can be extremely difficult otherwise, but
more generally, there are a number of cases where, once the
design is done, you could just say, let the garbage
collector take care of the rest.

-- So that I can add error checking code to catch dangling
pointers to objects with specific lifetimes. Knowing that
the memory won't be recycled as long as there is a pointer
to it allows flagging the objects as invalid, and asserting
that the object is valid in member functions.

Simple, conservative collectors, such as the Boehm collector,
are largely sufficient for this.
 
J

James Kanze

If you think there's something wrong with recursion then
there's something wrong with your chosen language(s) and
implementation(s)...

If you think you can recurse a 1000000 times in C or in C++,
you've got another think coming. It just won't work, at least
with the implementations I've got access to.
 
J

James Kanze

#include <boost/shared_ptr.hpp>
struct CNode
{ boost::shared_ptr<CNode> m_sNext;
};
int main(int argc, char **argv)
{ boost::shared_ptr<CNode> sRoot(new CNode);
CNode *p = sRoot.get();

for (size_t i = 0; i < 100; i++)
p = (p->m_sNext = boost::shared_ptr<CNode>(new CNode))..get();

return 0;
}

Core dumps on Solaris, with g++. I expect that it core dumps
most of the time, with most implementations.
 
I

Ioannis Vranos

I think we must follow the approach of C++/CLI (if not adopting it):
Separation of native types and new "managed" types, both having their
own kinds of pointers, and both being able to be intermixed.

Native types work as usual, managed types have two types of destructors,
the usual one for deterministic destruction and one (called finalizer)
for GC-time destruction which is called if the destructor has not been
called already.

The native pointers (*) behave as usually, the managed pointers (^),
called handlers, have some restrictions imposed by the VM (for example
not considering their value as permanent since the GC can move them
around, unless we explicitly "pin" them, etc).


I think trying to make the existing native types to work under a VM with
all their features is impossible. I think the general approach of
C++/CLI is the perfect one (of course one may disagree with some details
of it).
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Similar Threads


Members online

Forum statistics

Threads
473,767
Messages
2,569,572
Members
45,046
Latest member
Gavizuho

Latest Threads

Top