C++0x Garbage Collection

Goalie_Ca

I have been reading (or at least googling) about the potential addition
of optional garbage collection to C++0x. There are numerous myths and
whatnot with very little detailed information.

Will this work be library based or language based and will it be based
on that of managed C++? Then of course there are the finer technical
questions raised (especially due to pointer abuse). Is a GC for C++
just a pipe dream, or is there a lot of work in the committee to realise
it?
 
Roland Pibinger

I have been reading (or at least googling) about the potential addition
of optional garbage collection to C++0x. There are numerous myths and
whatnot with very little detailed information.

C++ needs no garbage collection because it offers something better:
deterministic resource management with destructors. You should have
googled for 'RAII' instead of 'garbage collection'. Actually, RAII is
the 'unique selling proposition' of C++. Introducing GC into C++ would
only make it a worse Java or C#.

Best wishes,
Roland Pibinger
 
Joe Seigh

Goalie_Ca said:
I have been reading (or at least googling) about the potential addition
of optional garbage collection to C++0x. There are numerous myths and
whatnot with very little detailed information.

Will this work be library based or language based and will it be based
on that of managed C++? Then of course there are the finer technical
questions raised (especially due to pointer abuse). Is a GC for C++
just a pipe dream, or is there a lot of work in the committee to realise
it?

Part of the problem is that there are different forms of GC. Even for tracing
GC there are enough differences that being able to plug in at the library
level might be a little problematic. Whatever solution they pick is not
likely to be neutral as far as the alternative solutions are concerned.
 
Jerry Coffin

I have been reading (or at least googling) about the potential addition
of optional garbage collection to C++0x. There are numerous myths and
whatnot with very little detailed information.

Will this work be library based or language based and will it be based
on that of managed C++? Then of course there are the finer technical
questions raised (especially due to pointer abuse). Is a GC for C++
just a pipe dream, or is there a lot of work in the committee to realise
it?

There were a couple of lengthy threads about this in
comp.lang.c++.moderated. See:

http://tinyurl.com/s57fq

and:

http://tinyurl.com/p5op2

for starters. When I said lengthy, I wasn't kidding, though -- reading
through all this will take considerable time (and this thread will
probably echo many of the same arguments).
 
Goalie_Ca

Thanks for those threads. I can see that in general people are divided
at every level on how to approach this problem. To me, the outsider, it
appears that it is not likely to make it into this revision, although there is
clearly support for having one included.
 
Ron House

Goalie_Ca said:
Thanks for those threads. I can see that in general people are divided
at every level on how to approach this problem. To me, the outsider, it
appears that it is not likely to make it into this revision, although there is
clearly support for having one included.

My question is a simple one: how do we combine destructors with GC?
Destructors do not become superfluous just because one usage for them
does. Closing files and shutting down other resources in a timely manner
becomes hard when object termination occurs at an indeterminate time. Is
there a way to do this that is sufficiently practical and efficient to
raise no major objections if put into a standard? It is one thing for an
add-on to do it, as we use or not use the add-on to our own liking; but
a standard is another matter.
 
Jerry Coffin

My question is a simple one: how do we combine destructors with GC?

One of the threads I previously cited was titled "Reconciling Garbage
Collection with Deterministic Finalization". Even with little or no
knowledge of the subject matter, the mere fact that the thread went
on for well over 300 posts tends to show that nobody has a really
solid answer for that.
 
Alf P. Steinbach

* Ron House:
My question is a simple one: how do we combine destructors with GC?
Destructors do not become superfluous just because one usage for them
does. Closing files and shutting down other resources in a timely manner
becomes hard when object termination occurs at an indeterminate time. Is
there a way to do this that is sufficiently practical and efficient to
raise no major objections if put into a standard? It is one thing for an
add-on to do it, as we use or not use the add-on to our own liking; but
a standard is another matter.

It's very simple: object destruction is not memory reclamation, and
memory reclamation is not object destruction.

The job of a garbage collector is solely to reclaim memory.

If an object has a non-trivial destructor, one with possible side
effects, then that object cannot be automatically destroyed by a garbage
collector in order to reclaim memory, because then the garbage collector
would intrude in the arena of program logic and correctness.

Thus, a C++ garbage collector compatible with the current standard can
reclaim a region of memory only when:

A. There are no live references to the region (circular references
are not live), and

B. All remaining objects in the region have trivial destructors.

The case where there are no remaining objects in the region (i.e. all
have been destroyed) might seem to be of no practical advantage, but it
is if ::operator delete, rather than deallocating at once, just invokes
object destruction and marks the memory for later automatic reclamation,
which can proceed e.g. when the program's later waiting for user input.

This reduces garbage collection in C++ to an /optimization/ and /memory
leak slurper/, not affecting correctness except to the degree it slurps.
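
To make the deferred-reclamation idea above concrete, here is a minimal
sketch (not from the thread) using a class-level operator new/delete pair
to keep it self-contained; gc_deferred, pending() and reclaim_pending()
are invented names, not a proposed interface.

#include <cstddef>
#include <cstdlib>
#include <new>
#include <vector>

struct gc_deferred {
    static std::vector<void*>& pending()
    {
        static std::vector<void*> blocks;   // destroyed but not yet freed
        return blocks;
    }

    static void* operator new(std::size_t n)
    {
        if (void* p = std::malloc(n))
            return p;
        throw std::bad_alloc();
    }

    // Runs after the destructor: remember the block instead of freeing it.
    static void operator delete(void* p)
    {
        if (p)
            pending().push_back(p);
    }

    // Actual reclamation, e.g. while the program waits for user input.
    static void reclaim_pending()
    {
        for (std::size_t i = 0; i < pending().size(); ++i)
            std::free(pending()[i]);
        pending().clear();
    }
};

struct Widget : gc_deferred {
    ~Widget() {}   // non-trivial destructor: 'delete' still runs it promptly
};

int main()
{
    Widget* w = new Widget;             // uses gc_deferred::operator new
    delete w;                           // destructor now, memory later
    gc_deferred::reclaim_pending();     // reclamation at a quiet moment
    return 0;
}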
 
Joe Seigh

Jerry said:
One of the threads I previously cited was titled "Reconciling Garbage
Collection with Deterministic Finalization". Even with little or no
knowledge of the subject matter, the mere fact that the thread went
on for well over 300 posts tends to show that nobody has a really
solid answer for that.
No surprise, since it was sort of at the all-or-nothing level. You could
provide a lower level api for use by GC implementations and let the
market decide. Of course you'd have to recognise that there are different
forms of GC out there, not just tracing GC, otherwise there really wouldn't
be any market to decide. Kind of like Henry Ford's 'any color you want as
long as it's black'.
 
Ron House

Alf said:
* Ron House:


It's very simple: object destruction is not memory reclamation, and
memory reclamation is not object destruction.

The job of a garbage collector is solely to reclaim memory.

If an object has a non-trivial destructor, one with possible side
effects, then that object cannot be automatically destroyed by a garbage
collector in order to reclaim memory, because then the garbage collector
would intrude in the arena of program logic and correctness.

Thus, a C++ garbage collector compatible with the current standard can
reclaim a region of memory only when:

A. There are no live references to the region (circular references
are not live), and

B. All remaining objects in the region have trivial destructors.

The case where there are no remaining objects in the region (i.e. all
have been destroyed) might seem to be of no practical advantage, but it
is if ::operator delete, rather than deallocating at once, just invokes
object destruction and marks the memory for later automatic reclamation,
which can proceed e.g. when the program's later waiting for user input.

This reduces garbage collection in C++ to an /optimization/ and /memory
leak slurper/, not affecting correctness except to the degree it slurps.

Well it can't be "very simple" if the solution doesn't cover the real
motivation for having GC. I don't give a fig for efficiency (within
reason). The motivation for GC is to prevent errors: remove the need for
deleting dynamic memory so that you remove the need to keep track of it,
which simplifies a whole slew of algorithms and designs. If we have to
keep on keeping track for 'difficult' objects, then the algorithms and
methods will be made complicated anyway; errors will still be possible.
So, can we have GC (that is, auto-reclamation of space, meaning no
programming to call delete) along with deterministic execution of the
destructor? For example, we might be satisfied if we could: 1) be sure
destructors would immediately be called at the end for stack variables,
2) be sure they would be called sooner or later for lost heap variables,
and 3) have them called deterministically by using delete deliberately.
 
Alf P. Steinbach

* Ron House:
Well it can't be "very simple" if the solution doesn't cover the real
motivation for having GC. I don't give a fig for efficiency (within
reason). The motivation for GC is to prevent errors: remove the need for
deleting dynamic memory so that you remove the need to keep track of it,
which simplifies a whole slew of algorithms and designs.

The above does that: managing memory.

If we have to
keep on keeping track for 'difficult' objects, then the algorithms and
methods will be made complicated anyway; errors will still be possible.

They are, but more so when non-memory cleanup responsibility is left to
GC. Look to Java, where the poor programmers have to do manually what
C++ does automatically, because the Java GC idea prevents RAII. GC
manages memory, not other resources, and when cajoled into managing
other resources does an extremely poor job, so poor that you're better
off without it (zombies, meta-invariants, that sort of thing).

So, can we have GC (that is, auto-reclamation of space, meaning no
programming to call delete) along with deterministic execution of the
destructor? For example, we might be satisfied if we could: 1) be sure
destructors would immediately be called at the end for stack variables,
2) be sure they would be called sooner or later for lost heap variables,
and 3) have them called deterministically by using delete deliberately.

You'd not really want (2). At least, I don't. For example, if you have
an open file and leave it to GC to close it, then after a while you'll
have an arbitrary number of open files hanging around, waiting for GC to
close them -- if ever -- preventing access to those files both from
other programs and your own (IMO relying on GC for this is an error).
 
Ian Collins

Ron said:
Well it can't be "very simple" if the solution doesn't cover the real
motivation for having GC. I don't give a fig for efficiency (within
reason). The motivation for GC is to prevent errors:

That should read "prevent errors introduced through lazy programming".
remove the need for
deleting dynamic memory so that you remove the need to keep track of it,
which simplifies a whole slew of algorithms and designs. If we have to
keep on keeping track for 'difficult' objects, then the algorithms and
methods will be made complicated anyway; errors will still be possible.

My platform has an effective GC library, but I only use it during
acceptance test runs, to verify that there aren't any leaks.

GC doesn't belong in the language; if it is to be used at all, it should
be in a library.

RAII is a much more effective and deterministic tool.
 
Roland Pibinger

* Ron House:

You'd not really want (2). At least, I don't. For example, if you have
an open file and leave it to GC to close it, then after a while you'll
have an arbitrary number of open files hanging around, waiting for GC to
close them -- if ever -- preventing access to those files both from
other programs and your own (IMO relying on GC for this is an error).

If someone wanted to introduce GC into C++ with a new 'gcnew'
operator, the _compiler_ would have to check whether all data members of
the gcnew-ed object (and its base classes) have trivial (i.e.
not user-provided) destructors. Otherwise the compiler would have to
issue a compile-time error. This means that not even the current
std::string could be a member of a gcnew-ed object. Moreover, since
objects with trivial destructors are usually value types, and value
types are best handled with value semantics, the introduction of even a
limited form of GC into C++ seems highly questionable.

Best regards,
Roland Pibinger
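
As a rough illustration of the compile-time check Roland describes, a
library-only sketch might look like the following, assuming C++11's
std::is_trivially_destructible (anachronistic for the C++0x timeframe
discussed here); gc_new is a hypothetical helper, not a proposed operator.

#include <new>
#include <string>
#include <type_traits>

// Hypothetical gc_new: accepts only types the collector could reclaim
// without running user code, mirroring the check described above.
template <typename T>
T* gc_new()
{
    static_assert(std::is_trivially_destructible<T>::value,
                  "type has a non-trivial destructor; a collector could not "
                  "reclaim it without running user code");
    return new T();   // a real collector would allocate from a traced heap
}

struct Point { int x, y; };   // trivially destructible: accepted

struct Named {
    std::string name;         // std::string member: rejected at compile time
};

int main()
{
    Point* p = gc_new<Point>();
    // Named* n = gc_new<Named>();   // would not compile, as noted above
    (void)p;
    return 0;
}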
 
Dilip

Ron said:
My question is a simple one: how do we combine destructors with GC?
Destructors do not become superfluous just because one usage for them
does. Closing files and shutting down other resources in a timely manner
becomes hard when object termination occurs at an indeterminate time. Is
there a way to do this that is sufficiently practical and efficient to
raise no major objections if put into a standard? It is one thing for an
add-on to do it, as we use or not use the add-on to our own liking; but
a standard is another matter.

Years ago when C# first surfaced, a Microsoft employee posted a
_lengthy_ analysis on the resource management/deterministic
finalization conflict. The link is here:
http://discuss.develop.com/archives/wa.exe?A2=ind0010A&L=DOTNET&P=R28572
It talks about the pros & cons of reference counting vs. deterministic
finalization. Might be relevant to this thread.
 
Ron House

Alf said:
* Ron House:



The above does that: managing memory.

Not if it manages it quirkily - only for 'simple' objects, but not others.
They are, but more so when non-memory cleanup responsibility is left to
GC. Look to Java, where the poor programmers have to do manually what
C++ does automatically, because the Java GC idea prevents RAII. GC
manages memory, not other resources, and when cajoled into managing
other resources does an extremely poor job, so poor that you're better
off without it (zombies, meta-invariants, that sort of thing).

That is what I am seriously considering, which is why I was surprised by
your original remark that it was "very simple". If these ideas are
basically at odds, then it is the opposite of simple; it is impossible.
You'd not really want (2). At least, I don't. For example, if you have
an open file and leave it to GC to close it, then after a while you'll
have an arbitrary number of open files hanging around, waiting for GC to
close them -- if ever -- preventing access to those files both from
other programs and your own (IMO relying on GC for this is an error).

My point in saying "be sure" is to rule out "if ever". I.e., you are not
criticising my requirement, but the violation of it. Now it might be
true that we cannot "have it all" in this matter, but I am looking for
evidence that this has been shown conclusively, as opposed to it merely
being that no one has found the way to do it yet.
 
Kaz Kylheku

Ron said:
My question is a simple one: how do we combine destructors with GC?

The most straightforward way is for the garbage collector to know how
to invoke the equivalent of delete on the objects. (I would abhor a
scheme whereby GC'able objects have to inherit from some special base
class with a virtual destructor; it would be better if this was
intelligent somehow).

For some types of objects, calling the destructor at GC time might be
too late. Fine; those objects have to be coded with two-step
destruction. Deterministically do the actions that have to be taken,
using a chain of member functions. The destructor then behaves as a
finalizer. It ensures that those actions happen, plus any other cleanup
actions.

A class written like that can be used with garbage collection, as well
as with new/delete and RAII.
Destructors do not become superfluous just because one usage for them
does. Closing files and shutting down other resources in a timely manner
becomes hard when object termination occurs at an indeterminate time.

The indeterminate lifetime of an object isn't caused by garbage
collection. It's caused by the semantics of the program. Garbage
collection solves the problem of computing that lifetime.

If you don't have garbage collection or some other scheme, you still
have to compute the lifetime of an object and call for it to be
destroyed by explicit delete.

And if you do that, that delete call may also be too late in releasing
an operating system resource.

You simply have to regard that operating system resource handle as
having its own lifetime, which is contained within the lifetime of the
encapsulating object.

You can take the responsibility for computing that contained lifetime,
and let the garbage collector determine the main lifetime.

You don't want to use the destructor for ending the resource lifetime,
because that will turn your entire object into garbage, while it is
still reachable.

So the obvious thing is release the resource and change the state of
the object to indicate that the object does not have that resource.

Garbage collection is not incompatible with your program knowing when
to release a resource.

class ResourceWrapper {
private:
    SomeResource *res;
public:
    // ...
    void ReleaseResource()
    {
        if (res != 0) {
            DestroyResource(res);
            res = 0;
        }
    }

    virtual ~ResourceWrapper()
    {
        ReleaseResource();
    }
};

Okay, so that gets the obvious out of the way. Now the problem.

The issue is that functions like ResourceWrapper::ReleaseResource() are
ad hoc, whereas a destructor is a formalism built into the language.
The destructor formalism does something nice: it ensures that
the member and base destructors are called.

Here, the partial cleanup done by ReleaseResource() has the
responsibility of doing whatever needs to be done in the base and
member objects, if anything. If ReleaseResource() is virtual and is
overridden, the derived ReleaseResource() will probably have to call
the parent one.
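
For example, a sketch of the manual chaining just described, assuming
ReleaseResource() in the base is made virtual as contemplated above;
DerivedWrapper, OtherResource and DestroyOtherResource are invented
stand-ins, like SomeResource earlier:

// stand-ins for the derived part's own resource
struct OtherResource;
void DestroyOtherResource(OtherResource*);

class DerivedWrapper : public ResourceWrapper {
private:
    OtherResource *extra;
public:
    // Overriding the ad hoc cleanup function: nothing in the language
    // makes the base call happen; it must be chained by hand.
    virtual void ReleaseResource()
    {
        if (extra != 0) {
            DestroyOtherResource(extra);
            extra = 0;
        }
        ResourceWrapper::ReleaseResource();   // easy to forget
    }
};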

In the intelligently designed Common Lisp Object System, any method can
be endowed with auxiliary methods which are called if that primary
method is called. The auxiliary methods can be specialized throughout
the class lattice, and be designated as "before", "after" or "around".
In CLOS terminology, the automatic constructor calling in C++ resembles
before methods being fired. Whereas destructors are after-methods. Sort
of. The most derived destructor that is called is kind of like the
primary method, and the base ones that are called are like
after-methods. There is no counterpart to the automatic calling of
destructors on member objects.

C++ could benefit from member functions which can be extended with
auxiliaries. In a class having some virtual function Foo() it would be
nice to be able to define a special overload of Foo() which is always
called before Foo(), and another overload which is always called after.
That is to say, if the virtual Foo() is invoked on the object, then the
before Foo() is called in that class, and in all the derived classes
which also have one. Then the appropriate override of Foo() is invoked,
at whatever level in the class hierarchy that may be, and then the
afters are called, in derived to base order.

With befores and afters, certain aspects of resource cleanup would be
easier to manage. The ResourceWrapper would look like this:

virtual void ReleaseResource()
{
    // nothing to do here now; it's moved to the after function
}

after void ReleaseResource()
{
    if (res != 0) {
        DestroyResource(res);
        res = 0;
    }
}

So now if ReleaseResource() is called on that object, no matter how
that is derived, the ReleaseResource() after-function is called.
(Provided that no bullshit happens with exceptions!) If someone derives
from this class and overrides ReleaseResource(), that will not prevent
our cleanup from happening.

These before and after functions don't need any weird magic in the
vtable or anything. They are not virtual (and in fact making them so
ought to be forbidden).

It works simply like this. When the compiler translates the virtual
function which looks like this:

void ReleaseResource(int arg)
{
    BODY;
}

it generates the code as:

virtual void ReleaseResource(int arg)
{
    BEFORE(arg);
    BODY;
    AFTER(arg);
}

Here, BEFORE represents the name of the outermost "before
ReleaseResource" function, and AFTER the call to the nearest "after
ReleaseResource". Both functions are called in the normal way. Imagine
scope resolution being used to specify them exactly with whatever funny
names they have known to the compiler.

And of course the befores and afters are automatically instrumented
with exception-safe code which ensures their own continuity. E.g. an
"after ReleaseResource(int arg) { BODY; } is generated as:

after void ReleaseResource(int arg)
{
    BODY;
    AFTER(arg);
}

No exception safety there; that is deliberate. If anything throws, the
subsequent actions are not invoked.
 
Kaz Kylheku

Alf said:
It's very simple: object destruction is not memory reclamation, and
memory reclamation is not object destruction.

The job of a garbage collector is solely to reclaim memory.

That is too naive.

An object must remain valid while it is reachable. And while it remains
valid, it will continue to hold whatever resources it has always held.
Those resources must be cleaned up when that object becomes
unreachable.

Mature garbage-collected language implementations all have a way to
register finalization hooks: code run on an object when it's about to
be reclaimed.

They also have features like weak pointers and weak hash tables: these
can refer to objects weakly. When objects are GC'd, they automatically
disappear from weak hash tables, and weak pointers that refer to them
safely become null.

The only time it's a problem to do finalization at GC time is when a
more timely behavior is needed. For instance, if you keep opening files and
depend on GC to close them, you could run out of operating system file
handles, given a sufficiently large heap.

That doesn't mean destructors shouldn't be run at GC time, only that
some responsibilities have to be taken care of before that happens.
If an object has a non-trivial destructor, one with possible side
effects, then that object cannot be automatically destroyed by a garbage
collector in order to reclaim memory, because then the garbage collector
would intrude in the arena of program logic and correctness.

Be that as it may, people do this quite happily.
Thus, a C++ garbage collector compatible with the current standard can
reclaim a region of memory only when:

A. There are no live references to the region (circular references
are not live), and

Of course circular references are not live. The search for reachable
objects begins with global variables and live locals.
B. All remaining objects in the region have trivial destructors.

So in fact you are calling destructors from the garbage collector.
You're merely insisting that they be trivial, because you're too scared
of having programmer code invoked in the context of the garbage
collector.
The case where there are no remaining objects in the region (i.e. all
have been destroyed) might seem to be of no practical advantage, but it
is if ::operator delete, rather than deallocating at once, just invokes
object destruction and marks the memory for later automatic reclamation,
which can proceed e.g. when the program's later waiting for user input.

If you have garbage collection, then operator delete becomes a
dangerous tool which undermines the garbage collector. You should not
be using it. If you use delete, then you create the risk that you are
destroying an object to which references still remain.

Quite simply, the delete operator cannot be trusted, and so the memory
cannot be marked for reclamation.

Since you've established that these objects have only trivial
destructors, then operator delete is reduced to a no-op.

But in a better design, where non-trivial destructors are allowed under
GC, what you would do is this: operator delete calls the destructor
chain on the object, but does not deallocate the memory. Instead, it
sets a flag which indicates that the destructors have been called
already. Later, when that object is garbage collected, the collector
will honor that "destroyed already" flag and not invoke a redundant
destructor call. It will just do the memory reclamation.
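
A rough sketch of that "destroyed already" flag, with invented names
(gc_header, gc_cell, gc_allocate, gc_delete, gc_sweep_object) standing in
for whatever interface a real collector would expose:

#include <cstdlib>
#include <new>

struct gc_header {
    bool destroyed;                   // set once the destructor chain has run
};

template <typename T>
struct gc_cell {
    gc_header header;
    T object;
};

template <typename T>
gc_cell<T>* gc_allocate()             // what a GC-aware 'new' might do
{
    void* raw = std::malloc(sizeof(gc_cell<T>));
    if (!raw)
        throw std::bad_alloc();
    gc_cell<T>* cell = static_cast<gc_cell<T>*>(raw);
    cell->header.destroyed = false;
    new (&cell->object) T();          // construct the object in place
    return cell;
}

template <typename T>
void gc_delete(gc_cell<T>* cell)      // what 'delete' would expand to
{
    cell->object.~T();                // run the destructor chain now
    cell->header.destroyed = true;    // but leave the memory to the collector
}

template <typename T>
void gc_sweep_object(gc_cell<T>* cell)    // collector: cell proven unreachable
{
    if (!cell->header.destroyed)
        cell->object.~T();            // destructor runs at most once
    std::free(cell);                  // only now is the memory reclaimed
}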

So everyone is happy. C++ programmers who think garbage collectors
should only hunt down memory and not handle object destruction can
knock themselves out by computing their own object lifetimes and
calling delete. When these programmers screw that up and forget to call
delete, the GC will save their asses by calling the destructors for
them and reclaiming memory. Those guys won't notice that this happened;
in fact if the destructor has some highly visible side effect, they
will probably still take credit for it, even though it was the GC doing
it for them. And so they can go on believing that garbage collectors
should only hunt down memory and not handle object destruction. When
they call delete and then continue to have references to that object
and use them, they will get the undefined behavior that they crave and
deserve.

Perfect!
 
Kaz Kylheku

Alf said:
They are, but more so when non-memory cleanup responsibility is left to
GC. Look to Java, where the poor programmers have to do manually what
C++ does automatically, because the Java GC idea prevents RAII.

GC in C++ would not prevent RAII, because you would still be able to
define objects in automatic storage, and as members of other objects,
etc.

{
    // RAII at work
    ObjectWithResource o;
}

// GC at work
ObjectWithResource *po = new ObjectWithResource;

#if 0
// optional: don't wait for GC, release the resource now;
// maybe this is needed, maybe not -- depends on the resource.
po->releaseResource();
#endif
GC manages memory, not other resources, and when cajoled into managing
other resources does an extremely poor job, so poor that you're better
off without it (zombies, meta-invariants, that sort of thing).

Don't let Java's mistakes, whatever they are, reflect badly on garbage
collection with finalization hooks. There is nothing wrong with having
some function called on an object that is about to be reclaimed, and
it's a sane design for that function to be the destructor, if we are in
C++ land.
Just have a flag so it's not called twice. If the destructor decides to
make that object reachable (for instance using: global_pointer = this)
who cares. The destructor has been called, and won't be called a second
time. End of story.

And to point 3) I would add (a): called at most once, no matter what.
You'd not really want (2). At least, I don't. For example, if you have
an open file and leave it to GC to close it, then after a while you'll
have an arbitrary number of open files hanging around, waiting for GC to
close them -- if ever -- preventing access to those files both from

There is no connection between (2) and the problem you are describing.
If that happens, it means that the programmer made the mistake of
solely relying on the destructor to clean up that resource.

Under GC discipline, the destructor must be regarded only as a last
resort cleanup for these kinds of resources.

The object should have some alternate method for releasing the file or
whatever resource, and the program should call that method.

The destructor /should/ still release that resource if it still exists
by that time. Not only because that will plug a resource leak, but
because the C++ class should still be useable for RAII discipline, when
instances are defined in automatic storage.

If I have some LogFile object, I'm not going to call Close() on it if I
defined it in a block scope. I want the destructor to do that.

If I used new to dynamically allocate it, I don't want to wait for the
destructor to close the file. So I will call ptr->Close().
other programs and your own (IMO relying on GC for this is an error).

Yes, relying on GC for this is an error. But that doesn't mean GC
shouldn't call destructors. It means that the destructor cannot be the
only way for that cleanup to take place.
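
A sketch of that discipline, using the LogFile name from above (the rest
is assumed for illustration):

#include <cstdio>

class LogFile {
private:
    std::FILE* fp;
public:
    explicit LogFile(const char* name) : fp(std::fopen(name, "a")) {}

    void Close()                      // timely, explicit release
    {
        if (fp != 0) {
            std::fclose(fp);
            fp = 0;
        }
    }

    ~LogFile()
    {
        Close();                      // last resort; also what RAII relies on
    }
};

void block_scope_use()
{
    LogFile log("app.log");           // RAII: destructor closes the file here
}

void heap_use()
{
    LogFile* log = new LogFile("app.log");
    // ... use it ...
    log->Close();                     // don't wait for a collector
    delete log;                       // under GC this call could be omitted
}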

But, also, the situation is not necessarily a calamity. If the program
only loses track of a small number of such objects, maybe it's okay for
them to be reclaimed later. It's only if a sufficiently large number of
unreclaimed objects pile up, containing open handles, that it becomes a
problem.

Consider the short-lived program. It opens a bunch of files, does some
work and then dies. The file handles close on process death. GC didn't
even have a chance to run! Who cares.

Also, there could always be an API for invoking GC explicitly. An
application could call for a full GC at specific checkpoints to flush
out these objects. In some applications, that solution could be quite
adequate and easier than coding the explicit calls to close the
resource, in all the right places.

In Common Lisp there is a macro called WITH-OPEN-FILE. It opens a file,
binds it to a variable and establishes a scope for expressions where
that variable is known. No matter how that scope terminates, the file
is closed.

You don't have to use that macro: you can open the file directly using
the underlying API. Then you are on your own. If you don't close it, GC
will still do it. Only who knows when.
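
For comparison, a rough C++ counterpart of WITH-OPEN-FILE can be sketched
with a function template; with_open_file is an invented helper, and the
lambda in the usage line assumes C++11:

#include <fstream>
#include <string>

template <typename Body>
void with_open_file(const std::string& name, Body body)
{
    std::ofstream file(name.c_str());   // opened on entry
    body(file);                         // user code runs inside the scope
}                                       // closed here, however the scope exits

// usage (C++11 lambda):
//   with_open_file("out.txt", [](std::ofstream& f) { f << "hello\n"; });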

All that stuff works. Common Lisp programmers don't sit around
lamenting GC problems; everything that is discussed here is pretty much
a non-issue.

Instead of looking at Java, why not skip that hornet's nest and go
upstream to some of the places where it borrows some of its
misunderstood and misimplemented ideas?
 
