Ron said:
That's the reason finalisers aren't the equivalent of destructors, and
The general reason why destructors aren't the same thing as finalizers
is that they live in different programming languages.
the reason I ask if destructors can be combined with GC - without being
replaced by finalisers.
For example, is it possible to write a GC that cleans up immediately the
last pointer disappears, rather than in a general sweep, and still be
acceptably efficient?
So your definition of a destructor appears to be: something which runs
just after the last pointer disappears, whereas a finalizer is something
that runs just before the object is scavenged and re-entered into the
free store.
The problem is that the C++ destructor meets the latter definition a
lot more closely than the former.
Why is the moment when the program erases its last pointer to an object
interesting?
I would argue that an interesting event is when the program loses the
last pointer to an object from an interesting subset of all the
pointers to that object.
Pointers which are not in that set behave semantically like weak
pointers.
For instance, suppose you have implemented some cache of objects from
which they expire based on some aging scheme. That cache has pointers
to the objects, of course. However, when /only/ that cache has a
pointer to some object, then, semantically, that object is practically
as good as garbage. If nobody grabs it before its time is up, it
will be scavenged.
Moreover, certain actions may have to be taken when the object is
entered into the cache to be expired.
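A minimal sketch of such a cache, in Python for brevity. The class name `AgingCache`, the tick-based aging and the `on_enter`/`on_expire` hooks are all hypothetical: the cache holds ordinary strong references, and an object that only the cache holds survives just until it ages out, unless someone takes it back first.

```python
class AgingCache:
    """Hypothetical cache of strong references that expire by age.

    While an object is held only by this cache it is "practically garbage":
    unless someone grabs it with take() within max_age ticks, tick() drops
    the cache's reference and the object becomes unreachable.
    """

    def __init__(self, max_age, on_enter=None, on_expire=None):
        self.max_age = max_age
        self.on_enter = on_enter      # action taken when an object enters the cache
        self.on_expire = on_expire    # action taken when an entry ages out
        self._entries = {}            # key -> [obj, age in ticks]

    def put(self, key, obj):
        if self.on_enter is not None:
            self.on_enter(obj)
        self._entries[key] = [obj, 0]

    def take(self, key):
        """Grab an object back before it expires; None if it is already gone."""
        entry = self._entries.pop(key, None)
        return None if entry is None else entry[0]

    def tick(self):
        """Age every entry by one tick and drop those older than max_age."""
        expired = []
        for key, entry in self._entries.items():
            entry[1] += 1
            if entry[1] > self.max_age:
                expired.append(key)
        for key in expired:
            obj, _ = self._entries.pop(key)
            if self.on_expire is not None:
                self.on_expire(obj)
        return expired


log = []
cache = AgingCache(max_age=2,
                   on_enter=lambda o: log.append(("enter", o)),
                   on_expire=lambda o: log.append(("expire", o)))
cache.put("a", "object-a")
cache.put("b", "object-b")
cache.tick()
rescued = cache.take("a")   # someone grabs "a" before its time is up
cache.tick()
cache.tick()                # "b" is now three ticks old and expires
```

Only the cache's own bookkeeping decides when "b" dies; the reference it holds carries weak-pointer semantics in everything but name.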
These kinds of schemes are found in C++ programs.
They are sometimes combined with reference counting too. I remember
working on systems where it was known that a certain module held a
reference to an object. Therefore, some cleanup actions were triggered
on, guess what, the 2 -> 1 refcount transition. A notification was sent
out through the framework, and then that special module would drop
/its/ reference, which would trigger the destructor and delete.
So in other words, the 2 -> 1 refcount transition was the real disappearance of
the object, and that module effectively held only a weak pointer. Of
course there was nothing manifestly different about the reference that
it held; it was all in the semantics.
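That arrangement can be sketched with a hand-rolled, intrusive reference count. This is a hypothetical reconstruction, not the framework described above: `RefCounted`, `HolderModule` and the callback names are invented. The 2 -> 1 transition fires a notification, the known module reacts by dropping its own reference, and the 1 -> 0 transition plays the role of destructor plus delete.

```python
class RefCounted:
    """Hypothetical intrusively refcounted object.

    on_last_but_one fires on the 2 -> 1 transition (the object's real
    "disappearance"); on_destroy fires on 1 -> 0 (destructor plus delete).
    """

    def __init__(self, name, on_last_but_one, on_destroy):
        self.name = name
        self.count = 1                        # the creator's reference
        self._on_last_but_one = on_last_but_one
        self._on_destroy = on_destroy

    def retain(self):
        self.count += 1
        return self

    def release(self):
        self.count -= 1
        if self.count == 1:
            self._on_last_but_one(self)
        elif self.count == 0:
            self._on_destroy(self)


events = []


class HolderModule:
    """The special module known to hold one reference (hypothetical)."""

    def __init__(self):
        self.held = {}

    def hold(self, obj):
        self.held[obj.name] = obj.retain()

    def notify_abandoned(self, obj):
        # Cleanup triggered by the 2 -> 1 transition, then drop our reference.
        events.append(("cleanup", obj.name))
        del self.held[obj.name]
        obj.release()


module = HolderModule()
obj = RefCounted("x",
                 on_last_but_one=module.notify_abandoned,
                 on_destroy=lambda o: events.append(("destroy", o.name)))
module.hold(obj)      # count 1 -> 2
obj.release()         # 2 -> 1: notification; module drops its ref: 1 -> 0
```

Nothing distinguishes the module's reference from any other; only the convention that it is the last one standing makes it behave like a weak pointer.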
Aren't destructors being used for finalization in these situations?
Formally, what is a finalizer? It is a special entry created in the
memory manager which holds a weak pointer to some object, and a
function which is to be called when /only/ weak pointers to the object
remain, prior to that object being reclaimed.
That function itself is sometimes called a finalizer, transitively.
There is no reason why a C++ destructor cannot serve as that function.
That's what it's for: to perform the last rites on an object.
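Python's standard library happens to expose exactly this kind of entry: `weakref.finalize(obj, func, *args)` registers a weak reference to `obj` together with a function to call once `obj` is about to be reclaimed. A minimal sketch follows; `Resource` and `Owner` are hypothetical stand-ins for a handle and the object encapsulating it. One difference from a C++ destructor: the callback must not refer to the owner itself (that would keep it alive), so it is handed the resource instead.

```python
import gc
import weakref

released = []


class Resource:
    """Hypothetical handle; release() stands in for closing it."""

    def __init__(self, name):
        self.name = name

    def release(self):
        released.append(self.name)


class Owner:
    """Object O that encapsulates a resource R."""

    def __init__(self, name):
        self.resource = Resource(name)
        # The finalizer entry: a weak reference to self plus a function.
        # The bound method belongs to the Resource, not to self, so the
        # entry does not keep the owner alive.
        self._finalizer = weakref.finalize(self, self.resource.release)


o = Owner("file-1")
entry = o._finalizer
was_alive = entry.alive
del o            # last strong reference gone
gc.collect()     # not needed under CPython refcounting; forces a sweep elsewhere
```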
I think what you're asking for is to have an additional notification
when the program loses its last (non-weak) reference to an object.
But computing that notification is the same thing as knowing that the
object's lifetime has ended. You call the destructor, and so you might
as well re-enter the object into the free storage. Once the destructor
has run, the object is reduced to just being memory. You can't do
anything with it, and so there is no point in registering any other
function on it.
Therefore, it doesn't make sense to want 'destructors /and/
finalizers'. It does make sense to have the choice to have delayed
/and/ timely finalization, for different objects.
To make use of that choice, it's necessary to have weak pointers. To
have a resource cleaned up in a timely way, the program has to be able
to indicate that certain references to the encapsulating object are
weak.
Then if the program enables that "just in time" garbage collection, it
will get the finalization trigger as soon as the last non-weak
reference is lost. The object is destroyed and all of the weak
references lapse into null values.
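CPython is a convenient place to watch both behaviours side by side, under the assumption that the reference-counting interpreter is in use: an acyclic object is finalized the moment its last strong reference goes, and every weak reference to it immediately yields `None`; an object kept alive only by a reference cycle waits for the tracing collector. The helper `make_tracked` is hypothetical.

```python
import gc
import weakref


class Obj:
    """Plain object; CPython reclaims it by reference counting."""


finalized = []


def make_tracked(name):
    obj = Obj()
    weakref.finalize(obj, finalized.append, name)
    return obj


# Acyclic object: finalized as soon as its last strong reference is lost.
a = make_tracked("a")
weak_views = [weakref.ref(a) for _ in range(3)]
del a
timely = list(finalized)            # snapshot before any collector sweep

# Cyclic object: its refcount never reaches zero, so finalization is delayed.
gc.disable()                        # keep the cycle collector from running early
b = make_tracked("b")
b.self_ref = b
del b
before_sweep = list(finalized)
gc.collect()                        # explicit sweep still runs while disabled
gc.enable()
after_sweep = list(finalized)
```

The weak references to `a` lapse to `None` at the same instant the finalizer runs; nothing about `b` happens until the sweep.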
This still leaves the problem of what to do if the program design wants
to only have the resource cleaned up, not the entire object: to have
the resource cleaned up at some point when the object is still
reachable by ordinary non-weak references. The object continues to be
used without that resource. Then the program is on its own anyway.
What it boils down to is this. You have some object O which holds a
resource handle R. The object O is shared by multiple references to it.
A module can either hold a reference to O, or not hold a reference to
O. Only through its reference to O can a module express its interest in
the resource R. The problem is that holding or not holding a reference
is only a boolean value: interest yes, or interest no. But there are
two entities there, O and R. The boolean interest value doesn't hold
enough information to express interest in these two separately.
What the program needs is a way to express two kinds of strong
references:
- references which express interest in O with or without the resource
R.
- references which express interest in O with the resource R.
Now when there are no more references of the second kind, when the only
references to O that remain are weak references and strong references
which do not care whether R is valid, the resource R can be deallocated
at that point.
How can you implement such a scheme? One way is to implement the second
type of reference, expressing, "interest in O with the resource R",
using reference counting. What you can do is allocate a second object,
call it P, which holds a reference count, a reference to O, and a
method to invoke when the count hits zero. Modules which are interested
in R manipulate pointers to P instead of O. By holding references to P
instead, they express a special interest in O related to the meaning of
the reference count in P.
Modules manipulating P take care to manage the reference count among
themselves: when they copy the pointer, they bump up the refcount, and
when they erase it, they drop the refcount.
When the refcount hits zero, the special method is run on P, which does
something with object O: namely, it destroys resource R, and replaces
that handle with an invalid value, an action that can be encapsulated
in a method on O.
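A minimal sketch of the P scheme, assuming a single-threaded program and explicit `retain`/`release` calls standing in for copying and erasing pointers. The names `ResourceHandle`, `drop_resource`, `retain` and `release` are hypothetical. Plain references to O say nothing about R; only the P count decides when R goes.

```python
class ResourceHandle:
    """Hypothetical resource R; release() models freeing it."""

    def __init__(self, name):
        self.name = name
        self.valid = True

    def release(self):
        self.valid = False


class O:
    """Shared object O holding resource handle R."""

    def __init__(self, name):
        self.r = ResourceHandle(name)

    def drop_resource(self):
        # Destroy R and replace the handle with an invalid value (None).
        if self.r is not None:
            self.r.release()
            self.r = None


class P:
    """Interest in O *with* R, tracked by an explicit reference count."""

    def __init__(self, o, on_zero):
        self.o = o
        self.count = 1            # the creator holds the first reference
        self._on_zero = on_zero   # method to invoke when the count hits zero

    def retain(self):             # call when copying a pointer to P
        self.count += 1
        return self

    def release(self):            # call when erasing a pointer to P
        self.count -= 1
        if self.count == 0:
            self._on_zero(self.o)


o = O("socket")
plain_ref = o                         # interest in O, with or without R
handle = o.r
p = P(o, on_zero=O.drop_resource)     # module A: interest in O with R
p_copy = p.retain()                   # module B copies the pointer to P
p.release()                           # module A is done
still_valid = o.r is not None and o.r.valid
p_copy.release()                      # module B is done: count hits zero
```

After the last `release`, O is still reachable through `plain_ref`; it has simply lost R.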
So provided that the refcounts are accurately managed, and the
appropriate kind of reference is used by every module to express the
correct kind of interest, the resource will be accurately managed.
No help is needed from the garbage collector; it can delay
finalization.
Or, instead of refcounting, a special memory arena could be used for
these P objects, an arena where garbage collection is greedy: it tracks
the lifetime of these objects accurately. When all the references to a P
object disappear, it is reclaimed right away. Its finalizer action
performs P's cleanup, releasing the resource R held by O. So the effect
is like that of reference counting.
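CPython's reference counting can stand in for such a greedy arena, which is an assumption of this sketch: a P object is reclaimed the moment its last reference disappears, so a `weakref.finalize` entry on P releases R in O with no manual bookkeeping. Copying a pointer to P is just an ordinary assignment. Class and method names are hypothetical, as before.

```python
import weakref


class ResourceHandle:
    """Hypothetical resource R."""

    def __init__(self, name):
        self.name = name
        self.valid = True

    def release(self):
        self.valid = False


class O:
    """Shared object O holding resource handle R."""

    def __init__(self, name):
        self.r = ResourceHandle(name)

    def drop_resource(self):
        if self.r is not None:
            self.r.release()
            self.r = None


class P:
    """Interest in O with R, tracked by the collector instead of by hand."""

    def __init__(self, o):
        self.o = o
        # Finalizer on P: when P becomes unreachable, release R in O.
        # The callback refers to O, never to P, so P can still die.
        weakref.finalize(self, o.drop_resource)


o = O("socket")
handle = o.r
p = P(o)
p_copy = p                 # copying the pointer needs no refcount bump
del p
valid_while_copy_exists = o.r is not None
del p_copy                 # last reference to P: reclaimed immediately
```

Under a delaying collector the same code would still be correct, only the release of R would move to the next sweep, which is exactly why the arena has to be greedy.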
These P objects could hold references to each other, denoting
hierarchies of interest. Suppose that object P1 expresses interest
related to resource R1 in object O. Object P2 expresses interest related
to resource R2 in that object. How do you express total interest in O
having both resources? By a third object P12 which holds a reference to
O, and also to P1 and P2. The cleanup method of P12 does nothing with
O. All it does is null out P12's references to P1 and P2. If doing so
erases the last reference to P1, then P1's cleanup action is triggered,
releasing the resource O.R1. Likewise for P2.
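The hierarchy can be sketched with the refcounted P from before, generalized so that each P carries an arbitrary cleanup action. The helpers `interest_in` and `interest_in_both` are hypothetical names for building P1/P2 and P12. P12's cleanup touches only its own references to P1 and P2; the cascade to O's resources falls out of their counts.

```python
class ResourceHandle:
    """Hypothetical resource handle."""

    def __init__(self, name):
        self.name = name
        self.valid = True

    def release(self):
        self.valid = False


class O:
    """Object O holding several named resource handles."""

    def __init__(self, *names):
        self.resources = {name: ResourceHandle(name) for name in names}

    def drop_resource(self, name):
        handle = self.resources.get(name)
        if handle is not None:
            handle.release()
            self.resources[name] = None   # invalid value in place of the handle


class P:
    """Refcounted interest object; cleanup() runs when the count hits zero."""

    def __init__(self, o, cleanup):
        self.o = o
        self.count = 1
        self._cleanup = cleanup

    def retain(self):
        self.count += 1
        return self

    def release(self):
        self.count -= 1
        if self.count == 0:
            self._cleanup()


def interest_in(o, name):
    """P1 or P2: interest in O together with one named resource."""
    return P(o, lambda: o.drop_resource(name))


def interest_in_both(o, p1, p2):
    """P12: interest in O with both resources, expressed through P1 and P2."""
    held = [p1.retain(), p2.retain()]

    def cleanup():
        # Does nothing with O; it only drops P12's references to P1 and P2.
        while held:
            held.pop().release()

    return P(o, cleanup)


o = O("R1", "R2")
p1 = interest_in(o, "R1")
p2 = interest_in(o, "R2")
p12 = interest_in_both(o, p1, p2)   # P1 and P2 now have count 2
p1.release()                        # drop the direct references;
p2.release()                        # P12 still holds both
both_alive = all(h is not None for h in o.resources.values())
p12.release()                       # last reference to P12 cascades to P1, P2
```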