If it's mathematically impossible for a garbage collector to find every
unused object, I certainly wasn't told about it when I took my compilers
class a couple of years ago.
It depends on how you define unused.
Actually a GC should remove unusable objects, i.e. objects which are
no longer reachable, and therefore can no longer affect future
computations.
Of course no GC can predict whether or not a reachable object will
ever be used in the future. So 'memory leaks' are possible even with a
GC. One of, but certainly not the only, reasons that global variables
are frowned upon is that they are opportunities to accumulate
reference chains to objects which otherwise should be GCed.
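Here's a minimal sketch of that kind of leak (the globals and method names are my own illustration, not from any real codebase): everything pushed onto the global stays reachable, so the GC can never touch it even though the program will never look at it again.

```ruby
# Hypothetical cache: a global that silently accumulates references.
$request_log = []

def handle_request(payload)
  result = payload.upcase    # some work producing a result
  $request_log << payload    # oops: the global now pins the payload
  result
end

1000.times { |i| handle_request("request #{i}") }

# All 1000 strings are still reachable through $request_log, so from
# the GC's point of view none of them are garbage -- a 'leak' in a
# garbage-collected language.
puts $request_log.size
```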
On the other hand the worst thing a GC can do is to reclaim objects
which ARE reachable. That's why the Ruby GC doesn't reclaim objects
which are reachable via references on an execution stack. Someone
mentioned this in another post to this thread.
Reference counting was probably the first GC algorithm. It's safe in
the sense that it doesn't free reachable objects, but it can't free
unreachable objects which form a self-referencing set (a cycle).
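A toy sketch makes the cycle problem concrete (this is my own illustration of the bookkeeping, not any real collector): once two objects point at each other, each holds the other's count at one forever, even after the rest of the program forgets them.

```ruby
# Toy reference-counting objects: a count plus one outgoing reference.
class Counted
  attr_accessor :refcount, :ref

  def initialize
    @refcount = 0
    @ref = nil
  end
end

# Point an object at a target, maintaining the count by hand.
def set_ref(obj, target)
  target.refcount += 1 if target
  obj.ref = target
end

a = Counted.new
b = Counted.new
set_ref(a, b)   # a -> b : b.refcount is now 1
set_ref(b, a)   # b -> a : a.refcount is now 1, cycle closed

# Even with no outside pointers left, each object still has a count
# of 1 because of the cycle, so a pure refcounter can never free them.
```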
AIUI, Java uses mark-and-sweep garbage collection just like Ruby, and both
can handle circular references. The downside to mark-and-sweep is that a
collection can take a while at unpredictable points in the program, so it's
not advisable for real-time programming, and it can cause noticeable
pauses in interactive programs.
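For comparison with the refcounting discussion below, here's a toy mark-and-sweep sketch (my own illustration, nothing like MRI's actual collector): marking traces everything reachable from the roots, and sweeping frees whatever was never marked, which is exactly why cycles are no problem.

```ruby
# A heap "node" has outgoing references and a mark bit.
Node = Struct.new(:children, :marked)

# Mark phase: trace everything reachable from a node.
def mark(node)
  return if node.nil? || node.marked
  node.marked = true
  node.children.each { |c| mark(c) }
end

# Sweep phase: partition the heap into survivors and garbage,
# clearing mark bits for the next collection.
def sweep(heap)
  live, dead = heap.partition(&:marked)
  live.each { |n| n.marked = false }
  [live, dead]
end

def gc(heap, roots)
  roots.each { |r| mark(r) }
  sweep(heap)
end

a = Node.new([], false)
b = Node.new([a], false)
c = Node.new([], false)
c.children << c            # a self-referencing cycle...
heap = [a, b, c]
live, dead = gc(heap, [b])
# ...which mark-and-sweep handles fine: c is unreachable from the
# roots, so it gets swept despite pointing to itself.
```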
It depends on the implementation, but most Java VMs use a form of
generational garbage collection. This was first invented by Dave
Ungar for the UC Berkeley implementation of Smalltalk; Ungar observed
that most objects either quickly become dead (unreachable) or live a
long time. His original 'generation scavenging' GC algorithm
allocates new objects in a small space called newspace, and when
newspace fills, it moves the live objects in newspace to a fresh
newspace, leaving the others behind. After an object survives a
certain small number of generations, usually 2**n for a small n (like
2 or 3), it gets tenured to a larger "old space". Old space gets
infrequently reclaimed by another algorithm, often mark-sweep. It's
called scavenging because, rather than freeing unused new objects, it
scavenges the live ones and then throws away wholesale those left in
the old newspace.
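The mechanism can be sketched in a few lines (my own simplification: real scavengers trace roots and copy memory, while here the caller just says which objects are live, and the tenure threshold is an assumed constant):

```ruby
# Each newspace cell tracks how many scavenges its object has survived.
Cell = Struct.new(:obj, :survivals)

TENURE_AFTER = 4   # 2**2; an assumed threshold for illustration

class Scavenger
  attr_reader :newspace, :oldspace

  def initialize
    @newspace = []
    @oldspace = []
  end

  def allocate(obj)
    @newspace << Cell.new(obj, 0)
  end

  # `live` stands in for the set of objects reachable from the roots.
  def scavenge(live)
    survivors = @newspace.select { |c| live.include?(c.obj) }
    survivors.each { |c| c.survivals += 1 }
    tenured, young = survivors.partition { |c| c.survivals >= TENURE_AFTER }
    @oldspace.concat(tenured)   # long-lived objects move to old space
    @newspace = young           # everything left behind is reclaimed
  end
end

gc = Scavenger.new
gc.allocate(:a)
gc.allocate(:b)
gc.allocate(:c)
gc.scavenge([:a, :b])          # :c dies young -- left behind, reclaimed
3.times { gc.scavenge([:a]) }  # :a survives repeatedly and gets tenured
```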
Generational scavenging can also (though not necessarily) have nice
properties in virtual memory systems, since it tends to keep young
objects close together, which tends to keep the working set smaller.
On the downside, these types of GC algorithms mean that the address of
an object changes over its lifetime, so the API for what Ruby calls
extensions, and what Smalltalk calls primitives (i.e. code written in
C or another low-level language which deals with objects), has to deal
with volatile object pointers.
It also complicates implementing finalization, since dead objects in
newspace are reclaimed by being left behind, which makes detecting
them more expensive.
It also means that objects holding other resources (like file handles)
may keep those resources open indefinitely if they are not manually
closed. Ruby handles this by allowing you to use a block to scope the
use of such resources, such as
open("file") {|f| ... } # after the block, f is now closed.
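You can build the same pattern for your own resources. This is a sketch of the idiom in plain Ruby (`with_open` is a hypothetical name; Ruby's real `File.open` behaves this way but is implemented in C): the `ensure` clause guarantees the handle closes whether the block finishes normally or raises.

```ruby
# Block-scoped resource: open, yield, and always close.
def with_open(path)
  f = File.open(path)
  begin
    yield f
  ensure
    f.close   # runs on normal exit and on exceptions alike
  end
end
```

Usage is the same as the built-in: `with_open("file") { |f| f.read }` hands you an open handle and closes it the moment the block ends.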
Of course this is an issue regardless of the GC algorithm; it really
relates to what I said earlier about collecting objects potentially
still in use being the cardinal sin for a GC.
Actually the nice thing about Ruby's use of blocks for things like
IO#open is that it nicely handles a lot of cases for which
finalization is used in other languages, and finalization is best used
sparingly since it only gets triggered when and if an object is GCed.
Relying on finalization to free up non-object resources (e.g. to close
files) is not the best practice in general.
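For completeness, Ruby does expose finalization via `ObjectSpace.define_finalizer` (a real API; the `Handle` class here is my own illustration). Note the gotcha: the finalizer proc must not capture the object itself, or the object stays reachable and is never collected at all.

```ruby
class Handle
  def initialize(name)
    # Build the proc in a class method so it doesn't close over
    # `self`; a finalizer that references its own object pins it
    # and then never runs.
    ObjectSpace.define_finalizer(self, self.class.make_finalizer(name))
  end

  def self.make_finalizer(name)
    # Runs when (and only if) the object is actually collected --
    # which is exactly why it's a poor place to release resources.
    proc { |object_id| warn "handle #{name} was never explicitly closed" }
  end
end

h = Handle.new("log")
```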
Reference counting (used by Perl) keeps a count of how many pointers there
are to each object and frees an object when its reference count reaches
zero. This can't free circular references, because every object in the
circle has a pointer to it from something else in the circle. The solution
when using this kind of system is to keep circular structures encapsulated
in some other object which either breaks the cycle when its own reference
count drops to zero (in Perl, where all pointers are reference counted),
or uses uncounted pointers for the circular structure and takes care of
disposing of those objects itself (in C++, where reference counting
requires you to create a special pointer class like a ref_ptr<T>). The
advantages of this method are that you know exactly when your objects are
disposed, it works well across processes (for example COM uses it), and it
can be mixed with uncounted code quite easily (as I mentioned regarding
C++).
While mark and sweep does tend to 'stop the world', reference counting
is actually more expensive because of the amount of bookkeeping it
requires. Every assignment requires an increment of the ref count of
the object now referenced, a decrement of the ref count of the object
formerly referenced, and a check to see if that count has gone to
zero.
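Spelled out, every counted store looks something like this (a sketch with names of my own invention; real systems do this in the runtime, not in user code):

```ruby
# A counted object: starts at 1 for the reference that created it.
class RC
  attr_accessor :count
  def initialize
    @count = 1
  end
end

$freed = []

def rc_free(obj)
  $freed << obj   # stand-in for returning memory to the allocator
end

# `holder[slot] = new_value`, with the refcount bookkeeping the
# post describes: an increment, a decrement, and a zero check.
def assign(holder, slot, new_value)
  new_value.count += 1 if new_value   # retain the new referent
  old = holder[slot]
  holder[slot] = new_value
  if old
    old.count -= 1                    # release the old referent
    rc_free(old) if old.count.zero?   # and maybe free it
  end
end

box = { value: nil }
x = RC.new                 # count 1 (the creating reference)
assign(box, :value, x)     # count 2
x.count -= 1               # creating reference goes away: count 1
assign(box, :value, nil)   # count 0 -> freed
```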
There are also incremental variants of mark-and-sweep, which combine
its lower overall overhead with the ability to amortize the impact
over time.