Garbage Collection in C

R

Richard Bos

jacob navia said:
There are pauses, but in modern workstations this is barely noticeable.

Famous last words. Never written a server program? Little bits add up.
Write in assembly. That's a *real* language :)

Wrong. Real programmers _do_ eat quiche. But they bake their own,
because you can't trust fastfood cooks not to give you one of those
sissy quiches Lorraine, or a thing with all fluff and no real meat.

Richard
 
W

William Hughes

jacob said:
There are pauses, but in modern workstations this is barely noticeable.

I do not understand. I thought that GC had to scan the entire
amount of memory used by the program. I regularly have programs
that use over a gibabyte of memory. How is this
scanned without a noticable pause?

- William Hughes
 
J

jacob navia

William said:
jacob navia wrote:




I do not understand. I thought that GC had to scan the entire
amount of memory used by the program. I regularly have programs
that use over a gibabyte of memory. How is this
scanned without a noticable pause?

- William Hughes

1) When you allocate data structures that have no pointers in it,
just data, you tell the GC that there are no pointers in it and that
it doesn't need to look for pointers in those areas. This is an enormous
improvement.

2)
You use 1GB, but if you allocate your data structures correctly,
(see above) the GC doesn't need to scan all of that. Scanning the
stack and following the roots in there, and scanning the data section
is not very time consuming. A large heap (1GB!)
can cause problems but you should be able to optimize away most of it.
 
W

William Hughes

jacob said:
1) When you allocate data structures that have no pointers in it,
just data, you tell the GC that there are no pointers in it and that
it doesn't need to look for pointers in those areas. This is an enormous
improvement.

But I am not allocating the memory. The memory is being allocated
by a library that I do not control. Even if I could determine where
the memory
was being allocated, I don't know that is contains only data
(knowing that it contains mostly data is not good enough).

- William Hughes
 
S

SM Ryan

# Destructors do more than releasing memory. And finalisation is not a
# good substitute for these tasks as finalisation is not synchronous.

If you always know exactly when your destructors are being
called, they're not really destructors--they're methods you
haven't named. Destructors were intended to solve the problem
if collecting anonymous temporaries and the bookkeeping of
collecting local variables and class variables. You know the
destructors will be called before the stack or heap is
overwritten, but the whole point of destructors was you
don't need to know the exact time or place they are called.

If I need to guarentee a program action synchronise to some
external concern, I would have a explicit method to do so
and call it explicitly.
 
S

SM Ryan

# I do not understand. I thought that GC had to scan the entire
# amount of memory used by the program. I regularly have programs
# that use over a gibabyte of memory. How is this
# scanned without a noticable pause?

Then don't use it.
 
J

jacob navia

William said:
But I am not allocating the memory. The memory is being allocated
by a library that I do not control. Even if I could determine where
the memory
was being allocated, I don't know that is contains only data
(knowing that it contains mostly data is not good enough).

- William Hughes

If you are not allocating memory, then you are not deallocating it
either. Then just use the library and be happy...

jacob
 
L

Laurent Deniau

SM said:
# Destructors do more than releasing memory. And finalisation is not a
# good substitute for these tasks as finalisation is not synchronous.

If you always know exactly when your destructors are being
called, they're not really destructors--they're methods you
haven't named. Destructors were intended to solve the problem
if collecting anonymous temporaries and the bookkeeping of
collecting local variables and class variables. You know the
destructors will be called before the stack or heap is
overwritten, but the whole point of destructors was you
don't need to know the exact time or place they are called.

If I need to guarentee a program action synchronise to some
external concern, I would have a explicit method to do so
and call it explicitly.

Then you are not familliar with RAII (Ressource Acquisition Is
Initialization):

{ // code to synchronize
Lock lock();

// user code

} // destructor is called and unlock lock.

This has nothing to do with memory leaks but with potential deadlocks if
an exception is thrown in the user code.

a+, ld.
 
W

William Hughes

jacob said:
If you are not allocating memory, then you are not deallocating it
either. Then just use the library and be happy...

But the library requires the existence of a free() function at link
time. If I link to a null-op free() then nobody deallocates
memory in the library unless the GC does, in which case
the GC has to scan the whole heap.

O.K. Suppose we solve this problem somehow (we link
the library with another malloc/free and tell the GC to
ignore anything done by the library.) I still have the
problem that if I allocate a large piece of memory I cannot
(now or in the future) store a pointer anywhere in it.
Besides being a maintenance nightmare, this means that
I cannot change from storing a data structure that does
not contain pointers to one that does.

- William Hughes
 
J

Jean-Marc Bourguet

SM Ryan said:
# Destructors do more than releasing memory. And finalisation is not a
# good substitute for these tasks as finalisation is not synchronous.

If you always know exactly when your destructors are being
called, they're not really destructors--they're methods you
haven't named. Destructors were intended to solve the problem
if collecting anonymous temporaries and the bookkeeping of
collecting local variables and class variables. You know the
destructors will be called before the stack or heap is
overwritten, but the whole point of destructors was you
don't need to know the exact time or place they are called.

If I need to guarentee a program action synchronise to some
external concern, I would have a explicit method to do so
and call it explicitly.

Destructor were invented to release *ressources* when they are no more
needed. Memory is probably the most common of the ressources freed by
destructors, but it's not the only one. In C++, we rely on destructor to
free other ressources which need to be freed whatever the way the scope is
left (so there is usually no need to put catch clauses to free the
ressources in case of exceptions). Some of these ressources need to be
freed synchronously, a mutex is probably the canonical example of such kind
of ressources. That's so common that there is a name for this idiom: RAII
(Ressource Acquisition Is Initialization).

Another use of destructors is to notify other objects of the
destruction... this use can sometimes be replaced by the use of weak
references in conjunction with a GC, sometimes not.

Yours,
 
R

Richard Tobin

William Hughes said:
I do not understand. I thought that GC had to scan the entire
amount of memory used by the program. I regularly have programs
that use over a gibabyte of memory. How is this
scanned without a noticable pause?

There are over 40 years of research on the subject, so it's not
surprising that this problem has been largely overcome.

In particular, many modern garbage collectors avoid scanning the whole
of memory, except perhaps very rarely.

-- Richard
 
W

William Hughes

Richard said:
There are over 40 years of research on the subject, so it's not
surprising that this problem has been largely overcome.

In particular, many modern garbage collectors avoid scanning the whole
of memory, except perhaps very rarely.

How can this be done in a C context without providing the GC with
extra information (e.g. I am not going to use this bit of
memory to store pointers)? True, a scan at any given
time could insure that certain areal of memory do
not contain pointers. But, given that a program may well
write frequently to a large area of memory, how does the GC
know that this memory does not "now" contain pointers. Scan
on write might aleviate the problem somewhat, but only somewhat,
given that most or all of the memory may be overwritten.
(And how could one would implement "scan on write" without slowing
the program to a crawl?)

- William Hughes
 
J

jacob navia

William said:
How can this be done in a C context without providing the GC with
extra information (e.g. I am not going to use this bit of
memory to store pointers)? True, a scan at any given
time could insure that certain areal of memory do
not contain pointers. But, given that a program may well
write frequently to a large area of memory, how does the GC
know that this memory does not "now" contain pointers. Scan
on write might aleviate the problem somewhat, but only somewhat,
given that most or all of the memory may be overwritten.
(And how could one would implement "scan on write" without slowing
the program to a crawl?)

- William Hughes
Boehm's GC is incremental. Each time you make a GC_malloc call, it will
do a bit of garbage collection, so the full pause is less than it
might be.

Yes, it implies that you design your data structures to avoid mixing
data and pointers. This is not very difficult to do.
 
K

Keith Thompson

jacob navia said:
Boehm's GC is incremental. Each time you make a GC_malloc call, it will
do a bit of garbage collection, so the full pause is less than it
might be.

Yes, it implies that you design your data structures to avoid mixing
data and pointers. This is not very difficult to do.

I see, so there's *another* restriction (though not an absolute one in
this case) that you haven't mentioned until now.

So if I've declared a structure consisting of a large array of doubles
and a single pointer (say, for a linked list), then I need to re-write
my code to use it with GC?
 
R

Richard Tobin

Yes, it implies that you design your data structures to avoid mixing
data and pointers. This is not very difficult to do.
[/QUOTE]
So if I've declared a structure consisting of a large array of doubles
and a single pointer (say, for a linked list), then I need to re-write
my code to use it with GC?

You certainly don't have to do that to use the Boehm collector.

If you wish to improve the performance of the garbage collector, you
can use various functions instead of the usual malloc(). In particular,
you can use GC_MALLOC_ATOMIC() to allocate space that will definitely
not contain any pointers. Presumably that's what Jacob is referring to.

If you're interested, there's lots of information at

http://www.hpl.hp.com/personal/Hans_Boehm/gc/

-- Richard
 
W

William Hughes

So if I've declared a structure consisting of a large array of doubles
and a single pointer (say, for a linked list), then I need to re-write
my code to use it with GC?

You certainly don't have to do that to use the Boehm collector.
[/QUOTE]

Well, it will work, but it might take a *lot* of time.
If you wish to improve the performance of the garbage collector, you
can use various functions instead of the usual malloc(). In particular,
you can use GC_MALLOC_ATOMIC() to allocate space that will definitely
not contain any pointers.

But the person who has to insure that the space will not contain
any pointers is me. To do this I have to allocate different areas
for pointers and data. To do this I have to avoid mixing data
and pointers in the first place. Then, I have to go through
my code and change the appropriate malloc calls
to GC_MALLOC_ATOMIC(). Sounds like rewriting code
to me.

- William Hughes
 
R

Richard Tobin

If you wish to improve the performance of the garbage collector, you
can use various functions instead of the usual malloc(). In particular,
you can use GC_MALLOC_ATOMIC() to allocate space that will definitely
not contain any pointers.
[/QUOTE]
But the person who has to insure that the space will not contain
any pointers is me. To do this I have to allocate different areas
for pointers and data. To do this I have to avoid mixing data
and pointers in the first place. Then, I have to go through
my code and change the appropriate malloc calls
to GC_MALLOC_ATOMIC(). Sounds like rewriting code
to me.

I explained that you don't have to change your code to use the garbage
collector, but you can to make it more efficient. Why do you restate
this as if you're disagreeing with me?

By the way, in most programs you can use GC_MALLOC_ATOMIC() without
changing your data structures. Just use it for the allocations that
would be pointer-free anyway, such as strings and arrays of numbers.

-- Richard
 
W

William Hughes

But the person who has to insure that the space will not contain
any pointers is me. To do this I have to allocate different areas
for pointers and data. To do this I have to avoid mixing data
and pointers in the first place. Then, I have to go through
my code and change the appropriate malloc calls
to GC_MALLOC_ATOMIC(). Sounds like rewriting code
to me.

I explained that you don't have to change your code to use the garbage
collector, but you can to make it more efficient. Why do you restate
this as if you're disagreeing with me?[/QUOTE]

Because although it will "work" without being more efficient, it may
have very long pauses, and may be so slow as to be useless.
You do not have to rewrite code to get it to work, but you
may have to rewrite code to get it to work usefully. (And worse, you
may be unable to rewrite code to get it to work usefully)
By the way, in most programs you can use GC_MALLOC_ATOMIC() without
changing your data structures. Just use it for the allocations that
would be pointer-free anyway, such as strings and arrays of numbers.

And if you have large data structures that do contain pointers
you are SOL. Being told that "most programs" do not have this problem
is cold comfort.

And this does not address the problem of memory allocation in
libraries not under your control (no you cannot simply link them
with another malloc/free as you will then have two mallocs working
on the same address space and same physical memory).

- William Hughes
 
J

jacob navia

Richard said:
So if I've declared a structure consisting of a large array of doubles
and a single pointer (say, for a linked list), then I need to re-write
my code to use it with GC?


You certainly don't have to do that to use the Boehm collector.

If you wish to improve the performance of the garbage collector, you
can use various functions instead of the usual malloc(). In particular,
you can use GC_MALLOC_ATOMIC() to allocate space that will definitely
not contain any pointers. Presumably that's what Jacob is referring to.

If you're interested, there's lots of information at

http://www.hpl.hp.com/personal/Hans_Boehm/gc/

-- Richard[/QUOTE]

Well, yes. Mr Hughes was speaking about an application using
1GB RAM. In those cases optimization could be necessary.
GC_MALLOC_ATOMIC() optimizes the GC scan.
 
J

jacob navia

William said:
Well, it will work, but it might take a *lot* of time.




But the person who has to insure that the space will not contain
any pointers is me. To do this I have to allocate different areas
for pointers and data. To do this I have to avoid mixing data
and pointers in the first place. Then, I have to go through
my code and change the appropriate malloc calls
to GC_MALLOC_ATOMIC(). Sounds like rewriting code
to me.

- William Hughes

It is rewriting code. If you have an application that uses 1GB
of RAM, it is normal that you optimize RAM usage and the GC
if you use it.

Since you have said that the allocations do not happen
under your control, you shouldn't use the GC for that application
in my opinion. Your third party library is working, and since
you do not have the source code you are stuck with it.

The GC is not magic, and it will save you some work but there
is a minimal effort from your side to be done. If you do not
want to do ANY effort just go on using your system as it is.

We are not gaining any money with the GC, so if you do not use
it we do not lose any "sale" !!!

jacob
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,769
Messages
2,569,576
Members
45,054
Latest member
LucyCarper

Latest Threads

Top