Andre - A response to your post in the "C# memory problem: no end for our problem ?" thread


Andreas Suurkuusk

Hi,

I just noticed your post in the "C# memory problem: no end for our problem?"
thread.
In the post you implied that I do not know how the garbage collector works
and that I mislead people. Since the thread is over a month old, I decided
to start a new one with my response.

Please see my comments inline.

Andre said:
You're right about the threshold.. however it has nothing to do with the
processors cache size and is saying "the amount of physical memory" is
not entirely correct.. read Jones and Lins "Garbage Collection -
Algorithms for automatic dynamic memory management" for more information

I have not read the book you're recommending, but I can't understand how
this book can tell how the "generational garbage collector implemented by
the CLR" determines the threshold values. I do not know for sure that the
threshold is dependent on the processor cache size, but I've seen it
mentioned in articles written about the GC of the CLR and I think it
makes sense. The size of the generation #0 threshold is initially about 160 KB
(as far as I remember) and this value is of course dynamically updated after
a GC.
I suspect that the book you're referring to gives some algorithm for
determining the threshold value, but it's up to the implementor to use
whatever input parameters they see fit (physical memory, cache size,
number of surviving instances, etc.).
A simple test can prove this:

Add the following code to a .NET program.

Random rnd = new Random();
object[] data = new object[10001];
while( true )
{
    for( int i = 0; i < 10000; i++ )
    {
        switch( rnd.Next( 3 ) )
        {
            case 0:
                data[i] = "This is test" + i;
                data[i+1] = "This is test" + (i+1);
                break;
            case 1:
                data[i] = new int[10];
                break;
            case 2:
                data[i] = new object();
                break;
        }
    }
}

When inserting the above code in the Main method of a console application,
the managed heap will use between 400 KB and 5000 KB of memory and the CPU
utilization will be 100%. Even though the Task Manager cannot show the size
of the managed memory, it will show that the used memory doesn't keep
increasing until all physical memory is used (when running the test, 290 MB
of physical memory was still available). The numbers presented were
retrieved from (the upcoming) next version of our .NET Memory Profiler.
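Incidentally, the fact that collections happen long before physical memory
is exhausted can also be observed programmatically. Below is a minimal
sketch, assuming a framework version that provides GC.CollectionCount; the
one million allocations and the 100-byte array size are arbitrary figures
chosen for illustration:

```csharp
using System;

class GcObservationDemo
{
    static void Main()
    {
        int gen0Before = GC.CollectionCount( 0 );

        // Allocate a large number of short-lived objects. Nothing keeps
        // them alive, so they become garbage almost immediately.
        for( int i = 0; i < 1000000; i++ )
        {
            object o = new byte[100];
        }

        int gen0After = GC.CollectionCount( 0 );

        // Several gen #0 collections will have occurred, even though
        // plenty of physical memory was available the whole time.
        Console.WriteLine( "Gen #0 collections: " + ( gen0After - gen0Before ) );
        Console.WriteLine( "Managed heap size: " + GC.GetTotalMemory( false ) + " bytes" );
    }
}
```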


Andre said:
So what were you trying to prove again? Again - don't say "amount of
physical size", say "heap size" or "managed heap size" ... it only uses
a little of it and the threshold increases as more memory is consumed
and *if* more memory is available.


You left out the following part in the snippet of my post:

"A common misconception regarding the garbage collector of the CLR is
that it runs whenever the system runs out of physical memory, or when there
is some idle time it can use to clean up the memory."

This text was followed by the "This is not true..." sentence, which you
started your post with.

I was simply trying to prove that the GC collects memory even if there's
plenty of physical memory left, and even if there's no idle processor time.
I've seen several people trying to explain why their application is using
large amounts of memory by claiming that the garbage collector only collects
at idle time or in low-memory situations.

Andre said:
If you don't know how a generational GC works, what makes you think
you can write an article on it (and mislead people)? Read Jones and Lins
for more information... in a nutshell, a generational garbage collector
improves performance by dividing the managed heap into different
'generations' and moves objects to these spaces according to the number
of times they survive a collection.. if an object survives a certain
number of GCs, it is moved from the nursery (the first generation) to the
next generation.... this effectively improves performance because the GC
collects older generations at a very low rate .. this is because of the
hypothesis that "all objects die young".. and so the first generation
gets frequent GCs, but since the nursery size is set to a small figure, GC is
effectively fast and quick.

I believe I have a very good knowledge of the generational GC of the CLR,
and your "in a nutshell" explanation of a generational GC is more or less
exactly the same explanation I've seen in many different articles. The thing
that all these articles fail to address is how this increases performance.

I'll try to make a short explanation of what I mean.

The hypothesis you mention ("all objects die young") is probably better
stated as "most objects die young and those that don't will live forever".
Dividing the heap into generations merely on this hypothesis will not
increase performance significantly. Only objects that survive a GC will need
to be relocated (e.g. compacted), and those objects are assumed to live
forever. Thus, old objects can be compacted into the bottom of the heap and
after that they will probably not need to be relocated very often (since all
neighbouring objects are also old and are assumed to live forever, or at
least for a long time).
This behaviour will be the same even if no generational GC is used. The main
thing solved by a generational GC is reducing the number of references to
look at when performing a collect.
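The promotion behaviour itself is easy to see with GC.GetGeneration. Here
is a small sketch; the exact promotion policy is an implementation detail
of the CLR, so the comments describe typical behaviour only:

```csharp
using System;

class PromotionDemo
{
    static void Main()
    {
        object survivor = new object();

        // A newly allocated object starts out in generation 0.
        Console.WriteLine( GC.GetGeneration( survivor ) );

        // After surviving a collection it is typically promoted to gen #1...
        GC.Collect();
        Console.WriteLine( GC.GetGeneration( survivor ) );

        // ...and after surviving another, to gen #2, where it will be
        // inspected far less often.
        GC.Collect();
        Console.WriteLine( GC.GetGeneration( survivor ) );

        // Keep the object reachable until the end of the demonstration.
        GC.KeepAlive( survivor );
    }
}
```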

Consider a case where you have an application with 1 million long lived
instances, each having 5 references to other instances. If this application
is performing a large amount of allocations of short-lived instances, a gen
#0 collection may be triggered several times per second. Without optimizing
the references to look at, the GC would have to look at every one of the 5
million references to make sure that none of them references a gen #0
instance. What the generational GC does is to keep track if any reference
has changed in instances in older generations by using "write barriers".
When a GC (gen 0 or gen 1) is performed, only the references that have
changed in older generations need to be looked at. This optimization may
very well reduce the number of references to look at from 5 million to
close to zero, a very significant improvement. Of course the garbage
collector still has to look at all the stack based references (local
variables and method parameters) and other internal references; this is
not affected by the generational garbage collector.
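To make the idea concrete, here is a toy sketch of a remembered set. This
illustrates the technique only; it is not the CLR's actual card-table
implementation, and all the names are made up:

```csharp
using System;
using System.Collections.Generic;

// Toy model of the write-barrier idea: every reference store into an
// old-generation object is recorded, so a gen #0 collection only has to
// scan the recorded stores instead of every reference in the old heap.
class ToyRememberedSet
{
    // References written into old-generation objects since the last GC.
    private List<object> changedReferences = new List<object>();

    // The "write barrier": reference stores into old objects pass
    // through here and are recorded.
    public void WriteReference( object[] oldObject, int slot, object value )
    {
        oldObject[slot] = value;
        changedReferences.Add( value );
    }

    // A gen #0 collection scans only these references, not the millions
    // of unchanged references held by older generations.
    public int ReferencesToScan()
    {
        return changedReferences.Count;
    }

    // After the collection, the set is cleared.
    public void Clear()
    {
        changedReferences.Clear();
    }
}
```

With one million old objects holding five references each, only the
handful of references actually written between collections end up in the
set, which is why the number of references to inspect can drop from five
million to close to zero.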

After I posted the original message, I found the following articles on
MSDN:
http://msdn.microsoft.com/library/en-us/dndotnet/html/highperfmanagedapps.asp
and
http://msdn.microsoft.com/library/en-us/dndotnet/html/dotnetgcbasics.asp
(watch for linewraps)

These articles do mention the garbage collector's use of write barriers and
they also provide more low-level information about the garbage collector,
making me less motivated to write an article.

Anyway, if I write an article about the garbage collector, it will focus on
implementation details of the CLR garbage collector implemented by
Microsoft, it will not be a description of garbage collectors in general.

Finally, I don't understand how the phrase "the CLR uses generations to
improve the performance of the garbage collector" is technically (and maybe
even entirely) incorrect, as you stated in your next post. As you said, "the
*garbage collector* implemented by the CLR 'is a' generational Garbage
collector", but I think it's quite OK (albeit not perfect) to say
that a generational garbage collector uses generations.


Best regards,

Andreas Suurkuusk
SciTech Software AB
Download our .NET Memory Profiler at http://www.scitech.se/memprofiler
 
