[ANN] Mongrel 0.3.13.4 Pre-Release -- Ruby's LEAK Fixed (Death To Mutex!)

Zed Shaw

Howdy Folks,

This release comes after painstaking analysis of a memory leak that was
reported by Bradley Taylor, reduced by myself, and then fixed after much
work. You should all thank Bradley for finding the bizarre fix.

It turns out that Ruby has a memory leak when you use pretty much any
thread locking primitive other than Sync (Mutex, Monitor, etc.):

http://pastie.caboo.se/10194

The fix (for whatever reason) is to use Sync and put it in a block:

http://pastie.caboo.se/10317

Those two scripts are mini versions of how Mongrel manages threads so
that I could figure out a solution or get some input. The graphs show
reported RAM usage, sampled once per second. As you can see, the first
(leaking) graph goes up and doesn't come down, while the second (fixed)
graph cycles properly.

** This is a Ruby issue, so if you have software using Mutex or Monitor,
change to Sync now. **

Tests of this latest pre-release show that the RAM is properly cycled by
the GC and that it's actually finally solved. If you run your app using
this release and you still have a leak then use the memory debugging
tools mongrel has to rule out your code (see below).


CHANGES

* No more allow_concurrency. Until Ruby's fixed I can't let people do
this anymore.
* USR1 debugging. If you're wondering about how Mongrel's locking of
Rails impacts your application, or what is causing BAD CLIENT then just
hit your mongrel_rails with USR1 and Mongrel will tell you.
* More extensive and accurate memory debugging. Use -B and look at the
log/mongrel_log/objects.log to get a good idea of counts of objects,
delta changes in counts, and mean+standard deviation lengths of objects
with length methods.
* Fixes a few places where sockets are closed and left in CLOSE_WAIT.


INSTALLING

As per usual:

sudo gem install mongrel --source=http://mongrel.rubyforge.org/releases/

Initial tests show it works on 1.8.5 and is actually faster, but this is
unsupported for now.


TESTING THIS RELEASE

If you want to test the memory leak, here's the process:

1) Start your application in *production* mode:
mongrel_rails start -e production

2) Hit it with USR1:
killall -USR1 mongrel_rails

3) Start running something that prints out the ram (here's my fish
code):
while sleep 1
    ps aux | grep mongrel_rails | grep -v grep | grep -v gvim | ruby -aln -e "puts split[4 .. 5].join(',')"
end

4) Thrash a simple rails controller with httperf:
httperf --server 127.0.0.1 --port 3000 --num-conns 1000 --rate 120
--uri /testuri

What you want to do is adjust num-conns and rate until Mongrel reports
"X threads waiting for /testuri..."

The bug only manifests itself when threads pile up behind the guard
around Rails dispatching. This is also how you'd find out which Rails
actions are too slow.


Please report any bugs you find in this release, and a Win32 release
will come out after I'm sure it works for everyone else.
 
ara.t.howard

Howdy Folks,

This release comes after painstaking analysis of a memory leak that was
reported by Bradley Taylor, reduced by myself, and then fixed after much
work. You should all thank Bradley for finding the bizarre fix.

It turns out that Ruby has a memory leak when you use pretty much any
thread locking primitive other than Sync (Mutex, Monitor, etc.):

http://pastie.caboo.se/10194

The fix (for whatever reason) is to use Sync and put it in a block:

http://pastie.caboo.se/10317

Those two scripts are mini versions of how Mongrel manages threads so
that I could figure out a solution or get some input. The graphs show
reported RAM usage, sampled once per second. As you can see, the first
(leaking) graph goes up and doesn't come down, while the second (fixed)
graph cycles properly.

** This is a Ruby issue, so if you have software using Mutex or Monitor,
change to Sync now. **

Tests of this latest pre-release show that the RAM is properly cycled by
the GC and that it's actually finally solved. If you run your app using
this release and you still have a leak then use the memory debugging
tools mongrel has to rule out your code (see below).

hi zed-

if you are really serious about fixing your leak i suggest you re-work your
tests. as i mentioned before they have several race conditions, not least of
which is that they both start a random number of threads, not 1000 as the code
suggests (you can easily confirm by printing out the number of times the
thread init loop executes). further, sync.rb is the single ruby lib i've had
memory issues with on production systems. i have never managed to figure out
why that is...

in any case a careful script which allocates memory in a thread, waits for all
threads to finish allocation, checks memory, and then kills all threads before
checking again shows some surprising results which you should read carefully:


using mutex shows a nice cycle of memory freed:

harp:~ > cat a.rb.mutex
using: Mutex
n: 420
iter: 0
with 420 threads holding memory : 44.0%
with 0 threads holding memory : 13.0%
iter: 1
with 420 threads holding memory : 43.9%
with 0 threads holding memory : 13.0%
iter: 2
with 420 threads holding memory : 44.1%
with 0 threads holding memory : 13.3%
iter: 3
with 420 threads holding memory : 44.1%
with 0 threads holding memory : 13.2%
iter: 4
with 420 threads holding memory : 44.0%
with 0 threads holding memory : 13.5%
iter: 5
with 420 threads holding memory : 44.1%
with 0 threads holding memory : 13.2%
iter: 6
with 420 threads holding memory : 43.9%
with 0 threads holding memory : 13.2%
iter: 7
with 420 threads holding memory : 44.2%
with 0 threads holding memory : 13.2%
iter: 8
with 420 threads holding memory : 44.1%
with 0 threads holding memory : 13.5%
iter: 9
with 420 threads holding memory : 44.1%
with 0 threads holding memory : 13.9%

using sync, on the other hand, looks leaky, though i'm not saying it is.

harp:~ > cat a.rb.sync
using: Sync
n: 420
iter: 0
with 420 threads holding memory : 43.8%
with 0 threads holding memory : 1.0%
iter: 1
with 420 threads holding memory : 43.8%
with 0 threads holding memory : 2.0%
iter: 2
with 420 threads holding memory : 43.8%
with 0 threads holding memory : 2.7%
iter: 3
with 420 threads holding memory : 43.8%
with 0 threads holding memory : 3.5%
iter: 4
with 420 threads holding memory : 43.8%
with 0 threads holding memory : 3.8%
iter: 5
with 420 threads holding memory : 43.8%
with 0 threads holding memory : 4.6%
iter: 6
with 420 threads holding memory : 43.8%
with 0 threads holding memory : 5.4%
iter: 7
with 420 threads holding memory : 43.8%
with 0 threads holding memory : 6.4%
iter: 8
with 420 threads holding memory : 43.8%
with 0 threads holding memory : 7.2%
iter: 9
with 420 threads holding memory : 43.7%
with 0 threads holding memory : 8.1%

here is the code. note that it's quite careful to only create a fixed number
of threads, to wait for them to each init a mb of memory, and only then to
check memory usage. likewise for checking after killing all threads - it's
done immediately after killing threads and running gc:

harp:~ > cat a.rb
require 'thread'
require 'sync'

class TestThreads
  def initialize which, n
    c = case which
        when /mutex/io
          Mutex
        when /sync/io
          Sync
        end
    @guard = c.new
    @n = Integer n
    puts "using: #{ c.name }"
    puts "n: #{ @n }"
  end

  def pct_mem # linux specific field pos i'm sure
    stdout = `ps v #{ Process.pid }`
    stdout.split(%r/\n/).last.strip.split(%r/\s+/)[8] + '%'
  end

  def tq
    q = Queue.new
    t = Thread.new{
      mb = @guard.synchronize{ 0.chr * (2 ** 20) }
      q.push :ready
      Thread.stop
    }
    [t, q]
  end

  def run
    list = []
    10.times do |i|
      puts "iter: #{ i }"

      # load @n threads up
      @n.times{ list << tq }

      # wait for all threads to init memory with a mb of data
      list.each{|t,q| q.pop}

      # show memory usage
      GC.start
      puts "with #{ list.size } threads holding memory : #{ pct_mem }"

      # kill all threads - clean up
      list.each{|t,q| t.kill}
      list.clear
      sleep 1 until Thread.list.size == 1

      # show memory usage
      GC.start
      puts "with 0 threads holding memory : #{ pct_mem }"
    end
  end
end

$VERBOSE = nil
STDOUT.sync = true
Thread.abort_on_exception = true
trap('INT'){ exit }

which, n, ignored = ARGV
TestThreads.new(which, n).run


in any case, i'd carefully examine your tests (or the rails code if that is
indeed what it's modeled after) to make sure that they test
Mutex/Sync/Thread/Ruby and not your os virtual memory system and look closely
at the results again - like i said, i have had issues with sync.rb.

the point here is that it is probably the code in question and not Mutex per
se that was causing your process to grow in vmsize.

regards.

-a
 
Zed Shaw

if you are really serious about fixing your leak i suggest you re-work your
tests. as i mentioned before they have several race conditions, not least of
which is that they both start a random number of threads, not 1000 as the code
suggests (you can easily confirm by printing out the number of times the
thread init loop executes). further, sync.rb is the single ruby lib i've had
memory issues with on production systems. i have never managed to figure out
why that is...

Ara, this is uncool. Seriously man, don't e-mail me personally and then
e-mail the list the exact same e-mail. Pick one and fight your battle
there.

As I mentioned to you before, the evidence shows you are wrong. Sure
you've cooked up a script that has a memory leak with Sync, but that
script doesn't operate the way Mongrel does. The sample I developed
does operate the way Mongrel does. It's called a "bug reduction". I'm
not going to test the leak in Mongrel with a reduction that doesn't
simulate Mongrel.

Also, as I said before, this shows the leak:

http://pastie.caboo.se/10194

And this script, with just 3 lines changed to use Sync shows it's fixed:

http://pastie.caboo.se/10317

With graphs even Ara! Graphs! We ran these tests for 30-40 minutes with
hundreds of thousands of threads being cycled during the tests.

Not to mention about 8 other people who switched to Sync reporting their
leaks fixed, our own test script showing it's fixed, Mongrel tests showing
it's fixed, and several other frameworks showing it - you can't argue with
the evidence.

If your script has a leak then fine, just don't do that. Ultimately though the
ruby lang guys need to fix this because either way, there's leaks. For now,
Mongrel is not leaking and I'm happy with that.

Now, I'd appreciate it if you'd maybe spend your energy looking into the ruby
source to find this leak rather than bothering me about it.

Thank you.
 
William Crawford

Zed said:
Ara, this is uncool. Seriously man, don't e-mail me personally and then
e-mail the list the exact same e-mail. Pick one and fight your battle
there.

Maybe he simply cc'd you on the reply to the list? I doubt it was a
personal attack. At worst, probably a mistake. He's trying to be
helpful.
As I mentioned to you before, the evidence shows you are wrong. Sure
you've cooked up a script that has a memory leak with Sync, but that
script doesn't operate the way Mongrel does. The sample I developed
does operate the way Mongrel does. It's called a "bug reduction". I'm
not going to test the leak in Mongrel with a reduction that doesn't
simulate Mongrel.

I think the point here is that it isn't necessarily Sync/Mutex/Ruby, but
the way you use it. He managed to show Mutex leaking, you managed to
show Sync leaking. And you each managed to show neither of them
leaking.
With graphs even Ara! Graphs! We ran these tests for 30-40 minutes with
hundreds of thousands of threads being cycled during the tests.

Graphs don't make it true.
Not to mention about 8 other people who switched to Sync reporting their
leaks fixed, our own test script showing it's fixed, Mongrel tests showing
it's fixed, and several other frameworks showing it - you can't argue with
the evidence.

This has only been tested for a day. There's still a huge possibility
that what you're seeing isn't what you think you're seeing. We've all
made that mistake more than once.
If your script has a leak then fine, just don't do that. Ultimately though
the ruby lang guys need to fix this because either way, there's leaks. For
now, Mongrel is not leaking and I'm happy with that.

Now, I'd appreciate it if you'd maybe spend your energy looking into the
ruby source to find this leak rather than bothering me about it.

Thank you.

He's doing the same thing you are. Trying to expose the leak for what
it is. You can't go telling him to 'fix ruby' rather than bother the
list when you've done the same thing he has, just a day earlier. He's
submitting the information he has collected so others can try to figure
this out as well.

None of this is a personal attack. Nobody has said you didn't fix a
leak. There simply may be more going on here than anybody has seen yet.
 
Bob Hutchison

in any case, i'd carefully examine your tests (or the rails code if that is
indeed what it's modeled after) to make sure that they test
Mutex/Sync/Thread/Ruby and not your os virtual memory system and look
closely at the results again - like i said, i have had issues with sync.rb.

the point here is that it is probably the code in question and not Mutex
per se that was causing your process to grow in vmsize.


I ran your test on OS X looking at VSZ and RSS. And, like you,
initially got Sync with no leak visible, and mutex with what looks
like a bad leak. However, I noticed that you only called GC once. I
have a years-old habit of always running GC at least three times when
I really want GC to run (and in Java I had a loop that ran GC until
it stopped freeing stuff, which in some cases was eight or nine
times). Superstition? Apparently not. On OS X, when I run GC three
times neither sync nor mutex shows a memory leak.

Zed, just for fun, try running GC a few times in a row (like
GC.start; GC.start; GC.start).

Cheers,
Bob

----
Bob Hutchison -- blogs at <http://www.recursive.ca/
hutch/>
Recursive Design Inc. -- <http://www.recursive.ca/>
Raconteur -- <http://www.raconteur.info/>
xampl for Ruby -- <http://rubyforge.org/projects/xampl/>
 
Bob Hutchison

I ran your test on OS X looking at VSZ and RSS. And, like you,
initially got Sync with no leak visible, and mutex with what looks
like a bad leak. However, I noticed that you only called GC once. I
have a years-old habit of always running GC at least three times
when I really want GC to run (and in Java I had a loop that ran
GC until it stopped freeing stuff, which in some cases was eight or
nine times). Superstition? Apparently not. On OS X, when I run GC
three times neither sync nor mutex shows a memory leak.

Zed, just for fun, try running GC a few times in a row (like
GC.start; GC.start; GC.start)

Well I tried your test on OS X. The Sync had no problem, the mutex
showed the memory growth (though it eventually (fifth iteration I
think) cleaned itself up). I modified your test to create exactly
1000 threads and call GC three times at the end; things were better,
i.e. it released its memory more quickly than without, but still not
good. I ended up with:

GC.start
`sync; sync; sync`
sleep 1
GC.start
`sync; sync; sync`
sleep 1
GC.start
`sync; sync; sync`
sleep 1
GC.start
`sync; sync; sync`
sleep 1

and this made a bigger difference. The memory usage was much more
tightly bound.

(And yes, the three calls to sync are also on purpose... in the late
70s through the 80s, calling sync once didn't guarantee anything, you
had to call it a few times, three generally worked... I don't know
the current situation because it is easy enough to type
sync;sync;sync (well, in truth, I usually alias sync to the three
calls))

But of course, the point is that despite appearances there is likely
no memory leak at all on OS X, just some kind of long term cycle of
process resource utilisation -- this is a complex situation, Ruby GC,
process resource utilisation/optimisation, and system optimisation
all interacting. Who knows what's actually going on.

So.

Cheers,
Bob

----
Bob Hutchison -- blogs at <http://www.recursive.ca/
hutch/>
Recursive Design Inc. -- <http://www.recursive.ca/>
Raconteur -- <http://www.raconteur.info/>
xampl for Ruby -- <http://rubyforge.org/projects/xampl/>
 
ara.t.howard

Ara, this is uncool. Seriously man, don't e-mail me personally and then
e-mail the list the exact same e-mail. Pick one and fight your battle
there.

hi zed - calm down a minute! ;-) i have no idea what my mailer did but
suspect it has something to do with all the recipients and the fact that it
has to switch roles to post to rails/ruby-talk. in any case, it wasn't by
design.
As I mentioned to you before, the evidence shows you are wrong.

but my evidence shows you are! i'm merely pointing out it's unclear at this
point.
Sure you've cooked up a script that has a memory leak with Sync, but that
script doesn't operate the way Mongrel does. The sample I developed does
operate the way Mongrel does. It's called a "bug reduction".

i think it's 'bug production' though - for instance, your loop that starts
'1000' threads starts a random number of them; your gc call, as bob showed,
doesn't seem to be quite enough (not that it's your fault); and there are
race conditions/edge cases with the os memory subsystem.
I'm not going to test the leak in Mongrel with a reduction that doesn't
simulate Mongrel.

but you could change the code on mongrel right!? bear in mind you have not
just been talking about changing your code, but advocating changing ruby and
all ruby software - you have to back up that sort of thing!
Also, as I said before, this shows the leak:

http://pastie.caboo.se/10194

but it doesn't show a leak on my system, on bob's system, or on ed's system?
just tons of memory usage. are you ignoring the fact that no one else who has
run it shows a leak either? how about the fact that electric fence, mpatrol,
and mtrace also show no leak? how about the fact that very few people have
had issues with mutex and memory?
And this script, with just 3 lines changed to use Sync shows it's fixed:

http://pastie.caboo.se/10317

With graphs even Ara! Graphs! We ran these tests for 30-40 minutes with
hundreds of thousands of threads being cycled during the tests.

well, you can graph my numbers too and they'll show a sawtooth for mutex and
a linearly increasing line for sync, at least for the number of iterations i
posted. which is, of course, exactly the opposite of what you showed. btw,
i'd be glad if both of us could be proven wrong.
Not to mention about 8 other people who switched to Sync reporting their
leaks fixed, our own test script showing it's fixed, Mongrel tests showing
it's fixed, and several other frameworks showing it - you can't argue with
the evidence.

i'm sure that's not true. what has happened is that they've shown a reduction
in memory usage. i don't think anyone else has even been able to confirm
your code leaked - just that it grew large - eg had a bad memory
allocation/free cycle. read bob's post for more info.
If your script has a leak then fine, just don't do that.

exactly my point too!
Ultimately though the ruby lang guys need to fix this because either way,
there's leaks.

here i disagree. i'm not sure either of us has shown a leak in __ruby__. we
may have demonstrated a bad memory usage pattern for certain memory subsystems
- this is not far fetched, read comp.unix.programmer and you can see thousands
of threads on this subject. it's a very frustrating one for many people.
For now, Mongrel is not leaking and I'm happy with that.

granted. however - screaming that everyone should re-write their code which
has used anything other than sync is just a bit extreme considering no one
else can show a leak - only badly behaving scripts - isn't it!?
Now, I'd appreciate it if you'd maybe spend your energy looking into the
ruby source to find this leak rather than bothering me about it.

the whole reason i'm posting is that i too have had issues with mutex/sync etc
and not solved it to my satisfaction. if you are satisfied fixing a heisenbug
and calling it good then fine. however the other reason i was posting is that
i've fixed hundreds of bugs in long running near realtime systems where the
coder wrote something they didn't really understand that seemed to 'fix' the
problem, only to create new ones and release a program that's even more
difficult to debug. it's very surprising how many pieces of code work out of
luck but really shouldn't - if you're ok with that then fine. i just don't
happen to be.

regards.

-a
 
M. Edward (Ed) Borasky

[snip snip snip]

And nobody has yet answered my questions about the platform. The fact
that two different people see different behavior of the Ruby interpreter
in different environments makes me wonder if there aren't some
underlying race conditions or similar platform gotchas at work here, in
addition to a Ruby problem and a Mongrel workaround for a Ruby problem.

So ... Zed ... how many processors do you have, how much RAM and what OS
are you running?

Ara ... how many processors do you have, how much RAM and what OS are
you running?
 
khaines

but it doesn't show a leak on my system, on bob's system, or on ed's system?
just tons of memory usage. are you ignoring the fact that no one else who has
run it shows a leak either? how about the fact that electric fence, mpatrol,
and mtrace also show no leak? how about the fact that very few people have
had issues with mutex and memory?

I have been playing with this all morning.

I added a line to the script to print the object count when it prints how
many threads are still spinning:

oc = 0; ObjectSpace.each_object {oc += 1}; STDERR.puts "OC: #{oc}"

And I changed it so that it creates a specific number of threads, like
other people have, and I played both with keeping the explicit GC.start
and removing it.

I altered the array creation line in the thread to use memory faster:

30000.times {|i| ta << "#{i}" }

Making a string out of that number is a much hungrier operation, and it
seems to show Mutex "leaking" much more quickly.

Finally, I added a three second sleep between when one set of 1000 threads
finishes and when the next is started.

Other than that, I left it alone.

One version of the code I have been using is here:

http://rafb.net/paste/results/NbwVMo51.html

Then I sat here and ran the Mutex version and the Sync version over and
over again.

The Mutex version ends up with many more objects sitting in ObjectSpace
than the Sync based version has, and while the Sync based version's
object counts are pretty consistent, the Mutex version's are volatile,
jumping up and down wildly, with a trend towards an increasing average
count.

However, I don't think that's the whole story here. Sync, because it is a
lot more complex, is also a lot slower than Mutex. This speed issue is
part of the key to the appearance that Mutex leaks while Sync does not.

If I add one line to the test code that uses a Mutex, I get very different
results.

select(nil,nil,nil,0.025)

The number of objects floating in objectspace at any given time, while
still a bit more volatile than with the Sync version, is now close to the
same overall numbers that I see in the Sync version, and does not show a
continually upward trending average count.

In addition, RAM usage at the OS level does not continue to increase.

This strongly suggests to me that the reported leak in Mutex is no leak at
all, and that impression is bolstered by having looked at the Mutex code --
it's very simple and itself creates very few objects.

Rather, in the test cases, using a Mutex just lets the threads build
objects faster than the GC thread can clean them up. Slowing the threads
down, either by using Sync, or by sticking a micro delay into them, gives
the GC enough time to keep pace; letting them run unthrottled is the only
way that I have found to show any sort of leak with Mutex.
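
Roughly, the pieces fit together in the thread body like this (a sketch
only -- the full script is at the rafb link above, and the exact placement
of the delay isn't spelled out here, so treat this as illustrative, using
Ara's tq method as the skeleton):

def tq
  q = Queue.new
  t = Thread.new {
    ta = []
    # the hungrier allocation: 30000 small strings built under the guard
    @guard.synchronize { 30000.times { |i| ta << "#{i}" } }
    # the one-line micro delay that lets the GC thread keep pace
    select(nil, nil, nil, 0.025)
    q.push :ready
    Thread.stop
  }
  [t, q]
end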


Kirk Haines
 
ara.t.howard

[snip snip snip]

And nobody has yet answered my questions about the platform. The fact
that two different people see different behavior of the Ruby interpreter
in different environments makes me wonder if there aren't some
underlying race conditions or similar platform gotchas at work here, in
addition to a Ruby problem and a Mongrel workaround for a Ruby problem.

So ... Zed ... how many processors do you have, how much RAM and what OS
are you running?

Ara ... how many processors do you have, how much RAM and what OS are
you running?

fortytwo :~ > cat /etc/redhat-release; uname -srm; cat /proc/cpuinfo; cat /proc/meminfo
Fedora Core release 3 (Heidelberg)
Linux 2.6.12-1.1381_FC3 i686
processor : 0
vendor_id : AuthenticAMD
cpu family : 6
model : 4
model name : AMD Athlon(tm) processor
stepping : 2
cpu MHz : 1201.740
cache size : 256 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 mmx fxsr syscall mmxext 3dnowext 3dnow
bogomips : 2375.68

MemTotal: 483260 kB
MemFree: 23380 kB
Buffers: 27868 kB
Cached: 273552 kB
SwapCached: 70392 kB
Active: 294380 kB
Inactive: 128220 kB
HighTotal: 0 kB
HighFree: 0 kB
LowTotal: 483260 kB
LowFree: 23380 kB
SwapTotal: 983032 kB
SwapFree: 796192 kB
Dirty: 40 kB
Writeback: 0 kB
Mapped: 150804 kB
Slab: 28032 kB
CommitLimit: 1224660 kB
Committed_AS: 599536 kB
PageTables: 3764 kB
VmallocTotal: 540664 kB
VmallocUsed: 3600 kB
VmallocChunk: 536872 kB
HugePages_Total: 0
HugePages_Free: 0
Hugepagesize: 4096 kB


cheers.


-a
 
M. Edward (Ed) Borasky

Bob Hutchison wrote:

[snip]
Well I tried your test on OS X. The Sync had no problem, the mutex
showed the memory growth (though it eventually (fifth iteration I think)
cleaned itself up). I modified your test to create exactly 1000 threads
and call GC three times at the end; things were better, i.e. it released
its memory more quickly than without, but still not good. I ended up with:

GC.start
`sync; sync; sync`
sleep 1
GC.start
`sync; sync; sync`
sleep 1
GC.start
`sync; sync; sync`
sleep 1
GC.start
`sync; sync; sync`
sleep 1

and this made a bigger difference. The memory usage was much more
tightly bound.

(And yes, the three calls to sync are also on purpose... in the late 70s
through the 80s, calling sync once didn't guarantee anything, you had to
call it a few times, three generally worked... I don't know the current
situation because it is easy enough to type sync;sync;sync (well, in
truth, I usually alias sync to the three calls))

But of course, the point is that despite appearances there is likely no
memory leak at all on OS X, just some kind of long term cycle of process
resource utilisation -- this is a complex situation, Ruby GC, process
resource utilisation/optimisation, and system optimisation all
interacting. Who knows what's actually going on.
Finally someone with some platform details!! <vbg>

OK ... here's my take

1. The OS "three-sync" thing is, as you pointed out, a throwback to days
when you needed to do that sort of thing. It's superstition.

2. IIRC OS X is a BSD-type kernel rather than a Linux kernel, so at
least we know a different memory manager still needs "help" to deal with
this kind of application.

3. Typing a single "sync" into a *Linux* kernel when there's a lot of
"stuff" built up in RAM is a very bad idea. It will force the system
into an I/O bound mode and lock everybody out until the kernel has
cleaned up after itself. Either OS X has a better memory and I/O manager
than Linux, or you didn't have a lot of "stuff" built up from this
simple test. The second and third syncs are both unnecessary and
harmless. :)

4. Deleting references to no-longer-needed objects and then explicitly
calling the garbage collector has a longer history and tradition than
UNIX. It is "standard software engineering practice" in any environment
that has garbage collection. Just last week, I had to stick such a call
into an R program to keep it from crashing (on a Windows machine with 2
GB of RAM!)

For the software engineering philosophers on the list, what's the
difference between a language that forces the engineer to explicitly
manage dynamic memory allocation and de-allocation, and one that
supposedly relieves the engineer from that need -- until you crash in a
production system that worked on smaller test cases a couple of months
ago? :)

5. Can somebody run these Ruby leak tests/demos on a Windows XP or 2003
Server with multiple processors? I'm really curious what happens.
 
Patrick Hurley

5. Can somebody run these Ruby leak tests/demos on a Windows XP or 2003
Server with multiple processors? I'm really curious what happens.

System Info:
OS Name Microsoft Windows XP Professional
Version 5.1.2600 Service Pack 2 Build 2600
OS Manufacturer Microsoft Corporation
System Name DADDY
System Manufacturer Dell Inc.
System Model Precision WorkStation 490
System Type X86-based PC
Processor x86 Family 6 Model 15 Stepping 6 GenuineIntel ~2327 Mhz
Processor x86 Family 6 Model 15 Stepping 6 GenuineIntel ~2327 Mhz
BIOS Version/Date Dell Inc. A01, 6/30/2006
SMBIOS Version 2.3
Windows Directory C:\WINDOWS
System Directory C:\WINDOWS\system32
Boot Device \Device\HarddiskVolume1
Locale United States
Hardware Abstraction Layer Version = "5.1.2600.2180 (xpsp_sp2_rtm.040803-2158)"
User Name DADDY\patrick
Time Zone Eastern Standard Time
Total Physical Memory 2,048.00 MB
Available Physical Memory 598.48 MB
Total Virtual Memory 2.00 GB
Available Virtual Memory 1.96 GB
Page File Space 3.85 GB
Page File C:\pagefile.sys

Ruby 1.8.4

I see a slow growth in the virtual bytes counter: at the first
iteration it is ~32Meg, but every few iterations I see it kick up a notch;
after 20+ iterations it is now ~42Meg. Just as interestingly, the
working set size on the app is rock solid just under 12Meg with no
visible growth.

Let me know if you need any more information
pth
 
Zed Shaw

[snip snip snip]

And nobody has yet answered my questions about the platform. The fact
that two different people see different behavior of the Ruby interpreter
in different environments makes me wonder if there aren't some
underlying race conditions or similar platform gotchas at work here, in
addition to a Ruby problem and a Mongrel workaround for a Ruby problem.

So ... Zed ... how many processors do you have, how much RAM and what OS
are you running?

Ara ... how many processors do you have, how much RAM and what OS are
you running?

Good question Ed, from what I know it was tested all over by most people
who use it. The leak was tested and confirmed on the following that I
know of:

* Ubuntu Linux on Dual Core Dual 1.83GHz
* Mac OS X on Intel, not sure about PPC.
* FreeBSD (not sure which version or what processors)
* Win32 -- And it appears that Mutex works but Sync doesn't on Win32 so
we're basically screwed.
 
Zed Shaw

For the software engineering philosophers on the list, what's the
difference between a language that forces the engineer to explicitly
manage dynamic memory allocation and de-allocation, and one that
supposedly relieves the engineer from that need -- until you crash in a
production system that worked on smaller test cases a couple of months
ago? :)

This is exactly the problem I'm complaining about. It's not that Ruby's
GC isn't collecting the ram fast enough or anything like that. If Ruby
runs out or I call GC.start then dammit clear the dead objects. It's
not a suggestion, it's a command. Collect the RAM. Don't do it lazy
until my OS or process crashes.

So, for me, despite everyone talking about timeouts and stuff (and I
know people are just trying to find out why Ruby does this) it's all
useless since I still have a program crashing and I'm now looking at a
total redesign to work around a GC.

And yet, all I really need is a single option I can pass to ruby that
says "never go above X" instead of this pseudo-leak-until-crash crap.
5. Can somebody run these Ruby leak tests/demos on a Windows XP or 2003
Server with multiple processors? I'm really curious what happens.

Luis Lavena ran our tests on Win32 and found that Mutex works but Sync
doesn't, so I'm back to square one. I'm probably going to do a fast
patch so that people can just pick either and see if the leak goes away.
 
Daniel Martin

in any case a careful script which allocates memory in a thread,
waits for all threads to finish allocation, checks memory, and then
kills all threads before checking again shows some surprising results
which you should read carefully:

There's a problem with how this script measures its memory usage - it
measures %MEM, which is proportional to RSS. However, RSS is just how
much of the program's memory is *resident*; i.e. not swapped out.

To look at how much total memory is used, the variable you want is
VSZ, which isn't available in a "ps v" but is in a "ps u". I modified
your script to look at the %MEM, VSZ, and RSS numbers from "ps u" and
also to run GC.start a variable number of times.
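
The measurement change itself is small; Ara's pct_mem becomes something
like this (a sketch, assuming the usual procps "ps u" column order of
USER PID %CPU %MEM VSZ RSS ...):

# returns "%MEM VSZ RSS" for this process
def mem_stats
  line = `ps u -p #{ Process.pid }`.split(/\n/).last
  line.strip.split(/\s+/).values_at(3, 4, 5).join(' ')
end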

Here's what I found, for running GC.start 4 times for mutex and sync,
on my system. VSZ is the middle of the three numbers; I really should
prune away the other two measures, as they are just indicative of how
actively my machine feels like swapping at that moment:

using: Mutex
n: 400
gctimes: 4
iter: 0
with 400 threads holding memory : 83.9 432136 262688
with 0 threads holding memory : 4.0 23792 12760
iter: 1
with 400 threads holding memory : 86.6 435044 271356
with 0 threads holding memory : 4.3 29808 13584
iter: 2
with 400 threads holding memory : 82.6 435060 258664
with 0 threads holding memory : 4.3 35872 13564
iter: 3
with 400 threads holding memory : 92.1 435240 288316
with 0 threads holding memory : 4.8 39052 15300
iter: 4
with 400 threads holding memory : 91.9 435012 287916
with 0 threads holding memory : 4.4 43252 13944
iter: 5
with 400 threads holding memory : 92.2 435060 288920
with 0 threads holding memory : 12.2 48440 38276
iter: 6
with 400 threads holding memory : 90.0 435000 282032
with 0 threads holding memory : 8.3 53516 26236
iter: 7
with 400 threads holding memory : 90.0 435012 281944
with 0 threads holding memory : 10.6 55500 33332
iter: 8
with 400 threads holding memory : 91.9 435108 287776
with 0 threads holding memory : 4.9 59632 15496
iter: 9
with 400 threads holding memory : 88.4 435024 276876
with 0 threads holding memory : 11.4 64828 35760

using: Sync
n: 400
gctimes: 4
iter: 0
with 400 threads holding memory : 87.1 432216 272720
with 0 threads holding memory : 4.4 23980 13864
iter: 1
with 400 threads holding memory : 85.4 435276 267484
with 0 threads holding memory : 4.4 31196 13880
iter: 2
with 400 threads holding memory : 85.0 435400 266116
with 0 threads holding memory : 4.2 37564 13280
iter: 3
with 400 threads holding memory : 89.8 435408 281320
with 0 threads holding memory : 4.1 43568 12876
iter: 4
with 400 threads holding memory : 92.1 435400 288608
with 0 threads holding memory : 4.4 47796 13988
iter: 5
with 400 threads holding memory : 91.5 435348 286668
with 0 threads holding memory : 8.6 51804 26984
iter: 6
with 400 threads holding memory : 91.3 435312 285848
with 0 threads holding memory : 4.4 56664 13828
iter: 7
with 400 threads holding memory : 92.0 435360 288156
with 0 threads holding memory : 9.1 61024 28788
iter: 8
with 400 threads holding memory : 91.9 435316 287780
with 0 threads holding memory : 11.2 63508 35184
iter: 9
with 400 threads holding memory : 91.2 435364 285672
with 0 threads holding memory : 11.0 67260 34468

What this tells me is that there is a memory leak, but it appears that
both sync and mutex suffer from it. It also appears to be, at this
point, relatively minor. I haven't looked at modifying Zed Shaw's
script to have it record its own numbers, so I can't verify what he
saw. It may make a big difference that Zed's script allocates memory
by building up a big array, whereas yours does so by making a big
string - I could easily believe that there's something subtle hidden
in the interaction of Array and Mutex, for example.
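
For concreteness, the two allocation styles being compared are roughly
these (paraphrased, not the scripts verbatim):

# zed's style: building up a big array of small objects
ary = []
30000.times { |i| ary << "#{i}" }

# ara's style: one big string
str = 0.chr * (2 ** 20)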
 
M. Edward (Ed) Borasky

Zed said:
This is exactly the problem I'm complaining about. It's not that Ruby's
GC isn't collecting the ram fast enough or anything like that. If Ruby
runs out or I call GC.start then dammit clear the dead objects. It's
not a suggestion, it's a command. Collect the RAM. Don't do it lazy
until my OS or process crashes.
And when a lazy garbage collector meets a lazy OS memory manager, all
heck breaks loose. Three syncs, a garbage collect, repeat thrice? This
*is* the 21st century, right? Do the &$%^# I/O, dagnabbit!! :)
So, for me, despite everyone talking about timeouts and stuff (and I
know people are just trying to find out why Ruby does this) it's all
useless since I still have a program crashing and I'm now looking at a
total redesign to work around a GC.

And yet, all I really need is a single option I can pass to ruby that
says "never go above X" instead of this psuedo-leak-until-crash crap.
If I may use the J word ... the Java run-time has stuff like that. Set
your RAM size too small and the garbage collector is about all that
runs. Set it too large, and the JRE and its workload get paged or
swapped out.

So you buy more RAM. Guess what? Some Linux kernels perform *worse* the
more RAM they need to manage! Tables to manage high memory (> 1 GB) are
kept in low memory (< 1 GB). It doesn't take much for the tables to fill
up low memory. So you buy 64-bit processors. :)

But Windows has similar problems -- it's not just Linux. I don't know
about BSD or Mac OS X. As the marine biologist said, "Bah Humpback!"
Luis Lavena ran our tests on Win32 and found that Mutex works but Sync
doesn't, so I'm back to square one. I'm probably going to do a fast
patch so that people can just pick either and see if the leak goes away.
Yeah, I figured Windows was gonna fail one way or another. And it
probably varies depending on whether you use MinGW or Visual Stupido to
compile Ruby <weg>

Well, enough trolling for one night ... I'm going to see if I can write
a web server in GForth in a 16-bit address space.

<ducking>
 
M. Edward (Ed) Borasky

Daniel said:
There's a problem with how this script measures its memory usage - it
measures %MEM, which is proportional to RSS. However, RSS is just how
much of the program's memory is *resident*; i.e. not swapped out.
You're assuming here that the OS is accurately counting RSS and that the
user-space tool is correctly picking it up from the OS. That's not
necessarily a given in Linux. :) What's worse, no two programmers have
the same view of how one should export performance counters in /proc,
nor is there often much documentation except the source. "vi
/usr/src/linux/fs/proc/*.c" is your friend. :)

To look at how much total memory is used, the variable you want is
VSZ, which isn't available in a "ps v" but is in a "ps u". I modified
your script to look at the %MEM, VSZ, and RSS numbers from "ps u" and
also to run GC.start a variable number of times.
Yeah, but ... threads share the same virtual address space, making the
"%MEM" more or less meaningless. To use the J word again, a Java runtime
can have ten or twenty or more threads, all sharing the same address
space and all having the same VSZ. They get counted more than once.
Ugly. Basically, if you want accurate memory usage information from
Linux, you need to read from /dev/kmem and walk the tables yourself. :(
 
User Name

M. Edward (Ed) Borasky said:
Yeah, but ... threads share the same virtual address space, making the
"%MEM" more or less meaningless. To use the J word again, a Java runtime
can have ten or twenty or more threads, all sharing the same address
space and all having the same VSZ. They get counted more than once.
Ugly. Basically, if you want accurate memory usage information from
Linux, you need to read from /dev/kmem and walk the tables yourself. :(

You're telling me I shouldn't be using %MEM. I already knew that, and
that was why I was making this post. I'm basing my conclusions off VSZ.

VSZ is the total amount of virtual memory allocated by the OS to the
application, and is also the number that Linux uses when deciding what
to kill. Since this began with Mongrel processes being killed, this
is the relevant number. As ruby threads aren't seen as threads by the
underlying OS, what you see with java (where the OS does see the java
threads) really isn't relevant. As far as the OS is concerned, ruby
is just one process like other processes - ruby threads don't, for
example, take advantage of SMP machines and execute in parallel on
separate CPUs.
 
Bob Hutchison

4. Deleting references to no-longer-needed objects and then explicitly
calling the garbage collector has a longer history and tradition than
UNIX. It is "standard software engineering practice" in any environment
that has garbage collection. Just last week, I had to stick such a call
into an R program to keep it from crashing (on a Windows machine with 2
GB of RAM!)

Indeed.

So, instead of just counting the live objects, I decided to take a
look at what was there.

Guess what I found.

Using Ara's test and mutex, I was getting a kind of oscillation of
memory use. Not a nice profile at all. Having a look at the objects
still in ObjectSpace, what do I see? Why, the Thread objects! I was
running Ara's test with 100 threads and there were 101 threads in
ObjectSpace (well, sometimes 103).

So I set the guard to nil before running the cleanup step, and
created a new mutex at the beginning of the iteration. Now only one
Thread in the ObjectSpace. And guess what? A nice *constant* memory
utilisation (according to VSZ).

The Mutex was somehow holding the Thread object in memory and that
was holding other stuff.
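
Here's roughly the change, against Ara's run method (a sketch; the exact
spot where the guard gets dropped is the point, the rest is unchanged):

def run
  list = []
  10.times do |i|
    puts "iter: #{ i }"
    @guard = Mutex.new   # fresh guard each iteration (testing the mutex case)
    @n.times{ list << tq }
    list.each{|t,q| q.pop}
    GC.start
    puts "with #{ list.size } threads holding memory : #{ pct_mem }"
    @guard = nil         # drop the reference before cleanup
    list.each{|t,q| t.kill}
    list.clear
    sleep 1 until Thread.list.size == 1
    GC.start
    puts "with 0 threads holding memory : #{ pct_mem }"
  end
end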

You're right, deleting references to no-longer-needed objects is
standard. I suppose it is also standard to make sure you get *all* of
them :)

Cheers,
Bob

----
Bob Hutchison -- blogs at <http://www.recursive.ca/
hutch/>
Recursive Design Inc. -- <http://www.recursive.ca/>
Raconteur -- <http://www.raconteur.info/>
xampl for Ruby -- <http://rubyforge.org/projects/xampl/>
 
Bob Hutchison

1. The OS "three-sync" thing is, as you pointed out, a throwback to days
when you needed to do that sort of thing. It's superstition.

Well I certainly hope it is superstition. But it is kind of like
those fancy new stove tops that are not supposed to get hot enough to
burn you... *you* try it out, I'll keep my hands away :)

FWIW, I also still write 0 == N rather than N == 0 all the time.
Sometimes the burn was so bad that it is psychologically easier to
just keep up the avoidance behaviour :)

Cheers,
Bob

----
Bob Hutchison -- blogs at <http://www.recursive.ca/
hutch/>
Recursive Design Inc. -- <http://www.recursive.ca/>
Raconteur -- <http://www.raconteur.info/>
xampl for Ruby -- <http://rubyforge.org/projects/xampl/>
 
