Should *most* memory be released back to the system?


Blackie

If anyone can explain this I would appreciate it.

I have an IT group knocking Ruby saying that it never releases memory
back to the system (to be available to procs other than Ruby) so
feeling somewhat defensive I went and wrote a dumb script that goes
back and forth between two methods, one cat'ing huge strings and the
other parsing an xml doc with Hpricot.

"task" tops off (in top) around 50M and "other_task" peaks around 150M
(on my machine, CentOS, lastest stable 1.8.6) but when we return to
running "task" for extended periods, memory usage remains at ~150M.

Forgive my ignorance. Can anyone explain this behavior or at least
point me to a place to educate myself?

Many thanks, the script I'm running is below.

--------
#!/usr/bin/env ruby
require 'rubygems'
require 'hpricot'

def other_task
  a = []
  9999.times do
    a << "12345678901234567890123456789012345678901234567890" * 100
  end
  nil
end

def task
  # 500K xml data
  data = File.readlines("very_large_output.xml").to_s
  temp = Hpricot.XML data
  nil
end

puts "In task"
10.times {|i| task; p i}
puts "In other task"
100.times {|i| other_task; p i}
puts "In task (Should memory go down?)"
100.times {|i| task; p i}
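
For anyone who wants to reproduce this, a rough way to log the numbers from
inside the script instead of watching top is to read /proc/self/status between
phases. This is only a sketch and assumes a Linux-style /proc with
VmRSS/VmSize fields (as on CentOS):

# Sketch: print resident and virtual size from /proc (Linux-specific).
# The VmRSS/VmSize field names are assumptions about the local /proc format.
def report_memory(label)
  status = File.read("/proc/self/status")
  rss = status[/^VmRSS:\s+(\d+) kB/, 1]
  vsz = status[/^VmSize:\s+(\d+) kB/, 1]
  puts "#{label}: RSS=#{rss} kB, VSZ=#{vsz} kB"
end

# e.g. call it between the phases above:
# 10.times { task };        report_memory "after task"
# 100.times { other_task }; report_memory "after other_task"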
 

Yohanes Santoso

Blackie said:
If anyone can explain this I would appreciate it.

It's your OS (I meant kernel and libc).

http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-talk/160726
and
http://www.crowdedweb.com/articles/...s-and-application-memory-consumption-patterns.

The second link is dead, but there may be cached copies around.

The first one shows that on some OSes, if there is a hole in your memory
usage, that hole will not be returned by your libc to the kernel or
the kernel won't reclaim it. Either way, your OS is not taking it
back.

The second one (if you ever find a copy) adds some more observations
and also a handy tweak for FreeBSD that causes the OS to take back
holes (IIRC).

But most importantly, one should know that RSS and VSZ are not
Accurate Measures of Memory Usage.

YS.


 

Blackie

I do appreciate your help (and no, that link is lost to the mists of
time as far as I can Google). :)

I do understand that they are not useful for leak detection, but this
is just observing the peak during the life of the script. During
"other_task" I can see the usage rise *and fall*, so I know the OS is
reclaiming *some* memory in places... but the part I don't understand
is why it isn't returning to "task"'s original base of 50M when only
"task" is running.

I hate to sound dense, but I need to convince some fairly hard-headed
sysadmins.

Blackie said:
If anyone can explain this I would appreciate it.

It's your OS (I meant kernel and libc).

http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-talk/160726
and http://www.crowdedweb.com/articles/2006/03/22/ruby-on-rails-and-appli....

The second link is dead, but there may be cached copies around.

The first one shows that on some OSes, if there is a hole in your memory
usage, that hole will not be returned by your libc to the kernel or
the kernel won't reclaim it. Either way, your OS is not taking it
back.

The second one (if you ever find a copy) adds some more observations
and also a handy tweak for FreeBSD that causes the OS to take back
holes (IIRC).

But most importantly, one should know that RSS and VSZ are not
Accurate Measures of Memory Usage.

YS.


I have an IT group knocking Ruby saying that it never releases memory
back to the system (to be available to procs other than Ruby) so
feeling somewhat defensive I went and wrote a dumb script that goes
back and forth between two methods, one cat'ing huge strings and the
other parsing an xml doc with Hpricot.
"task" tops off (in top) around 50M and "other_task" peaks around 150M
(on my machine, CentOS, lastest stable 1.8.6) but when we return to
running "task" for extended periods, memory usage remains at ~150M.
Forgive my ignorance. Can anyone explain this behavior or at least
point me to a place to educate myself?
Many thanks, the script I'm running is below.
def other_task
a = []
9999.times do
a << "12345678901234567890123456789012345678901234567890" * 100
end
nil
end
def task
# 500K xml data
data = File.readlines("very_large_output.xml").to_s
temp = Hpricot.XML data
nil
end
puts "In task"
10.times {|i| task; p i}
puts "In other task"
100.times {|i| other_task; p i}
puts "In task (Should memory go down?)"
100.times {|i| task; p i}
 

Robert Klemme

2007/10/11 said:
I do appreciate your help (and no, that link is lost to the mists of
time as far as I can Google). :)

I do understand that they are not useful for leak detection, but this
is just observing the peak during the life of the script. During
"other_task" I can see the usage rise *and fall*, so I know the OS is
reclaiming *some* memory in places... but the part I don't understand
is why it isn't returning to "task"'s original base of 50M when only
"task" is running.

I hate to sound dense, but I need to convince some fairly hard-headed
sysadmins.

:)

There are a few things to say to this.

First, it seems reasonable to hold on to memory that has once been
grabbed, because you can expect that your process will need that memory
again. There is no point in allocating and deallocating memory from
the OS all the time.

Then, even though memory is allocated, that does not mean physical
memory is actually used. IIRC, pure allocation just reserves address
space; only when you access it for the first time does the OS generate
a page fault and hand you physical memory. If memory is unused for a
while, chances are it gets paged out to disk *if* other processes need
the resources. If not, there is no problem anyway. Granted, if a machine
does not have enough virtual memory configured this can lead to
problems for long-running programs, but then I'd say the machine is
probably misconfigured anyway.

Now, what was the third point I had in mind? Ah yes: I believe
typical JVMs behave the same. However, Sun's JVM also does heavy
copying around, which might help the OS because objects get packed
onto fewer memory pages, so more pages sit idle and can be swapped
out. I believe the current Ruby interpreter does not copy objects, but
you can verify this by looking at the source code.

All in all, this is a strange reason to ban a programming language
from a machine IMHO. Other reasons seem more reasonable to me
(management overhead for keeping the installation up to date etc.).

Kind regards

robert
 

MenTaLguY

During "other_task" I can see the usage rise *and fall* so I know the
OS is reclaiming *some* memory in places...but the part I don't
understand is why it isn't returning to "task"'s original base of 50M
when only "task" is running.

This is not actually a problem specific to Ruby, but applies to the
majority of software which does not use a compacting allocator (this
includes C, with malloc/free[1]).

Typically, memory is allocated in blocks from the OS, and parceled out
to individual objects from there. When there are no more active objects
in a block, the block could theoretically be returned to the OS. In
practice that can be hard to do, since individual "stragglers" can
prevent entire blocks from being reclaimed. The block can still be
reused within the program to allocate new objects, but it may not be
possible to return to the OS while the program is still running[2].

Ruby does make a reasonable effort to return unused blocks ("heaps" in
the parlance of gc.c) to the OS[3], when it is possible to do so. But
it is not always possible to do so.
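
To see the "straggler" effect from plain Ruby, without reading gc.c, here is
an illustrative sketch (the counts and sizes are arbitrary): allocate a large
batch of strings, keep only a small fraction of them, drop the rest, and RSS
as reported by top will usually stay near its peak even after GC.start,
because the survivors are scattered across the allocated blocks.

# Illustrative sketch: a few scattered survivors keep the process size high.
survivors = []
batch = []
100_000.times do |i|
  s = "x" * 1_000                    # roughly 100 MB allocated in total
  survivors << s if i % 1_000 == 0   # keep 1 in 1000 as a "straggler"
  batch << s
end
batch = nil                          # drop references to the other 99.9%
GC.start                             # collect the garbage
sleep 60                             # watch RSS in top: it typically stays high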
I hate to sound dense, but I need to convince some fairly hard-headed
sysadmins.

Do they use Perl? Perl 5 does not even try to return memory to the OS.

-mental

[1] malloc/free can be even worse, if implemented using brk/sbrk,
since a single live object at a high address can "pin" the entire
rest of the heap below it, not just a single block

[2] a process' memory is always reclaimed by the OS once it exits

[3] see free_unused_heaps in Ruby's gc.c
 

Sylvain Joyeux

Ruby does make a reasonable effort to return unused blocks ("heaps" in
the parlance of gc.c) to the OS[3], when it is possible to do so. But
it is not always possible to do so.
Ruby does not free heaps. It is supposed to do so, but the way it allocates
very big heaps forbids it from doing that in practice. Search the 'gc.c --
possible logic error?' thread on ruby-core for details.

Nonetheless, you can fit a lot of objects in less than 10M of heaps (and I
mean *a lot*), so the 150M memory usage is certainly not due to that.
 

Joel VanderWerf

Yohanes said:
But most importantly, one should know that RSS and VSZ are not
Accurate Measures of Memory Usage.

Do you mean not accurate as in (1) or (2):

(1) an estimate of how much memory is used by the interpreter for
objects, program, etc.

or

(2) an estimate of how much memory the kernel has allocated to the process.
 

MenTaLguY

Ruby does make a reasonable effort to return unused blocks ("heaps" in
the parlance of gc.c) to the OS[3], when it is possible to do so. But
it is not always possible to do so.
Ruby does not free heaps. It is supposed to do so, but the way it
allocates very big heaps forbids it from doing that in practice.

That's true; Ruby tends to use relatively large heap sizes, which
makes it unlikely for there to be any unused heaps which can be freed.

Using per-heap freelists and preferring newer heaps for allocation
would help, although it would mean additional overhead to determine
which heap a freed object belonged to.

-mental
 

Sylvain Joyeux

preferring newer heaps for allocation would help
Actually, it does not really. The probability of having at least one
object kept alive in, say, 10000 slots (the minimum heap size) is quite
high. I'll have more raw data to show when I have time to put up a page
about it.
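
As a back-of-the-envelope illustration of that point (made-up survival
probabilities, not Sylvain's data): if each slot in a 10000-slot heap
independently has even a small chance of holding a live object, the heap is
almost certain to be pinned by at least one survivor.

# Sketch: chance that a 10_000-slot heap holds at least one live object,
# assuming (hypothetically) each slot survives independently with probability p.
[0.01, 0.001, 0.0001].each do |p|
  pinned = 1 - (1 - p) ** 10_000
  printf "p=%.4f  P(heap pinned) = %.5f\n", p, pinned
end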

Sylvain
 

Blackie

Thank you everyone. I can now make some relatively informed and rational
explanations to my cohorts here.

Have a great day!
 

Robert Klemme

2007/10/12 said:
Thank you everyone. I can now make some relatively informed and rational
explanations to my cohorts here.

Let us know the outcome. :)

robert
 

Yohanes Santoso

ara.t.howard said:
sorry for jumping in, but i'd love your opinion on this yohanes:

http://drawohara.tumblr.com/post/14421265

Ah, I'm sorry. I have been swamped recently and have not been able to
read this mailbox.

Too bad you're on Darwin. Had you been on Linux (and on glibc), I'd
suggest patching ruby so that it executes this as the first thing after
it starts:

mallopt(M_MMAP_THRESHOLD, 0); /* declared in malloc.h */

What this does is make all allocations use mmap instead of sbrk. This
allows every free() to return the allocated space back to the kernel.
Doing this eliminates the possibility that VSZ climbs because of memory
fragmentation. If VSZ still climbs, then there is some garbage somewhere
not being released. OTOH, this causes a syscall for every allocation.
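
If patching the interpreter is inconvenient, roughly the same thing might be
doable from inside a script via Ruby 1.8's DL library. Take this only as a
sketch: the DL calling convention shown here may differ between Ruby
versions, and the value -3 for M_MMAP_THRESHOLD is a glibc detail you should
verify against malloc.h.

# Sketch (Linux/glibc only): call mallopt(M_MMAP_THRESHOLD, 0) through DL.
# -3 for M_MMAP_THRESHOLD comes from glibc's malloc.h; verify it on your system.
require 'dl'

libc    = DL.dlopen('libc.so.6')
mallopt = libc['mallopt', 'III']   # int mallopt(int param, int value)
mallopt.call(-3, 0)                # route (almost) all allocations through mmap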

I hope there is a suitable equivalent in Darwin.

YS.
 

Yohanes Santoso

Joel VanderWerf said:
Do you mean not accurate as (1) or (2):

(1) an estimate of how much memory is used by the interpreter for
objects, program, etc.

or

(2) an estimate of how much memory the kernel has allocated to the process.

It's #1.

For #2, it's quite difficult what with shareable memory and such. VSZ
is a good approximation for that if there isn't that much shareable
memory.

YS.
 

M. Edward (Ed) Borasky

Yohanes said:
Ah, I'm sorry. I have been swamped recently and have not been able to
read this mailbox.

Too bad you're on Darwin. Had you been on Linux (and on glibc), I'd
suggest patching ruby so that it executes this as the first thing after
it starts:

mallopt(M_MMAP_THRESHOLD, 0); /* declared in malloc.h */

What this does is make all allocations use mmap instead of sbrk. This
allows every free() to return the allocated space back to the kernel.
Doing this eliminates the possibility that VSZ climbs because of memory
fragmentation. If VSZ still climbs, then there is some garbage somewhere
not being released. OTOH, this causes a syscall for every allocation.

I hope there is a suitable equivalent in Darwin.

YS.

Is this worth making part of the standard Ruby build on Linux?
 

Yohanes Santoso

M. Edward (Ed) Borasky said:
Is this worth making part of the standard Ruby build on Linux?

Definitely not. This is just for diagnostics. It's also quite expensive
because you incur a syscall for each memory allocation and deallocation.

If VSZ climbs because of memory fragmentation, then it is really a
perception problem. No real memory will be wasted unnecessarily, thanks
to the memory management the kernel does for the process. The only real
annoyance is that when the process terminates, the kernel will swap
in all those pages. This is an annoyance that could border on becoming
a problem depending on your requirements. Some OSes, like FreeBSD,
have a special option that disables the swapping in of all those pages
when a process terminates[1].

If VSZ climbs because garbage is not being collected, then you can
try to fix the problem. Most probably it is fixable from user code
(holding references to unnecessary objects). If the problem is caused
by ruby's GC quirks (few, but they can be very difficult to fix), then
it's a toss-up. But I still say the cost is probably not worth it.

I don't favour the long-running process model for servers. I prefer to
fork() for each request, so I'm rarely bothered by whatever GC quirks
in ruby I may have triggered. I understand that this approach is not
trendy anymore and RoR does not support this model, but I'm just
throwing it out in the open as an alternative work-around where
possible.
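
For what it's worth, the shape of that model in plain Ruby is something like
the sketch below: a toy TCP server that forks a child per connection, so
everything the child allocated goes back to the kernel when it exits. The
port and the one-line "protocol" are made up for illustration.

# Sketch of fork()-per-request: each child handles one request and exits,
# so any memory it grabbed is returned to the OS along with the process.
require 'socket'

server = TCPServer.new(2000)            # arbitrary port for the example
loop do
  client = server.accept
  pid = fork do
    request = client.gets               # read one line as the "request"
    client.puts "handled: #{request}"   # the memory-hungry work goes here
    client.close
    exit!                               # child dies; kernel reclaims everything
  end
  client.close                          # the parent is done with this socket
  Process.detach(pid)                   # reap the child without blocking
end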


YS.






Footnotes:
[1] Thanks to Daniel DeLorme for the link: http://web.archive.org/web/20061023...s-and-application-memory-consumption-patterns
 

ara.t.howard

I don't favour the long-running process model for servers. I prefer to
fork() for each request, so I'm rarely bothered by whatever GC quirks
in ruby I may have triggered. I understand that this approach is not
trendy anymore and RoR does not support this model, but I'm just
throwing it out in the open as an alternative work-around where
possible.

that's quite interesting because, while i'm not the memory expert you
are, i've settled on exactly that model for the many, many server
processes i've written for 24x7 systems: the robustness simply cannot
be beaten.

kind regards.

a @ http://codeforpeople.com/
 

MenTaLguY

Is this worth making part of the standard Ruby build on Linux?

The downside is that using mmap for every allocation can result in
a large number of distinct memory mappings, hurting performance.

Ideally, Ruby could allocate heaps using mmap rather than malloc,
on platforms where it was feasible to do so (generally this means
private mappings of /dev/zero, which not all Unices support). On
Windows you'd probably want to use VirtualAlloc().

-mental
 

Jeremy Kemper

I don't favour the long-running process model for servers. I prefer to
fork() for each request, so I'm rarely bothered by whatever GC quirks
in ruby I may have triggered. I understand that this approach is not
trendy anymore and RoR does not support this model, but I'm just
throwing it out in the open as an alternative work-around where
possible.

Hi Yohanes: most Rails deployments use a process pool, but with a longer
lifetime than a single request. You can let the process exit after
max_child_requests and rely on the parent to fork again, or on the
process supervisor to respawn. Perhaps it isn't trendy, but this approach
is common and works well for both Ruby and Rails.
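
Roughly, that pattern might look like the sketch below; max_child_requests,
the supervisor loop and handle_request are illustrative placeholders, not the
actual Rails implementation.

# Sketch: a worker that serves a bounded number of requests, then exits so its
# memory goes back to the OS; the parent immediately forks a replacement.
MAX_CHILD_REQUESTS = 500    # placeholder limit for illustration

def handle_request
  sleep 0.01                # stand-in for real per-request work
end

loop do                     # supervisor: keep one worker alive at all times
  pid = fork do
    MAX_CHILD_REQUESTS.times { handle_request }
    exit!                   # worker dies; the OS reclaims all of its memory
  end
  Process.wait(pid)         # when the worker exits, loop and fork a fresh one
end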

Also of interest, Hongli Lai has recently done some work to make Ruby
GC copy-on-write friendly and thus more attractive to fork:
http://izumi.plan99.net/blog/

Best,
jeremy
 
