Should *most* memory be released back to the system?


Joel VanderWerf

Yohanes said:
mallopt(M_MMAP_THRESHOLD, 0); /* declared in malloc.h */

Very interesting. I'd like to use this as a diagnostic.

I patched ruby's main.c to call mallopt() before anything else. It seems
to be using a huge amount of memory, though. Is this normal? Starting
the following:

$ ruby -e 'sleep 100'

Causes:

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
21422 vjoel 17 0 142m 140m 1892 S 0.0 13.9 0:00.62 ruby

I have 1.8.6p110 and ubuntu 7.04:

$ ruby -v
ruby 1.8.6 (2007-09-23 patchlevel 110) [i686-linux]
$ uname -a
Linux tumbleweed 2.6.20-16-generic #2 SMP Sun Sep 23 19:50:39 UTC 2007
i686 GNU/Linux
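
For reference, the patch amounts to something like this (a sketch from
memory, not the literal diff; the only point is that mallopt() runs before
anything else in main()):

/* main.c -- sketch: force mmap for every allocation, diagnostic only */
#include <malloc.h>

int
main(int argc, char **argv)
{
    /* with a threshold of 0 every malloc() chunk is mmap()ed, so
       free() can hand it straight back to the kernel */
    mallopt(M_MMAP_THRESHOLD, 0);

    /* ... the rest of ruby's original main() follows unchanged ... */
}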
 

MenTaLguY

Very interesting. I'd like to use this as a diagnostic.

I patched ruby's main.c to call mallopt() before anything else. It seems
to be using a huge amount of memory, though. Is this normal?

Using mmap for individual allocations means that each allocation
gets rounded up to the nearest multiple of the page size
(typically 4k).
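
To put a rough number on it (a toy illustration, not glibc's exact
bookkeeping):

#include <stdio.h>
#include <unistd.h>

int main(void)
{
    size_t page    = (size_t) sysconf(_SC_PAGESIZE);  /* typically 4096 */
    size_t request = 40;              /* a small malloc()ed chunk */
    /* with M_MMAP_THRESHOLD at 0, each chunk occupies whole pages */
    size_t rounded = ((request + page - 1) / page) * page;
    printf("requested %zu bytes, mmap() reserves %zu\n", request, rounded);
    return 0;
}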

That aside, I'm not sure it's that useful as a diagnostic; note
that only M_MMAP_MAX chunks will ever be allocated at a time with
mmap; further allocations beyond that are allocated the normal way
by advancing sbrk.

-mental
 

M. Edward (Ed) Borasky

ara.t.howard said:
that's quite interesting because, while i'm not the memory expert you
are, i've settled on exactly that model for the many, many server
processes i've written for 24x7 systems: the robustness simply cannot
be beaten.

kind regards.

a @ http://codeforpeople.com/

fork() (or clone() in Linux) is cheap ... it's actually *instantiating*
the thread or process that costs! Depending on how smart your kernel is,
you could be doing it one page fault at a time. And no matter *how*
smart your kernel is, above a certain ratio of virtual process size over
real process size, it's going to start thrashing. Pay me now or pay me
later, etc.

That's what's so attractive about lightweight communicating processes --
emphasis on *lightweight*. It doesn't cost much to start them up, move
them around, kill them, etc.
 

Yohanes Santoso

MenTaLguY said:
Using mmap for individual allocations means that each allocation
gets rounded up to the nearest multiple of the page size
(typically 4k).

That aside, I'm not sure it's that useful as a diagnostic; note
that only M_MMAP_MAX chunks will ever be allocated at a time with
mmap; further allocations beyond that are allocated the normal way
by advancing sbrk.

-mental

I suppose all these could be offered as an extension module. At the
very least, allow Malloc.m_mmap_threshold=(), m_mmap_max=(), and
stat(), which returns a wrapped struct mallinfo.[1] This is so you don't
have to pay for allocations not done by your app, like allocations
internal to ruby and/or RoR.

From struct mallinfo, it seems you can get the actual memory use with
hblkhd + uordblks, which would provide yet another way to diagnose the
rising VSZ problem. Perhaps mental can confirm this?
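
A minimal sketch of what such an extension might look like (Malloc.stat
and the file name are just placeholders; note that glibc's mallinfo()
counts in ints, so it wraps on very large heaps):

/* malloc_stats.c -- sketch of a tiny extension wrapping mallopt()/mallinfo() */
#include <ruby.h>
#include <malloc.h>

static VALUE
malloc_stat(VALUE self)
{
    struct mallinfo mi = mallinfo();
    VALUE h = rb_hash_new();
    rb_hash_aset(h, ID2SYM(rb_intern("arena")),    INT2NUM(mi.arena));
    rb_hash_aset(h, ID2SYM(rb_intern("hblkhd")),   INT2NUM(mi.hblkhd));
    rb_hash_aset(h, ID2SYM(rb_intern("uordblks")), INT2NUM(mi.uordblks));
    return h;
}

static VALUE
malloc_set_mmap_threshold(VALUE self, VALUE bytes)
{
    return INT2NUM(mallopt(M_MMAP_THRESHOLD, NUM2INT(bytes)));
}

static VALUE
malloc_set_mmap_max(VALUE self, VALUE count)
{
    return INT2NUM(mallopt(M_MMAP_MAX, NUM2INT(count)));
}

void
Init_malloc_stats(void)
{
    VALUE mMalloc = rb_define_module("Malloc");
    rb_define_singleton_method(mMalloc, "stat", malloc_stat, 0);
    rb_define_singleton_method(mMalloc, "m_mmap_threshold=",
                               malloc_set_mmap_threshold, 1);
    rb_define_singleton_method(mMalloc, "m_mmap_max=",
                               malloc_set_mmap_max, 1);
}

With that, Malloc.stat[:hblkhd] + Malloc.stat[:uordblks] would be the
"actual memory use" figure I mean above.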


YS.


Footnotes:
[1] http://www.gnu.org/software/libc/manual/html_node/Statistics-of-Malloc.html#Statistics-of-Malloc
 

Yohanes Santoso

Ara, my knowledge is limited to the few ad-hoc experiments I've done.

Ed said:
fork() (or clone() in Linux) is cheap ... it's actually
*instantiating* the thread or process that costs!

What do you mean by 'instantiating'? When you fork(), a new process is
created and scheduled. That seems instantiated enough for me.

Ed said:
Depending on how smart your kernel is, you could be doing it one page
fault at a time. And no matter *how* smart your kernel is, above a
certain ratio of virtual process size over real process size, it's
going to start thrashing.

Do you have an example? I don't quite get what you meant. I'm not sure
why the ratio value of VSZ over real process size (I assume it's not
RSZ which is the resident size) matters. Can the approximate value of
this ratio be determined?

My understanding is that a process thrashes because its working set
during the thrashing period cannot be paged in in its entirety. This
could be because of limited resources (pressure from other processes,
etc.) or a bug in the kernel.

Ed said:
That's what's so attractive about lightweight communicating
processes -- emphasis on *lightweight*. It doesn't cost much to
start them up, move them around, kill them, etc.



Regards,
YS.
 

khaines

Ed said:
fork() (or clone() in Linux) is cheap ... it's actually *instantiating* the
thread or process that costs! Depending on how smart your kernel is, you
could ...

This is just a side note, but the sentence above reminded me of it.

Some time ago I wrote a Mongrel variation that used fork() on incoming
requests instead of spawning a thread. Throughput on it was lousy,
comparatively. Somewhere around an order of magnitude worse than using
Ruby threads.

It would work for modest-volume sites, but there was a large response-time
tax imposed by forking versus using the Ruby threads. This was tested on
a Linux box, though it was an older (2.4.x) kernel.
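
For anyone curious, the general shape of the pattern, sketched in C
rather than the actual Ruby code (error handling omitted):

/* fork-per-request: accept a connection, fork, let the child handle it */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <signal.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
    int srv = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in addr = { 0 };
    addr.sin_family      = AF_INET;
    addr.sin_port        = htons(8080);
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    bind(srv, (struct sockaddr *) &addr, sizeof addr);
    listen(srv, 16);
    signal(SIGCHLD, SIG_IGN);            /* let the kernel reap children */

    for (;;) {
        int client = accept(srv, NULL, NULL);
        if (client < 0)
            continue;
        if (fork() == 0) {               /* child: serve one request, exit */
            close(srv);
            /* ... read the request, write the response ... */
            close(client);
            _exit(0);
        }
        close(client);                   /* parent: go back to accept() */
    }
}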


Kirk Haines
 

Robert Klemme

khaines said:
This is just a side note, but the sentence above reminded me of it.

Some time ago I wrote a Mongrel variation that used fork() on incoming
requests instead of spawning a thread. Throughput on it was lousy,
comparatively. Somewhere around an order of magnitude worse than using
Ruby threads.

It would work for modest volume sites, but there was a large response time
tax imposed by forking versus using the Ruby threads. This was tested on
a Linux box, though it was an older (2.4.x kernel).

Did you fork for every request? If so then it seems there might be a
more optimal solution (starting worker processes and then sending off
requests to them via DRb for example).
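
The classic shape of that idea is a pre-forked worker pool; a bare C
sketch (shared listening socket instead of DRb, error handling omitted),
just to show the structure:

/* pre-forked workers: fork N children once, each blocks in accept()
   on the same listening socket, so there is no per-request fork */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/wait.h>
#include <unistd.h>

#define WORKERS 4

int main(void)
{
    int srv = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in addr = { 0 };
    addr.sin_family      = AF_INET;
    addr.sin_port        = htons(8080);
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    bind(srv, (struct sockaddr *) &addr, sizeof addr);
    listen(srv, 64);

    for (int i = 0; i < WORKERS; i++) {
        if (fork() == 0) {               /* worker process */
            for (;;) {
                int client = accept(srv, NULL, NULL);
                if (client < 0)
                    continue;
                /* ... handle one request ... */
                close(client);
            }
        }
    }
    for (;;)
        wait(NULL);                      /* parent just supervises */
}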

Cheers

robert
 

M. Edward (Ed) Borasky

This is just a side note, but the sentence above reminded me of it.

Some time ago I wrote a Mongrel variation that used fork() on incoming
requests instead of spawning a thread. Throughput on it was lousy,
comparatively. Somewhere around an order of magnitude worse than using
Ruby threads.

It would work for modest volume sites, but there was a large response
time tax imposed by forking versus using the Ruby threads. This was
tested on a Linux box, though it was an older (2.4.x kernel).


Kirk Haines

Yeah ... I should have been more explicit. When you do a fork/clone in
Linux, an "empty" process/thread is created. You get a task control
block and an empty memory map and that's about it. That doesn't take
very much time or space.

But when you actually want that process/thread to do something, its code
(text) pages have to be given page frames and loaded into RAM, which I
called "instantiating". And those code pages refer to data pages and
*those* have to be given page frames in RAM, they read data from disk
and *those* pages have to be given page frames in RAM, etc. That's
"demand paging" -- nothing happens until an instruction gets a page
fault, unless you count the kernel's lookahead mechanisms.

In the high-level view, most "modern" operating systems -- Solaris,
Windows, Linux and BSD/MacOS -- work the same way. There are minor
variations on what things are called and various tuning knobs, but
essentially you have pages on disk, page frames in RAM,
page-fault-driven on-demand movement of code and data into RAM and some
background processes/daemons/kernel threads that try to maintain a
balance of all the many demands for page frames.

When it works, it works well, and when it doesn't work, it fails
spectacularly -- disk thrashing, out-of-memory process killers, response
times on the order of minutes for one-second tasks, freezing screens,
etc. And the solution is to add more RAM or have the software use less RAM.

Now the killer is this: the platform (hardware and OS) designers make a
bunch of compromises so that you can get "acceptable" performance for a
lot of different languages -- compiled or interpreted, static memory
allocation or dynamic memory allocation, explicit memory
allocation/deallocation or garbage collection, etc. And the language
designers make a bunch of compromises so that you can get "acceptable"
performance on modern operating systems. It's almost as if the two types
of designers communicate with each other only every fifteen years or so.

What's even more interesting is that proposals to change this -- to
integrate language design and platform design -- almost always fall back
to an experiment that was tried and failed (commercially, not
technically): Lisp machines. :)
 

khaines

Did you fork for every request? If so then it seems there might be a
more optimal solution (starting worker processes and then sending off
requests to them via DRb for example).

Yeah. I was just exploring the idea of memory management through fork().
Starting worker processes and distributing requests to them is
essentially what is happening when one makes a cluster of mongrels through
one of the available clustering solutions.


Kirk Haines
 

Michal Suchanek

In the high-level view, most "modern" operating systems -- Solaris,
Windows, Linux and BSD/MacOS -- work the same way. There are minor
variations on what things are called and various tuning knobs, but
essentially you have pages on disk, page frames in RAM,
page-fault-driven on-demand movement of code and data into RAM and some
background processes/daemons/kernel threads that try to maintain a
balance of all the many demands for page frames.

When it works, it works well, and when it doesn't work, it fails
spectacularly -- disk thrashing, out-of-memory process killers, response
times on the order of minutes for one-second tasks, freezing screens,
etc. And the solution is to add more RAM or have the software use less RAM.

Well, the memory subsystem is quite underdeveloped on the "general
purpose" OSes. You normally do not get resource accounting unless you
do realtime or some specialized OS, but you at least get priorities for
CPU time. There is nothing like that for memory. It is all just best
effort, distributed more or less proportionally to the amount of pages
each process has touched recently, and when it runs out something
randomly breaks.

Now the killer is this: the platform (hardware and OS) designers make a
bunch of compromises so that you can get "acceptable" performance for a
lot of different languages -- compiled or interpreted, static memory
allocation or dynamic memory allocation, explicit memory
allocation/deallocation or garbage collection, etc. And the language
designers make a bunch of compromises so that you can get "acceptable"
performance on modern operating systems. It's almost as if the two types
of designers communicate with each other only every fifteen years or so.

I cannot imagine what else you can do when you want an OS that runs
pretty much all languages. All that the OS can do is hand out pages,
and only the language runtime can manage the data inside those pages.
Unless you tailor the OS to one specific language or virtual machine
you cannot get anything more.

The POSIX interface might make it easier to allocate by growing
the heap rather than by mapping individual pages. But mapping
individual pages only helps in the situation where you have one huge
hole (which can be swapped out anyway) and data at the end of the
heap. This is just a special case of fragmentation. Clever allocators
can make fragmentation less likely and less severe, but in the end you
cannot completely fix it unless you have a means of compacting the
data on your heap. And that you must do yourself; the OS cannot do
it. A VM may do it for you if you use an interpreted language. You
could even modify your C compiler and runtime to use indirect pointers,
but then you would lose the single benefit of C - binary
compatibility.
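
A contrived way to watch the "data at the end of the heap" case
(glibc-specific; that the surviving chunk lands at the top of the heap
is an assumption of the toy, not a guarantee):

/* almost the whole arena becomes free, yet the heap cannot shrink
   past the topmost live chunk */
#include <malloc.h>
#include <stdlib.h>

#define N 10000

int main(void)
{
    char *blocks[N];
    int i;

    for (i = 0; i < N; i++)
        blocks[i] = malloc(1024);   /* grow the heap by roughly 10 MB */

    for (i = 0; i < N - 1; i++)     /* free everything except the last */
        free(blocks[i]);

    malloc_stats();                 /* glibc: large arena, little in use */

    free(blocks[N - 1]);
    return 0;
}
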
What's even more interesting is that proposals to change this -- to
integrate language design and platform design -- almost always fall back
to an experiment that was tried and failed (commercially, not
technically): Lisp machines. :)

Well, that's where you end up if you manage the language objects in the
OS (assuming that a lisp machine is the thing where you basically run
the lisp runtime on the bare metal). It's perfectly integrated, but you
lose the ability to run other languages easily because you have to map
them somehow onto your chosen language. For some that are similar enough
it might be easy, for others difficult, and for some (near)
impossible.

It's been done for several languages already. You get a nice toy and
perhaps an environment for embedded or specialized systems, but not a
general-purpose desktop system, because you want the ability to run any
language in which a piece of software happens to be written.

Thanks

Michal
 

M. Edward (Ed) Borasky

Michal said:
Well, the memory subsystem is quite underdeveloped on the "general
purpose" OSes. You normally do not get resource accounting unless you
do realtime or some specialized OS but you at least get priorities for
cpu time. Nothing like that for memory. It is all just best effort,
distributed more or less proportionally to the amount of pages the
process has touched recently, and when it runs out something randomly
breaks.

You're right ... memory management technology (hardware or OS) hasn't
improved substantially since the days of Peter Denning and System/360.
:) Part of that is due to the fact that the equations necessary to come
to some reasonable conclusions about alternatives are ghastly. They're
much more difficult to deal with than those that govern networking, for
example, which is why routers are so smart these days and memory
management is stuck in a time warp.

I cannot imagine what else you can do when you want an OS that runs
pretty much all languages. All that the OS can do is hand out pages,
and only the language runtime can manage the data inside those pages.
Unless you tailor the OS to one specific language or virtual machine
you cannot get anything more.

But that's pretty much what we have now, that one specific language
being C. There was a time when operating systems and compilers were
written either in assembler or other "system programming" languages like
Bliss. But now most operating systems are written in C, most compilers
and interpreters are written in C, and it's only end-user applications
that tend to be written in all the other languages. So really, you get
"pretty much all languages" by writing their compilers or interpreters
in C.

So you could tailor the OS (and hardware) to C. (That's actually where
the "RISC revolution" was headed, until Intel found a way to
out-manufacture the RISC chip vendors.) But that's not what has
happened. Instead, the OS acts as a kind of "middleware" between
compilers and interpreters and the hardware, and there's another layer
of middleware inside the chip between the OS and a "RISC core" that
actually does the arithmetic and string operations. The Intel Mac was
only the last nail in the coffin of RISC. :)
Well, that's where you get if you manage the language objects in the
OS (assuming that a lisp machine is the thing where you basically run
lisp runtime on the bare metal). It's perfectly integrated but you
lose the ability to run other languages easily because you have to map
them somehow to your chosen language. For some that are similar enough
it might be easy, for others difficult, and for some (near)
impossible.

Actually, you can write compilers for other languages on Lisp machines,
and you can write an operating system in Lisp too. I don't know how well
suited Lisp is to running an OS, but it's an excellent language for
writing compilers and interpreters. But we don't have Lisp machines
today for the same reason we don't have many RISC machines today -- the
alternatives had more powerful marketing and manufacturing.
 

M. Edward (Ed) Borasky

Yeah. I was just exploring the idea of memory management through
fork(). Starting worker processes and distributing requests to them is
essentially what is happening when one makes a cluster of mongrels
through one of the available clustering solutions.

Is that what's known as the "Mongrel Hordes"?

<ducking>
 

Michal Suchanek

There was a paper a few years ago on the idea of a GC that could
communicate with the virtual memory manager, so that it could sweep pages
when they happened to be already swapped in, rather than intentionally
pulling them in purely to do the sweep. Or something along those lines;
the key is that right now, you have no way to know if accessing an address
will produce a page fault.

You could probably see your memory map and your page faults; after all,
it's your memory. I am not sure it would buy you anything, though. You
have to visit all pages to perform the garbage collection. You could
possibly optimize the order a bit, but the pages will probably be
interlinked in weird ways, and unless you do something very clever you
can easily get lost.
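
Linux does expose residency to userland, for what it's worth -- a toy
mincore() sketch (error handling omitted):

/* which pages of a region are resident? (Linux-specific mincore()) */
#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    long page = sysconf(_SC_PAGESIZE);
    size_t len = 64 * page;
    char *region = mmap(NULL, len, PROT_READ | PROT_WRITE,
                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    memset(region, 1, 8 * page);     /* touch the first 8 pages only */

    unsigned char vec[64];
    mincore(region, len, vec);       /* low bit set => page is resident */
    for (int i = 0; i < 64; i++)
        putchar((vec[i] & 1) ? '#' : '.');
    putchar('\n');
    return 0;
}
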
It makes a lot of sense. I always assume that big VSZs (at least in
garbage-collected apps) are bad because they lead to eventual paging, but I
don't know nearly enough to prove or even investigate that.

It's inevitable if the garbage collector accesses all the memory, and
most do. You could have reference counting that only discards stuff on
currently mapped pages (which is what reference counting does most of
the time anyway - it only decreases the count for stuff that is
accessed). However, reference counting is not sufficient to get rid of
all garbage (it cannot reclaim cycles), so you have to sift through the
whole heap eventually, and that pages it in in its entirety.

Some applications might get away with a large VSZ and no paging, but
that just proves the memory is in fact unused and probably was not
freed only because of memory management errors - leaks.

Thanks

Michal
 

Michal Suchanek

You're right ... memory management technology (hardware or OS) hasn't
improved substantially since the days of Peter Denning and System/360.
:) Part of that is due to the fact that the equations necessary to come
to some reasonable conclusions about alternatives are ghastly. They're
much more difficult to deal with than those that govern networking, for
example, which is why routers are so smart these days and memory
management is stuck in a time warp.

Actually, routers aren't that smart either. IP succeeded because it
does not need smart routers. Everybody figures that just adding more
bandwidth is easier than making more efficient use of the current
bandwidth. The state-of-the-art router does (beyond the bare minimum
needed to function as a router) classify the traffic into a few
priority classes and implement some logic that makes higher-priority
traffic somewhat more likely to come through. Again, the only thing it
has over memory management is some crude priorities.

But that's pretty much what we have now, that one specific language
being C. There was a time when operating systems and compilers were
written either in assembler or other "system programming" languages like
Bliss. But now most operating systems are written in C, most compilers
and interpreters are written in C, and it's only end-user applications
that tend to be written in all the other languages. So really, you get
"pretty much all languages" by writing their compilers or interpreters
in C.

So you could tailor the OS (and hardware) to C. (That's actually where
the "RISC revolution" was headed, until Intel found a way to
out-manufacture the RISC chip vendors.) But that's not what has
happened. Instead, the OS acts as a kind of "middleware" between
compilers and interpreters and the hardware, and there's another layer
of middleware inside the chip between the OS and a "RISC core" that
actually does the arithmetic and string operations. The Intel Mac was
only the last nail in the coffin of RISC. :)

C is akin to assembly, and to languages like Pascal that also use
pointers and raw memory access. The evolution went from machine code to
machine-specific assembly, and then to C and other languages that try
to hide cpu and platform differences; also from running on the bare
metal (which strikes back today in the form of xen or vmware, which
actually allow dividing memory between tasks - at some expense) to
single-task OSes, and then to multitasking OSes.
To make the use of C and similar languages easier, current OSes provide
reusable and shareable services and support for libraries, which you
would hardly find in machine code. However, there are very few means
for managing multiple processes actually running in parallel. The
security models of current systems are a joke, and resource management
is nearly non-existent. It feels like we are only halfway towards
multitasking OSes currently.

The RISC cpus weren't that big a win. The instruction set is simpler and
more symmetric (which is what improves over time even for Intel, and
the 64-bit version is way better than the 32-bit one from what I have
heard). Theoretically the simpler instructions could give the programmer
more control to do better optimization. But in practice the optimization
performed by the compilers is lousy on any architecture you pick, and
the simpler instructions require more memory to encode the program.
Add some interesting features that expose more of the internal workings
of the cpu, like delayed branching or imprecise interrupts, and you get
a big mess most of the time. Yes, the compilers might improve over
time. But writing even a working compiler becomes more difficult with
these instruction sets that expose too much.

Actually, you can write compilers for other languages on Lisp machines,
and you can write an operating system in Lisp too. I don't know how well
suited Lisp is to running an OS, but it's an excellent language for
writing compilers and interpreters. But we don't have Lisp machines
today for the same reason we don't have many RISC machines today -- the
alternatives had more powerful marketing and manufacturing.

I do not see any problem with using Lisp for the system interface. You
will have to write some kernel in a lower level language to provide
the lisp runtime but then you can make the syscall interface in Lisp.
You could probably do lots of stuff that is currently in Linux also in
Lisp.

However, Lisp is a functional language. While there is well-known art
of interpreting procedural languages in procedural languages,
functional languages in functional languages, and even functional
languages in procedural languages, I haven't heard of an interpreter
of a procedural language written in a functional language (even an
experimental one, let alone a useful one). Since I am not an expert in
the field there might be monographs piling up on the topic without me
noticing. So far I have seen only one or two articles about unsolved
difficulties with writing such an interpreter.

Even if you wrote an interpreter for Ruby, Python, and whatnot in
Lisp, there is still a fundamental problem that prevents general use. C
(and C++ and assembly) code is used to get speed nearing that of
running on the bare metal for some specialized tasks. Once you turn
everything into Lisp objects you give that up, and you cannot get it
back.

Thanks

Michal
 

Martin DeMello

However, Lisp is a functional language. While there is well-known art
of interpreting procedural languages in procedural languages,
functional languages in functional languages, and even functional
languages in procedural languages, I haven't heard of an interpreter
of a procedural language written in a functional language (even an
experimental one, let alone a useful one). Since I am not an expert in
the field there might be monographs piling up on the topic without me
noticing. So far I have seen only one or two articles about unsolved
difficulties with writing such an interpreter.

One of the examples in the OCaml book is a small Basic interpreter.
http://caml.inria.fr/pub/docs/oreilly-book/html/book-ora058.html

Also, Scheme is mostly functional, but Common Lisp is multiparadigm.

martin
 

M. Edward (Ed) Borasky

Martin said:
Also, Scheme is mostly functional, but Common Lisp is multiparadigm.

Both are really Lisp 1.5 with some simple core semantics changes and
different libraries. :) But seriously, both are "mostly functional" but
contain imperative features. The core semantic difference between Scheme
and Common Lisp is how the language treats "foo" in the following:

(foo arg1 arg2 arg3)

Scheme evaluates foo as an ordinary variable -- a single namespace for
functions and other values -- while Common Lisp looks it up in a
separate function namespace.
 

Xavier Noria

Both are really Lisp 1.5 with some simple core semantics changes
and different libraries. :) But seriously, both are "mostly
functional" but contain imperative features.

In what sense? I don't think Lisp is mostly functional nowadays any
more than Perl is mostly functional so to speak. You _can_ write in a
functional style in both languages, but in my view they are
multi-paradigm nowadays.

-- fxn
 
