Will 'free' return the memory immediately to the OS?

Ian Collins

As per subject-line, thank-you

Why couldn't you just write

Will 'free' return the memory immediately to the OS?

Assuming there is an OS: yes and no. It depends on how the OS works.
Memory is typically mapped to a process in pages and the allocator
allocates memory from these. Most operating systems will leave unused
pages mapped to the process unless they are needed elsewhere.
 
James Kuyper

As per subject-line, thank-you

The malloc family is often implemented by allocating large blocks from
the operating system, and handing out only a small portion of each block
at a time. free() cannot possibly return a block of memory to the OS
until all allocations from that block are free()'d, so you shouldn't
expect an immediate increase in memory available to other processes.

Even when free() could return a block of memory to the OS, it might not
choose to do so; the standard doesn't require it.
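
A rough way to watch this from a C program, assuming a glibc/Linux-style
allocator (sbrk() is non-standard and is used here only to peek at where
the heap ends):

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>     /* sbrk() -- non-standard, used only for observation */

int main(void)
{
    enum { N = 100000 };
    static char *p[N];

    void *before = sbrk(0);          /* current end of the heap ("program break") */

    for (int i = 0; i < N; i++)
        p[i] = malloc(64);           /* many small allocations */
    for (int i = 0; i < N; i++)
        free(p[i]);                  /* ...and every one of them free()d */

    void *after = sbrk(0);
    printf("program break moved by %ld bytes\n",
           (long)((char *)after - (char *)before));

    /* What this prints is allocator- and OS-dependent.  Many allocators
     * keep the pages mapped for reuse, so the break often stays put even
     * though everything has been free()d. */
    return 0;
}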

On many systems, it doesn't matter much whether or not the memory is
returned; if a sufficiently large block of memory remains unused for a
sufficiently long period of time, the OS may have means of detecting
that fact and swapping it out to disk, transparently to the user. The
memory needn't even have been free()d yet.
 
Keith Thompson

Raj Pashwar said:
As per subject-line, thank-you

Please include the question in the body of your message:

Will 'free' return the memory immediately to the OS?

Here's what the C standard says:

The free function causes the space pointed to by ptr to be
deallocated, that is, made available for further allocation.

It's unspecified whether "made available for further allocation"
means that it's returned to the OS. Typically it isn't.
Some implementations might return free()d memory to the OS, but
it's not practical to do so in all cases. For example, if you have
three allocated chunks of memory that happen to be contiguous, and
you free() the second one, it's likely to be difficult to remove
it from the current program's ownership while keeping its neighbors.
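
In code, that scenario looks something like this (nothing guarantees that
the three chunks really are adjacent; it's just a plausible layout with a
simple heap allocator):

#include <stdlib.h>

int main(void)
{
    /* Three allocations that may well end up next to each other inside
     * one block the allocator obtained from the OS. */
    char *a = malloc(1000);
    char *b = malloc(1000);
    char *c = malloc(1000);

    free(b);   /* The hole between a and c can satisfy later malloc()
                * calls, but the allocator generally cannot hand that
                * page back to the OS while a and c are still live. */

    free(a);
    free(c);
    return 0;
}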
 
Ben Pfaff

The FAQ says:

7.25: I have a program which mallocs and later frees a lot of memory,
but I can see from the operating system that memory usage
doesn't actually go back down.

A: Most implementations of malloc/free do not return freed memory
to the operating system, but merely make it available for future
malloc() calls within the same program.
 
Stephen Sprunk

No [modern PC] OS is likely to have the capability to hand the run-time
library a chunk of memory smaller than the page size supported by the
memory management hardware. I don't know what this figure is nowadays,
but I'd guess maybe between 256K and 1M.

On x86, the page sizes are 4kB (normal) and 2MB/4MB (large), with the
latter depending on whether PAE is enabled. Other architectures vary.
You typically wouldn't be able to allocate from the operating system,
for example, a 30-byte chunk of memory.

Nor would you want them to, even if they were able, because that would
greatly increase the number of (relatively expensive) system calls.
The run-time library is likely to take these large chunks allocated by
the operating system and subdivide them so it can hand out smaller
chunks via malloc().

Exactly; it's more efficient to do it that way because it reduces the
number of (relatively expensive) system calls.

There is no guarantee that the algorithm used by malloc() and free()
would allow any memory to be returned to the OS until every block had
been free()'d.

Depending on the system calls available, it may not even be possible.
For instance, the traditional Unix brk()/sbrk() interface dictates a
fixed starting address and variable ending address for the entire heap.
The newer anonymous mmap()/munmap() interface allows allocating and
deallocating individual pages or groups of pages. I don't know how the
Windows interface works, but I'd guess it's similar to the latter.
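
A minimal sketch of the second interface, assuming a POSIX-ish system
(MAP_ANONYMOUS is the Linux/BSD spelling; some systems call it MAP_ANON):

#include <stdio.h>
#include <sys/mman.h>

int main(void)
{
    size_t len = 1 << 20;   /* 1 MiB -- a multiple of the page size */

    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {
        perror("mmap");
        return 1;
    }

    /* ...an allocator would carve this region into smaller chunks... */

    /* Unlike memory obtained by moving the program break with brk()/sbrk(),
     * this whole region can be handed straight back to the kernel once
     * nothing in it is live. */
    if (munmap(p, len) != 0) {
        perror("munmap");
        return 1;
    }
    return 0;
}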

S
 
Stephen Sprunk

No [modern PC] OS is likely to have the capability to hand the run-time
library a chunk of memory smaller than the page size supported by the
memory management hardware. I don't know what this figure is nowadays,
but I'd guess maybe between 256K and 1M.

On x86, the page sizes are 4kB (normal) and 2MB/4MB (large), with the
latter depending on whether PAE is enabled. Other architectures vary.

I believe you, but I'm somewhat surprised at this figure. Hardware
memory management is usually a tradeoff between silicon complexity,
complexity of setting up the hardware by the software, and
granularity.

I figured with a modern PC having maybe 4G of memory, you might want
to divide the memory into maybe 1,000 - 10,000 pages, so I would have
figured page sizes in the range of 512K - 4M. 4K surprises me.

Remember, these page sizes were dictated by the i386, when 4kB was still
a fairly big chunk of memory. Most PCs at the time had far less than
4MB of RAM. Virtual memory was mostly about swapping to disk, so the
smaller the pages, the faster a page fault is handled. Also, smaller
pages meant you could get more of the active set in RAM instead of
constantly swapping in and out large pages that were mostly inactive.

The only real cost to small pages is the need for large TLBs to avoid
walking the page table more than once per page, but in most cases that
simply isn't a problem these days for most workloads--or at least not
enough of a problem to justify the incredibly high cost of changing the
ISA. For certain workloads, eg. database servers, there are special OS
APIs to get large pages. This can be a major performance improvement,
but only if you have enough RAM to prevent them from being evicted.
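
On Linux, for example, that special API is MAP_HUGETLB; a minimal sketch
(the huge-page request fails unless the administrator has reserved huge
pages, e.g. via vm.nr_hugepages, so failure is the normal case here):

#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    /* The base page size the OS is actually using -- typically 4096 on x86. */
    printf("base page size: %ld bytes\n", sysconf(_SC_PAGESIZE));

#ifdef MAP_HUGETLB
    /* Request one 2 MiB large page explicitly. */
    void *big = mmap(NULL, 2UL << 20, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
    if (big != MAP_FAILED) {
        printf("got a 2 MiB huge-page mapping\n");
        munmap(big, 2UL << 20);
    } else {
        perror("mmap(MAP_HUGETLB)");
    }
#endif
    return 0;
}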

S
 
Kaz Kylheku

No [modern PC] OS is likely to have the capability to hand the run-time
library a chunk of memory smaller than the page size supported by the
memory management hardware. I don't know what this figure is nowadays,
but I'd guess maybe between 256K and 1M.

On x86, the page sizes are 4kB (normal) and 2MB/4MB (large), with the
latter depending on whether PAE is enabled. Other architectures vary.

I believe you, but I'm somewhat surprised at this figure. Hardware
memory management is usually a tradeoff between silicon complexity,
complexity of setting up the hardware by the software, and
granularity.

I figured with a modern PC having maybe 4G of memory, you might want
to divide the memory into maybe 1,000 - 10,000 pages, so I would have
figured page sizes in the range of 512K - 4M. 4K surprises me.

At 1000 pages, the external fragmentation would be bad. The nice thing
about small page sizes is that it is easier for a process to return
unneeded pages to the operating system. It's easier to find a contiguous
page-aligned 4K chunk of memory which is not in use and free it, than to
find such a 4MB chunk of memory.

But, never mind that, the real problem is that address spaces do not share
pages, and neither do distinct memory mappings within address spaces.

Firstly, look at your "ps" or "task manager" and see how many processes are
running. You need at least that many pages because you would never allocate two
processes into the same page: you lose protection (as well as the ability to
link the base executables to run at the same fixed virtual address).

Then, within a process, it is customary to use different page ranges for
different memory mappings, like stack (main one and per-thread), executable, or
shared library. For large malloc requests, it is good to use a function like
mmap. When free is called on that, it can be entirely returned to the operating system.

Every single one of these uses would eat into your allotment of 1000 pages,
quickly gobbling up all of it.

I'm too lazy to look it all up. The x86 world at the low levels has
an entry cost. Right now I'm in the ARM world.

4K page sizes are common in the ARM world and the MIPS world.

Linux/MIPS only goes up to 64K page sizes (at kernel compile time)
though the R8000 TLB goes from 4K to 16M.

It doesn't make sense to use very large page sizes globally.
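
To tie that back to malloc(): with glibc specifically, requests above a
threshold are served by mmap() internally, so free() really does unmap
them at once. A minimal sketch (mallopt() and M_MMAP_THRESHOLD are
glibc-specific):

#include <malloc.h>     /* mallopt(), M_MMAP_THRESHOLD -- glibc-specific */
#include <stdlib.h>

int main(void)
{
    /* Ask glibc to serve any request of 1 MiB or more via mmap().
     * (The static default threshold is 128 KiB, so a request this
     * large would normally go through mmap() anyway.) */
    mallopt(M_MMAP_THRESHOLD, 1 << 20);

    char *big = malloc(8 << 20);    /* 8 MiB: backed by an anonymous mmap() */
    if (big != NULL)
        free(big);                  /* munmap()ed here -- returned to the OS at once */
    return 0;
}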
 
Keith Thompson

David T. Ashley said:
No [modern PC] OS is likely to have the capability to hand the run-time
library a chunk of memory smaller than the page size supported by the
memory management hardware. I don't know what this figure is nowadays,
but I'd guess maybe between 256K and 1M.

On x86, the page sizes are 4kB (normal) and 2MB/4MB (large), with the
latter depending on whether PAE is enabled. Other architectures vary.

I believe you, but I'm somewhat surprised at this figure. Hardware
memory management is usually a tradeoff between silicon complexity,
complexity of setting up the hardware by the software, and
granularity.

I figured with a modern PC having maybe 4G of memory, you might want
to divide the memory into maybe 1,000 - 10,000 pages, so I would have
figured page sizes in the range of 512K - 4M. 4K surprises me.

I'm too lazy to look it all up. The x86 world at the low levels has
an entry cost. Right now I'm in the ARM world.

An x86 system might have hundreds of processes running simultaneously,
and some of those are going to be quite small. A large page size would
substantially increase total memory usage.
 
Eric Sosman

No [modern PC] OS is likely to have the capability to hand the run-time
library a chunk of memory smaller than the page size supported by the
memory management hardware. I don't know what this figure is nowadays,
but I'd guess maybe between 256K and 1M.

On x86, the page sizes are 4kB (normal) and 2MB/4MB (large), with the
latter depending on whether PAE is enabled. Other architectures vary.

I believe you, but I'm somewhat surprised at this figure. Hardware
memory management is usually a tradeoff between silicon complexity,
complexity of setting up the hardware by the software, and
granularity.

I figured with a modern PC having maybe 4G of memory, you might want
to divide the memory into maybe 1,000 - 10,000 pages, so I would have
figured page sizes in the range of 512K - 4M. 4K surprises me.

I'm too lazy to look it all up. The x86 world at the low levels has
an entry cost. Right now I'm in the ARM world.

<off-topic> Consider the silicon cost of things like a TLB, and
the run-time cost of a TLB miss, and realize that a large page size
can reduce both at the same time. Many systems nowadays support several
page sizes simultaneously, just for better TLB use. </off-topic>
 
Stephen Sprunk

On 16-May-12 12:29, David T. Ashley wrote:
No [modern PC] OS is likely to have the capability to hand the run-time
library a chunk of memory smaller than the page size supported by the
memory management hardware. I don't know what this figure is nowadays,
but I'd guess maybe between 256K and 1M.

On x86, the page sizes are 4kB (normal) and 2MB/4MB (large), with the
latter depending on whether PAE is enabled. Other architectures vary.

I believe you, but I'm somewhat surprised at this figure. Hardware
memory management is usually a tradeoff between silicon complexity,
complexity of setting up the hardware by the software, and
granularity.

I figured with a modern PC having maybe 4G of memory, you might want
to divide the memory into maybe 1,000 - 10,000 pages, so I would have
figured page sizes in the range of 512K - 4M. 4K surprises me.

At 1000 pages, the external fragmentation would be bad. The nice thing
about small page sizes is that it is easier for a process to return
unneeded pages to the operating system. It's easier to find a contiguous
page-aligned 4K chunk of memory which is not in use and free it, than to
find such a 4MB chunk of memory.

Of course. Also, swapping a 4MB page to/from disk is a lot slower and
may require evicting another 4MB page that is still (partially) active.
Smaller pages allow faster swapping and keep more of the active set in
memory, at the expense of more page tables and larger TLBs.

But, never mind that, the real problem is that address spaces do not share
pages, and neither do distinct memory mappings within address spaces.

What? I thought that shared libraries and COW for forking relied on
using the same physical pages in multiple processes--though in the
former case they might be mapped to different virtual locations via
different sets of page tables.

Every single one of these uses would eat into your allotment of 1000 pages,
quickly gobbling up all of it.

I think he meant that a single process's address space shouldn't require
more than 1,000 (or 10,000) pages, not a global limit on the number of
pages for all processes.

S
 
Joe keane

I figured with a modern PC having maybe 4G of memory, you might want
to divide the memory into maybe 1,000 - 10,000 pages, so I would have
figured page sizes in the range of 512K - 4M. 4K surprises me.

The OS can have a larger page size. It's easy to make 16 KB pages on
hardware that is designed for 4 KB pages. But not the other way around.

The size of a 'segment' [unit that the memory management requests more
space from the OS] would be more like 2 MB.
 
Stephen Sprunk

The OS can have a larger page size. It's easy to make 16 KB pages on
hardware that is designed for 4 KB pages. But not the other way around.

You can _emulate_ larger pages by always handling several of them at the
same time, but the hardware does what the hardware does.

I question whether this is a good idea. Any reduction in page faults
due to loading multiple pages at a time is likely to be offset by an
increase in page faults due to evicting multiple pages at a time. So,
overall, you spend more time swapping pages to/from disk for no gain.

The size of a 'segment' [unit that the memory management requests more
space from the OS] would be more like 2 MB.

"Segment" has a very different meaning on many architectures.

Also, I would expect malloc() et al to request/release memory from the
OS in whatever multiple of the page size it found convenient, not some
arbitrarily large (or small, for some workloads) amount like 2MB.

S
 
Stephen Sprunk

On 16-May-12 12:29, David T. Ashley wrote:
No [modern PC] OS is likely to have the capability to hand the run-time
library a chunk of memory smaller than the page size supported by the
memory management hardware. I don't know what this figure is nowadays,
but I'd guess maybe between 256K and 1M.

On x86, the page sizes are 4kB (normal) and 2MB/4MB (large), with the
latter depending on whether PAE is enabled. Other architectures vary.

I believe you, but I'm somewhat surprised at this figure. Hardware
memory management is usually a tradeoff between silicon complexity,
complexity of setting up the hardware by the software, and
granularity.

I figured with a modern PC having maybe 4G of memory, you might want
to divide the memory into maybe 1,000 - 10,000 pages, so I would have
figured page sizes in the range of 512K - 4M. 4K surprises me.

I'm too lazy to look it all up. The x86 world at the low levels has
an entry cost. Right now I'm in the ARM world.

<off-topic> Consider the silicon cost of things like a TLB, and
the run-time cost of a TLB miss, and realize that a large page size
can reduce both at the same time. Many systems nowadays support several
page sizes simultaneously, just for better TLB use. </off-topic>

OTOH, as soon as you have to start swapping those large pages to and
from disk, you blow away the gains from more efficient TLB usage.

Except for workloads where a single process randomly accesses enough
active pages to overflow the TLBs _and_ the machine has enough RAM that
those pages can be locked, large pages don't seem to make sense. It
apparently occurred to the i386 designers to allow for that case, and
I hear it's common for large database servers, but that's about it.

It seems advantageous to have intermediate page sizes, but that also
adds hardware complexity--and forces the OS to decide what page size to
use for what, which it will inevitably get wrong.

S
 
Eric Sosman

OTOH, as soon as you have to start swapping those large pages to and
from disk, you blow away the gains from more efficient TLB usage.

True. "Seven decimal orders of magnitude" has a charming
way of obliterating small gains and losses.

(In other words: As soon as you swap, you've already blown
your whole performance story. CPU ~= 2GHz, disk ~= 0.0000001GHz,
CPU runs ~20,000,000 cycles while waiting for one (count them, one)
disk operation. If each CPU cycle were one breath at an adult
rate of <20 respirations per minute, one disk operation would
take upwards of two years. Hell, by the time the disk finishes
the cache is stony cold; the CPU has "forgotten" what it was
doing. What were *you* doing on May 18, 2010, and what did you
have for dinner on that day?)

Except for workloads where a single process randomly accesses enough
active pages to overflow the TLBs _and_ the machine has enough RAM that
those pages can be locked, large pages don't seem to make sense. It
apparently occurred to the i386 designers to allow for that case, and
hear it's common for large database servers, but that's about it.

It seems advantageous to have intermediate page sizes, but that also
adds hardware complexity--and forces the OS to decide what page size to
use for what, which it will inevitably get wrong.

Yes, it will inevitably be wrong -- sometimes. Unlike us,
who are inevitably wrong nearly always.
 
Stephen Sprunk

True. "Seven decimal orders of magnitude" has a charming
way of obliterating small gains and losses.

(In other words: As soon as you swap, you've already blown
your whole performance story. CPU ~= 2GHz, disk ~= 0.0000001GHz,
CPU runs ~20,000,000 cycles while waiting for one (count them, one)
disk operation. If each CPU cycle were one breath at an adult
rate of <20 respirations per minute, one disk operation would
take upwards of two years. Hell, by the time the disk finishes
the cache is stony cold; the CPU has "forgotten" what it was
doing. What were *you* doing on May 18, 2010, and what did you
have for dinner on that day?)

There's that problem too, but I was thinking more along the lines of 4MB
pages taking ~1000 times as long as 4kB pages to evict and reload.

Also, the only time I've heard of TLBs being a real-world problem is in
the case of database servers, which apparently spend most of their time
following long chains of pointers hither and yon around the address
space. Large pages reduce the number of TLB entries needed in that case
by a factor of ~1000, preventing TLB overflow. But that case only
arises when all of the pages are locked in RAM; if they aren't, page
faults to repopulate the TLBs would be a minor problem compared to disk
access, per your argument.

S
 
Ben Pfaff

Stephen Sprunk said:
There's that problem too, but I was thinking more along the lines of 4MB
pages taking ~1000 times as long as 4kB pages to evict and reload.

I doubt that that is true, though. I'd suspect that 4 MB pages
take less than 100 times as long as 4 kB pages to evict and
reload, maybe much less.

I tried to find some measurements online to support this point,
but couldn't quickly find anything relevant.
 
Joe keane

I question whether this is a good idea.

It is a good idea. These are -disk drives-.

If your drive has a decent buffer, the time to get additional pages
could be no more than to transfer data over the bus.
 
Ian Collins

It is a good idea. These are -disk drives-.

If your drive has a decent buffer, the time to get additional pages
could be no more than to transfer data over the bus.

If only life were that simple! Consider a diskless client.

The point you snipped:

"The OS can have a larger page size. It's easy to make 16 KB pages on
hardware that is designed for 4 KB pages. But not the other way around."

A reasonable OS will allow multiple page sizes, in part to minimise the
use of MMU slots when applications use large amounts of memory. One
negative side effect is that the bigger the page size an application uses,
the more likely it is to hit an out-of-memory condition due to
fragmentation.
 
