Eric Sosman said:
There seems to be some confusion about the word "virtual."
Saying that the OS "clears virtual pages" sounds like nonsense
to me: the term "virtual page" could only refer to a range of
virtual addresses, and since addresses hold no data they can't
be cleared at all. The OS clears the *physical* pages to which
a range of addresses is mapped.
Ordinary applications cannot see real memory. They only see
virtual memory. When a virtual address is referenced, the
hardware will attempt to translate the virtual address into
a real address. From the point of view of the application,
a virtual address is just an address. That's where its data
is located.
Real (physical) pages are cleared only when necessary, that is,
on the first reference via a virtual address. The hardware
attempts to translate the virtual address and gets an indication
that the virtual address is not mapped to a real page. The OS
gets control and decides why it's not mapped.
(1) The virtual page has never been allocated by the current
process. This is an unrecoverable page or segment fault situation.
Usually the thread that was in control at the time of the fault
is terminated by the OS. If that thread is the main thread of
the process, then the process itself is terminated.
(2) The virtual page has been allocated by the current process,
and this is the first reference to the virtual page. A real page
is allocated by the OS and assigned to the virtual page. The OS
clears the real page to zero and resumes the thread that encountered
the page fault to redrive the failing instruction. The thread never
sees the fault.
(3) The virtual page has been allocated by the current process,
and has been paged out because its underlying real page was needed
elsewhere. The thread that caused the page fault is suspended while
the data for the virtual page is scheduled for page-in. When the
page-in completes, the data for the virtual page is assigned to
a real page (which may be different from the real page that last
held the data for that virtual page). The thread is resumed to
redrive the instruction that encountered the page fault.
The virtual confusion also pervades the description of how
an expired process' memory gets cleared when it's handed to a
new process. This does not and should not depend on a coincidence
of the virtual addresses at which the two processes map the page;
it has, in fact, nothing at all to do with virtual addresses.
The O/S clears the page to protect the first process' information
from the second process, and that information resides in the
physical page, no matter at what address range it happens to
occupy in either of the processes.
Yes, of course. That point may not have been clear in my earlier
post.
On another matter (and this, I confess, is more of an opinion
than hard-and-fast fact), I can't concur that it's "good programming
practice" for a process to pre-free its own pages. In the first
place, the means of doing so are not very portable -- yes, every
virtual memory O/S has *some* means of mapping and unmapping pages,
but the means differ from one O/S to another and tend not to be
perfect analogs.
Of course there are portable means. The C language has free(),
which corresponds to malloc() and calloc(). If the application
performs a malloc(), then it should also perform a free() when
it's possible. A normal end of the application should always
free its allocated resources, rather than depending on the OS
to do it. That's good programming practice. In the case of an
abnormal exit, then it's probably all right to punt and let the
OS handle it.
In the second place, the process cannot do a
thorough job of freeing its pages (unmapping your stack and your
instructions might be *one* way to terminate a process, but ...),
so the O/S is going to have to do at least some of the work in
any event.
If the OS is providing stack frames, then the OS will deal with
them. If the application is managing its own stack frames, then
the application can and should manage them, including freeing them
when the application ends.
And finally, throwing away your pages runs a good
chance of interfering with other agencies that are "looking at"
your program, things like profilers and debuggers.
That's really not the application's concern. If the application
has "hooks" built into it for preserving state information, then
so be it. Otherwise, design and implement the application according
to good programming practices by tracking and managing resources
in a responsible manner. If some other "agency", like a profiler
or a debugger, wants to poke around inside an application's data
areas, then it's the responsibility of that agency to avoid
contaminating the application. It would be nearly impossible for
an application to predict, detect, or interact with a potential
external agency, like a profiler or debugger, and just plain
fruitless to try to predict the future in that regard.