Using virtual memory and/or disk to reduce memory footprint


James Kanze

This isn't even true for the processors you claim to know, let alone
for others. Eg, ARM-CPUs (at least up to 9) have a virtually addressed
cache and this means that not only is the TLB flushed in case of an
address space switch (as on Intel, IIRC as side effect of writing to
CR3) but the complete content of the cache needs to be tanked as well.

I'm not familiar with the ARM, but Intel processors are capable
of maintaining up to six address spaces active at a time
(supposing by address space on an Intel, you mean a segment).
 

Rainer Weikusat

James Kanze said:
James Kanze <[email protected]> writes:
[...]
This isn't even true for the processors you claim to know, let alone
for others. Eg, ARM-CPUs (at least up to 9) have a virtually addressed
cache and this means that not only is the TLB flushed in case of an
address space switch (as on Intel, IIRC as side effect of writing to
CR3) but the complete content of the cache needs to be tanked as well.

I'm not familiar with the ARM, but Intel processors are capable
of maintaining up to six address spaces active at a time
(supposing by address space on an Intel, you mean a segment).

No, I don't mean 'a segment', because otherwise I would have written
that. 'An address space' on x86 processors is defined by a page
directory, whose entries point to a set of page tables. The address of
the page directory is stored in a register named 'CR3' (for 'control
register 3') and writing to this register in order to switch to a
different address space causes the present contents of the TLB
('translation lookaside buffer') to be invalidated. In itself, such an
address space is linear, which is basically achieved by setting all
segment registers to refer to a single segment which starts at zero
and has a size of 4G.

I doubt that any contemporary operating system for 32-bit
Intel-processors uses a different address space layout. It is certain
that neither Windows nor any of the UNIX(*)-like operating systems
do, cf

The virtual address space for 32-bit Windows is 4 gigabytes
(GB) in size and divided into two partitions: one for use by
the process and the other reserved for use by the system.
http://msdn.microsoft.com/en-us/library/aa366912(VS.85).aspx

and even OS/2 already worked in this way.

At least 'current' (x86) Linux-versions support putting the kernel
into its own address space, but mostly in order to enable 32-bit
systems to use large amounts of memory (>>4G) at the expense of having
to switch page directories whenever a process changes from executing
userspace code to kernel code and back.
 

David Schwartz

I'm not familiar with the ARM, but Intel processors are capable
of maintaining up to six address spaces active at a time
(supposing by address space on an Intel, you mean a segment).

Yeah, just like a person is capable of having a child and running a
marathon. But I am not capable of either. Yes, this is theoretically
possible for an Intel CPU, but the trade-offs are simply too great,
and so nobody does this.

DS
 

Casper H.S. Dik

Yeah, just like a person is capable of having a child and running a
marathon. But I am not capable of either. Yes, this is theoretically
possible for an Intel CPU, but the trade-offs are simply too great,
and so nobody does this.

When memory became so cheap that putting multiple GBs into any PC
was feasible, you could also get a 64-bit CPU and a 64-bit operating system.

(Typically, a 64 bit OS can run 32 bit applications and those 32 bit
applications can then run in the full 32 bit address space)

Casper
 

Pawel Dziepak

James said:
> That's simply not true, or at least it wasn't when I did my
evaluations (admittedly on an Intel 80386, quite some time ago).
And the address space of the 80386 was considerably more than
4GB; you could address 4GB per segment. (In theory, you could
have up to 64K segments, but IIRC, in practice, there were some
additional limitations.)

That's not true. According to Intel manuals it is not possible to use
linear addresses >4GB.
You will pay a performance hit when you first load a segment
register, but this is a one-time affair, normally taking place
when you switch modes.

Switching modes (or at least reloading the current CR3, which also
invalidates the TLB) takes place each time you access the kernel (for
example via a system call) when the kernel is placed in a separate
address space on x86.
I'm not too sure what you mean by "two address spaces". If you
mean two separate segments, at least in older Intel processors,
the TLB would remain valid as long as the segment identifier
remained in a segment register; there was one per segment.

Currently the main operating systems use the protected flat model, in
which segments are not really used.
Virtual address spaces are created using paging. Segments have nothing
to do here.
(I'll admit that I find both Windows and Linux unacceptable
here. The Intel processor allows accessing far more than 4GB;
limiting a single process to 4GB is an artificial constraint,
imposed by the OS. Limiting it to even less is practically
unacceptable.)

That's not true. Paging doesn't allow (in 32-bit mode) virtual memory
addresses >4GB. Using extensions like PAE or PSE-36 can only increase
the amount of available physical memory.
These extensions also don't support segmentation, so there is no way a
single process can access more than 4GB on an x86 machine in 32-bit mode.

Pawel Dziepak
 

Scott Lurndal

James Kanze said:
Still, from a QoI point of view, I would not generally expect
the OS to take much of the user's address space---a couple of KB,
at the most.

It's really a function of the processor architecture. With an
architecture such as 32-bit intel x86, the operating system and
applications share a single virtual address space[*]. The operating
system reserves some portion of it (unixware used to be 2g/2g, iirc,
and linux can be built for several splits). The kernels, to avoid
having to map and unmap when accessing physical memory usually
identity map physical memory to the (shared) virtual address space
(which causes problems with more than 2GB dram on 2/2 splits and
more than 1GB dram on 1/3 splits).

Other architectures (MIPS, Sparc, 88k) have different address spaces
for kernel vs user mode and don't suffer this virtual address space overhead.

scott

[*] - For performance. There are methods to give applications
the entire virtual address space but they would significantly slow
down transitions between userspace and kernelspace.
 

James Kanze

James Kanze wrote:
That's not true. According to Intel manuals it is not possible
to use linear addresses >4GB.

And? The addressing of an Intel isn't linear. Never has been.
Switching modes (or at least reloading the current CR3, which also
invalidates the TLB) takes place each time you access the kernel (for
example via a system call) when the kernel is placed in a separate
address space on x86.

Switching modes has a definite cost. Which you always pay, each
"system call". (A "system function" in source code doesn't
always have to resolve to a "system call", with mode switch,
e.g. pthread_mutex_lock on an uncontested mutex.) Switching modes is
(or at least was on early Intel 32 bits) orthogonal to which
segment registers are loaded.
Currently the main operating systems use the protected flat model, in
which segments are not really used.

In other words, currently, the main operating systems are not
using the processor to its fullest, and artificially creating
restrictions for themselves and for client code.
Virtual address spaces are created using paging. Segments have
nothing to do here.

Except if the authors of the OS understand the Intel
architecture, and want to exploit it, rather than pretending
that all the world is a VAX.
That's not true. Paging doesn't allow (in 32-bit mode) virtual
memory addresses >4GB. Using extensions like PAE or PSE-36 can
only increase the amount of available physical memory. These
extensions also don't support segmentation, so there is no way a
single process can access more than 4GB on an x86 machine in
32-bit mode.

Because of an artificial limitation in the OS. That's what I
said.
 

Pawel Dziepak

James said:
And? The addressing of an Intel isn't linear. Never has been.

Linear addresses are created from 'segment selector:offset' addresses.
That's why even using segments you can't access more than 4GB. It's
described in sections 3.1 and 3.4 of Volume 3 of Intel 64 and IA-32
Architecture Software Developer's Manual.
Switching modes has a definite cost. Which you always pay, each
"system call". (A "system function" in source code doesn't
always have to resolve to a "system call", with mode switch,
e.g. pthread_mutex_lock on an uncontested mutex.) Switching modes is
(or at least was on early Intel 32 bits) orthogonal to which
segment registers are loaded.

That's true, but if you place the kernel in another address space, the
current page directory is also switched on each system call, which
produces additional overhead (flushing the TLB, etc.) that is not
present when the kernel is in the same address space.
In other words, currently, the main operating systems are not
using the processor to its fullest, and artificially creating
restrictions for themselves and for client code. (...)
Except if the authors of the OS understand the Intel
architecture, and want to exploit it, rather than pretending
that all the world is a VAX. (...)
Because of an artificial limitation in the OS. That's what I
said.

Yes, that's what you said, and that's *not* true. I see that further
discussion is pointless. The following are the parts of the Intel 64
and IA-32 Architecture Software Developer's Manual that state that this
is a limitation of the CPU, not the OS:
(already mentioned) Volume 3, 3.1 "Memory Management Overview" - linear
address space, etc
Volume 3, 3.3 "Physical Address Space"
Volume 3, 3.4 "Logical and linear addresses" and one of the most
important parts: "the linear address space is flat (unsegmented),
2^32-byte address space, with addresses ranging from 0 to FFFFFFFFh".

Please, point me to the parts of the x86 documentation which state that
the 4GB address space is only a limitation created by OSes.

Pawel Dziepak
 

Jasen Betts

Hi,

I am writing a C++ GUI tool. I sometimes run out of memory (exceed the
2GB limit) on a 32bit Linux machine. I have optimized my GUI's
database a lot (and still working on it) to reduce the runtime memory
footprint.

I was thinking of ways to off-load part of the database to virtual
memory or disk and reading it back in when required. Does anyone out
there have any papers or suggestions in that direction?

If you can use two (or several) processes instead of one, each can then
have 2GB.

More details may elicit targeted solutions.
A "GUI tool" could be anything:

SVG editor, video editor, 3d modeling, DTP, file manager, text editor,
etc...

what sort of data is the memory mostly full of?
 

Bo Persson

Pawel said:
Linear addresses are created from 'segment selctor:eek:ffset'
addresses. That's why even using segments you can't access more
than 4GB. It's described in sections 3.1 and 3.4 of Volume 3 of
Intel 64 and IA-32 Architecture Software Developer's Manual.

You can't access more than 4GB at a time, but you could have more than
one segment, each with up to 4 GB of virtual address space. You would
have to swap one segment to disk, to be able to load another.
Please, point me to parts of the x86 documentation which states
that 4GB address space is only a limitation created by OSes.

The limitation is that the most popular OSes decided to have just one
4GB segment. The designers originally thought that would be enough.


Bo Persson
 

Bo Persson

James said:
In other words, currently, the main operating systems are not
using the processor to its fullest, and artificially creating
restrictions for themselves and for client code.


Except if the authors of the OS understand the Intel
architecture, and want to exploit it, rather than pretending
that all the world is a VAX.

They did understand actually, but couldn't imagine that it would ever
be needed. 4GB seemed like infinity (much like 640k did, a bit earlier
:).

I have heard from the original designers of Win32 (Windows NT 3.x)
that they briefly considered the use of a segmented architecture, but
quickly decided that there would be no use for that:

"True, using a segmented memory model you could have several 4GB
segments, but to load a new segment you would first have to swap the
old 4 GB out to disk. Wherever would you find such an enormous
disk?!".

That decided it.


Bo Persson
 

nick

what sort of data is the memory mostly full of?

I am working on a schematic editor. It is used for composing circuits
at different levels of abstraction (gates/transistors/PCB). I have run
through the program; it is free of any major memory leaks. One of the
designs I am working with is huge, biggest that I have seen. My call
to malloc() fails. Because the core algorithms are still sound for
most designs, I am wary of changing them.

I guess I am looking for a way to hook into a memory allocator that
has support for caching to disk. I have yet to try Cxxstl.

Regards
John
 

Rainer Weikusat

Bo Persson said:
You can't access more than 4GB at a time, but you could have more than
one segment, each with up to 4 GB of virtual address space. You would
have to swap one segment to disk, to be able to load another.

Duh. According to Intel-documentation, any segmented address is
linearized into a 32-bit linear address, meaning, it is not possible
to address more than 4G of virtual memory on an 80386 or later
processor. Independently of this, it is possible to use as many
4G-sized virtual address spaces (still defined by a page directory) as
one would like to have, without using segmentation at all. Which is
exactly the situation you describe.
The limitation is that the most popular OSes decided to have just one
4GB segment.

Since the linear address space is at most 4G, no more than 4G can be
addressable at the same time, no matter which memory model is being
used. Assuming any type of memory protection is in use to separate
code running at different privilege levels, some of this 4G linear
address space will need to be used for 'communication'. Using the
PMMU, a process can have 'nearly' 4G for its own. Usually, this isn't
done because most processes don't need anything even close to that and
mapping the kernel into the same address space as a process means the
process can go from executing userspace code to executing kernel code
without having to flush the translation lookaside buffer.

How often does this need to be written until it finally arrives in the
land behind the mirror?
The designers originally thought that would be enough.

Get yourself a decent book on the topic.
 

Pawel Dziepak

Bo said:
You can't access more than 4GB at a time, but you could have more than
one segment, each with up to 4 GB of virtual address space. You would
have to swap one segment to disk, to be able to load another.

Thanks, I admit I didn't think of that. However, it doesn't look
useful: you can achieve the same thing by swapping manually, with a
similar performance hit, but without having to use different segments.
> The limitation is that the most popular OSes decided to have just one
4GB segment. The designers originally thought that would be enough.

I really don't think it was only the OS developers' fault. Segmentation
is not present in some architectures, and it creates additional
problems if you want to implement a high-level language that can use
many different segments.

Pawel Dziepak
 

Alf P. Steinbach

* Pawel Dziepak:
Segmentation
is not present in some architectures and creates some additional
problems if you want to implement high-level language that can use many
different segments.

Since this is wildly cross-posted I'd better mention I'm posting in [comp.lang.c++].

C++ is to some extent influenced by the possibility of a segmented architecture,
by separating code and data pointers. Formally, void* can hold a data pointer,
but not a code pointer. So in a sense C++ is designed for this...

And hypothetically e.g. Windows NT could from the beginning have provided
resistance to malware by having executable code in a read-only segment, instead
of waiting through umpteen versions for read-only pages (much lower level
detection of code modification attempts). Thus, by being designed for separate
code and data address spaces, C++ is in a sense designed to allow security. But
one cost is that some very desirable things then become hard to do, e.g. the old
technique of dynamically generating trampolines (small forwarder code snippets
with some hardwired data), which now in malware times we can't use, but must use
less efficient and less elegant techniques instead.

Anyway, while of course acknowledging the problems of a badly designed
compromise segmented architecture like the original i8086, I think the example
of separating code and data shows that it's not segmentation per se that is
problematic: rather, it's how it's done, and for what.


Cheers, just my $0.02,

- Alf
 

Pawel Dziepak

Alf said:
C++ is to some extent influenced by the possibility of a segmented
architecture, by separating code and data pointers. Formally, void* can
hold a data pointer, but not a code pointer. So in a sense C++ is
designed for this...

That's obviously true, but I meant that it is hard to simultaneously
use two different data segments in C++, which is needed to have more
than a 4GB address space available on 32-bit systems.

Pawel Dziepak
 

Alf P. Steinbach

* Andy Champ:
Alf,

I think you'll find that code has been in read-only segments since NT
3.1.

I think you're right (not that I'm entirely wrong, though! :) ).

A different segment selector is used for the code segment, and it appears to be
very read-only...

But it's like a door with a huge lock, placed in the middle of an open room. No
problem walking around that door instead of trying to walk through it. For as it
was, and still is in Windows XP at least, CS:blah (code segment offset blah)
and DS:blah (data segment offset blah) map to the same address.

So it's no problem executing code in the data segment, e.g. for the purpose of
infecting a machine via a buffer overrun.

What I should have written, instead of "having", was that Windows, or any OS,
should have /required/ code to be in read-only execute-only segment, i.e. not
mapping code and data segment to the same linear address range.

Code below -- sorry about rusty assembler skills! -- exemplifies executing
data as code, which as I remarked (on "trampoline") also has benign uses:


<code language="Microsoft MASM">
.386
.MODEL flat, stdcall

STD_OUTPUT_HANDLE EQU -11

GetStdHandle PROTO NEAR32 stdcall,
nStdHandle:DWORD

WriteFile PROTO NEAR32 stdcall,
hFile:DWORD, lpBuffer:NEAR32, nNumberOfBytesToWrite:DWORD,
lpNumberOfBytesWritten:NEAR32, lpOverlapped:NEAR32

ExitProcess PROTO NEAR32 stdcall,
dwExitCode:DWORD

.STACK 4096

.DATA

msg DB "Yay!", 13, 10
written DW 0
hStdOut DD 0

pureData DB 512 DUP (0) ; 1/2 KB array


.CODE
getN:
mov eax, 'D'
ret
nBytesCode EQU $ - getN

_start:
; Directly modify the code in the little function -- gah!
mov ebx, offset getN
;mov byte ptr [ebx+1], 'N' ; This would crash, it's read-only.

; Instead copy the little function to data buffer.
mov esi, offset getN
mov edi, offset pureData
mov ecx, nBytesCode
rep movsb

; Modify the copy.
mov byte ptr [pureData+1], 'N'

; Mess up the stack and call the copy.
push offset changeZeMessage ; return address
push offset pureData ; jump here (roundabout 'cause i'm rusty in asm!)
ret ; Calling dynamically generated function in pure data
changeZeMessage:
mov byte ptr [msg], al ; Message now changed to "Nay!"

doTell:
INVOKE GetStdHandle,
STD_OUTPUT_HANDLE ; Standard output handle
mov [hStdOut], eax

INVOKE WriteFile,
hStdOut, ; File handle for screen
NEAR32 PTR msg, ; Address of string
LENGTHOF msg, ; Length of string
NEAR32 PTR written, ; Bytes written
0 ; Overlapped mode

INVOKE ExitProcess,
0 ; Result code for parent process

PUBLIC _start
END
</code>

<result>
C:\cppfn\test> ml /nologo /c /coff flat.asm
Assembling: flat.asm

C:\cppfn\test> link /nologo flat.obj kernel32.lib /entry:_start /subsystem:console

C:\cppfn\test> flat
Nay!

C:\cppfn\test> _
</result>


On a more modern computer, or perhaps in Vista (above was in XP Prof), I imagine
this program will simply crash.

But of old, in Windows NT family, no problem executing data as code, and that's
what I was thinking of -- but it sort of garbled on the way from now seldom
used sort of machine code level part of brain to keyboard... :-(

What has come in recently is DEP - no-execute data segments, which
is especially important for the stack.

Uhm, yes, sounds like it, except that I had the impression that this is at the
/page/ level, not at the segment level?

The code-separate-from-data may well have been designed for Harvard
architecture machines.
Yes.


I don't know the term "trampoline" outside gymnastics.

See e.g. <url: http://en.wikipedia.org/wiki/Trampoline_(computers)>.

It's not a complete article.

Basic machine code level trampoline in C++ programming is just like the
generated routine in program example above: it's dynamically generated machine
code that puts some hardwired value in function result register (here eax),
typically that's a pointer to a class type object, and jumps to some static
routine that in turn calls some given member function on that object. You
generate the trampoline and give its address to a function that calls back on
the trampoline. And voilà, the call-back is forwarded to your object.


Cheers & thanks! & also hth. :),

- Alf
 

James Kanze

* Pawel Dziepak:
Since this is wildly cross-posted I'd better mention I'm
posting in [comp.lang.c++].
C++ is to some extent influenced by the possibility of a
segmented architecture, by separating code and data pointers.
Formally, void* can hold a data pointer, but not a code
pointer. So in a sense C++ is designed for this...

C++ just follows C here. But as you say, the language was
designed so as not to impose a linear addressing model. I've
actually used C and C++ on an Intel 80386 with 48 bit pointers
(and so, at the time---before long long, no integral type was
big enough to hold a pointer). From a QoI point of view, one
would expect all OS's to support this.

(Also, FWIW: I just checked the Intel manuals, and the linear
address space is 36 bits, not 32. So there's absolutely no
argument in favor of taking part of the user's 32-bit address
space for the system.)
 

James Kanze

They did understand actually, but couldn't imagine that it
would ever be needed. 4GB seemed like infinity (much like 640k
did, a bit earlier :).

In other words, they made the same mistake twice:).

I can understand this restriction in some of the earlier Windows
versions; they were really just hacks to get something to work
anyway. But both Windows NT and Linux date from an epoch where
machines did have 4GB and more. Not desktop machines, of
course, but some machines. And knowing Moore's law, and its
implications, simple common sense says to not close any doors.
I have heard from the original designers of Win32 (Windows NT
3.x) that they briefly considered the use of a segmented
architecture, but quickly decided that there would be no use
for that:
"True, using a segmented memory model you could have several
4GB segments, but to load a new segment you would first have
to swap the old 4 GB out to disk. Wherever would you find such
an enormous disk?!".

That decided it.

As Alf has pointed out, security argues for the use of segments
even more than memory. The earliest 80386 systems I used used
them for this reason. And despite your quote, some systems at
the time the NT was being developed did have such memories,
although they were rare (super computers, etc.). Still about
the time development on the NT started, DEC was already feeling
preasure to move to 64 bits (the Alpha), partially, at least,
because of address space limitations. (But much of the
developments seem to be parallel.)
 

Rainer Weikusat

James Kanze said:
* Pawel Dziepak:
Since this is wildly cross-posted I'd better mention I'm
posting in [comp.lang.c++].
C++ is to some extent influenced by the possibility of a
segmented architecture, by separating code and data pointers.
Formally, void* can hold a data pointer, but not a code
pointer. So in a sense C++ is designed for this...

C++ just follows C here. But as you say, the language was
designed so as not to impose a linear addressing model. I've
actually used C and C++ on an Intel 80386 with 48 bit pointers
(and so, at the time---before long long, no integral type was
big enough to hold a pointer). From a QoI point of view, one
would expect all OS's to support this.

There are no 48-bit pointers on 80386 or later, because the
segment:offset address is only used to calculate a 32-bit linear
address.
(Also, FWIW: I just checked the Intel manuals, and the linear
address space is 36 bits, not 32.

Directly quoting from the relevant section (Intel® 64 and IA-32
Architectures Software Developer's Manual, Volume 1: Basic
Architecture, 3.3.6):

Beginning with P6 family processors, the IA-32 architecture
supports addressing of up to 64 GBytes (2^36 bytes) of physical
memory. A program or task could not address locations in this
address space directly. Instead, it addresses individual
linear address spaces of up to 4 GBytes that mapped to
64-GByte physical address space through a virtual memory
management mechanism. Using this mechanism, an operating
system can enable a program to switch 4-GByte linear address
spaces within 64-GByte physical address space.

Besides, using PAE requires using the MMU, which, in turn, leads back
to the issue with the TLB being flushed as side effect of reloading
the page directory register. Which people may like to avoid unless
they require that large an address space. And then, they can just
enable it.
 
