Using virtual memory and/or disk to reduce memory footprint


Alf P. Steinbach

* Rainer Weikusat:
James Kanze said:
* Pawel Dziepak:
Segmentation is not present in some architectures and
creates some additional problems if you want to implement
high-level language that can use many different segments.
Since this is wildly cross-posted I'd better mention I'm
posting in [comp.lang.c++].
C++ is to some extent influenced by the possibility of a
segmented architecture, by separating code and data pointers.
Formally, void* can hold a data pointer, but not a code
pointer. So in a sense C++ is designed for this...
C++ just follows C here. But as you say, the language was
designed so as not to impose a linear addressing model. I've
actually used C and C++ on an Intel 80386 with 48 bit pointers
(and so, at the time---before long long, no integral type was
big enough to hold a pointer). From a QoI point of view, one
would expect all OS's to support this.

There are no 48-bit pointers on 80386 or later,

I'm sorry, that's incorrect.

The number of bits per pointer depends on your language and pointer type.
Certainly in assembler you can use them :); with MASM-like syntax the DP
directive allocates 6 bytes for a pointer. And any compiler for a language with
a sufficiently flexible language definition, e.g. C or C++, can support them.
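
For illustration only, here is a minimal C++ sketch of what such a 6-byte (16:32) far-pointer
representation could look like; the struct and its packing are hypothetical, not any particular
compiler's layout:

    #include <cstdint>
    #include <iostream>

    // Hypothetical 16:32 "far pointer": a 32-bit offset plus a 16-bit segment
    // selector, packed so the whole object occupies 6 bytes (48 bits).
    #pragma pack(push, 1)
    struct FarPtr {
        std::uint32_t offset;    // offset within the segment
        std::uint16_t selector;  // segment selector
    };
    #pragma pack(pop)

    int main() {
        std::cout << "sizeof(FarPtr) = " << sizeof(FarPtr) << " bytes\n";  // prints 6
    }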

because the
segment:offset-address is only used to calculate a 32-bit linear
address.

That's right for the ole' 386, but the "because" is just silly.

Directly quoting from the relevant section (Intel® 64 and IA-32
Architectures Software Developer's Manual, Volume 1: Basic
Architecture, 3.3.6):

Beginning with P6 family processors, the IA-32 architecture
supports addressing of up to 64 GBytes (2^36 bytes) of physical
memory. A program or task could not address locations in this
address space directly. Instead, it addresses individual
linear address spaces of up to 4 GBytes that mapped to
64-GByte physical address space through a virtual memory
management mechanism. Using this mechanism, an operating
system can enable a program to switch 4-GByte linear address
spaces within 64-GByte physical address space.

You know, 64 GiB => 36 bits per physical address. :)

So this quote does not support your position, but James'. And I think the
issue discussed in this paragraph is what James was referring to, although it
appears that he stumbled on the Intel-specific terminology. He should have
referred to physical addresses, not linear addresses.

And so it happens that in spite of the incorrect arguments/terminology,
regarding the facts it seems that you're both at least partially right, heh.


Cheers & hth.,

- Alf
 

Rainer Weikusat

[...]
There are no 48-bit pointers on 80386 or later,
[...]
because the segment:offset-address is only used to calculate a
32-bit linear address.

That's right for the ole' 386, but the "because" is just silly.

Since the 48 bits are mapped to 32 bits, the size of a pointer is 32
bits and not 48 bits.
You know, 64 GiB => 36 bits per physical address. :)

So this quote does not support your position, but James'.

We were talking about the 'linear address space' available to a
program. This linear address space, as contained literally in the text
above, is 4G or 32 bits wide pointers. If you try to actually read the
text beyond the first sentence, you should be able to find the
following statement:

A program or task could not address locations in this address
space directly.

This is the second sentence. The third would be

Instead, it addresses individual linear address spaces of up
to 4 GBytes.

But, hey, nobody would expect that a C++-using x86 hillbilly could
read ...
 

Alf P. Steinbach

* Rainer Weikusat:
[...]
There are no 48-bit pointers on 80386 or later,
[...]
because the segment:offset-address is only used to calculate a
32-bit linear address.
That's right for the ole' 386, but the "because" is just silly.

Since the 48 bits are mapped to 32 bits, the size of a pointer is 32
bits and not 48 bits.

Based on earlier experience with brickwall minds, there seems to be no point in
trying to enlighten you. Sorry. You're stupid.

Cheers,

- Alf
 

Pawel Dziepak

James said:
(Also, FWIW: I just checked the Intel manuals, and the linear
address space is 36 bits, not 32. So there's absolutely no
argument in favor of taking part of the users 32 bit address
space for the system.)

Volume 3, 3.4: "A linear address is a 32-bit address in the processor's
linear address space. Like the physical address space, the linear
address space is flat (unsegmented), 2^32-byte address space with
addresses ranging from 0 to FFFFFFFFh."
I don't see anything that can make me believe that linear address space
is 36 bits. Virtual addresses are always 32-bit, hence virtual address
space is 4GB.
Volume 3, 3.3: "the IA-32 architecture also supports an extension of the
physical address space to 2^36 bytes (64 GBytes);"
However, it is impossible to directly use physical address space and
that's why programs are given no more than 4GB virtual address space.
There are also logical addresses (used with segmentation) which are 48-bit.

I think we shouldn't argue about the size of the address space,
because that is stated in the Intel manual. The problem is that only
segmentation (logical addresses) makes simultaneous access to more than
4GB possible, and even then it is highly inefficient and (IMHO) rather
useless.

Segmentation lacks a few features of paging, which makes using it harder
from the kernel's point of view. An example may be dealing with
concurrent memory allocation and non-contiguous pages. Of course it is
always possible to use it together with paging, but then you have two
similar mechanisms doing almost the same thing.
The next problem is portability: paging is supported by a large number
of architectures; segmentation, not really.
When most OSes were designed, 4GB was such a large amount of memory that
nobody worried about it. That was a mistake, but IMHO, when you have to
use strange techniques to access more memory on a certain architecture,
that's a sign that you should switch to 64-bit hardware, which would
undoubtedly be a better solution.

Pawel Dziepak
 

Rainer Weikusat

Pawel Dziepak said:
Volume 3, 3.4: "A linear address is a 32-bit address in the processor's
linear address space. Like the physical address space, the linear
address space is flat (unsegmented), 2^32-byte address space with
addresses ranging from 0 to FFFFFFFFh."
I don't see anything that can make me believe that linear address space
is 36 bits.

Oh, that's easy: Stopping to read once one is convinced to have found
what one was searching for, ie after the first sentence.

[...]
The problem is that only segmentation (logical addresses) makes
simultaneous access to more than 4GB possible,

Simultaneous access to more than 4G of virtual memory is impossible
because the size of the linear address space of the (80386 or later
x86) CPU is only 4G. It is possible to have different '4G' spaces
inside a running system and the OS can switch between them. This is
(how often do I need to write this?) accomplished by changing the page
directory to point to a different address space. Insofar as a computer
has more than 4G of memory, multiple of these '4G spaces' can be
resident in physical memory simultaneously (although only one can be
used at any given time) when PAE is in use. But since this doesn't
help a single 'program' (Intel terminology) anyhow, which is still
restricted to having an address space whose size is at most 4G, this
is beside the point.

Completely independent of this, 'a program' can either choose to have a
flat 4G address space or structure this 4G address space into
different regions by means of appropriately set-up segments.
 

Bo Persson

Rainer said:
James Kanze said:
* Pawel Dziepak:
Segmentation is not present in some architectures and
creates some additional problems if you want to implement
high-level language that can use many different segments.
Since this is wildly cross-posted I'd better mention I'm
posting in [comp.lang.c++].
C++ is to some extent influenced by the possibility of a
segmented architecture, by separating code and data pointers.
Formally, void* can hold a data pointer, but not a code
pointer. So in a sense C++ is designed for this...

C++ just follows C here. But as you say, the language was
designed so as not to impose a linear addressing model. I've
actually used C and C++ on an Intel 80386 with 48 bit pointers
(and so, at the time---before long long, no integral type was
big enough to hold a pointer). From a QoI point of view, one
would expect all OS's to support this.

There are no 48-bit pointers on 80386 or later, because the
segment:offset-address is only used to calculate a 32-bit linear
address.
(Also, FWIW: I just checked the Intel manuals, and the linear
address space is 36 bits, not 32.

Directly quoting from the relevant section (Intel® 64 and IA-32
Architectures Software Developer's Manual, Volume 1: Basic
Architecture, 3.3.6):

Beginning with P6 family processors, the IA-32 architecture
supports addressing of up to 64 GBytes (2^36 bytes) of physical
memory. A program or task could not address locations in this
address space directly. Instead, it addresses individual
linear address spaces of up to 4 GBytes that mapped to
64-GByte physical address space through a virtual memory
management mechanism. Using this mechanism, an operating
system can enable a program to switch 4-GByte linear address
spaces within 64-GByte physical address space.

Have you also checked section 3.3.1 of that manual? The paragraph
"Segmented Memory Model" ends with this sentence:

"Programs running on an IA-32 processor can address up to 16,383
segments of different sizes and types, and each segment can be as
large as 2^32 bytes."

This is where you get your 48-bit pointers, a 16 bit segment selector
and a 32 bit offset within the segment. It doesn't give you access to
more RAM, but it extends the virtual address space a whole lot. Of
course, as this manual wants to show us how great the non-segmented
Intel-64 model is, it doesn't stress that point very much.
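
As a quick back-of-the-envelope check of those figures (the arithmetic below is mine; only the
16,383 segments and 2^32 bytes per segment come from the manual):

    #include <cstdint>
    #include <iostream>

    int main() {
        // 16,383 segments of up to 4 GiB each give just under 64 TiB of
        // segmented virtual address space -- far more than the 4 GiB flat space.
        constexpr std::uint64_t segments      = 16383;
        constexpr std::uint64_t segment_bytes = std::uint64_t(1) << 32;  // 4 GiB
        constexpr std::uint64_t total_bytes   = segments * segment_bytes;
        std::cout << total_bytes / (std::uint64_t(1) << 30) << " GiB\n";  // 65532 GiB
    }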


Bo Persson
 

Pawel Dziepak

Rainer said:
Simultaneous access to more than 4G of virtual memory is impossible
because the size of the linear address space of the (80386 or later
x86) CPU is only 4G.
[...]

I generally agree, but there is a way to access more than 4GB of memory:
Bo said:
You can't access more than 4GB at a time, but you could have more than
one segment, each with up to 4 GB of virtual address space. You would
have to swap one segment to disk, to be able to load another.

You can't deny that this allows a program to access more than 4GB, but
yes, in all other cases it is impossible.

Pawel Dziepak
 

James Kanze

[...]
There are no 48-bit pointers on 80386 or later,
[...]
because the segment:offset-address is only used to calculate a
32-bit linear address.
That's right for the ole' 386, but the "because" is just silly.
Since the 48 bits are mapped to 32 bits, the size of a pointer
is 32 bits and not 48 bits.

First, as Alf has pointed out, the size of a pointer depends on
the compiler---a compiler could (and some do) use any size it
wants. But of course, the actual size of the pointer isn't very
relevant---the original 8086 had 32 bit pointers, but could only
address 1MB.

Second, according to the documentation at the Intel site, the
48 bits are mapped to 36 bits, not 32. So while I was
misremembering some of the details of how the virtual memory
worked, the fact remains that there's absolutely no reason for
an OS to take significant address space from the user space (if
that user space is limited to 4GB, of course---I'd argue that
it's actually a poor policy to impose this limit on a user
process).
We were talking about the 'linear address space' available to a
program.

What does "linear address space" have to do with anything? The
Intel architecture doesn't use linear addressing.
This linear address space, as contained literally in the text
above, is 4G or 32 bits wide pointers. If you try to actually
read the text beyond the first sentence, you should be able to
find the following statement:
A program or task could not address locations in this address
space directly.

Which, of course, is completely false.
This is the second sentence. The third would be
Instead, it addresses individual linear address spaces of up
to 4 GBytes.
But, hey, nobody would expect that a C++-using x86 hillbilly
could read ...

As you've just proved.
 

James Kanze

Volume 3, 3.4: "A linear address is a 32-bit address in the
processor's linear address space. Like the physical address
space, the linear address space is flat (unsegmented),
2^32-byte address space with addresses ranging from 0 to
FFFFFFFFh."
I don't see anything that can make me believe that linear
address space is 36 bits. Virtual addresses are always 32-bit,
hence virtual address space is 4GB.
Volume 3, 3.3: "the IA-32 architecture also supports an
extension of the physical address space to 2^36 bytes (64
GBytes);"
However, it is impossible to directly use physical address
space and that's why programs are given no more than 4GB
virtual address space.

Let's see if I understand what you're saying. Intel has
provided some additional addressing modes which it is impossible
to use.
There are also logical addresses (used with segmentation)
which are 48-bit.
I think we shouldn't argue about the size of the address
space, because that is stated in the Intel manual. The problem is
that only segmentation (logical addresses) makes simultaneous
access to more than 4GB possible, and even then it is highly
inefficient and (IMHO) rather useless.

You need segmentation for security reasons; a client process
will not use the same segments as the OS. And the processor has
six segment registers---you can access up to six segments
simultaneously, and at no real additional cost. (Loading a
segment is fairly expensive, but typically, you have to do that
when changing to system mode anyway, for security reasons.)
Segmentation lacks a few features of paging, which makes using it
harder from the kernel's point of view.

Segmentation and paging are more or less orthogonal. And with
an Intel 32 bit processor, you always use segmentation, "There
is no mode bit to disable segmentation" (A direct quote from the
Intel manual.) Some things (particularly security issues) are
best handled by segments, others (paging) by virtual memory. A
good OS for the Intel will use both.
An example may be dealing with concurrent memory allocation
and non-contiguous pages. Of course it is always possible to
use it together with paging, but then you have two similar
mechanisms doing almost the same thing.

I don't think you understand segmentation, at least not as it is
implemented in the Intel. It's very different from paging,
almost orthogonal. And it can't be turned off.
The next problem is portability: paging is supported by a
large number of architectures; segmentation, not really.

That's true. And the few others that support segmentation
define it very differently---what IBM calls a segment on its
mainframes has almost nothing to do with what Intel calls a
segment. But then, the addressing architecture of the Intel
*is* different than that of most other processors. As is a
number of other things. Any time you port an OS to a new
architecture (or even a new implementation of an existing
architecture), there are a number of things that have to be
adapted: things like context swapping and memory management tend
to be very processor specific.
When most OSes were designed, 4GB was such a large amount of
memory that nobody worried about it.

I don't know about "most OSes", but both Linux and Windows NT
are fairly recent; DEC was already feeling the crunch of only 32
bit addressing in the VAX when Dave Cutler and his group left
DEC for Microsoft to implement Windows NT, and Torvalds started
Linux even later. Most desktop machines couldn't support 4GB,
but larger machines could. And of course, everyone is aware of
Moore's law.
That was a mistake, but IMHO, when you have to use strange
techniques to access more memory on a certain architecture,
that's a sign that you should switch to 64-bit hardware, which
would undoubtedly be a better solution.

Using a 64 bit architecture is certainly a better solution. If
only because even 4 GB isn't really enough for today's
processors.
 

Pawel Dziepak

James said:
Second, according to the documentation at the Intel site, the
48 bits are mapped to 36 bits, not 32.
[...]

Not really.
Volume 3A, 3.2.5
"The processor's paging mechanism divides the linear address space (into
which segments are mapped) into pages. These linear address space pages
are then mapped to pages in the physical address space."

Physical address space - 36 bits.
Linear address space - 32 bits.
What does "linear address space" have to do with anything? The
Intel architecture doesn't use linear addressing.

That's not true.
Volume 3A, 3.4
"The processor translates every logical address into a linear address."

Pawel Dziepak
 

Pawel Dziepak

James said:
Let's see if I understand what you're saying. Intel has
provided some additional addressing modes which it is impossible
to use.

I really don't know which modes are impossible to use.
You need segmentation for security reasons; a client process
will not use the same segments as the OS. And the processor has
six segment registers---you can access up to six segments
simultaneously, and at no real additional cost. (Loading a
segment is fairly expensive, but typically, you have to do that
when changing to system mode anyway, for security reasons.)

To achieve the needed level of security, the flat model is used.
Obviously you can access up to six segments with no real cost, but you
can't access more than 4GB using them without a performance hit. (I'd
recommend reading about how logical addresses are transformed into
linear addresses and then into physical addresses.)

Segmentation's task is to organize and protect memory, which is similar
to paging's main task. However, the way they do this is significantly
different.
Segmentation and paging are more or less orthogonal. And with
an Intel 32 bit processor, you always use segmentation, "There
is no mode bit to disable segmentatin" (A direct quote from the
Intel manual.) Some things (particularly security issues) are
best handled by segments, others (paging) by virtual memory. A
good OS for the Intel will use both.

Segmentation is always enabled, but you can use the protected flat model,
which allows you to get rid of all its disadvantages. And of course by
"not using segmentation" I meant "using the protected flat model".

Pawel Dziepak.
 

Jasen Betts

I am working on a schematic editor. It is used for composing circuits
at different levels of abstraction (gates/transistors/PCB). I have run
through the program; it is free of any major memory leaks. One of the
designs I am working with is huge, the biggest that I have seen. My call
to malloc() fails. Because the core algorithms are still sound for
most designs, I am wary of changing them.

I guess I am looking for a way to hook into a memory allocator that
has support for caching to disk. I have yet to try Cxxstl.

a memory allocator won't give your applications a larger address space,
and basically it appears you've hit the address space limit.


look into loading each subcircuit into a separate process,
or finding a way to use the memory more efficiently.
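
one way to do the latter without reworking the core algorithms is to keep the design data in a
file and map only a window of it into the address space at a time. a rough POSIX sketch, with an
invented file name and sizes, and assuming a 64-bit off_t (e.g. -D_FILE_OFFSET_BITS=64 on 32-bit
systems):

    #include <cstddef>
    #include <cstdio>
    #include <fcntl.h>
    #include <sys/mman.h>
    #include <unistd.h>

    // Window size: a multiple of the page size; 64 MiB here.
    constexpr std::size_t kWindow = std::size_t(64) << 20;

    void* map_window(int fd, off_t file_offset) {
        // file_offset must be page-aligned for mmap.
        return mmap(nullptr, kWindow, PROT_READ | PROT_WRITE, MAP_SHARED,
                    fd, file_offset);
    }

    int main() {
        int fd = open("design.dat", O_RDWR | O_CREAT, 0600);  // invented name
        if (fd < 0 || ftruncate(fd, off_t(1) << 33) != 0) {   // 8 GiB backing file
            std::perror("backing file");
            return 1;
        }

        void* w = map_window(fd, 0);                 // map the first 64 MiB
        if (w == MAP_FAILED) { std::perror("mmap"); return 1; }
        static_cast<char*>(w)[0] = 'x';              // use it like ordinary memory

        munmap(w, kWindow);                          // drop this window...
        w = map_window(fd, off_t(6) << 30);          // ...and map a different one
        if (w == MAP_FAILED) { std::perror("mmap"); return 1; }

        munmap(w, kWindow);
        close(fd);
        return 0;
    }

this doesn't enlarge the 32-bit address space; it just keeps one slice of the data mapped at a
time and lets the OS page that slice to and from the file.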
 

Jasen Betts

Alf,

I think you'll find that code has been in read-only segments since NT
3.1. What has come in recently is DEP - no-execute data segments, which
is especially important for the stack.

The code-separate-from-data may well have been designed for Harvard
architecture machines.

I don't know the term "trampoline" outside gymnastics.

they are also mentioned in the GCC documentation.
 

Rainer Weikusat

James Kanze said:
[...]
This linear address space, as contained literally in the text
above, is 4G or 32 bits wide pointers. If you try to actually
read the text beyond the first sentence, you should be able to
find the following statement:
A program or task could not address locations in this address
space directly.

Which, of course, is completely false.

http://download.intel.com/design/processor/manuals/253665.pdf

page 3-12 (xpdf: 76)

It may be false nevertheless, but that would then be something to be
discussed with Intel.
 

Bart van Ingen Schenau

James Kanze said:
[...]
This linear address space, as contained literally in the text
above, is 4G or 32 bits wide pointers. If you try to actually
read the text beyond the first sentence, you should be able to
find the following statement:
        A program or task could not address locations in this address
        space directly.
Which, of course, is completely false.

http://download.intel.com/design/processor/manuals/253665.pdf

page 3-12 (xpdf: 76)

It may be false nevertheless, but that would then be something to be
discussed with Intel.

Thanks for the link, but it does not make it any clearer for me if a
process can access more than 2^32 bytes when it uses a segmented
memory model.
For the flat memory model it is clearly stated that it uses 32-bit
linear addresses.
For the segmented model, it is stated that the address is translated
to a linear address, but it is also stated that you can have multiple
segments of 4G each.

What I am missing is an explanation how (segment,offset) pairs are
translated to linear addresses and in how far different
(segment,offset) pairs can/will address the same byte.
Without this information, I can't tell if the limitation of 32-bit
linear addresses equally applies to the segmented memory model.

Bart v Ingen Schenau
 

Nate Eldredge

Bart van Ingen Schenau said:
James Kanze said:
[...]

This linear address space, as contained literally in the text
above, is 4G or 32 bits wide pointers. If you try to actually
read the text beyond the first sentence, you should be able to
find the following statement:
        A program or task could not address locations in this address
        space directly.
Which, of course, is completely false.

http://download.intel.com/design/processor/manuals/253665.pdf

page 3-12 (xpdf: 76)

It may be false nevertheless, but that would then be something to be
discussed with Intel.

Thanks for the link, but it does not make it any clearer for me if a
process can access more than 2^32 bytes when it uses a segmented
memory model.
For the flat memory model it is clearly stated that it uses 32-bit
linear addresses.
For the segmented model, it is stated that the address is translated
to a linear address, but it is also stated that you can have multiple
segments of 4G each.

What I am missing is an explanation how (segment,offset) pairs are
translated to linear addresses and in how far different
(segment,offset) pairs can/will address the same byte.
Without this information, I can't tell if the limitation of 32-bit
linear addresses equally applies to the segmented memory model.

Consider the instruction

MOV DS:[12345h], AL

The descriptor table entry corresponding to the selector in DS contains
a 32-bit "base". The "linear address" in the instruction is then, by
definition, the sum of this 32-bit base with the 32-bit offset (12345h)
given in the instruction. It is a 32-bit value.

The linear address is then used as an index into the (multilevel) page
table. Note I say *the* page table: there is only one active in the CPU
at any given time (located by the privileged CR3 register). The result
of this lookup is a 32-bit (or 36-bit, in the case of PAE) physical
address. This address is put on the memory bus for the store operation.

Notice that once the linear address is computed, no further use is made
of the segment. In particular, suppose that DS refers to a segment with
base 12345h and ES refers to one with base 12346h. Then DS:[1h] and
ES:[0h] will have the same linear address (12345h + 1h = 12346h + 0h =
12346h). Thus, they will necessarily resolve to the same physical
address, since they both go through the same single page table.

So segmentation does not get you around the 4G limit because the
translation of seg:ofs results in a 32 bit linear address, and there is
only one linear address space total (because there is only one page
table), not one per segment.
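
To make the aliasing concrete, here is a toy C++ model of just that first translation step (the
descriptor bases are the made-up values from the example above):

    #include <cstdint>
    #include <iostream>

    // First stage of protected-mode translation, as a toy model:
    // linear = (segment base + offset) mod 2^32 (uint32_t arithmetic wraps).
    std::uint32_t linear(std::uint32_t base, std::uint32_t offset) {
        return base + offset;
    }

    int main() {
        const std::uint32_t ds_base = 0x12345;  // hypothetical DS descriptor base
        const std::uint32_t es_base = 0x12346;  // hypothetical ES descriptor base

        // DS:[1h] and ES:[0h] collapse to the same 32-bit linear address,
        // so they must hit the same entry in the one active page table.
        std::cout << std::hex << linear(ds_base, 1) << ' '
                  << linear(es_base, 0) << '\n';   // prints: 12346 12346
    }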

It would have been possible for Intel to implement a PAE-like feature by
making the base address in a segment descriptor more than 32 bits. This
would then have the effect of allowing different segments to give you
different windows, each up to 4G large, into a larger address space, and
you could switch between them just by loading different selectors. I
think this is the sort of thing you have in mind. But that is not what
Intel did. The longer addresses only enter in the second stage of
address translation, in the paging mechanism.

There was a good reason for this choice. Modern operating systems, like
Linux and (AFAIK) Windows, find the segmented memory model inconvenient
and avoid it as much as possible. Linux essentially uses a single
segment, with base address 0, for everything; and all protection,
virtual memory management, etc, is done through the page tables.
User-mode programs use the 32-bit offset only for their pointers; in
fact, gcc has no support for 48-bit seg:ofs "far" pointers. (This is
because most 32-bit CPUs other than the 386 family provide only a paging
mechanism. The segmentation mechanism on the 386 is rather unique; it's
really a holdover from a poorly-received feature of the 286.)
So it made more sense to incorporate PAE into the paging mechanism,
which systems already used, rather than to force them to rewrite their
compilers to use the undesired segmentation mechanism.

Thus, under PAE, in order to get access to a different part of the
36-bit physical address space, the page tables have to be modified.
This is a privileged operation, and so would require an operating system
call. It would be rather expensive and not something you would want to
have to do often in your program.

So the main way that PAE is used is to allow multiple processes each to
use up to 4G of memory. This presents no problem, since the kernel
maintains a unique page table for each process; on each context switch,
when a new process is scheduled to run, it loads CR3 with a pointer to
the new process's page table. These different page tables for different
processes can easily be set to refer to different regions of the 64G
physical address space.

I hope this helps.
 

Alf P. Steinbach

* Bart van Ingen Schenau:
Thanks for the link, but it does not make it any clearer for me if a
process can access more than 2^32 bytes when it uses a segmented
memory model.
For the flat memory model it is clearly stated that it uses 32-bit
linear addresses.
For the segmented model, it is stated that the address is translated
to a linear address, but it is also stated that you can have multiple
segments of 4G each.

What I am missing is an explanation how (segment,offset) pairs are
translated to linear addresses and in how far different
(segment,offset) pairs can/will address the same byte.
Without this information, I can't tell if the limitation of 32-bit
linear addresses equally applies to the segmented memory model.

Someone more familiar with this than me may possibly clarify a bit; I only
looked up the documentation.

For 32-bit operation, paging translates from 32-bit linear addresses to either
32 or 36 bit physical addresses. Given a logical address SEG:OFFSET in user
code, the segment selector SEG (16 bits) chooses a segment descriptor in either
a process-specific descriptor table (the Local Descriptor Table) or a global
common one (the Global Descriptor Table), depending on a bit in the selector.
The chosen entry then gives the segment's linear base address. The OFFSET is
added, and that's that: you have a 32-bit linear address, which depending on
modes etc. may now be further translated via paging, or used directly.
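
For reference, the 16-bit selector itself breaks down as follows (this is the standard IA-32
selector layout; the example value is arbitrary):

    #include <cstdint>
    #include <iostream>

    // Segment selector layout: bits 0-1 = requested privilege level (RPL),
    // bit 2 = table indicator (0 = GDT, 1 = LDT), bits 3-15 = descriptor index.
    struct Selector {
        std::uint16_t raw;
        unsigned rpl()   const { return raw & 0x3u; }
        bool     ldt()   const { return (raw & 0x4u) != 0; }
        unsigned index() const { return raw >> 3; }
    };

    int main() {
        Selector s{0x002B};  // arbitrary example value
        std::cout << "index=" << s.index()
                  << " table=" << (s.ldt() ? "LDT" : "GDT")
                  << " rpl=" << s.rpl() << '\n';  // index=5 table=GDT rpl=3
    }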

On the original 386 there was no possibility that paging could further translate
the 32-bit linear address to a 36-bit physical address.

But on a modern processor it's this possibility that is the crux of the
discussion, as I see it, because, as James noted, if there was no way of
influencing that translation, if all SEG:OFFSET addresses always were translated
to the same 32-bit linear address space whose addresses in turn were always
translated via paging to the same 32-bit subset of physical addresses, one would
have a 36-bit extension that couldn't be accessed, a kind of lunatic design.

Accessing that larger range of physical addresses is done by changing the page
translation, either by changing the pointer to the paging tables or by changing
the paging tables themselves. This can be a bit costly because the processor
optimizes the page translation by caching some page translation information (the
TLB mentioned earlier in the thread), which means that changing the paging
invalidates cached information. But anyway, essentially each process is limited
to 4 GiB unless the OS provides some kind of (essentially, this is not Intel's
terminology) bank switching, and the system as a whole is limited to 4 GiB
unless the OS allows costly change of page translation on each process switch.

The costs are however not necessarily significant.

For example, Windows does things in mysterious ways -- I don't know how --
such that memory accesses quite often end up accessing the disk (swapping). In
effect Windows transforms the computer from an electronic one to a mechanical
one, many many orders of magnitude slower, and perhaps the only realistic fix
for that is solid state mass storage. In the context of such extraordinary
inefficiency a little cache invalidation does perhaps not matter so much...

Cheers & hth.,

- Alf
 

Nate Eldredge

Alf P. Steinbach said:
Accessing that larger range of physical addresses is done by changing
the page translation, either by changing the pointer to the paging
tables or by changing the paging tables themselves. This can be a bit
costly because the processor optimizes the page translation by caching
some page translation information (the TLB mentioned earlier in the
thread), which means that changing the paging invalidates cached
information. But anyway, essentially each process is limited to 4 GiB
unless the OS provides some kind of (essentially, this is not Intel's
terminology) bank switching, and the system as a whole is limited to 4
GiB unless the OS allows costly change of page translation on each
process switch.

The costs are however not necessarily significant.

Indeed, modern operating systems already do this anyway, since it's how
processes are kept isolated from one another (physical pages that are
private to one process are not mapped in the page tables of another).
So the costs, while perhaps significant, are unavoidable and already
being paid.
For example, Windows does things in mysterious ways -- I don't know how --
such that memory accesses quite often end up accessing the disk
(swapping).

Not sure if this is a serious question or just a jab at Windows, but
swapping is handled via paging as well. The OS can "swap out" a page by
saving its contents to disk and marking it as "not present" in the page
table; the physical memory can then be used for something else (perhaps
swapping in another page). The next time the process accesses that
page, a page fault occurs, which is trapped by the OS. It notes the
page being accessed, locates the associated data on the disk, loads it
into physical memory, and updates the page table to refer to its new
physical location. The process then restarts the instruction that
accessed the page and goes on its merry way.
In effect Windows transforms the computer from an
electronic one to a mechanical one, many many orders of magnitude
slower, and perhaps the only realistic fix for that is solid state
mass storage.

Solid state mass storage is also usually orders of magnitude slower than
RAM. A better fix is more RAM, a 64-bit address space where all of it
can be mapped at once, and a swapping algorithm that is better at
selecting pages that are rarely accessed.
In the context of such extraordinary inefficiency a
little cache invalidation does perhaps not matter so much...

True. But in the normal case where swapping is not heavy, the cost of
invalidating this cache (often called the TLB) is certainly significant
and must be taken into account. That is not to say that it's always
avoidable.
 

Bart van Ingen Schenau

Bart van Ingen Schenau said:
Thanks for the link, but it does not make it any clearer for me if a
process can access more than 2^32 bytes when it uses a segmented
memory model.
For the flat memory model it is clearly stated that it uses 32-bit
linear addresses.
For the segmented model, it is stated that the address is translated
to a linear address, but it is also stated that you can have multiple
segments of 4G each.
What I am missing is an explanation how (segment,offset) pairs are
translated to linear addresses and in how far different
(segment,offset) pairs can/will address the same byte.
Without this information, I can't tell if the limitation of 32-bit
linear addresses equally applies to the segmented memory model.

Consider the instruction

MOV DS:[12345h], AL

The descriptor table entry corresponding to the selector in DS contains
a 32-bit "base".  The "linear address" in the instruction is then, by
definition, the sum of this 32-bit base with the 32-bit offset (12345h)
given in the instruction.  It is a 32-bit value.

Thank you. This was the bit I was missing.

For background, I am familiar with the 8086 segmentation scheme, where
a 16-bit segment and 16-bit offset combine to form a 20-bit address,
but I was not sure how much of this scheme had been retained for the
protected mode addressing. Apparently only the notion of a segment and
an offset.
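
For comparison, the old real-mode rule is simple enough to write down in a few lines (the
example values are arbitrary):

    #include <cstdint>
    #include <iostream>

    // Real-mode 8086 address formation: physical = segment * 16 + offset,
    // which yields a 20-bit address (1 MiB of addressable memory).
    std::uint32_t real_mode_address(std::uint16_t segment, std::uint16_t offset) {
        return (std::uint32_t(segment) << 4) + offset;
    }

    int main() {
        // Different segment:offset pairs can alias the same 20-bit address:
        std::cout << std::hex
                  << real_mode_address(0x1234, 0x0010) << '\n'   // 12350
                  << real_mode_address(0x1235, 0x0000) << '\n';  // 12350
    }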

<snip - rest of explanation>

Bart v Ingen Schenau
 
