Using virtual memory and/or disk to reduce memory footprint

Discussion in 'C++' started by nick, Mar 5, 2009.

  1. nick

    nick Guest

    Hi,

    I am writing a C++ GUI tool. I sometimes run out of memory (exceed the
    2GB limit) on a 32-bit Linux machine. I have optimized my GUI's
    database a lot (and am still working on it) to reduce the runtime
    memory footprint.

    I was thinking of ways to off-load part of the database to virtual
    memory or disk and reading it back in when required. Does anyone out
    there have any papers or suggestions in that direction?

    Regards
    Nick
    nick, Mar 5, 2009
    #1

  2. On Mar 4, 4:59 pm, nick <> wrote:

    > I am writing a C++ GUI tool. I sometimes run out of memory (exceed the
    > 2GB limit)  on a 32bit Linux machine. I have optimized my GUI's
    > database a lot (and still working on it) to reduce the runtime memory
    > footprint.


    > I was thinking of ways to off-load part of the database to virtual
    > memory or disk and reading it back in when required. Does anyone out
    > there have any papers or suggestions in that direction?


    You are running out of process virtual address space. You want to find
    techniques that conserve process virtual address space. So the first
    question is, what's using up all your address space?

    If it's memory that you've allocated, you need to allocate less
    memory. One solution might be to use a file on disk instead of memory
    mapped space. A memory-mapped file consumes process virtual address
    space, but if you use 'pread' and 'pwrite' instead, no vm space is
    needed.
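    As a minimal sketch of this idea (the record layout and function
    names are invented for the example): fixed-size records live in a
    file, and POSIX `pread`/`pwrite` move one record at a time through a
    small buffer, so no file mapping consumes address space.

```cpp
#include <fcntl.h>
#include <unistd.h>
#include <cstddef>
#include <cstring>

// A 64-byte fixed-size record; the file, not the heap, holds the data.
struct Record {
    char   name[56];
    double value;
};

// Store record i at byte offset i * sizeof(Record).
bool put_record(int fd, std::size_t i, const Record& r) {
    return pwrite(fd, &r, sizeof r,
                  static_cast<off_t>(i * sizeof(Record)))
           == static_cast<ssize_t>(sizeof r);
}

// Fetch record i on demand; only this one record occupies memory.
bool get_record(int fd, std::size_t i, Record* out) {
    return pread(fd, out, sizeof *out,
                 static_cast<off_t>(i * sizeof(Record)))
           == static_cast<ssize_t>(sizeof *out);
}
```

    Because `pread`/`pwrite` take an explicit offset, no `lseek` is
    needed and the calls are safe to use from multiple threads on the
    same descriptor.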

    DS
    David Schwartz, Mar 5, 2009
    #2

  3. nick

    nick Guest

    Thanks for your tips David! My tool is a schematic-driven PCB design
    tool. There are multiple levels of schematics that can be composed.
    In my profiling, most of the memory is taken up during the elaboration
    of the various schematics, i.e. storing the various design-related
    information and then connecting it up.


    Regards
    Nick
    nick, Mar 5, 2009
    #3
  4. On Mar 4, 5:49 pm, nick <> wrote:

    > Thanks for your tips David! My tool is schematic driven PCB design
    > tool. There are multiple levels of schematics that can be composed.
    > In my profiling, most of the memory is taken up during the elaboration
    > of the various schematics i.e. storing of the various design-related
    > informations and then connecting them up.


    Rather than storing them in memory, why not store them in a file on
    disk?

    If there's enough physical memory, the file will stay in cache anyway.
    If there isn't enough physical memory, trying to keep it in memory
    would result in it swapping to disk anyway.

    So it should be roughly performance neutral, but save a lot of vm
    space.
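    One hypothetical way to package this (the class name is invented for
    the example) is an array-like wrapper whose elements live in a file
    rather than on the heap; the OS page cache keeps hot elements in RAM,
    so access stays fast while the process address space stays small.

```cpp
#include <fstream>
#include <cstddef>
#include <stdexcept>

// Sketch: an array of trivially-copyable T backed by a file.
template <typename T>
class DiskArray {
public:
    explicit DiskArray(const char* path)
        : f_(path, std::ios::in | std::ios::out |
                   std::ios::binary | std::ios::trunc) {
        if (!f_) throw std::runtime_error("cannot open backing file");
    }

    void set(std::size_t i, const T& v) {
        f_.seekp(static_cast<std::streamoff>(i * sizeof(T)));
        f_.write(reinterpret_cast<const char*>(&v), sizeof v);
        f_.flush();
    }

    T get(std::size_t i) {
        T v{};
        f_.seekg(static_cast<std::streamoff>(i * sizeof(T)));
        f_.read(reinterpret_cast<char*>(&v), sizeof v);
        return v;
    }

private:
    std::fstream f_;  // the backing store; only one T is ever in memory
};
```

    Writing past the current end of the file simply extends it (with a
    zero-filled gap on Linux), so the array can be populated sparsely.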

    DS
    David Schwartz, Mar 5, 2009
    #4
  5. nick

    nick Guest

    > Why are you flattening the whole thing in memory?  I'm not criticizing,
    > just asking.  (I spent about 6 months working full-time on a schematic
    > editor for VLSI designs.)


    Yes, you are right. Do you have any publications that I can look up or
    suggestions on how to avoid elaborating everything in memory?

    I know what I have just asked about is probably proprietary
    information, but if there is any public-domain information that you
    could point me to, I would be eternally grateful :)

    Regards
    Nick
    nick, Mar 5, 2009
    #5
  6. nick

    Guest

    nick wrote:
    > > Why are you flattening the whole thing in memory?  I'm not criticizing,
    > > just asking.  (I spent about 6 months working full-time on a schematic
    > > editor for VLSI designs.)

    >
    > Yes, you are right. Do you have any publications that I can look up or
    > suggestions on how to avoid elaborating everything in memory?
    >
    > I know what I have just asked would probably be proprietary
    > information but if there is any public domain information that you
    > could point me to, I would be eternally grateful :)
    >
    > Regards
    > Nick


    I remember somebody suggested this site to me; it might be helpful
    for your work:

    http://stxxl.sourceforge.net/

    I haven't used it myself, though :)
    , Mar 5, 2009
    #6
  7. nick

    James Kanze Guest

    On Mar 5, 3:20 am, Jeff Schwab <> wrote:
    > nick wrote:
    > > I am writing a C++ GUI tool. I sometimes run out of memory
    > > (exceed the 2GB limit) on a 32bit Linux machine.


    > What 2GB limit? Do you mean that the machine only has 2GB of
    > RAM? Or do you mean that you're exceeding the 3 or 4 GB limit
    > on address space?


    I was wondering about that myself. I've had programs on a 32
    bit Linux that used well over 3 MB.

    > > I have optimized my GUI's database a lot (and still working
    > > on it) to reduce the runtime memory footprint.


    > > I was thinking of ways to off-load part of the database to
    > > virtual memory or disk and reading it back in when required.


    > The kernel already does that for you, automatically. Doing it manually
    > is called "overlaying."


    > http://en.wikipedia.org/wiki/Overlay_(programming)


    The kernel can only do it when the entire image would fit into
    the virtual address space (4 GB under 32 bit Linux).
    "Overlaying" will allow a lot more. And it's only called
    overlaying when you swap in and out code and named variables; if
    you're just buffering data, the name doesn't apply. (As an
    extreme example, programs like grep or sed easily work on data
    sets that are in the Gigabyte range or larger; they only hold a
    single line in memory at a time, however. And I'm sure you
    wouldn't call this overlaying.)

    FWIW: I don't think that the Linux linker supports overlay
    generation, at least in the sense I knew it 25 or 30 years ago.
    (Although explicitly loading and unloading dynamic objects
    probably comes out to the same thing.)

    --
    James Kanze (GABI Software) email:
    Conseils en informatique orientée objet/
    Beratung in objektorientierter Datenverarbeitung
    9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
    James Kanze, Mar 5, 2009
    #7
  8. James Kanze schrieb:
    >> [..] Or do you mean that you're exceeding the 3 or 4 GB limit
    >> on address space?


    > I was wondering about that myself. I've had programs on a 32
    > bit Linux that used well over 3 MB.

    (Probably 3GB...) PAE enabled?
    http://en.wikipedia.org/wiki/Physical_Address_Extension

    Regards,
    Benjamin
    Benjamin Rampe, Mar 6, 2009
    #8
  9. nick

    James Kanze Guest

    On Mar 5, 3:05 pm, Jeff Schwab <> wrote:
    > James Kanze wrote:
    > > FWIW: I don't think that the Linux linker supports overlay
    > > generation, at least in the sense I knew it 25 or 30 years ago.


    [...]
    > Oddly, those kinds of overlays (IIUC) are coming back into
    > fashion, for processors with lots of little cores, each with
    > its own small, dedicated cache. On cell, the loading and
    > unloading have to be managed manually, by the programmer.


    I don't know what the current situation is. (My current
    applications run on dedicated machines with enough real memory
    that they never have to page out.) But I remember talking to
    people who worked on the mainframe Fortran when virtual memory
    was being introduced; their experience was that programs using
    virtual memory, instead of overlays, were often several orders
    of magnitude slower. When you had to explicitly load an
    overlay, you thought about it, and did it as infrequently as
    possible. When the system decides to load or unload a virtual
    page is invisible to you, and you can end up with a lot more
    paging than you wanted. (On the other hand, increase the size
    of the real memory, and the virtual memory based system will
    cease paging entirely, whereas the overlays will still be
    loaded and unloaded each time you request them.)

    --
    James Kanze (GABI Software) email:
    Conseils en informatique orientée objet/
    Beratung in objektorientierter Datenverarbeitung
    9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
    James Kanze, Mar 6, 2009
    #9
  10. nick

    James Kanze Guest

    On Mar 6, 3:02 am, Benjamin Rampe <> wrote:
    > James Kanze schrieb:
    >
    > >> [..] Or do you mean that you're exceeding the 3 or 4 GB limit
    > >> on address space?

    > > I was wondering about that myself. I've had programs on a 32
    > > bit Linux that used well over 3 MB.


    > (Probably 3GB...)


    Yes.

    > PAE enabled?
    > http://en.wikipedia.org/wiki/Physical_Address_Extension


    No. 3GB easily fits in the address space of a 32 bit processor.

    --
    James Kanze (GABI Software) email:
    Conseils en informatique orientée objet/
    Beratung in objektorientierter Datenverarbeitung
    9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
    James Kanze, Mar 6, 2009
    #10
  11. nick

    James Kanze Guest

    On Mar 6, 1:14 pm, Jeff Schwab <> wrote:
    > James Kanze wrote:
    > >> PAE enabled?
    > >>http://en.wikipedia.org/wiki/Physical_Address_Extension


    > > 3GB easily fits in the address space of a 32 bit processor.


    > The Linux kernel needs about 1 GiB for housekeeping. User
    > code is typically stuck within the lower 3 GiB. If you need
    > more address space than that, either fork, or get a 64-bit
    > processor. :)


    But not in your process address space, I hope. (I'm pretty sure
    that I've had processes with more than 3GB mapped to the
    process, but I don't remember the details---it was probably
    under Solaris.)

    --
    James Kanze (GABI Software) email:
    Conseils en informatique orientée objet/
    Beratung in objektorientierter Datenverarbeitung
    9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
    James Kanze, Mar 7, 2009
    #11
  12. nick

    James Kanze Guest

    On Mar 6, 1:12 pm, Jeff Schwab <> wrote:
    > James Kanze wrote:
    > > increase the size
    > > of the real memory, and the virtual memory based system will
    > > cease paging entirely, whereas the overlays will still be
    > > loaded and unloaded each time you request them.


    > Couldn't that be handled the same way we handle manual
    > swapping of data? I.e., couldn't unused code pages be
    > unloaded only conditionally? In fact, this sounds like a
    > tailor-made case for C++, with various template instantiations
    > for different amounts of RAM.


    Probably. But in practice, no one is developing new code with
    overlays, so it doesn't matter.

    --
    James Kanze (GABI Software) email:
    Conseils en informatique orientée objet/
    Beratung in objektorientierter Datenverarbeitung
    9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
    James Kanze, Mar 7, 2009
    #12
  13. nick

    Eric Sosman Guest

    Jeff Schwab wrote:
    > James Kanze wrote:
    >> [...]
    >> (I'm pretty sure
    >> that I've had processes with more than 3GB mapped to the
    >> process, but I don't remember the details---it was probably
    >> under Solaris.)

    >
    > 32-bit Solaris? 64-bit Linux is already a different ball of
    > address-space wax, and I'm not sure what Solaris does here. The last
    > time I developed for Solaris was five years ago, on 64-bit Sparc.


    A 32-bit process on Solaris can use "almost all" of its
    nominal 4GB address space. I forget precisely how much space
    Solaris claims for its own purposes, but it's in the vicinity
    of a couple megabytes.

    A 64-bit process on Solaris can use somewhat more ...

    (Disclaimer: I work for Sun, but don't speak for Sun.)

    --
    Eric Sosman
    lid
    Eric Sosman, Mar 7, 2009
    #13
  14. nick

    James Kanze Guest

    On Mar 7, 12:52 pm, Jeff Schwab <> wrote:
    > James Kanze wrote:
    > > On Mar 6, 1:14 pm, Jeff Schwab <> wrote:
    > >> James Kanze wrote:
    > >>>> PAE enabled?
    > >>>>http://en.wikipedia.org/wiki/Physical_Address_Extension


    > >>> 3GB easily fits in the address space of a 32 bit
    > >>> processor.


    > >> The Linux kernel needs about 1 GiB for housekeeping. User
    > >> code is typically stuck within the lower 3 GiB. If you
    > >> need more address space than that, either fork, or get a
    > >> 64-bit processor. :)


    > > But not in your process address space, I hope.


    > It is indeed the same address space, although user-space code
    > trying to access the uppermost GB directly will just get a
    > segv. The default allocation limit is 3056 MB for user-space.
    > I'm told there's a kernel patch that can override this limit,
    > but I've never used it.


    > > (I'm pretty sure that I've had processes with more than 3GB
    > > mapped to the process, but I don't remember the details---it
    > > was probably under Solaris.)


    > 32-bit Solaris?


    That's a good question. My code was compiled in 32 bit mode,
    but the OS was Solaris 2.8, running on a 64 bit Sparc.

    Still, from a QoI point of view, I would not generally expect
    the OS to take much of the user's address space---a couple of KB,
    at the most. My attitude might be influenced here by the
    requirements when I worked on OS's... and the maximum user
    address space was either 64KB or 1 MB. But if the address space
    was 32 bits, I'd feel cheated if the OS didn't allow me to use
    very close to 4GB (provided sufficient other resources, of
    course).

    --
    James Kanze (GABI Software) email:
    Conseils en informatique orientée objet/
    Beratung in objektorientierter Datenverarbeitung
    9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
    James Kanze, Mar 8, 2009
    #14
  15. James Kanze wrote:
    >> The Linux kernel needs about 1 GiB for housekeeping. User
    >> code is typically stuck within the lower 3 GiB. If you need
    >> more address space than that, either fork, or get a 64-bit
    >> processor. :)

    >
    > But not in your process address space, I hope. (I'm pretty sure
    > that I've had processes with more than 3GB mapped to the
    > process, but I don't remember the details---it was probably
    > under Solaris.)


    In the most common approach, the kernel is placed in the top part of
    the address space of *each* process; for example, 3GB for user space
    and 1GB for the kernel. That's not because it's the easiest way, but
    because it's the most efficient way: copying data from kernel to user
    space, or vice versa, is limited only by memory bandwidth. The main
    disadvantage is, of course, that it reduces the address space
    available to the user-mode process.

    There is also another approach, where the kernel is placed in a
    separate address space, so each process gets the full 4GB. I don't
    know whether it is used by any architecture and OS other than Solaris
    on sun4u. That's because sun4u provides a mechanism called the
    'address space identifier' (ASI), which allows data to be copied
    efficiently from one address space to another. It is of course also
    possible to implement this on x86 (or maybe other architectures), but
    it's very slow and inefficient.

    Additionally, I would like to mention that PAE does not increase the
    size of the address space in any way; the 4GB limit remains. What PAE
    gives the kernel is the ability to use up to 64GB of *physical* memory.

    Pawel Dziepak
    Pawel Dziepak, Mar 8, 2009
    #15
  16. James Kanze <> writes:
    > On Mar 7, 12:52 pm, Jeff Schwab <> wrote:
    >> James Kanze wrote:


    [...]

    >> > (I'm pretty sure that I've had processes with more than 3GB
    >> > mapped to the process, but I don't remember the details---it
    >> > was probably under Solaris.)

    >
    >> 32-bit Solaris?

    >
    > That's a good question. My code was compiled in 32 bit mode,
    > but the OS was Solaris 2.8, running on a 64 bit Sparc.
    >
    > Still, from a QoI point of view, I would not generally expect
    > the OS to take much of the user's address space---a couple of KB,
    > at the most. My attitude might be influenced here by the
    > requirements when I worked on OS's... and the maximum user
    > address space was either 64KB or 1 MB. But if the address space
    > was 32 bits, I'd feel cheated if the OS didn't allow me to use
    > very close to 4GB (provided sufficient other resources, of
    > course).


    Other people 'feel' that the overhead of a TLB (and maybe even
    cache) flush is too high for a system call, and because of this (as
    already written by someone else), the kernel is mapped into the
    address space of each process, like any other shared library would be.
    Rainer Weikusat, Mar 8, 2009
    #16
  17. nick

    James Kanze Guest

    On Mar 8, 5:10 pm, Pawel Dziepak <> wrote:
    > James Kanze wrote:
    > >> The Linux kernel needs about 1 GiB for housekeeping. User
    > >> code is typically stuck within the lower 3 GiB. If you need
    > >> more address space than that, either fork, or get a 64-bit
    > >> processor. :)


    > > But not in your process address space, I hope. (I'm pretty sure
    > > that I've had processes with more than 3GB mapped to the
    > > process, but I don't remember the details---it was probably
    > > under Solaris.)


    > In the most common approach kernel is placed in the top part
    > of address space of *each* process. For example there is 3GB
    > for user and 1GB for kernel. That's not because it's the
    > easiest way, that's because it's the most efficient way.


    There's no difference in performance if the system knows how to
    manage the virtual memory. At least on the processors I know.

    > Copying data from kernel to user space or vice versa is
    > limited only by memory bandwidth.


    Copying data from kernel to user space suffers the same
    constraints as copying it between two places in user space. If
    the memory is already mapped, it is limited by memory bandwidth.
    If it isn't, then you'll get a page fault. This is totally
    independent of whether the memory is in the user address range
    or not.

    > The main disadvantage is, of course, reducing the size of
    > space available for user mode process.


    > There is also another way where kernel is placed in a
    > separated address space, hence process is given 4GB address
    > space. I don't know if it is used by any other architecture
    > and OS than Solaris on sun4u. That's because sun4u provides
    > mechanism called 'address space identifier' which allows
    > data to be copied efficiently from one address space to another. Of
    > course, it is also possible to implement on x86 (or maybe
    > other architectures) but it's very slow and inefficient.


    I don't know about other architectures, but Intel certainly
    supports memory in different segments being mapped at the same
    time; it actually offers more possibilities here than the Sparc.

    --
    James Kanze (GABI Software) email:
    Conseils en informatique orientée objet/
    Beratung in objektorientierter Datenverarbeitung
    9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
    James Kanze, Mar 8, 2009
    #17
  18. James Kanze wrote:
    >> Copying data from kernel to user space or vice versa is
    >> limited only by memory bandwidth.

    >
    > Copying data from kernel to user space suffers the same
    > constraints as copying it between two places in user space. If
    > the memory is already mapped, it is limited by memory bandwidth.
    > If it isn't, then you'll get a page fault. This is totally
    > independent of whether the memory is in the user address range
    > or not.


    That's true only in the first approach, where the kernel is placed at
    the top of the address space of each process. Indeed, copying between
    kernel and user space is then as efficient as copying between two
    places in user space; that's why this approach is used in all common
    kernels on x86 and similar architectures.
    If the process were given the whole address space (4GB), the kernel
    would have to be in a separate address space, which would require
    copying between two address spaces, and that is much less efficient
    (TLB overhead, TSS switches, etc).

    > I don't know about other architectures, but Intel certainly
    > supports memory in different segments being mapped at the same
    > time; it actually offers more possibilities here than the Sparc.


    But only sun4u allows two address spaces to be accessed efficiently
    at the same time. On all Intel (and similar) processors, the TLB
    would get invalidated.

    Pawel Dziepak
    Pawel Dziepak, Mar 8, 2009
    #18
  19. James Kanze <> writes:
    > On Mar 8, 5:10 pm, Pawel Dziepak <> wrote:
    >> James Kanze wrote:
    >> >> The Linux kernel needs about 1 GiB for housekeeping. User
    >> >> code is typically stuck within the lower 3 GiB. If you need
    >> >> more address space than that, either fork, or get a 64-bit
    >> >> processor. :)

    >
    >> > But not in your process address space, I hope. (I'm pretty sure
    >> > that I've had processes with more than 3GB mapped to the
    >> > process, but I don't remember the details---it was probably
    >> > under Solaris.)

    >
    >> In the most common approach kernel is placed in the top part
    >> of address space of *each* process. For example there is 3GB
    >> for user and 1GB for kernel. That's not because it's the
    >> easiest way, that's because it's the most efficient way.

    >
    > There's no difference in performance if the system knows how to
    > manage the virtual memory. At least on the processors I know.


    This isn't even true for the processors you claim to know, let alone
    for others. E.g., ARM CPUs (at least up to the ARM9) have a virtually
    addressed cache, and this means that on an address space switch not
    only is the TLB flushed (on Intel, IIRC, as a side effect of writing
    to CR3) but the complete contents of the cache must be discarded as
    well.
    Rainer Weikusat, Mar 8, 2009
    #19
  20. nick

    James Kanze Guest

    On Mar 8, 11:09 pm, Pawel Dziepak <> wrote:
    > James Kanze wrote:
    > >> Copying data from kernel to user space or vice versa is
    > >> limited only by memory bandwidth.


    > > Copying data from kernel to user space suffers the same
    > > constraints as copying it between two places in user space.
    > > If the memory is already mapped, it is limited by memory
    > > bandwidth. If it isn't, then you'll get a page fault. This
    > > is totally independent of whether the memory is in the user
    > > address range or not.


    > That's true only in the first approach, where the kernel is placed
    > at the top of the address space of each process. Indeed, copying
    > between kernel and user space is as efficient as copying
    > between two places in user space. That's why this approach is
    > used in all common kernels on x86 and similar architectures.
    > If the process were given the whole address space (4GB), the
    > kernel would have to be in a separate address space, which would
    > require copying between two address spaces, and that is much less
    > efficient (TLB overhead, TSS switches, etc).


    That's simply not true, or at least it wasn't when I did my
    evaluations (admittedly on an Intel 80386, quite some time ago).
    And the address space of the 80386 was considerably more than
    4GB; you could address 4GB per segment. (In theory, you could
    have up to 64K segments, but IIRC, in practice, there were some
    additional limitations.)

    You will pay a performance hit when you first load a segment
    register, but this is a one-time affair, normally taking place
    when you switch modes.

    > > I don't know about other architectures, but Intel certainly
    > > supports memory in different segments being mapped at the
    > > same time; it actually offers more possibilities here than
    > > the Sparc.


    > But only sun4u allows to efficiently access two address spaces
    > at the same time. In all Intel (and similar) processors TLB
    > would get invalidated.


    I'm not too sure what you mean by "two address spaces". If you
    mean two separate segments, at least in older Intel processors,
    the TLB would remain valid as long as the segment identifier
    remained in a segment register; there was one per segment.

    (I'll admit that I find both Windows and Linux unacceptable
    here. The Intel processor allows accessing far more than 4GB;
    limiting a single process to 4GB is an artificial constraint,
    imposed by the OS. Limiting it to even less is practically
    unacceptable.)

    --
    James Kanze (GABI Software) email:
    Conseils en informatique orientée objet/
    Beratung in objektorientierter Datenverarbeitung
    9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
    James Kanze, Mar 9, 2009
    #20
