Reclaiming (lots of) memory

Discussion in 'Python' started by Thomas Rast, Oct 2, 2004.

  1. Thomas Rast

    Thomas Rast Guest

    Hello everyone

    My scenario is somewhat involved, so if you're in a hurry, here's an
    abstract: I have a daemon that needs about 80MB of RAM to build its
    internal data structures, but can pack them to 20MB for the rest of
    its lifetime. The other 60MB are never returned to the operating
    system. I've thought of building and packing the structures in a
    fork()ed process, then piping them over to the main part; is there an
    easier way to get my RAM back?

    Explanation:

    The program is part of a toy IRC bot which reads roughly 100'000 lines
    from logfiles and builds a huge dict of dicts which is then used to
    build sentences much like word-based dissociated press would. All
    entries are ints representing words or, as values of the inner
    dictionaries, frequencies.

    The final dictionary has about 325'000 entries, and the python process
    uses around 80MB of RAM. See the test case at the end of my post:
    fill() creates a similar data structure. If you run it as a script
    and compare the 'ps' outputs, you'll see that 'del d' frees only a
    small part of the memory to the OS; on my Linux system, about 10%.

    Using the standard 'struct' module, I can pack the keys and values of
    the outer dictionary into strings, which brings memory usage down to
    about 20M. Unfortunately, this has to be done as a second step (as
    opposed to always keeping the dictionary in that format), otherwise it
    would slow things down too much. Writing everything to disk (using
    e.g. 'bsddb3') suffers from the same problem; I really have to do the
    initial work in RAM to get acceptable speed.
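
    To make this concrete, the packing step is roughly the following
    sketch (the record layout here is made up for illustration: outer
    keys are taken to be 3-tuples of word ids and the inner dicts map
    word id to frequency, as in the test case below):

    import struct

    def pack_table(table):
        # Illustrative layout: each outer key (three word ids) becomes a
        # 12-byte string; each inner dict becomes alternating
        # (word id, frequency) pairs packed as unsigned ints.
        packed = {}
        for key, counts in table.iteritems():
            flat = []
            for word, freq in counts.iteritems():
                flat.extend((word, freq))
            pkey = struct.pack('>3I', *key)
            packed[pkey] = struct.pack('>%dI' % len(flat), *flat)
        return packed

    def frequency(packed, key, word):
        # How often `word` followed the three-word key.
        data = packed.get(struct.pack('>3I', *key))
        if data is None:
            return 0
        values = struct.unpack('>%dI' % (len(data) // 4), data)
        for w, freq in zip(values[::2], values[1::2]):
            if w == word:
                return freq
        return 0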

    So, do I really have to do this in two separate processes? Would it
    help if I implemented the data storage part as a C extension module?
    Or am I worrying too much about a problem that is better left to the
    OS's memory management (i.e. swapping)?

    Of course I'd also appreciate ideas for more efficient data layouts
    ;-)

    Thomas

    ---8<---
    #!/usr/bin/python

    import random
    import os

    def fill(map):
        # Build a dict keyed by 3-tuples of ints, each mapping to an inner
        # dict of int -> frequency, roughly the shape of the bot's tables.
        random.seed(0)
        for i in xrange(300000):
            k1 = (random.randrange(100),
                  random.randrange(100),
                  random.randrange(100))
            k2 = random.randrange(100)
            if k1 in map:
                d = map[k1]
            else:
                d = dict()
            d[k2] = d.get(k2, 0) + 1
            map[k1] = d

    if __name__ == "__main__":
        os.system('ps v')   # memory usage before
        d = dict()
        fill(d)
        os.system('ps v')   # ... after building the dict
        del d
        os.system('ps v')   # ... after deleting it
    --->8---

    --
    If you want to reply by mail, substitute my first and last name for
    'foo' and 'bar', respectively, and remove '.invalid'.
    Thomas Rast, Oct 2, 2004
    #1

  2. Paul Rubin

    Paul Rubin Guest

    Thomas Rast <> writes:
    > So, do I really have to do this in two separate processes? Would it
    > help if I implemented the data storage part as a C extension module?
    > Or am I worrying too much about a problem that is better left to the
    > OS's memory management (i.e. swapping)?
    >
    > Of course I'd also appreciate ideas for more efficient data layouts


    When faced with such problems I always find it worthwhile to remember
    that the phone company managed to issue itemized long-distance bills
    to each of its customers, even in the 1960's using mainframe computers
    with a few hundred kb of memory and a room full of magtape drives like
    you see in old movies. I ask myself "how would a 1960's mainframe
    programmer do this, without megabytes of ram?".

    I'm not sure what your dictionary of dictionaries is supposed to do,
    but from your dissociated press description it sounds like you want to
    look at pairs of adjacent words and build up a
    probability table. In that case I'd do something like:

    1. Sequentially read the log files, and for each pair of adjacent words,
    write that pair to a temp file tmp1.
    2. Sort tmp1 with os.system("sort").
    3. Make a sequential pass through tmp1 to build a second temp file tmp2,
    which is a compressed tmp1. For example, if tmp1 has the lines
    apple pie
    apple computer
    apple pie
    apple sauce

    then make a single line in tmp2:

    apple: pie:2,computer:1,sauce:1

    Do that in the obvious way by building a temporary dict for the words
    that follow "apple". Also, while doing this, build a Python dict
    giving the offset in tmp2 for the line for each word, i.e.

    d['apple'] = ofile.tell()
    and similarly for all the other words (that gives a Python dict with
    325k entries whose values are each a single int; it shouldn't be too large).
    There are of course lots of tradeoffs you can make to give a smaller
    dict, if you need to. You could also use the array module to implement
    an int-valued hash table without the overhead of a Python object in each slot.

    4. Use os.mmap to map the file tmp2 into memory. Then to get the
    successor words to 'apple', simply use the in-memory dict to get the
    appropriate offset in tmp2, read and parse the line from that offset,
    and emit the appropriate word.

    5. There's lots of other optimization and compression you can do to
    make tmp2 smaller, which can reduce your paging load, but the above
    should be enough to get you started.
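
    Roughly, in code (just a sketch: 'bot.log' stands for whatever log
    files you read, the tmp2 line format is the one from the example
    above, and words are assumed not to contain ':' or ','):

    import os
    import mmap

    def write_pairs(logfile, tmp1):
        # Step 1: one "word successor" pair per line.
        out = open(tmp1, 'w')
        prev = None
        for line in open(logfile):
            for word in line.split():
                if prev is not None:
                    out.write('%s %s\n' % (prev, word))
                prev = word
        out.close()

    def compress(tmp1, tmp2):
        # Step 3: collapse the sorted pairs into one line per word and
        # remember each line's offset in tmp2.
        offsets = {}
        ofile = open(tmp2, 'w')
        current, counts = None, {}
        for line in open(tmp1):
            first, succ = line.split()
            if first != current:
                if current is not None:
                    flush(ofile, offsets, current, counts)
                current, counts = first, {}
            counts[succ] = counts.get(succ, 0) + 1
        if current is not None:
            flush(ofile, offsets, current, counts)
        ofile.close()
        return offsets

    def flush(ofile, offsets, word, counts):
        # Write e.g. "apple: pie:2,computer:1,sauce:1" and record its offset.
        offsets[word] = ofile.tell()
        fields = ['%s:%d' % item for item in counts.items()]
        ofile.write('%s: %s\n' % (word, ','.join(fields)))

    def successors(offsets, mapped, word):
        # Step 4: seek to the word's line in the mmap'ed tmp2 and parse it.
        if word not in offsets:
            return {}
        mapped.seek(offsets[word])
        line = mapped.readline().rstrip('\n')
        result = {}
        for field in line.split(': ', 1)[1].split(','):
            succ, count = field.split(':')
            result[succ] = int(count)
        return result

    write_pairs('bot.log', 'tmp1')              # step 1
    os.system('LC_ALL=C sort tmp1 -o tmp1')     # step 2: byte-order sort keeps
                                                # lines sharing a first word together
    offsets = compress('tmp1', 'tmp2')          # step 3
    f = open('tmp2', 'r+')
    mapped = mmap.mmap(f.fileno(), 0)           # step 4: map the whole file
    print successors(offsets, mapped, 'apple')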
    Paul Rubin, Oct 2, 2004
    #2

  3. Paul Rubin

    Paul Rubin Guest

    Paul Rubin <http://> writes:
    > apple pie
    > apple computer
    > apple pie
    > apple sauce


    That of course should have said

    > apple computer
    > apple pie
    > apple pie
    > apple sauce


    since the file is sorted.
    Paul Rubin, Oct 2, 2004
    #3
  4. Thomas Rast

    Thomas Rast Guest

    Paul Rubin <http://> writes:

    > I'm not sure what your dictionary of dictionaries is supposed to do,
    > but from your dissociated press description it sounds like you want to
    > look at pairs of adjacent words and build up a
    > probability table. In that case I'd do something like:
    >

    [... use the 'sort' utility on a temporary file ...]

    Thanks a lot for your input! I'm still not sure what exactly I'm
    going to do, since I'd also like to update the tables as new lines
    come in, but I may just have to regenerate them periodically. In any
    case, your post certainly gives me some ideas.

    - Thomas

    --
    If you want to reply by mail, substitute my first and last name for
    'foo' and 'bar', respectively, and remove '.invalid'.
    Thomas Rast, Oct 3, 2004
    #4
  5. Nick Craig-Wood

    Thomas Rast <> wrote:
    > My scenario is somewhat involved, so if you're in a hurry, here's an
    > abstract: I have a daemon that needs about 80MB of RAM to build its
    > internal data structures, but can pack them to 20MB for the rest of
    > its lifetime. The other 60MB are never returned to the operating
    > system.


    First a recap on how memory allocation works! This is general - I
    don't know exactly how Python's memory allocator works, but there
    aren't too many ways of doing it under linux.

    The way malloc() and friends work under linux is to claim large blocks
    of memory from the OS. This is done either with brk()/sbrk() or with
    mmap("/dev/zero") (depending on exactly which version of glibc you
    are using, etc).

    For brk()/sbrk(), malloc keeps a single area which grows and grows.
    This area *can* be shrunk, but only if nothing that is still allocated
    is in the way.

    mmap("/dev/zero") behaves in pretty much the same way (ie it shrinks
    and grows), except for large allocation (>4k I think) the allocation
    is done in its own mmap() area. This means that when it is freed it
    completely disappears.

    So for either of these methods, freeing memory back to the OS relies
    on nothing having been allocated that stops the memory shrinking back
    down all the way. In a dynamic object language like python that's
    pretty difficult to guarantee - perhaps impossible!

    The one ray of hope is the mmap("/dev/zero") path for blocks bigger
    than 4k. I'm pretty sure this is how modern glibc versions work by default.

    What you really want to do is get the allocation of the initial dict
    and all of its data into its own mmap() block. No idea whether this is
    possible in python though! It's the kind of thing you could do in C++
    if you could be bothered to wade through all the tedious syntax ;-)

    When I've had this sort of problem in the past I've usually resorted
    to an on-disk database (dbm of some kind). These are usually fast
    enough, and they have the advantage that you only have to parse the
    log files once. They are also reasonably quick to access, since all
    the hot pages soon end up in the OS page cache.

    Also, as suggested by another poster, /usr/bin/sort is very good at
    handling huge datasets with minimal memory.

    > I've thought of building and packing the structures in a fork()ed
    > process, then piping them over to the main part;


    That's a neat idea. You could also build a dbm in the forked process.
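
    Something along these lines, for example (fill() is the function from
    your test script; pack() is a placeholder for whatever turns the big
    dict into its compact packed form):

    import os
    import cPickle

    def build_packed():
        # The memory-hungry part: build the full dict, then shrink it.
        big = {}
        fill(big)
        return pack(big)

    def build_in_child():
        # Do the 80MB build in a fork()ed child and only ever hold the
        # packed result in the long-running parent.
        rfd, wfd = os.pipe()
        pid = os.fork()
        if pid == 0:                        # child
            os.close(rfd)
            wfile = os.fdopen(wfd, 'wb')
            cPickle.dump(build_packed(), wfile, -1)
            wfile.close()
            os._exit(0)
        os.close(wfd)                       # parent
        rfile = os.fdopen(rfd, 'rb')
        packed = cPickle.load(rfile)        # reads until the child is done
        rfile.close()
        os.waitpid(pid, 0)
        return packed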

    > is there an easier way to get my RAM back?


    I don't think so, but hopefully an expert who knows something about
    how python allocates its objects will speak up now!

    ....

    Just for fun I converted your program to use anydbm. It then took 135
    seconds to run (vs 12 for your original code), but only used 4 MB of
    memory, tops. The values of the dbm are pickled hashes. The keys could
    be too, but I decided to just use a tab-separated string...

    import random
    import os
    import anydbm
    import cPickle

    def fill(map):
        random.seed(0)
        for i in xrange(300000):
            # dbm keys have to be strings, so join the three ints with
            # tabs; values are pickled {int: count} dicts.
            k1 = "%s\t%s\t%s" % (random.randrange(100),
                                 random.randrange(100),
                                 random.randrange(100))
            k2 = random.randrange(100)
            if k1 in map:
                d = cPickle.loads(map[k1])
            else:
                d = dict()
            d[k2] = d.get(k2, 0) + 1
            map[k1] = cPickle.dumps(d, -1)

    if __name__ == "__main__":
        show_me = 'ps v %d' % os.getpid()
        os.system(show_me)
        d = anydbm.open("ztest", 'n')
        fill(d)
        os.system(show_me)
        del d
        os.system(show_me)

    --
    Nick Craig-Wood <> -- http://www.craig-wood.com/nick
    Nick Craig-Wood, Oct 4, 2004
    #5
  6. Paul Rubin

    Paul Rubin Guest

    Nick Craig-Wood <> writes:
    > Just for fun I converted your program to use anydbm. It then took 135
    > seconds to run (vs 12 for your original code), but only used 4 MB of
    > memory, tops. The values of the dbm are pickled hashes. The keys could
    > be too, but I decided to just use a tab-separated string...


    But I think it used much more than 4 MB of memory if you count the
    amount of system cache that held that dbm file. Without the caching
    there'd have had to be real disk seeks for all those dbm updates. So
    I think if you had a lot more data, enough that you couldn't keep the
    dbm in cache any more, you'd get a horrible slowdown. That's why you
    may be better off with a sorting-based method, that only needs
    sequential disk operations and not a lot of random seeks.
    Paul Rubin, Oct 4, 2004
    #6
  7. Nick Craig-Wood

    Paul Rubin <> wrote:
    > Nick Craig-Wood <> writes:
    > > Just for fun I converted your program to use anydbm. It then took 135
    > > seconds to run (vs 12 for your original code), but only used 4 MB of
    > > memory, tops. The values of the dbm are pickled hashes. The keys could
    > > be too, but I decided to just use a tab-separated string...

    >
    > But I think it used much more than 4 MB of memory if you count the
    > amount of system cache that held that dbm file.


    The dbm file is only 10 MB though, so that's 10MB of memory extra used
    which *is* reclaimable!

    > Without the caching there'd have had to be real disk seeks for all
    > those dbm updates. So I think if you had a lot more data, enough
    > that you couldn't keep the dbm in cache any more, you'd get a
    > horrible slowdown.


    You've only got to keep the index of the dbm in cache and it's designed
    for that purpose.

    You do get lots of disk I/O when making the dbm though - I suspect it
    does a lot of fsync() - you may be able to turn that off.

    > That's why you may be better off with a sorting-based method, that
    > only needs sequential disk operations and not a lot of random
    > seeks.


    A sorting based method will be better for huge datasets that are only
    compiled once. I would have thought a dbm would win for lots of
    updates and medium sized datasets.

    --
    Nick Craig-Wood <> -- http://www.craig-wood.com/nick
    Nick Craig-Wood, Oct 4, 2004
    #7
  8. Thomas Rast

    Thomas Rast Guest

    Nick Craig-Wood <> writes:

    > mmap("/dev/zero") behaves in pretty much the same way (ie it shrinks
    > and grows), except for large allocation (>4k I think) the allocation
    > is done in its own mmap() area. This means that when it is freed it
    > completely disappears.


    Ok. That explains why e.g.

    >>> l = [0] * 10000000 # 10 million
    >>> del l


    frees the memory used for l, as expected.

    > That's a neat idea. You could also build a dbm in the forked
    > process.


    I've implemented it that way now. Using a pipe to transfer the data
    to the dbm appears to essentially double the I/O overhead and is very
    inefficient.

    > > is there an easier way to get my RAM back?

    >
    > I don't think so, but hopefully an expert who knows something about
    > how python allocates its objects will speak up now!


    Well, can't blame Python if you'd need black magic even in C ;-)

    Thanks for the explanations!

    Thomas

    --
    If you want to reply by mail, substitute my first and last name for
    'foo' and 'bar', respectively, and remove '.invalid'.
    Thomas Rast, Oct 4, 2004
    #8
  9. Thomas Rast

    Thomas Rast Guest

    Nick Craig-Wood <> writes:

    > Paul Rubin <> wrote:
    > > But I think it used much more than 4 MB of memory if you count the
    > > amount of system cache that held that dbm file.

    >
    > The dbm file is only 10 MB though, so that's 10MB of memory extra used
    > which *is* reclaimable!


    Just to clear this up, the problem isn't that the process takes too
    much memory while initializing. The machine that runs it has 256MB
    RAM, and for all I care it can fill all available memory (when the
    logfiles have grown to about four times what they are today, it
    will...). The problem was that it wouldn't free *unused* RAM after
    initialization.

    Anyway, it works now. Thanks for your help Nick and Paul!

    - Thomas

    --
    If you want to reply by mail, substitute my first and last name for
    'foo' and 'bar', respectively, and remove '.invalid'.
    Thomas Rast, Oct 4, 2004
    #9
  10. Andrew Dalke

    Andrew Dalke Guest

    Thomas Rast wrote:
    > I've implemented it that way now. Using a pipe to transfer the data
    > to the dbm appears to essentially double the I/O overhead and is very
    > inefficient.

    ...
    > Well, can't blame Python if you'd need black magic even in C ;-)


    You could also try using shared memory. There's the raw
    layer, like http://0pointer.de/lennart/projects/libshbuf/
    or a higher layer like POSH ( http://poshmodule.sourceforge.net/ ).

    I've used neither. The latter only works on Intel (and
    perhaps only Linux) because it does locking with some inline
    asm.
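
    If a full shared-memory library is overkill, the standard mmap module
    can already give a fork()ed child memory that it shares with its
    parent. A rough sketch (SIZE is an assumed upper bound on the packed
    data, and build_packed() is a placeholder for the build-and-pack step):

    import mmap
    import os
    import struct

    SIZE = 32 * 1024 * 1024       # assumed upper bound on the packed data

    def build_in_shared_memory(build_packed):
        # Map /dev/zero MAP_SHARED so the region survives fork() and is
        # shared with the child; the child writes the packed data, with a
        # 4-byte length header, straight into it.
        zero = open('/dev/zero', 'r+b')
        shm = mmap.mmap(zero.fileno(), SIZE, mmap.MAP_SHARED)
        pid = os.fork()
        if pid == 0:                              # child: build and store
            data = build_packed()                 # the expensive 80MB step
            shm[0:4] = struct.pack('=I', len(data))
            shm[4:4 + len(data)] = data
            os._exit(0)
        os.waitpid(pid, 0)                        # parent: wait, then read
        (length,) = struct.unpack('=I', shm[0:4])
        # The parent could also parse records straight out of shm instead
        # of copying the whole packed blob out like this.
        return shm[4:4 + length]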

    Andrew
    Andrew Dalke, Oct 5, 2004
    #10
  11. James Kew

    James Kew Guest

    "Thomas Rast" <> wrote in message
    news:...
    > Just to clear this up, the problem isn't that the process takes too
    > much memory while initializing. [...]
    > The problem was that it wouldn't free *unused* RAM after
    > initialization.


    Is this really a problem though, if the process is long-running after
    initialisation? If the pages are unused after initialisation, won't they
    eventually get swapped out to disk making the physical RAM available
    elsewhere?

    Or is this too simple a view?

    James
    James Kew, Oct 5, 2004
    #11
  12. "James Kew" <> wrote in message news:<>...
    > "Thomas Rast" <> wrote in message
    > news:...
    > > Just to clear this up, the problem isn't that the process takes too
    > > much memory while initializing. [...]
    > > The problem was that it wouldn't free *unused* RAM after
    > > initialization.

    >
    > Is this really a problem though, if the process is long-running after
    > initialisation? If the pages are unused after initialisation, won't they
    > eventually get swapped out to disk making the physical RAM available
    > elsewhere?
    >



    Why swap out 60MB of "unused" memory??? That's a waste of CPU cycles
    and disk space.

    What if you wanted to run his program on a machine with 128MB of RAM
    and no swap, or extremely slow swap? If it freed the memory, it would
    run nicely alongside other apps; if it doesn't return the memory to
    the OS, it might crash when the system needs to swap, or the app might
    fail because everything slows down too much while swapping out those
    60MB of unused memory.

    --
    Luis P Caamano
    Luis P Caamano, Oct 22, 2004
    #12