RE: Populating huge data structures from disk

Discussion in 'Python' started by Michael Bacarella, Nov 6, 2007.


  1. > > For various reasons I need to cache about 8GB of data from disk into
    > > core on application startup.
    >
    > Are you sure? On PC hardware, at least, doing this doesn't make any
    > guarantee that accessing it is actually going to be any faster. Is just
    > mmap()ing the file a problem for some reason?
    >
    > I assume you're on a 64 bit machine.


    Very sure. If we hit the disk at all, performance drops unacceptably. The
    application has low locality of reference, so on-demand caching isn't an
    option. We get the behavior we want when we pre-cache; the issue is simply
    that it takes so long to build this cache.

    > > Building this cache takes nearly 2 hours on modern hardware. I am
    > > surprised to discover that the bottleneck here is CPU.
    > >
    > > The reason this is surprising is because I expect something like this to
    > > be very fast:
    > >
    > > #!python
    > > import array
    > >
    > > a = array.array('L')
    > >
    > > f = open('/dev/zero','r')
    > >
    > > while True:
    > >     a.fromstring(f.read(8))

    >
    > This just creates the same array over and over, forever. Is this
    > really the code you meant to write? I don't know why you'd expect an
    > infinite loop to be "fast"...


    Not exactly. fromstring() appends to the array; it's growing the array
    towards infinity. Since infinity never finishes, it's hard to get an idea
    of how slow this looks. Let's do 800MB instead.
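
    For instance, on a 64-bit little-endian box (where 'L' is 8 bytes), each
    fromstring() call tacks one more element onto the array:

    >>> import array
    >>> a = array.array('L')
    >>> a.fromstring('\x01\x00\x00\x00\x00\x00\x00\x00')
    >>> a.fromstring('\x02\x00\x00\x00\x00\x00\x00\x00')
    >>> a
    array('L', [1L, 2L])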

    Here's an example of loading 800MB in C:

    $ time ./eat800

    real 0m44.939s
    user 0m10.620s
    sys 0m34.303s

    $ cat eat800.c
    #include <stdio.h>
    #include <stdlib.h>
    #include <fcntl.h>
    #include <unistd.h>

    int main(void)
    {
        int f = open("/dev/zero", O_RDONLY);
        int vlen = 8;
        long *v = malloc((sizeof (long)) * vlen);
        int i;

        for (i = 0; i < 100000000; i++) {
            if (i >= vlen) {
                vlen *= 2;
                v = (long *)realloc(v, (sizeof (long)) * vlen);
            }
            read(f, v + i, sizeof (long));
        }
        return 0;
    }

    Here's the similar operation in Python:
    $ time python eat800.py

    real 3m8.407s
    user 2m40.189s
    sys 0m27.934s

    $ cat eat800.py
    #!/usr/bin/python

    import array
    a = array.array('L')

    f = open('/dev/zero')
    for i in xrange(100000000):
        a.fromstring(f.read(8))


    They spend about the same amount of time in system, but Python spends
    roughly 15x as much CPU in userland as C does.

    And there's no solace in lists either:

    $ time python eat800.py

    real 4m2.796s
    user 3m57.865s
    sys 0m3.638s

    $ cat eat800.py
    #!/usr/bin/python

    import struct

    d = []
    f = open('/dev/zero')
    for i in xrange(100000000):
        d.append(struct.unpack('L',f.read(8))[0])


    cPickle with protocol 2 has some promise but is more complicated because
    arrays can't be pickled. In a perfect world I could do something like this
    somewhere in the backroom:

    x = lengthy_number_crunching()
    magic.save_mmap("/important-data")

    and in the application do...

    x = magic.mmap("/important-data")
    magic.mlock("/important-data")

    and once the mlock finishes bringing important-data into RAM, at
    the speed of your disk I/O subsystem, all accesses to x will be
    hits against RAM.
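
    Something in that spirit can be approximated with just the stdlib; the
    following is only a rough sketch (the path, the element type, and the
    sequential page-touch standing in for mlock are all assumptions, not a
    real API):

    #!/usr/bin/python
    # Rough stdlib-only approximation of the "magic" idea above; the path,
    # the element type, and the page-touch loop in place of mlock are all
    # assumptions, not a real API.
    import array, mmap, struct

    # backroom: dump the crunched numbers as raw native 8-byte values
    a = array.array('L', xrange(1000))   # stand-in for lengthy_number_crunching()
    out = open('/tmp/important-data', 'wb')
    a.tofile(out)
    out.close()

    # application: map the file, then touch every page once to pull it into RAM
    f = open('/tmp/important-data', 'rb')
    m = mmap.mmap(f.fileno(), 0, prot=mmap.PROT_READ)
    for off in xrange(0, len(m), mmap.PAGESIZE):
        m[off]                           # one read per page ~ sequential prefetch

    # after that, random access to any element is a hit against RAM
    value = struct.unpack_from('L', m, 8 * 500)[0]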


    Any thoughts?
     
    Michael Bacarella, Nov 6, 2007
    #1

  2. Neil Cerutti

    On 2007-11-06, Michael Bacarella <> wrote:
    > And there's no solace in lists either:
    >
    > $ time python eat800.py
    >
    > real 4m2.796s
    > user 3m57.865s
    > sys 0m3.638s
    >
    > $ cat eat800.py
    > #!/usr/bin/python
    >
    > import struct
    >
    > d = []
    > f = open('/dev/zero')
    > for i in xrange(100000000):
    >     d.append(struct.unpack('L',f.read(8))[0])
    >
    >
    > cPickle with protocol 2 has some promise but is more complicated because
    > arrays can't be pickled. In a perfect world I could do something like this
    > somewhere in the backroom:
    >
    > x = lengthy_number_crunching()
    > magic.save_mmap("/important-data")
    >
    > and in the application do...
    >
    > x = magic.mmap("/important-data")
    > magic.mlock("/important-data")
    >
    > and once the mlock finishes bringing important-data into RAM, at
    > the speed of your disk I/O subsystem, all accesses to x will be
    > hits against RAM.
    >
    >
    > Any thoughts?


    Disable the garbage collector, use a while loop and manual index
    instead of an iterator, preallocate your list, e.g.,
    [None]*100000000, and hope they don't have blasters!
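
    Applied to the list version above, those tweaks might look roughly like
    this (same struct.unpack-per-element scheme, just with less interpreter
    overhead; an untested sketch):

    #!/usr/bin/python
    # Untested sketch of the three tweaks: GC off, preallocated list,
    # manual index in a while loop.
    import gc, struct

    gc.disable()                   # keep the cyclic GC from rescanning the list
    n = 100000000
    d = [None] * n                 # preallocate instead of growing with append()
    f = open('/dev/zero')
    i = 0
    while i < n:                   # manual index instead of an iterator
        d[i] = struct.unpack('L', f.read(8))[0]
        i += 1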

    --
    Neil Cerutti
     
    Neil Cerutti, Nov 6, 2007
    #2

  3. "Michael Bacarella" <> writes:

    > cPickle with protocol 2 has some promise but is more complicated because
    > arrays can't be pickled.


    This is not true:

    >>> import array
    >>> a = array.array('L')
    >>> a.extend(xrange(10))
    >>> a

    array('L', [0L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L])
    >>> import cPickle as pickle
    >>> s = pickle.dumps(a, -1)
    >>> pickle.loads(s)

    array('L', [0L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L])

    But I don't think unpickling will be any faster than array.fromstring.

    Anyway, why not use array.fromfile instead? It's actually *faster*
    than the C code you posted:

    $ time ./eat80 # not eat800 because I didn't feel like waiting
    ./eat80 0.58s user 2.43s system 93% cpu 3.226 total

    $ cat eat80.py
    #!/usr/bin/python
    import array
    a = array.array('L')
    f = open('/dev/zero')
    a.fromfile(f, 10000000)
    print len(a)

    $ time ./eat80.py
    10000000
    ./eat80.py 0.02s user 0.00s system 48% cpu 0.058 total
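
    For a real cache file whose length isn't known up front, the element
    count can presumably be derived from the file size, along these lines
    (reusing the /important-data name from the example above):

    #!/usr/bin/python
    # Sketch: size the fromfile() call from the file itself.
    import array, os

    path = '/important-data'
    a = array.array('L')
    f = open(path, 'rb')
    a.fromfile(f, os.path.getsize(path) // a.itemsize)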
     
    Hrvoje Niksic, Nov 6, 2007
    #3
  4. Paul Rubin

    "Michael Bacarella" <> writes:
    > Very sure. If we hit the disk at all performance drops
    > unacceptably. The application has low locality of reference so
    > on-demand caching isn't an option. We get the behavior we want when
    > we pre-cache; the issue is simply that it takes so long to build
    > this cache.


    The way I do it is run a separate process that mmaps the file and
    reads one byte from each page every half hour or so. You are right
    that it makes a huge difference.
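
    A minimal sketch of such a keep-warm process, assuming the cache is an
    ordinary file on disk (the path and the interval are made up):

    #!/usr/bin/python
    # Touch one byte per page of the mmap'ed cache file so the kernel keeps
    # it resident.  Path and interval are assumptions.
    import mmap, time

    f = open('/important-data', 'rb')
    m = mmap.mmap(f.fileno(), 0, prot=mmap.PROT_READ)
    while True:
        for off in xrange(0, len(m), mmap.PAGESIZE):
            m[off]                 # reading one byte faults the page back in
        time.sleep(30 * 60)        # every half hour or so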
     
    Paul Rubin, Nov 6, 2007
    #4
  5. > > Very sure. If we hit the disk at all performance drops
    > > unacceptably. The application has low locality of reference so
    > > on-demand caching isn't an option. We get the behavior we want when
    > > we pre-cache; the issue is simply that it takes so long to build
    > > this cache.

    >
    > The way I do it is run a separate process that mmaps the file and
    > reads one byte from each page every half hour or so. You are right
    > that it makes a huge difference.


    Why not just disable swap?
     
    Michael Bacarella, Nov 6, 2007
    #5
  6. Paul Rubin

    "Michael Bacarella" <> writes:
    > > The way I do it is run a separate process that mmaps the file and
    > > reads one byte from each page every half hour or so. You are right
    > > that it makes a huge difference.

    >
    > Why not just disable swap?


    The system is demand paged. If swap is disabled, pages can still get
    released from memory since they can be paged back in later. Remember
    we are talking about an actual disk file (persistent data) that
    happens to be mmap'ed into a running program. So paging would be
    to the file system and not the swap area.
     
    Paul Rubin, Nov 7, 2007
    #6