Cache a large list to disk

Discussion in 'Python' started by Chris, May 17, 2004.

  1. Chris

    Chris Guest

    I have a set of routines, the first of which reads lots and lots of
    data from disparate regions of disk. This read routine takes 40
    minutes on a P3-866 (with IDE drives). This routine populates an
    array with a number of dictionaries, e.g.,

    [{'el2': 0, 'el3': 0, 'el1': 0, 'el4': 0, 'el5': 0},
    {'el2': 15, 'el3': 21, 'el1': 9, 'el4': 33, 'el5': 51},
    {'el2': 35, 'el3': 49, 'el1': 21, 'el4': 77, 'el5': 119},
    {'el2': 45, 'el3': 63, 'el1': 27, 'el4': 99, 'el5': 153}]
    (not actually the data I'm reading)

    This information is acted upon by subsequent routines. These routines
    change very often, but the data changes very infrequently (the
    opposite pattern of what I'm used to). This data changes once per
    week, so I can safely cache this data to a big file on disk, and read
    out of this big file -- rather than having to read about 10,000 files
    -- when the program is loaded.

    Now, if this were C I'd know how to do this in a pretty
    straightforward manner. But being new to Python, I don't know how I
    can (hopefully easily) write this data to a file, and then read it out
    into memory on subsequent launches.

    If anyone can provide some pointers, or even some sample code on how
    to accomplish this, it would be greatly appreciated.

    Thanks in advance for any help.
    -cjl
     
    Chris, May 17, 2004
    #1

  2. Karl Chen

    Karl Chen Guest

    Use the pickle or shelve modules.

    http://www.python.org/doc/current/lib/module-pickle.html

    http://www.python.org/doc/current/lib/module-shelve.html
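
    A minimal sketch of the shelve route (Python 2 era; the file name
    "cache.db" and the key "records" are just examples):

    import shelve

    data = [{'el1': 0, 'el2': 0}]        # whatever the slow read routine produced

    # write the cache (once a week, after re-reading the 10,000 files)
    cache = shelve.open("cache.db")
    cache["records"] = data              # any picklable object can be stored
    cache.close()

    # on later launches, load from the cache instead
    cache = shelve.open("cache.db")
    data = cache["records"]
    cache.close()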
    Karl Chen, May 17, 2004
    #2

  3. Peter Otten

    Peter Otten Guest

    Chris wrote:

    > week, so I can safely cache this data to a big file on disk, and read
    > out of this big file -- rather than having to read about 10,000 files
    > -- when the program is loaded.
    >
    > Now, if this were C I'd know how to do this in a pretty
    > straightforward manner. But being new to Python, I don't know how I
    > can (hopefully easily) write this data to a file, and then read it out
    > into memory on subsequent launches.


    Have a look at pickle:

    >>> data = [{'el2': 0, 'el3': 0, 'el1': 0, 'el4': 0, 'el5': 0},
    ... {'el2': 15, 'el3': 21, 'el1': 9, 'el4': 33, 'el5': 51},
    ... {'el2': 35, 'el3': 49, 'el1': 21, 'el4': 77, 'el5': 119},
    ... {'el2': 45, 'el3': 63, 'el1': 27, 'el4': 99, 'el5': 153}]
    >>>
    >>> import cPickle as pickle # cPickle is pickle implemented in C
    >>> pickle.dump(data, file("tmp.pickle", "w"))
    >>> data_reloaded = pickle.load(file("tmp.pickle"))
    >>> data_reloaded == data, data_reloaded is data
    (True, False)
    >>> data_reloaded
    [{'el2': 0, 'el3': 0, 'el1': 0, 'el4': 0, 'el5': 0},
     {'el2': 15, 'el3': 21, 'el1': 9, 'el4': 33, 'el5': 51},
     {'el2': 35, 'el3': 49, 'el1': 21, 'el4': 77, 'el5': 119},
     {'el2': 45, 'el3': 63, 'el1': 27, 'el4': 99, 'el5': 153}]

    Peter
     
    Peter Otten, May 17, 2004
    #3
  4. Svein Ove Aas

    Peter Otten wrote:

    > Chris wrote:
    >
    >> week, so I can safely cache this data to a big file on disk, and read
    >> out of this big file -- rather than having to read about 10,000 files
    >> -- when the program is loaded.
    >>
    >> Now, if this were C I'd know how to do this in a pretty
    >> straightforward manner. But being new to Python, I don't know how I
    >> can (hopefully easily) write this data to a file, and then read it out
    >> into memory on subsequent launches.

    >
    > Have a look at pickle:
    >
    >>>> data = [{'el2': 0, 'el3': 0, 'el1': 0, 'el4': 0, 'el5': 0},

    > ... {'el2': 15, 'el3': 21, 'el1': 9, 'el4': 33, 'el5': 51},
    > ... {'el2': 35, 'el3': 49, 'el1': 21, 'el4': 77, 'el5': 119},
    > ... {'el2': 45, 'el3': 63, 'el1': 27, 'el4': 99, 'el5': 153}]
    >>>>
    >>>> import cPickle as pickle # cPickle is pickle implemented in C


    And, yes, cPickle is faster. A lot faster.

    You can also pass a protocol argument to dump() to have it use a compact
    binary format instead of sticking to readable characters, for some
    savings, too.
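
    For instance, a sketch of the binary variant (the data and file name are
    just stand-ins, reusing tmp.pickle from Peter's example):

    import cPickle as pickle

    data = [{'el1': 0, 'el2': 0}]            # stand-in for the real data

    # protocol 0 (the default) is readable ASCII; protocols 1 and 2 are
    # binary and noticeably smaller -- note the file is opened in binary mode
    out = open("tmp.pickle", "wb")
    pickle.dump(data, out, pickle.HIGHEST_PROTOCOL)
    out.close()

    data_reloaded = pickle.load(open("tmp.pickle", "rb"))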
     
    Svein Ove Aas, May 17, 2004
    #4
  5. Paul Rubin

    Paul Rubin Guest

    (Chris) writes:
    > I have a set of routines, the first of which reads lots and lots of
    > data from disparate regions of disk. This read routine takes 40
    > minutes on a P3-866 (with IDE drives). This routine populates an
    > array with a number of dictionaries, e.g.,
    >
    > [{'el2': 0, 'el3': 0, 'el1': 0, 'el4': 0, 'el5': 0},
    > {'el2': 15, 'el3': 21, 'el1': 9, 'el4': 33, 'el5': 51},
    > {'el2': 35, 'el3': 49, 'el1': 21, 'el4': 77, 'el5': 119},
    > {'el2': 45, 'el3': 63, 'el1': 27, 'el4': 99, 'el5': 153}]
    > (not actually the data I'm reading)


    The dict entries are the same for each list item?

    > Now, if this were C I'd know how to do this in a pretty
    > straightforward manner. But being new to Python, I don't know how I
    > can (hopefully easily) write this data to a file, and then read it out
    > into memory on subsequent launches.
    >
    > If anyone can provide some pointers, or even some sample code on how
    > to accomplish this, it would be greatly appreciated.


    I dunno what the question is. You can open files, seek on them, etc.
    in Python just like in C. You can use the mmap module to map a file
    into memory. If you want to lose some efficiency, you can write out
    the Python objects (dicts, lists, etc.) with the pickle or cPickle modules.
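
    A minimal mmap sketch (the file name and the 16-byte slice are only
    illustrative):

    import mmap

    # map an existing file read-only and slice it like a string
    f = open("cache.bin", "rb")
    m = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)  # 0 = map the whole file
    first16 = m[:16]
    m.close()
    f.close()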
     
    Paul Rubin, May 18, 2004
    #5
  6. Chris

    Chris Guest

    Ah, poifect =) This'll do just fine. Thanks, folks.


    Karl Chen <> wrote in message news:<>...
    > Use the pickle or shelve modules.
    >
    > http://www.python.org/doc/current/lib/module-pickle.html
    >
    > http://www.python.org/doc/current/lib/module-shelve.html
    >
    > >>>>> "Chris" == Chris <> writes:

    > Chris> straightforward manner. But being new to Python, I
    > Chris> don't know how I can (hopefully easily) write this data
    > Chris> to a file, and then read it out into memory on
    > Chris> subsequent launches.
     
    Chris, May 18, 2004
    #6
  7. Radovan Garabik

    Chris <> wrote:
    > I have a set of routines, the first of which reads lots and lots of
    > data from disparate regions of disk. This read routine takes 40
    > minutes on a P3-866 (with IDE drives). This routine populates an
    > array with a number of dictionaries, e.g.,
    >
    > [{'el2': 0, 'el3': 0, 'el1': 0, 'el4': 0, 'el5': 0},
    > {'el2': 15, 'el3': 21, 'el1': 9, 'el4': 33, 'el5': 51},
    > {'el2': 35, 'el3': 49, 'el1': 21, 'el4': 77, 'el5': 119},
    > {'el2': 45, 'el3': 63, 'el1': 27, 'el4': 99, 'el5': 153}]
    > (not actually the data I'm reading)
    >
    > This information is acted upon by subsequent routines. These routines
    > change very often, but the data changes very infrequently (the
    > opposite pattern of what I'm used to). This data changes once per
    > week, so I can safely cache this data to a big file on disk, and read
    > out of this big file -- rather than having to read about 10,000 files
    > -- when the program is loaded.
    >
    > Now, if this were C I'd know how to do this in a pretty
    > straightforward manner. But being new to Python, I don't know how I
    > can (hopefully easily) write this data to a file, and then read it out
    > into memory on subsequent launches.
    >
    > If anyone can provide some pointers, or even some sample code on how
    > to accomplish this, it would be greatly appreciated.


    As already mentioned, use cPickle or shelve.
    However, depending on how big and how many your dictionaries are,
    you can use *dbm databases instead of dictionaries, with the numbers
    packed using the struct module (I found that this is sometimes much
    more efficient than using shelve).
    Looking at your sample, you could even reorganize the data as:
    {'el2': [0, 15, 35, 45],
    'el3': [0, 21, 49, 63],
    ...
    }
    and use one big dbm database, with the lists represented as array objects -
    that is going to give you a major memory efficiency boost.

    If the arrays are going to be big (like really BIG, some tens of
    megabytes), you can store them one per file and use mmap to access
    them - I am doing something similar now.
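
    A rough sketch of the *dbm + struct idea (Python 2 era; anydbm picks
    whatever dbm flavour is installed, and the file name is just an example):

    import anydbm
    import struct
    from array import array

    columns = {'el1': [0, 9, 21, 27],        # data reorganized column-wise,
               'el2': [0, 15, 35, 45]}       # as suggested above

    # write: one dbm entry per column, values packed as native ints
    db = anydbm.open("cache.dbm", "c")
    for name, values in columns.items():
        db[name] = struct.pack("%di" % len(values), *values)
    db.close()

    # read back a single column without unpickling everything
    db = anydbm.open("cache.dbm", "r")
    el2 = array('i')
    el2.fromstring(db["el2"])                # -> array('i', [0, 15, 35, 45])
    db.close()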


    --
    -----------------------------------------------------------
    | Radovan Garabík http://melkor.dnp.fmph.uniba.sk/~garabik/ |
    | __..--^^^--..__ garabik @ kassiopeia.juls.savba.sk |
    -----------------------------------------------------------
    Antivirus alert: file .signature infected by signature virus.
    Hi! I'm a signature virus! Copy me into your signature file to help me spread!
     
    Radovan Garabik, May 18, 2004
    #7
