Replacement for the shelve module?

Discussion in 'Python' started by Forafo San, Aug 19, 2011.

  1. Forafo San

    Forafo San Guest

    Folks,
    What might be a good replacement for the shelve module, one that
    can handle a few gigs of data? I'm doing some calculations on daily
    stock prices and the result is a nested list like:

    [[date_1, floating result 1],
    [date_2, floating result 2],
    ....
    [date_n, floating result n]]

    However, there are about 5,000 lists like that, one for each stock
    symbol. Using the shelve module I could easily save them to a file
    ( myshelvefile['symbol_1'] = symbol_1_list) and likewise retrieve the
    data. But shelve is deprecated AND when a lot of data is written
    shelve was acting weird (refusing to write, filesizes reported with an
    "ls" did not make sense, etc.).
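
For reference, the shelve usage described above might look roughly like the sketch below. The file name `prices_shelf`, the helper names, and the sample values are made up for illustration. The `with` block closes the shelf after each operation; whether an unclosed shelf explains the odd behaviour reported here is only a guess.

```python
import os
import shelve
import tempfile


def save_results(path, results):
    """Store one list of [date, value] pairs per stock symbol."""
    with shelve.open(path) as shelf:  # closed (and flushed) on exit
        for symbol, rows in results.items():
            shelf[symbol] = rows


def load_results(path, symbol):
    """Read back the list stored for a single symbol."""
    with shelve.open(path) as shelf:
        return shelf[symbol]


# Hypothetical sample data written to a throwaway directory.
demo_path = os.path.join(tempfile.mkdtemp(), "prices_shelf")
save_results(demo_path, {"symbol_1": [["2011-08-18", 1.25], ["2011-08-19", 1.5]]})
loaded = load_results(demo_path, "symbol_1")
```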

    Thanks in advance for your suggestions.
     
    Forafo San, Aug 19, 2011
    #1

  2. Ken Watford

    Ken Watford Guest

    On Fri, Aug 19, 2011 at 11:31 AM, Forafo San <> wrote:
    > Folks,
    > What might be a good replacement for the shelve module, but one that
    > can handle a few gigs of data. I'm doing some calculations on daily
    > stock prices and the result is a nested list like:


    For what you're doing, I would give PyTables a try.
     
    Ken Watford, Aug 19, 2011
    #2

  3. On 19/08/11 17:31, Forafo San wrote:
    > Folks,
    > What might be a good replacement for the shelve module, but one that
    > can handle a few gigs of data. I'm doing some calculations on daily
    > stock prices and the result is a nested list like:
    >
    > [[date_1, floating result 1],
    > [date_2, floating result 2],
    > ...
    > [date_n, floating result n]]
    >
    > However, there are about 5,000 lists like that, one for each stock
    > symbol. Using the shelve module I could easily save them to a file
    > ( myshelvefile['symbol_1'] = symbol_1_list) and likewise retrieve the
    > data. But shelve is deprecated AND when a lot of data is written
    > shelve was acting weird (refusing to write, filesizes reported with an
    > "ls" did not make sense, etc.).
    >
    > Thanks in advance for your suggestions.


    Firstly, since when is shelve deprecated? Shouldn't there be a
    deprecation warning on http://docs.python.org/dev/library/shelve.html ?

    If you want to keep your current approach of having an object containing
    all the data for each symbol, you will have to think about how to
    serialise the data, as well as how to store the documents/objects
    individually. For the serialisation, you can use pickle (as shelve does)
    or JSON (probably better because it's easier to edit directly, and
    therefore easier to debug).
    To store these documents, you could use a huge pickle'd Python
    dictionary (bad idea), a UNIX database (dbm module, anydbm in Python2;
    this is what shelve uses), or simply the file system: one file per
    serialised object.
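
A minimal sketch of the last option (JSON serialisation, one file per symbol) could look like this. The directory layout, the `.json` suffix, and the helper names are assumptions for illustration, and it presumes every symbol is usable as a file name.

```python
import json
import tempfile
from pathlib import Path


def save_symbol(directory, symbol, rows):
    """Write one symbol's [date, value] rows to <directory>/<symbol>.json."""
    path = Path(directory) / f"{symbol}.json"
    path.write_text(json.dumps(rows))


def load_symbol(directory, symbol):
    """Read one symbol's rows back from its JSON file."""
    path = Path(directory) / f"{symbol}.json"
    return json.loads(path.read_text())


# Hypothetical sample data in a throwaway directory.
data_dir = tempfile.mkdtemp()
save_symbol(data_dir, "symbol_1", [["2011-08-18", 1.25], ["2011-08-19", 1.5]])
rows = load_symbol(data_dir, "symbol_1")
```

The files stay human-readable, which is the debugging advantage mentioned above; dates have to be stored as strings because JSON has no date type.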

    Looking at your use case, however, I think what you really should use is
    a SQL database. SQLite is part of Python and will do the job nicely.
    Just use a single table with three columns: symbol, date, value.
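
The single-table layout can be sketched with the standard-library `sqlite3` module as follows. The table name `results`, the helper names, the in-memory database, and the ISO-format date strings are illustrative assumptions, not something specified in the thread.

```python
import sqlite3

# An in-memory database keeps the sketch self-contained; a real run would
# pass a file name instead.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE results ("
    " symbol TEXT NOT NULL,"
    " date TEXT NOT NULL,"
    " value REAL NOT NULL,"
    " PRIMARY KEY (symbol, date))"
)


def save_series(conn, symbol, rows):
    """Insert (or overwrite) all [date, value] rows for one symbol."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR REPLACE INTO results (symbol, date, value) VALUES (?, ?, ?)",
            [(symbol, date, value) for date, value in rows],
        )


def load_series(conn, symbol):
    """Return one symbol's rows as [date, value] lists, ordered by date."""
    cursor = conn.execute(
        "SELECT date, value FROM results WHERE symbol = ? ORDER BY date",
        (symbol,),
    )
    return [list(row) for row in cursor]


save_series(conn, "symbol_1", [["2011-08-19", 1.5], ["2011-08-18", 1.25]])
series = load_series(conn, "symbol_1")
```

Storing dates as `YYYY-MM-DD` text makes `ORDER BY date` sort chronologically, and the composite primary key gives an index for per-symbol lookups.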

    Thomas
     
    Thomas Jollans, Aug 19, 2011
    #3
  4. Forafo San

    Forafo San Guest

    On Aug 19, 11:54 am, Thomas Jollans <> wrote:
    > On 19/08/11 17:31, Forafo San wrote:
    >
    > > Folks,
    > > What might be a good replacement for the shelve module, but one that
    > > can handle a few gigs of data. I'm doing some calculations on daily
    > > stock prices and the result is a nested list like:

    >
    > > [[date_1, floating result 1],
    > >  [date_2, floating result 2],
    > > ...
    > >  [date_n, floating result n]]

    >
    > > However, there are about 5,000 lists like that, one for each stock
    > > symbol. Using the shelve module I could easily save them to a file
    > > ( myshelvefile['symbol_1'] = symbol_1_list) and likewise retrieve the
    > > data. But shelve is deprecated AND when a lot of data is written
    > > shelve was acting weird (refusing to write, filesizes reported with an
    > > "ls" did not make sense, etc.).

    >
    > > Thanks in advance for your suggestions.

    >
    > Firstly, since when is shelve deprecated? Shouldn't there be a
    > deprecation warning on http://docs.python.org/dev/library/shelve.html ?
    >
    > If you want to keep your current approach of having an object containing
    > all the data for each symbol, you will have to think about how to
    > serialise the data, as well as how to store the documents/objects
    > individually. For the serialisation, you can use pickle (as shelve does)
    > or JSON (probably better because it's easier to edit directly, and
    > therefore easier to debug).
    > To store these documents, you could use a huge pickle'd Python
    > dictionary (bad idea), a UNIX database (dbm module, anydbm in Python2;
    > this is what shelve uses), or simply the file system: one file per
    > serialised object.
    >
    > Looking at your use case, however, I think what you really should use is
    > a SQL database. SQLite is part of Python and will do the job nicely.
    > Just use a single table with three columns: symbol, date, value.
    >
    > Thomas


    Sorry. There is no indication that shelve is deprecated. I was using
    it on a FreeBSD system, and it turns out that the bsddb module is
    deprecated; I confused it with the shelve module.

    Thanks Ken and Thomas for your suggestions -- I will play around with
    both and pick one.
     
    Forafo San, Aug 19, 2011
    #4
  5. Miki Tebeka

    Miki Tebeka Guest

    You might check one of the many binary encoders (like Avro, Thrift ...).
    The other option is to use a database; sqlite3 is pretty fast (if your schema is fixed). Otherwise you can look at some NoSQL ones (like MongoDB).
     
    Miki Tebeka, Aug 19, 2011
    #5
  6. Robert Kern

    Robert Kern Guest

    On 8/19/11 10:49 AM, Ken Watford wrote:
    > On Fri, Aug 19, 2011 at 11:31 AM, Forafo San<> wrote:
    >> Folks,
    >> What might be a good replacement for the shelve module, but one that
    >> can handle a few gigs of data. I'm doing some calculations on daily
    >> stock prices and the result is a nested list like:

    >
    > For what you're doing, I would give PyTables a try.


    For a few gigs of stock price data, this is what I use. Much better than SQLite
    for that amount of data.

    --
    Robert Kern

    "I have come to believe that the whole world is an enigma, a harmless enigma
    that is made terrible by our own mad attempt to interpret it as though it had
    an underlying truth."
    -- Umberto Eco
     
    Robert Kern, Aug 19, 2011
    #6
  7. Forafo San wrote:

    > Folks,
    > What might be a good replacement for the shelve module, but one that
    > can handle a few gigs of data. I'm doing some calculations on daily
    > stock prices and the result is a nested list like:
    >
    > [[date_1, floating result 1],
    > [date_2, floating result 2],
    > ...
    > [date_n, floating result n]]
    >
    > However, there are about 5,000 lists like that, one for each stock
    > symbol.



    You might save some memory by using tuples rather than lists:

    >>> import sys
    >>> sys.getsizeof(["01/01/2000", 123.456])  # On a 32-bit system.
    40
    >>> sys.getsizeof(("01/01/2000", 123.456))
    32
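
The exact byte counts depend on the interpreter build, so a sketch that measures them on the current machine may be more useful than fixed numbers. The variable names are illustrative; note that `sys.getsizeof` reports only the container itself, not the string and float it refers to.

```python
import sys

# The same two-element record stored as a list and as a tuple.
as_list = ["01/01/2000", 123.456]
as_tuple = ("01/01/2000", 123.456)

# Shallow sizes only: the elements themselves are not counted.
list_size = sys.getsizeof(as_list)
tuple_size = sys.getsizeof(as_tuple)

print(f"list: {list_size} bytes, tuple: {tuple_size} bytes")
print(f"saved per record: {list_size - tuple_size} bytes")
```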


    By the way, you know that you should never, ever use floats for currency,
    right?

    http://vladzloteanu.wordpress.com/2...ting-point-issues-explained-for-ruby-and-ror/
    http://stackoverflow.com/questions/3730019/why-not-use-double-or-float-to-represent-currency


    > Using the shelve module I could easily save them to a file
    > ( myshelvefile['symbol_1'] = symbol_1_list) and likewise retrieve the
    > data. But shelve is deprecated


    It certainly is not.

    http://docs.python.org/library/shelve.html
    http://docs.python.org/py3k/library/shelve.html

    Not a word about it being deprecated in either Python 2.x or 3.x.


    > AND when a lot of data is written
    > shelve was acting weird (refusing to write, filesizes reported with an
    > "ls" did not make sense, etc.).


    I would like to see this replicated. If it is true, that's a bug in shelve,
    but I expect you're probably doing something wrong.



    --
    Steven
     
    Steven D'Aprano, Aug 19, 2011
    #7
  8. Robert Kern

    Robert Kern Guest

    On 8/19/11 3:36 PM, Steven D'Aprano wrote:

    > By the way, you know that you should never, ever use floats for currency,
    > right?


    That's just incorrect. You shouldn't use (binary) floats for many *accounting*
    purposes, but for many financial/econometric analyses, floats are de rigueur and
    work much better than decimals (either floating or fixed point). If you are
    collecting gigs of stock prices, you are much more likely to be doing the latter
    than the former.
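
The distinction can be illustrated with a small sketch. The prices are invented sample values and the variable names are illustrative; the first half shows the exactness issue that matters for accounting, the second a typical analysis step where binary floats are the natural choice.

```python
from decimal import Decimal

# Accounting-style arithmetic: binary floats cannot represent 0.1 or 0.2
# exactly, so the sum is not exactly 0.3. Decimal keeps the digits exact.
float_total = 0.1 + 0.2
decimal_total = Decimal("0.10") + Decimal("0.20")

float_is_exact = float_total == 0.3
decimal_is_exact = decimal_total == Decimal("0.30")

# Analysis-style arithmetic: daily returns are ratios, so tiny rounding
# errors are far below the noise in the data.
prices = [100.0, 101.5, 99.75, 102.25]  # hypothetical closing prices
returns = [today / yesterday - 1.0 for yesterday, today in zip(prices, prices[1:])]
mean_return = sum(returns) / len(returns)
```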

    --
    Robert Kern

     
    Robert Kern, Aug 19, 2011
    #8
  9. Robert Kern wrote:

    > On 8/19/11 3:36 PM, Steven D'Aprano wrote:
    >
    >> By the way, you know that you should never, ever use floats for currency,
    >> right?

    >
    > That's just incorrect. You shouldn't use (binary) floats for many
    > *accounting* purposes, but for many financial/econometric analyses, floats
    > are de rigueur and work much better than decimals (either floating or fixed
    > point). If you are collecting gigs of stock prices, you are much more
    > likely to be doing the latter than the former.



    That makes sense, and I stand corrected.



    --
    Steven
     
    Steven D'Aprano, Aug 20, 2011
    #9
  10. > Robert Kern wrote:
    >
    >>That's just incorrect. You shouldn't use (binary) floats for many
    >>*accounting* purposes, but for many financial/econometric analyses, floats
    >>are de rigueur and work much better than decimals


    There's a certain accounting package I work with that *does*
    use floats -- binary ones -- for accounting purposes, and
    somehow manages to get away with it. Not something I would
    recommend trying at home, though.

    --
    Greg
     
    Gregory Ewing, Aug 21, 2011
    #10
  11. On Sun, Aug 21, 2011 at 1:37 AM, Gregory Ewing
    <> wrote:
    > There's a certain accounting package I work with that *does*
    > use floats -- binary ones -- for accounting purposes, and
    > somehow manages to get away with it. Not something I would
    > recommend trying at home, though.
    >


    Probably quite a few, actually. It's not a very visible problem so
    long as you always have plenty of "spare precision", and you round
    everything off to two decimals (or however many for your currency).
    Eventually you'll start seeing weird results that are a cent off, but
    you won't notice them often. And hey. You store $1.23 as 1.23, and it
    just works! It must be the right thing to do!

    Me, I store dollars-and-cents currency in cents. Always. But that's
    because I never need fractional cents. I'm not sure what the best way
    to handle fractional cents is, but I'm fairly confident that this
    isn't it:

    http://thedailywtf.com/Articles/Price-in-Nonsense.aspx
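
A minimal sketch of the store-everything-in-cents approach, compared with summing floats. The helper names `to_cents` and `format_cents` are made up for illustration, and the parser assumes non-negative amounts written with at most two decimal places.

```python
def to_cents(amount_text):
    """Parse a string such as "1.23" into an integer number of cents."""
    dollars, _, cents = amount_text.partition(".")
    # Pad "5" to "50" so that "1.5" means 150 cents; an absent fraction is 0.
    return int(dollars) * 100 + int(cents.ljust(2, "0")[:2])


def format_cents(total_cents):
    """Format an integer number of cents as dollars and cents."""
    return f"${total_cents // 100}.{total_cents % 100:02d}"


# Ten payments of ten cents each.
float_sum = sum([0.10] * 10)              # accumulates binary rounding error
cents_sum = sum([to_cents("0.10")] * 10)  # exact integer arithmetic

print(float_sum == 1.0)         # False
print(format_cents(cents_sum))  # $1.00
```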

    ChrisA
     
    Chris Angelico, Aug 21, 2011
    #11