Replacement for the shelve module?

Forafo San

Folks,
What might be a good replacement for the shelve module, one that
can handle a few gigs of data? I'm doing some calculations on daily
stock prices and the result is a nested list like:

[[date_1, floating result 1],
[date_2, floating result 2],
....
[date_n, floating result n]]

However, there are about 5,000 lists like that, one for each stock
symbol. Using the shelve module I could easily save them to a file
(myshelvefile['symbol_1'] = symbol_1_list) and likewise retrieve the
data. But shelve is deprecated AND when a lot of data is written
shelve was acting weird (refusing to write, filesizes reported with an
"ls" did not make sense, etc.).

Thanks in advance for your suggestions.
 
Ken Watford

For what you're doing, I would give PyTables a try.
 
Thomas Jollans

Firstly, since when is shelve deprecated? Shouldn't there be a
deprecation warning on http://docs.python.org/dev/library/shelve.html ?

If you want to keep your current approach of having an object containing
all the data for each symbol, you will have to think about how to
serialise the data, as well as how to store the documents/objects
individually. For the serialisation, you can use pickle (as shelve does)
or JSON (probably better because it's easier to edit directly, and
therefore easier to debug).
To store these documents, you could use a huge pickled Python
dictionary (bad idea), a UNIX database (dbm module, anydbm in Python 2;
this is what shelve uses), or simply the file system: one file per
serialised object.
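
The "one file per serialised object" option with JSON might look roughly like the sketch below. The helper names `save_symbol` and `load_symbol`, the directory layout, and the sample values are all hypothetical, and dates are assumed to be stored as ISO strings because the json module does not serialise date objects directly.

```python
import json
import os
import tempfile


def save_symbol(directory, symbol, rows):
    """Write one symbol's [[date, value], ...] list to <directory>/<symbol>.json."""
    path = os.path.join(directory, symbol + ".json")
    with open(path, "w") as f:
        json.dump(rows, f)


def load_symbol(directory, symbol):
    """Read one symbol's list back from its JSON file."""
    path = os.path.join(directory, symbol + ".json")
    with open(path) as f:
        return json.load(f)


# Hypothetical data; a temporary directory stands in for the real data folder.
data_dir = tempfile.mkdtemp()
symbol_1_list = [["2000-01-03", 1.5], ["2000-01-04", 2.25]]

save_symbol(data_dir, "symbol_1", symbol_1_list)
print(load_symbol(data_dir, "symbol_1"))
```

Each file stays small and human-readable, at the cost of about 5,000 files in one directory.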

Looking at your use case, however, I think what you really should use is
a SQL database. SQLite is part of Python and will do the job nicely.
Just use a single table with three columns: symbol, date, value.
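
A minimal sketch of that three-column layout with the standard sqlite3 module follows. The table name `results`, the helper functions, the primary key choice, and the sample rows are assumptions made for illustration; an in-memory database is used here so the example is self-contained.

```python
import sqlite3


def save_results(conn, symbol, rows):
    """Store an iterable of (date, value) pairs for one symbol."""
    conn.executemany(
        "INSERT OR REPLACE INTO results (symbol, date, value) VALUES (?, ?, ?)",
        [(symbol, date, value) for date, value in rows],
    )
    conn.commit()


def load_results(conn, symbol):
    """Return [[date, value], ...] for one symbol, ordered by date."""
    cur = conn.execute(
        "SELECT date, value FROM results WHERE symbol = ? ORDER BY date",
        (symbol,),
    )
    return [list(row) for row in cur]


# ":memory:" keeps the sketch self-contained; a file path would persist the data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE IF NOT EXISTS results ("
    " symbol TEXT NOT NULL,"
    " date TEXT NOT NULL,"   # ISO dates sort correctly as text
    " value REAL NOT NULL,"
    " PRIMARY KEY (symbol, date))"
)

save_results(conn, "symbol_1", [("2000-01-03", 1.5), ("2000-01-04", 2.25)])
print(load_results(conn, "symbol_1"))
```

The composite primary key on (symbol, date) also gives SQLite an index for per-symbol lookups.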

Thomas
 
Forafo San

Sorry. There is no indication that shelve is deprecated. I was using
it on a FreeBSD system, and it turns out that it is the bsddb module
that is deprecated; I confused it with the shelve module.

Thanks Ken and Thomas for your suggestions -- I will play around with
both and pick one.
 
Miki Tebeka

You might check one of the many binary encoders (like Avro, Thrift ...).
The other option is to use a database; sqlite3 is pretty fast (if your schema is fixed). Otherwise you can look at some NoSQL ones (like MongoDB).
 
Robert Kern

Ken said:
For what you're doing, I would give PyTables a try.

For a few gigs of stock price data, this is what I use. Much better than SQLite
for that amount of data.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco
 
Steven D'Aprano

Forafo said:
Folks,
What might be a good replacement for the shelve module, but one that
can handle a few gigs of data. I'm doing some calculations on daily
stock prices and the result is a nested list like:

[[date_1, floating result 1],
[date_2, floating result 2],
...
[date_n, floating result n]]

However, there are about 5,000 lists like that, one for each stock
symbol.


You might save some memory by using tuples rather than lists:

>>> import sys
>>> sys.getsizeof(["01/01/2000", 123.456])  # On a 32-bit system.
40
>>> sys.getsizeof(("01/01/2000", 123.456))
32


By the way, you know that you should never, ever use floats for currency,
right?

http://vladzloteanu.wordpress.com/2...ting-point-issues-explained-for-ruby-and-ror/
http://stackoverflow.com/questions/3730019/why-not-use-double-or-float-to-represent-currency

Using the shelve module I could easily save them to a file
(myshelvefile['symbol_1'] = symbol_1_list) and likewise retrieve the
data. But shelve is deprecated

It certainly is not.

http://docs.python.org/library/shelve.html
http://docs.python.org/py3k/library/shelve.html

Not a word about it being deprecated in either Python 2.x or 3.x.

AND when a lot of data is written
shelve was acting weird (refusing to write, filesizes reported with an
"ls" did not make sense, etc.).

I would like to see this replicated. If it is true, that's a bug in shelve,
but I expect you're probably doing something wrong.
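
For anyone trying to replicate it, a minimal sketch of the shelve usage described might look like this. The file name and sample values are hypothetical, and the use of a `with` block (available since Python 3.4) to make sure the shelf is closed and flushed is an assumption about good practice, not a diagnosis of the reported problem.

```python
import os
import shelve
import tempfile

# Hypothetical location and data for the sketch.
path = os.path.join(tempfile.mkdtemp(), "results")
symbol_1_list = [["2000-01-03", 1.5], ["2000-01-04", 2.25]]

# Write: leaving the with-block closes the shelf, which flushes it to disk.
with shelve.open(path) as shelf:
    shelf["symbol_1"] = symbol_1_list

# Read the data back in a separate session.
with shelve.open(path) as shelf:
    restored = shelf["symbol_1"]

print(restored == symbol_1_list)
```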
 
Robert Kern

Steven said:
By the way, you know that you should never, ever use floats for currency,
right?

That's just incorrect. You shouldn't use (binary) floats for many *accounting*
purposes, but for many financial/econometric analyses, floats are de rigueur and
work much better than decimals (either floating or fixed point). If you are
collecting gigs of stock prices, you are much more likely to be doing the latter
than the former.

 
Steven D'Aprano

Robert said:
That's just incorrect. You shouldn't use (binary) floats for many
*accounting* purposes, but for many financial/econometric analyses, floats
are de rigueur and work much better than decimals (either floating or fixed
point). If you are collecting gigs of stock prices, you are much more
likely to be doing the latter than the former.


That makes sense, and I stand corrected.
 
Gregory Ewing

There's a certain accounting package I work with that *does*
use floats -- binary ones -- for accounting purposes, and
somehow manages to get away with it. Not something I would
recommend trying at home, though.
 
Chris Angelico

Gregory said:
There's a certain accounting package I work with that *does*
use floats -- binary ones -- for accounting purposes, and
somehow manages to get away with it. Not something I would
recommend trying at home, though.

Probably quite a few, actually. It's not a very visible problem so
long as you always have plenty of "spare precision", and you round
everything off to two decimals (or however many for your currency).
Eventually you'll start seeing weird results that are a cent off, but
you won't notice them often. And hey. You store $1.23 as 1.23, and it
just works! It must be the right thing to do!

Me, I store dollars-and-cents currency in cents. Always. But that's
because I never need fractional cents. I'm not sure what the best way
to handle fractional cents is, but I'm fairly confident that this
isn't it:

http://thedailywtf.com/Articles/Price-in-Nonsense.aspx
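
The difference between float dollars and integer cents can be seen in a small sketch. The ten 10-cent items are made-up values, and decimal.Decimal is included only as one further exact alternative from the standard library, not as something proposed in this thread.

```python
from decimal import Decimal

# Ten items at 10 cents each, stored as binary floats (dollars).
float_total = sum([0.10] * 10)

# The same amounts stored as integer cents.
cents_total = sum([10] * 10)

# The same amounts as decimal values built from strings.
decimal_total = sum([Decimal("0.10")] * 10)

print(float_total == 1.0)               # False: the float sum is slightly off
print(cents_total == 100)               # True: integer arithmetic is exact
print(decimal_total == Decimal("1.00")) # True: decimal arithmetic is exact here
```

Rounding the float total to two decimals hides the error in this case, which is why the problem tends to go unnoticed.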

ChrisA
 
