Memory limit to dict?


Peter Beattie

I was wondering whether certain data structures in Python, e.g. dict,
might have limits as to the amount of memory they're allowed to take up.
Is there any documentation on that?

Why am I asking? I'm reading 3.6 GB worth of BLAST output files into a
nested dictionary structure (dict within dict ...). Looks something like
this:

{ GenomeID:
    { ProteinID:
        { GenomeID:
            { ProteinID, Score, PercentValue, EValue } } } }
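Concretely, one entry would look something like this (the IDs and numbers
here are invented, just to show the shape -- the innermost braces are
really a record of per-hit fields):

scores = {
    'genome1': {
        'protein7': {
            'genome2': {
                'ProteinID': 'protein9',
                'Score': 123.0,
                'PercentValue': 87.5,
                'EValue': 1e-30,
            },
        },
    },
}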

Now, the thing is: Even on a machine with 16 GB RAM, the program
terminates with a MemoryError, obviously long before the machine's RAM
is used up.

I've no idea how far the Windows task manager's resource monitor can be
trusted -- probably not as far as I could throw a heavy-set St Bernard
--, but it seems to stop roughly when that monitor records a swap file
size of 2.2 GB.

Barring any revamping of the code itself, which I will have to do
anyway, is there anything so far that would indicate a problem inherent
to Python?

(I can post the relevant code too, of course, if that would help.)

TIA!
 

Burton Samograd

> I've no idea how far the Windows task manager's resource monitor can be
> trusted -- probably not as far as I could throw a heavy-set St Bernard
> --, but it seems to stop roughly when that monitor records a swap file
> size of 2.2 GB.

I'm not a Windows expert at all, but I would assume that with 32-bit
Windows each process can have a user address space of ~2 GB: the 4 GB
virtual address space is split in half, the lower 2 GB for the process
and the upper 2 GB reserved for the kernel. (32-bit Linux typically
gives the process the lower 3 GB instead.) So it looks like you might
just be hitting the maximum allowed address space, regardless of how
much physical RAM is installed.
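One thing worth ruling out first: a 32-bit Python build is capped at a
32-bit address space even on a 64-bit machine. A quick check of the
interpreter's pointer size tells you which build you have:

import struct
print(struct.calcsize('P') * 8)  # 32 or 64 -- the interpreter's bitness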

You should partition your data hierarchically and load only the pieces
you need, letting the OS page things in and out for you... although you
have 16 gigs (I have to put a "holy crap" after that!), you will always
run into per-process limits, at least until true 64-bit OSes are in
vogue.
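For instance, here is a minimal sketch of pushing the outer layer onto
disk with the standard library's shelve module (the file name and key
scheme are made up):

import shelve

# A dict-like object backed by a file; only the entries you touch are
# loaded into memory.
db = shelve.open('blast_scores')

# Flatten the two outer keys into one string key; values must be
# picklable.
db['genome1:protein7'] = {'genome2': {'ProteinID': 'protein9',
                                      'Score': 123.0,
                                      'EValue': 1e-30}}

print(db['genome1:protein7'])
db.close()

Note that shelve doesn't notice in-place changes to a stored value:
either re-assign the value after mutating it, or open the shelf with
writeback=True.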
 

Felipe Almeida Lessa

On Tue, 2006-04-11 at 19:45 +0200, Peter Beattie wrote:
> I was wondering whether certain data structures in Python, e.g. dict,
> might have limits as to the amount of memory they're allowed to take up.
> Is there any documentation on that?
>
> Why am I asking? I'm reading 3.6 GB worth of BLAST output files into a
> nested dictionary structure (dict within dict ...). Looks something like
> this:
>
> { GenomeID:
>     { ProteinID:
>         { GenomeID:
>             { ProteinID, Score, PercentValue, EValue } } } }

I don't have the answer to your question, and I'll raise a new one
instead: isn't the overhead (performance and memory) of creating dicts
too large for them to be used at this scale?

I'm just speculating, but I *think* that using lists and objects may be
better.
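For what it's worth, here is a rough way to compare the per-record
container overhead on a recent CPython (sys.getsizeof counts only the
container itself, not the values, and the exact numbers vary by version
and platform):

import sys

class Hit(object):
    # __slots__ avoids the per-instance attribute dict
    __slots__ = ('protein_id', 'score', 'percent', 'evalue')
    def __init__(self, protein_id, score, percent, evalue):
        self.protein_id = protein_id
        self.score = score
        self.percent = percent
        self.evalue = evalue

as_dict  = {'ProteinID': 'p1', 'Score': 42.0,
            'PercentValue': 87.5, 'EValue': 1e-30}
as_tuple = ('p1', 42.0, 87.5, 1e-30)
as_slots = Hit('p1', 42.0, 87.5, 1e-30)

for label, obj in (('dict', as_dict), ('tuple', as_tuple),
                   ('slots', as_slots)):
    print(label, sys.getsizeof(obj))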

My 2 cents,
 

Steve M

An alternative is to use ZODB. For example, you could use a BTree class
for the outermost layers of the nested dict and a regular dict for the
innermost layer. If broken up properly, you can store an apparently
unlimited amount of data with reasonable performance.

Just remember not to iterate over the entire collection of objects
without aborting the transaction regularly.
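A minimal sketch of that layout, assuming ZODB is installed (the file
name and IDs are invented):

from ZODB import FileStorage, DB
from BTrees.OOBTree import OOBTree
import transaction

storage = FileStorage.FileStorage('blast.fs')  # on-disk storage file
db = DB(storage)
conn = db.open()
root = conn.root()

if 'scores' not in root:
    root['scores'] = OOBTree()            # outer layers: BTrees
scores = root['scores']

if 'genome1' not in scores:
    scores['genome1'] = OOBTree()
# Innermost layer is a plain dict, assigned whole so ZODB sees the change:
scores['genome1']['protein7'] = {'genome2': {'ProteinID': 'protein9',
                                             'Score': 123.0,
                                             'EValue': 1e-30}}
transaction.commit()

# When scanning everything, abort now and then so the connection's
# object cache doesn't grow without bound:
for i, (genome_id, proteins) in enumerate(scores.items()):
    if i % 10000 == 0:
        transaction.abort()

conn.close()
db.close()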
 
