Python dictionary size/entry limit?

Discussion in 'Python' started by intelliminer, Feb 21, 2009.

  1. intelliminer

    intelliminer Guest

    I wrote a script to process textual data and extract phrases from
    them, storing these phrases in a dictionary. It encounters a
    MemoryError when there are about 11.18M keys in the dictionary, and
    the size is about 1.5GB. I tried multiple times, and the error occurs
    every time at exactly the same place (with the same number of keys in
    the dict). I then split the dictionary into two using a simple test on
    the first character of each phrase:

    if phrase[0] <= 'm':
        dict_a[phrase] = value   # first half of the key space
    else:
        dict_b[phrase] = value   # second half

    And it worked fine. The total size of the two dictionaries well
    exceeded 2GB, yet no MemoryError occurred.

    I have 1GB of physical memory and 3GB in the pagefile. Is there a limit
    to the size or number of entries that a single dictionary can have? By
    searching the web I can't find a clue as to why this problem occurs.
    intelliminer, Feb 21, 2009

  2. intelliminer

    intelliminer Guest

    Yes, it's winxp, I forgot to mention it.
    Thanks for the reply.
    intelliminer, Feb 21, 2009

  3. Python dicts are only limited by what your OS returns as free memory.
    However, when a dict grows, it needs to resize, which means that it has to
    allocate a bigger copy of itself and redistribute the keys. For a dict
    that is already 1.5GB in size, this can temporarily require a lot more
    memory than you have: at least twice the size of the dict itself.
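    The resize behaviour described above can be observed directly: CPython
    grows a dict's internal table in jumps, which sys.getsizeof makes
    visible. A minimal sketch (exact sizes and thresholds vary across
    Python versions):

    ```python
    import sys

    d = {}
    last_size = sys.getsizeof(d)
    print(f"empty dict: {last_size} bytes")

    for i in range(100_000):
        d[i] = None
        size = sys.getsizeof(d)
        if size != last_size:
            # A resize just happened: a larger table was allocated and
            # all existing keys were rehashed into it.
            print(f"after {len(d):7d} keys: {size} bytes")
            last_size = size
    ```

    Each printed line marks a reallocation; at 1.5GB, that reallocation
    briefly needs the old and new tables in memory at once.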

    You may be better served with one of the dbm databases that come with
    Python. They live on-disk but do the usual in-memory caching. They'll
    likely perform a lot better than your OS level swap file.

    Stefan Behnel, Feb 21, 2009
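    A minimal sketch of the dbm route suggested above, using dbm.dumb
    (always available; dbm.gnu or dbm.ndbm are faster where installed).
    The file name here is arbitrary; note that dbm stores bytes, not
    arbitrary objects, so counts must be serialized:

    ```python
    import dbm.dumb
    import os, tempfile

    path = os.path.join(tempfile.mkdtemp(), "phrases")

    # "c" opens the database, creating it if it does not exist
    db = dbm.dumb.open(path, "c")

    # Dict-style writes; keys and values are bytes
    db[b"hello world"] = b"3"
    db[b"memory error"] = b"1"

    print(db[b"hello world"])        # b'3'
    print(b"memory error" in db)     # True
    print(len(db.keys()))            # 2

    db.close()
    ```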
  4. Sean

    Sean Guest

    The bsddb module has the feature that access to the database uses the
    same methods as Python dictionaries.

    Sean, Feb 22, 2009
  5. intelliminer

    intelliminer Guest

    Ummm, I didn't know about the dbm databases. It seems there are many
    modules for this kind of task: gdbm, Berkeley DB, cdb, etc. I need to
    implement a constant (read-only) hashtable with a large number of keys,
    but only a small fraction of them will be accessed frequently, and read
    speed is crucial. It would be ideal if the implementation cached all
    the frequently used key/value pairs in memory. Which module should I
    use? And is there a way to specify the amount of memory it uses for
    caching?
    BTW, the target platform is Linux.

    Thank you.
    intelliminer, Feb 22, 2009
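    One way to get the caching behaviour asked about here is to keep the
    table on disk and wrap the lookup in functools.lru_cache. Note that
    maxsize bounds the number of cached entries, not bytes, so the memory
    budget is only indirect. A sketch under those assumptions, with
    placeholder file name and data:

    ```python
    import dbm.dumb
    import functools, os, tempfile

    path = os.path.join(tempfile.mkdtemp(), "table")

    # Build the constant table once, then reopen it read-only
    db = dbm.dumb.open(path, "c")
    db[b"the quick"] = b"42"
    db.close()
    db = dbm.dumb.open(path, "r")

    @functools.lru_cache(maxsize=100_000)  # hot entries stay in RAM
    def lookup(phrase: bytes) -> bytes:
        return db[phrase]

    print(lookup(b"the quick"))          # b'42' (read from disk)
    print(lookup(b"the quick"))          # b'42' (served from the cache)
    print(lookup.cache_info().hits)      # 1
    ```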
  6. Their interfaces are mostly compatible with Python dicts. Just keep your
    code backend-independent at the beginning and benchmark it against all of
    them.

    Stefan Behnel, Feb 22, 2009
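    The benchmarking advice above can be followed by coding against the
    common dbm interface and timing each backend. A rough sketch (only
    dbm.dumb is guaranteed to exist; gdbm/ndbm availability varies by
    platform):

    ```python
    import os, tempfile, time

    def bench(backend, n=1000):
        """Time n dict-style writes followed by n reads on one dbm backend."""
        path = os.path.join(tempfile.mkdtemp(), "bench")
        db = backend.open(path, "c")
        start = time.perf_counter()
        for i in range(n):
            db[str(i).encode()] = b"x"
        for i in range(n):
            _ = db[str(i).encode()]
        db.close()
        return time.perf_counter() - start

    import dbm.dumb
    print(f"dbm.dumb: {bench(dbm.dumb):.3f}s")
    # Where available, dbm.gnu and dbm.ndbm can be passed to bench() unchanged
    ```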
  7. On a 32-bit system, the dictionary can have up to 2**31 slots,
    meaning that the maximum number of keys is slightly smaller
    (about 2**30).

    As others have pointed out, Python's or Windows' memory management
    might have led to fragmentation, so that no contiguous piece of
    memory to hold the resized dictionary can be found.

    One solution to that problem would be to use a 64-bit system.

    Martin v. Löwis, Feb 22, 2009
  8. Which, in practice, means that the size is limited by the available memory.

    Stefan Behnel, Feb 25, 2009
  9. > On a 32-bit system, the dictionary can have up to 2**31 slots,

    Right. Each slot takes 12 bytes, so the storage for the slots alone
    would consume all of the available address space.

    From that point of view, you can't possibly have more than about 358M
    slots (2**32 / 12, on the order of 2**28) in a 32-bit address space.

    Martin v. Löwis, Feb 25, 2009
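    The arithmetic behind that limit in a few lines: with 12-byte slots,
    the 32-bit address space caps the table size well below the 2**31-slot
    theoretical limit.

    ```python
    slot_bytes = 12           # on 32-bit CPython: 4-byte hash + key ptr + value ptr
    address_space = 2 ** 32   # bytes addressable with 32-bit pointers
    max_slots = address_space // slot_bytes
    print(max_slots)          # 357913941, i.e. roughly 358M slots
    ```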
