Sanity check on use of dictionaries

dopey483

I am processing a lot of log files (about 500,000 files, about 30 GB
in total) to get them into a small SQL database. Part of this process is
"normalisation": creating tables of common data. I am building
dictionaries for these in a simple {value: key} form.

In terms of memory and performance, what are the reasonable limits for a
dictionary that maps a 16-character string to a key? E.g., if I read one
of my tables from disk into a dictionary, what size is comfortable?
100,000 entries? 1,000,000 entries? Lookup times and memory
requirements are my main worries.

(Running Python 2.3.4 on Red Hat Enterprise Linux, dual Xeon with 2 GB of memory)
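
For concreteness, the {value: key} dictionary described above might look something like the sketch below. The name `intern_value`, the sample host names, and the convention that keys are integers starting at 1 are all assumptions made for illustration; none of them come from the actual tables.

```python
def intern_value(table, value):
    """Return the integer key for value, assigning the next free key if unseen."""
    key = table.get(value)
    if key is None:
        key = len(table) + 1  # assumed convention: keys start at 1
        table[value] = key
    return key

# Hypothetical field values; in practice they would come from the parsed logs.
hosts = {}
rows = [intern_value(hosts, h) for h in ["alpha", "beta", "alpha", "gamma", "beta"]]
print(rows)  # [1, 2, 1, 3, 2]
```

Each distinct value is stored once in the dictionary, and every log row then carries only the small integer key.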
 
Fredrik Lundh

you don't specify what a "key" is, but the following piece of code took
less than a minute to write, ran in roughly two seconds on my machine,
and results in a CPython process that uses about 80 megabytes of memory.
>>> d = {}
>>> for i in range(1000000):
...     k = str(i).zfill(16)
...     d[k] = k
...
>>> k
'0000000000999999'
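
The memory figure above is for the whole process on one machine. A rough way to estimate the dictionary's own footprint is sketched below, assuming an interpreter that provides `sys.getsizeof` (Python 2.6 or later, so not the 2.3.4 mentioned in the question). The helper names `build` and `rough_size` are made up for this example, and the result is only an approximation that ignores allocator overhead.

```python
import sys

def build(n):
    """Build a dict of n zero-padded 16-character strings mapped to themselves."""
    d = {}
    for i in range(n):
        k = str(i).zfill(16)
        d[k] = k
    return d

def rough_size(d):
    """Approximate bytes: the hash table plus each key string.

    Values are the same string objects as the keys here, so they are
    counted only once.
    """
    return sys.getsizeof(d) + sum(sys.getsizeof(k) for k in d)

d = build(100000)
print(len(d), "entries, about", rough_size(d) / 1e6, "MB")
```

Scaling `n` up to 1,000,000 gives a number that can be compared against what the operating system reports for the process.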

since dictionaries use hash tables, the lookup time is usually
independent of the dictionary size. also see:

http://www.effbot.org/pyfaq/how-are-dictionaries-implemented.htm
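
The claim about lookup time can be checked with a small timing sketch like the one below. It assumes a reasonably modern Python (dict comprehensions and `timeit.timeit` are not available in 2.3), and the helper names `build` and `lookup_time` are invented for the example. Absolute timings depend on the machine, so no expected numbers are shown.

```python
import timeit

def build(n):
    """Map n zero-padded 16-character strings to integer keys."""
    return {str(i).zfill(16): i for i in range(n)}

def lookup_time(d, key, number=200000):
    """Return the seconds taken for `number` lookups of `key` in `d`."""
    return timeit.timeit(lambda: d[key], number=number)

for n in (1000, 100000, 1000000):
    d = build(n)
    key = str(n - 1).zfill(16)  # a key known to be present
    print(n, lookup_time(d, key))
```

If the timings stay in the same range as `n` grows by a factor of a thousand, that is consistent with hash-table lookup being roughly independent of dictionary size.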

</F>
 
