dict becomes very slow for big data


forrest yang

hi,
I am trying to insert a lot of data into a dict, possibly around
10,000,000 entries. After inserting about 100,000 items, the insert
rate drops to roughly 50,000/s, so the whole task also takes a very
long time. Does anyone know a solution for this?

thanks
 

Steve Howell

> hi,
> I am trying to insert a lot of data into a dict, possibly around
> 10,000,000 entries. After inserting about 100,000 items, the insert
> rate drops to roughly 50,000/s, so the whole task also takes a very
> long time. Does anyone know a solution for this?

Are you running out of memory? What are your keys? Are you able to
gather any more specific data about the slowdown--do all operations
slow down equally or are there spurts of slowness?
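
For example, one quick way to see the shape of the slowdown is to time
each chunk of inserts separately. This is only a sketch, using made-up
integer keys and values, so substitute your real data generation:

import time

d = {}
chunk = 100000
start = time.time()
for i in xrange(10000000):
    d[i] = i * 2          # stand-in for your real key/value pair
    if (i + 1) % chunk == 0:
        elapsed = time.time() - start
        print("%8d items: %.0f inserts/s" % (i + 1, chunk / elapsed))
        start = time.time()

If the rate falls off smoothly, memory pressure is the usual suspect; if
it stalls in bursts, something like garbage collection is more likely.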
 

Steven D'Aprano

> hi,
> I am trying to insert a lot of data into a dict, possibly around
> 10,000,000 entries. After inserting about 100,000 items, the insert
> rate drops to roughly 50,000/s, so the whole task also takes a very
> long time. Does anyone know a solution for this?

You don't give us enough information to answer.


How are you generating the data?

What are the keys and the values?

Are you sure it is slow to insert into the dict, or could some other part
of your processing be slow?
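
One way to find out is to profile the whole loading step with the
standard cProfile module. A rough sketch (the generate_data function
below is just a dummy stand-in for however you really produce the
records):

import cProfile

def generate_data():
    # dummy source of (key, value) pairs; replace with your real code
    for i in xrange(1000000):
        yield i, i * 2

def load_everything():
    d = {}
    for key, value in generate_data():
        d[key] = value
    return d

cProfile.run("load_everything()")

The report will show whether the time is going into the dict stores or
into producing the data in the first place.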

Does performance change if you turn garbage collection off?

import gc
gc.disable()
# insert your items
gc.enable()
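
For instance, a rough before/after comparison (with throwaway integer
data standing in for your real records) might look like:

import gc
import time

def fill(n):
    # build a dict with n dummy entries
    return dict((i, str(i)) for i in xrange(n))

t0 = time.time()
fill(1000000)
print("gc enabled:  %.2fs" % (time.time() - t0))

gc.disable()
t0 = time.time()
fill(1000000)
print("gc disabled: %.2fs" % (time.time() - t0))
gc.enable()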


Can you show us a sample of the data, and the code you are using to
insert it into the dict?

Do you have enough memory? If you run an external process like top, can
you see memory exhaustion, CPU load or some other problem?
 

Bruno Desthuilliers

forrest yang wrote:
> hi,
> I am trying to insert a lot of data into a dict, possibly around
> 10,000,000 entries. After inserting about 100,000 items, the insert
> rate drops to roughly 50,000/s, so the whole task also takes a very
> long time. Does anyone know a solution for this?

Hint: a computer's RAM is not an infinite resource.
 

Tim Chase

> I am trying to insert a lot of data into a dict, possibly around
> 10,000,000 entries. After inserting about 100,000 items, the insert
> rate drops to roughly 50,000/s, so the whole task also takes a very
> long time. Does anyone know a solution for this?

As others have mentioned, you've likely run out of RAM and the
slowness you feel is your OS swapping your process to disk.

If you need fast dict-like access to your data, I'd recommend
shifting to a database -- perhaps the stock "anydbm" module[1].
The only catch is that it only supports strings as keys/values.
But Python makes it fairly easy to marshal objects in/out of
strings. Alternatively, you could use the built-in (as of
Python2.5) sqlite3 module to preserve your datatypes and query
your dataset with the power of SQL.
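
As a rough illustration (the filename and record layout here are made
up), pickling your values into an anydbm file could look like this:

import anydbm
import cPickle as pickle

db = anydbm.open("bigdata.db", "c")   # "c" creates the file if needed
for i in xrange(10000000):
    db[str(i)] = pickle.dumps(("some", "payload", i))
db.close()

# Later, reopen it and look records up much as you would with a dict:
db = anydbm.open("bigdata.db", "r")
record = pickle.loads(db["12345"])
db.close()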

-tkc


[1]
http://docs.python.org/library/anydbm.html
 
