how to build a dict holding a large amount of data

wanzathe

hi everyone
i'm a newbie to Python :)
i have a binary file named test.dat containing 9,600,000 records.
each record is four ints: a + b + c + d
i want to build a dict like this: key = (int a, int b), value = (int c, int d)
i chose bsddb, and it takes about 140 seconds to build the dict.
what can i do to make my program run faster?
or is there another approach i could use?
Thanks in advance.

My Code:
-----------------------------------------------------------------------------------
import bsddb
import struct

my_file = open('test.dat', 'rb')
content = my_file.read()
record_number = len(content) // 16

db = bsddb.btopen('test.dat.db', 'n', cachesize=500000000)
for i in range(record_number):
    a = struct.unpack('IIII', content[i*16:i*16+16])
    db['%d_%d' % (a[0], a[1])] = '%d_%d' % (a[2], a[3])

db.close()
my_file.close()
 
Chris

wanzathe said:
i choose using bsddb and it takes about 140 seconds to build the dict.
what can i do if i want to make my program run faster?

You could read the file one record at a time instead of loading it all at once:

import bsddb
import struct

my_file = open('test.dat', 'rb')
db = bsddb.btopen('test.dat.db', 'n', cachesize=500000000)
content = my_file.read(16)
while content:
    a = struct.unpack('IIII', content)
    db['%d_%d' % (a[0], a[1])] = '%d_%d' % (a[2], a[3])
    content = my_file.read(16)

db.close()
my_file.close()

That would be more memory efficient; as for speed, you would need to
time it on your side.
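Another knob worth timing: keep the single big read, but unpack with a precompiled struct.Struct and unpack_from, which avoids building a 16-byte slice per record. A sketch, with in-memory sample bytes standing in for test.dat and a plain dict in place of the bsddb table:

```python
import struct

# precompile the record format once instead of re-parsing "IIII" per call
rec = struct.Struct('IIII')

# hypothetical sample records standing in for test.dat
records = [(1, 2, 3, 4), (5, 6, 7, 8), (9, 10, 11, 12)]
content = b''.join(rec.pack(*r) for r in records)

db = {}  # plain dict here; a bsddb table would be used the same way
for offset in range(0, len(content), rec.size):
    # unpack_from reads directly at the offset, no intermediate slice
    a, b, c, d = rec.unpack_from(content, offset)
    db['%d_%d' % (a, b)] = '%d_%d' % (c, d)
```

Whether this beats the per-record read depends on your machine, so it is only worth keeping if it measures faster.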
 
Fredrik Lundh

wanzathe said:
i have a binary file named test.dat including 9600000 records.
the record format is int a + int b + int c + int d
i want to build a dict like this: key=int a,int b values=int c,int d
i choose using bsddb and it takes about 140 seconds to build the dict.

you're not building a dict, you're populating a persistent database.
storing ~70000 records per second isn't that bad, really...

wanzathe said:
what can i do if i want to make my program run faster?
or is there another way i can choose?

why not just use a real Python dictionary, and the marshal module for
serialization?

</F>
 
wanzathe

Fredrik Lundh said:
why not just use a real Python dictionary, and the marshal module for
serialization?

hi, Fredrik Lundh
you are right, i'm populating a persistent database.
i originally planned to use a real Python dictionary with cPickle for
serialization, but that did not work because the number of records is
too large.
Thanks
 
