Berkeley DB: how to iterate over a large number of keys "quickly"


lazy

I have a Berkeley DB and I'm using the bsddb module to access it. The DB
is quite huge (anywhere from 2-30 GB). I want to iterate over the keys
serially.
I tried something basic like

for key in db.keys()

but this takes a lot of time. I guess Python is trying to fetch the list
of all keys first and probably keeps it in memory. Is there a way to
avoid this, since I just want to access the keys serially? I mean, is there
a way I can tell Python not to load all keys, but to fetch them as
the loop progresses (like in a linked list)? I couldn't find any accessor
methods on bsddb to do this with my initial search.
I am guessing a B-tree might be a good choice here, but since the
DBs were written with hashopen, I'm not able to use
btopen when I want to iterate over the db.
 

lazy

Sorry, just a small correction: where I wrote "I could find any accessor
methods on bsddb", I meant "I couldn't find any accessor methods on bsddb
to do this (i.e. accessing the keys as in a linked list) with my initial
search".
 

marduk

lazy said:
I want to iterate over the keys serially. [...] Is there a way I can tell
Python not to load all keys, but to access them as the loop progresses
(like in a linked list)?

try instead:

key = db.firstkey()
while key is not None:
    # do something with db[key]
    key = db.nextkey(key)
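The firstkey()/nextkey() loop can also be wrapped in a generator so callers get ordinary for-loop syntax while still holding only one key in memory at a time. Since bsddb is a Python-2-only module, the sketch below uses a minimal stand-in class (FakeDB, an illustrative assumption, not part of bsddb) with the same two methods; on a real handle you would pass the object returned by bsddb.hashopen instead. Note a hash database enumerates keys in an arbitrary but complete order; the stand-in sorts only so the example is deterministic.

```python
import bisect

def iter_keys(db):
    """Yield keys one at a time via the firstkey()/nextkey() cursor API,
    without ever building the full key list in memory."""
    key = db.firstkey()
    while key is not None:
        yield key
        key = db.nextkey(key)

class FakeDB:
    """Illustrative stand-in exposing the same two methods as a bsddb handle."""
    def __init__(self, keys):
        self._keys = sorted(keys)  # real hash DBs are unordered; sorted for determinism

    def firstkey(self):
        return self._keys[0] if self._keys else None

    def nextkey(self, key):
        # Return the key that follows `key`, or None when exhausted.
        i = bisect.bisect_right(self._keys, key)
        return self._keys[i] if i < len(self._keys) else None

db = FakeDB([b'banana', b'apple', b'cherry'])
print(list(iter_keys(db)))  # [b'apple', b'banana', b'cherry']
```

The generator never asks the database for more than the current key, which is exactly the linked-list-style access the original question asks for.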
 

Christoph Haas

lazy said:
I tried "for key in db.keys()" but this takes a lot of time. [...] Is
there a way to avoid this, since I just want to access the keys serially?

Does db.iterkeys() work better?

Christoph
 

Ian Clark

lazy said:
I want to iterate over the keys serially. [...] Is there a way I can tell
Python not to load all keys, but to access them as the loop progresses
(like in a linked list)?

db.iterkeys()

Looking at the doc for bsddb objects[1] it mentions that "Once
instantiated, hash, btree and record objects support the same methods as
dictionaries." Then looking at the dict documentation[2] you'll find the
dict.iterkeys() method that should do what you're asking.

Ian

[1] http://docs.python.org/lib/bsddb-objects.html
[2] http://docs.python.org/lib/typesmapping.html
 

lazy

Ian Clark said:
Looking at the doc for bsddb objects[1] it mentions that "Once
instantiated, hash, btree and record objects support the same methods as
dictionaries." Then looking at the dict documentation[2] you'll find the
dict.iterkeys() method that should do what you're asking.

[1] http://docs.python.org/lib/bsddb-objects.html
[2] http://docs.python.org/lib/typesmapping.html


Thanks. I tried using db.first and then db.next for the subsequent keys;
it seems to be faster. Thanks for the pointers.
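For reference, first()/next() behave differently from firstkey()/nextkey(): they return (key, value) pairs, and they signal exhaustion by raising an exception rather than returning None (in real bsddb that is DBNotFoundError, which is a KeyError subclass). A hedged sketch of that loop shape, again using an illustrative stand-in class (FakeDB, not part of bsddb) since the module is Python-2-only:

```python
def iter_items(db):
    """Yield (key, value) pairs via the first()/next() cursor API."""
    try:
        item = db.first()
        while True:
            yield item
            item = db.next()
    except KeyError:  # real bsddb raises DBNotFoundError (a KeyError subclass) at the end
        return

class FakeDB:
    """Illustrative stand-in with first()/next() semantics."""
    def __init__(self, items):
        self._items = sorted(items.items())
        self._pos = 0

    def first(self):
        self._pos = 0
        if not self._items:
            raise KeyError('empty database')
        return self._items[0]

    def next(self):
        self._pos += 1
        if self._pos >= len(self._items):
            raise KeyError('end of database')
        return self._items[self._pos]

db = FakeDB({b'a': b'1', b'b': b'2'})
print(list(iter_items(db)))  # [(b'a', b'1'), (b'b', b'2')]
```

Because each step also fetches the value, this avoids the separate db[key] lookup that the firstkey()/nextkey() loop needs, which may explain the speedup reported above.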
 
