Best dbm to use?


Guest

I'm creating a persistent index of a large 63GB file
containing millions of pieces of data. For this I would
naturally use one of Python's dbm modules. But which is the
best to use?

The index would be created with something like this:
import dbhash

fh = open('file_to_index', 'rb')
db = dbhash.open('file_to_index.idx', 'c')  # 'c' creates the index file if it doesn't exist
pos = 0
for obj in fh:                  # obj / obj.name stand in for however records are parsed
    db[obj.name] = str(pos)     # dbm values must be strings; store where this record starts
    pos += len(obj)             # track offsets by hand; tell() is unreliable mid-iteration

The index should serve two purposes: random access and
sequential stepped access. Random access could be handled
by the hash table directly, for example:
fh.seek(int(db[name]))  # dbm values come back as strings, so convert
obj = fh.GetObj()       # GetObj() is pseudocode for parsing one record at the current offset

However, I may want to access the i'th element in the file.
Something like this:
fh.seek(db.GetElement(i))  # GetElement(i) is hypothetical: "the offset of the i'th key"
obj = fh.GetObj()

This is where the hash table breaks down, and a b-tree would
serve my purpose better. Is there a unified data structure
that I could use, or am I doomed to maintaining two separate
indexes?

Thanks in advance for any help.

-Brian
 

Ivan Voras

> I'm creating a persistent index of a large 63GB file
> containing millions of pieces of data. For this I would
> naturally use one of Python's dbm modules. But which is the
> best to use?

BDB4, but consider using sqlite - it's really simple, holds all its data
in a single file, and it's better supported (in the sense that there are
sqlite bindings for almost any language/environment out there, and the
file format is stable). It's also very fast, and you can later add any
extra information you want to store (by adding more fields to the table).
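
As a rough sketch, sqlite can cover both access patterns in one structure:
an INTEGER PRIMARY KEY column gives you the i'th record, and a UNIQUE name
column gives you lookup by name. The schema, file names, and the way each
record's name is extracted below are illustrative guesses, not anything
from the original post:

import sqlite3

conn = sqlite3.connect('file_to_index.sqlite')
conn.execute('CREATE TABLE IF NOT EXISTS idx ('
             ' seq INTEGER PRIMARY KEY,'   # rowid alias: 1, 2, 3, ... in insertion order
             ' name TEXT UNIQUE,'
             ' offset INTEGER)')

# Build the index: one row per record, addressable by name or by position.
fh = open('file_to_index', 'rb')
pos = 0
for line in fh:
    name = line.split()[0]  # stand-in for however each object's name is derived
    conn.execute('INSERT INTO idx (name, offset) VALUES (?, ?)', (name, pos))
    pos += len(line)
conn.commit()

# Random access by name (what the hash table was doing):
row = conn.execute('SELECT offset FROM idx WHERE name = ?', ('some_name',)).fetchone()
if row:
    fh.seek(row[0])

# Positional access: the i'th record (seq is 1-based):
i = 1000
row = conn.execute('SELECT offset FROM idx WHERE seq = ?', (i,)).fetchone()
if row:
    fh.seek(row[0])

A nice side effect is that offsets are stored as real integers, so there's
none of the str()/int() juggling the dbm modules force on you.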
 
