Robert Feldt
Hi,
I need a simple way to have a potentially very large hash mapping
Strings to Ruby objects. It needs to scale to on the order of 1 million
entries with up to 1kb of data per entry (even though it might end up
with maybe 200-300 bytes per entry for starters). Access time is not
critical but low-to-medium mem usage is. Deletion of (and insert of new)
keys/values will be fairly frequent so should be supported.
I've been thinking of doing it with some simple files/dir structure and
a "front end" object which hides the details (of marshal/load of the
objects and mapping of keys to the marshaled data etc) but maybe there
is something simpler that might be of use?
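A minimal sketch of the files/dir idea, assuming one marshaled file per entry under a root directory. `DirHash`, its methods and the two-level MD5 directory layout are hypothetical choices for illustration, not an existing library. Only the entry being read or written is ever loaded into memory.

```ruby
require "digest/md5"
require "fileutils"

# Hypothetical directory-backed hash: each key/value pair is marshaled
# into its own file, and the file path is derived from an MD5 digest of
# the key (used only for spreading files out, not for security).
class DirHash
  def initialize(root)
    @root = root
    FileUtils.mkdir_p(@root)
  end

  def []=(key, value)
    path = path_for(key)
    FileUtils.mkdir_p(File.dirname(path))
    # Store the key next to the value so #each can recover it later.
    File.binwrite(path, Marshal.dump([key, value]))
  end

  def [](key)
    path = path_for(key)
    return nil unless File.exist?(path)
    Marshal.load(File.binread(path))[1]
  end

  def key?(key)
    File.exist?(path_for(key))
  end

  def delete(key)
    return nil unless key?(key)
    value = self[key]
    File.delete(path_for(key))
    value
  end

  # Walks every entry file; loads one entry at a time.
  def each
    Dir.glob(File.join(@root, "*", "*", "*")).each do |path|
      key, value = Marshal.load(File.binread(path))
      yield key, value
    end
  end

  private

  # Two levels of up to 256 subdirectories ("ab/cd/abcd...") spread a
  # million entries over ~65k directories, roughly 15 files each.
  def path_for(key)
    digest = Digest::MD5.hexdigest(key)
    File.join(@root, digest[0, 2], digest[2, 2], digest)
  end
end
```

Values must be marshalable (no procs or open IO objects). On many file systems every file occupies at least one allocation block (often 4 KB), so a million entries of a few hundred bytes each would take noticeably more disk space than the raw data; that overhead is the main argument for packing records into a single file instead.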
SDBM does not seem to cut it; in a simple test inserting random
strings, the file size grows non-linearly, so it does not scale. Could
sqlite or something similar handle this better?
However, I would rather go for something simple than having to bundle a
"full" db.
If it simplifies things, you can assume the keys and/or values are of fixed
sizes, since that is very likely, but I'm not sure it will really be
possible to keep that invariant, so more flexible solutions might be
needed. Solution ideas which support caching are nice but not required
(mostly since I can add that later if the need is there).
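If keys and values really are bounded in size, everything can live in one preallocated file with no per-entry files at all. Below is a sketch of an on-disk open-addressing hash table with linear probing, assuming keys of at most 32 bytes and marshaled values of at most 1024 bytes; `FixedSlotHash`, `KEY_SIZE`, `VALUE_SIZE` and the slot layout are hypothetical. It keeps no index in memory and reads one slot per probe.

```ruby
require "zlib"

# Hypothetical on-disk hash table with fixed-size slots and linear probing.
# Assumes keys of at most KEY_SIZE bytes and marshaled values of at most
# VALUE_SIZE bytes. The slot count is fixed when the file is created and
# must be the same every time the file is reopened.
class FixedSlotHash
  EMPTY, USED, DELETED = 0, 1, 2
  KEY_SIZE = 32
  VALUE_SIZE = 1024
  # status byte, key length, key bytes, value length, value bytes
  FORMAT = "Cna#{KEY_SIZE}Na#{VALUE_SIZE}"
  SLOT_SIZE = 1 + 2 + KEY_SIZE + 4 + VALUE_SIZE

  def initialize(path, slots)
    @slots = slots
    new_file = !File.exist?(path)
    @file = File.open(path, new_file ? "w+b" : "r+b")
    # Extending the file zero-fills it, so every slot starts out EMPTY.
    @file.truncate(slots * SLOT_SIZE) if new_file
  end

  def []=(key, value)
    kb = key.b
    vb = Marshal.dump(value)
    raise ArgumentError, "key too long" if kb.bytesize > KEY_SIZE
    raise ArgumentError, "value too large" if vb.bytesize > VALUE_SIZE
    # Overwrite the existing slot for this key, else take a free one.
    slot = find_slot(kb) || free_slot(kb)
    raise "table full" unless slot
    @file.seek(slot * SLOT_SIZE)
    @file.write([USED, kb.bytesize, kb, vb.bytesize, vb].pack(FORMAT))
  end

  def [](key)
    slot = find_slot(key.b)
    return nil unless slot
    _status, _klen, _kbytes, vlen, vbytes = read_slot(slot)
    Marshal.load(vbytes[0, vlen])
  end

  def delete(key)
    slot = find_slot(key.b)
    return nil unless slot
    value = self[key]
    # Mark as DELETED (a tombstone) rather than EMPTY so that probe
    # chains passing through this slot are not cut short.
    @file.seek(slot * SLOT_SIZE)
    @file.write([DELETED].pack("C"))
    value
  end

  def close
    @file.close
  end

  private

  def read_slot(slot)
    @file.seek(slot * SLOT_SIZE)
    @file.read(SLOT_SIZE).unpack(FORMAT)
  end

  # Yields slot indexes in linear-probe order from the key's home slot.
  # CRC32 is used because String#hash is seeded per process.
  def probe(kb)
    start = Zlib.crc32(kb) % @slots
    @slots.times { |i| yield((start + i) % @slots) }
  end

  # Slot currently holding kb, or nil; stops at the first EMPTY slot.
  def find_slot(kb)
    probe(kb) do |slot|
      status, klen, kbytes = read_slot(slot)
      return nil if status == EMPTY
      return slot if status == USED && kbytes[0, klen] == kb
    end
    nil
  end

  # First EMPTY or DELETED slot on kb's probe path, or nil if full.
  def free_slot(kb)
    probe(kb) do |slot|
      return slot unless read_slot(slot)[0] == USED
    end
    nil
  end
end
```

With linear probing the table should stay well below full (for example, around 1.5 million slots for 1 million entries, roughly 1.6 GB at 1063 bytes per slot) or probe chains get long. Frequent deletes leave tombstones; inserts reuse them, but lookups of missing keys still walk past them, so an occasional rebuild into a fresh file may be needed.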
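Caching can indeed be layered on afterwards. A minimal sketch, assuming the underlying store responds to `[]`, `[]=` and `delete` like a Hash; `LRUCache` and its `capacity` parameter are hypothetical. It relies on Ruby's Hash keeping insertion order (Ruby 1.9 and later), so the first entry is always the least recently used.

```ruby
# Hypothetical write-through LRU cache in front of any Hash-like store.
class LRUCache
  def initialize(store, capacity)
    @store = store
    @capacity = capacity
    @cache = {}
  end

  def [](key)
    if @cache.key?(key)
      # Re-insert so the key moves to the most recently used end.
      value = @cache.delete(key)
      @cache[key] = value
      return value
    end
    value = @store[key]
    remember(key, value) unless value.nil?
    value
  end

  def []=(key, value)
    @store[key] = value # write-through: the store is always up to date
    remember(key, value)
  end

  def delete(key)
    @cache.delete(key)
    @store.delete(key)
  end

  private

  def remember(key, value)
    @cache.delete(key)
    @cache[key] = value
    # Evict the least recently used entry (the oldest insertion).
    @cache.shift if @cache.size > @capacity
  end
end
```

Because writes go straight through, the cache never holds data the store lacks. Note that `[]` returns the cached object itself, so mutating a returned value in place changes the cached copy without updating the store; reassign with `[]=` after changes.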
Any ideas, experience, code or pointers that can help me design/choose
this is of interest.
Thanks,
Robert