Looking for a Fast Persistent Store

Discussion in 'Ruby' started by Bob Hutchison, Aug 9, 2006.

  1. Hi,

    I'm looking for a persistent store, where:
    * I can use it from Ruby
    * it is fast
    * it is transactional
    * it can update more than one store in a single transaction
    * simple String -> String mappings in the store are sufficient
    * ideally uses files in the filesystem

    These kinds of stores are normally implemented as persistent hashes
    or BTrees (or both). I know that Sleepycat's Berkeley DB
    <http://www.sleepycat.com/> does this, and I've used Java based
    systems that do this. I also know of some C based things but they
    don't have Ruby wrappers. I can't find anything in the Ruby world,
    and I don't care too much if it isn't pure Ruby.

    I know about Purple <http://purple.rubyforge.org/> and the QDBM Ruby
    wrapper <http://qdbm.sourceforge.net/>. Neither does the multiple
    hash/BTree in a transaction thing.

    The trick appears to be with transactions.

    I know this can be done easily enough in a Relational DB, but I know
    that, for example, the JDBC/MySQL combination is significantly slower
    at what I need to do than Perst (a pure Java thing that's startlingly
    fast). I've taken a look at sqlite, which satisfies all requirements,
    but I can't shake this feeling that things can be better.

    Does anyone have any ideas?

    Thanks,
    Bob

    ----
    Bob Hutchison -- blogs at <http://www.recursive.ca/hutch/>
    Recursive Design Inc. -- <http://www.recursive.ca/>
    Raconteur -- <http://www.raconteur.info/>
    xampl for Ruby -- <http://rubyforge.org/projects/xampl/>
    Bob Hutchison, Aug 9, 2006
    #1

  2. On 8/9/06, Bob Hutchison <> wrote:
    > Hi,
    >
    > I'm looking for a persistent store, where:
    > * I can use it from Ruby
    > * it is fast
    > * it is transactional
    > * it can update more than one store in a single transaction
    > * simple String -> String mappings in the store are sufficient
    > * ideally uses files in the filesystem
    >

    ...
    >
    > Does anyone have any ideas?
    >
    > Thanks,
    > Bob
    >


    Hi Bob,

    If you're the fun and adventurous type you might want to look into
    Mongoose. ( http://rubyforge.org/projects/mongoose/ ) It's new, and
    claims to be fast. In the past I wrote some small apps that used the
    same author's other system, KirbyBase, which was very nice. I
    haven't tried Mongoose yet, but I'm happily awaiting a problem to
    use it on. Though now that I'm thinking about it, I don't know
    if it has built-in "transaction" support. I suppose it depends on
    what you want it to do.

    Have you tried SQLite and found it to be too slow? It sounds to me
    like you might be prematurely optimizing. We have several medium to
    medium-large sized apps that make heavy use of SQLite and speed has
    not been a problem at all.

    What domain are you working in that requires this kind of speed
    anyway? Just curious.

    Hope that helps,
    -Harold
    Harold Hausman, Aug 9, 2006
    #2

  3. On 09.08.2006 15:30, Bob Hutchison wrote:
    > Hi,
    >
    > I'm looking for a persistent store, where:
    > * I can use it from Ruby
    > * it is fast
    > * it is transactional
    > * it can update more than one store in a single transaction
    > * simple String -> String mappings in the store are sufficient
    > * ideally uses files in the filesystem


    What about PStore?

    robert
    Robert Klemme, Aug 9, 2006
    #3
  4. On Wed, 9 Aug 2006, Bob Hutchison wrote:

    > Hi,
    >
    > I'm looking for a persistent store, where:
    > * I can use it from Ruby
    > * it is fast
    > * it is transactional
    > * it can update more than one store in a single transaction
    > * simple String -> String mappings in the store are sufficient
    > * ideally uses files in the filesystem
    >
    > These kinds of stores are normally implemented as persistent hashes or
    > BTrees (or both). I know that Sleepycat's Berkeley DB
    > <http://www.sleepycat.com/> does this, and I've used Java based systems that
    > do this. I also know of some C based things but they don't have Ruby
    > wrappers. I can't find anything in the Ruby world, and I don't care too much
    > if it isn't pure Ruby.
    >
    > I know about Purple <http://purple.rubyforge.org/> and the QDBM Ruby wrapper
    > <http://qdbm.sourceforge.net/>. Neither does the multiple hash/BTree in a
    > transaction thing.
    >
    > The trick appears to be with transactions.


    check out joel's fsdb - it's very nice if you want to go pure ruby. i've used
    it on several projects.

    > I know this can be done easily enough in a Relational DB, but I know that,
    > for example, JDBC/MySQL combination is significantly slower at what I need
    > to do than Perst (a pure Java thing that's startlingly fast). I've taken a
    > look at sqlite, which satisfies all requirements but I can't shake this
    > feeling that things can be better.
    >
    > Does anyone have any ideas?


    sqlite is hard to beat - i use it in at least 20 production systems and have
    never had a single error. ruby queue uses it under the hood for the cluster
    job store and this is used over nfs - we've been through disk failures and
    power failures and come through unscathed.

    it's also very fast, esp if you use in memory tables, but on linux it's just
    as easy to copy the db to /dev/shm and then go...

    if you end up using it try it with my arrayfields package - james' sqlite binding
    detects it automatically and the tuples will come back as arrays with named
    field access - it's faster and requires less memory than a hash, and is also
    really convenient to code with.

    cheers.

    -a
    --
    to foster inner awareness, introspection, and reasoning is more efficient than
    meditation and prayer.
    - h.h. the 14th dalai lama
    Ara Howard, Aug 9, 2006
    #4
  5. Harold Hausman wrote:
    > Have you tried SQLite and found it to be too slow? It sounds to me
    > like you might be prematurely optimizing. We have several medium to
    > medium-large sized apps that make heavy use of SQLite and speed has
    > not been a problem at all.

    On Linux, at any rate, one ought to be able to make SQLite extremely
    fast by adding RAM and tuning the I/O subsystem and the kernel, and
    using tricks like memory mapped files, assuming the database in question
    isn't too large. I'm assuming it's small, otherwise he'd pretty much
    need a humongous database server and an industrial strength database
    like Oracle or PostgreSQL.
    M. Edward (Ed) Borasky, Aug 9, 2006
    #5
  6. On Aug 9, 2006, at 9:56 AM, Harold Hausman wrote:
    > Hi Bob,
    >
    > If you're the fun and adventurous type you might want to look into
    > Mongoose. ( http://rubyforge.org/projects/mongoose/ ) It's new, and
    > claims to be fast. In the past I wrote some small apps that used the
    > same author's other system, KirbyBase, which was very nice. I
    > haven't tried Mongoose yet, but I'm happily awaiting a problem to
    > use it on. Though now that I'm thinking about it, I don't know
    > if it has built-in "transaction" support. I suppose it depends on
    > what you want it to do.


    I have had a look at Mongoose, and quite like it. I'm sure to find a
    use for it. However, for this particular situation Mongoose's
    transactions are too weak.

    >
    > Have you tried SQLite and found it to be too slow? It sounds to me
    > like you might be prematurely optimizing. We have several medium to
    > medium-large sized apps that make heavy use of SQLite and speed has
    > not been a problem at all.


    In the Java world things like Perst and JDBM were very fast. SQLite
    was problematic for some reason that is a bit foggy just now (I'll
    think of it).

    >
    > What domain are you working in that requires this kind of speed
    > anyway? Just curious.


    I write web based applications. The situations I'm thinking of are
    either financial or CMS-like applications. In the financial cases
    I've got situations where there might be a few million (probably not
    more than ten million, well maybe 20 million) small things tucked
    away. In the case of CMS-like stuff there are far fewer things, let's
    say 25k at most and usually more like a thousand or so, but they are
    bigger things.

    The key factor (pardon the pun) is that the *vast* majority of the
    time (and I know how to make it 100% of the time) queries are like
    those you'd make against a hash table (i.e. a single key, where the
    key is usually a String).

    This is for the persistent store associated with xampl (see URL in my
    signature). Xampl already has a few persistent stores but I now need
    one a little fancier.


    >
    > Hope that helps,
    > -Harold
    >


    Thanks Harold.

    Cheers,
    Bob

    ----
    Bob Hutchison -- blogs at <http://www.recursive.ca/hutch/>
    Recursive Design Inc. -- <http://www.recursive.ca/>
    Raconteur -- <http://www.raconteur.info/>
    xampl for Ruby -- <http://rubyforge.org/projects/xampl/>
    Bob Hutchison, Aug 9, 2006
    #6
  7. On Aug 9, 2006, at 10:05 AM, Robert Klemme wrote:

    > On 09.08.2006 15:30, Bob Hutchison wrote:
    >> Hi,
    >> I'm looking for a persistent store, where:
    >> * I can use it from Ruby
    >> * it is fast
    >> * it is transactional
    >> * it can update more than one store in a single transaction
    >> * simple String -> String mappings in the store are sufficient
    >> * ideally uses files in the filesystem

    >
    > What about PStore?


    Oh. Right. PStore never registered with me before.

    I just had a look at it and it seems to be along the lines of QDBM
    and Purple, transactional on a single store and each store being one
    hash/tree.

    Thanks for pointing that out.

    Cheers,
    Bob

    >
    > robert
    >


    ----
    Bob Hutchison -- blogs at <http://www.recursive.ca/hutch/>
    Recursive Design Inc. -- <http://www.recursive.ca/>
    Raconteur -- <http://www.raconteur.info/>
    xampl for Ruby -- <http://rubyforge.org/projects/xampl/>
    Bob Hutchison, Aug 9, 2006
    #7
  8. Hi Ara,

    On Aug 9, 2006, at 10:08 AM, Ara Howard wrote:

    > On Wed, 9 Aug 2006, Bob Hutchison wrote:
    >
    >> Hi,
    >>
    >> I'm looking for a persistent store, where:
    >> * I can use it from Ruby
    >> * it is fast
    >> * it is transactional
    >> * it can update more than one store in a single transaction
    >> * simple String -> String mappings in the store are sufficient
    >> * ideally uses files in the filesystem
    >>
    >> These kinds of stores are normally implemented as persistent
    >> hashes or
    >> BTrees (or both). I know that Sleepycat's Berkeley DB
    >> <http://www.sleepycat.com/> does this, and I've used Java based
    >> systems that
    >> do this. I also know of some C based things but they don't have Ruby
    >> wrappers. I can't find anything in the Ruby world, and I don't
    >> care too much
    >> if it isn't pure Ruby.
    >>
    >> I know about Purple <http://purple.rubyforge.org/> and the QDBM
    >> Ruby wrapper <http://qdbm.sourceforge.net/>. Neither does the
    >> multiple hash/BTree in a transaction thing.
    >>
    >> The trick appears to be with transactions.

    >
    > check out joel's fsdb - it's very nice if you want to go pure
    > ruby. i've used
    > it on several projects.


    I'm already using that (version 0.5 -- I can't get to RAA right now
    for some reason and there are no files on Rubyforge for fsdb so I
    don't know if there is a more recent version). In version 0.5 the
    transactions were not sufficient, I think (it would be nice if I were
    wrong).

    >
    >> I know this can be done easily enough in a Relational DB, but I
    >> know that,
    >> for example, JDBC/MySQL combination is significantly slower at
    >> what I need
    >> to do than Perst (a pure Java thing that's startlingly fast). I've
    >> taken a
    >> look at sqlite, which satisfies all requirements but I can't shake
    >> this
    >> feeling that things can be better.
    >>
    >> Does anyone have any ideas?

    >
    > sqlite is hard to beat - i use it in at least 20 production systems
    > and have
    > never had a single error. ruby queue uses it under the hood for
    > the cluster
    > job store and this is used over nfs - we've been through disk
    > failures and
    > power failures and come through unscathed.


    That's good to hear. Don't misunderstand, I *like* SQLite but I don't
    know that it is suitable. If I can't find anything else I'll use it.

    The thing that makes me nervous is that I can do what I need with
    just two tables. One that has (key, value-kind, value), and another
    that has (index-name, index-value, key, value-kind). Each column is a
    String. The vast majority of queries would be based on key and value-
    kind on the first table. The remaining queries would be a select/join
    kind of thing. And I can very easily jam the key and value-kind into
    the same field (but that would be an optimisation that may not be
    necessary). The trouble is that the first table might get to be very,
    very large. I don't know how SQLite behaves with a single huge table.
    I suppose I'm going to have to find out.

    >
    > it's also very fast, esp if you use in memory tables, but on linux
    > it's just
    > as easy to copy the db to /dev/shm and then go...


    That's interesting.

    >
    > if you end up using it try it with my arrayfields package - james'
    > sqlite binding
    > detects it automatically and the tuples will come back as arrays
    > with named
    > field access - it's faster and requires less memory than a hash,
    > and is also
    > really convenient to code with.
    >


    I noticed that the other day. Nice.

    Cheers,
    Bob

    > cheers.
    >
    > -a
    > --
    > to foster inner awareness, introspection, and reasoning is more
    > efficient than
    > meditation and prayer.
    > - h.h. the 14th dalai lama
    >


    ----
    Bob Hutchison -- blogs at <http://www.recursive.ca/hutch/>
    Recursive Design Inc. -- <http://www.recursive.ca/>
    Raconteur -- <http://www.raconteur.info/>
    xampl for Ruby -- <http://rubyforge.org/projects/xampl/>
    Bob Hutchison, Aug 9, 2006
    #8
  9. On Aug 9, 2006, at 10:15 AM, M. Edward (Ed) Borasky wrote:

    > Harold Hausman wrote:
    >> Have you tried SQLite and found it to be too slow? It sounds to me
    >> like you might be prematurely optimizing. We have several medium to
    >> medium-large sized apps that make heavy use of SQLite and speed has
    >> not been a problem at all.

    > On Linux, at any rate, one ought to be able to make SQLite
    > extremely fast by adding RAM and tuning the I/O subsystem and the
    > kernel, and using tricks like memory mapped files, assuming the
    > database in question isn't too large. I'm assuming it's small,
    > otherwise he'd pretty much need a humongous database server and an
    > industrial strength database like Oracle or PostgreSQL.
    >


    A few GB max for one set of applications, far, far less for others.

    Cheers,
    Bob

    ----
    Bob Hutchison -- blogs at <http://www.recursive.ca/hutch/>
    Recursive Design Inc. -- <http://www.recursive.ca/>
    Raconteur -- <http://www.raconteur.info/>
    xampl for Ruby -- <http://rubyforge.org/projects/xampl/>
    Bob Hutchison, Aug 9, 2006
    #9
  10. On Thu, 10 Aug 2006, Bob Hutchison wrote:

    > The thing that makes me nervous is that I can do what I need with just two
    > tables. One that has (key, value-kind, value), and another that has
    > (index-name, index-value, key, value-kind). Each column is a String. The
    > vast majority of queries would be based on key and value-kind on the first
    > table. The remaining queries would be a select/join kind of thing. And I
    > can very easily jam the key and value-kind into the same field (but that
    > would be an optimisation that may not be necessary). The trouble is that the
    > first table might get to be very very large. I don't know how SQLite behaves
    > with a single huge table. I suppose I'm going to have to find out.


    please post a summary of your findings when you run your tests - i'd be
    interested to read them.

    cheers.

    -a
    --
    to foster inner awareness, introspection, and reasoning is more efficient than
    meditation and prayer.
    - h.h. the 14th dalai lama
    Ara Howard, Aug 9, 2006
    #10
  11. > From: Bob Hutchison [mailto:]
    > Sent: Wednesday, August 09, 2006 4:39 PM
    >
    > > Have you tried SQLite and found it to be too slow? It sounds to me
    > > like you might be prematurely optimizing. We have several medium to
    > > medium-large sized apps that make heavy use of SQLite and speed has
    > > not been a problem at all.

    >
    > In the Java world things like Perst and JDBM were very fast. SQLite
    > was problematic for some reason that is a bit foggy just now (I'll
    > think of it).


    All I can say is SQLite3 was amazingly fast at solving every problem
    I threw at it (I can't speak for the Ruby bindings, but they appear
    to be relatively thin).

    I have to admit, though, that I never had more than a few hundred
    thousand items in a single table.

    Being blessed with the power and speed of complex SQL statements is
    invaluable, especially in a scripting language.

    cheers

    Simon
    Kroeger, Simon (ext), Aug 9, 2006
    #11
  12. Bob Hutchison wrote:
    >
    > On Aug 9, 2006, at 10:15 AM, M. Edward (Ed) Borasky wrote:
    >
    >> Harold Hausman wrote:
    >>> Have you tried SQLite and found it to be too slow? It sounds to me
    >>> like you might be prematurely optimizing. We have several medium to
    >>> medium-large sized apps that make heavy use of SQLite and speed has
    >>> not been a problem at all.

    >> On Linux, at any rate, one ought to be able to make SQLite extremely
    >> fast by adding RAM and tuning the I/O subsystem and the kernel, and
    >> using tricks like memory mapped files, assuming the database in
    >> question isn't too large. I'm assuming it's small, otherwise he'd
    >> pretty much need a humongous database server and an industrial
    >> strength database like Oracle or PostgreSQL.
    >>

    >
    > A few GB max for one set of applications, far far less for others.

    A few GB ... I'd be looking at a "real database" for that. "Pay me now
    or pay me later", as the saying goes. :)
    M. Edward (Ed) Borasky, Aug 10, 2006
    #12
  13. On 8/9/06, Bob Hutchison <> wrote:
    > Hi,
    >
    > I'm looking for a persistent store, where:
    > * I can use it from Ruby
    > * it is fast
    > * it is transactional
    > * it can update more than one store in a single transaction
    > * simple String -> String mappings in the store are sufficient
    > * ideally uses files in the filesystem
    >
    > These kinds of stores are normally implemented as persistent hashes
    > or BTrees (or both). I know that Sleepycat's Berkeley DB
    > <http://www.sleepycat.com/> does this, and I've used Java based systems that
    > do this. I also know of some C based things but they don't have Ruby
    > wrappers. I can't find anything in the Ruby world, and I don't care
    > too much if it isn't pure Ruby.


    Why not use berkeleydb? There are ruby bindings for it.
    snacktime, Aug 10, 2006
    #13
  14. On Aug 9, 2006, at 11:32 PM, snacktime wrote:

    > On 8/9/06, Bob Hutchison <> wrote:
    >> Hi,
    >>
    >> I'm looking for a persistent store, where:
    >> * I can use it from Ruby
    >> * it is fast
    >> * it is transactional
    >> * it can update more than one store in a single transaction
    >> * simple String -> String mappings in the store are sufficient
    >> * ideally uses files in the filesystem
    >>
    >> These kinds of stores are normally implemented as persistent hashes
    >> or BTrees (or both). I know that Sleepycat's Berkeley DB
    >> <http://www.sleepycat.com/> does this, and I've used Java based systems that
    >> do this. I also know of some C based things but they don't have Ruby
    >> wrappers. I can't find anything in the Ruby world, and I don't care
    >> too much if it isn't pure Ruby.

    >
    > Why not use berkeleydb? There are ruby bindings for it.
    >


    The licensing cost of bdb for commercial re-distribution is
    prohibitive. Other than that, no reason.

    Cheers,
    Bob

    ----
    Bob Hutchison -- blogs at <http://www.recursive.ca/hutch/>
    Recursive Design Inc. -- <http://www.recursive.ca/>
    Raconteur -- <http://www.raconteur.info/>
    xampl for Ruby -- <http://rubyforge.org/projects/xampl/>
    Bob Hutchison, Aug 11, 2006
    #14
  15. On Aug 9, 2006, at 11:12 PM, M. Edward (Ed) Borasky wrote:

    > Bob Hutchison wrote:
    >>
    >> On Aug 9, 2006, at 10:15 AM, M. Edward (Ed) Borasky wrote:
    >>
    >>> Harold Hausman wrote:
    >>>> Have you tried SQLite and found it to be too slow? It sounds to me
    >>>> like you might be prematurely optimizing. We have several medium to
    >>>> medium-large sized apps that make heavy use of SQLite and speed has
    >>>> not been a problem at all.
    >>> On Linux, at any rate, one ought to be able to make SQLite
    >>> extremely fast by adding RAM and tuning the I/O subsystem and the
    >>> kernel, and using tricks like memory mapped files, assuming the
    >>> database in question isn't too large. I'm assuming it's small,
    >>> otherwise he'd pretty much need a humongous database server and
    >>> an industrial strength database like Oracle or PostgreSQL.
    >>>

    >>
    >> A few GB max for one set of applications, far far less for others.

    > A few GB ... I'd be looking at a "real database" for that. "Pay me
    > now or pay me later", as the saying goes. :)
    >
    >


    Well, my concern probably comes from the same place as your thinking
    of a 'real database' :) To make it worse, in a relational DB I've got
    this urge to stick all that into only two tables... makes me nervous.
    Anyway, in the Java world, Perst has proven itself very capable,
    jaw-droppingly fast, and reliable.


    Cheers,
    Bob

    ----
    Bob Hutchison -- blogs at <http://www.recursive.ca/hutch/>
    Recursive Design Inc. -- <http://www.recursive.ca/>
    Raconteur -- <http://www.raconteur.info/>
    xampl for Ruby -- <http://rubyforge.org/projects/xampl/>
    Bob Hutchison, Aug 11, 2006
    #15
  16. On Fri, 11 Aug 2006, Bob Hutchison wrote:

    How fast is fast enough, and what kind of transaction support do you need?

    I have a class that is part of the suite of caches I provide in IOWA that
    I have recently been working on.

    It's called a DiskCache because it implements an LRU cache that operates
    exclusively through persistent storage, but the expiration of elements can
    be turned off, turning it into a simple persistent storage mechanism.
    Because it operates exclusively through persistent storage, it requires no
    in-memory indexes or other data structures. This was important to me
    because when using PStore with many small transactions on any data
    set that was not tiny (more than a thousandish elements), PStore's
    index reading/writing overhead became troublesome, and I didn't want
    to incur the RAM hit of an in-memory index for a very large data set,
    either.

    It's relatively fast (much faster than PStore for non-trivial data
    sets, since PStore requires reading an index into memory for each
    transaction), platform-independent (well, I have tested it on Linux
    and Win XP), and it is multiprocess- and thread-safe.

    It supports transactions as well as nested transactions.

    i.e.

    @cache.transaction do
      @cache[:a] = 1
      @cache.transaction do
        @cache[:b] = 2
        @cache.transaction do
          @cache[:c] = 3
          rollback
        end
        commit
      end
      rollback
    end

    At the end of those transactions, @cache[:b] == 2, and the two other items
    have been rolled back.

    I'm currently expanding the test cases for it and cleaning up some bugs,
    and will go ahead and release it as a standalone package this weekend.
    Maybe it will help you.


    Kirk Haines
    Kirk Haines, Aug 11, 2006
    #16
  17. Kirk, how did you do this? Are you storing immutable objects and
    turning older versions into garbage somehow? If you have no indexing,
    then how do you enforce a consistent view across processes without
    incurring disk i/o (at least having to read the inode)? Or are you
    managing an arena that is backed by shared memory and you convert the
    key values deterministically into offsets?

    Sorry for all the questions, but I've just had to fight with this
    problem too many times and it would be great if you've come up with a
    better mousetrap.
    Francis Cianfrocca, Aug 11, 2006
    #17
  18. Kirk Haines wrote:
    > How fast is fast enough, and what kind of transaction support do you
    > need?
    >

    <snip>
    > I have a class that is part of the suite of caches I provide in IOWA
    > that I have recently been working on.
    >

    <snip>
    > I'm currently expanding the test cases for it and cleaning up some
    > bugs, and will go ahead and release it as a standalone package this
    > weekend. Maybe it will help you.

    Francis Cianfrocca wrote:
    > Kirk, how did you do this? Are you storing immutable objects and
    > turning older versions into garbage somehow? If you have no indexing,
    > then how do you enforce a consistent view across processes without
    > incurring disk i/o (at least having to read the inode)? Or are you
    > managing an arena that is backed by shared memory and you convert the
    > key values deterministically into offsets?
    >
    > Sorry for all the questions, but I've just had to fight with this
    > problem too many times and it would be great if you've come up with a
    > better mousetrap.


    I'm interested in this, too...please make an announcement on the list
    when you release it? :)

    -Justin
    Justin Collins, Aug 11, 2006
    #18
  19. On Sat, 12 Aug 2006, Francis Cianfrocca wrote:

    > Kirk, how did you do this? Are you storing immutable objects and
    > turning older versions into garbage somehow? If you have no indexing,
    > then how do you enforce a consistent view across processes without
    > incurring disk i/o (at least having to read the inode)? Or are you
    > managing an arena that is backed by shared memory and you convert the
    > key values deterministically into offsets?
    >
    > Sorry for all the questions, but I've just had to fight with this
    > problem too many times and it would be great if you've come up with a
    > better mousetrap.


    I don't think it is necessarily a better mousetrap. In this case, I'm
    just willing to make the filesystem do some of the work for me instead of
    keeping a separate index in memory.

    I'm using a modified version of Ara Howard's lockfile.rb to handle locking
    between processes. It works over NFS, and I have modified it so that it
    also works on Windows by automatically falling back to a Windows-safe
    locking method.

    I create a hash (Digest::SHA512 based) out of the key, and use that hash as
    the filename for the data. There is also a second file written that
    contains some metadata (the structure on disk is actually a linked list, to
    more easily support some of the LRU aspects and element expiration).
    There are a couple of metadata files that also maintain some information
    on the data store as a whole.

    The code automatically breaks the file storage into a tree of directories
    to help avoid any filesystem problems with having too many files in a
    single directory. The size of this tree is tunable, so that a data store
    expected to hold a million entries can have more nodes (directories) than
    one expected to hold only 10000.

    So, retrieving an element from the data store is reduced to hashing the
    key, finding the file, and reading it. It's disk i/o, but less than
    PStore generates most of the time.

    Since my need was for something that could do this while also being
    treated as an LRU cache, there is some extra work mixed in there to
    maintain the metadata and expire elements from the cache (it can also
    expire them based on age), and that's really where most of the performance
    hit comes from.

    I'm working on creating a second version that strips out all of the LRU
    cache code to provide a simple, fast data persistence implementation
    without any of that extra overhead, and will soon be able to report on
    just how much of a performance hit that overhead brings.


    Thanks,

    Kirk Haines
    , Aug 11, 2006
    #19
  20. On 8/11/06, Kirk Haines <> wrote:
    > On Sat, 12 Aug 2006, Francis Cianfrocca wrote:
    >
    > > Kirk, how did you do this? Are you storing immutable objects and
    > > turning older versions into garbage somehow? If you have no indexing,
    > > then how do you enforce a consistent view across processes without
    > > incurring disk i/o (at least having to read the inode)? Or are you
    > > managing an arena that is backed by shared memory and you convert the
    > > key values deterministically into offsets?
    > >
    > > Sorry for all the questions, but I've just had to fight with this
    > > problem too many times and it would be great if you've come up with a
    > > better mousetrap.

    >
    > I don't think it is necessarily a better mousetrap. In this case, I'm
    > just willing to make the filesystem do some of the work for me instead of
    > keeping a separate index in memory.
    >
    > I'm using a modified version of Ara Howard's lockfile.rb to handle locking
    > between processes. It works over NFS, and I have modified it so that it
    > also works on Windows by automatically falling back to a Windows safe
    > locking method.
    >
    > I create a hash (Digest::SHA512 based) out of the key, and use that hash as
    > the filename for the data. There is also a second file written that
    > contains some metadata (the structure on disk is actually a linked list to
    > more easily support some of the LRU aspects and element expiration).
    > There are a couple of metadata files that also maintain some information
    > on the data store as a whole.
    >
    > The code automatically breaks the file storage into a tree of directories
    > to help avoid any filesystem problems with having too many files in a
    > single directory. The size of this tree is tuneable so that a data store
    > that may have a million entries may have more nodes (directories) than one
    > that is only expected to have 10000.
    >
    > So, retrieving an element from the data store is reduced to hashing the
    > key, finding the file, and reading it. It's disk i/o, but less than
    > PStore generates most of the time.
    >
    > Since my need was for something that could do this while also being
    > treated as an LRU cache, there is some extra work mixed into there to
    > maintain the meta data and expire elements from the cache (it can also
    > expire them based on age), and that's really where most of the performance
    > hit comes from.
    >
    > I'm working on creating a second version that strips out all of the LRU
    > cache code to provide a simple, fast data persistence implementation
    > without any of that extra overhead, and will be better able to report on
    > just how much of a performance hit that overhead brings with it soon.
    >
    >
    > Thanks,
    >
    > Kirk Haines
    >
    >



    With all the times I've reinvented this wheel, I've never tried
    storing millions of data elements each in its own file. Do you have
    performance metrics you can share? Did you tune it for a particular
    filesystem?
    Francis Cianfrocca, Aug 11, 2006
    #20