low-end persistence strategies?

Discussion in 'Python' started by Paul Rubin, Feb 16, 2005.

  1. Paul Rubin

    Paul Rubin Guest

    I've started a few threads before on object persistence in medium to
    high end server apps. This one is about low end apps, for example, a
    simple cgi on a personal web site that might get a dozen hits a day.
    The idea is you just want to keep a few pieces of data around that the
    cgi can update.

    Immediately, typical strategies like using a MySQL database become too
    big a pain. Any kind of compiled and installed 3rd party module (e.g.
    Metakit) is also too big a pain. But there still has to be some kind
    of concurrency strategy, even if it's something like crude file
    locking, or else two people running the cgi simultaneously can wipe
    out the data store. But you don't want crashing the app to leave a
    lock around if you can help it.

    Anyway, something like dbm or shelve coupled with flock-style file
    locking and a version of dbmopen that automatically retries after 1
    second if the file is locked would do the job nicely, plus there could
    be a cleanup mechanism for detecting stale locks.
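    That flock-plus-retry idea can be sketched in a few lines. The helper below
    is illustrative only (the sidecar lock-file name, timeout, and retry
    interval are made up); one nice property of flock is that the kernel drops
    the lock when the process dies, so a crashed cgi cannot leave a stale lock:

```python
import fcntl
import shelve
import time

def locked_shelf(path, timeout=10.0, retry=1.0):
    """Open a shelve database after taking an exclusive flock on a
    sidecar lock file, retrying every `retry` seconds while some other
    process holds the lock.  flock locks vanish with the process, so a
    crash cannot leave a stale lock behind."""
    lockfile = open(path + ".lock", "w")
    deadline = time.time() + timeout
    while True:
        try:
            fcntl.flock(lockfile, fcntl.LOCK_EX | fcntl.LOCK_NB)
            break
        except OSError:
            if time.time() >= deadline:
                lockfile.close()
                raise RuntimeError("database %r is locked" % path)
            time.sleep(retry)
    return shelve.open(path), lockfile

# usage sketch:
# db, lock = locked_shelf("guestbook")
# db["hits"] = db.get("hits", 0) + 1
# db.close()       # flush the data
# lock.close()     # closing the fd releases the flock
```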

    Is there a standard approach to something like that, or should I just
    code it the obvious way?

    Thanks.
     
    Paul Rubin, Feb 16, 2005
    #1

  2. Maybe ZODB helps.


    --
    Regards,

    Diez B. Roggisch
     
    Diez B. Roggisch, Feb 16, 2005
    #2

  3. On Tue, 15 Feb 2005 18:57:47 -0800, Paul Rubin wrote:

    > I've started a few threads before on object persistence in medium to
    > high end server apps. This one is about low end apps, for example, a
    > simple cgi on a personal web site that might get a dozen hits a day.
    > The idea is you just want to keep a few pieces of data around that the
    > cgi can update.

    [cut]
    > Anyway, something like dbm or shelve coupled with flock-style file
    > locking and a version of dbmopen that automatically retries after 1
    > second if the file is locked would do the job nicely, plus there could
    > be a cleanup mechanism for detecting stale locks.
    >
    > Is there a standard approach to something like that, or should I just
    > code it the obvious way?


    Hi,

    I would use the pickle module, and serialize access to the
    pickle files with file locking, so that only one process at a
    time is allowed to read or write.

    This means your cgi application can only serve one request
    after the other.
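    A minimal sketch of that serialized-pickle scheme (the helper name and
    file handling are illustrative): every request takes a blocking exclusive
    flock on the pickle file itself, so reads and writes happen strictly one
    after the other:

```python
import fcntl
import os
import pickle

def update(path, mutate):
    """Read-modify-write a pickled dict under an exclusive flock.
    Concurrent callers block in flock() until the holder is done."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT)   # create on first use
    with os.fdopen(fd, "r+b") as f:
        fcntl.flock(f, fcntl.LOCK_EX)            # blocks; serializes access
        raw = f.read()
        state = pickle.loads(raw) if raw else {}
        state = mutate(state)
        f.seek(0)
        f.truncate()
        pickle.dump(state, f)
    return state                                 # lock released on close
```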

    HTH,
    Thomas

    --
    Thomas Güttler, http://www.thomas-guettler.de/
     
    Thomas Guettler, Feb 16, 2005
    #3
  4. Paul Rubin

    Paul Rubin Guest

    "Diez B. Roggisch" <> writes:
    > Maybe ZODB helps.


    I think it's way too heavyweight for what I'm envisioning, but I
    haven't used it yet. I'm less concerned about object persistence
    (just saving strings is good enough) than finding the simplest
    possible approach to dealing with concurrent update attempts.
     
    Paul Rubin, Feb 16, 2005
    #4
  5. Paul Rubin wrote:

    > "Diez B. Roggisch" <> writes:
    >> Maybe ZODB helps.

    >
    > I think it's way too heavyweight for what I'm envisioning, but I
    > haven't used it yet. I'm less concerned about object persistence
    > (just saving strings is good enough) than finding the simplest
    > possible approach to dealing with concurrent update attempts.


    And that's exactly where zodb comes into play. It has full ACID support.
    Opening a zodb is a matter of three lines of code - not to be compared to
    rdbms'ses. And apart from some standard subclassing, you don't have to do
    anything to make your objects persistent. Just check the tutorial.
    --
    Regards,

    Diez B. Roggisch
     
    Diez B. Roggisch, Feb 16, 2005
    #5
  6. Paul Rubin

    Paul Rubin Guest

    "Diez B. Roggisch" <> writes:
    > > I think it's way too heavyweight for what I'm envisioning, but I
    > > haven't used it yet. I'm less concerned about object persistence
    > > (just saving strings is good enough) than finding the simplest
    > > possible approach to dealing with concurrent update attempts.

    >
    > And that's exactly where zodb comes into play. It has full ACID support.
    > Opening a zodb is a matter of three lines of code - not to be compared to
    > rdbms'ses.


    The issue with using an rdbms is not with the small amount of code
    needed to connect to it and query it, but in the overhead of
    installing the huge piece of software (the rdbms) itself, and keeping
    the rdbms server running all the time so the infrequently used app can
    connect to it. ZODB is also a big piece of software to install. Is
    it at least 100% Python with no C modules required? Does it need a
    separate server process? If it needs either C modules or a separate
    server, it really can't be called a low-end strategy.
     
    Paul Rubin, Feb 16, 2005
    #6
  7. > The issue with using an rdbms is not with the small amount of code
    > needed to connect to it and query it, but in the overhead of


    It's not only connecting - it's creating (automatically, if necessary) and
    "connecting", which is actually just opening.

    > installing the huge piece of software (the rdbms) itself, and keeping
    > the rdbms server running all the time so the infrequently used app can
    > connect to it. ZODB is also a big piece of software to install. Is
    > it at least 100% Python with no C modules required? Does it need a
    > separate server process? If it needs either C modules or a separate
    > server, it really can't be called a low-end strategy.


    It has to be installed. And it has C modules - but I don't see that as a
    problem. Of course this is my personal opinion - but it's certainly easier
    to install than to cook up your own transaction-isolated persistence layer.
    I started using it over pickle when my multi-threaded app caused pickle to
    crash.

    ZODB does not have a server-process, and no external setup beyond the
    installation of the module itself.

    Even if you consider installing it as too heavy for your current needs, you
    should skim over the tutorial to get a grasp of how it works.

    --
    Regards,

    Diez B. Roggisch
     
    Diez B. Roggisch, Feb 16, 2005
    #7
  8. Paul Rubin

    Tom Willis Guest

    Sounds like you want pickle or cPickle.


    On Tue, 15 Feb 2005 19:00:31 -0800 (PST), Paul Rubin
    <"http://phr.cx"@nospam.invalid> wrote:
    > I've started a few threads before on object persistence in medium to
    > high end server apps. This one is about low end apps, for example, a
    > simple cgi on a personal web site that might get a dozen hits a day.
    > The idea is you just want to keep a few pieces of data around that the
    > cgi can update.
    >
    > Immediately, typical strategies like using a MySQL database become too
    > big a pain. Any kind of compiled and installed 3rd party module (e.g.
    > Metakit) is also too big a pain. But there still has to be some kind
    > of concurrency strategy, even if it's something like crude file
    > locking, or else two people running the cgi simultaneously can wipe
    > out the data store. But you don't want crashing the app to leave a
    > lock around if you can help it.
    >
    > Anyway, something like dbm or shelve coupled with flock-style file
    > locking and a version of dbmopen that automatically retries after 1
    > second if the file is locked would do the job nicely, plus there could
    > be a cleanup mechanism for detecting stale locks.
    >
    > Is there a standard approach to something like that, or should I just
    > code it the obvious way?
    >
    > Thanks.
    > --
    > http://mail.python.org/mailman/listinfo/python-list
    >



    --
    Thomas G. Willis
    http://paperbackmusic.net
     
    Tom Willis, Feb 16, 2005
    #8
  9. Paul Rubin

    Paul Rubin Guest

    "Diez B. Roggisch" <> writes:
    > It has to be installed. And it has C-modules - but I don't see that
    > as a problem. Of course this is my personal opinion - but it's
    > certainly easier installed than to cough up your own transaction
    > isolated persistence layer. I started using it over pickle when my
    > multi-threaded app caused pickle to crash.


    I don't feel that I need ACID since, as mentioned, I'm willing to lock
    the entire database for the duration of each transaction. I just want
    a simple way to handle locking, retries, and making sure the locks are
    cleaned up.

    > ZODB does not have a server-process, and no external setup beyond the
    > installation of the module itself.


    That helps, thanks.

    > Even if you consider installing it as too heavy for your current needs, you
    > should skim over the tutorial to get a grasp of how it works.


    Yes, I've been wanting to look at it sometime.
     
    Paul Rubin, Feb 16, 2005
    #9
  10. Paul Rubin

    Paul Rubin Guest

    Tom Willis <> writes:
    > Sounds like you want pickle or cPickle.


    No, the issue is how to handle multiple clients trying to update the
    pickle simultaneously.
     
    Paul Rubin, Feb 16, 2005
    #10
  11. Paul Rubin

    Chris Cioffi Guest

    I'd like to second this one...ZODB is *extremely* easy to use. I use
    it in projects with anything from a couple dozen simple objects all
    the way up to a moderately complex system with several hundred
    thousand stored custom objects. (I would use it for very complex
    systems as well, but I'm not working on any right now...)

    There are a few quirks to using ZODB, and the documentation sometimes
    feels light, but mostly that's because ZODB is so easy to use.

    Chris


    On Wed, 16 Feb 2005 15:11:46 +0100, Diez B. Roggisch <> wrote:
    > Paul Rubin wrote:
    >
    > > "Diez B. Roggisch" <> writes:
    > >> Maybe ZODB helps.

    > >
    > > I think it's way too heavyweight for what I'm envisioning, but I
    > > haven't used it yet. I'm less concerned about object persistence
    > > (just saving strings is good enough) than finding the simplest
    > > possible approach to dealing with concurrent update attempts.

    >
    > And that's exactly where zodb comes into play. It has full ACID support.
    > Opening a zodb is a matter of three lines of code - not to be compared to
    > rdbms'ses. And apart from some standard subclassing, you don't have to do
    > anything to make your objects persistent. Just check the tutorial.
    > --
    > Regards,
    >
    > Diez B. Roggisch
    > --
    > http://mail.python.org/mailman/listinfo/python-list
    >



    --
    "It is our responsibilities, not ourselves, that we should take
    seriously." -- Peter Ustinov
     
    Chris Cioffi, Feb 16, 2005
    #11
  12. Paul Rubin

    Dave Brueck Guest

    ZODB performance (was Re: low-end persistence strategies?)

    Chris Cioffi wrote:
    > I'd like to second this one...ZODB is *extremely* easy to use. I use
    > it in projects with anything from a couple dozen simple objects all
    > the way up to a moderately complex system with several hundred
    > thousand stored custom objects. (I would use it for very complex
    > systems as well, but I'm not working on any right now...)


    Chris (or anyone else), could you comment on ZODB's performance? I've Googled
    around a bit and haven't been able to find anything concrete, so I'm really
    curious to know how ZODB does with a few hundred thousand objects.

    Specifically, what level of complexity do your ZODB queries/searches have? Any
    idea on how purely ad hoc searches perform? Obviously it will be affected by the
    nature of the objects, but any insight into ZODB's performance on large data
    sets would be helpful. What's the general ratio of reads to writes in your
    application?

    I'm starting on a project in which we'll do completely dynamic (generated on the
    fly) queries into the database (mostly of the form of "from the set of all
    objects, give me all that have property A AND have property B AND property B's
    value is between 10 and 100, ..."). The objects themselves are fairly dynamic as
    well, so building it on top of an RDBMS will require many joins across property
    and value tables, so in the end there might not be any performance advantage in
    an RDBMS (and it would certainly be a lot less work to use an object database - a
    huge portion of the work is in the object-relational layer).

    Anyway, thanks for any info you can give me,
    -Dave
     
    Dave Brueck, Feb 16, 2005
    #12
  13. Paul Rubin

    Tom Willis Guest

    Oops missed that sorry.

    Carry on.

    On Wed, 16 Feb 2005 07:29:58 -0800 (PST), Paul Rubin
    <"http://phr.cx"@nospam.invalid> wrote:
    > Tom Willis <> writes:
    > > Sounds like you want pickle or cPickle.

    >
    > No, the issue is how to handle multiple clients trying to update the
    > pickle simultaneously.
    > --
    > http://mail.python.org/mailman/listinfo/python-list
    >



    --
    Thomas G. Willis
    http://paperbackmusic.net
     
    Tom Willis, Feb 16, 2005
    #13
  14. What about bsddb? On most Unix systems it should be
    already installed, and on Windows it comes with the
    ActiveState distribution of Python, so it should fulfill
    your requirements.
     
    Michele Simionato, Feb 16, 2005
    #14
  15. Paul Rubin

    Paul Rubin Guest

    "Michele Simionato" <> writes:
    > What about bsddb? On most Unix systems it should be already
    > installed and on Windows it comes with the ActiveState distribution
    > of Python, so it should fulfill your requirements.


    As I understand it, bsddb doesn't expose the underlying Sleepycat APIs
    for concurrent db updates, nor does it appear to make any attempt at
    locking, based on looking at the Python lib doc for it. There's an
    external module called pybsddb that includes this stuff. Maybe the
    stdlib maintainers ought to consider including it, if it's considered
    stable enough.
     
    Paul Rubin, Feb 16, 2005
    #15
  16. Re: ZODB performance (was Re: low-end persistence strategies?)

    > Chris (or anyone else), could you comment on ZODB's performance? I've
    > Googled around a bit and haven't been able to find anything concrete, so
    > I'm really curious to know how ZODB does with a few hundred thousand
    > objects.


    > Specifically, what level of complexity do your ZODB queries/searches have?
    > Any idea on how purely ad hoc searches perform? Obviously it will be
    > affected by the nature of the objects, but any insight into ZODB's
    > performance on large data sets would be helpful. What's the general ratio
    > of reads to writes in your application?


    This is a somewhat weak point of zodb. Zodb simply lets you store arbitrary
    object graphs. There are no indices created to access these, and no query
    language either. You can of course create indices yourself - and store them
    as simply as all other objects. But you have to hand-tailor these to the
    objects you use, and write your querying code yourself - no 4GL like SQL
    is available.

    Of course writing queries as simple predicates evaluated against your whole
    object graph is straightforward - but unoptimized.
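    To make that contrast concrete, here is a toy illustration - plain dicts
    stand in for ZODB's persistent objects and BTrees, and none of the names
    below are ZODB API:

```python
# A small "object graph" of plain dicts (stand-ins for persistent objects).
objects = [
    {"id": 1, "color": "red", "size": 10},
    {"id": 2, "color": "blue", "size": 90},
    {"id": 3, "color": "red", "size": 50},
]

# Unoptimized query: a predicate scanned over the whole graph.
hits = [o for o in objects
        if o["color"] == "red" and 10 < o["size"] <= 100]

# Hand-tailored index (a dict standing in for a BTree): you must
# maintain it yourself on every write, but lookups no longer scan
# everything.
by_color = {}
for o in objects:
    by_color.setdefault(o["color"], []).append(o)

indexed_hits = [o for o in by_color.get("red", [])
                if 10 < o["size"] <= 100]

assert hits == indexed_hits
```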

    The retrieval of objects themselves is very fast - I didn't compare to a
    rdbms, but as there is no networking involved it should be faster. And of
    course no joins are needed.

    So in the end, if you always have the same kind of queries that you only
    parametrize, and create appropriate indices and hand-written "execution
    plans" for them, things are nice.

    But I want to stress another point that can cause trouble when using zodb
    and that I didn't mention in replies to Paul so far, as he explicitly
    didn't want to use an rdbms:


    For rdbms'ses, a well-defined textual representation of the entities stored
    in the db is available. So while you have to put some effort into creating an
    OR-mapping (if you want to deal with objects) that will most likely evolve
    over time, migrating the underlying data usually is pretty straightforward,
    and even tool support is available. Basically, you're only dealing with
    CSV data that can be easily manipulated and stored back.

    ZODB on the other hand is way easier to code for - but the hard times begin
    if you have a rolled-out application with a bunch of objects inside
    zodb that have to be migrated to newer versions and possibly changed object
    graph layouts. This made me create elaborate YAML/XML serializations to
    allow for imports and exports and use with XSLT, and currently I'm
    investigating a switch to postgres.

    This point is important, and future developments of mine will take that into
    consideration more than they did so far.

    --
    Regards,

    Diez B. Roggisch
     
    Diez B. Roggisch, Feb 16, 2005
    #16
  17. Paul Rubin

    Guest

    People sometimes run to complicated systems when there is a solution
    right in front of them. In this case, it is the filesystem itself.

    It turns out mkdir is an atomic operation (at least on the filesystems
    I've encountered). And from that simple fact you can build something
    reasonable, as long as you do not need high performance and space isn't
    an issue.

    You need a 2 layer lock (make 2 directories) and you need to keep 2
    data files around plus a 3rd temporary file.

    The reader reads from the newest of the 2 data files.

    The writer makes the locks, deletes the oldest data file and renames
    its temporary file to be the new data file. You could
    have the locks expire after 10 minutes, to take care of failure to
    clean up. Ultimately, the writer is responsible for keeping the locks
    alive. The writer knows a lock is his because it carries his timestamp.
    If the writer dies, no big deal, since it only affected a temporary
    file and the locks will expire.

    Renaming the temporary file takes advantage of the fact that a rename
    is atomic and essentially immediate, and whatever does the reading only
    reads from the newest of the 2 files (if both are available). Once the
    rename of the temporary file done by the writer is complete, any future
    reads will hit the newest data. And deleting the oldest file
    doesn't matter, since the reader never looks at it.
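    The locking half of this scheme can be sketched as follows (the directory
    name and timings are made up): os.mkdir either succeeds or raises, so no
    two writers can both believe they created the lock directory:

```python
import errno
import os
import time

LOCK_DIR = "store.lock"   # illustrative lock-directory name
LOCK_TTL = 600            # expire stale locks after 10 minutes

def acquire_lock(timeout=5.0):
    """mkdir is atomic: exactly one caller can create the directory.
    Returns True once the lock is held, False if `timeout` passes."""
    deadline = time.time() + timeout
    while True:
        try:
            os.mkdir(LOCK_DIR)
            return True
        except OSError as e:
            if e.errno != errno.EEXIST:
                raise
        # The lock exists - break it if its owner seems long dead.
        try:
            if time.time() - os.path.getmtime(LOCK_DIR) > LOCK_TTL:
                os.rmdir(LOCK_DIR)
                continue
        except OSError:
            pass                    # someone else just cleaned it up
        if time.time() >= deadline:
            return False
        time.sleep(0.1)

def release_lock():
    os.rmdir(LOCK_DIR)

# The writer would then prepare a temporary file and publish it with
# os.rename(), which atomically replaces the visible data file.
```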

    If you want more specifics let me know.

    john
     
    , Feb 16, 2005
    #17
  18. The documentation hides this fact (I missed it), but python 2.3+
    actually ships with the pybsddb module, which has all the
    functionality you allude to. Check the test directory for bsddb.

    Michele Simionato
     
    Michele Simionato, Feb 16, 2005
    #18
  19. In article <>,
    Paul Rubin <http://> wrote:
    >I've started a few threads before on object persistence in medium to
    >high end server apps. This one is about low end apps, for example, a
    >simple cgi on a personal web site that might get a dozen hits a day.
    >The idea is you just want to keep a few pieces of data around that the
    >cgi can update.
    >
    >Immediately, typical strategies like using a MySQL database become too
    >big a pain. Any kind of compiled and installed 3rd party module (e.g.
    >Metakit) is also too big a pain. But there still has to be some kind
    >of concurrency strategy, even if it's something like crude file
    >locking, or else two people running the cgi simultaneously can wipe
    >out the data store. But you don't want crashing the app to leave a
    >lock around if you can help it.
    >
    >Anyway, something like dbm or shelve coupled with flock-style file
    >locking and a version of dbmopen that automatically retries after 1
    >second if the file is locked would do the job nicely, plus there could
    >be a cleanup mechanism for detecting stale locks.
    >
    >Is there a standard approach to something like that, or should I just
    >code it the obvious way?
    >
    >Thanks.


    I have a couple of oblique, barely-helpful reactions; I
    wish I knew better solutions.

    First: I'm using Metakit and SQLite; they give me more
    confidence and fewer surprises than dbm.

    Second: Locking indeed is a problem, and I haven't
    found a good global solution for it yet. I end up with
    local fixes, that is, rather project-specific locking
    schemes that exploit knowledge that, for example, there
    are no symbolic links to worry about, or NFS mounts, or
    ....

    Good luck.
     
    Cameron Laird, Feb 16, 2005
    #19
  20. Paul Rubin

    Paul Rubin Guest

    "Michele Simionato" <> writes:
    > The documentation hides this fact (I missed that) but actually
    > python 2.3+ ships with the pybsddb module which has all the
    > functionality you allude to. Check the test directory for bsddb.


    Thanks, this is very interesting. It's important functionality that
    should be documented, if it works reliably. Have you had any probs
    with it?
     
    Paul Rubin, Feb 16, 2005
    #20
