low-end persistence strategies?

Paul Rubin

Michele Simionato said:
The documentation hides this fact (I missed it), but Python 2.3+
actually ships with the pybsddb module, which has all the functionality
you allude to. Check the test directory for bsddb.

Oh yow, it looks pretty complicated. Do you have any example code
around that uses the transaction stuff? If not I can try to figure it
out, but it looks like it would take significant effort.
 
Paul Rubin

Fred Pacquier said:
KirbyBase sounds like something that could fit the bill.

Hmm, this looks kind of nice. However, when used in embedded mode,
the overview blurb doesn't say anything about concurrency control.
I don't want to use it in client/server mode, for reasons already stated.
 
Jamey Cribbs

Paul said:
Hmm, this looks kind of nice. However, when used in embedded mode,
the overview blurb doesn't say anything about concurrency control.
I don't want to use it in client/server mode, for reasons already stated.

The KirbyBase distribution comes with two small scripts that each
implement a server.

kbsimpleserver.py allows multi-user access to KirbyBase tables. It
takes care of concurrent update issues by being single-threaded and
blocking. Client requests are handled sequentially. It works fine for
small tables that don't have a lot of concurrent access.

kbthreadedserver.py also allows for multi-user access. It creates a
multi-threaded, non-blocking server. Each client gets its own thread.
The only time one thread will block the others is when it is going
to write to a table, and then it only blocks other write requests to
that same table. Reads are never blocked. This server script has
worked ok for me in limited testing.

Either of these server scripts would have to be running as a process
either on your web server or on another server on your network in order
for them to work. I don't know if that would be an issue for you.

HTH,

Jamey Cribbs
 
Paul Rubin

Jamey Cribbs said:
Either of these server scripts would have to be running as a process
either on your web server or on another server on your network in
order for them to work. I don't know if that would be an issue for you.

Yes, that's the whole point. I don't want to run a server process 24/7
just to look up two or three numbers a few times a day.
 
Jamey Cribbs

Paul said:
Yes, that's the whole point. I don't want to run a server process 24/7
just to look up two or three numbers a few times a day.

Ok, I see your point now. Well, this is off the top of my head, I
haven't tried it, but I think you could just use KirbyBase embedded in
your cgi script and it should work fine. I'm kind of thinking out loud
here, but if you had two users accessing your web site simultaneously,
that would be two instances of your cgi script. If they are just
looking up data, that would be two reads that KirbyBase would be doing
against the same physical file. It just opens files in read mode for
that, so it should work even concurrently.

The only time there might be trouble is if two clients try to write to
the same table (physical file) at the same time. When it writes to a
file, KirbyBase opens it in append mode (r+, I think). My guess would
be, whichever client got there first would open the file. The second
client, arriving a split second later, would attempt to open the file in
append mode also and KirbyBase would return an exception. If your cgi
script caught and handled the exception, you would be fine.

Again this is off the top of my head after a long day, so I can't be
held responsible for my ramblings. :)

Jamey
 
Michele Simionato

Paul Rubin:
Oh yow, it looks pretty complicated. Do you have any example code
around that uses the transaction stuff? If not I can try to figure it
out, but it looks like it would take significant effort.

This was my impression too :-( The ZODB is much easier to use, so
in the end I just used that. Apparently the bsddb stuff is more
complicated than it needs to be, and the documentation sucks. However,
it does satisfy your requirement of being already installed, so I
mentioned it. I too am looking for an easy tutorial on how to do
concurrency/transactions with it.

Michele Simionato
 
Paul Rubin

Jamey Cribbs said:
The only time there might be trouble is if two clients try to write to
the same table (physical file) at the same time.

Yes, that's what I'm concerned about.
When it writes to a file, KirbyBase opens it in append mode (r+, I
think). My guess would be, whichever client got there first would
open the file. The second client, arriving a split second later,
would attempt to open the file in append mode also and KirbyBase
would return an exception. If your cgi script caught and handled
the exception, you would be fine.

I don't think the OS will stop both processes from opening in append
mode (which just means opening the file and seeking to the end) at
once unless you use O_EXCL or something. Then you're left with
handling the exception, which may get messy if you want to have all
the corner cases correct. It would be nice if there was a published
code snippet somewhere that did that. I may try writing one.
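Something along these lines, perhaps — a rough sketch in modern Python
syntax (the `.lock` suffix, the retry count, and the staleness threshold
are all arbitrary choices of mine, not from any library):

```python
import errno
import os
import time

def acquire_lockfile(path, retries=10, delay=1.0, stale_after=60.0):
    """Take a lock on `path` by creating path+'.lock' with O_EXCL.

    O_EXCL makes creation atomic: exactly one process wins. On failure
    we retry, and a lockfile older than `stale_after` seconds is assumed
    to be left over from a crash and is broken.
    """
    lockpath = path + ".lock"
    for _ in range(retries):
        try:
            fd = os.open(lockpath, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            os.write(fd, str(os.getpid()).encode())  # record the owner
            os.close(fd)
            return lockpath
        except OSError as e:
            if e.errno != errno.EEXIST:
                raise
            # Someone else holds the lock; break it if it looks stale.
            try:
                if time.time() - os.path.getmtime(lockpath) > stale_after:
                    os.unlink(lockpath)
                    continue
            except OSError:
                continue  # lock vanished between checks; retry at once
            time.sleep(delay)
    raise RuntimeError("could not acquire %s" % lockpath)

def release_lockfile(lockpath):
    os.unlink(lockpath)
```

The messy corner case is exactly the stale-lock heuristic: too short a
threshold breaks a live lock, too long leaves the store frozen after a
crash.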

The best solution, I think, is Michele Simionato's, which is to use
the currently-undocumented bsddb transaction features. Those features
are the right technical approach to this problem. But they really
ought to get documented first.
 
Paul Rubin

Michele Simionato said:
This was my impression too :-( The ZODB is much easier to use, so
in the end I just used that. Apparently the bsddb stuff is more
complicated than it needs to be, and the documentation sucks. However,
it does satisfy your requirement of being already installed, so I
mentioned it. I too am looking for an easy tutorial on how to do
concurrency/transactions with it.

The Sleepycat docs seemed fine back when I looked at them years ago.
I'm just not sure what the Python wrapper is supposed to do in terms
of API.
 
John Lenton

Paul Rubin said:
I've started a few threads before on object persistence in medium to
high end server apps. This one is about low end apps, for example, a
simple cgi on a personal web site that might get a dozen hits a day.
The idea is you just want to keep a few pieces of data around that the
cgi can update.

Immediately, typical strategies like using a MySQL database become too
big a pain. Any kind of compiled and installed 3rd party module (e.g.
Metakit) is also too big a pain. But there still has to be some kind
of concurrency strategy, even if it's something like crude file
locking, or else two people running the cgi simultaneously can wipe
out the data store. But you don't want crashing the app to leave a
lock around if you can help it.

Anyway, something like dbm or shelve coupled with flock-style file
locking and a version of dbmopen that automatically retries after 1
second if the file is locked would do the job nicely, plus there could
be a cleanup mechanism for detecting stale locks.

Is there a standard approach to something like that, or should I just
code it the obvious way?

one easy way would be something along the lines of

from ConfigParser import ConfigParser
from fcntl import flock, LOCK_SH, LOCK_EX, LOCK_UN

class LockedParser(ConfigParser):

    def _read(self, fp, fpname):
        flock(fp, LOCK_SH)      # block until we can read
        try:
            # ConfigParser is an old-style class in Python 2, so call
            # the base class directly rather than via super().
            rv = ConfigParser._read(self, fp, fpname)
        finally:
            flock(fp, LOCK_UN)
        return rv

    def write(self, fp):
        flock(fp, LOCK_EX)      # block until we can write
        try:
            rv = ConfigParser.write(self, fp)
        finally:
            flock(fp, LOCK_UN)
        return rv

although you could do the same kind of stuff with csv, or even
Pickle. Of course this doesn't work if what you're wanting to
implement is a hit counter, but that is much easier: just grab a
LOCK_EX, read in, write out, LOCK_UN. If you care about (not)
overwriting changes, but fear you'll hold the lock for too long with
the simple 'grab the lock and run' approach, you could save a version
of the original file and compare before writing out. Complexity grows
a lot, and you suddenly would be better off using pybsddb or somesuch.

Of course I'm probably overlooking something, because it really can't
be this easy, can it?
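For the hit-counter case described above, the whole
grab-the-lock-and-run pattern really does fit in a few lines. A sketch
in modern Python syntax (`bump_counter` is a made-up name, not from any
library):

```python
import fcntl
import os

def bump_counter(path):
    """Read-modify-write a counter file under an exclusive flock."""
    # O_CREAT so the very first hit works; plain "r+" would fail
    # on a missing file.
    fd = os.open(path, os.O_RDWR | os.O_CREAT)
    f = os.fdopen(fd, "r+")
    try:
        fcntl.flock(f, fcntl.LOCK_EX)   # blocks until we hold the lock
        data = f.read().strip()
        count = int(data) if data else 0
        f.seek(0)
        f.truncate()
        f.write(str(count + 1))
        f.flush()
    finally:
        fcntl.flock(f, fcntl.LOCK_UN)
        f.close()
    return count + 1
```

Because LOCK_EX blocks, a second process arriving mid-update simply
waits its turn instead of clobbering the file.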

--
John Lenton ([email protected]) -- Random fortune:
BOFH excuse #44:

bank holiday - system operating credits not recharged

 
Paul Rubin

John Lenton said:
flock(fp, LOCK_EX) # block until can write ...
Of course I'm probably overlooking something, because it really can't
be this easy, can it?

Yes, maybe so. I'm just way behind the times and didn't realize flock
would block until existing incompatible locks are released. That may
solve the whole timeout/retry problem. I should have checked the docs
more carefully earlier; I was thinking only in terms of opening with
O_EXCL. Thanks!
 
Michele Simionato

John Lenton said:
<snip simple example with flock>

What happens if for any reason the application crashes?
Locked files will stay locked or not? And if yes, how do I
unlock them?

Michele Simionato
 
Nick Craig-Wood

Paul Rubin said:
The issue with using an rdbms is not with the small amount of code
needed to connect to it and query it, but in the overhead of
installing the huge piece of software (the rdbms) itself, and keeping
the rdbms server running all the time so the infrequently used app can
connect to it.

I've found SQLObject to be a really good way of poking objects into an
SQL database with zero hassle.

It can also use SQLite (which I haven't tried), which gets rid of your
large rdbms process but also gives you a migration path should the
problem expand.
Paul Rubin said:
ZODB is also a big piece of software to install. Is it at least
100% Python with no C modules required? Does it need a separate
server process? If it needs either C modules or a separate server,
it really can't be called a low-end strategy.

ZODB looks fun. I just wish (being lazy) that there was a separate
Debian package for just it and not the whole of Zope.
 
John Lenton

Michele Simionato said:
<snip simple example with flock>

What happens if for any reason the application crashes?
Locked files will stay locked or not? And if yes, how do I
unlock them?

the operating system cleans up the lock.
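That claim is easy to check empirically: have a child process take the
lock and exit without releasing it, then see whether the parent can
grab it. A quick sketch in modern Python (`lock_released_after_crash`
is a made-up helper name):

```python
import fcntl
import subprocess
import sys

# Child process: take an exclusive flock and exit WITHOUT unlocking,
# simulating a crash while holding the lock.
CHILD_CODE = """
import fcntl, sys
f = open(sys.argv[1], "w")
fcntl.flock(f, fcntl.LOCK_EX)
# exit without calling LOCK_UN
"""

def lock_released_after_crash(path):
    """True if the kernel dropped the child's flock when it exited."""
    subprocess.run([sys.executable, "-c", CHILD_CODE, path], check=True)
    # If the lock had survived the child, this non-blocking attempt
    # would raise; since the OS cleaned it up, it succeeds.
    with open(path, "w") as f:
        try:
            fcntl.flock(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
            return True
        except OSError:
            return False
```

This is the key property that makes flock safer than a hand-rolled
lockfile: there is no stale lock to clean up after a crash.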

--
John Lenton ([email protected]) -- Random fortune:
Linux ext2fs has been stable for a long time, now it's time to break it
-- Linuxkongreß '95 in Berlin

 
Michele Simionato

John Lenton:
the operating system cleans up the lock.

So, are you effectively saying that a custom-made solution based on
flock can be quite reliable, and that it could be a reasonable choice
to use shelve+flock for small/hobbyist sites? I always thought locking
was a bad beast and feared to implement it myself, but maybe I was
wrong after all ...

Michele Simionato
 
John Lenton

Michele Simionato said:

So, are you effectively saying that a custom-made solution based on
flock can be quite reliable, and that it could be a reasonable choice
to use shelve+flock for small/hobbyist sites? I always thought locking
was a bad beast and feared to implement it myself, but maybe I was
wrong after all ...

locking works very well, when it works. If you're on Linux, the
manpage for flock has a NOTES section you should read. I don't know
how direct the mapping between Python's flock/lockf and the OS's
flock/lockf is; you might want to look into that as well (but you'd
only really care if you hit one of the corner cases mentioned in
that NOTES section).

In some weird corner cases you'd have to revert to some other locking
scheme, but the same pattern applies: subclass whatever it is you want
to use, wrapping the appropriate methods in try/finally lock/unlocks;
you just swap the flock for something else.

Also, if you use something where the process doesn't terminate between
calls (such as mod_python, I guess), you have to be sure to write the
try/finallys around your locking code, because the OS only cleans up
the lock when the process exits.
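One way to make that try/finally discipline hard to forget is to wrap
the lock/unlock pair in a context manager (a sketch; `locked` is a
made-up name):

```python
import fcntl
from contextlib import contextmanager

@contextmanager
def locked(f, how=fcntl.LOCK_EX):
    """Hold an flock on file object `f` for the duration of a
    with-block, releasing it even if the body raises -- which matters
    in long-lived processes where the OS won't clean up for you."""
    fcntl.flock(f, how)
    try:
        yield f
    finally:
        fcntl.flock(f, fcntl.LOCK_UN)
```

Usage is then just `with locked(open(path, "r+")) as f: ...`, and the
unlock happens on every exit path, exception or not.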

--
John Lenton ([email protected]) -- Random fortune:
Keep emotionally active. Cater to your favorite neurosis.

 
Pierre Quentel

Maybe you'll find this too naive, but why do you want to avoid
concurrent accesses to a database that will be accessed 12 times a day ?

Regards,
Pierre
 
John Lenton

Pierre Quentel said:
Maybe you'll find this too naive, but why do you want to avoid
concurrent accesses to a database that will be accessed 12 times a day ?

because every sunday at 3am your boss and his wife will both try to
use the script at the same time, and delete everything.

--
John Lenton ([email protected]) -- Random fortune:
If our behavior is strict, we do not need fun!

 
Paul Rubin

John Lenton said:
because every sunday at 3am your boss and his wife will both try to
use the script at the same time, and delete everything.

Yes, I think that could be pretty typical. For example, say I write a
cgi to maintain a signup list for a party I'm having. I email an
invitation out to some friends with a url to click if they want to
attend. If a dozen people click the url in the next day, several of
them will probably click it in the first minute or so after the email
goes out. So two simultaneous clicks isn't implausible.

More generally, I don't like writing code with bugs even if the bugs
have fairly low chance of causing trouble. So I'm looking for the
easiest way to do this kind of thing without bugs.
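For a signup list like that, shelve plus a sidecar flock file is about
as small as it gets. A sketch in modern Python syntax (`add_signup` and
the `.lock` suffix are made up for illustration):

```python
import fcntl
import shelve

def add_signup(dbpath, name):
    """Record one signup. A sidecar lock file serializes writers, so
    two simultaneous clicks can't clobber each other's update."""
    with open(dbpath + ".lock", "w") as lock:
        fcntl.flock(lock, fcntl.LOCK_EX)   # blocks until it's our turn
        try:
            db = shelve.open(dbpath)
            try:
                guests = db.get("guests", [])
                if name not in guests:
                    guests.append(name)
                    db["guests"] = guests
            finally:
                db.close()
        finally:
            fcntl.flock(lock, fcntl.LOCK_UN)
    return guests
```

Locking a separate file rather than the shelve itself sidesteps the
question of which underlying dbm files shelve actually opens, and the
kernel drops the flock if the cgi process dies mid-update.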
 
Michele Simionato

John Lenton:
Also, if you use something where the process doesn't terminate between
calls (such as mod_python, I guess), you have to be sure to write the
try/finallys around your locking code, because the OS only cleans up
the lock when the process exits.

This is what I feared. What happens in the case of a power failure?
Am I left with locked files floating around?

Michele Simionato
 
