Re: database connection pooling from mod_python app

Discussion in 'Python' started by Matt Goodall, Oct 8, 2003.


    Ian Bicking wrote:

    > On Wednesday, October 8, 2003, at 11:23 AM, Anthony_Barker wrote:
    >> I have been searching around for database connection pooling for a
    >> mod_python app.

    > My understanding is that you don't really need connection pooling in
    > mod_python -- you can simply put the connection in a global variable.

    In fact, connection pooling within an application hosted by a forking
    Apache server is redundant since only one connection will ever be
    required at a time. 1 process == 1 request == 1 connection.

    > Maybe with something like:
    >
    >     def get_connection():
    >         global _conn
    >         try:
    >             return _conn
    >         except NameError:
    >             _conn = (make your connection)
    >             return _conn
    >
    > I don't know mod_python enough to know if there's other magic
    > incantations you need. You don't need "pooling", because each request
    > is served in its own process -- you simply want to reuse the
    > connection for future requests. If there's 10 processes, you'll need
    > 10 separate connections, but those 10 processes will over time serve
    > many requests so you can use those 10 connections over and over.
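    Ian's pattern can be written out as a runnable sketch. A module-level
    cache replaces the try/except NameError dance; sqlite3 is used here only
    as a stand-in for whatever DB-API module you actually use.

```python
# A minimal sketch of Ian's idea: cache the connection at module scope so
# each Apache child process opens it once and reuses it across requests.
# sqlite3 is only a stand-in for your real DB-API module and DSN.
import sqlite3

_conn = None

def get_connection():
    """Return the process-wide connection, opening it on first use."""
    global _conn
    if _conn is None:
        _conn = sqlite3.connect(":memory:")  # replace with your real DSN
    return _conn
```

    Each Apache child process ends up with exactly one connection, opened
    lazily on the first request that needs it.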

    Ian's system will work just fine (although I think a module-scope
    variable is nicer than a global) but there are a few things to be aware
    of as detailed below. First of all I would like to define a couple of
    terms to try to avoid any confusion:

    "persistent connection" means a database connection which is opened once
    and never closed.

    "connection pool" means a managed collection of (possibly) open
    connections. A connection pool often has a maximum pool size and may
    wait for a period of time (i.e. block) for a connection to be closed
    rather than open a new connection if the pool size is exceeded.
    Connection objects from a pool are generally wrapped so that close()
    returns the connection to the pool rather than actually closing it.
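    To make that terminology concrete, here is a minimal connection pool
    sketch (the names are my own, not from any particular library): a
    bounded pool that blocks when exhausted, calls rollback() as a
    connection is returned, and wraps close() so the connection goes back
    to the pool rather than really closing. sqlite3 again stands in for
    your real database module. Note this simple version is not thread-safe;
    a real pool would guard its bookkeeping with a lock.

```python
# Sketch of a bounded connection pool with wrapped close().
import queue
import sqlite3


class PooledConnection:
    """Wraps a real connection; close() returns it to the pool."""

    def __init__(self, conn, pool):
        self._conn = conn
        self._pool = pool

    def __getattr__(self, name):
        # Delegate cursor(), commit(), execute() etc. to the real connection.
        return getattr(self._conn, name)

    def close(self):
        self._pool._release(self._conn)


class ConnectionPool:
    def __init__(self, connect, max_size=5):
        self._connect = connect      # factory, e.g. lambda: db.connect(dsn)
        self._idle = queue.Queue()
        self._max_size = max_size
        self._created = 0

    def get(self, timeout=None):
        try:
            conn = self._idle.get_nowait()
        except queue.Empty:
            if self._created < self._max_size:
                conn = self._connect()
                self._created += 1
            else:
                # Pool exhausted: block until a connection is returned.
                conn = self._idle.get(timeout=timeout)
        return PooledConnection(conn, self)

    def _release(self, conn):
        conn.rollback()  # never let a transaction span two requests
        self._idle.put(conn)


pool = ConnectionPool(lambda: sqlite3.connect(":memory:"), max_size=2)
```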

    1. Reused Transaction vs Connection Pool Leaks

    It is critical that a persistent connection's transaction is *always*
    completed - either with commit or rollback. If you forget to do this
    then database operations during the next request that the process
    handles will use the same transaction ... which could be disastrous.

    A connection pool will only give out a connection that is "closed" and,
    typically, rollback() is called by the pool manager as a connection is
    returned. This avoids the problem of transactions spanning multiple
    requests. A common problem when using connection pools is the
    connection leak - a connection is not closed and hence is never
    returned to the pool - but IMHO this is less serious. I would rather
    see an application
    hang waiting for a connection to become available than have corrupt data.

    The reused transaction and connection leak problems are both programming
    errors (use a finally block to avoid the problem) but it is easier to
    spot a connection pool leak - set the maximum pool size to a low number
    and hammer the application. You'll soon find out if there are any
    problems ;-). Add some logging to the pool implementation to record who
    opens and closes connections and you can easily find offending code that
    forgets to call close().
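    The finally-block advice looks like this in practice (handle_request
    and the SQL are hypothetical; with a pooled connection, close() in the
    finally clause returns it to the pool instead of really closing it):

```python
# Whatever happens during the request, the transaction is completed and
# the connection is released. sqlite3 stands in for your real database.
import sqlite3

def handle_request(conn):
    try:
        cur = conn.cursor()
        cur.execute("CREATE TABLE IF NOT EXISTS hits (n INTEGER)")
        cur.execute("INSERT INTO hits VALUES (1)")
        conn.commit()       # complete the transaction explicitly...
    except Exception:
        conn.rollback()     # ...or roll it back on any failure
        raise
    finally:
        conn.close()        # always release the connection
```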

    2. Apache forking

    As Ian mentions, Apache creates multiple processes to handle requests. A
    default installation of Apache 1.x on Debian has a MaxClients of 150,
    i.e. 150 processes. If each one of those processes maintains a
    persistent connection you can easily hit your database server's
    connection limit. 150 is a *lot* of connections to keep open anyway but
    many requests will not even need database access, i.e. images, CSS,
    JavaScript etc.

    Perhaps you've tuned your Apache config and set MaxClients to something
    more appropriate, 30 for instance. Even then, it's likely that the
    number of simultaneous connections that are _required_ is much lower
    than 30; you might only need 5.

    3. Multiple Applications

    Now let's say your web server hosts 2 applications and both applications
    use a different database. If persistent connections are used you've just
    doubled the maximum number of open connections! This scenario applies to
    an Apache instance with virtual hosting and I think it's the same when
    mod_python's PythonInterpreter configuration directive is used.

    I'm not saying that persistent connections are inherently bad, just that
    there are a number of issues to be aware of with Apache. If you are using a
    forking Apache setup and you cannot afford to open new database
    connections on each request then persistent connections may be your only
    option.
    However, if you need connection pooling to allow your application to
    scale effectively then you may be better off using an application server
    (Zope, Webware, Twisted etc) or something like SCGI. Another
    possibility, which I have not yet tried, is SQL Relay.

    Sorry for the length of this post, I hope it helps.

    Cheers, Matt

    Matt Goodall, Pollenation Internet Ltd
