Multiple modules with database access + general app design?

Discussion in 'Python' started by Robin Haswell, Jan 19, 2006.

  1. Hey people

    I'm an experienced PHP programmer who's been writing Python for a couple of
    weeks now. I'm writing quite a large application which I've decided to
    break down into lots of modules (replacement for PHP's include()
    statement).

    My problem is, in PHP if you open a database connection it's always in
    scope for the duration of the script. Even if you use an abstraction layer
    ($db = DB::connect(...)) you can `global $db` and bring it into scope,
    but in Python I'm having trouble keeping the database in scope. At the
    moment I'm having to "push" the database into the module, but I'd prefer
    the module to bring the database connection in ("pull") from its parent.

    Eg:
    import modules
    modules.foo.c = db.cursor()
    modules.foo.Bar()

    Can anyone recommend any "cleaner" solutions to all of this? As far as I
    can see it, Python doesn't have much support for breaking down large
    programs in to organisable files and referencing each other.

    Another problem is I keep having to import modules all over the place. A
    real example is, I have a module "webhosting", a module "users", and a
    module "common". These are all submodules of the module "modules" (bad
    naming I know). The database connection is instantiated on the "db"
    variable of my main module, which is "yellowfish" (a global module), so
    I get the situation where:

    (yellowfish.py)
    import modules
    modules.webhosting.c = db.cursor()
    modules.webhosting.Something()

    webhosting needs methods in common and users:

    from modules import common, users

    However users also needs common:

    from modules import common

    And they all need access to the database

    (users and common)
    from yellowfish import db
    c = db.cursor()

    Can anyone give me advice on making this all a bit more transparent? I
    guess I really would like a method to bring all these files in to the same
    scope to make everything seem to be all one application, even though
    everything is broken up in to different files.

    One added complication in this particular application:

    I used modules because I'm calling arbitrary methods defined in some XML
    format. Obviously I wanted to keep security in mind, so my application
    goes something like this:

    import modules
    module, method, args = getXmlAction()
    m = getattr(modules, module)
    m.c = db.cursor()
    f = getattr(m, method)
    f(args)

    In PHP this method is excellent, because I can include all the files I
    need, each containing a class, and I can use variable variables:

    <?php
    $class = new $module; // can't remember if this works, there are
    // alternatives though
    $class->$method($args);
    ?>

    And $class->$method() just does "global $db; $db->query(...);".

    Any advice would be greatly appreciated!

    Cheers

    -Robin Haswell
    Robin Haswell, Jan 19, 2006
    #1

  2. Paul McGuire

    "Robin Haswell" <> wrote in message
    news:p...
    > Hey people
    >
    > I'm an experienced PHP programmer who's been writing python for a couple of
    > weeks now. I'm writing quite a large application which I've decided to
    > break down in to lots of modules (replacement for PHP's include()
    > statement).
    >
    > My problem is, in PHP if you open a database connection it's always in
    > scope for the duration of the script. Even if you use an abstraction layer
    > ($db = DB::connect(...)) you can `global $db` and bring it in to scope,
    > but in Python I'm having trouble keeping the database in scope. At the
    > moment I'm having to "push" the database into the module, but I'd prefer
    > the module to bring the database connection in ("pull") from its parent.
    >
    > Eg:
    > import modules
    > modules.foo.c = db.cursor()
    > modules.foo.Bar()
    >
    > Can anyone recommend any "cleaner" solutions to all of this?


    Um, I think your Python solution *is* moving in a cleaner direction than
    simple sharing of a global $db variable. Why make the Bar class have to
    know where to get a db cursor from? What do you do if your program extends
    to having multiple Bar() objects working with different cursors into the db?

    The unnatural part of this (and hopefully, the part that you feel is
    "unclean") is that you're trading one global for another. By just setting
    modules.foo.c to the db cursor, you force all Bar() instances to use that
    same cursor.

    Instead, make the database cursor part of Bar's constructor. Now you can
    externally create multiple db cursors, a Bar for each, and they all merrily
    do their own separate, isolated processing, in blissful ignorance of each
    other's db cursors (vs. colliding on the shared $db variable).
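
    A minimal sketch of the constructor approach described above, written for
    current Python 3. It uses the standard-library sqlite3 module with an
    in-memory database as a stand-in for MySQLdb so that it runs anywhere; the
    body of Bar, the users table, and the method names are illustrative
    assumptions, not the original poster's code.

```python
import sqlite3


class Bar:
    """Hypothetical Bar that receives its cursor instead of finding a global."""

    def __init__(self, cursor):
        self.cursor = cursor

    def add_user(self, name):
        self.cursor.execute("INSERT INTO users (name) VALUES (?)", (name,))

    def count_users(self):
        self.cursor.execute("SELECT COUNT(*) FROM users")
        return self.cursor.fetchone()[0]


# The caller owns the connection and decides which cursor each Bar gets.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (name TEXT)")

first = Bar(db.cursor())
second = Bar(db.cursor())
first.add_user("alice")
second.add_user("bob")
print(second.count_users())
```

    Because nothing in Bar refers to a module-level name, the same class can be
    handed a cursor from a different connection without any change.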

    -- Paul
    Paul McGuire, Jan 19, 2006
    #2

  3. On Thu, 19 Jan 2006 12:23:12 +0000, Paul McGuire wrote:

    > "Robin Haswell" <> wrote in message
    > news:p...
    >> Hey people
    >>
    >> I'm an experienced PHP programmer who's been writing python for a couple of
    >> weeks now. I'm writing quite a large application which I've decided to
    >> break down in to lots of modules (replacement for PHP's include()
    >> statement).
    >>
    >> My problem is, in PHP if you open a database connection it's always in
    >> scope for the duration of the script. Even if you use an abstraction layer
    >> ($db = DB::connect(...)) you can `global $db` and bring it in to scope,
    >> but in Python I'm having trouble keeping the database in scope. At the
    >> moment I'm having to "push" the database into the module, but I'd prefer
    >> the module to bring the database connection in ("pull") from its parent.
    >>
    >> Eg:
    >> import modules
    >> modules.foo.c = db.cursor()
    >> modules.foo.Bar()
    >>
    >> Can anyone recommend any "cleaner" solutions to all of this?

    >
    > Um, I think your Python solution *is* moving in a cleaner direction than
    > simple sharing of a global $db variable. Why make the Bar class have to
    > know where to get a db cursor from? What do you do if your program extends
    > to having multiple Bar() objects working with different cursors into the db?
    >
    > The unnatural part of this (and hopefully, the part that you feel is
    > "unclean") is that you're trading one global for another. By just setting
    > modules.foo.c to the db cursor, you force all Bar() instances to use that
    > same cursor.
    >
    > Instead, make the database cursor part of Bar's constructor. Now you can
    > externally create multiple db cursors, a Bar for each, and they all merrily
    > do their own separate, isolated processing, in blissful ignorance of each
    > other's db cursors (vs. colliding on the shared $db variable).


    Hm if truth be told, I'm not totally interested in keeping a separate
    cursor for every class instance. This application runs in a very simple
    threaded socket server - every time a new thread is created, we create a
    new db.cursor (m = getattr(modules, module)\n m.c = db.cursor() is the
    first part of the thread), and when the thread finishes all its actions
    (of which there are many, but all sequential), the thread exits. I don't
    see any situations where lots of methods will tread on another method's
    cursor. My main focus really is minimising the number of connections.
    Using MySQLdb, I'm not sure if every MySQLdb.connect or db.cursor is a
    separate connection, but I get the feeling that a lot of cursors = a lot
    of connections. I'd much prefer each method call within a thread to reuse
    that thread's connection, as creating a connection incurs significant
    overhead on the MySQL server and DNS server.

    -Rob

    >
    > -- Paul
    Robin Haswell, Jan 19, 2006
    #3
  4. Robin Haswell wrote:
    > cursor for every class instance. This application runs in a very simple
    > threaded socket server - every time a new thread is created, we create a
    > new db.cursor (m = getattr(modules, module)\n m.c = db.cursor() is the
    > first part of the thread), and when the thread finishes all its actions
    > (of which there are many, but all sequential), the thread exits. I don't


    If you use a threading server, you can't put the connection object into
    the module. Modules and hence module variables are shared across
    threads. You could use thread local storage, but I think it's better to
    pass the connection explicitly as a parameter.

    > separate connection, but I get the feeling that a lot of cursors = a lot
    > of connections. I'd much prefer each method call with a thread to reuse
    > that thread's connection, as creating a connection incurs significant
    > overhead on the MySQL server and DNS server.


    You can create several cursor objects from one connection. There should
    be no problems if you finish processing of one cursor before you open
    the next one. In earlier (current?) versions of MySQL, only one result
    set could be opened at a time, so using cursors in parallel presents some
    problems to the driver implementor.
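
    A short sketch of that pattern: two cursors created from one connection and
    used strictly one after the other. sqlite3 is used here only because it
    ships with Python; the items table is a made-up example, and MySQLdb's
    behaviour with overlapping result sets is not demonstrated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # one connection for the whole unit of work
conn.execute("CREATE TABLE items (name TEXT)")

# First cursor: write, then close it before opening the next one.
writer = conn.cursor()
writer.executemany("INSERT INTO items (name) VALUES (?)", [("a",), ("b",)])
writer.close()

# Second cursor on the same connection; no new connection is opened.
reader = conn.cursor()
reader.execute("SELECT name FROM items ORDER BY name")
names = [row[0] for row in reader.fetchall()]
same_connection = reader.connection is conn
reader.close()

conn.commit()
```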

    Daniel
    Daniel Dittmar, Jan 19, 2006
    #4
  5. On Thu, 19 Jan 2006 14:37:34 +0100, Daniel Dittmar wrote:

    > Robin Haswell wrote:
    >> cursor for every class instance. This application runs in a very simple
    >> threaded socket server - every time a new thread is created, we create a
    >> new db.cursor (m = getattr(modules, module)\n m.c = db.cursor() is the
    >> first part of the thread), and when the thread finishes all its actions
    >> (of which there are many, but all sequential), the thread exits. I don't

    >
    > If you use a threading server, you can't put the connection object into
    > the module. Modules and hence module variables are shared across
    > threads. You could use thread local storage, but I think it's better to
    > pass the connection explicitly as a parameter.


    Would you say it would be better if in every thread I did:

    m = getattr(modules, module)
    m.db = db

    ...

    def Foo():
        c = db.cursor()

    ?

    >
    >> separate connection, but I get the feeling that a lot of cursors = a lot
    >> of connections. I'd much prefer each method call with a thread to reuse
    >> that thread's connection, as creating a connection incurs significant
    >> overhead on the MySQL server and DNS server.

    >
    > You can create several cursor objects from one connection. There should
    > be no problems if you finish processing of one cursor before you open
    > the next one. In earlier (current?) versions of MySQL, only one result
    > set could be opened at a time, so using cursors in parallel presents some
    > problems to the driver implementor.
    >
    > Daniel
    Robin Haswell, Jan 19, 2006
    #5
  6. Robin Haswell wrote:
    > Hey people
    >
    > I'm an experienced PHP programmer who's been writing python for a couple of
    > weeks now. I'm writing quite a large application which I've decided to
    > break down in to lots of modules (replacement for PHP's include()
    > statement).
    >
    > My problem is, in PHP if you open a database connection it's always in
    > scope for the duration of the script. Even if you use an abstraction layer
    > ($db = DB::connect(...)) you can `global $db` and bring it in to scope,
    > but in Python I'm having trouble keeping the database in scope. At the
    > moment I'm having to "push" the database into the module, but I'd prefer
    > the module to bring the database connection in ("pull") from its parent.
    >


    This is what I do.

    Create a separate module to contain your global variables - mine is
    called 'common'.

    In common, create a class, with attributes, but with no methods. Each
    attribute becomes a global variable. My class is called 'c'.

    At the top of every other module, put 'from common import c'.

    Within each module, you can now refer to any global variable as
    c.whatever.

    You can create class attributes on the fly. You can therefore have
    something like -

    c.db = MySql.connect(...)

    All modules will be able to access c.db

    As Daniel has indicated, it may not be safe to share one connection
    across multiple threads, unless you can guarantee that one thread
    completes its processing before another one attempts to access the
    database. You can use threading locks to assist with this.
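
    A rough, single-file sketch of this idea. The class c would normally live
    in its own common.py; the db_lock attribute, the hits table, and
    record_hit() are hypothetical names added for illustration, and sqlite3
    stands in for the MySQL driver.

```python
import sqlite3
import threading


class c:
    """Attribute-only holder for shared objects (would live in common.py)."""


# Set once at startup; every module that imports c sees the same attributes.
# check_same_thread=False is sqlite3-specific and only safe because every
# access below is serialized by the lock.
c.db = sqlite3.connect(":memory:", check_same_thread=False)
c.db_lock = threading.Lock()
c.db.execute("CREATE TABLE hits (worker INTEGER)")


def record_hit(worker_id):
    # Only one thread at a time may touch the shared connection.
    with c.db_lock:
        c.db.execute("INSERT INTO hits (worker) VALUES (?)", (worker_id,))
        c.db.commit()


threads = [threading.Thread(target=record_hit, args=(i,)) for i in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()

with c.db_lock:
    total = c.db.execute("SELECT COUNT(*) FROM hits").fetchone()[0]
```

    The lock makes the shared connection usable from several threads, at the
    cost of running all database work one request at a time.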

    HTH

    Frank Millman
    Frank Millman, Jan 19, 2006
    #6
  7. Robin Haswell wrote:
    > On Thu, 19 Jan 2006 14:37:34 +0100, Daniel Dittmar wrote:
    >>If you use a threading server, you can't put the connection object into
    >>the module. Modules and hence module variables are shared across
    >>threads. You could use thread local storage, but I think it's better to
    >>pass the connection explicitly as a parameter.

    >
    >
    > Would you say it would be better if in every thread I did:
    >
    > m = getattr(modules, module)
    > m.db = db
    >
    > ...
    >
    > def Foo():
    >     c = db.cursor()
    >


    I was thinking (example from original post):

    import modules
    modules.foo.Bar(db.cursor ())

    # file modules.foo
    def Bar (cursor):
        cursor.execute (...)

    The same is true for other objects like the HTTP request: always pass
    them as parameters because module variables are shared between threads.

    If you have an HTTP request object, then you could attach the database
    connection to that object, that way you have to pass only one object.

    Or you create a new class that encompasses everything useful for this
    request: the HTTP request, the database connection, possibly an object
    containing authorization infos etc.
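
    One possible shape for such a per-request class, as a sketch only. The
    RequestContext name, its fields, and the audit table are assumptions made
    for the example; sqlite3 stands in for the real database driver.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class RequestContext:
    """Hypothetical per-request bundle; the field names are assumptions."""
    db: sqlite3.Connection
    username: str
    operation: str


def handle(ctx):
    # Everything the handler needs arrives through ctx, not module globals.
    cur = ctx.db.cursor()
    cur.execute(
        "INSERT INTO audit (username, operation) VALUES (?, ?)",
        (ctx.username, ctx.operation),
    )
    cur.close()
    ctx.db.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE audit (username TEXT, operation TEXT)")
handle(RequestContext(db=conn, username="rob", operation="create_site"))
rows = conn.execute("SELECT username, operation FROM audit").fetchall()
```

    Each thread builds its own context, so functions only ever see the
    connection that belongs to the request they are serving.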

    I assume that in PHP, global still means 'local to this request', as PHP
    probably runs in threads under Windows IIS (and Apache 2.0?). In Python,
    you have to be more explicit about the scope.

    Daniel
    Daniel Dittmar, Jan 19, 2006
    #7
  8. On Thu, 19 Jan 2006 15:43:58 +0100, Daniel Dittmar wrote:

    > Robin Haswell wrote:
    >> On Thu, 19 Jan 2006 14:37:34 +0100, Daniel Dittmar wrote:
    >>>If you use a threading server, you can't put the connection object into
    >>>the module. Modules and hence module variables are shared across
    >>>threads. You could use thread local storage, but I think it's better to
    >>>pass the connection explicitly as a parameter.

    >>
    >>
    >> Would you say it would be better if in every thread I did:
    >>
    >> m = getattr(modules, module)
    >> m.db = db
    >>
    >> ...
    >>
    >> def Foo():
    >>     c = db.cursor()
    >>

    >
    > I was thinking (example from original post):
    >
    > import modules
    > modules.foo.Bar(db.cursor ())
    >
    > # file modules.foo
    > def Bar (cursor):
    >     cursor.execute (...)


    Ah I see.. sounds interesting. Is it possible to make any module variable
    local to a thread, if set within the current thread? Your method, although
    good, would mean revising all my functions in order to make it work?

    Thanks
    Robin Haswell, Jan 19, 2006
    #8
  9. On Thu, 19 Jan 2006 06:38:39 -0800, Frank Millman wrote:

    >
    > Robin Haswell wrote:
    >> Hey people
    >>
    >> I'm an experienced PHP programmer who's been writing python for a couple of
    >> weeks now. I'm writing quite a large application which I've decided to
    >> break down in to lots of modules (replacement for PHP's include()
    >> statement).
    >>
    >> My problem is, in PHP if you open a database connection it's always in
    >> scope for the duration of the script. Even if you use an abstraction layer
    >> ($db = DB::connect(...)) you can `global $db` and bring it in to scope,
    >> but in Python I'm having trouble keeping the database in scope. At the
    >> moment I'm having to "push" the database into the module, but I'd prefer
    >> the module to bring the database connection in ("pull") from its parent.
    >>

    >
    > This is what I do.
    >
    > Create a separate module to contain your global variables - mine is
    > called 'common'.
    >
    > In common, create a class, with attributes, but with no methods. Each
    > attribute becomes a global variable. My class is called 'c'.
    >
    > At the top of every other module, put 'from common import c'.
    >
    > Within each module, you can now refer to any global variable as
    > c.whatever.
    >
    > You can create class attributes on the fly. You can therefore have
    > something like -
    >
    > c.db = MySql.connect(...)
    >
    > All modules will be able to access c.db
    >
    > As Daniel has indicated, it may not be safe to share one connection
    > across multiple threads, unless you can guarantee that one thread
    > completes its processing before another one attempts to access the
    > database. You can use threading locks to assist with this.
    >
    > HTH
    >
    > Frank Millman



    Thanks, that sounds like an excellent idea. While I don't think it applies
    to the database (threading seems to be becoming a bit of an issue at the
    moment), I know I can use that in other areas :)

    Cheers

    -Rob
    Robin Haswell, Jan 19, 2006
    #9
  10. Magnus Lycka

    Robin Haswell wrote:
    > Can anyone give me advice on making this all a bit more transparent? I
    > guess I really would like a method to bring all these files in to the same
    > scope to make everything seem to be all one application, even though
    > everything is broken up in to different files.


    This is very much a deliberate design decision in Python.
    I haven't used PHP, but in e.g. C, the #include directive
    means that you pollute your namespace with all sorts of
    strange names from all the third party libraries you are
    using, and this doesn't scale well. As your application
    grows, you'll get mysterious bugs due to strange name clashes,
    removing some module you no longer need means that your app
    won't build since the include file you no longer include in
    turn included another file that you should have included but
    didn't etc. In Python, explicit is better than implicit (type
    "import this" at the Python prompt) and while this causes some
    extra typing it helps with code maintenance. You can always
    see where a name in your current namespace comes from (unless
    you use "from xxx import *"). No magic!


    Concerning your database operations, it seems they are distributed
    over a lot of different modules, and that might also cause problems,
    whatever programming language we use. In typical database
    applications, you need to keep track of transactions properly.

    For each opened connection, you can perform a number of transactions
    after each other. A transaction starts with the first database
    operation after a connect, commit or rollback. A cursor should only
    live within a transaction. In other words, you should close all
    cursors before you perform a commit or rollback.

    I find it very difficult to manage transactions properly if the
    commits are spread out in the code. Usually I want one module to
    contain some kind of transaction management logic, where I determine
    the transaction boundaries. This logic will hand out cursor objects
    to various pieces of code, and determine when to close the cursors
    and commit the transaction.
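
    A small sketch of what such transaction-management logic could look like
    in current Python 3. The transaction() helper is a hypothetical name, not
    part of any library; it hands out one cursor, closes it, and then commits
    or rolls back. sqlite3 is used only so the example is self-contained.

```python
import sqlite3
from contextlib import contextmanager


@contextmanager
def transaction(conn):
    """Hand out a cursor; close it, then commit on success or roll back."""
    cur = conn.cursor()
    try:
        yield cur
    except Exception:
        cur.close()
        conn.rollback()
        raise
    else:
        cur.close()
        conn.commit()


conn = sqlite3.connect(":memory:")
with transaction(conn) as cur:
    cur.execute("CREATE TABLE users (name TEXT)")

with transaction(conn) as cur:
    cur.execute("INSERT INTO users (name) VALUES (?)", ("alice",))

try:
    with transaction(conn) as cur:
        cur.execute("INSERT INTO users (name) VALUES (?)", ("bob",))
        raise RuntimeError("simulated failure before commit")
except RuntimeError:
    pass  # the insert of "bob" was rolled back

names = [row[0] for row in conn.execute("SELECT name FROM users")]
```

    All commits and rollbacks now happen in one place, so the code that uses
    the cursor never decides the transaction boundary itself.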

    I haven't really written multithreaded applications, so I don't
    have any experiences in the problems that might cause. I know that
    it's a fairly common pattern to have all database transactions in
    one thread though, and to use Queue.Queue instances to pass data
    to and from the thread that handles DB.
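
    A sketch of that single-database-thread pattern, assuming Python 3, where
    the Queue module is named queue. The db_thread() and query() functions and
    the log table are invented for the example, and sqlite3 stands in for the
    real driver.

```python
import queue
import sqlite3
import threading


def db_thread(requests):
    """Own the only connection; serve (sql, params, reply_queue) requests."""
    conn = sqlite3.connect(":memory:")  # created and used in this thread only
    conn.execute("CREATE TABLE log (msg TEXT)")
    while True:
        item = requests.get()
        if item is None:  # sentinel: shut down
            break
        sql, params, reply = item
        try:
            result = conn.execute(sql, params).fetchall()
            conn.commit()
            reply.put(result)
        except sqlite3.Error as exc:
            conn.rollback()
            reply.put(exc)
    conn.close()


def query(requests, sql, params=()):
    """Called from any thread: send a request and wait for the answer."""
    reply = queue.Queue()
    requests.put((sql, params, reply))
    result = reply.get()
    if isinstance(result, Exception):
        raise result
    return result


requests = queue.Queue()
worker = threading.Thread(target=db_thread, args=(requests,))
worker.start()
query(requests, "INSERT INTO log (msg) VALUES (?)", ("hello",))
rows = query(requests, "SELECT msg FROM log")
requests.put(None)
worker.join()
```

    Here every request is its own transaction; grouping several statements
    into one transaction would need a richer request format than this sketch.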

    Anyway, you can only have one transaction going on at a time for
    a connection, so if you share connections between threads (or use
    a separate DB thread and queues) a rollback or commit in one thread
    will affect the other threads as well...

    Each DB-API 2.0 compliant library should be able to declare how it
    can be used in a threaded application. See the DB-API 2.0 spec:
    http://python.org/peps/pep-0249.html Look for "threadsafety".
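
    For illustration, the attribute can be read directly from the driver
    module. sqlite3 is used below only because it is in the standard library;
    any DB-API 2.0 module is expected to expose the same names. The LEVELS
    dictionary is a paraphrase of the meanings given in the spec.

```python
import sqlite3

# Meaning of each threadsafety level, paraphrased from PEP 249.
LEVELS = {
    0: "threads may not share the module",
    1: "threads may share the module, but not connections",
    2: "threads may share the module and connections",
    3: "threads may share the module, connections and cursors",
}

level = sqlite3.threadsafety
print(sqlite3.apilevel, level, LEVELS[level])
```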
    Magnus Lycka, Jan 19, 2006
    #10
  11. Robin Haswell wrote:
    > Ah I see.. sounds interesting. Is it possible to make any module variable
    > local to a thread, if set within the current thread?


    Not directly. The following class tries to simulate it (only in Python 2.4):

    import threading

    class ThreadLocalObject (threading.local):
        def setObject (self, object):
            setattr (self, 'object', object)

        def clearObject (self):
            setattr (self, 'object', None)

        def __getattr__ (self, name):
            object = threading.local.__getattribute__ (self, 'object')
            return getattr (object, name)

    You use it as:

    in some module x:

    db = ThreadLocalObject ()

    in some module that creates the database connection:

    import x

    def createConnection ():
        localdb = ...connect (...)
        x.db.setObject (localdb)

    in some module that uses the database connection:

    import x

    def bar ():
        cursor = x.db.cursor ()

    The trick is:
    - every attribute of a threading.local is thread local (see doc of
    module threading)
    - when accessing an attribute of object x.db, the method __getattr__
    will first retrieve the thread local database connection and then access
    the specific attribute of the database connection. Thus it looks as if
    x.db is itself a database connection object.

    That way, only the setting of the db variable would have to be changed.

    I'm not exactly recommending this, as it seems very error prone to me.
    It's easy to overwrite the variable holding the cursors with an actual
    cursor object.
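
    A self-contained way to watch the trick work, rewritten for current
    Python 3. ThreadLocalProxy mirrors the class above under a different name,
    and FakeConnection is a made-up stand-in for a real connection object so
    that no database is needed.

```python
import threading


class FakeConnection:
    """Hypothetical stand-in for a real DB connection."""

    def __init__(self, label):
        self.label = label

    def cursor(self):
        return "cursor for " + self.label


class ThreadLocalProxy(threading.local):
    """Forward attribute access to an object stored per thread."""

    def set_object(self, obj):
        self._target = obj  # stored in the calling thread's private dict

    def __getattr__(self, name):
        # Only reached when normal lookup fails; raises AttributeError if
        # the current thread never called set_object().
        target = threading.local.__getattribute__(self, "_target")
        return getattr(target, name)


db = ThreadLocalProxy()  # module-level name, but its contents are per thread
results = {}


def worker(label):
    db.set_object(FakeConnection(label))
    results[label] = db.cursor()


threads = [threading.Thread(target=worker, args=(label,)) for label in ("a", "b")]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

    Each worker sees only the object it stored itself, and a thread that never
    called set_object() gets an AttributeError rather than another thread's
    connection.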

    Daniel
    Daniel Dittmar, Jan 19, 2006
    #11
  12. Daniel Dittmar wrote:
    > Robin Haswell wrote:
    > > Ah I see.. sounds interesting. Is it possible to make any module variable
    > > local to a thread, if set within the current thread?

    >
    > Not directly. The following class tries to simulate it (only in Python 2.4):
    >
    > import threading
    >
    > class ThreadLocalObject (threading.local):


    Daniel, perhaps you can help me here.

    I have subclassed threading.Thread, and I store a number of attributes
    within the subclass that are local to the thread. It seems to work
    fine, but according to what you say (and according to the Python docs,
    otherwise why would there be a 'Local' class) there must be some reason
    why it is not a good idea. Please can you explain the problem with this
    approach.

    Briefly, this is what I am doing.

    import select
    import socket
    import threading

    class Link(threading.Thread): # each link runs in its own thread
        """Run a loop listening for messages from client."""

        def __init__(self,args):
            threading.Thread.__init__(self)
            print 'link connected',self.getName()
            self.ctrl, self.conn = args
            self._db = {} # to store db connections for this client connection
            # [create various other local attributes]

        def run(self):
            readable = [self.conn.fileno()]
            error = []
            self.sendData = [] # 'stack' of replies to be sent

            self.running = True
            while self.running:
                if self.sendData:
                    writable = [self.conn.fileno()]
                else:
                    writable = []
                r,w,e = select.select(readable,writable,error,0.1) # 0.1 timeout
                # [continue to handle connection]

    class Controller(object):
        """Run a main loop listening for client connections."""

        def __init__(self):
            self.s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            self.s.bind((HOST,PORT))
            self.s.listen(5)
            self.running = True

        def mainloop(self):
            while self.running:
                try:
                    conn,addr = self.s.accept()
                    Link(args=(self,conn)).start() # create thread to handle connection
                except KeyboardInterrupt:
                    self.shutdown()

    Controller().mainloop()

    TIA

    Frank Millman
    Frank Millman, Jan 20, 2006
    #12
  13. Frank Millman wrote:
    > I have subclassed threading.Thread, and I store a number of attributes
    > within the subclass that are local to the thread. It seems to work
    > fine, but according to what you say (and according to the Python docs,
    > otherwise why would there be a 'Local' class) there must be some reason
    > why it is not a good idea. Please can you explain the problem with this
    > approach.


    Your design is just fine. If you follow the thread upwards, you'll
    notice that I encouraged the OP to pass everything by parameter.
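
    A stripped-down sketch of why that design works, in current Python 3:
    state kept on the Thread instance belongs to that one thread as long as
    only its own run() touches it. This Link is a simplified, hypothetical
    version without sockets; the messages and handled attributes are
    assumptions for the example.

```python
import threading


class Link(threading.Thread):
    """Each Link keeps its state on the instance; nothing is module-level."""

    def __init__(self, name, messages):
        super().__init__(name=name)
        self.messages = messages  # input private to this Link
        self.handled = []         # filled in by this thread only

    def run(self):
        for msg in self.messages:
            self.handled.append((self.name, msg.upper()))


links = [Link("link-1", ["a", "b"]), Link("link-2", ["c"])]
for link in links:
    link.start()
for link in links:
    link.join()
```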

    Using thread local storage in this case was meant to be a kludge so that
    not every def and every call has to be changed. There are other cases
    when you don't control how threads are created (say, a plugin for a web
    framework) where thread local storage is useful.

    threading.local is new in Python 2.4, so it doesn't seem to be that
    essential to Python thread programming.

    Daniel
    Daniel Dittmar, Jan 20, 2006
    #13
  14. Daniel Dittmar wrote:
    > Frank Millman wrote:
    > > I have subclassed threading.Thread, and I store a number of attributes
    > > within the subclass that are local to the thread. It seems to work
    > > fine, but according to what you say (and according to the Python docs,
    > > otherwise why would there be a 'Local' class) there must be some reason
    > > why it is not a good idea. Please can you explain the problem with this
    > > approach.

    >
    > Your design is just fine. If you follow the thread upwards, you'll
    > notice that I encouraged the OP to pass everything by parameter.
    >


    Many thanks, Daniel

    Frank
    Frank Millman, Jan 20, 2006
    #14