Re: Async client for PostgreSQL?

Discussion in 'Python' started by Laszlo Nagy, Sep 1, 2012.

  1. Laszlo Nagy

    Laszlo Nagy Guest


    > Hi
    >
    > does running on tornado imply that you would not consider twisted
    > http://twistedmatrix.com ?
    >
    > If not, twisted has exactly this capability hiding long running
    > queries on whatever db's behind deferToThread().

    All right, I was reading its documentation

    http://twistedmatrix.com/documents/10.1.0/api/twisted.internet.threads.deferToThread.html

    It doesn't say much about it: "Run a function in a thread and
    return the result as a Deferred."

    Run a function, but in what thread? Does it create a new thread for
    every invocation? If so, I don't want to use it. My example case: 10%
    of 100 requests/second deal with a database. But that does not mean
    that one db-related request makes only a single db API call. It will
    almost always make more: start a transaction, parse and open a query,
    fetch with a cursor, close the query, open another query, etc., then
    commit the transaction. That is 8 API calls for a quick fetch + update
    (usually under 100 msec, but it might be blocked by another transaction
    for a while...). So we are talking about at least 80 database API calls
    per second (10 db requests/second × 8 calls each). It would be insane
    to start a new thread for each invocation. And wrapping these API calls
    into a single closure function is not useful either, because that
    function would not be able to safely access the state that is stored
    in the main thread, unless you protect it with locks. But avoiding slow
    locks, expensive threads and context switching is the whole point of an
    async I/O server.
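A common way around the shared-state problem is to give the worker only plain input values and apply its return value to the shared state back on the event-loop thread. The sketch below uses Python's asyncio (3.9+) as a stand-in for Twisted or Tornado, which is an assumption; in Twisted the analogous step is adding a callback to the Deferred returned by `deferToThread`, since that Deferred fires in the reactor thread. `blocking_lookup`, `cache` and `mutating_threads` are hypothetical names.

```python
import asyncio
import threading
import time

# Hypothetical state owned by the event-loop thread; only ever mutated there.
cache: dict[int, str] = {}
mutating_threads: set[str] = set()


def blocking_lookup(item_id: int) -> str:
    """Stand-in for a whole batch of blocking DB-API calls.

    It only sees its arguments and returns plain data; it never touches
    `cache`, so it needs no lock.
    """
    time.sleep(0.01)  # pretend to wait on the database
    return f"row-{item_id}"


async def handle_request(item_id: int) -> str:
    loop = asyncio.get_running_loop()
    # Run the blocking work in the loop's default thread pool; this coroutine
    # is suspended meanwhile, so other requests keep being served.
    row = await loop.run_in_executor(None, blocking_lookup, item_id)
    # Back on the event-loop thread: safe to update shared state without locks.
    cache[item_id] = row
    mutating_threads.add(threading.current_thread().name)
    return row


async def main() -> None:
    await asyncio.gather(*(handle_request(i) for i in range(10)))


if __name__ == "__main__":
    asyncio.run(main())
    print(len(cache), mutating_threads)
```

Because only the event-loop thread writes to `cache`, no lock is needed; the trade-off is that the worker cannot read live shared state and must receive everything it needs as arguments.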

    Maybe deferToThread uses a thread pool? The documentation doesn't say
    much about it. (Am I reading the wrong documentation?) BTW, I could try
    a version that uses a thread pool.
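For reference, Twisted's `deferToThread` does not start a thread per call: it submits the function to the reactor's shared thread pool (`reactor.getThreadPool()`), which can be resized with `reactor.suggestThreadPoolSize()`. The sketch below shows the same property with the standard library instead of Twisted: 80 blocking calls (one second's worth in the example above) are served by at most `POOL_SIZE` reused worker threads. `db_call` and `POOL_SIZE = 4` are illustrative assumptions.

```python
import asyncio
import threading
import time
from concurrent.futures import ThreadPoolExecutor

POOL_SIZE = 4  # assumption: cap on concurrent blocking DB calls
CALLS = 80     # 10 db requests/second x 8 API calls each


def db_call(n: int) -> str:
    """Hypothetical blocking DB-API call; reports which worker thread ran it."""
    time.sleep(0.005)  # pretend to wait on the database
    return threading.current_thread().name


async def run_calls() -> set[str]:
    loop = asyncio.get_running_loop()
    # A single bounded executor: threads are created lazily, at most
    # POOL_SIZE of them, and reused for every call submitted to it.
    with ThreadPoolExecutor(max_workers=POOL_SIZE, thread_name_prefix="db") as pool:
        names = await asyncio.gather(
            *(loop.run_in_executor(pool, db_call, n) for n in range(CALLS))
        )
    return set(names)


if __name__ == "__main__":
    workers = asyncio.run(run_calls())
    print(f"{CALLS} calls ran on {len(workers)} threads: {sorted(workers)}")
```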

    It is sad, by the way. We have async I/O servers for Python that can
    handle a large number of clients, but most external modules/extensions
    do not support their I/O loops, including the extension modules of the
    most popular databases. So yes, you can use Twisted or tornadoweb until
    you need to call *some* API functions that are blocking. (By *some* I
    mean far fewer blocking calls than non-blocking ones, but still quite a
    few.) We also have synchronous Python servers, but we cannot get rid of
    the GIL, and Python threads are expensive and slow, so they cannot be
    used for a large number of clients. And finally, we have messaging
    services/IPC like zeromq. They are probably the most expensive option,
    but they scale very well; you just need more money to operate the
    underlying hardware. I'm starting to think that I did not get a quick
    answer because my use case (100 clients) falls into the "heavy weight"
    category, and the solution is to invest more in the hardware. :)
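The point about blocking calls is easy to measure. In the sketch below (asyncio again standing in for Tornado or Twisted, an assumption), a heartbeat task ticks every 10 ms while a 200 ms blocking call runs. Called directly inside a coroutine, `time.sleep` freezes the loop, so the heartbeat records no ticks; handed to a worker thread with `asyncio.to_thread` (Python 3.9+), the loop keeps ticking. `ticks_during_call` is a hypothetical name.

```python
import asyncio
import time


async def ticks_during_call(offload: bool, duration: float = 0.2) -> int:
    """Count heartbeat ticks that happen while a blocking call is in progress."""
    ticks = 0
    stop = asyncio.Event()

    async def heartbeat() -> None:
        nonlocal ticks
        while not stop.is_set():
            ticks += 1
            await asyncio.sleep(0.01)

    hb = asyncio.create_task(heartbeat())
    await asyncio.sleep(0)  # let the heartbeat take its first step
    before = ticks
    if offload:
        # Blocking call runs in a worker thread; the loop stays responsive.
        await asyncio.to_thread(time.sleep, duration)
    else:
        # Blocking call runs on the loop thread; nothing else can run.
        time.sleep(duration)
    during = ticks - before
    stop.set()
    await hb
    return during


if __name__ == "__main__":
    print("blocking in loop:", asyncio.run(ticks_during_call(False)))
    print("offloaded:       ", asyncio.run(ticks_during_call(True)))
```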

    Thanks,

    Laszlo
     
    Laszlo Nagy, Sep 1, 2012
    #1

  2. Laszlo Nagy

    Guest

    On Saturday, September 1, 2012 3:28:52 PM UTC-4, Laszlo Nagy wrote:
    > > Hi
    > >
    > > does running on tornado imply that you would not consider twisted
    > > http://twistedmatrix.com ?
    > >
    > > If not, twisted has exactly this capability hiding long running
    > > queries on whatever db's behind deferToThread().
    >
    > All right, I was reading its documentation
    >
    > http://twistedmatrix.com/documents/10.1.0/api/twisted.internet.threads.deferToThread.html
    >
    > It doesn't tell too much about it: "Run a function in a thread and
    > return the result as a Deferred.".


    You can find more documentation here:

    http://twistedmatrix.com/documents/current/core/howto/threading.html

    Also, Twisted has dedicated APIs for interacting with databases asynchronously:

    http://twistedmatrix.com/documents/current/core/howto/rdbms.html

    Additionally, there is a non-blocking (rather than thread-based) implementation of the above API available for PostgreSQL:

    http://pypi.python.org/pypi/txpostgres
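txpostgres builds on psycopg2's asynchronous connection support, so instead of parking a thread on each query, the reactor watches the connection's socket and resumes when data arrives. The sketch below shows that readiness-based idea in plain Python with `selectors`; the `socket.socketpair()` ends stand in for database connections (an assumption, no PostgreSQL is involved), and `collect_replies` is a hypothetical name.

```python
import selectors
import socket


def collect_replies(replies: dict[str, bytes]) -> dict[str, bytes]:
    """Gather several replies on ONE thread by waiting for socket readiness.

    Each socketpair stands in for one database connection: the `server` end
    plays the database and has already written its (small) reply.
    """
    sel = selectors.DefaultSelector()
    pairs = []
    for name, payload in replies.items():
        client, server = socket.socketpair()
        client.setblocking(False)        # never block in recv()
        sel.register(client, selectors.EVENT_READ, data=name)
        pairs.append((client, server))
        server.sendall(payload)          # small payload: fits in the socket buffer
        server.shutdown(socket.SHUT_WR)  # EOF marks the end of this reply
    received = {name: b"" for name in replies}
    pending = len(replies)
    while pending:
        # Sleep until at least one connection has data, then serve all ready ones.
        for key, _events in sel.select():
            chunk = key.fileobj.recv(4096)
            if chunk:
                received[key.data] += chunk
            else:                        # EOF: this reply is complete
                sel.unregister(key.fileobj)
                pending -= 1
    for client, server in pairs:
        client.close()
        server.close()
    sel.close()
    return received


if __name__ == "__main__":
    print(collect_replies({"q1": b"42", "q2": b"hello"}))
```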

     
    , Sep 3, 2012
    #2


Similar Threads
  1. Steven
    Replies: 0, Views: 446
    Steven, Nov 30, 2005
  2. Laszlo Nagy
    Async client for PostgreSQL?
    Laszlo Nagy, Sep 1, 2012, in forum: Python
    Replies: 2, Views: 337
  3. Werner Thie
    Re: Async client for PostgreSQL?
    Werner Thie, Sep 1, 2012, in forum: Python
    Replies: 0, Views: 271
    Werner Thie, Sep 1, 2012
  4. Werner Thie
    Re: Async client for PostgreSQL?
    Werner Thie, Sep 1, 2012, in forum: Python
    Replies: 0, Views: 315
    Werner Thie, Sep 1, 2012
  5. Laszlo Nagy
    Re: Async client for PostgreSQL?
    Laszlo Nagy, Sep 2, 2012, in forum: Python
    Replies: 0, Views: 244
    Laszlo Nagy, Sep 2, 2012