Some notes on a high-performance Python application.

Discussion in 'Python' started by John Nagle, Mar 26, 2008.

  1. John Nagle

    John Nagle Guest

    I run SiteTruth (sitetruth.com), which rates web sites for
    legitimacy, based on what information it can find out about
    the business behind the web site. I'm going to describe here
    how the machinery behind this is organized, because I had to
    solve some problems in Python that I haven't seen solved before.

    The site is intended mainly to support AJAX applications which
    query the site for every ad they see. You can download the AdRater
    client ("http://www.sitetruth.com/downloads/adrater.html") and use
    the site, if you like. It's an extension for Firefox, written in
    JavaScript. For every web page you visit, it looks for URLs that
    link to ad sites, and queries the server for a rating, then puts
    up icons on top of each ad indicating the rating of the advertiser.

    The client makes the query by sending a URL to an .fcgi program
    in Python, and gets XML back. So that's the interface.

    At the server end, there's a Linux/Apache/mod_fcgi/Python server.
    Requests come in via FCGI, and are assigned to an FCGI server process
    by Apache. The initial processing is straightforward; there's a
    MySQL database and a table of domains and ratings. If the site is
    known, a rating is returned immediately. This is all standard FCGI.

    If the domain hasn't been rated yet, things get interesting.
    The server returns an XML reply with a status code that tells the
    client to display a "busy" icon and retry in five seconds. Then
    the process of rating a site has to be started. This takes more
    resources and needs from 15 seconds to a minute, as pages from
    the site are read and processed.
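
    The two kinds of reply could look something like the sketch below.
    This is only an illustration, not the SiteTruth code: the post does
    not give the XML format, so the element names ("ratingreply",
    "status", "retry", "rating") are invented for the example.

```python
import xml.etree.ElementTree as ET

RETRY_SECONDS = 5  # the client is told to retry in five seconds


def build_reply(domain, rating=None):
    """Return an XML reply: the rating if known, else a 'busy, retry' status."""
    root = ET.Element("ratingreply")  # hypothetical element names throughout
    ET.SubElement(root, "domain").text = domain
    if rating is None:
        # Domain not rated yet: tell the client to show "busy" and retry.
        ET.SubElement(root, "status").text = "busy"
        ET.SubElement(root, "retry").text = str(RETRY_SECONDS)
    else:
        ET.SubElement(root, "status").text = "rated"
        ET.SubElement(root, "rating").text = rating
    return ET.tostring(root, encoding="unicode")
```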

    So we don't want to do rating inside the FCGI processes.
    We want FCGI processing to remain fast even during periods of heavy
    rating load. And we may need to spread the processing over multiple
    computers.

    So the FCGI program puts a rating request into the database,
    in a MySQL table of type ENGINE=MEMORY. This is just an in-memory
    table, something that MySQL supports but isn't used much. Each
    rating server has a "rating scheduler" process, which repeatedly
    reads from that table, looking for work to do. When it finds work,
    it marks the task as "in process".
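
    A minimal sketch of the queue and the "claim" step follows. It uses
    SQLite's in-memory database purely as a stand-in so it runs without
    a server; in MySQL the table would be declared with ENGINE=MEMORY.
    The table name, column names, and status strings are assumptions,
    not the real schema.

```python
import sqlite3


def make_queue(conn):
    # Stand-in for a MySQL table created with ENGINE=MEMORY.
    conn.execute(
        "CREATE TABLE ratingqueue ("
        " id INTEGER PRIMARY KEY,"
        " domain TEXT NOT NULL,"
        " requester_ip TEXT NOT NULL,"
        " status TEXT NOT NULL DEFAULT 'queued')"
    )


def enqueue(conn, domain, requester_ip):
    """What the FCGI program does when it sees an unrated domain."""
    conn.execute(
        "INSERT INTO ratingqueue (domain, requester_ip) VALUES (?, ?)",
        (domain, requester_ip),
    )
    conn.commit()


def claim_next(conn):
    """Claim the oldest queued task; return its domain, or None."""
    row = conn.execute(
        "SELECT id, domain FROM ratingqueue"
        " WHERE status = 'queued' ORDER BY id LIMIT 1"
    ).fetchone()
    if row is None:
        return None
    task_id, domain = row
    # Re-checking the status in the WHERE clause means that if two
    # schedulers pick the same row, only one UPDATE changes it.
    cur = conn.execute(
        "UPDATE ratingqueue SET status = 'in process'"
        " WHERE id = ? AND status = 'queued'",
        (task_id,),
    )
    conn.commit()
    return domain if cur.rowcount == 1 else None
```

    A scheduler that loses the race simply gets None back and polls again.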

    The rating scheduler launches multiple subprocesses to do ratings,
    all of which run at a lower priority than the rest of the system.
    The rating scheduler communicates with its subprocesses via pipes
    and Pickle. Launching a new subprocess for each rating is too slow;
    it adds several seconds as CPython loads code and starts up. So
    the subprocesses are reusable, like FCGI tasks. Every 100 uses or
    so, we terminate each subprocess and start another one, in case of
    memory leaks. (There seems to be a leak we can't find in M2Crypto.
    Guido couldn't find it either when he used M2Crypto, as he wrote in
    his blog.)
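
    The reusable-subprocess idea can be sketched roughly as below. It
    is a simplified illustration under stated assumptions: the worker
    is a tiny stand-in that returns a dummy result, the class and
    variable names are invented, and priority lowering and error
    handling are left out.

```python
import pickle
import subprocess
import sys

# Stand-in worker: reads pickled requests on stdin, writes pickled replies.
WORKER_SOURCE = r"""
import pickle, sys
inp, out = sys.stdin.buffer, sys.stdout.buffer
while True:
    try:
        domain = pickle.load(inp)
    except EOFError:
        break
    pickle.dump({"domain": domain, "rating": "unrated"}, out)
    out.flush()
"""

MAX_USES = 100  # recycle each subprocess every 100 uses or so


class ReusableWorker:
    def __init__(self):
        self.proc = None
        self.uses = 0
        self.starts = 0  # how many subprocesses have been launched

    def _start(self):
        self.proc = subprocess.Popen(
            [sys.executable, "-c", WORKER_SOURCE],
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
        )
        self.uses = 0
        self.starts += 1

    def _stop(self):
        self.proc.stdin.close()  # worker sees EOF and exits
        self.proc.wait()
        self.proc.stdout.close()
        self.proc = None

    def rate(self, domain):
        if self.proc is None:
            self._start()
        pickle.dump(domain, self.proc.stdin)
        self.proc.stdin.flush()
        reply = pickle.load(self.proc.stdout)
        self.uses += 1
        if self.uses >= MAX_USES:
            self._stop()  # a fresh process starts on the next call
        return reply

    def close(self):
        if self.proc is not None:
            self._stop()
```

    The startup cost is paid once per 100 ratings instead of once per
    rating, and any leaked memory goes away when the process exits.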

    Each rating process only rates one site at a time, but is multithreaded
    so it can read multiple pages from the site, and other remote data
    sources like BBBonline, at one time. This allows us to get a rating
    within 15 seconds or so. When the site is rated, the database is
    updated, and the next request back at the FCGI program level will
    return the rating. We won't have to look at that domain for another month.
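
    Reading several pages at once from one rating process might look
    like the sketch below. It uses concurrent.futures from the modern
    standard library, which is an assumption for illustration and not
    necessarily how the rating process is written; the function names
    and the thread limit are invented.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen


def fetch(url, timeout=10):
    """Fetch one URL; return its body, or None on any failure."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            return resp.read()
    except Exception:
        return None


def fetch_all(urls, fetcher=fetch, max_threads=8):
    """Fetch all URLs concurrently; return {url: body or None}."""
    if not urls:
        return {}
    workers = min(max_threads, len(urls))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map keeps results in the same order as the input URLs.
        return dict(zip(urls, pool.map(fetcher, urls)))
```

    Because the threads spend nearly all their time waiting on the
    network, the total time is close to that of the slowest page
    rather than the sum of all of them.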

    The system components can run on multiple machines. One can add
    rating capacity by adding another rating server and pointing it at the same
    database. FCGI capacity can be added by adding more FCGI servers and a
    load balancer. Adding database capacity is harder, because that
    means going to MySQL replication, which creates coordination problems
    we haven't dealt with yet. Also, since multiple processes are running
    on each CPU, multicore CPUs help.

    Using MySQL as a queueing engine across multiple servers is unusual,
    but it works well. It has the nice feature that the queue ordering
    can be anything you can write in a SELECT statement. So we put "fair
    queueing" in the rating scheduler; multiple requests from the same IP
    address compete with each other, not with those from other IP addresses.
    So no one site can use up all the rating capacity.
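
    One way such a fair ordering could be written is sketched below.
    The query is a guess at the idea, not the real one: it prefers the
    requester with the fewest ratings already in process, then the
    oldest request. SQLite stands in for MySQL, and the table and
    column names are assumptions.

```python
import sqlite3

# Hypothetical fair ordering: fewest in-process ratings for the
# requesting IP first, then oldest request first.
FAIR_SELECT = """
SELECT q.id, q.domain
FROM ratingqueue AS q
WHERE q.status = 'queued'
ORDER BY
    (SELECT COUNT(*) FROM ratingqueue AS busy
      WHERE busy.requester_ip = q.requester_ip
        AND busy.status = 'in process'),
    q.id
LIMIT 1
"""


def open_queue():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE ratingqueue ("
        " id INTEGER PRIMARY KEY,"
        " domain TEXT NOT NULL,"
        " requester_ip TEXT NOT NULL,"
        " status TEXT NOT NULL DEFAULT 'queued')"
    )
    return conn


def enqueue(conn, domain, requester_ip):
    conn.execute(
        "INSERT INTO ratingqueue (domain, requester_ip) VALUES (?, ?)",
        (domain, requester_ip),
    )
    conn.commit()


def claim_fair(conn):
    """Mark the fairest queued task as in process; return its domain."""
    row = conn.execute(FAIR_SELECT).fetchone()
    if row is None:
        return None
    task_id, domain = row
    conn.execute(
        "UPDATE ratingqueue SET status = 'in process' WHERE id = ?",
        (task_id,),
    )
    conn.commit()
    return domain
```

    With three requests queued from one address and one from another,
    the second address's request is served second instead of last.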

    Another useful property of using MySQL for coordination is that
    we can have internal web pages that make queries and display the
    system and queue status. This is easy to do from the outside when
    the queues are in MySQL. It's tough to do that when they're inside
    some process. We log errors in a database table, not text files,
    for the same reason. In addition to specific problem logging,
    all programs have a final try block around the whole program that
    does a stack backtrace and puts that in a log entry in MySQL.
    All servers log to the same database.
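
    The "final try block" pattern can be sketched as follows. SQLite
    again stands in for the shared MySQL database, and the table name,
    columns, and function names are invented for the example.

```python
import sqlite3
import traceback


def make_log_table(conn):
    conn.execute(
        "CREATE TABLE errorlog ("
        " id INTEGER PRIMARY KEY,"
        " program TEXT NOT NULL,"
        " message TEXT NOT NULL)"
    )


def log_error(conn, program, text):
    conn.execute(
        "INSERT INTO errorlog (program, message) VALUES (?, ?)",
        (program, text),
    )
    conn.commit()


def run_logged(conn, program, main):
    """Run main(); on an uncaught exception, store the backtrace in the log."""
    try:
        return main()
    except Exception:
        # format_exc() returns the full stack backtrace as a string.
        log_error(conn, program, traceback.format_exc())
        return None
```

    An internal status page can then show recent failures from every
    server with a single SELECT on the log table.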

    This architecture was put together from off-the-shelf parts, but
    not the parts that have big fan bases. FCGI isn't used much.
    The MySQL memory engine isn't used much. MySQL advisory locking
    (SELECT GET_LOCK("lockname",timeout)) isn't used much. Pickle
    isn't used much over pipes. M2Crypto isn't used much. We've
    spent much time finding and dealing with problems in the components.
    Yet all this works quite well.
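
    For readers who haven't used MySQL advisory locks, a small wrapper
    might look like the sketch below. The post doesn't say what the
    lock protects, so this only shows the mechanics. It assumes a
    DB-API cursor from a driver that uses %s placeholders (MySQLdb and
    PyMySQL do); the function name and default timeout are invented.

```python
from contextlib import contextmanager


@contextmanager
def advisory_lock(cursor, name, timeout_seconds=10):
    """Hold a MySQL named lock for the duration of a with-block."""
    cursor.execute("SELECT GET_LOCK(%s, %s)", (name, timeout_seconds))
    (got,) = cursor.fetchone()
    # GET_LOCK returns 1 on success, 0 on timeout, NULL on error.
    if got != 1:
        raise TimeoutError("could not get lock %r" % (name,))
    try:
        yield
    finally:
        cursor.execute("SELECT RELEASE_LOCK(%s)", (name,))
        cursor.fetchone()
```

    The lock belongs to the database connection, so it is also
    released if the process holding it dies and the connection drops.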

    Does anyone else architect their systems like this?

    John Nagle
     
    John Nagle, Mar 26, 2008
    #1

  2. On Wednesday, 26 March 2008 17:33:43, John Nagle wrote:
    > ...
    >
    > Using MySQL as a queueing engine across multiple servers is unusual,
    > but it works well. It has the nice feature that the queue ordering
    > can be anything you can write in a SELECT statement. So we put "fair
    > queueing" in the rating scheduler; multiple requests from the same IP
    > address compete with each other, not with those from other IP addresses.
    > So no one site can use up all the rating capacity.
    >
    > ...
    >
    > Does anyone else architect their systems like this?


    A Xen(tm) management system I've written at least shares this aspect in that
    the RPC subsystem for communication between the frontend and the backends is
    basically a (MySQL) database table which is regularly queried by all
    backends that work on VHosts to change the state (in the form of a command)
    according to what the user specifies in the (Web-)UI.

    FWIW, the system is based on SQLObject and CherryPy, doing most of the
    parallel tasks threaded from a main process (because the largest part of the
    backends is dealing with I/O from subprocesses [waiting for them to
    complete]), which is different from what you do. CherryPy is also deployed
    with the threading server.

    --

    Heiko Wundram
     
    Heiko Wundram, Mar 26, 2008
    #2

  3. Heiko Wundram wrote:
    > On Wednesday, 26 March 2008 17:33:43, John Nagle wrote:
    >> ...
    >>
    >> Using MySQL as a queueing engine across multiple servers is unusual,
    >> but it works well. It has the nice feature that the queue ordering
    >> can be anything you can write in a SELECT statement. So we put "fair
    >> queueing" in the rating scheduler; multiple requests from the same IP
    >> address compete with each other, not with those from other IP addresses.
    >> So no one site can use up all the rating capacity.
    >> ...
    >> Does anyone else architect their systems like this?

    >
    > A Xen(tm) management system I've written at least shares this aspect in that
    > the RPC subsystem for communication between the frontend and the backends is
    > basically a (MySQL) database table which is regularily queried by all
    > backends that work on VHosts to change the state (in the form of a command)
    > according to what the user specifies in the (Web-)UI.


    I see nothing unusual with this:

    I vaguely remember that this database approach was taught at my former
    university as a basic mechanism for distributed systems at least since 1992,
    but I'd guess much longer...

    And in one of my projects an RDBMS-based queue was used for a PKI
    registration server (e.g. for handling the outbound CMP queue).

    IIRC Microsoft's Biztalk Server also stores inbound and outbound queues in
    its internal MS-SQL database (which then can be the bottleneck).

    Ciao, Michael.
     
    Michael Ströder, Mar 26, 2008
    #3
  4. On Wednesday, 26 March 2008 18:54:29, Michael Ströder wrote:
    > Heiko Wundram wrote:
    > > On Wednesday, 26 March 2008 17:33:43, John Nagle wrote:
    > >> ...
    > >>
    > >> Using MySQL as a queueing engine across multiple servers is unusual,
    > >> but it works well. It has the nice feature that the queue ordering
    > >> can be anything you can write in a SELECT statement. So we put "fair
    > >> queueing" in the rating scheduler; multiple requests from the same IP
    > >> address compete with each other, not with those from other IP addresses.
    > >> So no one site can use up all the rating capacity.
    > >> ...
    > >> Does anyone else architect their systems like this?

    > >
    > > A Xen(tm) management system I've written at least shares this aspect in
    > > that the RPC subsystem for communication between the frontend and the
    > > backends is basically a (MySQL) database table which is regularily
    > > queried by all backends that work on VHosts to change the state (in the
    > > form of a command) according to what the user specifies in the (Web-)UI.

    >
    > I vaguely remember that this database approach was teached at my former
    > university as a basic mechanism for distributed systems at least since
    > 1992, but I'd guess much longer...


    I didn't say it was unusual or frowned upon (and I was also taught this at uni
    IIRC as a means to "easily" distribute systems which don't have specific
    requirements for response time to RPC requests), but anyway, as you noted for
    Biztalk, it's much easier to hit bottlenecks with a polling-style RPC than
    with a "true" RPC system, as I've come to experience when the number of nodes
    (i.e., backends) grew over the last year and a half.

    That's basically what's prompting a move from DB-style RPC to
    socket-based RPC, which is going to happen at some point for the
    system noted above (but I've since changed jobs and am now only a consulting
    developer for that anyway, so it won't be my job to do the dirty migration
    and the redesign ;-)).

    --
    Heiko Wundram
     
    Heiko Wundram, Mar 26, 2008
    #4
  5. John Nagle

    John Nagle Guest

    Heiko Wundram wrote:
    > On Wednesday, 26 March 2008 18:54:29, Michael Ströder wrote:
    >> Heiko Wundram wrote:
    >>> On Wednesday, 26 March 2008 17:33:43, John Nagle wrote:


    > I didn't say it was unusual or frowned upon (and I was also taught this at uni
    > IIRC as a means to "easily" distribute systems which don't have specific
    > requirements for response time to RPC requests), but anyway, as you noted for
    > Biztalk, it's much easier to hit bottlenecks with a polling-style RPC than
    > with a "true" RPC system, as I've come to experience when the number of nodes
    > (i.e., backends) grew over the last year and a half.


    I know, I don't like the polling either. The time scale is
    such that the poll delay isn't a problem, though, and because it's
    using the MySQL MEMORY engine, there's no disk I/O. After completing
    a request, the rating scheduler immediately queries the database,
    so there's no lost time if there's a queue. The polling delay
    only applies when a rating server is idle.
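
    In outline, the loop behaves like the sketch below. It is a
    simplified illustration: the function and parameter names are
    invented, the one-second delay is an arbitrary choice, and
    max_idle_polls exists only so the example can terminate.

```python
import time


def scheduler_loop(claim_next, rate, poll_delay=1.0,
                   sleep=time.sleep, max_idle_polls=None):
    """Rate tasks back to back; sleep only when the queue is empty."""
    idle = 0
    while max_idle_polls is None or idle < max_idle_polls:
        task = claim_next()
        if task is None:
            idle += 1
            sleep(poll_delay)  # only an idle server pays the poll delay
            continue
        idle = 0
        rate(task)  # then go straight back to the database for more
```

    Here claim_next is whatever function reads the queue table and
    marks a task as in process, and rate does the actual work.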

    I miss QNX, which has good message passing primitives.
    Linux is weak in that area.

    John Nagle
     
    John Nagle, Mar 27, 2008
    #5
