best way to serve wsgi with multiple processes

R

Robin

Hi,

I am building some computational web services using soaplib. This
creates a WSGI application.

However, since some of these services are computationally intensive,
and may be long running, I was looking for a way to use multiple
processes. I thought about using multiprocessing.Process manually in
the service, but I was a bit worried about how that might interact
with a threaded server (I was hoping the thread serving that request
could just wait until the child is finished). Also it would be good to
keep the services as simple as possible so it's easier for people to
write them.
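
For concreteness, the idea would be something like this rough sketch (compute() just stands in for the real service code):

# Rough sketch of the multiprocessing.Process idea (illustrative only):
# the thread handling the request spawns a child process for the heavy
# computation and simply blocks until it finishes.
import multiprocessing

def compute(args, result_queue):
    # placeholder for the real computational service code
    result_queue.put(sum(args))

def handle_request(args):
    # called from the soaplib service method, i.e. inside the server thread
    result_queue = multiprocessing.Queue()
    child = multiprocessing.Process(target=compute, args=(args, result_queue))
    child.start()
    result = result_queue.get()   # the serving thread just waits here
    child.join()
    return result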

I have at the moment the following WSGI structure:
TransLogger(URLMap(URLParser(soaplib objects)))
although presumably, due to the beauty of WSGI, this shouldn't matter.

As I've found with all web-related Python stuff, I'm overwhelmed by
the choice and number of alternatives. I've so far been using cherrypy
and ajp-wsgi for my testing, but am aware of Spawning, twisted etc.
What would be the simplest [quickest to setup and fewest details of
the server required - ideally with a simple example] and most reliable
[this will eventually be 'in production' as part of a large scientific
project] way to host this sort of WSGI with a process-per-request
style?

Thanks!

Robin
 
R

Robin Becker

Robin said:
......

We've used forked fastcgi (flup) with success as that decouples the wsgi process
(in our case django) from the main server (in our case apache). Our reasons for
doing that were to allow the backend to use modern pythons without having to
upgrade the server (which is required if using say mod_python). The wsgi process
runs as an ordinary user which eases some tasks.

A disadvantage of our scheme is that long running processes may cause problems,
e.g. timeouts. In practice, since there are no guarantees for how long an http
connection will hold up (because of proxies etc etc), we decided to work around
this problem. Basically long running jobs go into a task queue on the server and
the response is used to reconnect to the long running job periodically for status
querying/results etc etc.
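
As a rough illustration of the forked flup approach (a minimal sketch only; the app, socket path and options are placeholders, and Apache's mod_fastcgi/mod_fcgid is configured separately to talk to the socket):

# Minimal sketch: run a WSGI app under flup's preforked FastCGI server,
# so each request is handled in a forked child process.
from flup.server.fcgi_fork import WSGIServer

def application(environ, start_response):
    # stand-in for the real WSGI app (django, soaplib, ...)
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [b'hello from a forked worker\n']

if __name__ == '__main__':
    # Apache connects to this socket; the flup process runs as an ordinary user
    WSGIServer(application, bindAddress='/tmp/myapp.sock').run()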
 
R

Robin

We've used forked fastcgi (flup) with success as that decouples the wsgi process
(in our case django) from the main server (in our case apache). Our reasons for
doing that were to allow the backend to use modern pythons without having to
upgrade the server (which is required if using say mod_python). The wsgi process
runs as an ordinary user which eases some tasks.

Yes - I've done something very similar with ajp-wsgi (from the author
of flup; and which incidentally performs very well and works really nicely)
to go from apache -> wsgi. But the issue I'm asking about here is to
have multiple WSGI processes - ie to allow concurrent execution of
more than one web service at a time (since these are long running
computational soap web services). ajp-wsgi embeds a single python
interpreter so multiple running services would be affected by the GIL
- I imagine flup is similar (a single process on the python side).

So I'm not worried about decoupling from the web server - I'm happy to
use a pure python server (which I guess is easier to set up) - but I want
the web server to dispatch requests to different processes running the
wsgi app. I've looked at Spawning, but couldn't get it to work and it
seems a little bit 'beta' for my taste (doesn't exit cleanly, leaves
worker processes running etc.)

Cheers

Robin
 
R

Robin

I'm sorry - I originally missed the word 'forked' and hence the
whole point of your message I think.

I looked at flup before but had forgotten about the forked version.
Having revisited it I think the forked version does keep a process
pool so each request is processed by a separate process, which is
exactly what I wanted.

Cheers

Robin
 
D

Diez B. Roggisch

Robin said:
I'm sorry - I originally missed the word 'forked' and hence the
whole point of your message I think.

I looked at flup before but had forgotten about the forked version.
Having revisited it I think the forked version does keep a process
pool so each request is processed by a separate process, which is
exactly what I wanted.

You can have that with mod_wsgi & daemon mode as well, with presumably less
setup hassle.
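
For illustration, a daemon-mode setup might look roughly like this (directive values, paths and names are only placeholders):

# myapp.wsgi -- sketch of a mod_wsgi daemon-mode deployment.
#
# The matching Apache configuration would be roughly:
#   WSGIDaemonProcess soapservices processes=5 threads=1
#   WSGIProcessGroup soapservices
#   WSGIScriptAlias /services /path/to/myapp.wsgi
#
# With processes=5 and threads=1 each request runs in one of five
# separate daemon processes, so the GIL only matters within one request.

def application(environ, start_response):
    # stand-in for TransLogger(URLMap(URLParser(soaplib objects)))
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [b'hello from a mod_wsgi daemon process\n']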

Diez
 
R

Robin Becker

Robin wrote:
...........
Yes - I've done something very similar with ajp-wsgi (from the author
of flup; and which incidentally performs very well and works really nicely)
to go from apache -> wsgi. But the issue I'm asking about here is to
have multiple WSGI processes - ie to allow concurrent execution of
more than one web service at a time (since these are long running
computational soap web services). ajp-wsgi embeds a single python
interpreter so multiple running services would be affected by the GIL
- I imagine flup is similar (a single process on the python side).

So I'm not worried about decoupling from the web server - I'm happy to
use a pure python server (which I guess is easier to set up) - but I want
the web server to dispatch requests to different processes running the
wsgi app. I've looked at Spawning, but couldn't get it to work and it
seems a little bit 'beta' for my taste (doesn't exit cleanly, leaves
worker processes running etc.)

well the flup server for fastcgi supports forking if the server is declared as
an external process in apache. Then the top level of the flup process handles
each request and passes it off to a forked worker. I cannot recall exactly, but
I believe that apache mod_fastcgi does the right thing when it comes to
internally declared fastcgi handlers. For apache at least I think the threading
issues are handled properly.

I think the preforkserver.py code handles all the threading issues for you
(assuming it's not win32).
 
M

M.-A. Lemburg

You can have that with mod_wsgi & daemon mode as well, with presumably less
setup hassle.

Another option that works well on Unix and even Windows is SCGI
which deals with the forking and piping of data for you:

http://www.mems-exchange.org/software/scgi/
http://python.ca/scgi/

Lighttpd even ships with a mod_scgi module built-in.

More on the protocol used by SCGI (basically netstrings):

http://python.ca/scgi/protocol.txt

Unlike FastCGI, it's very easy to set up.

Since SCGI provides standard CGI on the Python side, it's easy to
wrap this up as a WSGI interface (using e.g. the code from PEP 333).
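
Condensed from the CGI gateway example in PEP 333 (a simplified sketch: the real version streams output as it is produced and handles exc_info properly, this one just buffers the response):

# Run a WSGI application in a (S)CGI-style environment where the request
# is described by os.environ and the body arrives on stdin.
# Usage: run_with_cgi(my_wsgi_app)
import os, sys

def run_with_cgi(application):
    environ = dict(os.environ.items())
    environ.update({
        'wsgi.input':        sys.stdin,
        'wsgi.errors':       sys.stderr,
        'wsgi.version':      (1, 0),
        'wsgi.multithread':  False,
        'wsgi.multiprocess': True,
        'wsgi.run_once':     True,
        'wsgi.url_scheme':   'http',
    })

    response = {'status': None, 'headers': None, 'body': []}

    def write(data):
        # simplified write() callable; PEP 333's version streams immediately
        response['body'].append(data)

    def start_response(status, response_headers, exc_info=None):
        response['status'] = status
        response['headers'] = response_headers
        return write

    result = application(environ, start_response)
    try:
        for data in result:
            write(data)
    finally:
        if hasattr(result, 'close'):
            result.close()

    sys.stdout.write('Status: %s\r\n' % response['status'])
    for name, value in response['headers']:
        sys.stdout.write('%s: %s\r\n' % (name, value))
    sys.stdout.write('\r\n')
    for data in response['body']:
        sys.stdout.write(data)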

--
Marc-Andre Lemburg
eGenix.com
http://www.egenix.com/company/contact/
 
R

Robin

well the flup server for fastcgi supports forking if the server is declared as
an external process in apache. Then the top level of the flup process handles
each request and passes it off to a forked worker. I cannot recall exactly, but
I believe that apache mod_fastcgi does the right thing when it comes to
internally declared fastcgi handlers. For apache at least I think the threading
issues are handled properly.

I think the preforkserver.py code handles all the threading issues for you
(assuming it's not win32).

Thanks - I think if I go the flup route I would use AJP though - since
it's very easy to set up with apache (1 proxy line) and mod_proxy_ajp comes as
standard. And then everything is very much separated from the apache
process.
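
A minimal sketch of what that might look like (port, path and app are placeholders; this assumes flup's preforked AJP server, flup.server.ajp_fork):

# Sketch: flup's preforked AJP server listening on a local port.
# The single Apache proxy line mentioned above would be something like:
#   ProxyPass /services ajp://localhost:8009/
# (with mod_proxy and mod_proxy_ajp loaded)
from flup.server.ajp_fork import WSGIServer

def application(environ, start_response):
    # stand-in for the soaplib WSGI app
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [b'hello over AJP\n']

if __name__ == '__main__':
    WSGIServer(application, bindAddress=('localhost', 8009)).run()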
 
R

Robin Becker

Robin said:
Thanks - I think if I go the flup route I would use AJP though - since
it's very easy to set up with apache (1 proxy line) and mod_proxy_ajp comes as
standard. And then everything is very much separated from the apache
process.
........

that's right and very easy to control. The only problem I recall is that the
socket needs to be made readable by the www user. You can do that with a sudo
chown or by setting the permissions mask when the ajp server starts.
 
G

Graham Dumpleton

Hi,

I am building some computational web services using soaplib. This
creates a WSGI application.

However, since some of these services are computationally intensive,
and may be long running, I was looking for a way to use multiple
processes. I thought about using multiprocessing.Process manually in
the service, but I was a bit worried about how that might interact
with a threaded server (I was hoping the thread serving that request
could just wait until the child is finished). Also it would be good to
keep the services as simple as possible so it's easier for people to
write them.

I have at the moment the following WSGI structure:
TransLogger(URLMap(URLParser(soaplib objects)))
although presumably, due to the beauty of WSGI, this shouldn't matter.

As I've found with all web-related Python stuff, I'm overwhelmed by
the choice and number of alternatives. I've so far been using cherrypy
and ajp-wsgi for my testing, but am aware of Spawning, twisted etc.
What would be the simplest [quickest to setup and fewest details of
the server required - ideally with a simple example] and most reliable
[this will eventually be 'in production' as part of a large scientific
project] way to host this sort of WSGI with a process-per-request
style?

In this sort of situation one wouldn't normally do the work in the
main web server, but have a separate long running daemon process
embedding a mini web server that understands XML-RPC. The main web
server would then make XML-RPC requests against the backend daemon
process, which would use threading and/or queueing to handle the
requests.

If the work is indeed long running, the backend process would normally
just acknowledge the request and not wait. The web page would return
and it would be up to the user to then somehow occasionally poll the web
server, manually or by AJAX, to see how progress is going. That is,
further XML-RPC requests from the main server to the backend daemon
process asking about progress.

I don't believe the suggestions about fastcgi/scgi/ajp/flup or mod_wsgi
are really appropriate as you don't want this done in web server
processes as then you are at the mercy of web server processes being
killed or dying when part way through something. Some of these systems
will do this if requests take too long. Thus it is better to offload the
real work to another process.
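
For illustration, a minimal sketch of such a backend daemon (names, the port and the trivial "computation" are placeholders; written for modern Python where the module is xmlrpc.server - on the Python 2.x of this thread it was SimpleXMLRPCServer):

# XML-RPC backend daemon: accepts jobs, runs them on a worker thread,
# and answers progress queries from the front-end web server.
import queue
import threading
import uuid
from xmlrpc.server import SimpleXMLRPCServer

jobs = {}                 # job_id -> {'status': ..., 'result': ...}
work_queue = queue.Queue()

def worker():
    while True:
        job_id, args = work_queue.get()
        jobs[job_id]['status'] = 'running'
        try:
            jobs[job_id]['result'] = sum(args)   # placeholder for the real computation
            jobs[job_id]['status'] = 'done'
        except Exception as exc:
            jobs[job_id]['status'] = 'failed: %s' % exc
        work_queue.task_done()

def submit(args):
    # acknowledge immediately; the front-end polls check() later
    job_id = str(uuid.uuid4())
    jobs[job_id] = {'status': 'queued', 'result': None}
    work_queue.put((job_id, args))
    return job_id

def check(job_id):
    return jobs.get(job_id, {'status': 'unknown', 'result': None})

if __name__ == '__main__':
    threading.Thread(target=worker, daemon=True).start()
    server = SimpleXMLRPCServer(('localhost', 8888), allow_none=True)
    server.register_function(submit)
    server.register_function(check)
    server.serve_forever()

The front-end WSGI app (the soaplib service in this thread) would call submit() through an XML-RPC client proxy and poll check() on subsequent requests.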

Graham
 
G

Graham Dumpleton

2009/2/12 alex goretoy said:
GAE (Google App Engine) uses WSGI for webapps. You don't have the overhead of
managing a server and all its services this way as well. Just manage DNS
entries. Although, there are limitations depending on which libs your project
needs to use.

GAE is not suitable as they kill off any requests that take more than
a set time. That time isn't that long, so it can't support long running
requests.

Graham
 
R

Robin

.........

In this sort of situation one wouldn't normally do the work in the
main web server, but have a separate long running daemon process
embedding a mini web server that understands XML-RPC. The main web
server would then make XML-RPC requests against the backend daemon
process, which would use threading and/or queueing to handle the
requests.

If the work is indeed long running, the backend process would normally
just acknowledge the request and not wait. The web page would return
and it would be up to the user to then somehow occasionally poll the web
server, manually or by AJAX, to see how progress is going. That is,
further XML-RPC requests from the main server to the backend daemon
process asking about progress.

I don't believe the suggestions about fastcgi/scgi/ajp/flup or mod_wsgi
are really appropriate as you don't want this done in web server
processes as then you are at the mercy of web server processes being
killed or dying when part way through something. Some of these systems
will do this if requests take too long. Thus it is better to offload the
real work to another process.

Thanks - in this case I am constrained to use SOAP (I am providing SOAP
services using soaplib so they run as a WSGI app). I chose soaplib
because it seems the simplest way to get soap services running in
Python (I was hoping to get this set up quickly).
So I am not really able to get into anything more complex as you
suggest... I have my nice easy WSGI app soap service, I would just
like it to run in a process pool to avoid the GIL. Turns out I can do that
with apache+mod_wsgi and daemon mode, or the flup forked server (I would
probably use ajp - so flup is in a separate process to apache and
listens on some local port, and apache proxies to that using the ajp
protocol). I'm not sure which one is best... for now I'm continuing to
just develop on cherrypy on my own machine.

I suspect I will use ajp forked flup, since that only requires
mod_proxy and mod_proxy_ajp which I understand come with standard
apache and which the system administrators will probably be happier with.

Cheers

Robin
 
R

Robin

GAE is not suitable as they kill off any requests that take more than
a set time. That time isn't that long, so it can't support long running
requests.

GAE is definitely not suitable in this case... The servers are
provided and maintained as part of a large scientific project for
which I am providing just a few services... Other groups are running
services on other platforms on tomcat through soaplab/instantsoap -
but I was hoping to use native python services since I thought it
would be easier.

Cheers

Robin
 
G

Graham Dumpleton

.........

Thanks - in this case I am constrained to use SOAP (I am providing SOAP
services using soaplib so they run as a WSGI app). I chose soaplib
because it seems the simplest way to get soap services running in
Python (I was hoping to get this set up quickly).
So I am not really able to get into anything more complex as you
suggest... I have my nice easy WSGI app soap service, I would just
like it to run in a process pool to avoid the GIL.

You can still use SOAP, you don't have to use XML-RPC; they are, after
all, just interprocess communication mechanisms.
Turns out I can do that
with apache+mod_wsgi and daemon mode, or the flup forked server (I would
probably use ajp - so flup is in a separate process to apache and
listens on some local port, and apache proxies to that using the ajp
protocol). I'm not sure which one is best... for now I'm continuing to
just develop on cherrypy on my own machine.

In mod_wsgi daemon mode the application is still in a distinct
process. The only difference is that Apache is acting as the process
supervisor and you do not have to install a separate system such as
supervisord or monit to start up the process and ensure it is
restarted if it crashes, as Apache/mod_wsgi will do that for you. You
also don't need flup when using mod_wsgi as it provides everything.
I suspect I will use ajp forked flup, since that only requires
mod_proxy and mod_proxy_ajp which I understand come with standard
apache and which the system administrators will probably be happier with.

The Apache/mod_wsgi approach actually has fewer dependencies. For it
you only need Apache+mod_wsgi. For AJP you need Apache+flup+monit-or-
supervisord. Just depends on which dependencies you think are easier
to configure and manage. :)

Graham
 
R

Robin Becker

Graham Dumpleton wrote:
..........
requests.

If the work is indeed long running, the backend process would normally
just acknowledge the request and not wait. The web page would return
and it would be up to the user to then somehow occasionally poll the web
server, manually or by AJAX, to see how progress is going. That is,
further XML-RPC requests from the main server to the backend daemon
process asking about progress.
........
this is exactly what we do with the long runners. The wsgi (django in our case)
process can work out how long the process is likely to take and either responds
directly or offloads the job to an xmlrpc server and responds with a page
containing a token allowing access to the queue server; the page refreshes
periodically to determine job status etc etc. When the job finishes the refresh
request returns the job result and stops looping. In our case we don't need
to worry about people abandoning the job since the results are cached and may be
of use to others (a typical case: produce a brochure containing details of all
resources in a country or large city). To avoid overload the xmlrpc server is
only allowed to run 3 active threads from its queue.
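
A rough sketch of the front-end half of such a scheme, assuming a backend like the XML-RPC daemon sketched earlier in the thread with submit()/check() methods (names and port are illustrative; xmlrpc.client is the modern module name):

# Front-end helpers: hand a job to the backend and poll it by token.
from xmlrpc.client import ServerProxy

backend = ServerProxy('http://localhost:8888/', allow_none=True)

def start_job(args):
    # quick jobs could be answered directly; long ones get a token back
    return backend.submit(args)

def poll_job(token):
    # called by the periodically refreshing status page
    job = backend.check(token)
    if job['status'] == 'done':
        return ('finished', job['result'])
    return ('pending', job['status'])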
 
