multitask http server (single-process multi-connection HTTP server)


lkcl

for several reasons, i'm doing a cooperative multi-tasking HTTP
server:
git clone git://pyjs.org/git/multitaskhttpd.git

there probably exist perfectly good web frameworks that are capable of
doing this sort of thing: i feel certain that twisted is one of them.
however, the original author of rtmplite decided to rip twisted out
and to use multitask.py and i'm one of those strange people that also
likes the idea of using 900 lines of awesome elegant code rather than
tens of thousands of lines of a constantly-moving target.

one of the things that's slightly unfortunate is that i'm going to
have to copy SimpleHTTPServer.py and slightly modify it;
CGIHTTPServer.py as well. this is Generally Bad Practice.

can anyone think of a way of monkey-patching or otherwise using
SimpleHTTPRequestHandler and CGIHTTPRequestHandler and overriding the
base class from which those two are derived?
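
one possibility, sketched here and completely untested: rebuild each
handler class with type(), reusing its own method dictionary on top of
a different base. CooperativeRequestHandler below is a hypothetical
stand-in for a multitask-aware base class, not anything that exists:

    import SimpleHTTPServer
    import CGIHTTPServer

    class CooperativeRequestHandler(object):
        """hypothetical base: would hold the multitask.py 'yield' loop."""
        pass

    def rebase(cls, new_base):
        # rebuild cls: same name, same methods, different base class
        return type(cls.__name__, (new_base,), dict(cls.__dict__))

    CoopSimpleHandler = rebase(SimpleHTTPServer.SimpleHTTPRequestHandler,
                               CooperativeRequestHandler)
    CoopCGIHandler = rebase(CGIHTTPServer.CGIHTTPRequestHandler,
                            CooperativeRequestHandler)

whether the copied method dicts of those old-style classes play nicely
on top of a new-style base is exactly the kind of thing that would need
testing.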

i have had to pull bits out of BaseHTTPRequestHandler to make them use
the "yield" logic of multitask.py already, which was painful enough.

ideas, anyone?

l.
 

Gelonida

Hi lkcl,

Do you have any documentation or overview for your project?


Questions I would be interested in:
- List of features already working
- list of features under development
- list of features planned for the near future
 

geremy condra

> for several reasons, i'm doing a cooperative multi-tasking HTTP
> server:
> git clone git://pyjs.org/git/multitaskhttpd.git
>
> there probably exist perfectly good web frameworks that are capable of
> doing this sort of thing: i feel certain that twisted is one of them.
> however, the original author of rtmplite decided to rip twisted out
> and to use multitask.py and i'm one of those strange people that also
> likes the idea of using 900 lines of awesome elegant code rather than
> tens of thousands of lines of a constantly-moving target.
>
> one of the things that's slightly unfortunate is that i'm going to
> have to copy SimpleHTTPServer.py and slightly modify it;
> CGIHTTPServer.py as well. this is Generally Bad Practice.
>
> can anyone think of a way of monkey-patching or otherwise using
> SimpleHTTPRequestHandler and CGIHTTPRequestHandler and overriding the
> base class from which those two are derived?
>
> i have had to pull bits out of BaseHTTPRequestHandler to make them use
> the "yield" logic of multitask.py already, which was painful enough.
>
> ideas, anyone?
>
> l.

I may not be fully understanding what you're doing, but is there a
reason that one of the mixins can't be used?

Geremy Condra
 

lkcl

> Hi lkcl,
>
> Do you have any documentation or overview for your project?

git clone git://pyjs.org/git/multitaskhttpd.git

i only started it today, but yes, there's a README.

the primary reason it's being developed is because GNUmed are looking
to create a web service but they want to use the same psycopg2-based
middleware that's taken them pretty much forever to develop. see my
reply to geremy condra for more details.

if this turns out to be something that _really_ hasn't been done
before, then i'm happy for other people to pitch in and help out.

> Questions I would be interested in:
> - List of features already working

* simple HTTP GET of files and subdirs

(because i blatantly copied SimpleHTTPServer.py)

* JSONRPC services (see the sketch after this list)

(i blatantly copied SimpleJSONRPCServer.py, it's
something i found, it's based on SimpleXMLRPCServer.py)

> - list of features under development

the time between "under development" and "in future" is so short it's
hardly worthwhile stating. i added JSONRPC in about 90 minutes for
example. tomorrow i'll add HTTP POST multi-part forms and it will
take me about... 50 mins i should imagine, by looking at something
like turbogears or django. i'm not going to "waste time reinventing"
stuff when i can pretty much cut/paste it. web servers have been
_done_ already - it's just that cooperative multitasking seems most
definitely _not_ to have been done before.

> - list of features planned for the near future

* HTTP POST with multi-part forms (just like standard web frameworks)
* a better API once i have a clearer idea of what's needed
* use of regex matching on apps, just like django urls.py
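
assuming the copied SimpleJSONRPCServer keeps the register_function()
style of the stdlib SimpleXMLRPCServer it's based on, using the
JSONRPC side looks something like this (names come from that
assumption, not checked against the repo):

    from SimpleJSONRPCServer import SimpleJSONRPCServer

    def echo(msg):
        return msg

    server = SimpleJSONRPCServer(('localhost', 8080))
    server.register_function(echo)   # exposed as JSONRPC method "echo"
    server.serve_forever()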

l.
 

Tim Wintle

have you seen nagare:
http://www.nagare.org/

I've not used it - but from my understanding it might be what you're
looking for (for the http part at least).

> i hate to think how this would be done using any of the standard
> MixIns. even if you wrote a special MixIn which did single-instance
> socket handling, you couldn't use it because the BaseHTTPHandler
> doesn't "cooperate", it has a while True loop on serving connections
> until they're closed.
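
For reference, the loop in question - BaseHTTPRequestHandler.handle()
in Python 2's BaseHTTPServer.py - is roughly:

    def handle(self):
        """Handle multiple requests if necessary."""
        self.close_connection = 1
        self.handle_one_request()
        while not self.close_connection:
            self.handle_one_request()

It only returns once the connection closes, so a keep-alive connection
pins its thread (or, single-threaded, the whole server).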

I started working on something like this once (but I still had threads)
- afraid I can't find the code with it in right now. I think it was
similar to what you're doing:

at least 2 threads - one accepts requests, the other deals with them.

* overload BaseHTTPServer.process_request so it just adds connections
to a queue.

* worker threads fetch connections from the queue and start working on
them. when they want to give up control they raise an exception that
bubbles back up, and the connection is re-added to the queue along with
any persistent data.

I seem to remember the annoying bit being having to override
SocketServer.finish_request to use an existing handler.

- you can fairly easily limit that to process a single session at a time
with a shared dictionary or similar.
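
A rough sketch of that scheme (names hypothetical, untested):

    import Queue
    import threading
    import BaseHTTPServer

    class GiveUpControl(Exception):
        """Raised by a handler to put its connection back on the queue."""

    class QueuingHTTPServer(BaseHTTPServer.HTTPServer):
        def __init__(self, *args):
            BaseHTTPServer.HTTPServer.__init__(self, *args)
            self.pending = Queue.Queue()

        def process_request(self, request, client_address):
            # accept thread: just queue the connection, don't handle it
            self.pending.put((request, client_address))

        def worker(self):
            while True:
                request, client_address = self.pending.get()
                try:
                    # the annoying bit: finish_request would need to
                    # reuse an existing handler rather than build a new one
                    self.finish_request(request, client_address)
                except GiveUpControl:
                    self.pending.put((request, client_address))
                else:
                    self.close_request(request)

    server = QueuingHTTPServer(('', 8080),
                               BaseHTTPServer.BaseHTTPRequestHandler)
    for _ in range(4):
        threading.Thread(target=server.worker).start()
    server.serve_forever()   # accept loop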


The reason I was doing it was for work that was cached for a short time
but took a while on cache misses - if I noticed that another thread was
currently updating the cached version then I raised an exception. (I had
code that released the GIL, so multi-threading made sense.)


Tim
 

Luke Kenneth Casson Leighton


> have you seen nagare:
> http://www.nagare.org/

i have now! :)

it uses stackless python, which is probably where the nonblocking
aspects come from. going from there...

http://stacklessexamples.googlecode.com/svn/trunk/examples/networking/basicWebserver.py

ah ha! on the face of it, that does actually look like it achieves
the same sort of thing.

> I've not used it - but from my understanding it might be what you're
> looking for (for the http part at least).

yes, for the http part: the rest - mmm no.

> I started working on something like this once (but I still had threads)
> - afraid I can't find the code with it in right now. I think it was
> similar to what you're doing:
>
> at least 2 threads - one accepts requests, the other deals with them.

ok, that sounds like the problem has moved: requests could still be
received rather than blocked at the TCP level, but they'd still not
actually get processed if the 2nd "dealing with it" thread was in
"serve_forever()" mode. and because of HTTP Keep-Alives, when
close_connection=1 in the HTTPRequestHandler base class, that would
still be an issue.

looking at that stackless basic web server example, i believe that
that's actually it: the concept of "tasklets", and that cooperative
scheduling loop:
    while time.time() < t + delay:
        stackless.schedule()

multitask.py effectively does the same thing, but using "yield",
which is just amazing.
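
the multitask.py version of that loop, as a sketch, is just a
generator that yields a sleep:

    import multitask

    def delayed(t):
        # give up control for t seconds; other tasks run meanwhile
        yield multitask.sleep(t)
        print "done waiting"

    multitask.add(delayed(2.0))
    multitask.run()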

but... not being funny or anything, but basically i'm done already :)
multitaskhttpd works, it doesn't need stackless, i completed a JSONRPC
service last night, i'll add POST of multi-part forms today, and i
have everything that [GNUmed] will need.

i think convincing the gnumed team to get all their users to install
and use stackless python would be a bit of a hard sell.

l.
 

lkcl

> but... not being funny or anything, but basically i'm done already :)
> multitaskhttpd works, it doesn't need stackless, i completed a JSONRPC
> service last night, i'll add POST of multi-part forms today, and i
> have everything that [GNUmed] will need.

ok, instead of adding this to httpd.py i created an HTTP proxy out of
multitaskhttpd. i also cobbled together an example JSONRPC server
which is highly likely to be used in gnumed now. for this modified
version of SimpleJSONRPCServer.py i had to rip bits out of
BaseHTTPServer.py and add a Connection: Keep-Alive header to every
response. so, send_error needed modding/replacing, as did send_head,
list_directory and so on, all with a view to making sure that the HTTP
proxy is "happy".

the reason why the proxy works is that the incoming HTTP
connection results in an HTTP/1.1 proxy connection with Connection:
Keep-Alive set. so, even if the user's browser drops the connection,
the proxy permanently keeps open the connection to the upstream HTTP
server. in the case of the standard SimpleHTTPServer.py and so on,
that results in the handle_request() loop basically serving that same
user _forever_. well... it would, if it wasn't for the fact that the
standard version of send_error() in BaseHTTPServer.py sends
"Connection: close", hence the reason why i had to replace it.

so, yeah - now you can do truly dreadful things like... create a
massive in-memory data structure and never have to serialise it
because the back-end process will _still_ be around pretty much
forever... and you can keep doing that until the server runs out of
memory. hurrah!

l.
 
