Are the built-in HTTP servers production quality?

Paul Morrow

I've heard it said that web servers built upon the standard library's
SimpleHTTPServer or CGIHTTPServer aren't really suitable for use in a
production system. Is this still the case? And if so, why?

Is it primarily a performance issue? If so, aren't there a number of
things that can easily be done to improve web server performance ---
caching proxy front-ends (e.g. Apache's mod_proxy), faster hardware,
server clusters, etc.? Also it seems fairly straightforward to make
them asynchronous (via the ForkingMixIn and ThreadingMixIn)...
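
For instance, something like this (untested sketch; the class name and the
port number are just examples) seems to be all that's needed for a threaded
version:

from SocketServer import ThreadingMixIn
from BaseHTTPServer import HTTPServer
from SimpleHTTPServer import SimpleHTTPRequestHandler

class ThreadingSimpleServer(ThreadingMixIn, HTTPServer):
    """Handle each request in its own thread."""

if __name__ == '__main__':
    # Serve the current directory on port 8000 (example values only).
    server = ThreadingSimpleServer(('', 8000), SimpleHTTPRequestHandler)
    server.serve_forever()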

Is it that they're not safe; can be easily compromised/cracked? If so,
wouldn't hiding them behind trusted front-ends (like Apache) help there
too?

Are they simply too buggy to be relied upon? Do they leak memory,
mishandle certain types of requests, or perhaps fall short on standards
compliance?

They're so easy to work with that I'd really love to understand what we
believe their shortcomings to be.

Thanks.

Paul
 
Irmen de Jong

Paul said:
I've heard it said that web servers built upon the standard library's
SimpleHTTPServer or CGIHTTPServer aren't really suitable for use in a
production system. Is this still the case? And if so, why?

For starters, the SimpleHTTPServer reports the wrong Content-Length.
See my patch at http://tinyurl.com/56frb

--Irmen
 
Paul Morrow

Irmen said:
For starters, the SimpleHTTPServer reports the wrong Content-Length.
See my patch at http://tinyurl.com/56frb

Yes, something is wrong there. I wonder though if it makes sense to
continue to open text files in 'text' mode, so that the treatment of
newlines is normalized, but then adjust the content-length to be the
length of the (newline converted) string as read from the file.
 
Irmen de Jong

Paul said:
Yes, something is wrong there. I wonder though if it makes sense to
continue to open text files in 'text' mode, so that the treatment of
newlines is normalized, but then adjust the content-length to be the
length of the (newline converted) string as read from the file.

Well, I think that could be done too.
But why not ditch the newline conversion altogether and
treat text files just as any other file (which is what my patch does).

-Irmen
 
Paul Morrow

Irmen said:
Well, I think that could be done too.
But why not ditch the newline conversion altogether and
treat text files just as any other file (which is what my patch does).

-Irmen

I'm not sure that we can rely on the client browser doing the right
thing with the newlines. I'm not aware of an RFC that really covers
this, but do all browsers convert text files as needed to their native
format? If so, sending them as binary (unaltered) would be fine. But
if not, maybe the solution is to detect (or make a guess at) the OS of
the client, then adjust the newlines accordingly (yuck!)...
 
Irmen de Jong

Richard said:
HTTP 1.1 covers this topic:

http://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html#sec3.7.1

Still, that doesn't help if your server text format isn't one of these.

Hmm, well, reading that particular section of the HTTP spec makes
me think that the solution I programmed in the patch isn't the optimal
one.
Rather than treating text files (with content-type text/...) as any
other -binary- file, I now think that it's actually better to read them
in and convert the line endings to CR LF. But this requires:

- buffering of the entire text file in memory during the CR LF conversion
- fixing the content-type at the end because it's not the same as
the filesize (this was the original problem I solved).
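
Something along these lines is what I have in mind (rough sketch only; the
helper name is made up, and 'handler' is assumed to be the
BaseHTTPRequestHandler instance):

def send_text_as_crlf(handler, path, content_type='text/plain'):
    # Rough sketch: buffer the whole file, normalize line endings to
    # CR LF, and report the length of the *converted* data.
    f = open(path, 'rU')                  # universal newlines -> '\n'
    data = f.read().replace('\n', '\r\n')
    f.close()
    handler.send_response(200)
    handler.send_header('Content-Type', content_type)
    handler.send_header('Content-Length', str(len(data)))
    handler.end_headers()
    handler.wfile.write(data)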

What do you think?


Bye,
Irmen.
 
Grant Edwards

Rather than treating text files (with content-type text/...) as any
other -binary- file, I now think that it's actually better to read them
in and convert the line endings to CR LF. But this requires:

- buffering of the entire text file in memory during the CR LF conversion

Why do you have to buffer the entire file? You could make two
almost-identical conversion passes through the file: the first
time, just count the "output" bytes. Finish sending the HTTP
headers out, then make the second pass actually writing the
output bytes.

If the OS has enough spare RAM sitting around, it will buffer
the file for you and the second pass won't even hit the disk.
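
Roughly like this, I mean (sketch only; the helper name is invented, and it
assumes universal-newline mode so every line ending reads back as '\n'):

def send_text_two_pass(handler, path, content_type='text/plain'):
    # Pass 1: count the bytes the CR LF conversion would produce,
    # without keeping the converted data around.
    length = 0
    f = open(path, 'rU')
    for line in f:
        length += len(line.rstrip('\n')) + 2   # every line goes out as ...CR LF
    f.close()

    handler.send_response(200)
    handler.send_header('Content-Type', content_type)
    handler.send_header('Content-Length', str(length))
    handler.end_headers()

    # Pass 2: re-read the file and actually write the converted lines.
    f = open(path, 'rU')
    for line in f:
        handler.wfile.write(line.rstrip('\n') + '\r\n')
    f.close()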
 
Irmen de Jong

Grant said:
Why do you have to buffer the entire file? You could make two
almost-identical conversion passes through the file: the first
time, just count the "output" bytes. Finish sending the HTTP
headers out, then make the second pass actually writing the
output bytes.

That would slow things down quite a bit, because you now
have to do the same CPU-intensive task twice.

But it saves memory.

Oh, the choices...

--Irmen
 
Christopher T King

That would slow things down quite a bit, because you now
have to do the same CPU-intensive task twice.

But it saves memory.

Oh, the choices...

Choices, indeed:

from StringIO import StringIO
from gzip import GzipFile

# Buffer the converted output compressed in memory, counting its
# uncompressed length as we go.
gzipdata = StringIO()
gzipfile = GzipFile(mode='w', fileobj=gzipdata)

length = 0
for line in slurp_data():
    line = mangle(line)
    length += len(line)
    gzipfile.write(line)

gzipfile.close()

# Now the length is known: send it, then decompress and send the data.
gzipdata.seek(0)
print length
for line in GzipFile(fileobj=gzipdata):
    spew_data(line)

;)
 
David Fraser

Irmen said:
That would slow things down quite a bit, because you now
have to do the same CPU-intensive task twice.

But it saves memory.

Oh, the choices...

--Irmen

Or, to improve it slightly, you could do a find loop on the characters you
have to convert [LF for Unix], work out how many characters you're going
to have to add, give the Content-Length, send the headers, and then use
the cached list of LF positions to send the strings plus the inserted
characters.

Now that would have been easier to say in Python...
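
Something like this, perhaps (rough sketch; assumes the file on disk uses
bare LF line endings, and 'handler' is the request handler instance):

def send_text_with_lf_index(handler, path, content_type='text/plain'):
    f = open(path, 'rb')
    data = f.read()
    f.close()

    # Find loop: remember where every LF is, so we know how many CRs
    # we are going to insert.
    positions = []
    start = 0
    while True:
        i = data.find('\n', start)
        if i < 0:
            break
        positions.append(i)
        start = i + 1

    handler.send_response(200)
    handler.send_header('Content-Type', content_type)
    handler.send_header('Content-Length', str(len(data) + len(positions)))
    handler.end_headers()

    # Send the slices between LFs, inserting a CR before each one.
    prev = 0
    for i in positions:
        handler.wfile.write(data[prev:i] + '\r\n')
        prev = i + 1
    handler.wfile.write(data[prev:])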

David
 
Paul Morrow

Irmen said:
Hmm, well, reading that particular section of the HTTP spec makes
me think that the solution I programmed in the patch isn't the optimal
one.
Rather than treating text files (with content-type text/...) as any
other -binary- file, I now think that it's actually better to read them
in and convert the line endings to CR LF. But this requires:

- buffering of the entire text file in memory during the CR LF conversion
- fixing the content-type at the end because it's not the same as
the filesize (this was the original problem I solved).

What do you think?

I think that it might be best if these files were already in the correct
format on the disk -- maybe via a daemon that periodically 'fixes'
those in need, a special ftp server, or a file-system plugin/hook
(mmm...!) -- so that nothing special has to be done to them by the web
server. In which case, reading them all as binary (as your patch takes
care of) works fine.
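
Such a fixer could be as simple as this (rough sketch; the function name,
extension list, and docroot path are all just examples):

import os

TEXT_EXTENSIONS = ('.txt', '.html', '.css')      # example list only

def normalize_tree(docroot):
    # Rewrite text files in place so their line endings are already
    # CR LF before the web server ever touches them.
    for dirpath, dirnames, filenames in os.walk(docroot):
        for name in filenames:
            if os.path.splitext(name)[1].lower() not in TEXT_EXTENSIONS:
                continue
            path = os.path.join(dirpath, name)
            f = open(path, 'rb')
            data = f.read()
            f.close()
            # Normalize any mix of endings to '\n', then expand to CR LF.
            fixed = data.replace('\r\n', '\n').replace('\r', '\n')
            fixed = fixed.replace('\n', '\r\n')
            if fixed != data:
                f = open(path, 'wb')
                f.write(fixed)
                f.close()

if __name__ == '__main__':
    normalize_tree('/var/www/docs')              # example path only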
 
Irmen de Jong

Irmen said:
- fixing the content-type at the end because it's not the same as
the filesize (this was the original problem I solved).

Whoops, of course I meant: fixing the content-length

--Irmen
 
Alan Kennedy

[Irmen de Jong]
in and convert the line endings to CR LF. But this requires:

- buffering of the entire text file in memory during the CR LF conversion
- fixing the content-type at the end because it's not the same as
the filesize (this was the original problem I solved).

What do you think?

Or send it with "Transfer-encoding: chunked", which is purpose built
for dealing with content of unknown length.

http://www.cse.ohio-state.edu/cgi-bin/rfc/rfc2616.html#sec-3.6.1

Each chunk is preceded by a header indicating the size of the
forthcoming chunk. An algorithm for reading chunked encoding is given
in one of the appendices: it's pretty simple to understand:-

http://www.cse.ohio-state.edu/cgi-bin/rfc/rfc2616.html#sec-19.4.6

So on the server side, simply continually buffer as much of the
textual content as you want to send in each chunk, do your
line-endings translation on the buffer, send an ASCII rep of the size
of the buffer, and then send the buffer contents. Repeat until EOF.
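
In code, that loop might look something like this (sketch only; the helper
name is made up, 'handler' is assumed to be the request handler, and its
protocol_version needs to be 'HTTP/1.1' for chunked replies):

def send_text_chunked(handler, path, content_type='text/plain'):
    handler.send_response(200)
    handler.send_header('Content-Type', content_type)
    handler.send_header('Transfer-Encoding', 'chunked')
    handler.end_headers()

    f = open(path, 'rU')                    # universal newlines -> '\n'
    while True:
        buf = f.read(8192)                  # chunk size is arbitrary
        if not buf:
            break
        buf = buf.replace('\n', '\r\n')     # line-ending translation per chunk
        # Each chunk: its size in hex, CR LF, the data, CR LF.
        handler.wfile.write('%x\r\n%s\r\n' % (len(buf), buf))
    f.close()
    handler.wfile.write('0\r\n\r\n')        # zero-sized chunk ends the body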

This is how robust production servers like Apache and Tomcat do it:
they don't try to buffer the content, for a whole load of good reasons.

Each chunk can also have its own separate headers, which are appended
to the headers for the whole response, while processing that chunk, so
it should be relatively simple to do "Transfer-encoding: gzip" on each
chunk as well, for that extra bandwidth saving.

Just one more way to do it.
 
Alan Kennedy

[Alan Kennedy]

[snip spiel about chunked transfer encoding]
This is how robust production servers like Apache and Tomcat do it: they
don't try to buffer the content, for a whole load of good reasons.

I should have stressed: this strategy is most often used when there is
some form of dynamic content generation going on, when it's not
possible to know when the output of the user's code (e.g. CGI, PHP,
ASP, etc.) is going to stop.
 
