WSGI question: reading headers before message body has been read

Discussion in 'Python' started by Ron Garret, Jan 18, 2009.

  1. Ron Garret

    I'm writing a WSGI application and I would like to check the content-
    length header before reading the content to make sure that the content
    is not too big in order to prevent denial-of-service attacks. So I do
    something like this:

    def application(environ, start_response):
        status = "200 OK"
        headers = [('Content-Type', 'text/html'), ]
        start_response(status, headers)
        if int(environ['CONTENT_LENGTH'])>1000: return 'File too big'

    But this doesn't seem to work. If I upload a huge file it still waits
    until the entire file has been uploaded before complaining that it's
    too big.

    Is it possible to read the HTTP headers in WSGI before the request
    body has been read?

    Thanks,
    rg
     
    Ron Garret, Jan 18, 2009
    #1

  2. Ron Garret wrote:
    > I'm writing a WSGI application and I would like to check the content-
    > length header before reading the content to make sure that the content
    > is not too big in order to prevent denial-of-service attacks. So I do
    > something like this:
    >
    > def application(environ, start_response):
    >     status = "200 OK"
    >     headers = [('Content-Type', 'text/html'), ]
    >     start_response(status, headers)
    >     if int(environ['CONTENT_LENGTH'])>1000: return 'File too big'
    >
    > But this doesn't seem to work. If I upload a huge file it still waits
    > until the entire file has been uploaded before complaining that it's
    > too big.
    >
    > Is it possible to read the HTTP headers in WSGI before the request
    > body has been read?


    AFAIK that is not something WSGI defines; it's an implementation detail
    of your server. Which one do you use?

    Diez
     
    Diez B. Roggisch, Jan 18, 2009
    #2

  3. On Jan 18, 2009, at 8:01 PM, Ron Garret wrote:

    > def application(environ, start_response):
    >     status = "200 OK"
    >     headers = [('Content-Type', 'text/html'), ]
    >     start_response(status, headers)
    >     if int(environ['CONTENT_LENGTH'])>1000: return 'File too big'


    How would that work for chunked transfer-encoding?

    Cheers,

    --
    PA.
    http://alt.textdrive.com/nanoki/
     
    Petite Abeille, Jan 18, 2009
    #3
  4. Ron Garret

    On Jan 18, 11:29 am, "Diez B. Roggisch" <> wrote:
    > Ron Garret wrote:
    >
    > > I'm writing a WSGI application and I would like to check the content-
    > > length header before reading the content to make sure that the content
    > > is not too big in order to prevent denial-of-service attacks.  So I do
    > > something like this:
    >
    > > def application(environ, start_response):
    > >     status = "200 OK"
    > >     headers = [('Content-Type', 'text/html'), ]
    > >     start_response(status, headers)
    > >     if int(environ['CONTENT_LENGTH'])>1000: return 'File too big'
    >
    > > But this doesn't seem to work.  If I upload a huge file it still waits
    > > until the entire file has been uploaded before complaining that it's
    > > too big.
    >
    > > Is it possible to read the HTTP headers in WSGI before the request
    > > body has been read?
    >
    > AFAIK that is nothing that WSGI defines - it's an implementation-detail
    > of your server. Which one do you use?


    Apache at the moment, with lighttpd as a contender to replace it.

    rg
     
    Ron Garret, Jan 18, 2009
    #4
  5. Ron Garret

    On Jan 18, 11:43 am, Petite Abeille <> wrote:
    > On Jan 18, 2009, at 8:01 PM, Ron Garret wrote:
    >
    > > def application(environ, start_response):
    > >    status = "200 OK"
    > >    headers = [('Content-Type', 'text/html'), ]
    > >    start_response(status, headers)
    > >    if int(environ['CONTENT_LENGTH'])>1000: return 'File too big'
    >
    > How would that work for chunked transfer-encoding?


    It wouldn't. But many clients don't use chunked transfer encoding
    when uploading files whose size is known. In that case it would be
    nice to let users know that their upload is going to fail BEFORE they
    waste hours waiting for 10GB of data to go down the wire.

    rg
     
    Ron Garret, Jan 18, 2009
    #5
  6. Ron Garret wrote:
    > On Jan 18, 11:29 am, "Diez B. Roggisch" <> wrote:
    >> Ron Garret wrote:
    >>
    >>> I'm writing a WSGI application and I would like to check the content-
    >>> length header before reading the content to make sure that the content
    >>> is not too big in order to prevent denial-of-service attacks. So I do
    >>> something like this:
    >>> def application(environ, start_response):
    >>>     status = "200 OK"
    >>>     headers = [('Content-Type', 'text/html'), ]
    >>>     start_response(status, headers)
    >>>     if int(environ['CONTENT_LENGTH'])>1000: return 'File too big'
    >>> But this doesn't seem to work. If I upload a huge file it still waits
    >>> until the entire file has been uploaded before complaining that it's
    >>> too big.
    >>> Is it possible to read the HTTP headers in WSGI before the request
    >>> body has been read?
    >> AFAIK that is nothing that WSGI defines - it's an implementation-detail
    >> of your server. Which one do you use?
    >
    > Apache at the moment, with lighttpd as a contender to replace it.

    Together with mod_wsgi?

    Diez
     
    Diez B. Roggisch, Jan 18, 2009
    #6
  7. Ron Garret

    On Jan 18, 12:40 pm, "Diez B. Roggisch" <> wrote:
    > Ron Garret wrote:
    >
    > > On Jan 18, 11:29 am, "Diez B. Roggisch" <> wrote:
    > >> Ron Garret wrote:
    >
    > >>> I'm writing a WSGI application and I would like to check the content-
    > >>> length header before reading the content to make sure that the content
    > >>> is not too big in order to prevent denial-of-service attacks.  So I do
    > >>> something like this:
    > >>> def application(environ, start_response):
    > >>>     status = "200 OK"
    > >>>     headers = [('Content-Type', 'text/html'), ]
    > >>>     start_response(status, headers)
    > >>>     if int(environ['CONTENT_LENGTH'])>1000: return 'File too big'
    > >>> But this doesn't seem to work.  If I upload a huge file it still waits
    > >>> until the entire file has been uploaded before complaining that it's
    > >>> too big.
    > >>> Is it possible to read the HTTP headers in WSGI before the request
    > >>> body has been read?
    > >> AFAIK that is nothing that WSGI defines - it's an implementation-detail
    > >> of your server. Which one do you use?
    >
    > > Apache at the moment, with lighttpd as a contender to replace it.
    >
    > Together with mod_wsgi?
    >
    > Diez


    Yes. (Is there any other way to run WSGI apps under Apache?)

    rg
     
    Ron Garret, Jan 18, 2009
    #7
  8. On Jan 19, 6:01 am, Ron Garret <> wrote:
    > I'm writing a WSGI application and I would like to check the content-
    > length header before reading the content to make sure that the content
    > is not too big in order to prevent denial-of-service attacks.  So I do
    > something like this:
    >
    > def application(environ, start_response):
    >     status = "200 OK"
    >     headers = [('Content-Type', 'text/html'), ]
    >     start_response(status, headers)
    >     if int(environ['CONTENT_LENGTH'])>1000: return 'File too big'

    You should be returning a 413 (Request Entity Too Large) error status
    for that specific case, not a 200 response.

    You should not be returning a string as the response content, as it is
    very inefficient: the server iterates over it one character at a time.
    Wrap it in a list instead.
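    A minimal sketch of the application with both of those corrections
    applied: a 413 status instead of 200 for oversized requests, and the
    body returned as a list of bytestrings. The 1000-byte limit comes from
    the original post; MAX_BODY is a hypothetical name. This only fixes the
    response itself; whether the client has already uploaded the body by
    the time it arrives depends on the server and client, as the rest of
    this post explains.

```python
# Hypothetical limit, taken from the original post.
MAX_BODY = 1000


def application(environ, start_response):
    # CONTENT_LENGTH may be missing or empty; treat that (and junk) as 0.
    try:
        length = int(environ.get("CONTENT_LENGTH") or 0)
    except ValueError:
        length = 0

    if length > MAX_BODY:
        body = b"File too big"
        start_response("413 Request Entity Too Large", [
            ("Content-Type", "text/plain"),
            ("Content-Length", str(len(body))),
        ])
        # Wrap the body in a list: a bare str is iterated one character at
        # a time (Python 2), and a bare bytes object yields ints, which is
        # not a valid WSGI response iterable in Python 3.
        return [body]

    body = b"<p>OK</p>"
    start_response("200 OK", [
        ("Content-Type", "text/html"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```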

    > But this doesn't seem to work.  If I upload a huge file it still waits
    > until the entire file has been uploaded before complaining that it's
    > too big.
    >
    > Is it possible to read the HTTP headers in WSGI before the request
    > body has been read?


    Yes.

    The issue is that, in order to avoid the client sending the data, the
    client needs to actually make use of the HTTP/1.1 'Expect: 100-continue'
    header to indicate it is expecting a 100-continue response before
    sending data. You don't need to handle that, as Apache/mod_wsgi does it
    for you, but the only web browser I know of that supports 100-continue
    is Opera. Clients like curl support it as well. In other words, if
    people use IE, Firefox or Safari, the request content will be sent
    regardless.
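    To show what "expecting a 100-continue response" means on the wire,
    here is a sketch of the client side of the exchange. The function names
    are hypothetical and only the status-line decision is modelled; a real
    client such as curl also deals with sockets, timeouts and reading the
    full final response.

```python
def build_request_head(host, path, content_length):
    """Return the request line and headers for an upload that asks the
    server to confirm before the body is sent (Expect: 100-continue)."""
    lines = [
        f"POST {path} HTTP/1.1",
        f"Host: {host}",
        f"Content-Length: {content_length}",
        "Expect: 100-continue",
        "",  # these two empty entries produce the blank line that
        "",  # ends the header block
    ]
    return "\r\n".join(lines).encode("ascii")


def should_send_body(status_line):
    """Decide from the server's first status line whether to upload.

    '100 Continue' means go ahead; a final status such as
    '413 Request Entity Too Large' means the server has already
    answered and the body should not be sent.
    """
    parts = status_line.decode("iso-8859-1").split(None, 2)
    # parts looks like ['HTTP/1.1', '100', 'Continue\r\n']
    return len(parts) >= 2 and parts[1] == "100"
```

    The client sends the head, waits for the server's first status line,
    and only then sends (or skips) the body. Clients commonly stop waiting
    after a short timeout and send the body anyway, so this saves bandwidth
    only when the server answers promptly.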

    There is still more to this, though. First, if you are going to handle
    413 errors in your own WSGI application and you are using mod_wsgi
    daemon mode, then the request content is still sent by the browser
    regardless, even if using Opera. This is because the act of
    transferring content across to the mod_wsgi daemon process triggers the
    return of a 100-continue to the client, and so it sends the data. There
    is a ticket for mod_wsgi to implement proper 100-continue support for
    daemon mode, but it will be a while before that happens.

    Rather than have the WSGI application handle 413 error cases, you are
    better off letting Apache/mod_wsgi handle it for you. To do that, all
    you need to do is use the Apache 'LimitRequestBody' directive. This
    will check the content length for you and send a 413 response without
    the WSGI application even being called. When using daemon mode, this
    is done in the Apache child worker processes, and in the 100-continue
    case the data will not be read at all, so the client can be prevented
    from sending it if it is using Opera.

    The only caveat is that the currently available mod_wsgi has a bug
    such that 100-continue requests do not always work in daemon mode.
    You need to apply the fix from:

    http://code.google.com/p/modwsgi/issues/detail?id=121

    For details on LimitRequestBody directive see:

    http://httpd.apache.org/docs/2.2/mod/core.html#limitrequestbody

    Graham
     
    Graham Dumpleton, Jan 18, 2009
    #8
  9. On Jan 19, 6:43 am, Petite Abeille <> wrote:
    > On Jan 18, 2009, at 8:01 PM, Ron Garret wrote:
    >
    > > def application(environ, start_response):
    > >    status = "200 OK"
    > >    headers = [('Content-Type', 'text/html'), ]
    > >    start_response(status, headers)
    > >    if int(environ['CONTENT_LENGTH'])>1000: return 'File too big'
    >
    > How would that work for chunked transfer-encoding?


    Chunked transfer encoding on request content is not supported by the
    WSGI specification, as WSGI requires that CONTENT_LENGTH be set and
    disallows reading more than the defined content length, where
    CONTENT_LENGTH is supposed to be taken as 0 if not provided.
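    A sketch of reading the request body within those rules: a missing or
    empty CONTENT_LENGTH counts as 0, and no more than CONTENT_LENGTH bytes
    are read from wsgi.input. read_request_body and max_bytes are
    hypothetical names, and raising ValueError is just one way to signal
    "too big" (an application would turn it into a 413). Checking the
    declared length before reading avoids buffering the body in the
    application; it does not stop the client from sending it.

```python
def read_request_body(environ, max_bytes):
    """Read the request body without reading past CONTENT_LENGTH.

    Raises ValueError if CONTENT_LENGTH is malformed or declares more
    than max_bytes; in the latter case nothing is read.
    """
    length = int(environ.get("CONTENT_LENGTH") or "0")
    if length > max_bytes:
        raise ValueError("declared body of %d bytes exceeds limit" % length)

    stream = environ["wsgi.input"]
    chunks = []
    remaining = length
    while remaining > 0:
        # Never ask for more than is left of the declared length.
        chunk = stream.read(min(remaining, 64 * 1024))
        if not chunk:
            break  # client disconnected before sending everything
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)
```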

    If you use Apache/mod_wsgi 3.0 (currently in development, so you need
    to use a Subversion checkout), you can step outside what WSGI strictly
    allows and still handle chunked transfer encoding on request content,
    but you still don't have a CONTENT_LENGTH with which to check in
    advance whether more data than expected is going to be sent.
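    Without a CONTENT_LENGTH, the best an application can do is count bytes
    as it reads and give up once a limit is passed. The sketch below assumes,
    hypothetically, a server that lets the application read wsgi.input until
    it returns an empty bytestring; strict WSGI does not promise this, and
    how mod_wsgi 3.0 actually exposes chunked request content should be
    checked in its own documentation. read_capped_stream is a hypothetical
    name.

```python
def read_capped_stream(stream, max_bytes, chunk_size=64 * 1024):
    """Read a body of unknown length, giving up once it grows past max_bytes.

    Assumes the stream can be read until it returns b"" (end of body).
    """
    chunks = []
    total = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            return b"".join(chunks)
        total += len(chunk)
        if total > max_bytes:
            # Stop at the first chunk that crosses the limit. The bytes the
            # client has already sent cannot be refused retroactively.
            raise ValueError("request body exceeds %d bytes" % max_bytes)
        chunks.append(chunk)
```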

    If you want to know how to handle chunked transfer encoding in
    mod_wsgi, you are better off asking on the mod_wsgi list.

    Graham
     
    Graham Dumpleton, Jan 18, 2009
    #9
  10. Ron Garret

    On Jan 18, 1:21 pm, Graham Dumpleton <>
    wrote:
    > On Jan 19, 6:01 am, Ron Garret <> wrote:
    >
    > > I'm writing a WSGI application and I would like to check the content-
    > > length header before reading the content to make sure that the content
    > > is not too big in order to prevent denial-of-service attacks.  So I do
    > > something like this:
    >
    > > def application(environ, start_response):
    > >     status = "200 OK"
    > >     headers = [('Content-Type', 'text/html'), ]
    > >     start_response(status, headers)
    > >     if int(environ['CONTENT_LENGTH'])>1000: return 'File too big'
    >
    > You should be returning 413 (Request Entity Too Large) error status
    > for that specific case, not a 200 response.
    >
    > You should not be returning a string as response content as it is very
    > inefficient, wrap it in an array.
    >
    > > But this doesn't seem to work.  If I upload a huge file it still waits
    > > until the entire file has been uploaded before complaining that it's
    > > too big.
    >
    > > Is it possible to read the HTTP headers in WSGI before the request
    > > body has been read?
    >
    > Yes.
    >
    > The issue is that in order to avoid the client sending the data the
    > client needs to actually make use of HTTP/1.1 headers to indicate it
    > is expecting a 100-continue response before sending data. You don't
    > need to handle that as Apache/mod_wsgi does it for you, but the only
    > web browser I know of that supports 100-continue is Opera browser.
    > Clients like curl do also support it as well though. In other words,
    > if people use IE, Firefox or Safari, the request content will be sent
    > regardless anyway.
    >
    > There is though still more to this though. First off is that if you
    > are going to handle 413 errors in your own WSGI application and you
    > are using mod_wsgi daemon mode, then request content is still sent by
    > browser regardless, even if using Opera. This is because the act of
    > transferring content across to mod_wsgi daemon process triggers return
    > of 100-continue to client and so it sends data. There is a ticket for
    > mod_wsgi to implement proper 100-continue support for daemon mode, but
    > will be a while before that happens.
    >
    > Rather than have WSGI application handle 413 error cases, you are
    > better off letting Apache/mod_wsgi handle it for you. To do that all
    > you need to do is use the Apache 'LimitRequestBody' directive. This
    > will check the content length for you and send 413 response without
    > the WSGI application even being called. When using daemon mode, this
    > is done in Apache child worker processes and for 100-continue case
    > data will not be read at all and can avoid client sending it if using
    > Opera.
    >
    > Only caveat on that is the currently available mod_wsgi has a bug in
    > it such that 100-continue requests not always working for daemon mode.
    > You need to apply fix in:
    >
    >  http://code.google.com/p/modwsgi/issues/detail?id=121
    >
    > For details on LimitRequestBody directive see:
    >
    >  http://httpd.apache.org/docs/2.2/mod/core.html#limitrequestbody
    >
    > Graham


    Thanks for the detailed response!

    rg
     
    Ron Garret, Jan 18, 2009
    #10
  11. Ron Garret wrote:
    > On Jan 18, 12:40 pm, "Diez B. Roggisch" <> wrote:
    >> Ron Garret wrote:
    >>
    >>> On Jan 18, 11:29 am, "Diez B. Roggisch" <> wrote:
    >>>> Ron Garret wrote:
    >>>>> I'm writing a WSGI application and I would like to check the content-
    >>>>> length header before reading the content to make sure that the content
    >>>>> is not too big in order to prevent denial-of-service attacks. So I do
    >>>>> something like this:
    >>>>> def application(environ, start_response):
    >>>>>     status = "200 OK"
    >>>>>     headers = [('Content-Type', 'text/html'), ]
    >>>>>     start_response(status, headers)
    >>>>>     if int(environ['CONTENT_LENGTH'])>1000: return 'File too big'
    >>>>> But this doesn't seem to work. If I upload a huge file it still waits
    >>>>> until the entire file has been uploaded before complaining that it's
    >>>>> too big.
    >>>>> Is it possible to read the HTTP headers in WSGI before the request
    >>>>> body has been read?
    >>>> AFAIK that is nothing that WSGI defines - it's an implementation-detail
    >>>> of your server. Which one do you use?
    >>> Apache at the moment, with lighttpd as a contender to replace it.
    >> Together with mod_wsgi?
    >>
    >> Diez
    >
    > Yes. (Is there any other way to run WSGI apps under Apache?)

    Not as easily, but of course you can use mod_python, or even
    CGI/FastCGI, to invoke a WSGI application.
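    For the CGI route, the standard library's wsgiref.handlers.CGIHandler
    runs a WSGI application as a CGI script. A minimal sketch: the
    application is a placeholder, and the GATEWAY_INTERFACE check (CGI
    servers set that variable) is only there so the file does nothing when
    imported or run by hand outside a CGI server.

```python
import os
from wsgiref.handlers import CGIHandler


def application(environ, start_response):
    # Placeholder application; any WSGI callable works here.
    body = b"Hello from a WSGI app running as CGI\n"
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


if __name__ == "__main__" and "GATEWAY_INTERFACE" in os.environ:
    # Invoked by a CGI server such as Apache: CGIHandler builds the WSGI
    # environ from os.environ and stdin, and writes the CGI response
    # (a "Status:" line, headers, then the body) to stdout.
    CGIHandler().run(application)
```

    CGI starts a new process for every request, so this is mainly useful
    where mod_wsgi is not available.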

    As for the original question, that's a tough one.

    According to this, it seems one can use an Apache directive to prevent
    mod_wsgi from even passing a request to the application if it exceeds a
    certain size:

    http://code.google.com/p/modwsgi/wiki/ConfigurationGuidelines

    Search for "Limiting Request Content"

    However, I'm not sure how early that happens. I can only suggest you try
    contacting Graham Dumpleton directly; he is very responsive.


    Diez
     
    Diez B. Roggisch, Jan 18, 2009
    #11

Similar Threads
  1. muskan (Replies: 1, Views: 336; Ioannis Vranos, Sep 22, 2004)
  2. dont bother (Replies: 0, Views: 830; dont bother, Mar 3, 2004)
  3. Jon Bendtsen (Replies: 4, Views: 408; Jon Bendtsen, Jun 4, 2009)
  4. Streaming templating languages for use as WSGI body. Alice Bevan–McGregor, Jan 5, 2011, in forum: Python (Replies: 0, Views: 218; Alice Bevan–McGregor, Jan 5, 2011)
  5. Ian (Replies: 2, Views: 2,021)