Python's CGI and JavaScript's uriEncode: A disconnect.

Discussion in 'Python' started by Elf M. Sternberg, Jul 1, 2003.

  1. It's all Netscape's fault.

    RFC 2396 (URI Specifications) specifies that a space shall be encoded
    using %20 and the plus symbol is always safe. Netscape (and possibly
    even earlier browsers like Mosaic) used the plus symbol '+' as a
    substitute for the space in the last part of the URI, arguments to the
    object referenced (you know, all the stuff after the question mark in
    a URL).

    The ECMA-262 "Javascript" standard, now supported by both Netscape and
    Internet Explorer, honors RFC 2396, translating spaces into their hex
    equivalent %20 and leaving pluses alone.

    The Python library cgi.FieldStorage decodes it backwards, expecting
    pluses to be spaces and %2b to represent pluses. This behavior is
    present even in Python 2.2, and arguably helps support older browsers.
    But when web applications are heavily javascript-dependent, this can
    cause major headaches.

    Other than overriding cgi.FieldStorage's parse_qsl, is there any way to
    fix this disconnect?

    Elf
     
    Elf M. Sternberg, Jul 1, 2003

  2. Elf M. Sternberg wrote:

    > Netscape (and possibly even earlier browsers like Mosaic) used the
    > plus symbol '+' as a substitute for the space in the last part of
    > the URI


    This is correct in a query parameter, e.g. in ...?foo=abc+def the '+'
    stands for a space.

    This is part of the specification for the media type
    application/x-www-form-urlencoded, defined by HTML itself (section
    17.13.4.1 of the 4.01 spec). This states that spaces should normally
    be encoded as '+'. In practice, though, '%20' is just as good and
    causes less confusion, so that's what newer browsers (and I) use.

    Elsewhere, spaces should not be encoded as '+'.
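
    To make that rule concrete, here is a minimal sketch of how a
    form-urlencoded decoder treats the two characters. The helper name
    decodeFormValue is made up for illustration; it is not what
    cgi.FieldStorage actually runs, only the same convention written out
    in JavaScript.

```javascript
// Hypothetical helper: decode one application/x-www-form-urlencoded value.
function decodeFormValue(raw) {
    // '+' means space in this media type, so translate it first...
    var spaced = raw.replace(/\+/g, ' ');
    // ...then resolve %XX escapes, so %2B comes out as a literal plus.
    return decodeURIComponent(spaced);
}

console.log(decodeFormValue('abc+def'));   // abc def
console.log(decodeFormValue('abc%20def')); // abc def
console.log(decodeFormValue('1%2B1'));     // 1+1
```

    Either spelling of a space decodes the same way; only a plus that
    was meant literally has to arrive as %2B.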

    The reasoning for this initial decision is unclear - presumably it is
    intended to improve readability, but URIs with query parts are
    generally not going to be very readable anyway.

    > The ECMA-262 "Javascript" standard, now supported by both Netscape and
    > Internet Explorer, honors RFC 2396, translating spaces into their hex
    > equivalent %20 and leaving pluses alone.


    Depends which function you are talking about. The 'escape' and 'encodeURI'
    built-in functions are not designed to encode single URI query parameter
    values, they're designed to encode larger chunks of URI. As such they do
    not need to encode plus characters.

    The encodeURIComponent function *does*, and it is this function that you
    should use if you want some JavaScript code to submit a query parameter.

    The only drawback is that encodeURIComponent is relatively new, so you
    won't find it on medium-old browsers like Netscape 4 and IE 5.0. (The
    same goes for encodeURI - you only get 'escape' in older browsers.)
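
    A quick way to see the difference is to run the three built-ins over
    the same value. This is only an illustration; the sample string is
    arbitrary.

```javascript
// One value containing both a space and a literal plus.
var value = 'a b+c';

var viaEscape = escape(value);                // leaves '+' untouched
var viaEncodeURI = encodeURI(value);          // leaves '+' untouched
var viaComponent = encodeURIComponent(value); // encodes '+' as %2B

console.log(viaEscape);    // a%20b+c
console.log(viaEncodeURI); // a%20b+c
console.log(viaComponent); // a%20b%2Bc
```

    Only the last result survives a form-urlencoded decoder intact; the
    first two come back as 'a b c'.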

    > The Python library cgi.FieldStorage decodes it backwards, expecting
    > pluses to be spaces and %2b to represent pluses.


    The Python library is correct per spec. If your scripts are not encoding
    plus symbols in query parameters to %2B, they are at fault (and will go
    equally wrong in any other language).

    Possible solutions:

    a. use encodeURIComponent() instead. This is best, but won't work
    universally.
    b. use escape(), then replace any pluses in its output with %2B. This
    is OK, but won't handle Unicode properly or predictably. (note: in IE,
    encodeURI() also fails to handle Unicode predictably.)
    c. roll your own encodeURIComponent function.
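
    For reference, option (b) might look roughly like the sketch below.
    The name escapeParam is hypothetical, and the last line shows the
    Unicode problem: escape() emits a non-standard %uXXXX form that a
    server-side decoder is unlikely to understand.

```javascript
// Hypothetical helper for option (b): escape(), then patch up the pluses.
function escapeParam(value) {
    // escape() leaves '+' alone, so encode it by hand afterwards.
    return escape(value).replace(/\+/g, '%2B');
}

console.log(escapeParam('1 + 1'));  // 1%20%2B%201
console.log(escapeParam('\u20AC')); // %u20AC (not valid %XX encoding)
```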

    It's a bit off-topic for c.l.py, but here's a (c.)-style solution I've used
    before:

    // Encode one query parameter value: convert to UTF-8 bytes, then
    // %-escape every byte that is not in the safe list (so '+' -> %2B).
    function encPar(wide) {
        var narrow= encUtf8(wide);
        var enc= '';
        for (var i= 0; i<narrow.length; i++) {
            if (encPar_OK.indexOf(narrow.charAt(i))==-1)
                enc= enc+encHex2(narrow.charCodeAt(i));
            else
                enc= enc+narrow.charAt(i);
        }
        return enc;
    }
    // Characters passed through unescaped. Note that '+' is not among them.
    var encPar_OK= 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'+
        '0123456789*@-_./';

    // Format a byte value (0-255) as %XX.
    function encHex2(v) {
        return '%'+encHex2_DIGITS.charAt(v>>>4)+encHex2_DIGITS.charAt(v&0xF);
    }
    var encHex2_DIGITS= '0123456789ABCDEF';

    // Convert a UTF-16 string to a string of UTF-8 bytes, one byte per
    // character. Unpaired surrogates are silently dropped.
    function encUtf8(wide) {
        var c, s;
        var enc= '';
        var i= 0;
        while(i<wide.length) {
            c= wide.charCodeAt(i++);
            // handle UTF-16 surrogates
            if (c>=0xDC00 && c<0xE000) continue;     // stray low surrogate
            if (c>=0xD800 && c<0xDC00) {             // high surrogate
                if (i>=wide.length) continue;
                s= wide.charCodeAt(i++);
                // the next unit must be a low surrogate (0xDC00-0xDFFF)
                if (s<0xDC00 || s>=0xE000) continue;
                c= ((c-0xD800)<<10)+(s-0xDC00)+0x10000;
            }
            // output value as 1 to 4 UTF-8 bytes
            if (c<0x80) enc+=
                String.fromCharCode(c);
            else if (c<0x800) enc+=
                String.fromCharCode(0xC0+(c>>6),0x80+(c&0x3F));
            else if (c<0x10000) enc+=
                String.fromCharCode(0xE0+(c>>12),0x80+(c>>6&0x3F),0x80+(c&0x3F));
            else enc+=
                String.fromCharCode(0xF0+(c>>18),0x80+(c>>12&0x3F),
                    0x80+(c>>6&0x3F),0x80+(c&0x3F));
        }
        return enc;
    }

    if that's of any use.

    Kind of sucks having to do this, eh?

    --
    Andrew Clover
    http://www.doxdesk.com/
     
    Andrew Clover, Jul 5, 2003