Re: CGI and Unicode

Discussion in 'Python' started by Jeremy Yallop, Jun 23, 2003.

  1. Jim Hefferon wrote:
    > I have been struggling with getting Unicode out of Python's cgi
    > module. I have a small script illustrating the problem at the bottom
    > but first I need to explain.


    [...]

    > But when I ask what is the type of the variable that I get from
    > the cgi module, it comes out as StringType, not UnicodeType. My
    > browser is Galeon on the latest Debian and I've also tested it
    > with IE on NT.
    >
    > What am I missing?


    The problem, I think, is the lack of consistency amongst browsers in
    indicating the encoding of the submitted data. For instance, when
    responding to the form in your script, Opera includes a "Content-type"
    header containing:

    application/x-www-form-urlencoded;charset=utf-8

    whereas the "Content-type" header sent by Mozilla (and I suspect most
    other browsers[0]) doesn't indicate the charset:

    application/x-www-form-urlencoded

    If all browsers always included the charset, then the cgi module could
    reliably detect the data encoding and store the parameters as Unicode
    strings when appropriate. As it stands, there's usually insufficient
    information for cgi to detect when Unicode is being sent or what the
    encoding is. If /you/ can determine by other means that the submitted
    data is UTF-8 encoded (which is probably the case if the form was part
    of a UTF-8 encoded document) there's nothing stopping you from
    decoding it yourself (using codecs.utf_8_decode or unicode(string,
    'utf-8'), for example).
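
    A minimal sketch of that manual decoding step, assuming the form was
    served as part of a UTF-8 page. The helper name decode_form_value is
    made up for illustration, and the value is assumed to arrive as a raw
    byte string (a plain str on Python 2, bytes on Python 3):

```python
def decode_form_value(raw, encoding='utf-8'):
    """Decode a raw form value (a byte string) into a Unicode string.

    `encoding` is an assumption: it should match the encoding of the
    page that served the form, since the browser may not report it.
    """
    return raw.decode(encoding)


# "cafe" with an acute accent on the e, as the UTF-8 bytes a browser
# would submit from a UTF-8 encoded page.
raw = b'caf\xc3\xa9'
text = decode_form_value(raw)
```

    If the guess about the encoding is wrong, decode() will typically
    raise UnicodeDecodeError, so it may be worth catching that and
    falling back to another encoding.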

    Oh, one last thing (which you probably know, but just in case...): you
    can access the submitted headers through the environment variables of
    the CGI process.

    import os

    # Each request header shows up as a CGI environment variable.
    for key, value in os.environ.items():
        print('<p>%30s : %s</p>' % (key, value))
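
    Along the same lines, here is a rough sketch of checking for the
    charset clause yourself. CGI passes the request's "Content-type"
    header in the CONTENT_TYPE environment variable; the helper name
    charset_from_content_type is made up for illustration and the parsing
    is deliberately naive:

```python
import os


def charset_from_content_type(content_type):
    """Return the charset parameter of a Content-type value, or None."""
    # Skip the media type itself and look at each ";"-separated parameter.
    for part in content_type.split(';')[1:]:
        name, _, value = part.strip().partition('=')
        if name.strip().lower() == 'charset':
            # Drop optional quotes around the value.
            return value.strip('"\' ') or None
    return None


# CONTENT_TYPE is only set when the script runs under a CGI server.
content_type = os.environ.get('CONTENT_TYPE', '')
charset = charset_from_content_type(content_type)
```

    With Opera's header from above this would give 'utf-8', and with
    Mozilla's it would give None, in which case you are back to guessing.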

    Hope this helps,

    Jeremy.

    [0] A quick skim of RFC 1867 seems to indicate that the charset clause
    isn't standard.
     
    Jeremy Yallop, Jun 23, 2003