Differences in UTF-8 html form inputs

Discussion in 'Perl Misc' started by Realbot, Jan 8, 2005.

  1. Realbot

    Realbot Guest

    Hi,

    I'm having some problems with a web application of mine.
    To make things clearer, here is an HTML input form that demonstrates it.
    It submits two strings, one with GET and one with POST, and it uses HTML::Mason.

    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
    <html>
    <head>
    <title>Test utf</title>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
    </head>
    <body>
    <form name="formutfget" method="GET">
    Enter text (get):<br>
    <input type="text" name="textget" size="20" maxlength="30">
    </form>
    <form name="formutfpost" method="POST">
    Enter text (post):<br>
    <input type="text" name="textpost" size="20" maxlength="30">
    </form>
    Value of GET: <% $textget %><br>
    Hex of GET: <% $hexget %><br>
    Value of POST: <% $textpost %><br>
    Hex of POST: <% $hexpost %><br>
    </body>
    </html>
    <%args>
    $textget => ''
    $textpost => ''
    $hexget => ''
    $hexpost => ''
    </%args>
    <%init>
    $hexget = unpack('H*', $textget);
    $hexpost = unpack('H*', $textpost);
    </%init>

    The strange thing is that running this form under these environments
    Debian Woody - perl 5.6.1 - Mozilla 1.4.3/Firefox 1.0
    Debian Sid - perl 5.8.4 - Mozilla 1.4.3/Firefox 1.0
    using as input the string "Δωδεκανήσων" (I don't know what it means btw...), I get as output

    Value of GET: Δωδεκανήσων
    Hex of GET: 26233931363b26233936393b26233934383b26233934393b26233935343b26233934353b26233935373b26233934323b26233936333b26233936393b26233935373b
    Value of POST: Δωδεκανήσων
    Hex of POST: 26233931363b26233936393b26233934383b26233934393b26233935343b26233934353b26233935373b26233934323b26233936333b26233936393b26233935373b

    while on OpenBSD - perl 5.8.0 - Mozilla 1.4.3/Firefox 1.0, with the same input string, I get

    Value of GET: Δωδεκανήσων
    Hex of GET: ce94cf89ceb4ceb5cebaceb1cebdceaecf83cf89cebd
    Value of POST: Δωδεκανήσων
    Hex of POST: ce94cf89ceb4ceb5cebaceb1cebdceaecf83cf89cebd

    So, it seems that in the former I get escaped unicode character and in the latter UTF-8 ones.
    I thought it could be a 5.6 vs 5.8 difference, but as you can see, even under Debian Sid (perl 5.8.4) I got the same escaped characters.
    Could it be an OpenBSD peculiarity? I've Googled with no luck; maybe someone can shed some light on it...
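
    The two hex dumps above can be read back with a few lines of Perl. This is only a sketch: it uses the first two characters of each dump, and the variable names are invented for the illustration. It reverses the unpack('H*', ...) call from the <%init> block and compares the code points each form carries.

```perl
use strict;
use warnings;
use Encode qw(decode);

# First two characters of each hex dump quoted in the post above.
my $debian_hex  = '26233931363b26233936393b';   # Debian result
my $openbsd_hex = 'ce94cf89';                   # OpenBSD result

# pack('H*') is the inverse of the unpack('H*') used in the <%init> block.
my $debian_bytes  = pack('H*', $debian_hex);
my $openbsd_bytes = pack('H*', $openbsd_hex);

# The Debian bytes are plain ASCII: HTML numeric character references.
print "Debian bytes as text: $debian_bytes\n";

# Collect the decimal code point inside each &#NNN; reference.
my @ncr_codepoints = $debian_bytes =~ /&#(\d+);/g;

# The OpenBSD bytes are UTF-8; decode them and list the code points.
my @utf8_codepoints = map { ord } split //, decode('UTF-8', $openbsd_bytes);

printf "NCR code points:   %s\n", join ' ', map { sprintf 'U+%04X', $_ } @ncr_codepoints;
printf "UTF-8 code points: %s\n", join ' ', map { sprintf 'U+%04X', $_ } @utf8_codepoints;
```

    Both lists come out as U+0394 U+03C9 (Δ, ω): the same characters arrive either as ASCII numeric character references or as raw UTF-8 bytes.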

    Thanks!
    Realbot, Jan 8, 2005

  2. Realbot wrote:


    > using as input the string "Δωδεκανήσων" (I don't know what it means
    > btw...),


    "Dodecahedron"--i.e., a solid shape with 12 faces. If you're a gamer
    who owns "funny dice", your 12-sided dice are dodecahedrons (or, if
    you prefer, dodecahedra).
    --
    Christopher Mattern

    "Which one you figure tracked us?"
    "The ugly one, sir."
    "...Could you be more specific?"
    Chris Mattern, Jan 8, 2005

  3. On Sat, 8 Jan 2005, Realbot wrote:

    > I'm having some problems with a web application of mine.


    Forms submission including characters outside of us-ascii is
    non-trivial, and isn't in itself a Perl problem.

    OT: commentary of mine at
    http://ppewww.ph.gla.ac.uk/~flavell/charset/form-i18n.html

    Until one can get that part sorted out to one's satisfaction, any
    fiddling around that one might do in one's Perl script would be a bit
    pointless, IMHO. And discussion of the web part would be more at home
    on comp.infosystems.www.authoring.cgi (beware the automoderation bot).

    > <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">


    If we assume that the page itself is really coded in utf-8 (note that
    in the event of a dispute, the server's actual HTTP Content-type
    header wins over anything that you might secrete in a meta
    http-equiv), then you can expect current browsers to submit
    utf-8-encoded form data. But not-quite-so-new browsers - even some
    which support utf-8 display - get utf-8 forms submission sadly wrong.
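
    The precedence rule in the paragraph above can be modelled in a few lines of Perl. This is a deliberately simplified sketch, not a description of any browser's actual algorithm (real browsers apply further rules); charset_of and effective_charset are hypothetical helper names invented for the illustration.

```perl
use strict;
use warnings;

# Extract the charset parameter from a Content-Type value, if any.
sub charset_of {
    my ($content_type) = @_;
    return undef unless defined $content_type;
    return $content_type =~ /;\s*charset\s*=\s*"?([^\s";]+)/i ? lc($1) : undef;
}

# Hypothetical helper: the HTTP header's charset wins; the meta
# http-equiv value is only consulted when the header names none.
sub effective_charset {
    my ($http_header, $meta_content) = @_;
    return charset_of($http_header) // charset_of($meta_content);
}

# Header and meta disagree: the header's charset is used.
print effective_charset('text/html; charset=iso-8859-1',
                        'text/html; charset=UTF-8'), "\n";

# Header carries no charset: the meta declaration is used.
print effective_charset('text/html',
                        'text/html; charset=UTF-8'), "\n";
```

    In this model a page whose header and meta disagree is treated according to the header, which then determines the encoding the browser uses for the form submission.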

    > <form name="formutfget" method="GET">


    In -theory- the method GET supports nothing better than the us-ascii
    character coding. But see my commentary for further discussion.

    > The strange thing is that running this form under these environments

    [...]

    > So, it seems that in the former I get escaped unicode character and
    > in the latter UTF-8 ones.


    It looks as if somebody is trying to ape the misbegotten behaviour of
    MSIE.

    In a practical sense there isn't one right answer - there are several
    compromises, depending on which browsers support what. But none of
    the details here are features of the Perl programming language,
    AFAICS.

    good luck
    Alan J. Flavell, Jan 10, 2005
  4. Realbot

    Realbot Guest

    Alan J. Flavell wrote:
    > On Sat, 8 Jan 2005, Realbot wrote:
    >
    > Forms submission including characters outside of us-ascii is
    > non-trivial, and isn't in itself a Perl problem.
    >
    > OT: commentary of mine at
    > http://ppewww.ph.gla.ac.uk/~flavell/charset/form-i18n.html


    I read it avidly before posting, very well written.

    > Until one can get that part sorted out to one's satisfaction, any
    > fiddling around that one might do in one's Perl script would be a bit
    > pointless, IMHO. And discussion of the web part would be more at home
    > on comp.infosystems.www.authoring.cgi (beware the automoderation bot).
    >
    >
    >> <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">

    >
    >
    > If we assume that the page itself is really coded in utf-8 (note that
    > in the event of a dispute, the server's actual HTTP Content-type
    > header wins over anything that you might secrete in a meta
    > http-equiv), then you can expect current browsers to submit
    > utf-8-encoded form data.


    I found out that this was exactly the problem. The Apache installed on all Debian versions is configured with

    AddDefaultCharset on

    which sends a default charset (ISO-8859-1) in the HTTP Content-Type header, so the browser ignores the encoding given in the META tag and always uses the default encoding.
    In the Apache installation under OpenBSD the directive was not present, so the page was handled correctly.
    When I removed that nasty directive, everything worked on Debian too...
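
    A sketch of the corresponding Apache configuration change, assuming a stock setup (which configuration file holds the line varies by distribution and is not stated in the thread). Per the Apache documentation, AddDefaultCharset accepts On, Off, or an explicit charset name:

```apache
# Option 1: send no default charset in the HTTP header,
# so the page's own META declaration applies.
AddDefaultCharset Off

# Option 2 (alternative): declare UTF-8 in the HTTP header itself.
# AddDefaultCharset UTF-8
```

    Either way the server has to be restarted or reloaded for the change to take effect.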

    > In a practical sense there isn't one right answer - there are several
    > compromises, depending on which browsers support what. But none of
    > the details here are features of the Perl programming language,
    > AFAICS.


    Now I know! :)

    Thanks a lot.
    Realbot, Jan 10, 2005
