How to get UTF-8 from a urlencoded web form?

Discussion in 'Perl Misc' started by Yohan N. Leder, Jul 15, 2006.

  1. Hello.

    All my tests were done with ActivePerl 5.8.8.817 under Win2K FR and
    Apache2.

    I'm trying to obtain (and display) user data that comes from a web form
    whose enctype is 'application/x-www-form-urlencoded', without success. I
    can do it when the form is 'multipart/form-data', but not when it is
    'application/x-www-form-urlencoded'.

    Here is a script that shows the difference:

    ---- BEGIN ----
    #!/usr/bin/perl -w
    use strict;
    require 5.8.0;
    use utf8;

    my $this = "utf8_and_webform.pl";

    binmode(STDOUT, ':utf8');
    print "Content-type: text/html; charset=UTF-8\n\n";

    # Both forms post to "$this?x", so a non-empty query string means
    # that a form has been submitted; otherwise show the forms.
    if (defined $ENV{'QUERY_STRING'} && length($ENV{'QUERY_STRING'}) > 0)
    { see(); }
    else { ask(); }
    exit 0;

    sub ask
    { # provide web forms for user to enter data
    print <<PAGE;
    <html><head><title>Test about UTF-8 and web form</title></head><body>
    <p>Use the form you want and see the resulting data.</p>
    <p>FORM with enctype as 'application/x-www-form-urlencoded' :</p>
    <form action='$this?x' method='post' accept-charset='UTF-8'
    enctype='application/x-www-form-urlencoded'>
    <textarea name='msg' rows='4' cols='30' wrap='virtual'></textarea>
    <input type='submit' value='send'>
    </form>
    <p>FORM with enctype as 'multipart/form-data' :</p>
    <form action='$this?x' method='post' accept-charset='UTF-8'
    enctype='multipart/form-data'>
    <textarea name='msg' rows='4' cols='30' wrap='virtual'></textarea>
    <input type='submit' value='send'>
    </form></body></html>
    PAGE
    }

    sub see
    { # display data which come from user form
    my $data='';

    # Read the POST body through a UTF-8 input layer
    binmode(STDIN, ':utf8'); # or ':encoding(UTF-8)'
    read(STDIN, $data, $ENV{'CONTENT_LENGTH'});

    # OR read the raw bytes and decode them afterwards
    #use Encode qw(decode);
    #read(STDIN, $data, $ENV{'CONTENT_LENGTH'});
    #$data = decode('UTF-8', $data);

    print $data;
    }
    ----- END ----

    For example, if I submit the 'urlencoded' form (the first one, at the top
    of the generated web page when the script is run without any URL
    parameter) with the letter 'é' (accented e) in the textarea, I get
    'msg=%C3%A9' displayed in the browser after the data has gone through
    the see() sub.

    By contrast, if I submit the same 'é' from the 'multipart/form-data' form
    (the second one, at the bottom of the generated web page), I get a
    correctly interpreted UTF-8 'é', as expected.
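
    The 'msg=%C3%A9' output appears to be the UTF-8 bytes of 'é' (0xC3
    0xA9) written as percent escapes. A minimal sketch of that encoding
    direction, assuming the browser escapes each UTF-8 byte outside the
    unreserved set; percent_escape() is a hypothetical helper written for
    this illustration, not part of the script above:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;                       # this source contains a literal 'é'
use Encode qw(encode);

# Hypothetical helper: encode characters as UTF-8 bytes, then write every
# byte that is not an unreserved character as %XX.
sub percent_escape {
    my ($chars) = @_;
    my $bytes = encode('UTF-8', $chars);    # characters -> UTF-8 bytes
    $bytes =~ s/([^A-Za-z0-9\-_.~])/sprintf('%%%02X', ord($1))/ge;
    return $bytes;
}

print 'msg=', percent_escape('é'), "\n";
```

    Run on its own, this should print the same 'msg=%C3%A9' string that the
    urlencoded form produces.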

    How can I get this same UTF-8 'é' when the form uses the
    'application/x-www-form-urlencoded' enctype? What should I do in see()
    for the urlencoded form case?
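
    One direction that might work for the urlencoded case is to undo the
    percent escapes first and only then decode the resulting bytes as UTF-8.
    A minimal sketch under that assumption; parse_urlencoded() is a
    hypothetical helper, and it expects the raw request body (read without
    any ':utf8' layer on STDIN):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: turn a raw urlencoded body such as "msg=%C3%A9"
# into a hash of decoded character strings.
sub parse_urlencoded {
    my ($body) = @_;
    my %field;
    for my $pair (split /&/, $body) {
        my ($name, $value) = split /=/, $pair, 2;
        for ($name, $value) {
            $_ = '' unless defined $_;
            tr/+/ /;                              # '+' stands for a space
            s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;  # %XX -> one raw byte
            $_ = decode('UTF-8', $_);             # UTF-8 bytes -> characters
        }
        $field{$name} = $value;
    }
    return %field;
}

my %field = parse_urlencoded('msg=%C3%A9');
binmode(STDOUT, ':utf8');
print $field{'msg'}, "\n";
```

    In see(), the body read from STDIN would be passed in place of the
    literal 'msg=%C3%A9'. The multipart form would still need its own
    handling, since its body is not percent-escaped.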

    Yohan N. Leder, Jul 15, 2006
