How to get UTF-8 from a urlencoded web form?



Yohan N. Leder

Hello.

All my tests are done using ActivePerl 5.8.8.817 under Win2K FR and
Apache2.

I'm trying to read (and display) user data submitted from a web form
whose enctype is 'application/x-www-form-urlencoded', without success.
It works when the form is 'multipart/form-data', but not when it is
'application/x-www-form-urlencoded'.

Here is a script that shows the difference:

---- BEGIN ----
#!/usr/bin/perl -w
my $this = "utf8_and_webform.pl";

require 5.8.0;
use utf8;
binmode(STDOUT, ':utf8');
print "Content-type: text/html; charset=UTF-8\n\n";
if (defined $ENV{'QUERY_STRING'} && length($ENV{'QUERY_STRING'}) > 0)
{&see;}
else {&ask;}
exit 0;

sub ask
{ # provide web forms for user to enter data
print <<PAGE
<html><head><title>Test about UTF-8 and web form</title></head><body>
Use the form you want and see the resulting data.
<p>
FORM with enctype as 'application/x-www-form-urlencoded' :<br>
<form action='$this?x' method='post' accept-charset='UTF-8'
enctype='application/x-www-form-urlencoded'>
<textarea name='msg' rows='4' cols='30' wrap='virtual'></textarea>
<input type='submit' value='send'>
</form></p>
<p>
FORM with enctype as 'multipart/form-data' :<br>
<form action='$this?x' method='post' accept-charset='UTF-8'
enctype='multipart/form-data'>
<textarea name='msg' rows='4' cols='30' wrap='virtual'></textarea>
<input type='submit' value='send'>
</form></p></body></html>
PAGE
}

sub see
{ # display data which come from user form
my $data='';

binmode(STDIN, ':utf8'); # or ':encoding('UTF-8')'
read(STDIN, $data, $ENV{'CONTENT_LENGTH'});

# OR
#use Encode qw(decode);
#read(STDIN, $data, $ENV{'CONTENT_LENGTH'});
#$data = decode('UTF-8', $data);

print $data;
}
----- END ----

For example, if I submit the 'urlencoded' form (the first one, at the
top of the generated page when the script is run without any URL
parameter) with the letter 'é' (accented e) in the textarea, the browser
displays 'msg=%C3%A9' (after the data has been processed by the see()
sub).
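
To make that observation concrete, here is a minimal sketch of where '%C3%A9' comes from, assuming the browser first encodes 'é' as UTF-8 bytes and then percent-encodes each byte. The percent_encode() helper is hypothetical, written only for illustration; a real browser also encodes a space as '+', which this sketch does not do.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper (not part of the script above): percent-encode
# every byte of the UTF-8 form of a string, except unreserved characters.
sub percent_encode {
    my ($chars) = @_;
    my $bytes = encode('UTF-8', $chars);    # characters -> UTF-8 bytes
    $bytes =~ s/([^A-Za-z0-9_.~-])/sprintf('%%%02X', ord($1))/ge;
    return $bytes;
}

# "\x{E9}" is 'é' (U+00E9); its UTF-8 form is the two bytes C3 A9.
print 'msg=', percent_encode("\x{E9}"), "\n";    # prints msg=%C3%A9
```

So the string shown by see() looks like the UTF-8 bytes of 'é' with the percent-encoding still in place.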

By contrast, if I submit the same 'é' from the 'multipart/form-data'
form (the second one, at the bottom of the generated page), I get a
correctly interpreted UTF-8 'é', as expected.

How can I get the same UTF-8 'é' when the form uses the
'application/x-www-form-urlencoded' enctype? What should I change in
see() for the urlencoded case?
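
One direction that might work, sketched under the assumption that the body is a list of name=value pairs whose values are percent-encoded UTF-8 bytes: undo the percent-encoding first, and only then decode the bytes as UTF-8. The parse_urlencoded() helper is hypothetical and only meant to illustrate the order of the two steps.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: turn a raw urlencoded body into a hash of
# decoded character strings. A repeated field name keeps its last value.
sub parse_urlencoded {
    my ($raw) = @_;
    my %field;
    for my $pair (split /[&;]/, $raw) {
        next unless length $pair;                  # skip empty pairs
        my ($name, $value) = split /=/, $pair, 2;
        $value = '' unless defined $value;
        for ($name, $value) {
            tr/+/ /;                               # '+' stands for a space
            s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;   # %XX -> one byte
            $_ = decode('UTF-8', $_);              # UTF-8 bytes -> characters
        }
        $field{$name} = $value;
    }
    return %field;
}

my %field = parse_urlencoded('msg=%C3%A9');
binmode(STDOUT, ':utf8');
print $field{msg}, "\n";
```

In see(), this would presumably mean reading the body as raw bytes (without binmode(STDIN, ':utf8'), since CONTENT_LENGTH counts bytes) and passing $data to the helper before printing.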
 