filename charset and internal Perl utf8

Yohan N. Leder

Hi. I'm on Windows with ActivePerl 5.8.8, adding UTF-8 support to a
script. The script generates an HTML page with charset=utf-8 that
contains a form with a file upload field. The same script receives the
form submission (a POST of multipart/form-data): it reads STDIN raw,
parses the content, and decodes every name and value text field to
Perl's internal utf8, including the one containing the filename. The
relevant code looks something like this:

binmode STDIN;
read(STDIN, $data, $ENV{'CONTENT_LENGTH'});
.... parsing code here ...
For each name/value pair gathered in a hash :
decode("utf8", $name);
decode("utf8", $value);

Well, it works for every pair except the filename one when there's an
accented character in the file name.

For example,
- In the form, I select (under Win) : "c:\â.bin"
- I click the send button
- I receive "c:\â.bin" whatever the decoding stage.

This is the same with or without decode() :-(

What does this mean exactly?

Does it mean that this filename has to be decoded from the current
operating system's charset, as in decode('iso-8859-1', $value)?

If yes, how can I find out the client OS's charset?
If no, what should I do?

And the same question on the server side: do I have to take the
server operating system's charset into account in order to create
files with filenames in that same charset?
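
For the server-side half of the question, here is a minimal sketch of what "creating the file in the server's charset" could look like. It assumes the filename has already been decoded to a character string. The helper name fs_filename and the default charset 'cp1252' are hypothetical choices for illustration (a Western European Windows server is assumed; a UTF-8 filesystem would pass 'UTF-8' instead). As far as I know, Perl 5.8 on Windows hands filenames to the byte-oriented file API, so the octets you pass to open() are what end up on disk.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: turn a decoded (character) filename into the octets
# to hand to open(). The default 'cp1252' is an assumption, not something
# the thread establishes about the server.
sub fs_filename {
    my ($name, $fs_charset) = @_;
    $fs_charset ||= 'cp1252';
    $name =~ s{^.*[\\/]}{}s;            # drop any client-side path
    return encode($fs_charset, $name);  # characters -> octets
}

my $decoded = "c:\\\x{E2}.bin";         # character string containing U+00E2
my $ansi    = fs_filename($decoded);            # octets for a cp1252 system
my $utf8    = fs_filename($decoded, 'UTF-8');   # octets for a UTF-8 system

printf "cp1252: %d bytes, UTF-8: %d bytes\n", length $ansi, length $utf8;
```

The same character needs one byte in cp1252 and two in UTF-8, which is why the target charset has to be chosen explicitly rather than writing the internal string out as-is.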
 
jl_post

Yohan said:
binmode STDIN;
read(STDIN, $data, $ENV{'CONTENT_LENGTH'});
... parsing code here ...
For each name/value pair gathered in a hash :
decode("utf8", $name);
decode("utf8", $value);


Dear Yohan,

I'll offer some input, but please be warned that I am not too
familiar with different encodings and how they work. Therefore, some
of the advice I'll give you might have errors. (Keep that in mind if
someone else offers you advice that contradicts mine.)

That having been said:

After reading through "perldoc Encode" I see that the decode()
function returns a string encoded in Perl's internal form. From what
you posted, it looks like you're doing nothing with the return value.
You may have wanted to modify $name and $value by transforming them
into Perl's internal form. If so, you can do that with this code:

$name = decode("utf8", $name);
$value = decode("utf8", $value);

In your old code, you never actually modified $name and $value.
This might explain why you had the same results with and without
decode().
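
To make the difference concrete, here is a small self-contained sketch. The sample string is an assumption: it stands for the filename as UTF-8 octets, with "\xC3\xA2" being the two-byte UTF-8 encoding of U+00E2. It relies on decode() being called without a CHECK argument, in which case the input scalar is left untouched.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Sample input (assumed): the filename as raw UTF-8 octets.
my $value = "c:\\\xC3\xA2.bin";

# Return value discarded: $value still holds the raw octets.
decode("utf8", $value);
my $len_before = length $value;    # counts bytes

# Return value assigned: $value now holds decoded characters.
$value = decode("utf8", $value);
my $len_after = length $value;     # counts characters

print "before: $len_before, after: $len_after\n";
```

The length drops by one after the assignment because the two octets have become a single character. Comparing length() before and after is a cheap way to check whether a decode actually took effect.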

I don't know if my advice will help, but you may want to consider
trying it anyway. If it works, great! (If it doesn't, well...)

I hope this helps, Yohan.

-- Jean-Luc
 
Yohan N. Leder

$name = decode("utf8", $name);
$value = decode("utf8", $value);

Thanks, but I don't understand your reply. What you show is exactly what
I'm already doing: decoding the name and value pairs to internal utf8.
So it works for all the pairs except the value containing the filename,
as explained in my original message.
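
One way to handle a filename that may not arrive as UTF-8 is to attempt a strict UTF-8 decode first and fall back to a legacy charset only when that fails. The sketch below illustrates the idea. The helper name decode_filename and the fallback 'cp1252' are hypothetical, since the request itself does not reliably say which charset the client used. It is only a heuristic: a legacy byte sequence can occasionally happen to be valid UTF-8.

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK LEAVE_SRC);

# Hypothetical helper: strict UTF-8 first, then a legacy charset.
# The default fallback 'cp1252' is an assumption about the client.
sub decode_filename {
    my ($octets, $fallback) = @_;
    $fallback ||= 'cp1252';
    # FB_CROAK makes decode() die on malformed input; eval traps that.
    my $text = eval { decode('UTF-8', $octets, FB_CROAK | LEAVE_SRC) };
    return $text if defined $text;
    return decode($fallback, $octets);
}

my $from_utf8   = decode_filename("c:\\\xC3\xA2.bin");  # valid UTF-8 octets
my $from_legacy = decode_filename("c:\\\xE2.bin");      # single cp1252 octet

print "same result\n" if $from_utf8 eq $from_legacy;
```

Both inputs end up as the same character string, so the rest of the script can treat the filename like any other decoded field.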
 
