filename charset and internal Perl utf8

Discussion in 'Perl Misc' started by Yohan N. Leder, Jun 8, 2006.

  1. Hi. I'm on Windows with ActivePerl 5.8.8, adding UTF-8 support to a
    script. The script generates an HTML page with charset=utf-8 that
    contains a form with a file upload field. The same script receives the
    form submission (a POST of multipart/form-data): it reads STDIN raw,
    parses the content, and decodes every name and value text field to
    internal utf8, including the one containing the filename. The relevant
    part of the code looks something like this:

    binmode STDIN;
    read(STDIN, $data, $ENV{'CONTENT_LENGTH'});
    .... parsing code here ...
    For each name/value pair gathered in a hash :
    decode("utf8", $name);
    decode("utf8", $value);

    Well, it works for every pair except the filename one when there's an
    accented character in the file name.

    For example:
    - In the form, I select (under Windows): "c:\â.bin"
    - I click the send button
    - I receive "c:\â.bin" regardless of the decoding stage.

    The result is the same with or without decode() :-(

    What does this mean exactly?

    Does it mean that this filename has to be decoded from the current
    operating system's charset, e.g. decode('iso-8859-1', $value)?

    If yes, how can I know the client OS's charset?
    If no, what should I do?

    And the same question on the server side: do I have to take the
    current server operating system's charset into account in order to
    create files with filenames in that same charset?
    Yohan N. Leder, Jun 8, 2006

  2. Guest

    Yohan N. Leder wrote:
    > binmode STDIN;
    > read(STDIN, $data, $ENV{'CONTENT_LENGTH'});
    > ... parsing code here ...
    > For each name/value pair gathered in a hash :
    > decode("utf8", $name);
    > decode("utf8", $value);

    Dear Yohan,

    I'll offer some input, but please be warned that I am not too
    familiar with different encodings and how they work. Therefore, some
    of the advice I'll give you might have errors. (Keep that in mind if
    someone else offers you advice that contradicts mine.)

    That having been said:

    After reading through "perldoc Encode" I see that the decode()
    function returns a string encoded in Perl's internal form. From what
    you posted, it looks like you're doing nothing with the return value.
    You may have wanted to modify $name and $value by transforming them
    into Perl's internal form. If so, you can do that with this code:

    $name = decode("utf8", $name);
    $value = decode("utf8", $value);

    In your old code, you never actually modified $name and $value.
    This might explain why you had the same results with and without
    decode().

    I don't know if my advice will help, but you may want to consider
    trying it anyway. If it works, great! (If it doesn't, well...)

    I hope this helps, Yohan.

    -- Jean-Luc
    , Jun 8, 2006
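
The point made in the reply above can be checked with a small sketch. It assumes the filename bytes arrived as UTF-8; the sample value is made up for illustration. With no CHECK argument, decode() leaves its input untouched and only returns the decoded string, so calling it without using the result changes nothing.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Illustrative input: "\xC3\xA2" is the two-byte UTF-8 encoding of U+00E2.
my $value = "c:\\\xC3\xA2.bin";

# Void context: the decoded string is discarded, $value still holds bytes.
decode("utf8", $value);
my $len_before = length($value);
print "$len_before\n";    # prints 9 (length counted in bytes)

# Assigning the return value replaces the bytes with decoded characters.
$value = decode("utf8", $value);
my $len_after = length($value);
print "$len_after\n";     # prints 8 (length counted in characters)
```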

  3. In article <>,
    > $name = decode("utf8", $name);
    > $value = decode("utf8", $value);

    Thanks, but I don't understand your reply. What you show is exactly what
    I'm doing already: decoding the name and value pairs to internal utf8.
    It works for all the pairs except the value containing the filename,
    as explained in my original message.
    Yohan N. Leder, Jun 9, 2006
  4. In article <>,
    > $value = decode("utf8", $value);

    Oops, OK, now I see. Silly me :)
    Yohan N. Leder, Jun 9, 2006
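
The thread leaves open the question of which charset the filename bytes arrive in. One cautious way to probe this is to attempt a strict UTF-8 decode and fall back only if it fails. The sketch below assumes exactly that; decode_filename is a hypothetical helper, and the iso-8859-1 fallback is a guess taken from the original question, not something the thread confirms.

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK LEAVE_SRC);

# Hypothetical helper: try strict UTF-8 first, then an assumed fallback.
sub decode_filename {
    my ($bytes, $fallback) = @_;
    $fallback = 'iso-8859-1' unless defined $fallback;    # assumption

    # FB_CROAK makes decode() die on malformed input; LEAVE_SRC keeps
    # $bytes intact so it can be decoded again with the fallback charset.
    my $text = eval { decode('UTF-8', $bytes, FB_CROAK | LEAVE_SRC) };
    return $text if defined $text;

    return decode($fallback, $bytes);
}

# The same name sent as UTF-8 bytes and as a single Latin-1 byte.
my $from_utf8   = decode_filename("c:\\\xC3\xA2.bin");
my $from_latin1 = decode_filename("c:\\\xE2.bin");

print $from_utf8 eq $from_latin1 ? "same\n" : "different\n";
```

A strict decode that succeeds does not prove the bytes were meant as UTF-8, so this is a heuristic rather than a guarantee.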