Discussion in 'Perl Misc' started by cc96ai, May 30, 2007.

    cc96ai Guest

    I have the UTF-8 value %C3%A9.
    How can I encode it so that it becomes é?

    I tried encode_base64, but no luck.
    Maybe I am missing something. Does anyone have an idea?
    cc96ai, May 30, 2007
  Guest

    cc96ai wrote:

    >I got UTF8 value %C3%A9
    >how could I encode it become é ?
    >I try encode_base64 , but no luck
    >maybe I miss some, anyone have idea ?

    you might like Unicode::Lite
    , May 30, 2007
    On 2007-05-30 00:00, cc96ai wrote:
    > I got UTF8 value %C3%A9

    That's not UTF-8. That's URL-encoded UTF-8.

    > how could I encode it become é ?

    You have to *decode* it to get é. And since it is encoded twice, you
    have to decode it twice.

    First decode the URL-Encoding:

    $s = "%C3%A9";

    # Replace each %XX escape (hex digits in either case) with its byte.
    $s =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/eg;

    (the URI distribution on CPAN provides a function for this,
    uri_unescape in URI::Escape, but it is also a simple one-liner)

    Now you have UTF-8 bytes, which you can decode to a "Perl character string":

    use Encode;
    $s = decode('utf-8', $s);

    Now you have a string with a single character "é".
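
    The two decoding steps above can be put together as a small sketch.
    The helper name url_utf8_to_chars is made up for illustration and is
    not from any module; only the core Encode module is assumed.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper (name chosen for this example only): undo the
# URL-encoding first, then decode the resulting UTF-8 bytes.
sub url_utf8_to_chars {
    my ($s) = @_;
    # Step 1: replace each %XX escape with the byte it denotes.
    $s =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/eg;
    # Step 2: interpret those bytes as UTF-8, giving a character string.
    return decode('UTF-8', $s);
}

my $chars = url_utf8_to_chars('%C3%A9');
# Report length and code point instead of printing the character itself,
# so the output does not depend on the terminal's encoding.
printf "length=%d codepoint=U+%04X\n", length($chars), ord($chars);
# prints: length=1 codepoint=U+00E9
```

    A length of 1 confirms that the two bytes C3 A9 have become the
    single character é (U+00E9).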

    Now, how does MIME get into it?

    For MIME, you again have to decide on a specific character encoding
    (e.g., UTF-8, or ISO-8859-1, or whatever), and then possibly on a
    specific transport encoding (base64 or quoted-printable).

    So you have to encode it in your character encoding first, and then
    possibly encode the result again with the transport encoding.
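
    As a rough illustration of that order for a message body only (not
    for header fields), here is a sketch using the core modules Encode,
    MIME::Base64 and MIME::QuotedPrint. The variable names are
    arbitrary, and which encodings you actually want is an assumption
    that depends on your message.

```perl
use strict;
use warnings;
use Encode qw(encode);
use MIME::Base64 qw(encode_base64);
use MIME::QuotedPrint qw(encode_qp);

my $chars = "\x{e9}";    # the single character é

# Step 1: character encoding (characters -> bytes).
my $utf8   = encode('UTF-8', $chars);         # two bytes: C3 A9
my $latin1 = encode('ISO-8859-1', $chars);    # one byte:  E9

# Step 2: transport encoding (bytes -> 7-bit-safe text).
# The empty second argument suppresses the line ending that
# encode_base64 would otherwise append.
my $b64_utf8   = encode_base64($utf8,   '');
my $b64_latin1 = encode_base64($latin1, '');
# For encode_qp an empty line-ending argument disables soft line breaks.
my $qp_utf8    = encode_qp($utf8, '');

print "base64 of UTF-8 bytes:      $b64_utf8\n";      # w6k=
print "base64 of ISO-8859-1 bytes: $b64_latin1\n";    # 6Q==
print "quoted-printable of UTF-8:  $qp_utf8\n";
```

    The two base64 results differ because the bytes differ, which is
    why the character encoding has to be chosen (and declared in the
    message's charset parameter) before the transport encoding is
    applied.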

    Note that MIME is quite a complex format (especially the encoding of
    header fields described in RFC 2047 and RFC 2231), so I won't go into
    more detail unless you tell us exactly what you need. Any advice I can
    give (except "use existing modules" and "read the RFCs") is almost
    certainly incomplete and will cause you to produce ill-formed messages
    if you follow it blindly.


    Peter J. Holzer, Jun 10, 2007
