HTML::Entities::encode() returning wrong(?) entities

Discussion in 'Perl Misc' started by Jim Higson, Jul 23, 2004.

  1. Jim Higson

    Jim Higson Guest

    I'm calling encode_entities on some text I have read from a file, to turn it
    into a webpage. According to file:

    $ file text/text.en
    $ text/text.en: UTF-8 Unicode English text, with very long lines

    (although this might not matter)
    Anyway, the letter ä appears in the text, and should be changed to &auml;

    However, instead it is changed to:
    &Atilde;&curren;

    I can't see anything unusual about my code. Any ideas why I'm having this
    problem?
     
    Jim Higson, Jul 23, 2004
    #1

  2. Jim Higson

    Jim Higson Guest

    Jim Higson wrote:

    > I'm calling encode_entities on some text I have read from a file, to turn
    > it into a webpage. According to file:
    >
    > $ file text/text.en
    > $ text/text.en: UTF-8 Unicode English text, with very long lines
    >
    > (although this might not matter)
    > Anyway, the letter ä appears in the text, and should be changed to &auml;
    >
    > However, instead it is changed to:
    > &Atilde;&curren;
    >
    > I can't see anything unusual about my code. Any ideas why I'm having this
    > problem?



    I just found the answer myself - as I suspected, it was to do with reading
    the Unicode in Perl. Adding use open ':utf8'; to the top of the source
    fixed this (although I'm not quite certain exactly what this means).
     
    Jim Higson, Jul 23, 2004
    #2

  3. Joe Smith

    Joe Smith Guest

    Jim Higson wrote:

    > $ text/text.en: UTF-8 Unicode English text, with very long lines
    > Anyway, the letter ä appears in the text, and should be changed to &auml;


    In UTF-8 encoding, the single character "ä" is stored as two bytes:
    "\xC3" and "\xA4". If you allow perl to think that the file is ISO-8859-1,
    it will interpret those two bytes as "Ã" and "¤", which encode_entities
    then faithfully turns into &Atilde;&curren;. You need to tell perl
    that the file is :utf8 in order for it to recognize those two bytes as
    being a single Unicode character.
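    A minimal Perl sketch of the mismatch described above, using only the core Encode module. The sub to_entities is a hypothetical, hand-rolled stand-in for HTML::Entities::encode_entities that knows only the three entities relevant to this thread; it is not the real module. Fed the two undecoded bytes, it produces the wrong pair of entities; fed the decoded string, it produces the single expected one.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical stand-in for encode_entities: maps only these three
# code points to their HTML entity names (illustration only).
my %entity_for = (0xC3 => 'Atilde', 0xA4 => 'curren', 0xE4 => 'auml');

sub to_entities {
    my ($text) = @_;
    my $out = '';
    for my $ch (split //, $text) {
        my $name = $entity_for{ ord $ch };
        $out .= defined $name ? "&$name;" : $ch;
    }
    return $out;
}

my $bytes = "\xC3\xA4";               # "ä" as raw UTF-8 bytes (read without a :utf8 layer)
my $chars = decode('UTF-8', $bytes);  # the same data decoded into one character, U+00E4

print to_entities($bytes), "\n";      # prints &Atilde;&curren;
print to_entities($chars), "\n";      # prints &auml;
```

    The only difference between the two calls is whether the bytes were decoded before being handed to the entity encoder; the encoder itself does exactly what it is told.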

    -Joe
     
    Joe Smith, Jul 25, 2004
    #3
  4. Eric Amick

    Eric Amick Guest

    On Fri, 23 Jul 2004 20:43:44 +0100, Jim Higson <> wrote:

    >Jim Higson wrote:
    >
    >> I'm calling encode_entities on some text I have read from a file, to turn
    >> it into a webpage. According to file:
    >>
    >> $ file text/text.en
    >> $ text/text.en: UTF-8 Unicode English text, with very long lines
    >>
    >> (although this might not matter)
    >> Anyway, the letter ä appears in the text, and should be changed to &auml;
    >>
    >> However, instead it is changed to:
    >> &Atilde;&curren;
    >>
    >> I can't see anything unusual about my code. Any ideas why I'm having this
    >> problem?

    >
    >
    >I just found the answer myself - as I suspected it was to do with reading
    >the unicode in perl. Adding use open ':utf8'; to the top of the source
    >fixed this (although I'm not quite certain exactly what this means)


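    It tells Perl to apply the UTF-8 (:utf8) layer by default to every file
    opened with open() in the lexical scope of the pragma (the standard
    handles STDIN/STDOUT/STDERR are only affected if you also give ':std').
    Only you can say whether that is the right thing. If it isn't, you can
    specify it for specific files by adding ':utf8' to the mode (the second
    argument) of a three-argument open, as in open(my $fh, '<:utf8', $file),
    or with a binmode call on the appropriate filehandle.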
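    A sketch of the three options described above, assuming a throwaway temporary file (created with the core File::Temp module) in place of text/text.en. It uses the ':encoding(UTF-8)' layer, which decodes like ':utf8' but also checks that the input really is valid UTF-8; ':utf8' as used in this thread gives the same result for a well-formed file. In every case "ä" comes back as one character.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Stand-in for text/text.en: write "ä" as raw UTF-8 bytes plus a newline.
my ($out, $path) = tempfile(UNLINK => 1);
binmode $out;                          # no translation: write the bytes as-is
print {$out} "\xC3\xA4\n";
close $out or die "close: $!";

# 1. Per file: put the layer in the mode of a three-argument open.
open my $in, '<:encoding(UTF-8)', $path or die "open $path: $!";
my $line = <$in>;
close $in;
chomp $line;
printf "three-arg open: %d char(s)\n", length $line;   # 1

# 2. Per file: plain open, then set the layer with binmode.
open my $in2, '<', $path or die "open $path: $!";
binmode $in2, ':encoding(UTF-8)';
my $line2 = <$in2>;
close $in2;
chomp $line2;
printf "binmode:        %d char(s)\n", length $line2;  # 1

# 3. Default for every open() in this lexical scope only.
my $line3;
{
    use open IN => ':encoding(UTF-8)';
    open my $in3, '<', $path or die "open $path: $!";
    $line3 = <$in3>;
    close $in3;
    chomp $line3;
}
printf "use open:       %d char(s)\n", length $line3;  # 1
```

    Because the pragma in option 3 sits inside a bare block, any open() outside that block still reads raw bytes, which is one way to keep the default from leaking into code that expects binary data.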

    --
    Eric Amick
    Columbia, MD
     
    Eric Amick, Jul 25, 2004
    #4

Similar Threads
  1. ViperDK

    The right way to Encode html output

    ViperDK, Jul 17, 2003, in forum: ASP .Net
    Replies: 2, Views: 698
    Nicole Calinoiu, Jul 22, 2003
  2. BemusedByQM
    Replies: 38, Views: 1,023
    Raymond DeCampo, Aug 18, 2005
  3. Robert Oschler
    Replies: 8, Views: 768
    Christopher T King, Jul 31, 2004
  4. Robert Brewer
    Replies: 0, Views: 527
    Robert Brewer, Jul 25, 2004
  5. G.B.
    Replies: 5, Views: 139