utf8 encoding problem

Discussion in 'Ruby' started by Ad Ad, Jun 25, 2009.

  1. Ad Ad

    Ad Ad Guest

    Hi,
    I am retrieving a string from a txt file.
    The file contains some utf8 characters.

    I am comparing these characters against a default string.

    The problem is that some of the characters are not stored in a default
    format.

    For example:
    A is stored as Ａ

    Naturally when I compare the character it fails.
    Strangely when I unpacked the character it appears as 65313 which is the
    correct utf8 number for A.
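
A minimal sketch of the check described above, assuming Ruby 1.9 or later (the thread itself dates from the Ruby 1.8 era) and a UTF-8 string. The variable names are illustrative only. It shows that 65313 is the code point U+FF21 of the fullwidth letter, not of the ASCII letter, which is why the comparison fails.

```ruby
fullwidth = "\uFF21"  # FULLWIDTH LATIN CAPITAL LETTER A, as read from the file
ascii     = "A"       # LATIN CAPITAL LETTER A (U+0041), the default string

# unpack("U*") decodes the UTF-8 bytes into Unicode code points.
fullwidth_codepoints = fullwidth.unpack("U*")  # 65313 is 0xFF21
ascii_codepoints     = ascii.unpack("U*")      # 65 is 0x41

# The fullwidth letter occupies three bytes in UTF-8: EF BC A1.
fullwidth_bytes = fullwidth.bytes.to_a

# Different code points, so the strings are not equal.
same = (fullwidth == ascii)
```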

    Any way around this?

    thanks.
    --
    Posted via http://www.ruby-forum.com/.
    Ad Ad, Jun 25, 2009
    #1

  2. Eric Hodel

    Eric Hodel Guest

    On Jun 25, 2009, at 14:29, Ad Ad wrote:

    > Hi,
    > I am retrieving a string from a txt file.
    > The file contains some utf8 characters.
    >
    > I am comparing these characters against a default string.
    >
    > The problem is that some of the characters are not stored in a default
    > format.
    >
    > For example:
    > A is stored as Ａ
    >
    > Naturally when I compare the character it fails.
    > Strangely when I unpacked the character it appears as 65313 which is the
    > correct utf8 number for A.
    >
    > Any way around this?


    Well, Ａ is "Fullwidth Latin Capital Letter A" from the "Halfwidth and
    Fullwidth Forms" block (Unicode FF21) whereas A is "Latin Capital Letter
    A" from the "Basic Latin" block (Unicode 0041).

    I don't know of a way to translate between the two, but
    maybe that will help.
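
One possible way to translate between the two forms, sketched here under the assumption of Ruby 2.2 or later (which postdates this thread), is Unicode compatibility normalization. `String#unicode_normalize` with the `:nfkc` form folds fullwidth letters to their ASCII counterparts. The helper name `fold_width` is hypothetical. Note that NFKC also rewrites other compatibility characters (ligatures, halfwidth katakana, and so on), so it may change more than intended.

```ruby
# Hypothetical helper: fold compatibility characters, including fullwidth
# Latin letters, to their canonical equivalents using NFKC normalization.
# Requires Ruby 2.2+ and a string in a Unicode encoding such as UTF-8.
def fold_width(text)
  text.unicode_normalize(:nfkc)
end

folded = fold_width("\uFF21\uFF22\uFF23")  # fullwidth A, B, C
matches_default = (folded == "ABC")
```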
    Eric Hodel, Jun 25, 2009
    #2

  3. Although I haven't tried it myself, I did a search for 全角半角変換 (full-width/half-width conversion) and
    found this page.
    It appears people use jcode and tr to solve this problem.

    http://www.eml.ele.cst.nihon-u.ac.jp/~momma/wiki/wiki.cgi/Ruby/全角半角変換.html
    http://blog.grayproductions.net/articles/the_kcode_variable_and_jcode_library


    2009/6/25 Eric Hodel <>:
    > On Jun 25, 2009, at 14:29, Ad Ad wrote:
    >
    >> Hi,
    >> I am retrieving a string from a txt file.
    >> The file contains some utf8 characters.
    >>
    >> I am comparing these characters against a default string.
    >>
    >> The problem is that some of the characters are not stored in a default
    >> format.
    >>
    >> For example:
    >> A is stored as Ａ
    >>
    >> Naturally when I compare the character it fails.
    >> Strangely when I unpacked the character it appears as 65313 which is the
    >> correct utf8 number for A.
    >>
    >> Any way around this?

    >
    > Well, Ａ is "Fullwidth Latin Capital Letter A" from the "Halfwidth and
    > Fullwidth Forms" block (Unicode FF21) whereas A is "Latin Capital Letter A" from
    > the "Basic Latin" block (Unicode 0041).
    >
    > I don't know of a way to translate between the two, but maybe
    > that will help.
    >


    --
    Cheers,
    James Rubingh
    http://www.wrive.com
    James Rubingh, Jun 26, 2009
    #3
  4. Ad Ad

    Ad Ad Guest

    James Rubingh wrote:
    > Although I haven't tried it myself, I did a search for
    > 全角半角変換 (full-width/half-width conversion) and
    > found this page.
    > It appears people use jcode and tr to solve this problem.
    >
    > http://www.eml.ele.cst.nihon-u.ac.jp/~momma/wiki/wiki.cgi/Ruby/全角半角変換.html
    > http://blog.grayproductions.net/articles/the_kcode_variable_and_jcode_library
    >
    >
    > 2009/6/25 Eric Hodel <>:


    brilliant!
    str.tr!('ａ-ｚＡ-Ｚ','a-zA-Z') worked like a charm. :)
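
A sketch of the same `tr` approach, written with `\u` escapes so the fullwidth ranges cannot be garbled by the source file's encoding. It assumes Ruby 1.9 or later, where `tr` operates on characters without needing jcode. The constant and method names are hypothetical.

```ruby
# Fullwidth a-z is U+FF41..U+FF5A and fullwidth A-Z is U+FF21..U+FF3A.
FULLWIDTH_LETTERS = "\uFF41-\uFF5A\uFF21-\uFF3A"
ASCII_LETTERS     = "a-zA-Z"

# Hypothetical helper: map fullwidth Latin letters to ASCII letters and
# leave every other character untouched.
def fullwidth_letters_to_ascii(text)
  text.tr(FULLWIDTH_LETTERS, ASCII_LETTERS)
end

converted = fullwidth_letters_to_ascii("\uFF28\uFF45\uFF4C\uFF4C\uFF4F")  # fullwidth "Hello"
```

The non-bang `tr` is used here because `tr!` returns nil when nothing was changed, which is easy to trip over when chaining calls. Fullwidth digits and punctuation are not covered by these ranges.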
    --
    Posted via http://www.ruby-forum.com/.
    Ad Ad, Jun 26, 2009
    #4
