utf8 encoding problem

Ad Ad · Jun 25, 2009

Hi,
I am retrieving a string from a txt file.
The file contains some utf8 characters.

I am comparing these characters against a default string.

The problem is that some of the characters are not stored in a default
format.

For example:
A is stored as Ａ

Naturally when I compare the character it fails.
Strangely when I unpacked the character it appears as 65313 which is the
correct utf8 number for A.

Any way around this?

thanks.
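
The observation in this post can be reproduced with a short Ruby sketch. It assumes Ruby 1.9 or later and UTF-8 strings; the helper names `codepoints_of` and `utf8_hex_bytes` are made up for illustration and are not from the thread.

```ruby
# Illustrative helpers (hypothetical names), assuming UTF-8 encoded strings.
def codepoints_of(str)
  # "U*" unpacks a UTF-8 string into its Unicode code points.
  str.unpack("U*")
end

def utf8_hex_bytes(str)
  # Show the raw UTF-8 bytes as two-digit uppercase hex.
  str.bytes.map { |b| format("%02X", b) }
end

fullwidth_a = "\uFF21"  # Ａ, the character found in the file
ascii_a     = "A"       # the character in the default string

p codepoints_of(fullwidth_a)   # => [65313]
p codepoints_of(ascii_a)       # => [65]
p utf8_hex_bytes(fullwidth_a)  # => ["EF", "BC", "A1"]
p fullwidth_a == ascii_a       # => false
```

So 65313 (hex FF21) is the code point of the fullwidth letter, while the plain letter A is 65, which is why the comparison fails.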

Eric Hodel · Jun 25, 2009

Well, Ａ is "Fullwidth Latin Capital Letter A" from the "Halfwidth and
Fullwidth Forms" block (Unicode FF21), whereas A is "Latin Capital Letter
A" from the "Basic Latin" block (Unicode 0041).

I don't know of a way to translate between the two, but maybe that will
help.
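
One possible way to translate between the two forms is Unicode compatibility normalization. This is a minimal sketch that assumes Ruby 2.2 or later, where `String#unicode_normalize` is available (it was not in 2009). The helper name `to_halfwidth` is hypothetical.

```ruby
# Hypothetical helper: fold fullwidth characters to their ASCII equivalents
# using NFKC normalization. Assumes Ruby 2.2+ and a UTF-8 string.
def to_halfwidth(str)
  str.unicode_normalize(:nfkc)
end

p to_halfwidth("\uFF21\uFF42\uFF43")  # fullwidth "Ａｂｃ" => "Abc"
p to_halfwidth("\uFF21") == "A"       # => true
```

NFKC also rewrites other compatibility characters (ligatures and halfwidth katakana, for example), so it is a broader transformation than a letter-for-letter mapping and may not suit every comparison.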

James Rubingh · Jun 26, 2009

Although I haven't tried it myself, I did a search for 全角半角変換
("fullwidth/halfwidth conversion") and found this page.
It appears people use jcode and tr to solve this problem.

http://www.eml.ele.cst.nihon-u.ac.jp/~momma/wiki/wiki.cgi/Ruby/全角半角変換.html
http://blog.grayproductions.net/articles/the_kcode_variable_and_jcode_library

Ad Ad · Jun 26, 2009

Brilliant!
str.tr!('ａ-ｚＡ-Ｚ', 'a-zA-Z') worked like a charm.
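
For reference, the same `tr` mapping can be written with escape sequences so that the source file does not depend on how the fullwidth characters are stored. This is a sketch assuming Ruby 1.9 or later, where `tr` understands multibyte ranges without `jcode`. The method name `fullwidth_letters_to_ascii` is made up for illustration.

```ruby
# Hypothetical helper: map fullwidth Latin letters to ASCII letters.
# U+FF41..U+FF5A are fullwidth a-z, U+FF21..U+FF3A are fullwidth A-Z.
def fullwidth_letters_to_ascii(str)
  str.tr("\uFF41-\uFF5A\uFF21-\uFF3A", "a-zA-Z")
end

p fullwidth_letters_to_ascii("\uFF21\uFF42\uFF43 123")  # => "Abc 123"
```

The non-destructive `tr` is used here because `tr!` returns nil when nothing was replaced, which is easy to trip over if the result is used in a comparison.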
