Check if a string contains Japanese characters and convert from UTF-8 to ISO-2022-JP

wing328hk · Mar 16, 2006

Hi,

Is there a way to check whether a string (in UTF-8) contains Japanese
characters?

Is there a way to convert UTF-8 to ISO-2022-JP?

Thanks and regards,
Wing

Jürgen Exner · Mar 16, 2006

Is there a way to check whether a string (in UTF-8) contains Japanese
characters?

Well, you could use a simple RE to check if your text contains any
characters that are within the range of those characters that are typically
used for Japanese text. I am not aware of any pre-written library or
function to do that.
And actually I think that is difficult to do well, because arguably the
letter "a", for example, may or may not be part of what you consider
Japanese characters. After all, Latin characters are frequently used in
Japanese text nowadays for all kinds of foreign names.
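
A minimal sketch of the range-based check described above. The ranges are an assumption (the Hiragana block, the Katakana block, and the main CJK Unified Ideographs block), and the name in_japanese_ranges is made up for illustration. Widen or narrow the character class to suit whatever you decide counts as Japanese.

```perl
use strict;
use warnings;

# Assumed ranges, not an authoritative definition of "Japanese":
#   U+3040-U+309F  Hiragana
#   U+30A0-U+30FF  Katakana
#   U+4E00-U+9FFF  CJK Unified Ideographs (shared with Chinese)
my $japanese_range = qr/[\x{3040}-\x{309F}\x{30A0}-\x{30FF}\x{4E00}-\x{9FFF}]/;

# Hypothetical helper: expects a decoded character string, not raw UTF-8 octets.
sub in_japanese_ranges {
    my ($text) = @_;
    return $text =~ $japanese_range ? 1 : 0;
}

# Usage: two hiragana characters written with escapes, then plain ASCII.
print in_japanese_ranges("\x{3053}\x{3093}") ? "match\n" : "no match\n";
print in_japanese_ranges("hello")            ? "match\n" : "no match\n";
```

Note that plain Latin text never matches here, which is exactly the judgment call mentioned above.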

Is there a way to convert UTF-8 to ISO-2022-JP?

Text::Iconv does a good job at that.

jue

BZ · Mar 16, 2006

Is there a way to check whether a string (in UTF-8) contains Japanese
characters?

You could try matching against one of the Unicode script properties, such
as \p{Hiragana} (see perlunicode).
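
A small sketch of the property-based approach, assuming the input arrives as UTF-8 octets and is decoded first so that the regex engine sees characters. The helper name scripts_found is hypothetical. \p{Hiragana}, \p{Katakana}, and \p{Han} are documented in perlunicode.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: report which of three scripts occur in a
# decoded character string.
sub scripts_found {
    my ($text) = @_;
    my %found;
    $found{Hiragana} = 1 if $text =~ /\p{Hiragana}/;
    $found{Katakana} = 1 if $text =~ /\p{Katakana}/;
    $found{Han}      = 1 if $text =~ /\p{Han}/;
    return sort keys %found;
}

# Usage: the three octets below are the UTF-8 encoding of U+5927.
my $octets = "\xE5\xA4\xA7";
my $text   = decode('UTF-8', $octets);   # octets -> characters
print join(', ', scripts_found($text)), "\n";
```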

Is there a way to convert UTF-8 to ISO-2022-JP?

Encode::from_to and friends.
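
For illustration, a sketch of from_to under the assumption that you hold raw UTF-8 octets (for example, read from a file without a decoding layer). The wrapper name utf8_to_iso2022jp is made up. from_to itself converts its first argument in place and returns undef on failure.

```perl
use strict;
use warnings;
use Encode qw(from_to);

# Hypothetical wrapper: takes UTF-8 octets, returns ISO-2022-JP octets.
sub utf8_to_iso2022jp {
    my ($octets) = @_;                      # work on a copy of the caller's data
    my $len = from_to($octets, 'UTF-8', 'iso-2022-jp');
    die "conversion failed\n" unless defined $len;
    return $octets;
}

# Usage: UTF-8 octets for the hiragana U+3053 U+3093.
my $jis = utf8_to_iso2022jp("\xE3\x81\x93\xE3\x82\x93");
printf "%d octets, hex: %s\n", length($jis), unpack('H*', $jis);
```

If you already have a decoded character string rather than octets, Encode::encode('iso-2022-jp', $text) is the more direct call.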

wing328hk · Mar 17, 2006

Thanks for your prompt reply.

I've taken a look at perlunicode and it seems to me that it's possible
to match the Japanese characters by checking the class property.

I'm just wondering whether there is a way to check if the string
contains Japanese characters but not Chinese characters since some
Japanese characters are also Chinese characters.

Thanks and regards,
Wing

Rick Scott · Mar 17, 2006

wing328hk uttered:

I've taken a look at perlunicode and it seems to me that it's possible
to match the Japanese characters by checking the class property.

I'm just wondering whether there is a way to check if the string
contains Japanese characters but not Chinese characters since some
Japanese characters are also Chinese characters.

Unicode uses the same code point for a given character regardless of
what language it's in. So, for instance, the character

大

is U+5927 regardless of whether you're writing Chinese or Japanese.
As I understand it, all the kanji characters (along with others) are
members of the Han Unicode script, so \p{Han} will match them
regardless of whether they are used in Japanese, Chinese, both, or
neither. If you want to differentiate them, it looks as though you
are going to have to compile (or find) lists of what you consider to
be Chinese Chinese characters and Japanese Chinese characters. =)

Rick

John W. Kennedy · Mar 18, 2006

Hi,

Is there a way to check whether a string (in UTF-8) contains Japanese
characters?

Han characters are shared, so it cannot be done that way. Assuming that
you are looking at substantial amounts of real-world text, you could
scan for kana, which are uniquely Japanese.
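
A rough sketch of that kana heuristic. The function name looks_japanese and its three labels are invented for illustration. Text containing only Han characters is reported as ambiguous rather than guessed at, since such text could be Chinese or Japanese.

```perl
use strict;
use warnings;

# Hypothetical classifier for a decoded character string.
sub looks_japanese {
    my ($text) = @_;
    # Any kana at all is taken as evidence of Japanese text.
    return 'japanese'  if $text =~ /[\p{Hiragana}\p{Katakana}]/;
    # Han characters alone could be Chinese or Japanese.
    return 'ambiguous' if $text =~ /\p{Han}/;
    return 'no-cjk';
}

# Usage: Han plus a hiragana character, Han only, and plain ASCII.
print looks_japanese("\x{5927}\x{304D}\x{3044}"), "\n";
print looks_japanese("\x{5927}\x{5B66}"), "\n";
print looks_japanese("hello"), "\n";
```

On short strings, such as a name written entirely in kanji, this will often answer ambiguous. That is why it works best on substantial amounts of text.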

Is there a way to convert UTF-8 to ISO-2022-JP?

Not in general, because Unicode has more characters.
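
One way to find out whether a particular string survives the conversion is to ask Encode to croak on characters it cannot map. This is a sketch, with the hypothetical helper name to_iso2022jp_or_undef. It assumes a decoded character string as input and relies on the Encode::FB_CROAK check value described in the Encode documentation.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: returns ISO-2022-JP octets, or undef when Encode
# reports a character it cannot map to the target encoding.
sub to_iso2022jp_or_undef {
    my ($text) = @_;
    my $copy = $text;   # encode() may modify its argument when a check is set
    my $octets = eval {
        Encode::encode('iso-2022-jp', $copy, Encode::FB_CROAK());
    };
    return $octets;     # undef if the eval died
}

# Usage: a hiragana character, then an emoji (U+1F600).
for my $sample ("\x{3053}", "\x{1F600}") {
    my $result = to_iso2022jp_or_undef($sample);
    printf "U+%04X: %s\n", ord($sample),
        defined $result ? "convertible" : "not convertible";
}
```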

--
John W. Kennedy
"But now is a new thing which is very old--
that the rich make themselves richer and not poorer,
which is the true Gospel, for the poor's sake."
-- Charles Williams. "Judgement at Chelmsford"
