Check if a string contains Japanese characters and convert from UTF-8 to ISO-2022-JP

wing328hk

Hi,

Is there a way to check whether a string (in UTF-8) contains Japanese
characters?

Is there a way to convert UTF-8 to ISO-2022-JP?

Thanks and regards,
Wing
 
Jürgen Exner

> Is there a way to check whether a string (in UTF-8) contains Japanese
> characters?

Well, you could use a simple regular expression to check whether your text
contains any characters in the ranges typically used for Japanese text. I am
not aware of any pre-written library or function that does that.

Actually, I think the question itself is hard to pin down, because arguably
the letter "a", for example, may or may not be part of what you consider
Japanese characters. After all, Latin characters are frequently used in
Japanese text nowadays for all kinds of foreign names.
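
The range-based check described above might look roughly like this. It is only a sketch: the helper name `in_japanese_ranges` is made up for illustration, the choice of code point ranges is an assumption about what counts as "Japanese", and the input is assumed to be a decoded character string rather than raw UTF-8 bytes.

```perl
use strict;
use warnings;

# Hypothetical helper: true if a decoded character string contains at least
# one character from the Hiragana and Katakana blocks (U+3040-U+30FF) or the
# CJK Unified Ideographs block (U+4E00-U+9FFF). The ranges are an assumption;
# adjust them to whatever you decide to count as Japanese.
sub in_japanese_ranges {
    my ($text) = @_;
    return $text =~ /[\x{3040}-\x{30FF}\x{4E00}-\x{9FFF}]/ ? 1 : 0;
}

# Five hiragana letters, written as escapes to keep this file pure ASCII.
print in_japanese_ranges("\x{3053}\x{3093}\x{306B}\x{3061}\x{306F}"), "\n";
print in_japanese_ranges("plain ASCII"), "\n";
```

Note that the ideograph range also matches text written purely in Chinese, which is the ambiguity raised in the paragraph above.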

> Is there a way to convert UTF-8 to ISO-2022-JP?

Text::Iconv does a good job at that.

jue
 
BZ

> Is there a way to check whether a string (in UTF-8) contains Japanese
> characters?

You could try matching against one of the Unicode character classes, such as
\p{Hiragana} (see perlunicode).
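
As a minimal sketch of that suggestion, assuming the input arrives as raw UTF-8 bytes: the bytes are decoded first, because \p{...} properties match characters rather than bytes. The variable names are illustrative only.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Raw UTF-8 bytes, as they might arrive from a file or a socket:
# U+3042 HIRAGANA LETTER A followed by "abc".
my $bytes = "\xE3\x81\x82abc";

# Decode to a character string before matching Unicode properties.
my $text = decode('UTF-8', $bytes);

my $has_kana = $text =~ /[\p{Hiragana}\p{Katakana}]/ ? 1 : 0;
print "contains kana: $has_kana\n";
```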

> Is there a way to convert UTF-8 to ISO-2022-JP?

Encode::from_to and friends.
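
A short sketch of both routes, assuming the Encode module that ships with Perl 5.8 and later. `from_to` converts a byte string in place, while `encode` starts from an already decoded character string. The sample text is just the two kanji for "Japan".

```perl
use strict;
use warnings;
use Encode qw(from_to encode decode);

# UTF-8 bytes for U+65E5 U+672C.
my $octets = "\xE6\x97\xA5\xE6\x9C\xAC";

# from_to rewrites $octets in place and returns the new length in bytes,
# or undef on failure.
my $len = from_to($octets, 'UTF-8', 'ISO-2022-JP');
defined $len or die "conversion failed";

# One of the "friends": encode a decoded character string directly.
my $jis = encode('ISO-2022-JP', "\x{65E5}\x{672C}");

# Show the result as hex, since ISO-2022-JP contains escape sequences.
printf "%d bytes: %s\n", $len,
    join ' ', map { sprintf '%02X', ord($_) } split //, $octets;
```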
 
wing328hk

Thanks for your prompt reply.

I've taken a look at perlunicode and it seems to me that it's possible
to match the Japanese characters by checking the character class property.

I'm just wondering whether there is a way to check if the string
contains Japanese characters but not Chinese characters since some
Japanese characters are also Chinese characters.

Thanks and regards,
Wing
 
Rick Scott

(wing328hk uttered:)
> I've taken a look at perlunicode and it seems to me that it's possible
> to match the Japanese characters by checking the character class property.
>
> I'm just wondering whether there is a way to check if the string
> contains Japanese characters but not Chinese characters since some
> Japanese characters are also Chinese characters.

Unicode uses the same code point for a given character regardless of
what language it's in. So, for instance, the character

    大

is U+5927 regardless of whether you're writing Chinese or Japanese.
As I understand it, all the kanji characters (along with others) are
members of the Unicode Han script, so \p{Han} will match them
regardless of whether they are used in Japanese, Chinese, both, or
neither. If you want to differentiate them, it looks as though you
are going to have to compile (or find) lists of what you consider to
be Chinese Chinese characters and Japanese Chinese characters. =)
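
To illustrate the point, here is a small sketch that tests U+5927 against the script properties. It shows only that \p{Han} matches the character and the kana properties do not; nothing in the match says which language the character came from.

```perl
use strict;
use warnings;

my $dai = "\x{5927}";   # the character discussed above, U+5927

my $is_han  = $dai =~ /\p{Han}/ ? 1 : 0;
my $is_kana = $dai =~ /[\p{Hiragana}\p{Katakana}]/ ? 1 : 0;

print "Han: $is_han, kana: $is_kana\n";
```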




Rick
 
John W. Kennedy

> Hi,
>
> Is there a way to check whether a string (in UTF-8) contains Japanese
> characters?

Han characters are shared, so it cannot be done that way. Assuming that
you are looking at substantial amounts of real-world text, you could
scan for kana, which are uniquely Japanese.
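
A rough sketch of that heuristic, assuming the text has already been decoded to characters. The function name `guess_cjk` and its three labels are invented for illustration, and short Japanese strings written only in kanji will land in the "han-only" bucket.

```perl
use strict;
use warnings;

# Hypothetical classifier: counts kana and Han characters and reports
# 'japanese' if any kana is present, 'han-only' if there are Han characters
# but no kana (Chinese, or Japanese that happens to use no kana), and
# 'neither' otherwise.
sub guess_cjk {
    my ($text) = @_;
    my $kana = () = $text =~ /[\p{Hiragana}\p{Katakana}]/g;
    my $han  = () = $text =~ /\p{Han}/g;
    return 'japanese' if $kana > 0;
    return 'han-only' if $han > 0;
    return 'neither';
}

print guess_cjk("\x{65E5}\x{672C}\x{8A9E}\x{3067}\x{3059}"), "\n";  # kanji plus hiragana
print guess_cjk("\x{4E2D}\x{6587}"), "\n";                          # Han characters only
print guess_cjk("hello"), "\n";
```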

> Is there a way to convert UTF-8 to ISO-2022-JP?

Not in general, because Unicode has more characters.
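
One way to detect the unconvertible cases, sketched with the core Encode module and its FB_CROAK check value, which makes `encode` die on a character the target encoding cannot represent. The wrapper name `to_iso2022jp` is hypothetical, and the Thai letter is just an example of a character assumed to be outside ISO-2022-JP.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical wrapper: returns the ISO-2022-JP bytes, or undef if the text
# contains a character that cannot be represented in that encoding.
sub to_iso2022jp {
    my ($text) = @_;
    my $copy = $text;   # encode may modify its argument when a check is set
    return eval { encode('ISO-2022-JP', $copy, Encode::FB_CROAK) };
}

for my $sample ("\x{65E5}\x{672C}", "\x{0E01}") {   # two kanji, then a Thai letter
    my $out = to_iso2022jp($sample);
    printf "U+%04X...: %s\n", ord($sample),
        defined $out ? "converted" : "not representable";
}
```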

--
John W. Kennedy
"But now is a new thing which is very old--
that the rich make themselves richer and not poorer,
which is the true Gospel, for the poor's sake."
-- Charles Williams. "Judgement at Chelmsford"
 
