Detect whether a Unicode string is Japanese

Bob Marley

How can I get a tally of how many characters in a Unicode string are
Japanese (hiragana, katakana, kanji)? When I unpack a string, each
character comes out like \xE3\x81\x95, but I am trying to check if it's
in the range 3040-309F (Hiragana) and I don't understand how to convert
between the 3-byte representation and that range...
 
Jan Dvorak

You can look up the Unicode code-point ranges online, but if you work from
the raw bytes you will have to write a separate decoding function for each
possible encoding (UTF-8, UTF-16LE, ...).
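
For UTF-8 specifically, the conversion from the 3-byte form to a code point is a matter of bit masking. This is only a minimal sketch for the 3-byte case (it does no validation, and decode_utf8_3byte is a made-up name for illustration):

```ruby
# Decode one 3-byte UTF-8 sequence (1110xxxx 10xxxxxx 10xxxxxx)
# into its Unicode code point. Assumes the input really is a
# well-formed 3-byte sequence; no error checking is done.
def decode_utf8_3byte(bytes)
  b0, b1, b2 = bytes
  ((b0 & 0x0F) << 12) | ((b1 & 0x3F) << 6) | (b2 & 0x3F)
end

cp = decode_utf8_3byte([0xE3, 0x81, 0x95])
puts format("U+%04X", cp)            # U+3055
puts (0x3040..0x309F).cover?(cp)     # true, so it is in the Hiragana block
```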

Or, with Ruby 1.9, you can iterate over the string by characters (not bytes)
and use the .ord method to get each character's Unicode code point:

mystr.each_char do |ch|
  puts ch.ord
end
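
Building on that, a tally could look roughly like the sketch below. It assumes the string is UTF-8 (or another Unicode encoding), and the names JAPANESE_RANGES and tally_japanese are made up for illustration. The ranges are the basic Hiragana, Katakana and CJK Unified Ideographs blocks; note that the last one is shared with Chinese, so it cannot tell kanji from hanzi, and extension blocks are not covered.

```ruby
# Basic Unicode blocks only; extensions and half-width kana are left out.
JAPANESE_RANGES = {
  hiragana: 0x3040..0x309F,
  katakana: 0x30A0..0x30FF,
  kanji:    0x4E00..0x9FFF   # CJK Unified Ideographs, shared with Chinese
}

# Return a hash of counts per script for the characters in str.
# Scripts that never occur are simply absent (lookups return 0).
def tally_japanese(str)
  counts = Hash.new(0)
  str.each_char do |ch|
    cp = ch.ord
    JAPANESE_RANGES.each do |name, range|
      counts[name] += 1 if range.cover?(cp)
    end
  end
  counts
end

p tally_japanese("さくらサクラ桜 abc")
```

For that sample string it should report 3 hiragana, 3 katakana and 1 kanji.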

Jan
 
