Detect whether unicode string is Japanese

Bob Marley · Sep 29, 2008

How can I get a tally of how many characters in a Unicode string are
Japanese (hiragana, katakana, kanji)? When I unpack a string, each
character comes out like \xE3\x81\x95, but I am trying to check if it's
in the range 3040-309F (Hiragana) and I don't understand how to convert
between the 3-byte representation and that range...

Jan Dvorak · Sep 30, 2008

You can look up the Unicode mapping online, but you will have to write a new
function for each possible encoding (UTF-8, UTF-16LE, ...).
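
For the UTF-8 case only, here is a rough sketch of how the three bytes from
the question map onto a code point. The helper name utf8_3byte_to_codepoint
is made up for illustration, and it assumes the input really is a well-formed
3-byte UTF-8 sequence (it does no validation):

```ruby
# Hypothetical helper: decode ONE 3-byte UTF-8 sequence into a code point.
# Layout of a 3-byte sequence: 1110xxxx 10xxxxxx 10xxxxxx
def utf8_3byte_to_codepoint(bytes)
  b1, b2, b3 = bytes
  ((b1 & 0x0F) << 12) | ((b2 & 0x3F) << 6) | (b3 & 0x3F)
end

bytes = [0xE3, 0x81, 0x95]            # the \xE3\x81\x95 from the question
cp = utf8_3byte_to_codepoint(bytes)
in_hiragana = (0x3040..0x309F).include?(cp)

puts format("U+%04X", cp)             # U+3055
puts in_hiragana                      # true
```

If the string is known to be UTF-8, "\xE3\x81\x95".unpack("U*") should give
the same number without any manual bit twiddling.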

Or, with Ruby 1.9, you can iterate over the string by characters (not bytes)
and use the .ord method to get each character's Unicode code point:

mystr.each_char do |ch|
  puts ch.ord  # integer code point, e.g. 12373 (0x3055)
end
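
Building on that loop, a minimal sketch of the tally asked for might look
like the following. The names JAPANESE_RANGES and japanese_tally are
hypothetical, the string is assumed to be UTF-8 encoded, and the ranges are
an assumption too: only the basic Hiragana, Katakana and CJK Unified
Ideographs blocks are covered, and the last one is shared with Chinese, so a
kanji count alone does not prove the text is Japanese.

```ruby
# Assumed ranges; not an exhaustive list of Japanese-related blocks.
JAPANESE_RANGES = {
  :hiragana => 0x3040..0x309F,
  :katakana => 0x30A0..0x30FF,
  :kanji    => 0x4E00..0x9FFF   # CJK Unified Ideographs (also used for Chinese)
}

# Count the characters of str that fall into each range above.
def japanese_tally(str)
  tally = Hash.new(0)
  str.each_char do |ch|
    cp = ch.ord
    JAPANESE_RANGES.each do |name, range|
      if range.include?(cp)
        tally[name] += 1
        break
      end
    end
  end
  tally
end

# 3 hiragana, 3 katakana, 1 kanji, then plain ASCII
sample = "\u3055\u304F\u3089 \u30B5\u30AF\u30E9 \u685C abc"
counts = japanese_tally(sample)

puts counts[:hiragana]   # 3
puts counts[:katakana]   # 3
puts counts[:kanji]      # 1
```

Characters outside the three ranges (spaces, Latin letters, punctuation) are
simply ignored, so comparing the sum of the counts with the string length
gives a rough idea of how much of the text is Japanese.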

Jan
