ordering Japanese text

mab2001 · May 4, 2006

Hi,

I am using Sadhiro Tomoyuki's Lingua::JA::Sort::JIS module to sort
Japanese names of stores. I have come close to achieving the order my
client has asked for but am having a little difficulty matching their
request exactly. The problem seems to be collating kana glyphs with
manyogana glyphs. (Please excuse me if I am misusing any terms - this
is my first introduction to Japanese.)

Here is an example of 13 store names ordered with
Lingua::JA::Sort::JIS::msort:

1. $B0K@*C0(B JR$B5~ETE9(B
2. $B%"%Z%C%/%9(B $BJ!;3(B
3. $B%"%_%e%W%i%6(B $B</;yEg(B
4. $B%*%/%N(B $B00@n(B
5. $B$5$/$iLnI42_E9(B $B@gBf(B
6. $B$5$D$^20(B $B</;yEg(B
7. $B%9%?%s%9(B $BJF;R(B
8. $B$=$4$&(B $B?@8ME9(B
9. $B$=$4$&(B $B@iMUE9(B
10. $B$=$4$&(B $BBg5\E9(B
11. $B$=$4$&(B $B2#IME9(B
12. $B%@%$%"%b%s%I%7%F%#%"%k%k(B $B3`86(B
13. $B%K%e!<%:(B $B7'K\(B

My client tells me that entry 1 should actually come after entry 3
and before entry 4. From the description of manyogana linked below, I
think they are saying that the glyph 伊 should be collated by its
katakana adaptation イ, which makes sense:

http://en.wikipedia.org/wiki/Manyogana

Note I'm basing many of my statements on staring at and comparing these
glyphs online and so I might be far off.

So my questions are:

1. Is my client correct in their ordering?
2. I believe I've tried all the combinations of collation levels and
kanji classes in the Lingua::JA::Sort::JIS jcmp function but have not
achieved the desired ordering. Have I perhaps missed the correct
combination?
3. Is the solution to first convert the manyogana characters to
katakana and then do the msort? If so does anyone know of a Perl module
to do this or a nice reference that I could use more programmatically
than the image on the link above?
4. Can anyone think of any other glyphs or classes of Japanese glyphs
similar to manyogana that I should be worried about?

Thanks for any help you can give me!

Best,
Mike

mab2001 · May 6, 2006

After a discussion of this on the perl-i18n mailing list, I've come to
understand the problem a bit more. In Japanese, text ordering is based
on phonetization. But as in English, a given piece of text can have
multiple pronunciations. Moreover, the "more correct"
pronunciation among the possibilities is influenced by the context of
the text. So in other words, the problem is intractable if all you have
is the text alone and inefficient even if you have more information
(because of the myriad factors that influence pronunciation).

The solution that I am using then is to store with each piece of
kana/kanji text, a kana-only phonetization of that text. I then rely on
the content editors to know the context of the text and supply an
accurate phonetization in kana. (In other words, I'm putting the
responsibility on someone else!) There does exist a determinate
ordering of the kana-only text and so this becomes a tractable problem.

Mike

Guest · May 7, 2006

In comp.lang.perl.misc (e-mail address removed) wrote:

: The solution that I am using then is to store with each piece of
: kana/kanji text, a kana-only phonetization of that text. I then rely on
: the content editors to know the context of the text and supply an
: accurate phonetization in kana. (In other words, I'm putting the
: responsibility on someone else!) There does exist a determinate
: ordering of the kana-only text and so this becomes a tractable problem.

This is indeed best practice; have a look at the Sharp Zaurus and other
Japanese PIMs, which regularly offer a "pronunciation" field next to "name
in written form". The former is kana, the latter is kanji.

Oliver.
