ordering Japanese text

mab2001

Hi,

I am using Sadahiro Tomoyuki's Lingua::JA::Sort::JIS module to sort
Japanese names of stores. I have come close to achieving the order my
client has asked for but am having a little difficulty matching their
request exactly. The problem seems to be collating kana glyphs with
manyogana glyphs. (Please excuse me if I am misusing any terms - this
is my first introduction to Japanese.)

Here is an example of 13 store names ordered with
Lingua::JA::Sort::JIS::msort:

1. $B0K@*C0(B JR$B5~ETE9(B
2. $B%"%Z%C%/%9(B $BJ!;3(B
3. $B%"%_%e%W%i%6(B $B</;yEg(B
4. $B%*%/%N(B $B00@n(B
5. $B$5$/$iLnI42_E9(B $B@gBf(B
6. $B$5$D$^20(B $B</;yEg(B
7. $B%9%?%s%9(B $BJF;R(B
8. $B$=$4$&(B $B?@8ME9(B
9. $B$=$4$&(B $B@iMUE9(B
10. $B$=$4$&(B $BBg5\E9(B
11. $B$=$4$&(B $B2#IME9(B
12. $B%@%$%"%b%s%I%7%F%#%"%k%k(B $B3`86(B
13. $B%K%e!<%:(B $B7'K\(B

My client tells me that entry 1 should actually come after entry 3
and before entry 4. From this description of manyogana, I think
they are saying that collation of the glyph 伊 should be based
on its katakana adaptation イ, which makes sense:

http://en.wikipedia.org/wiki/Manyogana

Note I'm basing many of my statements on staring at and comparing these
glyphs online and so I might be far off.

So my questions are:

1. Is my client correct in their ordering?
2. I believe I've tried all the combinations of collation levels and
kanji classes in the Lingua::JA::Sort::JIS jcmp function but have not
achieved the desired ordering. Have I perhaps missed the correct
combination?
3. Is the solution to first convert the manyogana characters to
katakana and then do the msort? If so does anyone know of a Perl module
to do this or a nice reference that I could use more programmatically
than the image on the link above?
4. Can anyone think of any other glyphs or classes of Japanese glyphs
similar to manyogana that I should be worried about?

Thanks for any help you can give me!

Best,
Mike
 
mab2001

After a discussion of this on the perl-i18n mailing list, I've come to
understand the problem a bit better. In Japanese, text ordering is based
on phonetization (how the text is read aloud). But as in English, a
given piece of text can have several pronunciations. Moreover, which of
those pronunciations is the "more correct" one depends on the context of
the text. In other words, the problem is intractable if all you have
is the text alone, and inefficient even if you have more information
(because of the myriad factors that influence pronunciation).

The solution that I am using then is to store with each piece of
kana/kanji text, a kana-only phonetization of that text. I then rely on
the content editors to know the context of the text and supply an
accurate phonetization in kana. (In other words, I'm putting the
responsibility on someone else!) There does exist a determinate
ordering of the kana-only text and so this becomes a tractable problem.
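
The approach described above can be sketched roughly as follows. This is a minimal illustration in Python rather than Perl, and not the actual implementation: the helper names (`to_hiragana`, `reading_key`) and the readings attached to each store name are assumptions made up for the example. It folds katakana to hiragana and then compares by code point, which only approximates true kana dictionary order (it does not handle the long vowel mark, small kana, or voiced marks the way JIS X 4061 collation does).

```python
# Sketch: sort entries by an editor-supplied kana reading, not by the written form.
# All names and readings below are illustrative assumptions.

KATAKANA_FIRST = 0x30A1   # ァ
KATAKANA_LAST = 0x30F6    # ヶ
KANA_OFFSET = 0x60        # distance between a katakana and its hiragana twin


def to_hiragana(text):
    """Fold katakana to hiragana so both scripts sort together."""
    return "".join(
        chr(ord(ch) - KANA_OFFSET)
        if KATAKANA_FIRST <= ord(ch) <= KATAKANA_LAST
        else ch
        for ch in text
    )


def reading_key(entry):
    """Sort key: folded reading first, written form as a tie-breaker."""
    written, reading = entry
    return (to_hiragana(reading), written)


# (written form, kana-only reading supplied by a content editor)
stores = [
    ("伊勢丹", "イセタン"),
    ("アペックス", "アペックス"),
    ("アミュプラザ", "アミュプラザ"),
    ("オクノ", "オクノ"),
    ("さくら野百貨店", "さくらのひゃっかてん"),
]

ordered = sorted(stores, key=reading_key)

for written, reading in ordered:
    print(written, reading)
```

With these assumed readings, the kanji name 伊勢丹 sorts under its reading (i-se-ta-n), so it lands between the entries starting with ア and the one starting with オ. In a real system the code point comparison would be replaced by a proper kana collation, for example the msort from Lingua::JA::Sort::JIS applied to the reading field.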

Mike
 
Guest

In comp.lang.perl.misc (e-mail address removed) wrote:

: The solution that I am using then is to store with each piece of
: kana/kanji text, a kana-only phonetization of that text. I then rely on
: the content editors to know the context of the text and supply an
: accurate phonetization in kana. (In other words, I'm putting the
: responsibility on someone else!) There does exist a determinate
: ordering of the kana-only text and so this becomes a tractable problem.

This is indeed best practice; have a look at the Sharp Zaurus and other Japanese
PIMs, which regularly offer a "pronunciation" field next to "name in written
form". The former is kana, the latter is kanji.
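
A record with both fields might look something like the sketch below. It is written in Python only for illustration; the `Entry` class, its field names, and the `is_kana_only` check are hypothetical and are not taken from any PIM or from the module discussed in this thread. The check simply rejects a reading that contains anything other than hiragana, katakana, the long vowel mark, or spaces, so that a kanji string pasted into the pronunciation field is caught early.

```python
# Sketch: a record pairing the written form with a kana-only reading.
# Class and function names are illustrative assumptions.
from dataclasses import dataclass

ALLOWED_EXTRAS = {" ", "\u3000", "\u30FC"}  # space, ideographic space, ー


def is_kana_only(text):
    """Return True if text is non-empty and made only of kana and allowed extras."""
    if not text.strip():
        return False
    for ch in text:
        cp = ord(ch)
        if ch in ALLOWED_EXTRAS:
            continue
        if 0x3041 <= cp <= 0x3096:   # hiragana ぁ..ゖ
            continue
        if 0x30A1 <= cp <= 0x30FA:   # katakana ァ..ヺ
            continue
        return False
    return True


@dataclass(frozen=True)
class Entry:
    written: str   # name as displayed (kanji, kana, Latin, ...)
    reading: str   # kana-only pronunciation used for sorting

    def __post_init__(self):
        if not is_kana_only(self.reading):
            raise ValueError(f"reading must be kana only: {self.reading!r}")


entry = Entry(written="伊勢丹", reading="いせたん")
```

Validation of this kind only confirms that the reading is written in kana; whether it is the right reading for the name still depends on the person entering it.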

Oliver.
 
