Check if a string contains Japanese characters and convert from UTF-8 to ISO-2022-JP

Discussion in 'Perl Misc' started by wing328hk@gmail.com, Mar 16, 2006.

  1. Guest

    Hi,

    Is there a way to check whether a string (in UTF-8) contains Japanese
    characters?

    Is there a way to convert UTF-8 to ISO-2022-JP?

    Thanks and regards,
    Wing
    , Mar 16, 2006
    #1

  2. wrote:
    > Is there a way to check whether a string (in UTF-8) contains Japanese
    > characters?


    Well, you could use a simple RE to check if your text contains any
    characters that are within the range of those characters that are typically
    used for Japanese text. I am not aware of any pre-written library or
    function to do that.
    And actually I think that's even difficult to do because arguably e.g. the
    letter "a" may or may not be part of what you consider Japanese characters.
    After all nowadays Latin characters are frequently used in Japanese text for
    all kinds of foreign names.

    > Is there a way to convert UTF-8 to ISO-2022-JP?


    Text::Iconv does a good job at that.

    jue
    Jürgen Exner, Mar 16, 2006
    #2

  3. BZ Guest

    wrote in comp.lang.perl.misc:
    > Is there a way to check whether a string (in UTF-8) contains Japanese
    > characters?


    You could try matching against one of the unicode character classes like
    \p{Hiragana} (see perlunicode).
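
    A minimal sketch of that suggestion, assuming the input arrives as raw
    UTF-8 octets. The helper name contains_hiragana is made up for
    illustration; the important step is decoding before matching, since the
    property only works on characters, not bytes.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: true if the UTF-8 octets contain at least one hiragana.
sub contains_hiragana {
    my ($octets) = @_;
    # Decode first so the regex sees characters rather than raw bytes.
    my $text = decode('UTF-8', $octets);
    return $text =~ /\p{Hiragana}/ ? 1 : 0;
}

my $hiragana_a = "\xE3\x81\x82";   # UTF-8 octets of U+3042 HIRAGANA LETTER A
print contains_hiragana("abc $hiragana_a"), "\n";   # prints 1
print contains_hiragana("plain ASCII"), "\n";       # prints 0
```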

    > Is there a way to convert UTF-8 to ISO-2022-JP?


    Encode::from_to and friends.
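
    For illustration, a short sketch of Encode::from_to, assuming the data
    is a byte string holding valid UTF-8. The sample text is an assumption
    picked for the example. Note that from_to converts its first argument in
    place.

```perl
use strict;
use warnings;
use Encode qw(from_to);

# U+65E5 U+672C as UTF-8 octets (sample data for this sketch).
my $octets = "\xE6\x97\xA5\xE6\x9C\xAC";

# from_to rewrites $octets in place and returns the new length in octets,
# or undef on failure.
my $len = from_to($octets, 'UTF-8', 'iso-2022-jp');
die "conversion failed\n" unless defined $len;

# ISO-2022-JP is a 7-bit encoding that switches character sets with
# escape sequences, so dump the result as hex rather than printing it raw.
printf "%d octets: %s\n", $len,
    join ' ', map { sprintf '%02X', ord } split //, $octets;
```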

    --
    BZ
    BZ, Mar 16, 2006
    #3
  4. Guest

    Thanks for your prompt reply.

    I've taken a look at perlunicode and it seems to me that it's possible
    to match the Japanese characters by checking the class property.

    I'm just wondering whether there is a way to check if the string
    contains Japanese characters but not Chinese characters since some
    Japanese characters are also Chinese characters.

    Thanks and regards,
    Wing
    , Mar 17, 2006
    #4
  5. Rick Scott Guest

    ( uttered:)
    > I've taken a look at perlunicode and it seems to me that it's possible
    > to match the Japanese characters by checking the class property.
    >
    > I'm just wondering whether there is a way to check if the string
    > contains Japanese characters but not Chinese characters since some
    > Japanese characters are also Chinese characters.


    Unicode uses the same code point for a given character regardless of
    what language it's in. So, for instance, the character

        大
    is U+5927 regardless of whether you're writing Chinese or Japanese.
    As I understand it, all the kanji characters (along with others) are
    members of the Han Unicode script, so \p{Han} will match them
    regardless of whether they are used in Japanese, Chinese, both, or
    neither. If you want to differentiate them, it looks as though you
    are going to have to compile (or find) lists of what you consider to
    be Chinese Chinese characters and Japanese Chinese characters. =)
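
    A quick way to see this, sketched with the code point mentioned above.
    The variable name is arbitrary.

```perl
use strict;
use warnings;

my $dai = "\x{5927}";   # the code point discussed above

# The character belongs to the Han script; nothing in the code point says
# which language the surrounding text is written in.
printf "U+%04X Han=%d Hiragana=%d Katakana=%d\n",
    ord($dai),
    ($dai =~ /\p{Han}/      ? 1 : 0),
    ($dai =~ /\p{Hiragana}/ ? 1 : 0),
    ($dai =~ /\p{Katakana}/ ? 1 : 0);
# prints: U+5927 Han=1 Hiragana=0 Katakana=0
```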




    Rick
    --
    key CF8F8A75 / print C5C1 F87D 5056 D2C0 D5CE D58F 970F 04D1 CF8F 8A75
    The reverse side also has a reverse side.
    :Japanese proverb
    Rick Scott, Mar 17, 2006
    #5
  6. Re: Check if a string contains Japanese characters and convert from UTF-8 to ISO-2022-JP

    wrote:
    > Hi,
    >
    > Is there a way to check whether a string (in UTF-8) contains Japanese
    > characters?


    Han characters are shared, so it cannot be done that way. Assuming that
    you are looking at substantial amounts of real-world text, you could
    scan for kana, which are uniquely Japanese.
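
    The kana heuristic could be sketched roughly like this. The helper name
    guess_script and its three return labels are invented for the example,
    and it expects an already decoded character string. Short Japanese text
    written only in kanji would still come back as ambiguous.

```perl
use strict;
use warnings;

# Hypothetical helper. Returns 'japanese' if any kana is present,
# 'han-only' if there are Han characters but no kana (could be Chinese
# or Japanese), and 'none' otherwise.
sub guess_script {
    my ($text) = @_;
    return 'japanese' if $text =~ /[\p{Hiragana}\p{Katakana}]/;
    return 'han-only' if $text =~ /\p{Han}/;
    return 'none';
}

print guess_script("\x{65E5}\x{672C}\x{8A9E}\x{306E}"), "\n";  # Han plus hiragana
print guess_script("\x{4E2D}\x{6587}"), "\n";                   # Han only
print guess_script("hello"), "\n";                              # neither
```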

    > Is there a way to convert UTF-8 to ISO-2022-JP?


    Not in general, because Unicode has more characters.
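
    One way to find out whether a particular string survives the
    conversion is to ask Encode to die on unmappable characters instead of
    substituting them. This is only a sketch: the helper name
    to_iso2022jp_strict is hypothetical, and the input is assumed to be a
    decoded character string.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: returns the ISO-2022-JP octets, or undef if some
# character has no mapping in the target encoding.
sub to_iso2022jp_strict {
    my ($text) = @_;
    my $copy = $text;   # encode may modify its argument when CHECK is set
    my $out  = eval { encode('iso-2022-jp', $copy, Encode::FB_CROAK) };
    return $out;        # undef if encode croaked
}

# U+65E5 U+672C are kanji; U+D55C is a Hangul syllable.
print defined to_iso2022jp_strict("\x{65E5}\x{672C}") ? "ok\n" : "lossy\n";
print defined to_iso2022jp_strict("\x{D55C}")         ? "ok\n" : "lossy\n";
```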

    --
    John W. Kennedy
    "But now is a new thing which is very old--
    that the rich make themselves richer and not poorer,
    which is the true Gospel, for the poor's sake."
    -- Charles Williams. "Judgement at Chelmsford"
    John W. Kennedy, Mar 18, 2006
    #6
