Convert unicode string to "basic characters"

Discussion in 'Perl Misc' started by Mike Mimic, Jun 7, 2004.

  1. Mike Mimic

    Hi!

    How can I convert the characters in a Unicode string to their
    "basic characters"? All those "a with ..." and "o with ...".

    I tried with Unicode::Normalize:

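    use Unicode::Normalize;
    $string = NFD($string);    # decompose, e.g. "a with acute" -> "a" + U+0301
    $string =~ s/\pM//g;       # remove all combining marks (/o is not needed)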

    But the problem is that this does not handle, for example, U+00D8 (O with
    stroke), U+0110 (D with stroke), U+0141 (L with stroke)...
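
    One possible workaround, as a minimal sketch only: the stroke letters have
    no canonical decomposition, so NFD leaves them untouched and they have to
    be mapped by hand. The helper name to_basic and the small translation
    table below are assumptions made for illustration, not part of
    Unicode::Normalize, and the table is far from complete.

    ```perl
    use strict;
    use warnings;
    use Unicode::Normalize qw(NFD NFC);

    # Hypothetical helper: strip combining marks, then translate a few
    # letters that NFD does not decompose.
    sub to_basic {
        my ($string) = @_;
        $string = NFD($string);    # "o with acute" becomes "o" + U+0301
        $string =~ s/\pM//g;       # drop every combining mark
        # Illustrative, incomplete table: O/o, D/d and L/l with stroke.
        $string =~ tr/\x{00D8}\x{00F8}\x{0110}\x{0111}\x{0141}\x{0142}/OoDdLl/;
        return NFC($string);       # recompose what is left, e.g. Hangul
    }

    binmode STDOUT, ':encoding(UTF-8)';
    # Input is "Øresund, Łódź" written with escapes.
    print to_basic("\x{00D8}resund, \x{0141}\x{00F3}d\x{017A}"), "\n";
    # prints "Oresund, Lodz"
    ```

    Note that \pM also matches marks in non-Latin scripts, so this strips
    Hebrew points and Arabic vowel signs as well; restricting the
    substitution to Latin text would need extra logic.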


    Mike
     
    Mike Mimic, Jun 7, 2004
    #1

  2. Mike Mimic wrote:
    > How can I convert the characters in a Unicode string to their
    > "basic characters"? All those "a with ..." and "o with ...".


    And what would you use as the "basic character" for those Farsi, Korean,
    Chinese, Arabic, Hebrew, Vietnamese and dozens and dozens of other
    characters?
    Unicode encompasses almost all of the commonly spoken languages on earth.

    jue
     
    Jürgen Exner, Jun 7, 2004
    #2

  3. Mike Mimic

    Hi!

    Jürgen Exner wrote:
    > Mike Mimic wrote:
    > And what would you use as the "basic character" for those Farsi, Korean,
    > Chinese, Arabic, Hebrew, Vietnamese and dozens and dozens of other
    > characters?


    I would leave them as they are. I would like to convert only those
    which are "combined".


    Mike
     
    Mike Mimic, Jun 7, 2004
    #3
  4. Mike Mimic wrote:
    [Converting non-ASCII to ASCII characters]
    > Jürgen Exner wrote:
    >> Mike Mimic wrote:
    >> And what would you use as the "basic character" for those Farsi,
    >> Korean, Chinese, Arabic, Hebrew, Vietnamese and dozens and dozens of
    >> other characters?

    >
    > I would leave them as they are. I would like to convert only those
    > which are "combined".


    What do you mean by "combined"? I can't tell how much you know about other
    languages.
    But if you are talking about "accented characters", are you aware that these
    are often distinct characters, not just pronunciation marks (as, e.g., in
    Arabic)? Are you sure that you want to convert e.g. Österreich (=
    Austria) into Osterreich (= Easter empire)?

    jue
     
    Jürgen Exner, Jun 7, 2004
    #4