Convert unicode string to "basic characters"

Discussion in 'Perl Misc' started by Mike Mimic, Jun 7, 2004.

  1. Mike Mimic

    Hi!

    How can I convert the characters in a Unicode string to their
    "basic characters"? I mean all those "a with ..." and "o with ..." letters.

    I tried Unicode::Normalize:

    use Unicode::Normalize;
    $string = NFD($string);    # split letters from their accents
    $string =~ s/\pM//g;       # remove the combining marks

    But this fails for, for example, U+00D8 (O with stroke), U+0110 (D
    with stroke), and U+0141 (L with stroke)...


    Mike
     
    Mike Mimic, Jun 7, 2004
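The NFD approach above works for letters whose accent becomes a separate combining mark after decomposition. U+00D8, U+0110 and U+0141, however, have no canonical decomposition in Unicode, so NFD leaves them intact and there is no mark to strip. A minimal sketch, assuming a hand-written mapping is acceptable for such letters: the sub name `to_basic` and the six-letter `tr///` table are illustrative, not a complete list.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFD NFC);

# Sketch: strip accents via NFD, then handle "stroke" letters by hand.
# The tr/// table below is a hypothetical, deliberately incomplete list;
# extend it with whatever non-decomposable letters your data contains.
sub to_basic {
    my ($s) = @_;
    $s = NFD($s);        # canonical decomposition: e.g. U+00E9 -> "e" + U+0301
    $s =~ s/\p{M}+//g;   # drop all combining marks
    # O/o, D/d and L/l with stroke have no decomposition, so map them explicitly
    $s =~ tr/\x{00D8}\x{00F8}\x{0110}\x{0111}\x{0141}\x{0142}/OoDdLl/;
    return NFC($s);      # recompose what is still decomposed (e.g. Hangul jamo)
}

binmode STDOUT, ':encoding(UTF-8)';
print to_basic("\x{0141}\x{00F3}d\x{017A} \x{00D8}rsted \x{0110}or\x{0111}e"), "\n";
# prints: Lodz Orsted Dorde
```

The final NFC matters for scripts such as Korean: NFD splits a Hangul syllable into jamo, which are letters rather than marks, so nothing is stripped and NFC puts the syllable back together.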

  2. And what would you use as the "basic character" for Farsi, Korean,
    Chinese, Arabic, Hebrew, Vietnamese, and dozens and dozens of other
    characters?
    Unicode covers the scripts of almost all commonly spoken languages on Earth.

    jue
     
    Jürgen Exner, Jun 7, 2004

  3. Mike Mimic

    Hi!

    I would leave them as they are. I only want to convert those
    that are "combined".


    Mike
     
    Mike Mimic, Jun 7, 2004
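If "combined" is taken to mean Latin letters that NFD splits into a base letter plus combining marks, one way to leave every other script alone is to remove only marks that follow a Latin base letter. This is a sketch under that assumption; the sub name `strip_latin_accents` is hypothetical. Note that under this definition U+00D8 and similar letters are not "combined" at all, since they have no canonical decomposition, so they pass through unchanged.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFD NFC);

# Remove combining marks only when they follow a Latin-script base letter,
# so Greek tonos, Arabic harakat, Hebrew niqqud, Devanagari vowel signs,
# etc. are kept. Letters without a decomposition (e.g. U+00D8) are untouched.
sub strip_latin_accents {
    my ($s) = @_;
    $s = NFD($s);                      # split precomposed letters
    $s =~ s/(\p{Latin})\p{M}+/$1/g;    # keep the Latin base letter, drop its marks
    return NFC($s);                    # recompose everything else
}

binmode STDOUT, ':encoding(UTF-8)';
print strip_latin_accents("Caf\x{00E9} \x{03AC}"), "\n";   # Greek alpha with tonos stays
```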
  4. Mike Mimic wrote:
    [Converting non-ASCII to ASCII characters]
    What do you mean by "combined"? I can't tell how much you know about other
    languages.
    But if you are talking about "accented characters", are you aware that
    these are often distinct letters, not just pronunciation marks (as e.g. in
    Arabic)? For example, are you sure that you want to convert Österreich (=
    Austria) into Osterreich (= Easter empire)?

    jue
     
    Jürgen Exner, Jun 7, 2004
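The Österreich example shows that plain accent stripping can change meaning. For German text, a common convention is to write the umlauts as the base letter followed by "e" (Ö as Oe) and ß as "ss", and only then strip any remaining marks. The sketch below assumes German input; the sub name `to_ascii_german` and the table are illustrative, and other languages would need their own tables.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFD NFC);

# Illustrative German-specific table, applied before generic accent stripping.
my %german = (
    "\x{00C4}" => 'Ae', "\x{00D6}" => 'Oe', "\x{00DC}" => 'Ue',
    "\x{00E4}" => 'ae', "\x{00F6}" => 'oe', "\x{00FC}" => 'ue',
    "\x{00DF}" => 'ss',
);
my $german_chars = join '', keys %german;   # order is irrelevant in a char class

sub to_ascii_german {
    my ($s) = @_;
    $s = NFC($s);                               # "O" + U+0308 becomes U+00D6
    $s =~ s/([$german_chars])/$german{$1}/g;    # umlauts and sharp s first
    $s = NFD($s);
    $s =~ s/\p{M}+//g;                          # then drop any other accents
    return NFC($s);
}

binmode STDOUT, ':encoding(UTF-8)';
print to_ascii_german("\x{00D6}sterreich, Stra\x{00DF}e, Caf\x{00E9}"), "\n";
# prints: Oesterreich, Strasse, Cafe
```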
