Convert Unicode string to "basic characters"


Mike Mimic

Hi!

How can I convert the characters in a Unicode string to their
"basic characters"? I mean all those "a with ..." and "o with ..." letters.

I tried Unicode::Normalize:

use Unicode::Normalize;
$string = NFD($string);   # split e.g. "é" into "e" + combining acute
$string =~ s/\pM//g;      # remove all combining marks

But this does not work for, e.g., U+00D8 (O with stroke), U+0110 (D
with stroke), U+0141 (L with stroke)...
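A likely explanation for the failure: NFD only splits characters that have a canonical decomposition in the Unicode data. "é" becomes "e" followed by U+0301, but O with stroke, D with stroke and L with stroke have no decomposition at all, so NFD leaves them intact and `s/\pM//g` finds nothing to remove. Below is a minimal sketch, assuming a hand-written fallback table is acceptable; the function name `strip_diacritics` and the `tr///` table are hypothetical and deliberately incomplete.

```perl
use strict;
use warnings;
use utf8;                          # the string literals below are UTF-8 source
use Unicode::Normalize qw(NFD);

binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical helper: remove diacritics from Latin letters.
sub strip_diacritics {
    my ($s) = @_;
    $s = NFD($s);                  # "é" -> "e" . "\x{301}"
    $s =~ s/\pM//g;                # drop the combining marks NFD exposed
    # Letters without a canonical decomposition survive NFD untouched,
    # so map them explicitly. This table is illustrative, not complete.
    $s =~ tr/ØøĐđŁłĦħ/OoDdLlHh/;
    return $s;
}

print strip_diacritics("Łódź, Đakovo, Ørsted, crème brûlée"), "\n";
# prints "Lodz, Dakovo, Orsted, creme brulee"
```

The table has to grow by hand. The CPAN module Text::Unidecode ships a much larger ASCII transliteration table, though it also transliterates non-Latin scripts.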


Mike
 

Jürgen Exner

Mike said:
> How can I convert the characters in a Unicode string to their
> "basic characters"? I mean all those "a with ..." and "o with ..." letters.

And what would you use as the "basic character" for those Farsi, Korean,
Chinese, Arabic, Hebrew, Vietnamese and dozens and dozens of other
characters?
Unicode covers the scripts of almost all commonly spoken languages on Earth.

jue
 

Mike Mimic

Hi!

Jürgen Exner said:
> Mike Mimic wrote:
> And what would you use as the "basic character" for those Farsi, Korean,
> Chinese, Arabic, Hebrew, Vietnamese and dozens and dozens of other
> characters?

I would leave them as they are. I would like to convert only those
which are "combined".
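If only Latin letters should lose their accents while every other script stays exactly as it is, stripping every `\pM` is too broad: it would also delete Arabic vowel marks, Hebrew points and the Japanese voicing mark, and NFD on its own leaves Korean Hangul syllables split into jamo. Below is a minimal sketch, assuming "combined" means a Latin base letter followed by combining marks. It removes marks only after a `\p{Script=Latin}` character and recomposes everything else with NFC. The name `strip_latin_diacritics` is hypothetical, and stroke letters such as Ł still need a separate table.

```perl
use strict;
use warnings;
use utf8;
use Unicode::Normalize qw(NFD NFC);

binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical helper: strip combining marks only from Latin letters.
sub strip_latin_diacritics {
    my ($s) = @_;
    $s = NFD($s);
    # A Latin base letter followed by one or more marks keeps only the letter.
    # Marks after Greek, Arabic, Hebrew, kana, ... letters are not touched.
    $s =~ s/(\p{Script=Latin})\pM+/$1/g;
    # Recompose whatever NFD split apart but we kept (Hangul, が, ...).
    return NFC($s);
}

print strip_latin_diacritics("Crème brûlée, 한국어, がっこう, Łódź"), "\n";
# prints "Creme brulee, 한국어, がっこう, Łodz"
```

`\p{Script=Latin}` is used instead of the shorter `\p{Latin}` because newer Perls interpret the short form as Script_Extensions, which can also match some combining marks.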


Mike
 

Jürgen Exner

Mike Mimic wrote:
[Converting non-ASCII to ASCII characters]
> I would leave them as they are. I would like to convert only those
> which are "combined".

What do you mean by "combined"? I can't tell how much you know about other
languages.
But if you are talking about "accented characters", are you aware that those
are often distinct letters, not just pronunciation marks (as, e.g., in
Arabic)? Are you sure that you want to convert, e.g., Österreich (=
Austria) into Osterreich (= Easter empire)?
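The Österreich example points at a language-specific convention: in German, the usual ASCII spelling of ä, ö, ü and ß is ae, oe, ue and ss, so a German-aware conversion would give Oesterreich rather than Osterreich. Below is a minimal sketch, assuming the input is known to be German. The mapping table and the name `to_ascii_german` are hypothetical, and other languages follow different conventions.

```perl
use strict;
use warnings;
use utf8;
use Unicode::Normalize qw(NFD NFC);

binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical German fallback spellings, applied before generic stripping.
my %german = (
    'Ä' => 'Ae', 'Ö' => 'Oe', 'Ü' => 'Ue',
    'ä' => 'ae', 'ö' => 'oe', 'ü' => 'ue',
    'ß' => 'ss',
);

sub to_ascii_german {
    my ($s) = @_;
    $s = NFC($s);                          # make "o" + U+0308 a single "ö"
    $s =~ s/([ÄÖÜäöüß])/$german{$1}/g;     # German-specific spellings first
    $s = NFD($s);                          # then strip any remaining accents
    $s =~ s/\pM//g;
    return $s;
}

print to_ascii_german("Österreich, Straße, Café"), "\n";
# prints "Oesterreich, Strasse, Cafe"
```

All-caps input such as ÖSTERREICH would come out as OeSTERREICH with this table; handling that needs a case-aware rule.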

jue
 
