unaccenting letters

David Alex Lamb

Has anyone developed a method to un-accent accented characters? I mean
for example translating e-acute, e-grave, and e-circumflex to e?
 
Guest

David Alex Lamb said:
Has anyone developed a method to un-accent accented characters? I mean
for example translating e-acute, e-grave, and e-circumflex to e?

If using this for comparisons, use a Collator.
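
For the comparison case, here is a minimal sketch using java.text.Collator at PRIMARY strength, which looks only at base letters. The class and method names are made up for illustration, and Locale.FRENCH is just an example locale. Note that PRIMARY strength ignores case as well as accents.

```java
import java.text.Collator;
import java.util.Locale;

// Hypothetical helper class, not part of any library.
final class AccentInsensitiveCompare {
    // Returns true when the two strings differ at most by accents or case.
    static boolean sameBaseLetters(String a, String b, Locale locale) {
        Collator collator = Collator.getInstance(locale);
        // PRIMARY strength compares base letters only.
        collator.setStrength(Collator.PRIMARY);
        // Also handle text that arrives as base letter + combining mark.
        collator.setDecomposition(Collator.CANONICAL_DECOMPOSITION);
        return collator.compare(a, b) == 0;
    }

    public static void main(String[] args) {
        // "\u00e9l\u00e8ve" is "eleve" with e-acute and e-grave.
        System.out.println(sameBaseLetters("\u00e9l\u00e8ve", "eleve", Locale.FRENCH));
        System.out.println(sameBaseLetters("eleve", "elevf", Locale.FRENCH));
    }
}
```

This leaves the strings themselves untouched, so it only helps if you need to compare or sort, not if you need the unaccented text.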

I haven't found code for what you ask for, but I assume the algorithm
would be fairly straightforward.

First, do a full decomposition on the character in question.
Second, grab the first character from the resultant String.
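
The two steps above could be sketched roughly as follows. This assumes a JDK that ships the public java.text.Normalizer class (Java 6 or later), which does the decomposition so no hand-built table is needed; the class and method names are made up for illustration.

```java
import java.text.Normalizer;

// Hypothetical helper class, not part of any library.
final class FirstOfDecomposition {
    static char unaccent(char c) {
        // Step 1: full canonical decomposition (NFD) of the single character.
        String decomposed = Normalizer.normalize(String.valueOf(c), Normalizer.Form.NFD);
        // Step 2: keep only the first char of the decomposed form.
        return decomposed.charAt(0);
    }

    public static void main(String[] args) {
        // e-acute, e-grave, e-circumflex
        for (char c : new char[] {'\u00e9', '\u00e8', '\u00ea'}) {
            System.out.println(unaccent(c));
        }
    }
}
```

Characters with no canonical decomposition, such as o-with-stroke (U+00F8), come back unchanged, so this is not a complete "to ASCII" conversion.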

You will need to refer to the Unicode Standard to build the
decomposition table.

HTH,
La'ie Techie
 
John O'Conner

Lāʻie Techie said:
You will need reference to the Unicode Standard to create this
decomposition table.


What you want is already in the JDK, just not documented and not in the
public API. You should check out sun.text.Normalizer. You can use it to
create normalized, decomposed text. See the Unicode Standard for more
information about normalization (http://www.unicode.org/reports/tr15/).
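
As a rough sketch of the normalize-then-strip idea for whole strings: the code below uses the public java.text.Normalizer class (available from Java 6 on) rather than sun.text.Normalizer, since the internal class is undocumented. The class and method names are made up for illustration.

```java
import java.text.Normalizer;

// Hypothetical helper class, not part of any library.
final class StripMarks {
    static String stripAccents(String text) {
        // Decompose each accented letter into base letter + combining marks.
        String decomposed = Normalizer.normalize(text, Normalizer.Form.NFD);
        // \p{M} matches any Unicode mark; drop them and keep the base letters.
        return decomposed.replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        // "eleve naive" written with e-acute, e-grave and i-diaeresis.
        System.out.println(stripAccents("\u00e9l\u00e8ve na\u00efve"));
    }
}
```

Unlike taking the first character of each decomposition, this keeps every base letter in the string and removes only the marks.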

For documented API that normalizes text, see IBM's ICU4J project:
http://oss.software.ibm.com/icu4j/
 
