Convert string

yaaros

Hi!!

I'd like to write a method that converts a given string to a string that
contains only letters of the English alphabet. So when I give the method
a string with special characters, such as ą, ę etc. from the Polish
alphabet, I'd like to get back a string where ą is replaced by a, etc. Is
it possible to write such a universal method? One that will work for all
special characters like ä, ą, ś, ü etc.

Thanks in advance
Yaaros
 
Joshua Cranmer

What would be the correct output for the following characters?
Greek lowercase alpha
`fi' ligature
Japanese hiragana ka
Unified CJK ideograph for mountain

Technically speaking, any decidable problem can be solved in Java
(modulo certain native OS interactions), but are you looking for a
library that does this instead? Or how to write one yourself?

The Unicode normalization processes would probably be of great help as a
basis. <http://www.unicode.org/reports/tr15/>
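For what it's worth, here is a rough sketch of how normalization could serve as that basis, assuming Java 6 or later (which ships java.text.Normalizer). The class and method names are made up for illustration. It decomposes the text to NFD and then drops the combining marks. Letters that have no canonical decomposition, such as Polish ł, pass through unchanged and would still need a lookup table.

```java
import java.text.Normalizer;

class AccentStripper {
    /** Decomposes to NFD, then removes combining marks (Unicode category M). */
    static String stripAccents(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        // U+0105 a-ogonek, U+0119 e-ogonek, U+015B s-acute, U+00FC u-diaeresis
        System.out.println(stripAccents("\u0105\u0119\u015B\u00FC"));
    }
}
```

This says nothing about the Greek, ligature, or CJK cases above; those need a policy decision first.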
 
Roedy Green

boolean english = 'a' <= c && c <= 'z' || 'A' <= c && c <= 'Z';

Just do a loop going through your string composing a new one with a
StringBuilder consisting only of the chars you like.
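A minimal sketch of that loop, with a hypothetical class and method name. It simply drops every char that fails the test above, so accented letters disappear rather than being converted.

```java
class EnglishOnlyFilter {
    /** Keeps only chars in a-z or A-Z; everything else is dropped. */
    static String keepEnglishLetters(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            boolean english = 'a' <= c && c <= 'z' || 'A' <= c && c <= 'Z';
            if (english) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(keepEnglishLetters("Hi, W\u00F6rld 42!"));
    }
}
```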

You could also create an array, indexed by char number, that holds what
you want each character converted to, e.g. cvt[ 'à' ] -> 'a'. Then you
loop, looking up each char. A mapping to 0 means leave the character out.
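Something along these lines, as a sketch only. The table contents are a tiny illustrative sample, and the class and method names are assumptions; a real table would need an entry for every character you care about.

```java
class CharTable {
    // One slot per char value; 0 means "leave this character out".
    static final char[] cvt = new char[65536];
    static {
        for (char c = 'a'; c <= 'z'; c++) cvt[c] = c;
        for (char c = 'A'; c <= 'Z'; c++) cvt[c] = c;
        cvt['\u00E0'] = 'a'; // U+00E0, a with grave
        cvt['\u0105'] = 'a'; // U+0105, a with ogonek
        cvt['\u0119'] = 'e'; // U+0119, e with ogonek
    }

    static String convert(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char mapped = cvt[s.charAt(i)];
            if (mapped != 0) {
                sb.append(mapped);
            }
        }
        return sb.toString();
    }
}
```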

The Quoter Amanuensis contains many such tables. See
http://mindprod.com/products1.html#QUOTER
 
Patricia Shanahan

Roedy Green wrote:
....
You could also create an array, indexed by char number, that holds what
you want each character converted to, e.g. cvt[ 'à' ] -> 'a'. Then you
loop, looking up each char. A mapping to 0 means leave the character out.
....

I suggest mapping to a String rather than a char, to allow for
expansions to two letters, for example.
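To illustrate, a sketch with a Map from char to String. The entries (sharp s to "ss", the ae ligature to "ae") are only examples, the names are hypothetical, and unmapped characters are passed through unchanged here, which is an assumption on my part.

```java
import java.util.HashMap;
import java.util.Map;

class StringTable {
    static final Map<Character, String> CVT = new HashMap<>();
    static {
        CVT.put('\u00DF', "ss"); // U+00DF, sharp s
        CVT.put('\u00E6', "ae"); // U+00E6, ae ligature
        CVT.put('\u0105', "a");  // U+0105, a with ogonek
    }

    static String convert(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            String mapped = CVT.get(c);
            if (mapped != null) {
                sb.append(mapped);
            } else {
                sb.append(c); // no entry: keep the character as it is
            }
        }
        return sb.toString();
    }
}
```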

Patricia
 
Roedy Green

Patricia Shanahan wrote:
I suggest mapping to a String rather than a char, to allow for
expansions to two letters, for example.

String also allows for 0-length transforms, to ignore a letter.
However, if you have no multi-char transforms the code will be faster
and considerably more compact using chars.
 
Daniel Pitts

Roedy said:
String also allows for 0-length transforms, to ignore a letter.
However, if you have no multi-char transforms the code will be faster
and considerably more compact using chars.

On the other hand, using String allows for code points that don't fit in
a single char (those outside the Basic Multilingual Plane, which Java
stores as a surrogate pair).
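A sketch of what that could look like: the table is keyed by int code point instead of char, and the loop steps by Character.charCount so that a surrogate pair is looked up as one unit. The names and the two sample entries are assumptions for illustration, and unmapped code points are passed through.

```java
import java.util.HashMap;
import java.util.Map;

class CodePointTable {
    static final Map<Integer, String> CVT = new HashMap<>();
    static {
        CVT.put(0x1D400, "A"); // U+1D400 MATHEMATICAL BOLD CAPITAL A (two chars in a String)
        CVT.put(0x0105, "a");  // U+0105, a with ogonek
    }

    static String convert(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            String mapped = CVT.get(cp);
            if (mapped != null) {
                sb.append(mapped);
            } else {
                sb.appendCodePoint(cp); // no entry: keep the code point
            }
            i += Character.charCount(cp); // 1 for BMP, 2 for a surrogate pair
        }
        return sb.toString();
    }
}
```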
 
Eric Sosman

Roedy said:
String also allows for 0-length transforms, to ignore a letter.
However, if you have no multi-char transforms the code will be faster
and considerably more compact using chars.

... which suggests a hybrid approach: A char[] array for the
one-to-one mappings, with a special value like '\u0000' meaning
"I don't know; check for exceptional cases."

/* Untested, uncompiled, unscrutinized, un to the Nth: */

static char[] translateTable = new char[65536];
static { /* initialize it */ }

static Map<Character,String> weirdCasesMap = ...;
static { /* initialize it */ }

String translate(String old) {
    StringBuilder buff = new StringBuilder();
    for (int n = old.length(), i = 0; i < n; ++i) {
        char oldc = old.charAt(i);
        char newc = translateTable[oldc];
        if (newc != 0) {
            buff.append(newc);
        }
        else {
            String news = weirdCasesMap.get(
                Character.valueOf(oldc));
            if (news != null) // allows for deletions
                buff.append(news);
        }
    }
    return buff.toString();
}

If deletions (incoming characters that map to nothing in the
output) are common, consider using two special codes in translateTable:
one meaning "Check the map" and the other meaning "Ignore this."
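A sketch of that two-code variant, equally unscrutinized. The choice of '\uFFFF' as the "ignore" code is my assumption (U+FFFF is a noncharacter, so it should not turn up as a wanted output), and the table entries are only samples.

```java
import java.util.HashMap;
import java.util.Map;

class HybridTranslator {
    static final char CHECK_MAP = '\u0000'; // "I don't know; check the map"
    static final char IGNORE = '\uFFFF';    // "drop this character"

    static final char[] TABLE = new char[65536]; // every slot starts as CHECK_MAP
    static final Map<Character, String> WEIRD = new HashMap<>();
    static {
        for (char c = 'a'; c <= 'z'; c++) TABLE[c] = c;
        for (char c = 'A'; c <= 'Z'; c++) TABLE[c] = c;
        TABLE['\u0105'] = 'a';     // one-to-one mapping (a with ogonek)
        TABLE['!'] = IGNORE;       // a common deletion, no map lookup needed
        WEIRD.put('\u00DF', "ss"); // exceptional case: one char becomes two
    }

    static String translate(String old) {
        StringBuilder buff = new StringBuilder(old.length());
        for (int n = old.length(), i = 0; i < n; ++i) {
            char oldc = old.charAt(i);
            char newc = TABLE[oldc];
            if (newc == IGNORE) {
                continue; // fast path for deletions
            }
            if (newc != CHECK_MAP) {
                buff.append(newc);
                continue;
            }
            String news = WEIRD.get(oldc);
            if (news != null) { // chars absent from the map are dropped too
                buff.append(news);
            }
        }
        return buff.toString();
    }
}
```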
 
