UTF-8 newbie question

Gerry Lawrence

I'm having a problem with a simple method I've created for parsing an
input variable. The input data is in utf-8. When this gets run, it
just turns the data into a single question mark for each character
read. I know I need to change the variable type to bytes and possibly
do another conversion after this, but all my novice attempts based on
various archived posts have failed. Here is the code:

public static void MyStringParse(String line) {
    try {
        // Split on a backslash (assumed delimiter); in a Java literal it is written "\\".
        StringTokenizer input = new StringTokenizer(line, "\\", false);
        System.out.print(input.nextToken());
        while (input.hasMoreTokens()) {
            System.out.print(" | " + input.nextToken());
        }
        System.out.println();
    }
    catch (Exception e) {
        e.printStackTrace();
    }
}

Many Thanks in advance,

Gerry
 
Guest

I'm having a problem with a simple method I've created for parsing an
input variable. The input data is in utf-8. When this gets run, it just
turns the data into a single question mark for each character read.

Java programs only have to worry about encodings in I/O. Internally, all
chars are stored as Unicode (UTF-16 code units).
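
A minimal sketch of what "encodings in I/O" can look like in practice, assuming the input arrives as a byte stream: the charset is named explicitly when the bytes are turned into chars, so the platform default never gets a say. The class and method names (Utf8ReadSketch, readFirstLine) are made up for illustration, and the byte array stands in for a real file or socket.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class Utf8ReadSketch {
    // Decodes the stream as UTF-8 regardless of the platform default charset.
    static String readFirstLine(InputStream in) {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Stand-in for real input: five characters encoded as seven UTF-8 bytes.
        byte[] utf8 = "L\u0101\u02BBie".getBytes(StandardCharsets.UTF_8);
        String line = readFirstLine(new ByteArrayInputStream(utf8));
        System.out.println(utf8.length);   // 7
        System.out.println(line.length()); // 5
    }
}
```

Once the String has been decoded correctly, a method like MyStringParse can tokenize it without caring about bytes at all.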

My assumption is that question marks are printed because your terminal
doesn't recognize the character. This happens to me whenever I try to write
Hawaiian. My solution is to save the output to an HTML file with the
encoding set to UTF-8.
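
One way the question marks can arise on the output side, sketched under the assumption of Java 10 or later (for the PrintStream constructor that takes a Charset): a PrintStream encodes chars with its charset and substitutes '?' for anything the charset cannot represent. The helper name printWith is hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Utf8PrintSketch {
    // Prints the text through a PrintStream that uses the given charset
    // and returns the bytes that were actually written.
    static byte[] printWith(String text, Charset charset) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (PrintStream out = new PrintStream(sink, true, charset)) {
            out.print(text);
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        String text = "L\u0101\u02BBie";
        // ASCII cannot encode the macron vowel or the okina: both become '?'.
        System.out.println(printWith(text, StandardCharsets.US_ASCII).length); // 5
        // UTF-8 keeps every character, using two bytes for each non-ASCII one.
        System.out.println(printWith(text, StandardCharsets.UTF_8).length);    // 7

        // Wrapping System.out the same way emits UTF-8 bytes to the terminal;
        // whether they display correctly still depends on the terminal and font.
        PrintStream utf8Out = new PrintStream(System.out, true, StandardCharsets.UTF_8);
        utf8Out.println(text);
    }
}
```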

Aloha,
La'ie Techie
 
Dale King

La'ie Techie said:
On Tue, 02 Dec 2003 13:24:18 -0800, Gerry Lawrence wrote:

My assumption is that question marks are printed because your terminal
doesn't recognize the character. This happens to me whenever I try to write
Hawaiian.

That's funny because Hawaiian is cited as one of the only languages for
which ASCII is supposedly sufficient. The others being Latin, English and
Swahili.
 
Guest

That's funny because Hawaiian is cited as one of the only languages for
which ASCII is supposedly sufficient. The others being Latin, English and
Swahili.

Hawaiian has 14 letters, 12 of which were adopted from English. The
remaining two are the ʻokina and kahakō. The ʻokina (U+02BB) is a
consonant which looks similar to the apostrophe. The kahakō is a line
that stresses a vowel. These two marks are significant.

lāʻie means "silent"
laie means "lawyer"

If you see squares (or question marks) in the above message, it's because
your newsreader doesn't support UTF-8 or your font lacks support for
Hawaiian.
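
A small sketch of what a reader without UTF-8 support does to the ʻokina, assuming it falls back to ISO-8859-1: the two UTF-8 bytes of U+02BB are shown as two unrelated Latin-1 characters. The class and helper names are invented for the example.

```java
import java.nio.charset.StandardCharsets;

public class OkinaBytesSketch {
    // Formats bytes as space-separated uppercase hex pairs.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Encodes as UTF-8, then decodes the same bytes as ISO-8859-1,
    // which is what a reader with the wrong charset effectively does.
    static String misreadAsLatin1(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String okina = "\u02BB";
        System.out.println(hex(okina.getBytes(StandardCharsets.UTF_8))); // CA BB
        // One character in, two characters out (U+00CA and U+00BB).
        System.out.println(misreadAsLatin1(okina).length()); // 2
    }
}
```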

Aloha,
Lāʻie Techie
 
Dale King

Lāʻie Techie said:
Hawaiian has 14 letters, 12 of which were adopted from English. The
remaining two are the ʻokina and kahakō. The ʻokina (U+02BB) is a
consonant which looks similar to the apostrophe. The kahakō is a line
that stresses a vowel. These two marks are significant.

I don't dispute it; I just know I have seen it stated in multiple places that
ASCII is sufficient to encode only English, Latin, Swahili, and Hawaiian. A
web search for ASCII Hawaiian Swahili will turn up several.

I suspect that when they say ASCII is adequate they use apostrophe for the
okina and dash for the kahako.
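
A rough sketch of such an ASCII fallback, with two stated assumptions: the ʻokina is mapped to a plain apostrophe, and the kahakō is simply dropped rather than replaced by a dash. It relies on java.text.Normalizer to split a macron vowel into a base letter plus a combining mark; the class and method names are hypothetical.

```java
import java.text.Normalizer;

public class AsciiFallbackSketch {
    // Reduces Hawaiian text to plain ASCII (assumed mapping, see above).
    static String toAscii(String text) {
        // NFD turns a precomposed macron vowel into vowel + combining macron.
        String decomposed = Normalizer.normalize(text, Normalizer.Form.NFD);
        return decomposed
                .replaceAll("\\p{M}", "")  // remove combining marks (the macron)
                .replace('\u02BB', '\'');  // okina becomes an apostrophe
    }

    public static void main(String[] args) {
        System.out.println(toAscii("L\u0101\u02BBie")); // La'ie
    }
}
```

The result is lossy: words that differ only in these marks collapse to the same ASCII spelling.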
Aloha,
Lāʻie Techie

So that must mean that your handle means silent techie.
 
