Reading and writing extended ASCII characters


Geoff Warnock

I am writing a Java program (using JDK 1.5.0) for a project. It has
to read in words from an input file. It is a language translator, so
each line consists of an English word followed by its German, French,
Spanish, etc. equivalents. Some of these words contain accents and
other characters whose (extended) ASCII values are beyond 127.

My problem is that I have a punctuation-removal method which strips
all words of unnecessary characters before adding each line to a new
binary search tree node. The method just drops the characters with
values greater than 127 instead of storing them. Why is this happening?


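    // The declarations of data, temp, c, lastChar and nextChar (and the
    // method name) were not in the original post; they are assumed here
    // so the loop compiles. Processes one line of input.
    static String removePunctuation(String data)
    {
        String temp = "";
        char c, lastChar, nextChar;
        for (int j = 0; j < data.length(); j++)
        {
            c = data.charAt(j);
            if ((c <= 'z') && (c >= 'a'))
                temp = temp + c;
            else if ((c <= 'Z') && (c >= 'A'))
                temp = temp + c;
            else if (c == 39)                                // apostrophe
                temp = temp + c;
            else if ((c > (char) 127) && (c < (char) 168))   // codes 128..167
                temp = temp + c;
            else if ((c == ' ') && (j > 0) && (j < data.length() - 1))
            {
                // keep a space only if both neighbours are letters or apostrophes
                lastChar = data.charAt(j - 1);
                nextChar = data.charAt(j + 1);
                if (((lastChar <= 'z' && lastChar >= 'a')
                        || (lastChar <= 'Z' && lastChar >= 'A')
                        || (lastChar == 39))
                    && ((nextChar <= 'z' && nextChar >= 'a')
                        || (nextChar <= 'Z' && nextChar >= 'A')
                        || (nextChar == 39)))
                    temp = temp + c;
            }
        }
        return temp;
    }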


It is the "else if((c > (char)127) && (c < (char)168))" test that does
not work. Any ideas?

Thanks, Geoff Warnock.

Gordon Beaton

Geoff Warnock said:
> My problem is that I have a punctuation-removal method which strips
> all words of unnecessary characters before adding each line to a new
> binary search tree node. The method just drops the characters with
> values greater than 127 instead of storing them. Why is this happening?
> [...]
>
> It is the "else if((c > (char)127) && (c < (char)168))" test that does
> not work. Any ideas?

Probably it has to do with how "c" and "data" are declared or
initialized, but you didn't post those parts.

I'll bet that you've read your data from the file without specifying
the correct character encoding. Have you confirmed (e.g. with
System.out.println()) that lines containing those characters have been
read correctly?
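Following up on the encoding question: a minimal sketch, assuming JDK 1.5 (so no try-with-resources and no StandardCharsets), of reading lines with an explicit charset through InputStreamReader and then printing each char's numeric value to check what was actually read. The class and method names (ReadWithCharset, readLines, describe) are hypothetical, and an in-memory UTF-8 byte array stands in for the real input file.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

public class ReadWithCharset {

    // Reads all lines from the stream, decoding bytes with the named
    // charset instead of the platform default.
    public static List<String> readLines(InputStream in, String charsetName)
            throws IOException {
        BufferedReader reader =
                new BufferedReader(new InputStreamReader(in, charsetName));
        List<String> lines = new ArrayList<String>();
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } finally {
            reader.close();
        }
        return lines;
    }

    // Lists each char with its numeric value, e.g. "c=99 a=97 ...",
    // so you can see exactly what was read.
    public static String describe(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (i > 0) sb.append(' ');
            sb.append(c).append('=').append((int) c);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for a file saved as UTF-8: "cafe" with e-acute (bytes C3 A9).
        byte[] utf8Bytes = "caf\u00e9".getBytes("UTF-8");

        List<String> right = readLines(new ByteArrayInputStream(utf8Bytes), "UTF-8");
        List<String> wrong = readLines(new ByteArrayInputStream(utf8Bytes), "ISO-8859-1");

        System.out.println(describe(right.get(0)));
        System.out.println(describe(wrong.get(0)));
    }
}
```

Decoded as UTF-8, the accented letter comes back as one char with value 233; decoded as ISO-8859-1, the same bytes become two chars, 195 and 169, which is the kind of silent damage a wrong default encoding causes. For the real file you would pass something like new FileInputStream("words.txt") (hypothetical file name) together with whatever encoding the file was actually saved in.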

That said, I'd like to suggest that instead of doing it "the hard
way", and making assumptions about the values of various characters,
you consider using a simple test like this:

    if (Character.isLetter(c)) {
        // keep c: true for accented and non-Latin letters as well as a-z/A-Z
    }
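To see why the 128..167 test misses accented letters, this sketch compares it with Character.isLetter. Java chars are UTF-16 code units, so é is 233 (U+00E9) and ñ is 241 (U+00F1), both outside 128..167. That range looks like the accented-letter block of the old DOS code page 437 (where é is 130 and ñ is 164), which may be where the numbers came from, but that is a guess. The class and method names are hypothetical, and treating the apostrophe as a word character is carried over from the original code.

```java
public class LetterCheck {

    // The test from the original post: accepts codes 128..167 only.
    public static boolean inOriginalRange(char c) {
        return (c > (char) 127) && (c < (char) 168);
    }

    // Character.isLetter knows about letters in every script; the
    // apostrophe is kept as in the original code (c == 39).
    public static boolean isWordChar(char c) {
        return Character.isLetter(c) || c == '\'';
    }

    public static void main(String[] args) {
        // Unicode escapes keep this source file pure ASCII:
        // e, e-acute, n-tilde, sharp s, u-umlaut, a C1 control char, '!'
        char[] samples = { 'e', '\u00e9', '\u00f1', '\u00df', '\u00fc', '\u0082', '!' };
        for (int i = 0; i < samples.length; i++) {
            char c = samples[i];
            System.out.println("U+" + Integer.toHexString(c) + " (" + (int) c + ")"
                    + "  in 128..167: " + inOriginalRange(c)
                    + "  isWordChar: " + isWordChar(c));
        }
    }
}
```

The accented letters fail the range test but pass isWordChar, while U+0082 (a control character) passes the range test even though it is not a letter.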

/gordon

Daniel Tryba

Geoff Warnock said:
> It is the "else if((c > (char)127) && (c < (char)168))" test that does
> not work. Any ideas?

How is it not working?

BTW: temp is a String. Appending to it one char at a time in a loop is
very inefficient, because every + copies the whole string into a new
String object.
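A sketch of the same punctuation-removal rules using a StringBuilder (available since JDK 1.5), which appends into one growable buffer instead of copying the whole string on every +, and Character.isLetter in place of hand-coded ranges. It assumes the rules are exactly those of the original loop: keep letters and apostrophes, and keep a space only when both neighbours are kept characters. The names StripPunctuation, strip and isWordChar are hypothetical.

```java
public class StripPunctuation {

    // Letters in any script, plus the apostrophe (as in the original code).
    private static boolean isWordChar(char c) {
        return Character.isLetter(c) || c == '\'';
    }

    // Keeps word characters, and keeps a space only when the characters
    // immediately before and after it are both word characters.
    public static String strip(String data) {
        StringBuilder temp = new StringBuilder(data.length());
        for (int j = 0; j < data.length(); j++) {
            char c = data.charAt(j);
            if (isWordChar(c)) {
                temp.append(c);
            } else if (c == ' ' && j > 0 && j < data.length() - 1
                    && isWordChar(data.charAt(j - 1))
                    && isWordChar(data.charAt(j + 1))) {
                temp.append(c);
            }
        }
        return temp.toString();
    }

    public static void main(String[] args) {
        // French "l'ete est la." with accents, written as Unicode escapes
        System.out.println(strip("l'\u00e9t\u00e9 est l\u00e0."));
    }
}
```

Because it follows the original rules, a space directly after punctuation is dropped too, so "a, b" becomes "ab"; relax the neighbour test if that is not what you want.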

BTW2: your input is a String, so you should look at regular expressions
to filter out unwanted characters.
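A regular-expression version of the filter, as a hedged sketch: \p{L} matches any Unicode letter in java.util.regex, so one replaceAll can delete everything except letters, apostrophes and whitespace, and a second collapses the remaining whitespace runs into single spaces. The class and method names are hypothetical.

```java
public class RegexStrip {

    // 1. delete every char that is not a letter, an apostrophe or whitespace
    // 2. collapse whitespace runs (spaces, tabs, ...) into one space
    // 3. trim leading and trailing spaces
    public static String strip(String line) {
        return line.replaceAll("[^\\p{L}'\\s]", "")
                   .replaceAll("\\s+", " ")
                   .trim();
    }

    public static void main(String[] args) {
        System.out.println(strip("Hello, world!  l'\u00e9t\u00e9 -- ni\u00f1o."));
    }
}
```

Unlike the loop, this keeps the space after punctuation ("a, b" becomes "a b"). If the input may contain decomposed accents (a base letter followed by a combining mark), add \p{M} to the character class, otherwise the marks are stripped.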

BTW3: Strings are Unicode, while your source file is (most likely) in
some other encoding, so anything you read gets translated to Unicode.
On many platforms the default system encoding is ISO-8859-1, which has
no printable characters for values >127 and <160 (those codes are
control characters).
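A small sketch of the point about byte values above 127: the same byte means different characters in different encodings. It decodes single bytes as ISO-8859-1 and also prints the JVM's default charset via Charset.defaultCharset() (added in JDK 1.5). The bytes 0x82 and 0xA4 are chosen because they are é and ñ in DOS code page 437; read as ISO-8859-1 they become a control character (U+0082) and the currency sign (U+00A4). DecodeByte and decode are hypothetical names.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;

public class DecodeByte {

    // Decodes a single byte with the named charset and returns the char.
    public static char decode(int byteValue, String charsetName)
            throws UnsupportedEncodingException {
        return new String(new byte[] { (byte) byteValue }, charsetName).charAt(0);
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // The encoding Java uses when you do not specify one.
        System.out.println("default charset: " + Charset.defaultCharset());

        int[] bytes = { 0x82, 0xA4, 0xE9, 0xF1 };
        for (int i = 0; i < bytes.length; i++) {
            char c = decode(bytes[i], "ISO-8859-1");
            System.out.println("byte " + bytes[i] + " -> U+" + Integer.toHexString(c)
                    + "  isLetter: " + Character.isLetter(c));
        }
    }
}
```

Only 0xE9 and 0xF1 decode to letters (é and ñ) under ISO-8859-1, and their values, 233 and 241, lie outside the 128..167 range tested in the original code.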
 
