Java 1.4.2_03 under Red Hat 9.0 - encoding problem

Christoph Breidert

Hello Folks,

I'm new to Linux and Red Hat, so please excuse me if this is a simple
problem. I have also posted in the corresponding Red Hat newsgroup.

I did search the newsgroups, and could not find any relevant postings.

My problem is that with Java 1.4.2_03 under Red Hat 9.0 the umlauts
(öäüöäü) are not displayed correctly.

I have a file containing some ööääüü characters. vi shows them
correctly. Reading this file with Java, printing the content to
System.out, and then writing it to another file destroys all the umlauts.
Below is the sample code:
// umlaut.txt is a file containing ööääüü
BufferedReader in = new BufferedReader(new FileReader("umlaut.txt"));

StringBuffer sb = new StringBuffer();
while (true) {
    String s = in.readLine();
    if (s == null)
        break;
    sb.append(s);
    System.out.println(s);
}
File f = new File("testfile");
BufferedWriter out = new BufferedWriter(new FileWriter(f));
out.write(sb.toString(), 0, sb.length());
out.flush();
out.close();

Is this a Red Hat bug? Is this a Java bug? Am I missing something?

The sample code gives correct results when executed on a Windows
machine, and on other Linux systems with different Java/Red Hat
versions.

Thanks for your help,

Christoph
 
Andrew Thompson

Christoph Breidert wrote:
....
I'm new to Linux and Red Hat, so please excuse me if this is a simple
problem. I have also posted in the corresponding Red Hat newsgroup.

Tut, tut. Please cross-post in these
situations rather than multi-posting.

[ Imagines someone typing the same or similar
words to the Red Hat group as I type this. ]
 
Stijn Van Vreckem

Christoph,

The FileReader is using the platform's default character encoding.
With an InputStreamReader chained to a FileInputStream you can change the
encoding.
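
As a minimal sketch of that chaining (the class name ExplicitEncodingRead and
the helper readAll are made up for illustration, and "ISO-8859-1" is only an
assumption about how umlaut.txt was saved), something like this should work
on Java 1.4:

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;

class ExplicitEncodingRead {
    // Reads a whole text file, decoding its bytes with the named charset
    // instead of the platform default that FileReader would use.
    static String readAll(String path, String charsetName) throws IOException {
        BufferedReader in = new BufferedReader(
                new InputStreamReader(new FileInputStream(path), charsetName));
        try {
            StringBuffer sb = new StringBuffer();
            String line;
            while ((line = in.readLine()) != null) {
                sb.append(line).append('\n');
            }
            return sb.toString();
        } finally {
            in.close();
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumption: umlaut.txt was saved as ISO-8859-1 (Latin-1).
        System.out.println(readAll("umlaut.txt", "ISO-8859-1"));
    }
}
```

Whether the umlauts then show up correctly on the console still depends on
the terminal's own encoding, since System.out encodes with the platform
default.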

Regards,
Stijn
 
Zsolt

Hi,

I spent a couple of hours on this issue. You probably won't believe me,
but my solution was:

export LANG=en_US

in the script I use to start Java.
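
To see what the JVM actually picked up from LANG, a small check like the
following may help. It is only a sketch: the class name ShowDefaultEncoding is
made up, and Charset.defaultCharset() needs Java 5 or later (on 1.4.2 only the
file.encoding property line applies).

```java
import java.nio.charset.Charset;

class ShowDefaultEncoding {
    // Name of the charset the JVM uses when no encoding is given explicitly,
    // for example by FileReader and FileWriter.
    static String defaultEncoding() {
        return Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println("file.encoding   = " + System.getProperty("file.encoding"));
        System.out.println("default charset = " + defaultEncoding());
    }
}
```

Running it once with and once without the export line in the start script
should show whether the locale setting changes the default.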

Zsolt
 
Christoph Breidert

Hi,

Thanks for all the good replies. I have figured it out by now.

The problem was: my Red Hat Linux ran under UTF-8. I developed my
application on a Windows machine, saving files in ANSI. Running my
application on the Linux machine garbled the characters.

Solution: you want the same encoding on both machines. First I saved
the files as UTF-8 on the Windows system; running the application
on the Linux machine then displayed the characters correctly. Then I
decided that I would rather have Linux run under ISO Latin-1 instead of
UTF-8 (because I did not want to convert all files to UTF-8).
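
The mismatch can be reproduced without any files. This is just a sketch: the
class name EncodingMismatchDemo is made up, it assumes the Windows "ANSI" bytes
for these umlauts are the same as ISO-8859-1, and StandardCharsets needs
Java 7 or later.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingMismatchDemo {
    // Unicode escapes keep the source file itself independent of its encoding.
    static final String UMLAUTS = "\u00f6\u00e4\u00fc"; // o-umlaut, a-umlaut, u-umlaut

    // Encodes the umlauts as ISO-8859-1 (one byte each), then decodes
    // those bytes with the given charset.
    static String roundTrip(Charset decodeWith) {
        byte[] latin1 = UMLAUTS.getBytes(StandardCharsets.ISO_8859_1);
        return new String(latin1, decodeWith);
    }

    public static void main(String[] args) {
        // Same encoding on both sides: the text survives.
        System.out.println(roundTrip(StandardCharsets.ISO_8859_1).equals(UMLAUTS)); // true
        // Latin-1 bytes read as UTF-8: the umlauts are lost.
        System.out.println(roundTrip(StandardCharsets.UTF_8).equals(UMLAUTS));      // false
    }
}
```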

The way to change this encoding is to update the file
/etc/sysconfig/i18n:

LANG="de_DE.UTF-8"
SUPPORTED="de_DE.UTF-8:de_DE:de"
=>
LANG="de_DE.ISO-8859-1"
SUPPORTED="de_DE.ISO-8859-1:de_DE:de"

I guess this is obvious to many out there, but it took me some time to
figure it out.

Cheers,

Christoph
 
Chris Smith

Christoph,

Glad to hear you solved your problem. I have one comment, though, that
will hopefully help you out.

Christoph said:
The way to change this encoding is to update the file
/etc/sysconfig/i18n:

LANG="de_DE.UTF-8"
SUPPORTED="de_DE.UTF-8:de_DE:de"
=>
LANG="de_DE.ISO-8859-1"
SUPPORTED="de_DE.ISO-8859-1:de_DE:de"

That may be the way to change the default encoding used by your
platform. However, it's really best not to rely on that when you're
trying to get several pieces of software to interoperate. Rather,
specify an encoding and use it explicitly in your code.

In Java, encodings can be specified in all places where bytes are
converted to characters - notably, InputStreamReader and
OutputStreamWriter have constructors that take an encoding; and String
has getBytes(encoding) and a constructor that receives a byte[] and an
encoding. Uses of FileReader or FileWriter should be replaced by a
combination of FileInputStream with InputStreamReader, or
FileOutputStream and OutputStreamWriter, respectively.
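
A rough sketch of that replacement, applied to the copy program from the
original post, might look like this. The class name ExplicitEncodingCopy and
the method transcode are made up for illustration, and the charset names are
assumptions about how the files are meant to be stored.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;

class ExplicitEncodingCopy {
    // Copies a text file line by line, decoding with inCharset and
    // encoding with outCharset; the platform default is never consulted.
    static void transcode(String from, String inCharset,
                          String to, String outCharset) throws IOException {
        BufferedReader in = new BufferedReader(
                new InputStreamReader(new FileInputStream(from), inCharset));
        try {
            BufferedWriter out = new BufferedWriter(
                    new OutputStreamWriter(new FileOutputStream(to), outCharset));
            try {
                String line;
                while ((line = in.readLine()) != null) {
                    out.write(line);
                    out.write('\n');
                }
            } finally {
                out.close(); // close() also flushes the buffered data
            }
        } finally {
            in.close();
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumption: umlaut.txt is ISO-8859-1 and testfile should be UTF-8.
        transcode("umlaut.txt", "ISO-8859-1", "testfile", "UTF-8");

        // String.getBytes(encoding): the same character, different byte counts.
        System.out.println("\u00f6".getBytes("ISO-8859-1").length); // 1
        System.out.println("\u00f6".getBytes("UTF-8").length);      // 2
    }
}
```

Passing the same charset name for input and output gives a plain copy that
behaves identically on Windows and Linux, whatever LANG is set to.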

This approach means you no longer depend on two machines happening to
have compatible system configurations whenever you try to integrate
software on them. It's a fundamental rule of compatible software that
published formats and protocols should specify their character encoding,
not get it from outside.

--
www.designacourse.com
The Easiest Way to Train Anyone... Anywhere.

Chris Smith - Lead Software Developer/Technical Trainer
MindIQ Corporation
 
