Cannot read German characters via FileInputStream

Zsolt

Hi,

German characters are replaced with question marks when I read a
file using FileInputStream, unless I set the LANG environment variable to
"en_US" on Linux.

export LANG=en_US

How can I fix the problem without setting the LANG environment variable?

Zsolt
 
S Manohar

Have you tried using an InputStreamReader?

InputStreamReader isr = new InputStreamReader(myInputStream,
        Charset.forName("US-ASCII"));

replacing the charset name with the one used by the file.

You can create a BufferedReader from this to help with reading whole lines.
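A minimal sketch of that suggestion, assuming the file is encoded as ISO-8859-1 (the encoding Zsolt mentions later in the thread). The class name ReadWithCharset and the helper readAll are made up for illustration, and an in-memory byte array stands in for the real file so the example runs on its own:

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of the original post.
class ReadWithCharset {

    // Decodes the stream with the given charset and returns its lines joined by '\n'.
    static String readAll(InputStream in, Charset charset) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(in, charset))) {
            String line;
            boolean first = true;
            while ((line = reader.readLine()) != null) {
                if (!first) {
                    sb.append('\n');
                }
                sb.append(line);
                first = false;
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // "Gruesse" with real umlauts, as ISO-8859-1 bytes: u-umlaut is 0xFC, sharp s is 0xDF.
        byte[] latin1 = { 'G', 'r', (byte) 0xFC, (byte) 0xDF, 'e' };
        String text = readAll(new ByteArrayInputStream(latin1),
                              StandardCharsets.ISO_8859_1);
        // One char per input byte, because ISO-8859-1 is a single-byte encoding.
        System.out.println(text.length());
    }
}
```

For a real file, a FileInputStream would be passed in place of the ByteArrayInputStream. The point is that the charset is named explicitly, so decoding no longer depends on LANG.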

HTH
 
Tor Iver Wilhelmsen

Zsolt said:
Yes, I have tried that but it didn't help either.

Try setting default file encoding, e.g.

java -Dfile.encoding=iso8859-1 MyClass
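To check what the JVM actually picked up, a small sketch like the following can be run with and without the -D option (and with and without LANG set). The class name ShowDefaultCharset is hypothetical. How the file.encoding property is honored varies between JDK versions; from JDK 18 on, for example, the default charset is UTF-8 regardless of the locale.

```java
import java.nio.charset.Charset;

// Hypothetical diagnostic class, not part of the original post.
class ShowDefaultCharset {

    // Reports the file.encoding property and the charset the JVM uses by default.
    static String describe() {
        return "file.encoding=" + System.getProperty("file.encoding")
             + ", defaultCharset=" + Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

If the default charset differs between the two runs, any Reader, Writer, or PrintStream created without an explicit encoding will behave differently too.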
 
Marco Schmidt

Zsolt:
the German characters will be replaced with question marks when I read a
file using FileInputStream unless I set the LANG environment to "en_US" on
Linux.

export LANG=en_US

How can I fix the problem without setting the LANG environment variable ?

Where do the question marks show up? In the file, or when you print
them to standard output? In the latter case, the console may not have
the correct encoding type.

Besides, you should use a Reader instead of an InputStream. After all
you are reading characters. Make sure the encoding is set correctly
(so that it matches the file).
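To illustrate how the output side alone can produce question marks, here is a sketch that prints the same string through two PrintStreams with different encodings and inspects the bytes written. The class name ConsoleEncodingDemo and the helper printWith are made up for this example. It assumes, without the thread confirming it, that the console stream falls back to an ASCII-like default when LANG is unset:

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

// Hypothetical demo class, not part of the original post.
class ConsoleEncodingDemo {

    // Prints the text through a PrintStream that uses the named encoding
    // and returns the raw bytes that reached the underlying stream.
    static byte[] printWith(String text, String encoding)
            throws UnsupportedEncodingException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        PrintStream out = new PrintStream(sink, true, encoding);
        out.print(text);
        out.flush();
        return sink.toByteArray();
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        String text = "\u00fc\u00df"; // u-umlaut and sharp s

        // US-ASCII cannot represent either character, so the encoder substitutes '?'.
        byte[] ascii = printWith(text, "US-ASCII");
        System.out.println((char) ascii[0]);                        // prints ?

        // ISO-8859-1 can represent both, one byte each.
        byte[] latin1 = printWith(text, "ISO-8859-1");
        System.out.println(Integer.toHexString(latin1[0] & 0xff));  // prints fc
    }
}
```

The string is identical in both cases; only the encoding of the output stream differs. Wrapping System.out in a PrintStream with an explicit encoding that matches the terminal is one way to take the locale out of the picture.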

Regards,
Marco
 
Zsolt

Hi Marco,

the question marks show up when I print the contents I have read (so they
are not in the input file itself).

I have also tried to use FileReader but got the same result.

Here is my code:

private static String encoding = "ISO-8859-1";

private static String readStream(InputStream sin)
        throws IOException
{
    InputStreamReader in = null;
    if (encoding == null || encoding.length() == 0)
    {
        in = new InputStreamReader(sin);
    }
    else
    {
        in = new InputStreamReader(sin, encoding);
    }
    // log.println("Encoding: <" + in.getEncoding() + ">");
    StringWriter out = new StringWriter();
    try
    {
        char[] buf = new char[8 * 1024];
        for (int bytesRead = 0; (bytesRead = in.read(buf)) != -1; )
        {
            out.write(buf, 0, bytesRead);
        }
    }
    finally
    {
        in.close();
    }
    String cont = out.toString().trim();
    out.close();
    log.println("Read: <" + cont + ">");
    return cont;
}
 
Jon A. Cruz

Zsolt said:
Hi Marco,

the question marks show up when I print the contents (thus not in the input)
I read.


Not necessarily.

It could very well be that you read in the character correctly, but are
displaying it via a pipeline that does the changing.

If you walk the string in the middle and dump each of its characters
(not bytes, characters) as hex, then you might be able to pinpoint problems.



Make sure you use the correct encoding.
in = new InputStreamReader(sin, encoding);

}

// log.println("Encoding: <" + in.getEncoding() + ">");
[SNIP]


for (int bytesRead = 0; (bytesRead = in.read(buf)) != -1; )

OK. Since you're reading chars (16-bit unsigned values) and not bytes
(8-bit signed values), 'bytesRead' is a misleading name.

String cont = out.toString().trim();

out.close();

log.println("Read: <" + cont + ">");

Bingo!!!!

Check what log.println() does. It could be the culprit mangling characters.

Instead use a small test file and add this with each read in char:

char c;
c = ???;

....

System.out.println( "The char '" + c + "' is \\u" + Integer.toHexString(
c & 0x0ffff) );
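A self-contained version of the same idea might look like this. The class name CharDump and the helper toHex are invented for the example. The output is plain ASCII hex, so it stays readable even when the console encoding is wrong:

```java
// Hypothetical helper class, not part of the original post.
class CharDump {

    // Returns each char of the string as a four-digit hex value, separated by spaces.
    static String toHex(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%04x", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Sharp s followed by u-umlaut, written as escapes so the source file
        // encoding does not matter.
        System.out.println(toHex("\u00df\u00fc")); // prints 00df 00fc
    }
}
```

If the dump of the string you read shows 00df and 00fc where the sharp s and u-umlaut should be, the read side is fine and the damage happens on output. If it shows 003f (the question mark) or fffd (the Unicode replacement character), the characters were already lost while decoding.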

Oh, and try to send something to your log like this:

log.println("A test of [\u00df] and [\u00fc].");

You should see

A test of [ß] and [ü].

(The sharp s, which looks like a capital 'B', and a 'u' with umlaut.)
 
