Zsolt said:
Hi Marco,
the question marks show up when I print the contents (thus not in the input)
I read.
Not necessarily.
It could very well be that you read in the character correctly, but are
displaying it via a pipeline that does the changing.
If you walk the string in the middle (between reading and displaying) and
dump each of its characters (not bytes, characters) as hex, then you might
be able to pinpoint the problem.
Make sure you use the correct encoding.
in = new InputStreamReader(sin, encoding);
}
// log.println("Encoding: <" + in.getEncoding() + ">");
[SNIP]
for (int bytesRead = 0; (bytesRead = in.read(buf)) != -1; )
OK. Since you're reading chars (16-bit unsigned values) and not bytes
(8-bit signed values), 'bytesRead' is a misleading name; something like
'charsRead' would say what the loop actually counts.
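For what it's worth, here is a minimal sketch of that read loop with the counter renamed and the encoding passed explicitly. The class name ReadChars, the helper readAll, and the "UTF-8" test input are my own assumptions for illustration, not taken from your code; use whatever encoding your input really has.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;

public class ReadChars {
    // Hypothetical helper: reads the whole stream as text, decoding the
    // bytes with the named encoding.
    static String readAll(InputStream sin, String encoding) throws IOException {
        StringWriter out = new StringWriter();
        try (Reader in = new InputStreamReader(sin, encoding)) {
            char[] buf = new char[1024];
            // read(char[]) returns the number of chars read, or -1 at end of stream.
            for (int charsRead; (charsRead = in.read(buf)) != -1; ) {
                out.write(buf, 0, charsRead);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        // Assumed test input: five characters that take seven bytes in UTF-8.
        byte[] utf8 = "Gr\u00fc\u00dfe".getBytes(StandardCharsets.UTF_8);
        String text = readAll(new ByteArrayInputStream(utf8), "UTF-8");
        System.out.println(utf8.length + " bytes, " + text.length() + " chars");
    }
}
```

The byte count and the char count differ here, which is exactly why the name of the counter matters.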
String cont = out.toString().trim();
out.close();
log.println("Read: <" + cont + ">");
Bingo!!!!
Check what log.println() does. It could be the culprit mangling characters.
Instead use a small test file and add this with each read in char:
char c;
c = ???;
....
System.out.println( "The char '" + c + "' is \\u" + Integer.toHexString( c & 0x0ffff ) );
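If it helps, the same idea as a complete little program might look like the sketch below. The class name CharDump, the helper escape, and the sample string are made up for the example; the zero-padding to four hex digits is my addition so the output lines up with the usual escape notation.

```java
public class CharDump {
    // Hypothetical helper: formats one char as a backslash-u escape,
    // zero-padded to four hex digits.
    static String escape(char c) {
        return String.format("\\u%04x", (int) c);
    }

    public static void main(String[] args) {
        // Assumed sample: 'a', sharp s, u with umlaut.
        String s = "a\u00df\u00fc";
        for (int i = 0; i < s.length(); i++) {
            // Only the escape is printed, so the output stays plain ASCII
            // and cannot itself be mangled by the console.
            System.out.println("char " + i + " is " + escape(s.charAt(i)));
        }
    }
}
```

Printing only the hex value (and not the character itself) keeps the diagnostic independent of whatever the output side does to non-ASCII characters.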
Oh, and try to send something to your log like this:
log.println("A test of [\u00df] and [\u00fc].");
You should see
A test of [ß] and [ü].
(The sharp s that looks like a capital 'B', and a 'u' with umlaut.)
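To illustrate how an output path can produce question marks even though the string in memory is fine, here is a rough sketch. It does not use your log object (I don't know what it does internally); it simply encodes the test string in a charset and decodes it again, which is roughly what a writer with the wrong encoding would do. The names OutputMangling and roundTrip are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class OutputMangling {
    // Hypothetical helper: encodes the text in the given charset and decodes
    // it again, standing in for an output pipeline that uses that charset.
    static String roundTrip(String text, Charset cs) {
        // getBytes(Charset) replaces unmappable characters with the
        // charset's replacement bytes, which is '?' for US-ASCII.
        return new String(text.getBytes(cs), cs);
    }

    public static void main(String[] args) {
        String test = "A test of [\u00df] and [\u00fc].";
        // Through US-ASCII the two special characters cannot be represented.
        System.out.println(roundTrip(test, StandardCharsets.US_ASCII));
        // Through UTF-8 the string survives unchanged.
        System.out.println(roundTrip(test, StandardCharsets.UTF_8).equals(test));
    }
}
```

If your log shows [?] for the test line, the replacement is happening on the way out, not while reading.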