Redirecting System.out and exotic characters

François R

I redirect System.out to a JTextArea with the following class:

private class TextAreaOutputStream extends OutputStream {
    JTextArea textArea;

    TextAreaOutputStream(JTextArea textArea) {
        this.textArea = textArea;
    }

    public void flush() {
        textArea.repaint();
    }

    public void write(int b) {
        //try {
        textArea.append(new String(new byte[] {(byte)b}));
        // } catch (UnsupportedEncodingException e){e.printStackTrace();}
    }
}

and I use the class with:
    JTextArea msg = new JTextArea();
    System.setOut(new PrintStream(new TextAreaOutputStream(msg), true));

This works well except when I have a character like Č (latin capital
letter C with caron, '\u010C') in a string, which is displayed as ? in
the text area whereas
msg.append(string); would be ok.
How could I correct the code above so that such a letter is displayed
correctly?

Thanks
François
 
Mayeul

François R said:
This works well except when I have a character like Č (latin capital
letter C with caron, '\u010C') in a string, which is displayed as ? in
the text area whereas
msg.append(string); would be ok.
How could I correct the code above to have such a letter well
formed ?

You have a character encoding problem.

Both the constructors PrintStream(OutputStream,boolean) and
String(byte[]) assume you're using your platform's default character
encoding to translate chars to bytes and vice-versa.

I expect your platform's default character encoding does _not_ handle
characters such as U+010C, hence they are replaced with question marks.
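To make the effect visible, here is a minimal sketch. It assumes ISO-8859-1 as a stand-in for a platform default that cannot represent Č; the class and method names are made up for illustration. It prints the same character through two PrintStreams and returns the bytes that actually reach the underlying stream.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.util.Arrays;

class PrintStreamEncodingDemo {
    // Prints text through a PrintStream that uses the named encoding and
    // returns the raw bytes that arrived at the underlying OutputStream.
    static byte[] printedBytes(String text, String encoding) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try {
            PrintStream out = new PrintStream(sink, true, encoding);
            out.print(text);
            out.flush();
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        String caron = "\u010C"; // Č
        // ISO-8859-1 cannot encode U+010C, so the encoder substitutes '?' (byte 63).
        System.out.println(Arrays.toString(printedBytes(caron, "ISO-8859-1")));
        // UTF-8 encodes U+010C as the two bytes 0xC4 0x8C.
        System.out.println(Arrays.toString(printedBytes(caron, "UTF-8")));
    }
}
```

In the first case the character is already lost before TextAreaOutputStream sees any byte, so nothing done inside write(int) could bring it back.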

The fix is to specify the character encoding explicitly, and to pick a
Unicode one, for instance UTF-8.


You can do that by constructing your PrintStream this way:

new PrintStream(new TextAreaOutputStream(msg), true, "utf-8")

You should also implement your TextAreaOutputStream differently: it should
store the bytes in a buffer and wait until the OutputStream is flushed, at
which point the data is probably aligned after a character's final byte,
then decode the buffered bytes into a String and append it to the text area.

This could be done by writing the bytes you receive to a
ByteArrayOutputStream, and whenever it is flushed, fetch the byte[] and
build a String with it as such:

new String(bytes, "utf-8")
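Why buffering matters can be shown with a small sketch (class and method names are hypothetical). It decodes the two UTF-8 bytes of Č once byte by byte, the way the original write(int) worked but with UTF-8 made explicit, and once after collecting them in a ByteArrayOutputStream.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

class ByteWiseDecodingDemo {
    // Decodes each byte in isolation; a multi-byte sequence is never seen whole.
    static String decodeByteByByte(byte[] bytes) {
        StringBuilder result = new StringBuilder();
        for (byte b : bytes) {
            result.append(new String(new byte[] {b}, StandardCharsets.UTF_8));
        }
        return result.toString();
    }

    // Collects all bytes first, then decodes them in one go.
    static String decodeBuffered(byte[] bytes) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        for (byte b : bytes) {
            buffer.write(b);
        }
        return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] utf8 = "\u010C".getBytes(StandardCharsets.UTF_8); // 0xC4 0x8C
        // Each lone byte is malformed UTF-8 and becomes U+FFFD.
        System.out.println(decodeByteByByte(utf8).equals("\uFFFD\uFFFD")); // true
        // The buffered bytes decode back to the original character.
        System.out.println(decodeBuffered(utf8).equals("\u010C"));         // true
    }
}
```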


Note: one might think that using UTF-16 instead of UTF-8 would guarantee
that every character takes 2 bytes, making the solution easier to implement.
However, *really* special characters (those above U+FFFF) still take
4 bytes instead of 2 in UTF-16.
UCS-4 may work better if it is well supported, I'm not sure.
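A quick way to check the UTF-16 claim is the following sketch (the class name is made up, and U+1F600 is just an arbitrary example of a character above U+FFFF). UTF-16BE is used so that no byte order mark is added to the count.

```java
import java.nio.charset.StandardCharsets;

class Utf16WidthDemo {
    // Number of bytes the string occupies in UTF-16 (big endian, no BOM).
    static int utf16Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE).length;
    }

    public static void main(String[] args) {
        String caron = "\u010C";              // U+010C, inside the 16-bit range
        String beyondBmp = "\uD83D\uDE00";    // U+1F600, stored as a surrogate pair
        System.out.println(utf16Bytes(caron));      // 2
        System.out.println(utf16Bytes(beyondBmp));  // 4
        // Both strings hold exactly one code point.
        System.out.println(beyondBmp.codePointCount(0, beyondBmp.length())); // 1
    }
}
```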
 
Roedy Green

This works well except when I have a character like Č (latin capital
letter C with caron, '\u010C') in a string, which is displayed as ? in
the text area whereas
msg.append(string); would be ok.
How could I

The way I would do it is direct the output to a file using UTF-8
encoding, or at least an encoding that supports the letters you need.
Then view it in some sort of viewer/editor that understands encodings.

See http://mindprod.com/applet/fileio.html
for the code to set up a PrintWriter to a file.
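For reference, one possible way to set up such a writer is sketched below. This is not the code from the linked page; it relies only on standard java.nio.file calls, and the class name, method name and temporary file are made up for the example.

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class Utf8FileLogDemo {
    // Writes one line to a temporary file as UTF-8, then reads it back as UTF-8.
    static String writeAndReadBack(String line) {
        try {
            Path file = Files.createTempFile("out", ".txt");
            try {
                try (PrintWriter out = new PrintWriter(
                        Files.newBufferedWriter(file, StandardCharsets.UTF_8))) {
                    out.println(line);
                }
                byte[] bytes = Files.readAllBytes(file);
                // trim() drops the line separator added by println.
                return new String(bytes, StandardCharsets.UTF_8).trim();
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String name = "\u010C\u00ED\u017Eek"; // Čížek
        System.out.println(writeAndReadBack(name).equals(name)); // true
    }
}
```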
 
François R

Thanks a lot for the suggestion!
I tried this:
    try {
        System.setOut(new PrintStream(new TextAreaOutputStream(msg), true, "utf-8"));
    } catch ....

and

private class TextAreaOutputStream extends OutputStream {
    JTextArea textArea;
    // Collects raw bytes until flush() so multi-byte characters are decoded whole.
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();

    TextAreaOutputStream(JTextArea textArea) {
        this.textArea = textArea;
    }

    public void flush() {
        try {
            textArea.append(buffer.toString("utf-8"));
            buffer.reset();
        } catch (UnsupportedEncodingException e) {
            e.printStackTrace();
        }
    }

    public void write(int b) {
        buffer.write(b);
    }
}

And it seems to work well: names like Cížek or Čížek are displayed
properly.
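The same buffering idea can be tried out without a GUI. The sketch below is only a variation under assumptions, not a replacement for the class above: DecodingOutputStream and capture are made-up names, and the text area is replaced by a generic Consumer<String> so the decoding can be checked on its own.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;
import java.util.function.Consumer;

class DecodingOutputStream extends OutputStream {
    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    private final Consumer<String> sink; // receives the decoded text on each flush

    DecodingOutputStream(Consumer<String> sink) {
        this.sink = sink;
    }

    @Override
    public synchronized void write(int b) {
        buffer.write(b);
    }

    @Override
    public synchronized void write(byte[] b, int off, int len) {
        buffer.write(b, off, len);
    }

    @Override
    public synchronized void flush() {
        if (buffer.size() == 0) {
            return; // nothing buffered, nothing to hand over
        }
        String text = new String(buffer.toByteArray(), StandardCharsets.UTF_8);
        buffer.reset();
        sink.accept(text);
    }

    // Helper for the example: prints text through a UTF-8 PrintStream
    // and returns what the sink received.
    static String capture(String text) {
        StringBuilder seen = new StringBuilder();
        try {
            PrintStream out = new PrintStream(new DecodingOutputStream(seen::append), true, "UTF-8");
            out.print(text);
            out.flush();
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
        return seen.toString();
    }

    public static void main(String[] args) {
        String name = "\u010C\u00ED\u017Eek"; // Čížek
        System.out.println(capture(name).equals(name)); // true
    }
}
```

With a JTextArea the sink could be something like text -> SwingUtilities.invokeLater(() -> msg.append(text)), on the assumption that you want the append to run on the event dispatch thread when System.out is used from other threads.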

François
 
