rich.manalang
I'm writing a servlet filter that manipulates the HTTP response body
(injecting HTML). It works fine with English pages, but when processing
a page with double-byte characters, some of the characters come out as
junk.
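One common way to get exactly this symptom (English fine, multi-byte text garbled) is decoding bytes with a different charset than the one used to encode them. Below is a minimal demo, assuming only the two charsets every JVM supports (UTF-8 and ISO-8859-1); the class and method names are made up for illustration. ASCII bytes are identical in both charsets, which would explain why English pages survive.

```java
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    // Encodes text as UTF-8, then (wrongly) decodes those bytes as ISO-8859-1.
    static String misdecode(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // ASCII bytes are the same in both charsets, so English text survives.
        System.out.println(misdecode("Hello").equals("Hello")); // true
        // Each of these two CJK characters is 3 bytes in UTF-8; ISO-8859-1 maps
        // every byte to its own char, so 2 characters become 6 junk characters.
        System.out.println(misdecode("\u65e5\u672c").length()); // 6
    }
}
```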
When processing the OutputStream, I create a ByteArrayOutputStream:
ByteArrayOutputStream baStream = new ByteArrayOutputStream();
Then I decode its bytes into a String, forcing UTF-8:
String str = new String(baStream.toByteArray(), "UTF-8");
I then manipulate that string with standard regexes and write it back
to the browser:
outStream.write(str.getBytes());
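Note that `str.getBytes()` with no argument encodes with the JVM's default charset, which is platform-dependent on older JDKs (JDK 18 and later default to UTF-8). If that default isn't the charset the string was decoded with, the bytes going out won't match the bytes coming in. A minimal sketch of doing both steps with one explicit charset; `rewrite` is a hypothetical helper, not part of any API:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class SymmetricRewrite {
    // Decode, edit, and re-encode using the SAME charset on both sides.
    static byte[] rewrite(byte[] body, Charset charset, String regex, String replacement) {
        String text = new String(body, charset);             // bytes -> String
        String edited = text.replaceAll(regex, replacement);  // regex edit on chars, not bytes
        return edited.getBytes(charset);                      // String -> bytes, same charset
    }

    public static void main(String[] args) {
        byte[] in = "<body>\u65e5\u672c</body>".getBytes(StandardCharsets.UTF_8);
        byte[] out = rewrite(in, StandardCharsets.UTF_8, "</body>", "<p>injected</p></body>");
        // Compare against the expected text instead of printing CJK to the console.
        System.out.println(new String(out, StandardCharsets.UTF_8)
                .equals("<body>\u65e5\u672c<p>injected</p></body>")); // true
    }
}
```

If the injected HTML can contain `$` or `\`, pass it through `java.util.regex.Matcher.quoteReplacement` first, since `replaceAll` treats those characters specially in the replacement string.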
The problem is I don't know a lot about how charsets work in Java. I
do know that Java strings are UTF-16 internally, but beyond that, I'm
not sure how to guarantee that what comes into my servlet filter is
what goes out.
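Forcing UTF-8 on input is only safe if the page really is UTF-8; a "double-byte" page may well be in a legacy encoding such as Shift_JIS or Big5. Inside a filter, the declared encoding would typically come from `response.getCharacterEncoding()` once `chain.doFilter(...)` returns, and the servlet API reports ISO-8859-1 when none was set. The stdlib-only sketch below stands in for that by parsing the `charset` parameter of a Content-Type string; `charsetOf` and `roundTrips` are hypothetical names. The round-trip check matters because decoding is lossy when the bytes aren't valid in the chosen charset (invalid UTF-8 turns into U+FFFD).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Locale;

public class CharsetRoundTrip {
    // Extracts the charset parameter from a value like "text/html; charset=UTF-8".
    // Falls back to ISO-8859-1, mirroring the servlet API's default.
    static Charset charsetOf(String contentType) {
        if (contentType != null) {
            for (String param : contentType.split(";")) {
                String p = param.trim();
                if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                    String name = p.substring("charset=".length()).replace("\"", "").trim();
                    try {
                        return Charset.forName(name);
                    } catch (IllegalArgumentException e) {
                        // Illegal or unsupported charset name: use the default below.
                    }
                }
            }
        }
        return StandardCharsets.ISO_8859_1;
    }

    // True if decoding then re-encoding with the same charset reproduces the input bytes.
    static boolean roundTrips(byte[] body, Charset cs) {
        return Arrays.equals(body, new String(body, cs).getBytes(cs));
    }

    public static void main(String[] args) {
        Charset cs = charsetOf("text/html; charset=UTF-8");
        byte[] body = "\u65e5\u672c".getBytes(StandardCharsets.UTF_8);
        System.out.println(cs.name() + " " + roundTrips(body, cs)); // UTF-8 true
    }
}
```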
Thanks in advance!
Rich