Need help with String encoding issue

rich.manalang

I'm writing a servlet filter that manipulates the HTTP response body
(injecting HTML). It works fine with English pages, but when it
processes a page with double-byte characters, some of those
characters come out as junk.

When processing the OutputStream, I create a ByteArrayOutputStream:

ByteArrayOutputStream baStream = new ByteArrayOutputStream();

Then I create a String from that stream's bytes, decoding them as UTF-8:

String str = new String(baStream.toByteArray(), "UTF-8");
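A minimal, self-contained sketch of this capture-and-decode step, assuming Java 7+ for StandardCharsets. The ByteArrayOutputStream stands in for the buffer the filter captures, and the class and method names are hypothetical. Passing a Charset object instead of the string "UTF-8" avoids the checked UnsupportedEncodingException. Note that decoding as UTF-8 only yields the right text if the page really was encoded in UTF-8.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CaptureAndDecode {

    // Hypothetical helper: decode the captured response bytes with an explicit charset.
    static String decode(ByteArrayOutputStream baStream, Charset cs) {
        return new String(baStream.toByteArray(), cs);
    }

    public static void main(String[] args) {
        ByteArrayOutputStream baStream = new ByteArrayOutputStream();
        // Simulate the servlet writing a UTF-8 page into the captured buffer.
        // The escapes are two CJK characters, so the source file's own encoding does not matter.
        byte[] page = "<p>\u65E5\u672C</p>".getBytes(StandardCharsets.UTF_8);
        baStream.write(page, 0, page.length);

        String str = decode(baStream, StandardCharsets.UTF_8);
        System.out.println(str.length()); // 9 chars: "<p>" + 2 CJK chars + "</p>"
    }
}
```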

I then manipulate that string with standard regexes and write it back
out to the browser:

outStream.write(str.getBytes());
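This line is where the text gets damaged. String.getBytes() with no argument encodes with the platform's default charset, which need not be UTF-8; before Java 18 made UTF-8 the default, it was often a single-byte charset such as windows-1252 or ISO-8859-1. The sketch below simulates that case by encoding explicitly with ISO-8859-1 and then decoding as UTF-8, the way a browser told the page is UTF-8 would. The class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DefaultCharsetPitfall {

    // Encode with one charset, then read the bytes back as UTF-8 (as the browser would).
    static String roundTrip(String original, Charset encodeWith) {
        byte[] out = original.getBytes(encodeWith);
        return new String(out, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // A German word with an umlaut and a sharp s, followed by two CJK characters.
        String str = "Gr\u00FC\u00DFe \u65E5\u672C";
        System.out.println("Platform default: " + Charset.defaultCharset());
        // ISO-8859-1 stands in for a non-UTF-8 platform default: the CJK characters
        // are replaced by '?' and the umlaut bytes are not valid UTF-8.
        System.out.println(roundTrip(str, StandardCharsets.ISO_8859_1).equals(str)); // false
        System.out.println(roundTrip(str, StandardCharsets.UTF_8).equals(str));      // true
    }
}
```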

The problem is I don't know a lot about how charsets work in Java. I
do know that Java stores strings internally as UTF-16, but beyond that,
I'm not sure how to make sure that what comes into my servlet filter is
what goes out.
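The UTF-16 point can be made concrete: a String holds UTF-16 chars, not bytes, so the charset only matters at the two boundaries, when the response bytes are decoded into a String and when the String is encoded back into bytes. The hypothetical snippet below measures the same two CJK characters three ways; note that these "double-byte" characters take three bytes each in UTF-8.

```java
import java.nio.charset.StandardCharsets;

public class CharsVersusBytes {

    static int[] sizes(String s) {
        return new int[] {
            s.length(),                                   // UTF-16 code units held by the String
            s.getBytes(StandardCharsets.UTF_8).length,    // bytes when encoded as UTF-8
            s.getBytes(StandardCharsets.UTF_16BE).length  // bytes as UTF-16 (big-endian, no BOM)
        };
    }

    public static void main(String[] args) {
        int[] n = sizes("\u65E5\u672C"); // two CJK characters
        System.out.println(n[0] + " chars, " + n[1] + " UTF-8 bytes, " + n[2] + " UTF-16 bytes");
        // prints: 2 chars, 6 UTF-8 bytes, 4 UTF-16 bytes
    }
}
```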

Thanks in advance!

Rich
Lothar Kimmeringer

> outStream.write(str.getBytes());

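Here you should use str.getBytes("UTF-8"), so the bytes going out are
encoded with the same charset you used to decode the bytes coming in.
Without an argument, getBytes() uses the platform's default charset,
which is not necessarily UTF-8.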
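Putting that fix into the whole filter step: if the bytes are decoded and re-encoded with the same charset, the untouched parts of the page come back byte for byte. A minimal sketch, assuming the page is UTF-8 and that the HTML is injected just before the closing body tag; the method name and the injected markup are hypothetical. StandardCharsets.UTF_8 behaves like getBytes("UTF-8") without the checked UnsupportedEncodingException.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;

public class InjectHtml {

    // Hypothetical filter step: bytes -> String -> regex edit -> bytes, using ONE charset throughout.
    static byte[] inject(byte[] body, Charset cs, String snippet) {
        String page = new String(body, cs);
        // quoteReplacement stops '$' or '\' in the snippet from being read as group references.
        String modified = page.replaceFirst("(?i)</body>",
                Matcher.quoteReplacement(snippet + "</body>"));
        return modified.getBytes(cs); // same charset as the decode above
    }

    public static void main(String[] args) {
        byte[] in = "<html><body>\u65E5\u672C</body></html>".getBytes(StandardCharsets.UTF_8);
        byte[] out = inject(in, StandardCharsets.UTF_8, "<p>injected</p>");
        System.out.println(new String(out, StandardCharsets.UTF_8));
    }
}
```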

Alternatively, use a Writer instead of an OutputStream; you can get
one from the servlet as well. Then you can write the String directly
without having to deal with the encoding yourself.

Or wrap an OutputStreamWriter around your OutputStream, specifying
the encoding you want to use in the constructor.
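A sketch of the OutputStreamWriter variant, with a ByteArrayOutputStream standing in for the servlet's output stream (an assumption made so the example runs on its own; the class and method names are hypothetical). The charset is fixed once, in the constructor, so the code writing the String never mentions it. The Writer returned by the servlet response's getWriter() works on the same principle, encoding with the response's character encoding.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class WriterOutput {

    // Write text to a byte stream through a Writer whose charset is set in the constructor.
    static void writeText(OutputStream outStream, String text, Charset cs) {
        try {
            Writer writer = new OutputStreamWriter(outStream, cs);
            writer.write(text);
            // OutputStreamWriter buffers internally, so flush or the bytes may never reach outStream.
            // flush() rather than close() leaves the underlying stream open.
            writer.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        ByteArrayOutputStream outStream = new ByteArrayOutputStream();
        writeText(outStream, "\u65E5\u672C", StandardCharsets.UTF_8);
        System.out.println(outStream.size()); // 6: each CJK character is 3 bytes in UTF-8
    }
}
```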


Regards, Lothar
--
Lothar Kimmeringer E-Mail: (e-mail address removed)
PGP-encrypted mails preferred (Key-ID: 0x8BC3CD81)

Always remember: The answer is forty-two, there can only be wrong
questions!