"Mangled" Servlet Unicode Output Characters


Wolfgang

I have a very simple servlet; see the code at

http://www.alexandria.ucsb.edu/~rnott/tmp/Test.java

The servlet reads lines from a file tmp1.txt and just writes them back
to a Web page.

The lines in tmp1.txt contain UTF-8 encoded text, including some
special characters, such as the Norwegian 'ø' as in Magerøya, or the
German 'ö' as in Sömmerda.

The servlet generates the Web page ok, listing the lines from file
tmp1.txt.

However, the special characters like 'ø' and 'ö' don't show up on the
Web page. Instead they are mangled: 'ö', for example, comes out as a
sequence of other characters (other, regular characters are fine).

Why are the special characters mangled, and what do I do to have them
show up properly on the Web page?

Thanks for your help and advice.

Wolfgang,
Santa Barbara, CA
 

John C. Bollinger

These are typical symptoms of a character encoding mismatch. I see in
your source code that you use the system's default encoding to read the
text file. If the system default is not UTF-8, that will be a problem.
You almost have it right in that regard: just pass the string "UTF-8" as
an additional parameter to your InputStreamReader's constructor. (You
will also have to handle an additional checked exception,
UnsupportedEncodingException.)
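A minimal sketch of the reading side, assuming the lines come from a plain InputStream. The original Test.java is only linked, so the names below (Utf8LineReader, readLines, misdecode) are hypothetical. readLines names the charset in the InputStreamReader constructor; misdecode reproduces the symptom by decoding UTF-8 bytes as ISO-8859-1, which stands in here for a non-UTF-8 platform default.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UnsupportedEncodingException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class; not taken from the original Test.java.
class Utf8LineReader {

    // Reads every line from the stream, decoding the bytes as UTF-8
    // rather than with the platform default encoding.
    static List<String> readLines(InputStream in) throws IOException {
        List<String> lines = new ArrayList<String>();
        // The (InputStream, String) constructor throws the checked
        // UnsupportedEncodingException, a subclass of IOException.
        BufferedReader reader =
            new BufferedReader(new InputStreamReader(in, "UTF-8"));
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } finally {
            reader.close();
        }
        return lines;
    }

    // Reproduces the bug: encode as UTF-8, then decode those bytes as
    // ISO-8859-1 (assumed here to stand in for a non-UTF-8 default).
    static String misdecode(String s) throws UnsupportedEncodingException {
        return new String(s.getBytes("UTF-8"), "ISO-8859-1");
    }

    public static void main(String[] args) throws IOException {
        // Assumption: tmp1.txt is in the current working directory.
        // Note: System.out uses the platform default encoding, so the
        // console itself may still display odd characters.
        for (String line : readLines(new FileInputStream("tmp1.txt"))) {
            System.out.println(line);
        }
    }
}
```

With this helper, misdecode("\u00f6") returns "\u00c3\u00b6" ("Ã¶"): each byte of the two-byte UTF-8 sequence for 'ö' becomes a separate Latin-1 character, which is the kind of mangling described in the question.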

Your output HTML is also a bit funky: you declare an XML document but
use the HTML 4 Transitional DTD, and Transitional HTML 4 is not
necessarily well-formed XML. You should probably either drop the XML
declaration or (if you can) go all the way to XHTML. This is probably
not the cause of your current problem, but either way, I recommend that
you specify the charset in the response's content type rather than
relying on the XML declaration. To do so, call

response.setContentType("text/html; charset=UTF-8");

before you call response.getWriter(). The writer you get back then
encodes its output as UTF-8, and the Content-Type header tells the
browser which encoding to use when decoding the page.
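On the output side, the servlet container does the encoding once the charset is set. As a stand-alone illustration (not servlet code; ResponseEncodingDemo and encode are hypothetical names), the sketch below pushes text through an OutputStreamWriter with an explicit charset, which is roughly what the response writer does after setContentType("text/html; charset=UTF-8"). The byte counts show why the declared charset matters: 'ö' takes two bytes in UTF-8 but one in ISO-8859-1, so a browser that decodes with the wrong charset sees different characters.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;

// Hypothetical demo class; it imitates a response writer, it is not one.
class ResponseEncodingDemo {

    // Encodes text with the given charset, the way a writer configured
    // with that charset turns characters into bytes for the client.
    static byte[] encode(String text, String charset) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        PrintWriter out = new PrintWriter(new OutputStreamWriter(bytes, charset));
        out.print(text);
        out.close(); // flushes buffered characters into the byte array
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        String word = "S\u00f6mmerda"; // "Sömmerda", 8 characters
        System.out.println(encode(word, "UTF-8").length);      // 9
        System.out.println(encode(word, "ISO-8859-1").length); // 8
    }
}
```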


John Bollinger