JSP/Servlet: Posting and reading UTF-8 characters

Matthias Krueger

I am currently trying to post and read UTF-8 encoded characters
on my JSP pages in a Tomcat 4.1 environment. Whenever I post
a German umlaut or a Chinese or Arabic character into my form,
the result displayed on my web page is complete garbage. I tried
the usual "tricks" like

- <%@ page contentType="text/html;charset=utf-8" %>
- request.setCharacterEncoding("UTF-8");
- response.setContentType("text/html; charset=utf-8");

with no success. I thought all the problems with different character
encodings would be gone with UTF-8, but I have been stuck here for
more than a day.
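A form POST arrives as percent-escaped bytes, and the container needs a charset to turn those bytes back into characters. request.setCharacterEncoding supplies that charset, and per the Servlet specification it only takes effect if it is called before any parameter is read. The following is a minimal, stdlib-only sketch of what the same bytes become under the right and the wrong charset. The class name and sample body are illustrative assumptions, and java.net.URLDecoder stands in for the container's own parameter parser.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class FormDecodeDemo {
    // Hypothetical POST body for the field "textfeldmb" containing u-umlaut:
    // the browser encodes the character as UTF-8 (C3 BC) and percent-escapes it.
    static final String BODY = "textfeldmb=%C3%BC";

    // Decode the value part with a given charset, roughly what a servlet
    // container does when it parses form parameters.
    static String decodeValue(String body, String charset) {
        String value = body.substring(body.indexOf('=') + 1);
        try {
            return URLDecoder.decode(value, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charset, e);
        }
    }

    public static void main(String[] args) {
        // Correct charset: a single char, U+00FC.
        System.out.println(decodeValue(BODY, "UTF-8").length());      // 1
        // Latin-1: each byte becomes its own char, U+00C3 U+00BC.
        System.out.println(decodeValue(BODY, "ISO-8859-1").length()); // 2
    }
}
```

The two-character result is the classic "garbage" pattern: every non-ASCII character turns into two or three odd Latin-1 characters.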

Here's my current code, a slightly modified version of the
sample JSP found at
http://lists.w3.org/Archives/Public/www-international/2002OctDec/0148.html

PS: From my System.out logs:
Current request encoded with: UTF-8
Current response encoded with: UTF-8

My IE6 on Win2000 is also set to UTF-8.

What is going on here?
:/


Thank you in advance for any suggestions,
Matthias



-------[snip]--------------------------------------------[snap]------

<%@ page
language="java"
import="java.util.Enumeration"
pageEncoding="utf-8"
contentType="text/html;charset=utf-8"
%><%
request.setCharacterEncoding("UTF-8");
response.setContentType("text/html; charset=utf-8");
%>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>

<body bgcolor="white">

<b>Form post</b>
<form method="post" accept-charset="UTF-8">
<input name="textfeldmb" type="text">
<textarea name="textareamb"></textarea>
<input type="submit" value="submit">
</form>
<hr>
<%
System.out.println("Current request encoded with: " +
request.getCharacterEncoding());
System.out.println("Current response encoded with: " +
response.getCharacterEncoding());

Enumeration e = request.getParameterNames();
if(e != null && e.hasMoreElements()) {
%>
<b>Request parameters</b><br>
<br>
The parameters are read after calling request.setCharacterEncoding("UTF-8");
<bR>
<TABLE>
<TR valign=top>
<TH align=left>Parameter:</TH>
<TH align=left>Value:</TH>
</TR>
<%
while(e.hasMoreElements()) {
String k = (String) e.nextElement();
String val = request.getParameter(k);
System.out.println("request parameter " + k + ": " + val);
%>
<TR valign=top>
<TD><%= k %></TD>
<TD><%= val %></TD>
</TR>
<%
}
%>
</TABLE>
<%
}
%>
</body></html>
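The JSP also echoes every parameter to System.out, but that log is a weak witness. System.out encodes with the JVM's platform default charset (typically windows-1252 on a Western European Windows 2000 machine), so characters outside that charset are printed as '?' even when the String itself is correct. A minimal sketch (class and method names are illustrative) that makes a stream's charset explicit and shows the bytes it actually writes:

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

public class LogEncodingDemo {
    // Print text through a PrintStream that uses the given charset and
    // return the raw bytes it wrote.
    static byte[] printedBytes(String text, String charset) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try {
            PrintStream out = new PrintStream(buffer, true, charset);
            out.print(text);
            out.flush();
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charset, e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        String chinese = "\u4e2d"; // a CJK ideograph, not representable in Latin-1
        // A Latin-1 stream cannot encode it and writes '?' (63) instead.
        System.out.println(printedBytes(chinese, "ISO-8859-1")[0]); // 63
        // A UTF-8 stream writes the three-byte sequence E4 B8 AD.
        System.out.println(printedBytes(chinese, "UTF-8").length);  // 3
    }
}
```

So a '?' in the log does not by itself prove that the request parameter was decoded wrongly; checking the page output (or the char values) is more reliable.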
 

Roedy Green

Look at the source of the generated page. What are you seeing
in the header?

You should see something like this:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<link href="../mindprod.css" type="text/css" rel="stylesheet">
<link rev="MADE" href="mailto:[email protected]">
<link rel="icon" href="../images/beanicon.png">
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">

The crucial part is: what does the charset say?
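Under HTML 4.01, the charset parameter of the HTTP Content-Type header takes precedence over a <meta http-equiv> tag, so both the response header and the page source are worth checking. A minimal sketch, using a hypothetical helper charsetOf (not part of any library), of pulling that parameter out of a Content-Type value:

```java
import java.util.Locale;

public class CharsetParam {
    // Hypothetical helper: return the charset parameter of a Content-Type
    // value, or null when none is declared. Deliberately simple: it does not
    // handle spaces around '='.
    static String charsetOf(String contentType) {
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String value = p.substring("charset=".length()).trim();
                // Strip optional quotes, as in charset="utf-8".
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(charsetOf("text/html; charset=iso-8859-1")); // iso-8859-1
        System.out.println(charsetOf("text/html;charset=utf-8"));       // utf-8
        System.out.println(charsetOf("text/html"));                     // null
    }
}
```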
 

Roedy Green

Are you seeing the proper UTF-8 charset there?

If so, what hex sequence is being generated for the u-umlaut (ü)?
Its Java escape is \u00fc; in UTF-8 that should come out as the
two bytes C3 BC.

You also want to be sure the problem is not with your browser.
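To check the byte sequence Roedy mentions, a small sketch (class and method names are illustrative) that dumps a string's bytes in a given charset as hex. If the page source shows FC for ü, it was written as Latin-1; if it shows C3 BC but the browser displays two odd characters, the bytes are UTF-8 and the browser is decoding them as Latin-1.

```java
import java.io.UnsupportedEncodingException;

public class UmlautBytes {
    // Return the bytes of text in the given charset as space-separated hex.
    static String hex(String text, String charset) {
        try {
            StringBuilder sb = new StringBuilder();
            for (byte b : text.getBytes(charset)) {
                if (sb.length() > 0) {
                    sb.append(' ');
                }
                sb.append(String.format("%02X", b & 0xFF));
            }
            return sb.toString();
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charset, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(hex("\u00fc", "UTF-8"));      // C3 BC
        System.out.println(hex("\u00fc", "ISO-8859-1")); // FC
    }
}
```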
 

Matthias Krueger

Just for reference:

I solved the problem by converting my Strings to byte[] and
creating a new String with explicit UTF-8 encoding. I still
do not understand why I need to do this (as
request.getCharacterEncoding() and response.getCharacterEncoding()
both return "UTF-8"), but at least it works now...

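public static final String utf8Convert(String utf8String)
        throws java.io.UnsupportedEncodingException {
    // Keep the low 8 bits of each char. For chars up to U+00FF this is
    // exactly the ISO-8859-1 encoding, so it recovers the raw bytes if
    // they were originally decoded one byte per char.
    byte[] bytes = new byte[utf8String.length()];
    for (int i = 0; i < utf8String.length(); i++) {
        bytes[i] = (byte) utf8String.charAt(i);
    }
    // Decode the recovered bytes as UTF-8.
    return new String(bytes, "UTF-8");
}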

Regards,
Matthias
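Why the workaround helps: casting each char to byte keeps its low 8 bits, which for chars up to U+00FF is exactly the ISO-8859-1 encoding. So utf8Convert only repairs text that the container decoded as ISO-8859-1, which the Servlet specification names as the default when no request charset has been set. One plausible reading is that setCharacterEncoding ran too late (after the parameters had already been parsed) to affect decoding, even though getCharacterEncoding() then reported UTF-8. The sketch below (illustrative names, standard JDK calls only) reproduces that round trip using the equivalent getBytes("ISO-8859-1") / new String(..., "UTF-8") pair:

```java
import java.io.UnsupportedEncodingException;

public class MojibakeRepair {
    // Simulate the container: it received UTF-8 bytes but decoded them
    // as ISO-8859-1, turning each byte into one char.
    static String decodedAsLatin1(String original) {
        try {
            return new String(original.getBytes("UTF-8"), "ISO-8859-1");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // both charsets are always available
        }
    }

    // Same effect as utf8Convert: re-encode each char as its Latin-1 byte,
    // then decode those bytes as UTF-8.
    static String repair(String garbled) {
        try {
            return new String(garbled.getBytes("ISO-8859-1"), "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // g, r, u-umlaut, sharp s, space, CJK ideograph: 6 chars, 10 UTF-8 bytes
        String input = "gr\u00fc\u00df \u4e2d";
        String garbled = decodedAsLatin1(input);
        System.out.println(garbled.length());              // 10
        System.out.println(repair(garbled).equals(input)); // true
    }
}
```

The repair is only safe while the container keeps decoding as ISO-8859-1. Once the request charset does take effect, the parameters already contain characters above U+00FF, the byte cast truncates them, and the conversion corrupts the input instead of fixing it.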


