JSP/Servlet: Posting and reading UTF-8 characters

Discussion in 'Java' started by Matthias Krueger, Sep 5, 2003.

  1. I am currently trying to post and read UTF-8 encoded characters
    on my JSP pages in a Tomcat 4.1 environment. Whenever I post
    a German umlaut or a Chinese or Arabic character into my form,
    the result displayed on my web page is complete garbage. I tried
    the usual "tricks" like

    - <%@ page contentType="text/html;charset=utf-8" %>
    - request.setCharacterEncoding("UTF-8");
    - response.setContentType("text/html; charset=utf-8");

    with no success. I thought all the problems with different character
    encodings would be gone with UTF-8, but I have now been stuck here
    for more than a day.

    Here's my current code (just a slightly modified version of the
    sample JSP found at
    http://lists.w3.org/Archives/Public/www-international/2002OctDec/0148.html).

    PS: From my System.out logs:
    Current request encoded with: UTF-8
    Current response encoded with: UTF-8

    My IE6 on Win2000 is also switched to UTF-8.

    What is going on here?
    :/


    Thank you in advance for any suggestions,
    Matthias



    -------[snip]--------------------------------------------[snap]------

    <%@ page
    language="java"
    import="java.util.Enumeration"
    pageEncoding="utf-8"
    contentType="text/html;charset=utf-8"
    %><%
    request.setCharacterEncoding("UTF-8");
    response.setContentType("text/html; charset=utf-8");
    %>
    <html>
    <head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
    </head>

    <body bgcolor="white">

    <b>Form post</b>
    <form method="post" accept-charset="UTF-8">
    <input name="textfeldmb" type="text">
    <textarea name="textareamb"></textarea>
    <input type="submit" value="submit">
    </form>
    <hr>
    <%
    System.out.println("Current request encoded with: " +
    request.getCharacterEncoding());
    System.out.println("Current response encoded with: " +
    response.getCharacterEncoding());

    Enumeration e = request.getParameterNames();
    if(e != null && e.hasMoreElements()) {
    %>
    <b>Request parameters</b><br>
    <br>
    The parameters are read after calling request.setCharacterEncoding("UTF-8");
    <bR>
    <TABLE>
    <TR valign=top>
    <TH align=left>Parameter:</TH>
    <TH align=left>Value:</TH>
    </TR>
    <%
    while(e.hasMoreElements()) {
    String k = (String) e.nextElement();
    String val = request.getParameter(k);
    System.out.println("request parameter " + k + ": " + val);
    %>
    <TR valign=top>
    <TD><%= k %></TD>
    <TD><%= val %></TD>
    </TR>
    <%
    }
    %>
    </TABLE>
    <%
    }
    %>
    </body></html>
     
    Matthias Krueger, Sep 5, 2003
    #1

  2. Roedy Green

    On Fri, 05 Sep 2003 12:28:56 +0200, Matthias Krueger
    <-ilmenau.de> wrote or quoted :

    >I am currently trying to post and read UTF-8 encoded characters
    >on my JSP pages in an Tomcat 4.1 environment. Whenever I post
    >a German umlaut or a Chinese or Arabian character into my form,
    >the result displayed on my web page is complete garbage. I tried
    >the usual "tricks" like


    Look at the source of the generated page. What are you seeing
    in the header?

    You should see something like this:

    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
    "http://www.w3.org/TR/html4/loose.dtd">
    <html>
    <head>
    <link href="../mindprod.css" type="text/css" rel="stylesheet">
    <link rev="MADE" href="mailto:">
    <link rel="icon" href="../images/beanicon.png">
    <meta http-equiv="Content-Type" content="text/html;
    charset=iso-8859-1">

    The crucial part is, what does charset say?


    --
    Canadian Mind Products, Roedy Green.
    Coaching, problem solving, economical contract programming.
    See http://mindprod.com/jgloss/jgloss.html for The Java Glossary.
     
    Roedy Green, Sep 5, 2003
    #2

  3. Roedy Green

    On Fri, 05 Sep 2003 18:42:08 GMT, Roedy Green <>
    wrote or quoted :

    >Look at the source of the generated page. What are you seeing
    >in the header?


    Are you seeing the proper charset, UTF-8?

    If so, what hex sequence is being generated for the u-umlaut?
    In Java source it is \u00fc. In UTF-8 that should come out as
    the two bytes C3 BC.
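
To check the expected byte sequence independently of the JSP, a small standalone program can print the UTF-8 encoding of a string as hex. This is only a sketch: the class name Utf8Hex and the helper toHex are made up for illustration, and it uses java.nio.charset.StandardCharsets, which needs Java 7 or later (on the JDK 1.4 of this thread's era you would call getBytes("UTF-8") instead).

```java
import java.nio.charset.StandardCharsets;

public class Utf8Hex {
    // Returns the UTF-8 encoding of s as uppercase hex digits, two per byte.
    static String toHex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            // Mask to 0..255 so negative byte values format as two hex digits.
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // u-umlaut (U+00FC) takes two bytes in UTF-8.
        System.out.println(toHex("\u00fc")); // prints C3BC
    }
}
```

If the bytes arriving at the server or leaving it differ from this, the conversion is going wrong somewhere between the browser and the page output.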

    You also want to be sure the problem is not with your browser.


    --
    Canadian Mind Products, Roedy Green.
    Coaching, problem solving, economical contract programming.
    See http://mindprod.com/jgloss/jgloss.html for The Java Glossary.
     
    Roedy Green, Sep 5, 2003
    #3
  4. Just for reference:

    I solved the problem by converting my Strings to byte[] and
    creating a new String with explicit UTF-8 encoding. I still
    do not understand why I need to do this (as
    request.getCharacterEncoding() and response.getCharacterEncoding()
    both return "UTF-8"), but at least it works now...

    // Treats each char of the input as one raw byte and
    // decodes the resulting byte sequence as UTF-8.
    public static final String utf8Convert(String utf8String) throws
    java.io.UnsupportedEncodingException {
        byte[] bytes = new byte[utf8String.length()];
        for (int i = 0; i < utf8String.length(); i++) {
            bytes[i] = (byte) utf8String.charAt(i);
        }
        return new String(bytes, "UTF-8");
    }

    Regards,
    Matthias
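
The thread never establishes why this workaround is needed. One plausible explanation, stated here as an assumption, is that the container had already decoded the posted bytes as ISO-8859-1 (the servlet default) before the JSP's setCharacterEncoding call could take effect, so every char in the parameter String holds exactly one raw UTF-8 byte. The sketch below simulates that situation in plain Java; the class MojibakeDemo and its methods misdecode and repair are hypothetical names, and StandardCharsets requires Java 7 or later.

```java
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    // Simulates the assumed failure: UTF-8 bytes from the browser
    // are decoded as ISO-8859-1, one char per byte.
    static String misdecode(String original) {
        byte[] wire = original.getBytes(StandardCharsets.UTF_8);
        return new String(wire, StandardCharsets.ISO_8859_1);
    }

    // Reverses it: recover the raw bytes, then decode them as UTF-8.
    // For chars up to U+00FF this matches the (byte) cast loop above.
    static String repair(String garbled) {
        byte[] raw = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "\u00fc";            // u-umlaut
        String garbled = misdecode(original);  // two chars: U+00C3, U+00BC
        System.out.println(garbled.length());                   // prints 2
        System.out.println(repair(garbled).equals(original));   // prints true
    }
}
```

If this is indeed the cause, the repair step only works while the garbled String still contains one char per original byte; once a value has been decoded correctly, applying it again would corrupt the text.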



     
    Matthias Krueger, Sep 8, 2003
    #4