"Mangled" Servlet Unicode Output Characters

Discussion in 'Java' started by Wolfgang, Jun 9, 2004.

  1. Wolfgang

    Wolfgang Guest

    I have a very simple servlet, see code at

    http://www.alexandria.ucsb.edu/~rnott/tmp/Test.java

    The servlet reads lines from a file tmp1.txt and just writes them back
    to a Web page.

    The lines in tmp1.txt contain UTF-8 encoded text, including some
    special characters, such as the Norwegian 'ø' as in Magerøya, or the
    German 'ö' as in Sömmerda.

    The servlet generates the Web page ok, listing the lines from file
    tmp1.txt.

    However, the special characters like 'ø' and 'ö' don't show up correctly
    on the Web page; instead they are mangled, like 'Ã¶' instead of 'ö'
    (other, regular characters are fine).

    Why are the special characters mangled, and what do I do to have them
    show up properly on the Web page?

    Thanks for your help and advice.

    Wolfgang,
    Santa Barbara, CA
     
    Wolfgang, Jun 9, 2004
    #1

  2. Wolfgang wrote:

    > I have a very simple servlet, see code at
    >
    > http://www.alexandria.ucsb.edu/~rnott/tmp/Test.java
    >
    > The servlet reads lines from a file tmp1.txt and just writes them back
    > to a Web page.
    >
    > The lines in tmp1.txt contain UTF-8 encoded text, including some
    > special characters, such as the Norwegian 'ø' as in Magerøya, or the
    > German 'ö' as in Sömmerda.
    >
    > The servlet generates the Web page ok, listing the lines from file
    > tmp1.txt.
    >
    > However, the special characters like 'ø' and 'ö' don't show up correctly
    > on the Web page; instead they are mangled, like 'Ã¶' instead of 'ö'
    > (other, regular characters are fine).
    >
    > Why are the special characters mangled, and what do I do to have them
    > show up properly on the Web page?


    These are typical symptoms of a character encoding mismatch. I see in
    your source code that you use the system's default encoding to read the
    text file. If the system default is not UTF-8 then that will be a
    problem. You almost have it right in that regard: just pass the string
    "UTF-8" as an additional parameter to your InputStreamReader's constructor
    (you will also have to add a handler for an additional checked exception).
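
As a minimal sketch of the reading side: the original Test.java is not reproduced in this thread, so the class name Utf8LineReader and the method readLines below are hypothetical, and an in-memory byte array stands in for tmp1.txt. The only point it illustrates is the two-argument InputStreamReader constructor, and what happens when the same UTF-8 bytes are decoded with a different charset.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; not the code from the original Test.java.
class Utf8LineReader {

    // Reads all lines from the stream, decoding the bytes with the named charset.
    // The constructor can throw UnsupportedEncodingException, a subclass of IOException.
    static List<String> readLines(InputStream in, String charsetName) throws IOException {
        List<String> lines = new ArrayList<String>();
        BufferedReader reader = new BufferedReader(new InputStreamReader(in, charsetName));
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } finally {
            reader.close();
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the file contents: in UTF-8 the o-umlaut (U+00F6) is the two bytes C3 B6.
        byte[] utf8 = "S\u00f6mmerda\n".getBytes("UTF-8");

        // Matching charset: the two bytes decode back to the single character U+00F6.
        System.out.println(readLines(new ByteArrayInputStream(utf8), "UTF-8"));

        // Mismatched charset: each byte becomes its own character (U+00C3, U+00B6).
        System.out.println(readLines(new ByteArrayInputStream(utf8), "ISO-8859-1"));
    }
}
```

With a real file, the same method could be called with a FileInputStream for tmp1.txt in place of the ByteArrayInputStream.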

    Your output HTML is also a bit funky, as you are declaring an XML
    document with the HTML 4 / transitional DTD. Transitional HTML 4 is not
    necessarily well-formed XML. You should probably either drop the XML
    declaration or (if you can) go all the way to XHTML. This is probably
    not the cause of your current problem, but either way, I recommend that
    you specify the charset in the response's content-type, rather than
    relying on the XML declaration. To do so, use
    response.setContentType("text/html; charset=UTF-8");
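
For the output side, here is a rough stand-alone sketch rather than servlet code. In a servlet, the charset given to setContentType selects the encoding of the writer returned by response.getWriter(), provided it is set before getWriter() is called. The hypothetical class Utf8OutputSketch below mimics only that encoding step with an OutputStreamWriter, using a ByteArrayOutputStream as an assumed stand-in for the response stream, to show which bytes reach the browser for 'ö' under each charset.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;

// Hypothetical stand-in for the servlet response writer; not part of the servlet API.
class Utf8OutputSketch {

    // Encodes the text into bytes using the named charset, as a charset-aware writer would.
    static byte[] encode(String text, String charsetName) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        Writer writer = new OutputStreamWriter(sink, charsetName);
        writer.write(text);
        writer.close(); // flushes the encoder into the sink
        return sink.toByteArray();
    }

    // Formats bytes as space-separated upper-case hex pairs.
    static String toHex(byte[] bytes) {
        StringBuilder hex = new StringBuilder();
        for (byte b : bytes) {
            if (hex.length() > 0) {
                hex.append(' ');
            }
            hex.append(String.format("%02X", b & 0xFF));
        }
        return hex.toString();
    }

    public static void main(String[] args) throws IOException {
        String oUmlaut = "\u00f6";
        System.out.println("UTF-8:      " + toHex(encode(oUmlaut, "UTF-8")));      // C3 B6
        System.out.println("ISO-8859-1: " + toHex(encode(oUmlaut, "ISO-8859-1"))); // F6
    }
}
```

The browser decodes those bytes using the charset named in the Content-Type header, so the declared charset and the writer's encoding have to agree.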


    John Bollinger
     
    John C. Bollinger, Jun 9, 2004
    #2

  3. Wolfgang

    Wolfgang Guest

    Thanks, John

    for your corrections to my code. This makes things work.

    For those interested, I also found essentially the same advice (with
    more detail) at
    http://www.jorendorff.com/articles/unicode/java.html

    Wolfgang

     
    Wolfgang, Jun 9, 2004
    #3
