javax.mail sends mail with broken charset

Discussion in 'Java' started by Laura Schmidt, May 28, 2014.

  1. Hello,

    I send mail with javax.mail and I noticed that some special chars are
    broken.

    I am using UTF-8 on my development system, where the mail generating
    code is produced. The received mail also contains "charset=UTF-8".

    But it's broken. The string "äß" is delivered as "=E4=DF" (see below).

    How can I fix this?

    Laura


    Message-ID:
    <>
    Subject: Test
    MIME-Version: 1.0
    Content-Type: text/plain; charset=UTF-8
    Content-Transfer-Encoding: quoted-printable
    Date: Wed, 28 May 2014 08:21:17 +0200 (CEST)

    Test: =E4=DF
    Laura Schmidt, May 28, 2014
    #1

  2. Lars Enderin Guest

    2014-05-28 08:30, Laura Schmidt wrote:
    > Hello,
    >
    > I send mail with javax.mail and I noticed that some special chars are
    > broken.
    >
    > I am using UTF-8 on my development system, where the mail generating
    > code is produced. The received mail also contains "charset=UTF-8".
    >
    > But it's broken. The string "äß" is delivered as "=E4=DF" ( see below).
    >
    > How can I fix this?
    >
    > Laura
    >
    >
    > Message-ID:
    > <>
    > Subject: Test
    > MIME-Version: 1.0
    > Content-Type: text/plain; charset=UTF-8
    > Content-Transfer-Encoding: quoted-printable
    > Date: Wed, 28 May 2014 08:21:17 +0200 (CEST)
    >
    > Test: =E4=DF
    >


    Did you intend to send the two characters "äß", i.e. LATIN SMALL LETTER A
    WITH TILDE, and LATIN CAPITAL LETTER SHARP S? These are iso-8859-1
    characters, correctly encoded as "=E4=DF" in quoted-printable. It seems
    to be an interpretation error at the receiving end.

    --
    Lars Enderin
    Lars Enderin, May 28, 2014
    #2

  3. Lars Enderin Guest

    2014-05-28 11:13, Lars Enderin wrote:
    > 2014-05-28 08:30, Laura Schmidt wrote:
    >> Hello,
    >>
    >> I send mail with javax.mail and I noticed that some special chars are
    >> broken.
    >>
    >> I am using UTF-8 on my development system, where the mail generating
    >> code is produced. The received mail also contains "charset=UTF-8".
    >>
    >> But it's broken. The string "äß" is delivered as "=E4=DF" ( see below).
    >>
    >> How can I fix this?
    >>
    >> Laura
    >>
    >>
    >> Message-ID:
    >> <>
    >> Subject: Test
    >> MIME-Version: 1.0
    >> Content-Type: text/plain; charset=UTF-8
    >> Content-Transfer-Encoding: quoted-printable
    >> Date: Wed, 28 May 2014 08:21:17 +0200 (CEST)
    >>
    >> Test: =E4=DF
    >>

    >
    > Did you intend to send the two characters "äß", i e LATIN SMALL LETTER A
    > WITH TILDE, and LATIN CAPITAL LETTER SHARP S? These are iso-8859-1
    > characters, correctly encoded as "=E4=DF" in quoted-printable. It seems
    > to be an interpretation error at the receiving end.
    >

    Or you have actually sent the two characters as iso-8859-1, despite the
    header. Could you change to 8-bit instead of quoted-printable?

    --
    Lars Enderin
    Lars Enderin, May 28, 2014
    #3
  4. On 5/28/2014 1:30 AM, Laura Schmidt wrote:
    > Hello,
    >
    > I send mail with javax.mail and I noticed that some special chars are
    > broken.
    >
    > I am using UTF-8 on my development system, where the mail generating
    > code is produced. The received mail also contains "charset=UTF-8".
    >
    > But it's broken. The string "äß" is delivered as "=E4=DF" ( see below).


    It's not broken. Well, it's as functional as the mail specifications
    are, which isn't terribly great, but that's beside the point.

    > Content-Transfer-Encoding: quoted-printable


    See this line? It means that non-ASCII characters (and NUL and "bare"
    CR/LF as well, etc.) are "quoted" as =XX, where XX is the byte's value
    written as two hexadecimal digits. See section 6.7 of RFC 2045 for more
    details.
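
    To make the =XX rule concrete, here is a minimal sketch of the idea in
    plain Java. It is not what javax.mail does internally; the class name
    QuotedPrintableSketch and the method encode are made up for illustration,
    and the sketch deliberately leaves out the soft line breaks and
    trailing-whitespace rules from RFC 2045.

```java
import java.nio.charset.StandardCharsets;

public class QuotedPrintableSketch {

    // Simplified quoted-printable: printable ASCII (except '=') and spaces
    // pass through, every other byte becomes "=XX" with two uppercase hex digits.
    // Assumption: no line-length handling, so this is for short strings only.
    static String encode(byte[] bytes) {
        StringBuilder out = new StringBuilder();
        for (byte b : bytes) {
            int v = b & 0xFF; // treat the byte as unsigned
            if ((v >= 33 && v <= 126 && v != '=') || v == ' ') {
                out.append((char) v);
            } else {
                out.append(String.format("=%02X", v));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // "\u00e4\u00df" is the string from the original post, written with
        // escapes so the source file's own encoding cannot interfere.
        String text = "Test: \u00e4\u00df";
        System.out.println(encode(text.getBytes(StandardCharsets.ISO_8859_1))); // Test: =E4=DF
        System.out.println(encode(text.getBytes(StandardCharsets.UTF_8)));      // Test: =C3=A4=C3=9F
    }
}
```

    The transfer encoding is the same in both cases; only the bytes fed into
    it differ, which is why the two bodies in this thread look different
    under an identical Content-Transfer-Encoding header.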

    If what you expected was an 8-bit body that you could just slurp and
    understand without needing to do any complicated processing, then you
    are sadly extremely mistaken about how messed up the email
    specifications are. Email cannot be reliably processed via hacked up
    scripts reading/writing it manually; you need to use real libraries.
    I've been doing a running series on my blog on just how insane of a mess
    everything is.

    [If you want an example, look at the source of this message to figure
    out how the emoji in my From: header is encoded. And then observe all
    the prior occurrences of people quoting that header and managing to
    mangle it. I find it quite amusing, personally.]

    --
    Beware of bugs in the above code; I have only proved it correct, not
    tried it. -- Donald E. Knuth
    Joshua Cranmer 🐧, May 28, 2014
    #4
  5. Thanks to Lars & Joshua,

    the following change to my code solved the problem:

    msg.setContent (message,"text/plain;charset=utf-8");

    Laura
    Laura Schmidt, May 29, 2014
    #5
  6. On 05/29/2014 09:19 AM, Laura Schmidt wrote:

    > the following change to my code solved the problem:


    ... but I don't know why. The new mail reads like this:

    Message-ID:
    <>
    Subject: Test
    MIME-Version: 1.0
    Content-Type: text/plain;charset=utf-8
    Content-Transfer-Encoding: quoted-printable
    Date: Thu, 29 May 2014 09:16:32 +0200 (CEST)

    Test: =C3=A4=C3=9F

    The header is the same, but the code for "äß" changed from "=E4=DF" to
    "=C3=A4=C3=9F".

    Laura
    Laura Schmidt, May 29, 2014
    #6
  7. Laura Schmidt <> wrote:
    > I send mail with javax.mail and I noticed that some special chars are
    > broken.
    > I am using UTF-8 on my development system, where the mail generating
    > code is produced. The received mail also contains "charset=UTF-8".
    > But it's broken. The string "äß" is delivered as "=E4=DF" ( see below).


    Maybe you could post the crucial lines of your Java code, like the
    one where you (maybe) explicitly set the charset (if this isn't just
    the default on your machine), and those, where you pass that String
    or bytearray containing the two letters to some javax.mail... method.

    > MIME-Version: 1.0
    > Content-Type: text/plain; charset=UTF-8
    > Content-Transfer-Encoding: quoted-printable
    >
    > Test: =E4=DF


    Yes, this mail is definitely inconsistent/wrong.

    PS: (for Lars) the first of the letters is an umlaut-a, and =E4 is the
    umlaut-a's quoted iso-8859-1 (or -15) representation, just not the
    expected quoted utf-8 one.
    Andreas Leitgeb, May 29, 2014
    #7
  8. Laura Schmidt <> wrote:
    > On 05/29/2014 09:19 AM, Laura Schmidt wrote:
    >> the following change to my code solved the problem:

    > ... but I don't know why. The new mail reads like this:
    > Message-ID: <>
    > Subject: Test
    > MIME-Version: 1.0
    > Content-Type: text/plain;charset=utf-8
    > Content-Transfer-Encoding: quoted-printable
    > Date: Thu, 29 May 2014 09:16:32 +0200 (CEST)
    >
    > Test: =C3=A4=C3=9F
    >
    > The header is the same, but the code for "äß" changed from "=E4=DF " to
    > "=C3=A4=C3=9F".


    Now it is indeed correct.

    =C3=A4 is quoted-printable(utf8(ä)) and =C3=9F is quoted-printable(utf8(ß))
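
    The same relationship can be checked in the other direction. The
    following is a rough sketch in plain Java, not a full RFC 2045 decoder
    (soft line breaks are ignored); the names QuotedPrintableDecodeSketch,
    decode and decodeAs are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class QuotedPrintableDecodeSketch {

    // Turns "=XX" escapes back into raw bytes; all other characters are
    // assumed to be plain ASCII and are copied through unchanged.
    static byte[] decode(String qp) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < qp.length(); i++) {
            char c = qp.charAt(i);
            if (c == '=' && i + 2 < qp.length()) {
                out.write(Integer.parseInt(qp.substring(i + 1, i + 3), 16));
                i += 2; // skip the two hex digits just consumed
            } else {
                out.write(c);
            }
        }
        return out.toByteArray();
    }

    // Step two: interpret the raw bytes using the charset named in the header.
    static String decodeAs(String qp, Charset charset) {
        return new String(decode(qp), charset);
    }

    public static void main(String[] args) {
        String expected = "\u00e4\u00df";
        // Second mail: UTF-8 bytes, read as UTF-8.
        System.out.println(decodeAs("=C3=A4=C3=9F", StandardCharsets.UTF_8).equals(expected));
        // First mail: the bytes only make sense when read as ISO-8859-1.
        System.out.println(decodeAs("=E4=DF", StandardCharsets.ISO_8859_1).equals(expected));
    }
}
```

    Both comparisons come out true, which matches the reading that the
    first mail carried ISO-8859-1 bytes under a UTF-8 label.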

    PS: I started typing my other followup yesterday, but only today I noticed
    that I hadn't sent it off, and did so, before checking what else had been
    posted meanwhile...
    Andreas Leitgeb, May 29, 2014
    #8
  9. Lars Enderin Guest

    2014-05-29 09:35, Laura Schmidt wrote:
    > On 05/29/2014 09:19 AM, Laura Schmidt wrote:
    >
    >> the following change to my code solved the problem:

    >
    > .... but I don't know why. The new mail reads like this:
    >
    > Message-ID:
    > <>
    > Subject: Test
    > MIME-Version: 1.0
    > Content-Type: text/plain;charset=utf-8
    > Content-Transfer-Encoding: quoted-printable
    > Date: Thu, 29 May 2014 09:16:32 +0200 (CEST)
    >
    > Test: =C3=A4=C3=9F
    >
    > The header is the same, but the code for "äß" changed from "=E4=DF " to
    > "=C3=A4=C3=9F".
    >


    That looks right. To see the original characters, you need a mail client
    or other program that understands quoted-printable. In your first
    attempt, you seem to have used ISO-8859-1 as the default encoding. What
    default encoding does the JVM assume? Inside Java, the characters are
    encoded as 16-bit characters, a form of UTF-16. The encoding only
    matters when you write to or from a byte stream or file.
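
    A quick way to see this on a given machine is sketched below. It only
    uses the standard library; the class name DefaultCharsetSketch and the
    helper byteCount are invented for the example, and it says nothing about
    how javax.mail itself picks a charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DefaultCharsetSketch {

    // Number of bytes the string occupies once converted with the given charset.
    static int byteCount(String s, Charset charset) {
        return s.getBytes(charset).length;
    }

    public static void main(String[] args) {
        // What the JVM uses when no charset is given explicitly
        // (for example in String.getBytes() without arguments).
        System.out.println("JVM default charset: " + Charset.defaultCharset());

        String text = "\u00e4\u00df"; // the two characters from the thread
        System.out.println(text.length());                                  // 2 chars inside Java
        System.out.println(byteCount(text, StandardCharsets.ISO_8859_1));   // 2 bytes
        System.out.println(byteCount(text, StandardCharsets.UTF_8));        // 4 bytes
    }
}
```

    Passing the charset explicitly at every conversion point avoids
    depending on whatever default the JVM happens to report.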

    --
    Lars Enderin
    Lars Enderin, May 29, 2014
    #9
  10. Joerg Meier Guest

    On Wed, 28 May 2014 11:13:18 +0200, Lars Enderin wrote:

    > Did you intend to send the two characters "äß", i e LATIN SMALL LETTER A
    > WITH TILDE, and LATIN CAPITAL LETTER SHARP S? These are iso-8859-1
    > characters, correctly encoded as "=E4=DF" in quoted-printable.


    Andreas' Post reminded me that I meant to correct this the other day but
    forgot - both of these are wrong: it's not LATIN SMALL LETTER A WITH TILDE
    (that would have been =E3), it's LATIN SMALL LETTER A WITH DIAERESIS, and
    instead of LATIN CAPITAL LETTER SHARP S (which would have been 1E9E) it is
    LATIN SMALL LETTER SHARP S. Both of those show up as what they actually are
    in your post.
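
    For anyone who wants to double-check names like these, Java can look
    them up itself. A small sketch using Character.getName from the
    standard library; the class name CodePointNamesSketch and the helper
    describe are only illustrative.

```java
public class CodePointNamesSketch {

    // Formats a code point as "U+XXXX NAME" using the JDK's Unicode name table.
    static String describe(int codePoint) {
        return String.format("U+%04X %s", codePoint, Character.getName(codePoint));
    }

    public static void main(String[] args) {
        // The four code points mentioned in the correction above.
        int[] codePoints = {0xE3, 0xE4, 0xDF, 0x1E9E};
        for (int cp : codePoints) {
            System.out.println(describe(cp));
        }
    }
}
```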

    Kind regards,
    Joerg

    --
    I don't read my emails, so replies by email will unfortunately go
    unread.
    Joerg Meier, May 29, 2014
    #10
  11. Roedy Green Guest

    On Wed, 28 May 2014 08:30:18 +0200, Laura Schmidt <>
    wrote, quoted or indirectly quoted someone who said :

    With a problem like this it is a good idea to have a peek at the hex
    messages going out with Wireshark to help you decide if the fault lies
    with sender or receiver.
    --
    Roedy Green Canadian Mind Products http://mindprod.com
    Young man, in mathematics you don't understand things.
    You just get used to them.
    ~ John von Neumann (born: 1903-12-28 died: 1957-02-08 at age: 53)
    Roedy Green, May 29, 2014
    #11
  12. On 05/29/2014 09:35 AM, Laura Schmidt wrote:

    > The header is the same, but the code for "äß" changed from "=E4=DF " to
    > "=C3=A4=C3=9F".


    I forgot to say that the second mail is displayed correctly in the mail
    client (thunderbird) while the first one displays the code itself.

    Laura
    Laura Schmidt, May 30, 2014
    #12
  13. Lars Enderin Guest

    2014-05-30 12:16, Laura Schmidt wrote:
    > On 05/29/2014 09:35 AM, Laura Schmidt wrote:
    >
    >> The header is the same, but the code for "äß" changed from "=E4=DF " to
    >> "=C3=A4=C3=9F".

    >
    > I forgot to say that the second mail is displayed correctly in the mail
    > client (thunderbird) while the first one displays the code itself.


    Because the two bytes encoded as =E4=D7 are not valid UTF-8, of course.
    Look up UTF-8 in Wikipedia, for example.

    --
    Lars Enderin
    Lars Enderin, May 30, 2014
    #13
  14. Lars Enderin Guest

    2014-05-30 13:22, Lars Enderin wrote:

    > Because the two bytes encoded as =E4=D7 are not valid UTF-8, of course.
    > Look up UTF-8 in Wikipedia, for example.
    >

    DF, not D7
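
    The claim is easy to verify with a strict decoder. The sketch below
    uses only java.nio.charset from the standard library; the class name
    Utf8ValiditySketch and the method isValidUtf8 are hypothetical, and this
    is not meant to describe what any particular mail client does.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class Utf8ValiditySketch {

    // Returns true only if the bytes form well-formed UTF-8.
    // REPORT makes the decoder throw instead of silently substituting U+FFFD.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] firstMail = {(byte) 0xE4, (byte) 0xDF};                           // =E4=DF
        byte[] secondMail = {(byte) 0xC3, (byte) 0xA4, (byte) 0xC3, (byte) 0x9F}; // =C3=A4=C3=9F
        System.out.println(isValidUtf8(firstMail));  // false
        System.out.println(isValidUtf8(secondMail)); // true
    }
}
```

    0xE4 announces a three-byte sequence, but 0xDF is not a continuation
    byte, so the first body cannot be decoded as UTF-8.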

    --
    Lars Enderin
    Lars Enderin, May 30, 2014
    #14
