javax.mail sends mail with broken charset

Discussion in 'Java' started by Laura Schmidt, May 28, 2014.

  1. Hello,

    I send mail with javax.mail and I noticed that some special chars are

    I am using UTF-8 on my development system, where the mail generating
    code is produced. The received mail also contains "charset=UTF-8".

    But it's broken. The string "äß" is delivered as "=E4=DF" (see below).

    How can I fix this?


    Subject: Test
    MIME-Version: 1.0
    Content-Type: text/plain; charset=UTF-8
    Content-Transfer-Encoding: quoted-printable
    Date: Wed, 28 May 2014 08:21:17 +0200 (CEST)

    Test: =E4=DF
    Laura Schmidt, May 28, 2014

  2. 2014-05-28 08:30, Laura Schmidt wrote:
    Did you intend to send the two characters "äß", i.e. LATIN SMALL LETTER A
    WITH TILDE, and LATIN CAPITAL LETTER SHARP S? These are iso-8859-1
    characters, correctly encoded as "=E4=DF" in quoted-printable. It seems
    to be an interpretation error at the receiving end.
    Lars Enderin, May 28, 2014

  3. 2014-05-28 11:13, Lars Enderin wrote:
    Or you have actually sent the two characters as iso-8859-1, despite the
    header. Could you change to 8-bit instead of quoted-printable?
    Lars Enderin, May 28, 2014
  4. It's not broken. Well, it's as functional as the mail specifications
    are, which isn't terribly great, but that's beside the point.
    See the "Content-Transfer-Encoding: quoted-printable" line? It means that
    non-ASCII characters (and NUL and "bare" CR/LF as well, etc.) are "quoted"
    as =XX, where each X is a hexadecimal digit. See section 6.7 of RFC 2045
    for more details.

    If what you expected was an 8-bit body that you could just slurp and
    understand without needing to do any complicated processing, then you
    are sadly extremely mistaken about how messed up the email
    specifications are. Email cannot be reliably processed via hacked up
    scripts reading/writing it manually; you need to use real libraries.
    I've been doing a running series on my blog on just how insane of a mess
    everything is.

    [If you want an example, look at the source of this message to figure
    out how the emoji in my From: header is encoded. And then observe all
    the prior occurrences of people quoting that header and managing to
    mangle it. I find it quite amusing, personally.]
    Joshua Cranmer 🐧, May 28, 2014
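
To make the =XX quoting described above concrete, here is a minimal sketch of a quoted-printable decoder that uses only the JDK. The class and method names are hypothetical and not part of javax.mail, and the sketch ignores soft line breaks and the other details of RFC 2045 section 6.7; a real mail library should be used for actual messages.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of javax.mail.
final class QuotedPrintableSketch {

    // Turns =XX escapes back into raw bytes. Soft line breaks ("=" at the end
    // of a line) are not handled, and a malformed escape such as "=ZZ" throws
    // NumberFormatException.
    static byte[] decodeBytes(String qp) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < qp.length(); i++) {
            char c = qp.charAt(i);
            if (c == '=' && i + 2 < qp.length()) {
                out.write(Integer.parseInt(qp.substring(i + 1, i + 3), 16));
                i += 2; // skip the two hex digits just consumed
            } else {
                out.write(c); // assumes everything else is plain ASCII
            }
        }
        return out.toByteArray();
    }

    // The charset from the Content-Type header decides how the bytes are read.
    static String decode(String qp, Charset charset) {
        return new String(decodeBytes(qp), charset);
    }

    public static void main(String[] args) {
        System.out.println(decode("Test: =C3=A4=C3=9F", StandardCharsets.UTF_8));
        System.out.println(decode("Test: =E4=DF", StandardCharsets.ISO_8859_1));
        // Same bytes as the previous line, but read with the wrong charset:
        System.out.println(decode("Test: =E4=DF", StandardCharsets.UTF_8));
    }
}
```

The third call mirrors the first mail in this thread: the bytes are ISO-8859-1, but the header claims UTF-8, so the decoder cannot recover the original characters.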
  5. Thanks to Lars & Joshua,

    the following change to my code solved the problem:

    msg.setContent (message,"text/plain;charset=utf-8");

    Laura Schmidt, May 29, 2014
  6. ... but I don't know why. The new mail reads like this:

    Subject: Test
    MIME-Version: 1.0
    Content-Type: text/plain;charset=utf-8
    Content-Transfer-Encoding: quoted-printable
    Date: Thu, 29 May 2014 09:16:32 +0200 (CEST)

    Test: =C3=A4=C3=9F

    The header is the same, but the code for "äß" changed from "=E4=DF" to
    "=C3=A4=C3=9F".

    Laura Schmidt, May 29, 2014
  7. Maybe you could post the crucial lines of your Java code, like the
    one where you (maybe) explicitly set the charset (if this isn't just
    the default on your machine), and those, where you pass that String
    or bytearray containing the two letters to some javax.mail... method.
    Yes, this mail is definitely inconsistent/wrong.

    PS: (for Lars) the first of the letters is an umlaut-a, and =E4 is the
    umlaut-a's quoted iso-8859-1 (or -15) representation, just not the
    expected quoted utf-8 one.
    Andreas Leitgeb, May 29, 2014
  8. Now it is indeed correct.

    =C3=A4 is quoted-printable(utf8(ä)) and =C3=9F is quoted-printable(utf8(ß))

    PS: I started typing my other followup yesterday, but only today I noticed
    that I hadn't sent it off, and did so, before checking what else had been
    posted meanwhile...
    Andreas Leitgeb, May 29, 2014
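
The two encodings compared in this thread can be reproduced with a small JDK-only sketch. The class name and the `encode` helper are hypothetical, and the quoting rule is simplified (no line-length limit, no special handling of trailing whitespace), so treat it as an illustration rather than a complete quoted-printable encoder.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of javax.mail.
final class QuotedPrintableEncodeSketch {

    // Encodes the text to bytes in the given charset, then quotes every byte
    // that is not printable ASCII (and "=" itself) as =XX.
    static String encode(String text, Charset charset) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(charset)) {
            int v = b & 0xFF; // treat the byte as unsigned
            if (v >= 32 && v <= 126 && v != '=') {
                sb.append((char) v);
            } else {
                sb.append(String.format("=%02X", v));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "\u00e4\u00df"; // "äß", written with escapes to avoid source-encoding issues
        System.out.println(encode(text, StandardCharsets.ISO_8859_1)); // =E4=DF
        System.out.println(encode(text, StandardCharsets.UTF_8));      // =C3=A4=C3=9F
    }
}
```

The same two characters yield two bytes in ISO-8859-1 and four bytes in UTF-8, which is exactly the difference between the first and the second mail.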
  9. 2014-05-29 09:35, Laura Schmidt wrote:
    That looks right. To see the original characters, you need a mail client
    or other program that understands quoted-printable. In your first
    attempt, you seem to have used ISO-8859-1 as the default encoding. What
    default encoding does the JVM assume? Inside Java, the characters are
    encoded as 16-bit characters, a form of UTF-16. The encoding only
    matters when you write to or read from a byte stream or file.
    Lars Enderin, May 29, 2014
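
The question about the JVM's default encoding can be checked directly. The sketch below (hypothetical class name, JDK only) prints the default charset and shows that the in-memory string has the same length regardless of how it is later converted to bytes. Whether the default charset was really the cause of the first mail is an assumption that the thread does not confirm.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only.
final class DefaultCharsetSketch {

    // Number of bytes the string occupies once encoded in the given charset.
    static int byteCount(String s, Charset charset) {
        return s.getBytes(charset).length;
    }

    public static void main(String[] args) {
        String s = "\u00e4\u00df"; // "äß"
        // Platform dependent; this is what getBytes() without arguments uses.
        System.out.println("Default charset: " + Charset.defaultCharset());
        System.out.println("UTF-16 code units in memory: " + s.length());
        System.out.println("Bytes as ISO-8859-1: " + byteCount(s, StandardCharsets.ISO_8859_1));
        System.out.println("Bytes as UTF-8: " + byteCount(s, StandardCharsets.UTF_8));
    }
}
```

Passing the charset explicitly at every conversion between strings and bytes avoids depending on the platform default.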
  10. Andreas' post reminded me that I meant to correct this the other day but
    forgot - both of these are wrong: it's not LATIN SMALL LETTER A WITH TILDE
    (that would have been =E3), it's LATIN SMALL LETTER A WITH DIAERESIS, and
    instead of LATIN CAPITAL LETTER SHARP S (which would have been U+1E9E) it is
    LATIN SMALL LETTER SHARP S. Both of those show up as what they actually are
    in your post.

    Kind regards,
    Joerg Meier, May 29, 2014
  11. With a problem like this it is a good idea to have a peek at the hex
    messages going out with Wireshark to help you decide if the fault lies
    with sender or receiver.
    Roedy Green, May 29, 2014
  12. I forgot to say that the second mail is displayed correctly in the mail
    client (thunderbird) while the first one displays the code itself.

    Laura Schmidt, May 30, 2014
  13. 2014-05-30 12:16, Laura Schmidt wrote:
    Because the two bytes encoded as =E4=D7 are not valid UTF-8, of course.
    Look up UTF-8 in Wikipedia, for example.
    Lars Enderin, May 30, 2014
  14. 2014-05-30 13:22, Lars Enderin wrote:
    DF, not D7
    Lars Enderin, May 30, 2014
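
The claim that the bytes E4 DF are not valid UTF-8 can be verified with the JDK's strict decoder. This is a minimal sketch with a hypothetical class name; it reports malformed input instead of silently substituting replacement characters.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only.
final class Utf8ValiditySketch {

    // Returns true only if the whole byte sequence is well-formed UTF-8.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] latin1 = {(byte) 0xE4, (byte) 0xDF};                         // first mail
        byte[] utf8 = {(byte) 0xC3, (byte) 0xA4, (byte) 0xC3, (byte) 0x9F}; // second mail
        System.out.println(isValidUtf8(latin1)); // false
        System.out.println(isValidUtf8(utf8));   // true
    }
}
```

In UTF-8, E4 announces a three-byte sequence and must be followed by continuation bytes in the range 80 to BF; DF is itself a lead byte, so the sequence is rejected.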