javax.mail sends mail with broken charset

Laura Schmidt

Hello,

I send mail with javax.mail and I noticed that some special chars are
broken.

I am using UTF-8 on my development system, where the mail generating
code is produced. The received mail also contains "charset=UTF-8".

But it's broken. The string "äß" is delivered as "=E4=DF" (see below).

How can I fix this?

Laura


Message-ID: <13382290.1.1401258077688.JavaMail.tomcat7@h1403230.stratoserver.net>
Subject: Test
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
Date: Wed, 28 May 2014 08:21:17 +0200 (CEST)

Test: =E4=DF
 
Lars Enderin

2014-05-28 08:30, Laura Schmidt wrote:
> Hello,
>
> I send mail with javax.mail and I noticed that some special chars are
> broken.
>
> I am using UTF-8 on my development system, where the mail generating
> code is produced. The received mail also contains "charset=UTF-8".
>
> But it's broken. The string "äß" is delivered as "=E4=DF" (see below).
>
> How can I fix this?
>
> Laura
>
>
> Message-ID: <13382290.1.1401258077688.JavaMail.tomcat7@h1403230.stratoserver.net>
> Subject: Test
> MIME-Version: 1.0
> Content-Type: text/plain; charset=UTF-8
> Content-Transfer-Encoding: quoted-printable
> Date: Wed, 28 May 2014 08:21:17 +0200 (CEST)
>
> Test: =E4=DF

Did you intend to send the two characters "äß", i.e. LATIN SMALL LETTER A
WITH TILDE, and LATIN CAPITAL LETTER SHARP S? These are iso-8859-1
characters, correctly encoded as "=E4=DF" in quoted-printable. It seems
to be an interpretation error at the receiving end.
 
Lars Enderin

2014-05-28 11:13, Lars Enderin wrote:
> 2014-05-28 08:30, Laura Schmidt wrote:
>
> Did you intend to send the two characters "äß", i.e. LATIN SMALL LETTER A
> WITH TILDE, and LATIN CAPITAL LETTER SHARP S? These are iso-8859-1
> characters, correctly encoded as "=E4=DF" in quoted-printable. It seems
> to be an interpretation error at the receiving end.

Or you have actually sent the two characters as iso-8859-1, despite the
header. Could you change to 8-bit instead of quoted-printable?
 
Joshua Cranmer 🐧

> Hello,
>
> I send mail with javax.mail and I noticed that some special chars are
> broken.
>
> I am using UTF-8 on my development system, where the mail generating
> code is produced. The received mail also contains "charset=UTF-8".
>
> But it's broken. The string "äß" is delivered as "=E4=DF" (see below).

It's not broken. Well, it's as functional as the mail specifications
are, which isn't terribly great, but that's beside the point.

> Content-Transfer-Encoding: quoted-printable

See this line? It means that non-ASCII characters (and NUL and "bare"
CR/LF as well, etc.) are "quoted" as =XX, where XX is the byte's value
written as two hexadecimal digits. See section 6.7 of RFC 2045 for more
details.
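
To make the =XX quoting concrete, here is a minimal sketch of a decoder using only the JDK. The class and method names are made up for illustration, and it assumes well-formed input; a real application should leave this to a mail library. It shows that the decoded bytes of the first mail only spell "äß" when read as ISO-8859-1, not as the UTF-8 that the header promises.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class QuotedPrintableDecodeSketch {

    // Turns quoted-printable text back into raw bytes.
    // Assumption: every "=" is followed by two hex digits or a line break.
    static byte[] decode(String qp) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < qp.length(); i++) {
            char c = qp.charAt(i);
            if (c != '=') {
                out.write(c);                   // plain ASCII passes through unchanged
            } else if (qp.startsWith("\r\n", i + 1)) {
                i += 2;                         // soft line break "=CRLF" emits nothing
            } else if (qp.startsWith("\n", i + 1)) {
                i += 1;                         // tolerate a bare LF soft break
            } else {
                out.write(Integer.parseInt(qp.substring(i + 1, i + 3), 16));
                i += 2;                         // skip the two hex digits
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] first = decode("Test: =E4=DF");
        byte[] second = decode("Test: =C3=A4=C3=9F");
        String expected = "Test: \u00E4\u00DF"; // "Test: äß", written with escapes

        System.out.println(first.length);       // 8
        System.out.println(second.length);      // 10
        System.out.println(new String(second, StandardCharsets.UTF_8).equals(expected));      // true
        System.out.println(new String(first, StandardCharsets.ISO_8859_1).equals(expected));  // true
        System.out.println(new String(first, StandardCharsets.UTF_8).equals(expected));       // false
    }
}
```

The string literal uses \u escapes so that the result does not depend on the encoding of the source file.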

If what you expected was an 8-bit body that you could just slurp and
understand without needing to do any complicated processing, then you
are sadly extremely mistaken about how messed up the email
specifications are. Email cannot be reliably processed via hacked up
scripts reading/writing it manually; you need to use real libraries.
I've been doing a running series on my blog on just how insane of a mess
everything is.

[If you want an example, look at the source of this message to figure
out how the emoji in my From: header is encoded. And then observe all
the prior occurrences of people quoting that header and managing to
mangle it. I find it quite amusing, personally.]
 
Laura Schmidt

Thanks to Lars & Joshua,

the following change to my code solved the problem:

msg.setContent(message, "text/plain;charset=utf-8");

Laura
 
Laura Schmidt

> the following change to my code solved the problem:

... but I don't know why. The new mail reads like this:

Message-ID: <22731732.1.1401347792347.JavaMail.tomcat7@h1403230.stratoserver.net>
Subject: Test
MIME-Version: 1.0
Content-Type: text/plain;charset=utf-8
Content-Transfer-Encoding: quoted-printable
Date: Thu, 29 May 2014 09:16:32 +0200 (CEST)

Test: =C3=A4=C3=9F

The header is the same, but the code for "äß" changed from "=E4=DF" to
"=C3=A4=C3=9F".

Laura
 
Andreas Leitgeb

Laura Schmidt said:
> I send mail with javax.mail and I noticed that some special chars are
> broken.
> I am using UTF-8 on my development system, where the mail generating
> code is produced. The received mail also contains "charset=UTF-8".
> But it's broken. The string "äß" is delivered as "=E4=DF" (see below).

Maybe you could post the crucial lines of your Java code, like the
one where you (maybe) explicitly set the charset (if this isn't just
the default on your machine), and those, where you pass that String
or bytearray containing the two letters to some javax.mail... method.

> MIME-Version: 1.0
> Content-Type: text/plain; charset=UTF-8
> Content-Transfer-Encoding: quoted-printable
>
> Test: =E4=DF

Yes, this mail is definitely inconsistent/wrong.

PS: (for Lars) the first of the letters is an umlaut-a, and =E4 is the
umlaut-a's quoted iso-8859-1 (or -15) representation, just not the
expected quoted utf-8 one.
 
Andreas Leitgeb

Laura Schmidt said:
> ... but I don't know why. The new mail reads like this:
> Message-ID: <22731732.1.1401347792347.JavaMail.tomcat7@h1403230.stratoserver.net>
> Subject: Test
> MIME-Version: 1.0
> Content-Type: text/plain;charset=utf-8
> Content-Transfer-Encoding: quoted-printable
> Date: Thu, 29 May 2014 09:16:32 +0200 (CEST)
>
> Test: =C3=A4=C3=9F
>
> The header is the same, but the code for "äß" changed from "=E4=DF" to
> "=C3=A4=C3=9F".

Now it is indeed correct.

=C3=A4 is quoted-printable(utf8(ä)) and =C3=9F is quoted-printable(utf8(ß))
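
As a rough illustration of that composition, the sketch below applies a simplified quoted-printable step to the bytes of the same string under two charsets. The names are hypothetical, and the encoder is deliberately incomplete: it handles a single short line only and ignores the 76-character line limit and trailing-whitespace rules of RFC 2045. It reproduces both bodies seen in this thread.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class QuotedPrintableEncodeSketch {

    // Simplified quoted-printable for one short line of text:
    // keeps printable ASCII (except "=") and spaces, escapes every other byte as =XX.
    static String encode(String text, Charset charset) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(charset)) {
            int v = b & 0xFF;                       // treat the byte as unsigned
            boolean printable = (v >= 33 && v <= 126 && v != '=');
            if (printable || v == ' ') {
                sb.append((char) v);
            } else {
                sb.append(String.format("=%02X", v));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "Test: \u00E4\u00DF";         // "Test: äß", written with escapes

        // Same characters, different charset, different bytes on the wire.
        System.out.println(encode(text, StandardCharsets.ISO_8859_1)); // Test: =E4=DF
        System.out.println(encode(text, StandardCharsets.UTF_8));      // Test: =C3=A4=C3=9F
    }
}
```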

PS: I started typing my other followup yesterday, but only today I noticed
that I hadn't sent it off, and did so, before checking what else had been
posted meanwhile...
 
Lars Enderin

2014-05-29 09:35, Laura Schmidt wrote:
> ... but I don't know why. The new mail reads like this:
>
> Message-ID: <22731732.1.1401347792347.JavaMail.tomcat7@h1403230.stratoserver.net>
> Subject: Test
> MIME-Version: 1.0
> Content-Type: text/plain;charset=utf-8
> Content-Transfer-Encoding: quoted-printable
> Date: Thu, 29 May 2014 09:16:32 +0200 (CEST)
>
> Test: =C3=A4=C3=9F
>
> The header is the same, but the code for "äß" changed from "=E4=DF" to
> "=C3=A4=C3=9F".

That looks right. To see the original characters, you need a mail client
or other program that understands quoted-printable. In your first
attempt, you seem to have used ISO-8859-1 as the default encoding. What
default encoding does the JVM assume? Inside Java, strings are held as
16-bit char values, a form of UTF-16. The encoding only matters when you
write to or read from a byte stream or file.
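
A small sketch of that boundary, using only the JDK (class and helper names are invented for the example): it prints the JVM's default charset, then pushes the same String through a Writer with two explicitly chosen charsets and shows the resulting bytes. It does not claim to show what javax.mail does internally.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetBoundarySketch {

    // Formats bytes as space-separated upper-case hex pairs.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Writes text to an in-memory byte stream using an explicitly chosen charset.
    static byte[] write(String text, Charset charset) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(sink, charset)) {
            writer.write(text);
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        String text = "\u00E4\u00DF";   // "äß": two chars inside the JVM

        // Platform dependent; this is what APIs fall back to when no charset is given.
        System.out.println("default charset: " + Charset.defaultCharset());

        System.out.println(hex(write(text, StandardCharsets.ISO_8859_1))); // E4 DF
        System.out.println(hex(write(text, StandardCharsets.UTF_8)));      // C3 A4 C3 9F
    }
}
```

Naming the charset at every String/byte boundary keeps the output independent of the machine the code happens to run on.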
 
Joerg Meier

> Did you intend to send the two characters "äß", i.e. LATIN SMALL LETTER A
> WITH TILDE, and LATIN CAPITAL LETTER SHARP S? These are iso-8859-1
> characters, correctly encoded as "=E4=DF" in quoted-printable.

Andreas' post reminded me that I meant to correct this the other day but
forgot. Both of these are wrong: it's not LATIN SMALL LETTER A WITH TILDE
(that would have been =E3), it's LATIN SMALL LETTER A WITH DIAERESIS, and
instead of LATIN CAPITAL LETTER SHARP S (which would have been U+1E9E) it is
LATIN SMALL LETTER SHARP S. Both of those show up as what they actually are
in your post.
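
For anyone who wants to check such names rather than trust memory, the JDK can report them. A minimal sketch (the class and helper names are made up; Character.getName is a standard method since Java 7):

```java
public class CodePointNames {

    // Formats a code point as "U+XXXX NAME" using the JDK's Unicode data.
    static String describe(int codePoint) {
        return String.format("U+%04X %s", codePoint, Character.getName(codePoint));
    }

    public static void main(String[] args) {
        System.out.println(describe(0x00E3)); // U+00E3 LATIN SMALL LETTER A WITH TILDE
        System.out.println(describe(0x00E4)); // U+00E4 LATIN SMALL LETTER A WITH DIAERESIS
        System.out.println(describe(0x00DF)); // U+00DF LATIN SMALL LETTER SHARP S
        System.out.println(describe(0x1E9E)); // U+1E9E LATIN CAPITAL LETTER SHARP S
    }
}
```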

Kind regards,
Joerg
 
Roedy Green

With a problem like this it is a good idea to have a peek at the raw bytes
of the outgoing messages with Wireshark, to help you decide whether the
fault lies with the sender or the receiver.
 
Laura Schmidt

> The header is the same, but the code for "äß" changed from "=E4=DF" to
> "=C3=A4=C3=9F".

I forgot to say that the second mail is displayed correctly in the mail
client (Thunderbird) while the first one displays the code itself.

Laura
 
Lars Enderin

2014-05-30 12:16, Laura Schmidt wrote:
> I forgot to say that the second mail is displayed correctly in the mail
> client (Thunderbird) while the first one displays the code itself.

Because the two bytes encoded as =E4=D7 are not valid UTF-8, of course.
Look up UTF-8 in Wikipedia, for example.
 
Lars Enderin

2014-05-30 13:22, Lars Enderin wrote:
> Because the two bytes encoded as =E4=D7 are not valid UTF-8, of course.
> Look up UTF-8 in Wikipedia, for example.

DF, not D7.
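
The validity claim can be checked with a strict decoder. In UTF-8, E4 announces a three-byte sequence and must be followed by two continuation bytes in the range 80 to BF; DF is itself a lead byte, so E4 DF is malformed. The sketch below (hypothetical class and method names, JDK only) asks a CharsetDecoder to report errors instead of silently substituting replacement characters.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class StrictUtf8Check {

    // Returns true only if the bytes form a complete, well-formed UTF-8 sequence.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;   // malformed or truncated input
        }
    }

    public static void main(String[] args) {
        byte[] firstMail = {(byte) 0xE4, (byte) 0xDF};                            // from "=E4=DF"
        byte[] secondMail = {(byte) 0xC3, (byte) 0xA4, (byte) 0xC3, (byte) 0x9F}; // from "=C3=A4=C3=9F"

        System.out.println(isValidUtf8(firstMail));   // false
        System.out.println(isValidUtf8(secondMail));  // true
    }
}
```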
 
