UTF8 characters not appearing correctly in email subject line

Andee Weir

Hi everyone,

Thanks for taking the time to look at this.

I've got a problem trying to send emails with 'unusual' characters
(e.g. ó) in the Subject (& contents). The following code gives an
example :-

package mail;

import javax.mail.*;
import java.util.*;
import javax.mail.internet.*;

public class Send {
    static String mailServer = "smtp.server.com";
    static String mailFrom = "(e-mail address removed)";
    static String mailTo = "(e-mail address removed)";
    static String mailSubject = "Actualización";
    static String mailBody = "Actualización";
    static Properties props = System.getProperties();

    public static void Send() {

        props.put("mail.host",mailServer);
        props.put("mail.transport.protocol","smtp");
        Session mailSession = Session.getDefaultInstance(props,null);
        mailSession.setDebug(false);
        MimeMessage msg = new MimeMessage(mailSession);
        try {
            msg.setFrom(new InternetAddress(mailFrom));
            InternetAddress[] address = {new InternetAddress(mailTo)};
            msg.setRecipients(Message.RecipientType.TO,address);
            msg.setSubject(mailSubject, "UTF8");
            msg.setSentDate(new Date());
            msg.setText(mailBody, "UTF8");
            Transport.send(msg);
        } catch (Exception e) {
        }
    }

    public static void main(String[] args) {
        Send();
    }
}

When I run this I end up with '=?UTF8?Q?Actualizaci=C3=B3n?=' in the
subject & the contents of 'This message uses a character set that is
not supported by the Internet Service. To view the original message
content, open the attached message. If the text doesn't display
correctly, save the attachment to disk, and then open it using a
viewer that can display the original character set.'.

The email server can deal with these special characters as I can use
Outlook to create the email I require.

Thanks in advance for any solutions.

Andee
 
Barry White

I am not an expert in internet mail but my understanding is that there
are still a significant number of SMTP servers that only support 7bit
ASCII encoding.

If your email happens to pass through just one of these it is likely to
get mangled. You send email to your SMTP server to send on to the final
destination. It will pass through other servers on its way and one of
these may only support ASCII.

Perhaps someone can expand on this or correct it if necessary,

Barry
 
Jon A. Cruz

Andee said:
Hi everyone,

Thanks for taking the time to look at this.

I've got a problem trying to send emails with 'unusual' characters
(e.g. ó) in the Subject (& contents). The following code gives an
example :-

static String mailSubject = "Actualización";
static String mailBody = "Actualización";

One minor point. Instead of using non-ASCII characters directly in your
sources, use Unicode escapes (\uxxxx) instead. native2ascii can help
you with this.

msg.setSubject(mailSubject, "UTF8");
msg.setSentDate(new Date());
msg.setText(mailBody, "UTF8");

Not sure if this is a problem, but as of JDK 1.2 and higher it was
changed to the "official" string of "UTF-8".

When I run this I end up with '=?UTF8?Q?Actualizaci=C3=B3n?=' in the
subject & the contents of 'This message uses a character set that is
not supported by the Internet Service. To view the original message
content, open the attached message. If the text doesn't display
correctly, save the attachment to disk, and then open it using a
viewer that can display the original character set.'.

Aha! Looks like your mail went out exactly as you told it to. That is
the proper way to encode non-ASCII mail headers.

However... the proper encoding name is "UTF-8". "UTF8" is just Java's
internal key used to identify the encoding, not its public name.
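
To illustrate the naming point, here is a minimal sketch using only the JDK (the class name CharsetNameCheck is made up for this example). It asks Java which canonical name it reports for each spelling. On the JDKs I know of, both lookups resolve to the same charset, whose public name is "UTF-8". That is why the Java side never complains about "UTF8", even though a receiving mail system may not recognise that label.

```java
import java.nio.charset.Charset;

class CharsetNameCheck {
    // Returns the canonical (public) name the JDK reports for a charset label.
    static String canonicalName(String label) {
        return Charset.forName(label).name();
    }

    public static void main(String[] args) {
        // "UTF8" is accepted as an alias, but the canonical name carries the hyphen.
        System.out.println(canonicalName("UTF8"));
        System.out.println(canonicalName("UTF-8"));
    }
}
```

In the original code, that would mean passing "UTF-8" rather than "UTF8" to both setSubject and setText.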
 
Jon A. Cruz

Barry said:
I am not an expert in internet mail but my understanding is that there
are still a significant number of SMTP servers that only support 7bit
ASCII encoding.

If your email happens to pass through just one of these it is likely to
get mangled. You send email to your SMTP server to send on to the final
destination. It will pass through other servers on its way and one of
these may only support ASCII.

Perhaps someone can expand on this or correct it if necessary,


Yes.

Non-ASCII text generally needs to get handled explicitly to make it
through mail gateways. Hence the need for the character set. Once that
is on there, intermediate mail gateways are free to change the transfer
encoding to get through.

Mail headers are a little different in how they get across. However, his
are properly encoded in ASCII-only escapes:
'=?UTF8?Q?Actualizaci=C3=B3n?='

That means "Character set is 'UTF8'" followed by "This data is Q-encoded".

The OP's problem is most likely only that he is using Java's internal
name of "UTF8" instead of the proper public name of "UTF-8".
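
To make the structure of that header value concrete, here is a rough sketch of how a mail client might decode it by hand. It uses only the JDK, and the class name QWordDecodeSketch is invented for illustration. It handles a single "Q" encoded-word and nothing else, so treat it as a reading aid rather than a real RFC 2047 parser.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;

class QWordDecodeSketch {
    // Decodes one encoded-word of the form =?charset?Q?text?=
    static String decode(String word) {
        // Splitting on '?' yields: "=", charset, encoding, text, "="
        String[] parts = word.split("\\?");
        if (parts.length != 5 || !parts[2].equalsIgnoreCase("Q")) {
            throw new IllegalArgumentException("not a Q-encoded word: " + word);
        }
        String text = parts[3];
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '=' && i + 2 < text.length()) {
                // "=XX" stands for one raw byte written as two hex digits
                bytes.write(Integer.parseInt(text.substring(i + 1, i + 3), 16));
                i += 2;
            } else if (c == '_') {
                bytes.write(' '); // underscore stands for a space in Q encoding
            } else {
                bytes.write(c);   // plain ASCII passes through unchanged
            }
        }
        // The charset label says how to turn the raw bytes back into text.
        return new String(bytes.toByteArray(), Charset.forName(parts[1]));
    }

    public static void main(String[] args) {
        System.out.println(decode("=?UTF8?Q?Actualizaci=C3=B3n?="));
    }
}
```

Java happens to accept the "UTF8" label here, so the decode works. A mail client that only knows the registered name "UTF-8" has no such alias to fall back on.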
 
Andee Weir

Thanks for the help guys - it worked a treat.

Just a word of warning - when I ran native2ascii in a DOS window & typed
in ó it returned \u00a2, which when I used it in the email code produced
a cent symbol. The actual code I required was \u00f3 (found at
http://www.unicode.org/charts/PDF/U0080.pdf).

Thanks again,

Andee
 
Michael Borgwardt

Andee said:
Thanks for the help guys - it worked a treat.

Just a word of warning - when I ran native2ascii in dos & typed in ó
it returned \u00a2 which when I used it in the email code returned a
cent symbol. The actual code I required was \u00f3

Then you probably were using different encodings for creating the file
and running native2ascii.
 
Hiran Chaudhuri

Jon A. Cruz said:
Mail headers are a little different in how they get across. However, his
are properly encoded in ASCII-only escapes:
'=?UTF8?Q?Actualizaci=C3=B3n?='

That means "Character set is 'UTF8'" followed by "This data is Q-encoded".

Watch out: UTF-8 is not ASCII

UTF-8 uses 8 bits, while ASCII uses 7 bits. You won't notice a difference as
long as you use ASCII text only, as both charsets map to the same bit values
here. But as soon as you need other characters, UTF-8 will introduce a
multibyte character with the most significant bit in the first byte set =>
no ASCII.

To ensure such text goes through 7bit clean ASCII mail transfer agents, use
something like MIME encoding, such as Base64.

Hiran
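
A quick way to see the byte-level difference described here, as a minimal JDK-only sketch (the class and method names are made up for the example): pure ASCII text produces identical bytes under US-ASCII and UTF-8, while the accented word picks up bytes with the top bit set.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class Utf8VersusAscii {
    // True when every byte has its most significant bit clear (7-bit clean).
    static boolean isSevenBitClean(byte[] data) {
        for (byte b : data) {
            if ((b & 0x80) != 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] plainAscii = "Actualizacion".getBytes(StandardCharsets.US_ASCII);
        byte[] plainUtf8 = "Actualizacion".getBytes(StandardCharsets.UTF_8);
        // The accented letter is written as an escape so the source stays ASCII.
        byte[] accented = "Actualizaci\u00f3n".getBytes(StandardCharsets.UTF_8);

        System.out.println(Arrays.equals(plainAscii, plainUtf8)); // same bytes for ASCII-only text
        System.out.println(isSevenBitClean(plainUtf8));
        System.out.println(isSevenBitClean(accented));            // the accented letter needs high-bit bytes
    }
}
```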
 
Thomas Weidenfeller

Hiran said:
Watch out: UTF-8 is not ASCII

UTF-8 uses 8 bits, while ASCII uses 7 bits.

Jon got it absolutely right. I have explained that a few times here.
Encoding information like "UTF-8" in such a mail header does not mean
that the data is in that encoding. It means the data WAS in that
encoding. It is now in plain 7-bit ASCII. The encoding information is
there so MUAs can reconstruct the original string.

To ensure such text goes through 7bit clean ASCII mail transfer agents, use
something like MIME encoding, such as Base64.

In headers you use Q or B encoding.

/Thomas
 
Jon A. Cruz

Hiran said:
Watch out: UTF-8 is not ASCII

Yes, I know. And so does the mail API the poster was using. That's the
reason for the second part of my statement "This data is Q-encoded".

UTF-8 uses 8 bits, while ASCII uses 7 bits. You won't notice a difference as
long as you use ASCII text only, as both charsets map to the same bit values
here. But as soon as you need other characters, UTF-8 will introduce a
multibyte character with the most significant bit in the first byte set =>
no ASCII.

Right.

Look at that subject line. That's what is there. Multibyte UTF-8
characters encoded into 7-bit ASCII only.

Notice the difference between "character set" and "encoding".

To ensure such text goes through 7bit clean ASCII mail transfer agents, use
something like MIME encoding, such as Base64.

Or for header lines do like the poster's mail API does and follow RFC
2047. "Q" encoding is most often used when only a few characters are
non-ASCII, or when enough of them to carry meaning are ASCII. "B"
encoding uses Base64. BTW, that subject line *is* using MIME, since it's
following RFC-2045 through RFC-2049.


However... the person using the API doesn't have to worry about the
mechanics. They pass in a Unicode string (because Java is Unicode) along
with the character set they'd like it to use, and the mail API
takes care of the rest.
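
For anyone curious what the mail API is doing underneath, here is a simplified JDK-only sketch of the two header encodings (class and method names are invented for illustration). It always uses UTF-8, escapes more characters than strictly necessary in the "Q" form, and ignores the length limit on encoded-words, so it is not a substitute for the real library.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class EncodedWordSketch {
    // "Q" form: letters and digits stay readable, every other byte becomes =XX.
    static String qEncode(String text) {
        StringBuilder out = new StringBuilder("=?UTF-8?Q?");
        for (byte b : text.getBytes(StandardCharsets.UTF_8)) {
            int v = b & 0xFF;
            boolean safe = (v >= 'a' && v <= 'z') || (v >= 'A' && v <= 'Z')
                    || (v >= '0' && v <= '9');
            if (safe) {
                out.append((char) v);
            } else if (v == ' ') {
                out.append('_');                       // space is written as underscore
            } else {
                out.append(String.format("=%02X", v)); // raw byte as two hex digits
            }
        }
        return out.append("?=").toString();
    }

    // "B" form: the whole UTF-8 byte sequence is Base64-encoded.
    static String bEncode(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return "=?UTF-8?B?" + Base64.getEncoder().encodeToString(utf8) + "?=";
    }

    public static void main(String[] args) {
        String subject = "Actualizaci\u00f3n";
        System.out.println(qEncode(subject)); // =?UTF-8?Q?Actualizaci=C3=B3n?=
        System.out.println(bEncode(subject));
    }
}
```

Either way the result is pure 7-bit ASCII. The "Q" form simply leaves a mostly-ASCII subject legible to a human reading the raw headers.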
 
Jon A. Cruz

Michael said:
Then you probably were using different encodings for creating the file
and running native2ascii.

Yes.

That is *exactly* one of the biggest reasons not to use non-ASCII
characters literally in sources.

"o acute" is 0xa2 in CodePage 437 (the DOS code page) and CodePage 850
(the DOS international code page), while it is 0xf3 in Latin-1
(ISO-8859-1) and CodePage 1252 (default Windows western).

One thing native2ascii lets you set is the encoding. If you've been
editing sources in Windows, you'd probably want to call it with
Cp1252 as the explicit encoding.
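
The mix-up can be reproduced in a few lines. This is a sketch that assumes the JDK in use ships the IBM437 charset (the usual desktop JDKs do, as far as I know). The class and method names are made up for the example.

```java
import java.nio.charset.Charset;

class CodePageMixup {
    // Decodes a single byte under the named charset and formats the
    // resulting character as a Java Unicode escape.
    static String escapeOf(int byteValue, String charsetName) {
        String s = new String(new byte[] {(byte) byteValue}, Charset.forName(charsetName));
        return String.format("\\u%04x", (int) s.charAt(0));
    }

    public static void main(String[] args) {
        // The DOS console produced byte 0xA2 for "o acute".
        System.out.println(escapeOf(0xA2, "IBM437"));       // read as DOS text: o acute
        System.out.println(escapeOf(0xA2, "windows-1252")); // read as Windows text: cent sign
        // In a file saved by a Windows editor, "o acute" is byte 0xF3.
        System.out.println(escapeOf(0xF3, "windows-1252"));
    }
}
```

The middle line is the case Andee hit: a byte typed under the DOS code page, then interpreted under a Windows/Latin-1 style encoding.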
 