UTF-8 subject in an email


Yohan N. Leder

Hello,

I would like to send UTF-8 text and a binary attachment in an email.
So, for this, I've built a multipart/mixed MIME message : no problem
with that.

My problem is with the UTF-8 encoding.

The two parts which can contain non-ASCII characters using UTF-8
encoding are the subject and the body. So, I first tried to encode
the subject line as described in http://www.ietf.org/rfc/rfc2047.txt. It
gives something like this :

use utf8;
use open ':utf8';
use Encode;
my $subject = encode('MIME-B', "boîte");

The email is sent and received fine, but the email client I'm using (Eudora)
displays the accented "i" as "î".

I've also tried without Encode::MIME::Header, in several ways, like :

use MIME::Base64;
my $subject = "=?UTF-8?B?".encode_base64("boîte", "")."?=";
my $subject = "=?UTF-8?B?".encode_base64(encode_utf8("boîte"), "")."?=";

And I even tried without "use open ':utf8';", which applies to the sendmail
pipe opened through: open (SENDMAIL, "|sendmail -t") or die "err open sendmail"

So, what's wrong ?

And I haven't looked at the body yet :-(
 

Bart Van der Donck

Yohan said:
I would like to send UTF-8 material and binary attachment in an email.
So, for this, I've built a multipart/mixed MIME message : no problem
about this.

But, my problem is about UTF-8 encoding.

The two parts which can contains no-ascii characters using UTF-8
encoding are : the subject and the body. So, I've first tried to encode
the subject line as indicated at http://www.ietf.org/rfc/rfc2047.txt. It
gives something like this :

use utf8;
use open ':utf8';
use Encode;
my $subject = encode('MIME-B', "boîte");

Email is well sent and received, but the client email I'm using (Eudora)
display the accentuated "i" as "î".

I've also tried not using Encode::MIME::Header, with several ways like :

my $subject = "=?UTF-8?B?".2b64("boîte")."?=";
my $subject = "=?UTF-8?B?".2b64(encode_utf8("boîte"))."?=";

And, even without "use open ':utf8';" which is used at sendmail pipe
time through "open (SENDMAIL, "|sendmail -t") or die "err open sendmail"

So, what's wrong ?

I think this is not a direct UTF-8 issue. If you encode the subject
line correctly to 7-bit ascii, it's then up to the email client how it
will decode the subject. I think there's not much more you can do.
Maybe Eudora doesn't like RFC2047 or UTF-8, or has only limited support
for it.

Does it work in Eudora when you have a line like:

Subject: =?utf-8?Q?bo=C3=AEte?=

or

Subject: =?utf-8?B?Ym/DrnRl?=

instead of 'boîte' ?

I would suggest testing your subject lines in as many email clients as
possible.
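
To check that the two test lines above really carry the same text, independently of any mail client, here is a minimal sketch. It assumes only the core Encode module; the variable names are illustrative and not taken from the thread.

```perl
use strict;
use warnings;
use Encode qw(decode);

# The two test subjects from above: Q (quoted-printable style) and B (base64).
my @encoded = ('=?utf-8?Q?bo=C3=AEte?=', '=?utf-8?B?Ym/DrnRl?=');

# decode('MIME-Header', ...) turns an RFC 2047 encoded-word into a Perl
# character string.
my @decoded = map { decode('MIME-Header', $_) } @encoded;

# Print as UTF-8 so the terminal shows the accented character.
binmode STDOUT, ':encoding(UTF-8)';
print "$_\n" for @decoded;
```

If both lines come out as the same word here but not in the mail client, the encoding side is fine and the client is the part to suspect.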
 

Dr.Ruud

Yohan N. Leder wrote:
use utf8;

This means that (you state that) your source code is in UTF-8 encoding.

my $subject = encode('MIME-B', "boîte");

That "boîte" is probably not in UTF-8 but in Latin1?

Try "bo\x{EE}te", or just remove the "use utf8;" line.

Have you read the new perlunitut already?
(I have no URL, use Google)
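
The difference between a character string and its UTF-8 bytes, and the garbage that appears when a reader treats those bytes as Latin-1, can be reproduced without any mail client. This is only a sketch using the core Encode module; the variable names are made up for illustration.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

my $chars = "bo\x{EE}te";               # 5 characters, one of them U+00EE
my $bytes = encode('UTF-8', $chars);    # 6 bytes: U+00EE becomes 0xC3 0xAE

# What a client effectively does when it ignores the charset and
# assumes Latin-1: each byte is taken as one character.
my $misread = decode('ISO-8859-1', $bytes);

printf "%d characters, %d bytes\n", length $chars, length $bytes;

binmode STDOUT, ':encoding(UTF-8)';
print "$misread\n";   # the two bytes show up as "Ã" and "®"
```

So a subject displayed as "boîte" suggests the UTF-8 bytes arrived intact and were simply read with the wrong charset.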
 

Yohan N. Leder

I think this is not a direct UTF-8 issue. If you encode the subject
line correctly to 7-bit ascii, it's then up to the email client how it
will decode the subject. I think there's not much more you can do.

I do indeed declare the header with "Content-Transfer-Encoding: 7bit".

Maybe Eudora doesn't like RFC2047 or UTF-8, or has only limited support
for it.

Hmm, after your sentence above, I had a doubt and searched quickly for
"eudora utf-8" in Google... And I've found this page
<http://eudorabb.qualcomm.com/printthread.php?t=3227> where we can read
: "The e-mail standard says that all e-mail programs modified 1999 or
later must (read "must"!) understand UTF-8 encoded e-mails. The people
at Qualcomm didn't manage to make Eudara UTF-8 compliant, though. So e-
mails you receive that are in UTF-8 may contain funny characters. Change
to a different client? No, Eudora is fine. It is among the best e-mail
clients I have ever seen. Except for this bug, that is..."

I don't know if it's still the case (I'm using the latest one : Eudora
7.0), but it sounds strange for a pioneer like Eudora...

Well, I've tried the Eudora plugin they talk about (called utf8iso) and :

- it works partially for the body : it honours the "Content-Type:
text/plain; charset=utf-8" header, but "Content-Transfer-Encoding:
8bit" appears in the body as if it were plain text rather than a header.

- it doesn't do anything for the encoded subject line.

What are the Qualcomm guys doing ? Moving to Unicode has been required
since 1999 according to that page.

Does it work in Eudora when you have a line like:
Subject: =?utf-8?Q?bo=C3=AEte?=
or
Subject: =?utf-8?B?Ym/DrnRl?=

I got the same : no UTF-8 interpretation at all... But now it's obvious
that it's Eudora's fault :( It's a pity !

I would suggest to test your subject lines in as much emailclients as
possible.

So, indeed, I'll have to check in several email clients, but (and it's
a big but) it's really a pity given Eudora's popularity...
 

Yohan N. Leder

Yohan N. Leder wrote:


This means that (you state that) your source code is in UTF-8 encoding.

Yes ! Edited under Komodo only.

That "boîte" is probably not in UTF-8 but in Latin1?

No, for the reason given above : all literals are automatically in
UTF-8.

Try "bo\x{EE}te",

No change, I still get "boîte" as the subject line in Eudora (even
with the utf8iso plugin).

or just remove the "use utf8;" line.

Still the same, whichever way I try : without "use utf8;", going
through "=?utf-8?B?..." or "=?utf-8?Q?..."

Have you read the new perlunitut already?
(I have no URL, use Google)

Found here <http://users.tkk.fi/~jhi/perlunitut.pod>. Sounds like the
same as perluniintro (from memory).
 

Yohan N. Leder

I would like to send UTF-8 material and binary attachment in an email.
So, for this, I've built a multipart/mixed MIME message : no problem
about this.

But, my problem is about UTF-8 encoding.

Email is well sent and received, but the client email I'm using (Eudora)
display the accentuated "i" as "î".

OK, I've searched a little and found that some popular email clients and
webmail interfaces don't handle Unicode at all. For example,
take a look at :

http://www.questionpoint.org/crs/html/help/en/home/home_unicode.html

It's a pity, but I can't ignore it, simply because a lot of my users
will use some of those (Eudora first of all).

So, even if all of my scripts now handle Unicode (UTF-8) at every
level, I have to find a way to send their emails in a
form that is supported by the majority of popular email clients.

So, I've thought about an algorithm, knowing that the subject and body
come from UTF-8 strings.

About Subject :
------------
Since the emails my scripts have to send are reports and notifications, I
can decide the subject myself (thus removing the current possibility for
the user to change it) and manage to provide subjects in ISO-8859-1 only.
However, since something like Eudora just ignores the word-encoding, I will
simply pass the subject as it is, without charset specification.

About Body :
----------
Here it's different, because it's a mix of text from the script itself
and checked user input... So, it may contain any characters and signs,
including accented upper/lowercase letters, euro, percent, etc. So, here is
what I've thought of, awaiting your opinion :

1 - Does the body contain at least one character which cannot be
represented in ISO-8859-1 ?

2a - No => I add "binmode(SENDMAIL,':encoding(iso-8859-1)')" after opening
the pipe, and just indicate "Content-Type: text/plain; charset=iso-8859-1"
ahead of the body part.

Yes => I encode the body in Base64 and send it as an attachment (a second
one, since the message may already contain one) using
"Content-Transfer-Encoding: base64" and "Content-Type: text/plain;
name=notif.txt", or "Content-Type: text/html; name=notif.htm" with the
charset indicated as UTF-8 in an HTML file containing the minimum required
tags and the notification content.

2b - Or, as a replacement for 2a, I build an HTML email indicating the UTF-8
charset, hoping the most used clients are able to read HTML and that users
haven't unchecked this option in their email client.
What do you think about this solution ? Do you have another idea ? Do
you see something wrong there ?
 

Dr.Ruud

Yohan N. Leder wrote:
Ruud:

Yes ! Edited under Komodo only.
OK.



Found here <http://users.tkk.fi/~jhi/perlunitut.pod>. Sounds like the
same as perluniintro (from memory).

That is indeed perluniintro. I meant a new one, by Juerd.



Check the "Content" header fields too, you'll need ones like:
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 8bit
for the body to be recognized as encoded in UTF-8.
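
As a rough illustration of how those header fields sit next to a UTF-8 body, the sketch below assembles a single-part message as a string of bytes. It is an assumption-laden example, not the original poster's code: the sample text, the variable names and the added "MIME-Version" line are mine, and a real message would also need From, To and so on.

```perl
use strict;
use warnings;
use Encode qw(encode);

my $subject = "bo\x{EE}te";
my $body    = "Une bo\x{EE}te \x{E0} 5 \x{20AC}\n";   # includes the euro sign

# Header lines stay 7-bit; the body is written out as raw UTF-8 bytes.
my $message = join '',
    'Subject: ' . encode('MIME-Header', $subject) . "\n",
    "MIME-Version: 1.0\n",
    qq{Content-Type: text/plain; charset="UTF-8"\n},
    "Content-Transfer-Encoding: 8bit\n",
    "\n",                                  # blank line ends the headers
    encode('UTF-8', $body);

# $message already holds encoded bytes, so the output handle needs no
# :utf8 layer (adding one would encode the body a second time).
print $message;
```

Encoding by hand and leaving the handle without an encoding layer is one way to avoid double encoding; the alternative is to keep character strings and let a layer on the handle do the encoding, but not both at once.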


A compliant Subject-header can only have 7-bit characters, so you need
to do

use utf8 ;
...
my $subject = encode( 'MIME_Header', 'Subject: boîte' ) ;


Test:

perl -MEncode -le 'print "Subject: " . encode("MIME-Header", "bo\x{00EE}te")'

prints: Subject: =?UTF-8?B?Ym/DrnRl?=
 

Dr.Ruud

Dr.Ruud wrote:
my $subject = encode( 'MIME_Header', 'Subject: boîte' ) ;

Better:

my $subject = 'Subject: ' . encode( 'MIME_Header', 'boîte' ) ;

because the single space after the colon after the header field name is
expected by some systems.
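
A small self-check along these lines can confirm that the resulting header line is 7-bit clean and decodes back to the original text. It is only a sketch with the core Encode module; the variable names are illustrative.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Character string; \x{EE} avoids any doubt about the source file encoding.
my $text = "bo\x{EE}te";

# Encode only the value; the field name and ": " stay outside.
my $header = 'Subject: ' . encode('MIME-Header', $text);

# A compliant header line contains printable 7-bit ASCII only.
my $is_ascii = $header =~ /\A[\x20-\x7E]+\z/ ? 1 : 0;

# Decoding the encoded value should give back the original characters.
(my $value = $header) =~ s/\ASubject: //;
my $round_trip = decode('MIME-Header', $value);

print "$header\n";
print "7-bit only: ",    ($is_ascii            ? "yes" : "no"), "\n";
print "round trip ok: ", ($round_trip eq $text ? "yes" : "no"), "\n";
```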
 

Andrzej Adam Filip

Yohan N. Leder said:
I would like to send UTF-8 material and binary attachment in an email.
So, for this, I've built a multipart/mixed MIME message : no problem
about this.

But, my problem is about UTF-8 encoding.

The two parts which can contains no-ascii characters using UTF-8
encoding are : the subject and the body. So, I've first tried to encode
the subject line as indicated at http://www.ietf.org/rfc/rfc2047.txt. It
gives something like this :

use utf8;
use open ':utf8';
use Encode;
my $subject = encode('MIME-B', "boîte");
[...]

Why do you use base64 encoding (MIME-B) instead of quoted-printable (MIME-Q)?
Quoted-printable is the best choice for encoding "mostly" ASCII text.
 

Yohan N. Leder

Why do you use binary encoding instead of quoted-printable? [MIME-Q]
Quoted-printable is the best choice for encoding of "mostly" ascii text.

I've tried both Q and B, and both were ignored by Eudora on arrival.
 

Yohan N. Leder

Check the "Content" header fields too, you'll need ones like:
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 8bit
for the body to be recognized as encoded in UTF-8.

Already done, but now we know the culprit is Eudora, which ignores the
charset.

A compliant Subject-header can only have 7-bit characters, so you need
to do
use utf8 ;
my $subject = encode( 'MIME_Header', 'Subject: boîte' ) ;
Test:
perl -MEncode -le 'print "Subject: " . encode("MIME-Header", "bo\x{00EE}te")'
prints: Subject: =?UTF-8?B?Ym/DrnRl?=

Agreed, but I'm now sure that Eudora (like some other email clients) is
solely responsible.

So, I've decided to check whether the text is convertible to ISO-8859-1
using encode() with CHECK in eval{}, then convert it if possible or take an
alternative route if not (such as sending the text as an attachment too, or
building an HTML email; not chosen yet)
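
The check described above might look roughly like this. It is a sketch only: pick_body_encoding is a hypothetical helper name, and returning plain UTF-8 bytes in the fallback branch is a stand-in for whichever alternative (attachment or HTML email) ends up being chosen.

```perl
use strict;
use warnings;
use Encode qw(encode FB_CROAK LEAVE_SRC);

# Hypothetical helper: takes a Perl character string and returns
# (charset name, encoded bytes).
sub pick_body_encoding {
    my ($text) = @_;

    # FB_CROAK makes encode() die on characters ISO-8859-1 cannot hold;
    # LEAVE_SRC stops encode() from modifying $text in place.
    my $latin1 = eval { encode('ISO-8859-1', $text, FB_CROAK | LEAVE_SRC) };
    return ('ISO-8859-1', $latin1) if defined $latin1;

    # Placeholder fallback (an assumption): keep the text as UTF-8.
    return ('UTF-8', encode('UTF-8', $text));
}

my ($cs1) = pick_body_encoding("bo\x{EE}te");          # fits in ISO-8859-1
my ($cs2) = pick_body_encoding("prix : 5 \x{20AC}");   # the euro sign does not
print "$cs1\n$cs2\n";
```

Note that the euro sign is one of the characters that has no place in ISO-8859-1, so any body containing it would take the fallback branch.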
 

Dr.Ruud

Yohan N. Leder wrote:
Ruud:

Already done, but now we know the responsible is Eudora which ignore
the charset.

Some mailers assume that the charset for the body can also be used for
the Subject, certainly when there are characters >127 in the Subject.
But that is not compliant.

About Eudora, see also:
http://www.cit.cornell.edu/computer/email/thunderbird/overview.html
http://windharp.de/software/utf8iso.htm

Agreed, but sure now Eudora (as some others clients emails) is the
only responsible.

Eudora is dead.

Not all Eudora's were the same:
http://www.bd8.com/eudora/multilingual/

So, I've decided to check if convertible to ISO-8859-1 using encode()
with CHECK in eval{}, then do it if possible or take an alternative
way if not (like : sending text as attachment too, or build an HTML
oriented email ; not choosen at this time)


Yes, that is the nice way.
 

Yohan N. Leder


Yes, thanks, downloaded and installed yesterday, but it doesn't do the
job in every case :

- it works partially for the body : it honours the "Content-Type:
text/plain; charset=utf-8" header, but "Content-Transfer-Encoding:
8bit" appears in the body as if it were plain text rather than a header.

- it doesn't do anything for the word-encoded subject in some cases (not
the context in mind).

Well, anyway, it seems there are other clients around which ignore
UTF-8 : so, no choice !

Eudora is dead.
Not all Eudora's were the same:
http://www.bd8.com/eudora/multilingual/

I hope not. It shouldn't be so hard to add UTF-8 support to all future
versions, whatever the localization. Or maybe to the original
English one to start with. Or, why not, as a first step, Qualcomm and
the like could integrate the utf8iso plugin into the standard package, then
improve it.

Yes, that is the nice way.

Thanks, it's in progress...
 
