Internationalised email subjects

bugmagnet · Jun 20, 2007

I am writing a simple email program in Python that will send out
emails containing Chinese characters in the message subject and body.
I am not having any trouble getting the email body displayed correctly
in Chinese inside the email client, however the email subject and
sender name (which are also in Chinese) are garbled and are not
displayed correctly in the email client.

Here is the code snippet:

writer = MimeWriter.MimeWriter(out)
headers = {"From": senderName + ' <' + senderEmail + '>',
           "To": recipientEmail, "Reply-to": senderEmail}

writer.addheader("Subject", subject)
writer.addheader("MIME-Version", "1.0")
writer.addheader('From', headers['From'])
writer.addheader('To', headers['To'])
writer.addheader('Reply-to', headers['Reply-to'])

I'm quite new to Python (and programming in general) and am having a
hard time wrapping my head around the internationalization functions
of Python, so was hoping someone could point me in the right
direction. Is there a different method I need to use in order for
the sender name and subject to be displayed correctly? Is there an
extra step I am missing? Some sample code would be very helpful.

Thanks!

Martin Skou · Jun 20, 2007

From:
http://docs.python.org/lib/module-email.header.html

>>> from email.message import Message
>>> from email.header import Header
>>> msg = Message()
>>> h = Header('p\xf6stal', 'iso-8859-1')
>>> msg['Subject'] = h
>>> print msg.as_string()

Subject: =?iso-8859-1?q?p=F6stal?=

/Martin

bugmagnet · Jun 20, 2007

Thanks Martin, I actually have read that page before. The part that
confuses me is the line:

h = Header('p\xf6stal', 'iso-8859-1')

I have tried using:

h = Header(' ', 'GB2312')

but when I run the code, I get the following error:

UnicodeDecodeError: 'gb2312' codec can't decode bytes in position 2-3:
illegal multibyte sequence

Is there something I need to do in order to encode the Chinese
characters into the GB2312 character set?

bugmagnet · Jun 21, 2007

Seems some characters are missing from my last post. The line that
says:

h = Header(' ', 'GB2312')

should say:

h = Header(' ', 'GB2312')

bugmagnet · Jun 21, 2007

That's really strange. The Chinese characters I am inputting into the
post are not being displayed. Basically, what I am doing is this:

h = Header('(Some Chinese characters inserted here', 'GB2312')

And when I run this code, I receive the following error message:

UnicodeDecodeError: 'gb2312' codec can't decode bytes in position 2-3:
illegal multibyte sequence

Any idea what I may be doing wrong? How do I convert Chinese
characters into something like p\xf6stal in the original code posted
by Martin? Can someone point me in the right direction? I'm not even
sure what class/method to look into for this.

Gabriel Genellina · Jun 21, 2007

bugmagnet said:
That's really strange. The Chinese characters I am inputting into the
post are not being displayed. Basically, what I am doing is this:

h = Header('(Some Chinese characters inserted here', 'GB2312')

And when I run this code, I receive the following error message:

UnicodeDecodeError: 'gb2312' codec can't decode bytes in position 2-3:
illegal multibyte sequence

If you execute: print "some chinese characters", do you get the right
results?
Are you sure your system is using gb2312? In case you don't know and don't
trust autodetection, try something like this:

py> from unicodedata import *
py> name("á".decode("latin-1"))
'NO-BREAK SPACE'
py> name("á".decode("cp850"))
'LATIN SMALL LETTER A WITH ACUTE'

The first attempt shows the wrong name, so my console *cannot* be using
latin-1. With cp850 I got the right results, so it *might* be cp850 (it
may also be another encoding that happens to match this single character).
Further tests may reveal that it is actually cp850.
You should try with "some chinese characters" and see if your encoding is
actually gb2312.
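
Gabriel's check carries over to Python 3, where the raw bytes have to be decoded explicitly. What follows is a minimal sketch, assuming Python 3; the helper name name_characters and the sample byte 0xA0 (what a cp850 console would produce for "á") are illustrative and not from the thread.

```python
import unicodedata


def name_characters(raw, encodings):
    """Decode raw bytes with each candidate encoding and name the characters.

    Returns a dict mapping encoding -> list of Unicode character names,
    or None when the bytes are not valid in that encoding.
    """
    results = {}
    for enc in encodings:
        try:
            text = raw.decode(enc)
        except UnicodeDecodeError:
            results[enc] = None
            continue
        # The default avoids a ValueError for characters without a name.
        results[enc] = [unicodedata.name(ch, 'UNKNOWN') for ch in text]
    return results


raw = b'\xa0'  # assumed sample: the single byte behind "á" on a cp850 console
report = name_characters(raw, ['latin-1', 'cp850', 'gb2312'])
for enc, names in report.items():
    print(enc, names)
```

A candidate that yields the character names you expected might be the right encoding; one that yields None or unrelated names can be ruled out.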

Evan Klitzke · Jun 21, 2007

bugmagnet said:
That's really strange. The Chinese characters I am inputting into the
post are not being displayed. Basically, what I am doing is this:

You're not sending your email in UTF-8 (or another encoding that would
permit Chinese characters). Your email header shows:

Content-Type: text/plain; charset="us-ascii"

You probably need to reconfigure your mail client to send Chinese characters.

Ben Finney · Jun 22, 2007

bugmagnet said:
Seems some characters are missing from my last post. The line that
says:

h = Header(' ', 'GB2312')

should say:

h = Header(' ', 'GB2312')

Your message has this field in the header:

Content-Type: text/plain; charset="us-ascii"

which is why the non-ASCII characters don't appear. This is the fault
of Google's charset munging.

Please, people who use Google for mail and Usenet, kick them until
they present "utf-8" as the default encoding, instead of downgrading
to "us-ascii".

Evan Klitzke · Jun 22, 2007

Ben Finney said:
Your message has this field in the header:

Content-Type: text/plain; charset="us-ascii"

which is why the non-ASCII characters don't appear. This is the fault
of Google's charset munging.

Please, people who use Google for mail and Usenet, kick them until
they present "utf-8" as the default encoding, instead of downgrading
to "us-ascii".

Ironically, you're sending out us-ascii encoded emails as well. Like
it or not, 7-bit ASCII is the standard for SMTP, so it's a reasonable
default character encoding to send MIME encoded messages in -- and
it's trivial to change the outgoing character set to UTF-8 in
Gmail/Google Apps.

Ben Finney · Jun 22, 2007

Evan Klitzke said:
Ironically, you're sending out us-ascii encoded emails as well.

Yes, because I was (a) replying to a message already in that encoding,
and (b) that encoding was sufficient to encode all the characters in
my message.

The original poster, by contrast, says he posted a message with
Chinese characters, and Google munged that message to the
"us-ascii" charset.

Martin v. Löwis · Jun 22, 2007

bugmagnet said:
That's really strange. The Chinese characters I am inputting into the
post are not being displayed. Basically, what I am doing is this:

h = Header('(Some Chinese characters inserted here', 'GB2312')

What encoding do "Some Chinese characters" have at that point?

1. Don't try this at the interactive prompt. It will completely confuse
you. Instead, use IDLE.
2. In IDLE, put
# -*- coding: utf-8 -*-
into the top of the source code file.
3. Write the header as a Unicode string, i.e. with a u prefix
4. Explicitly encode it, such as

h = Header(u'(Some Chinese characters inserted here'.encode('GB2312'),
'GB2312')

If you are *not* inserting the characters from the Python source
code directly, go back to my original question: What are the
characters encoded in?

HTH,
Martin
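
For readers on Python 3, where every string literal is already Unicode, the steps above might look like the sketch below. It covers both the subject and the sender name from the original question. Assumptions that are not from the thread: Python 3, UTF-8 as the header charset instead of GB2312, and the sample subject, name and addresses, which are hypothetical.

```python
from email.header import Header, decode_header, make_header
from email.message import Message
from email.utils import formataddr

# Hypothetical sample values, written as escapes so this file stays ASCII.
subject = '\u4e2d\u6587\u4e3b\u9898'
sender_name = '\u5f20\u4e09'
sender_email = 'sender@example.com'

msg = Message()
# Header() turns the non-ASCII text into RFC 2047 encoded words.
msg['Subject'] = Header(subject, 'utf-8')
# formataddr() encodes only the display name; the address stays plain ASCII.
msg['From'] = formataddr((sender_name, sender_email), 'utf-8')
msg['To'] = 'recipient@example.com'

wire_text = msg.as_string()
print(wire_text)


def decode_words(value):
    """Turn RFC 2047 encoded words back into readable text."""
    return str(make_header(decode_header(value)))


print(decode_words(msg['Subject'].encode()))
```

The header block printed by as_string() is pure ASCII, and a mail client reverses the encoding for display, much as decode_words does here.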

bugmagnet · Jun 22, 2007

Thanks Martin,

The "Some Chinese characters" are loaded from a MySQL table and are
encoded in GB2312 format.

I've added the following line at the top of the code:

# -*- coding: GB2312 -*-

I've also added the following line into the code:

h = Header(subject.encode('GB2312'), 'GB2312')

Note that the 'subject' variable consists of GB2312 encoded text, so I
am not sure if it is necessary to call the subject.encode('GB2312')
method. When I try to execute this code, I get the following error:

File "/home/web88/html/app/test.py", line 17,
in Header(subject.encode('GB2312'), 'GB2312')
LookupError: unknown encoding: GB2312

Any idea what may be wrong?
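
On the question of whether encode() is needed: text that arrives from the database as GB2312 bytes is already encoded, so the step that applies is decode(), not encode(). This is a minimal sketch, assuming Python 3; raw_subject is a hypothetical stand-in for the value read from MySQL, and using UTF-8 for the outgoing header is an assumption rather than something the thread prescribes.

```python
from email.header import Header, decode_header, make_header

# Hypothetical stand-in for the GB2312-encoded bytes loaded from the database.
raw_subject = '\u4e2d\u6587'.encode('gb2312')

# Bytes are decoded to text exactly once, using the encoding they were stored in.
subject = raw_subject.decode('gb2312')

# From here on the text is Unicode; the header charset is a separate choice.
header = Header(subject, 'utf-8')
encoded = header.encode()
print(encoded)

# Round trip: a client decoding the header recovers the original text.
round_trip = str(make_header(decode_header(encoded)))
print(round_trip == subject)
```

In Python 3 a bytes object has no encode() method at all, so calling encode() on already-encoded data fails immediately instead of producing a confusing codec error.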

bugmagnet · Jun 22, 2007

Thanks Richie,

I've tried removing the encode('GB2312') line, so the code looks like
this:

h = Header(subject, 'GB2312')

However, this line still causes the following error message:

Traceback (most recent call last):
File "/home/web88/html/app/sendmail.py", line 314, in
h = Header(subject, 'GB2312')
File "/usr/lib/python2.2/email/Header.py", line 188, in __init__
self.append(s, charset, errors)
File "/usr/lib/python2.2/email/Header.py", line 272, in append
ustr = unicode(s, incodec, errors)
LookupError: unknown encoding: gb2312

Any ideas?

Gabriel Genellina · Jun 23, 2007

bugmagnet said:
I've tried removing the encode('GB2312') line, so the code looks like
this:

h = Header(subject, 'GB2312')

However, this line still causes the following error message:

Traceback (most recent call last):
File "/home/web88/html/app/sendmail.py", line 314, in
h = Header(subject, 'GB2312')
File "/usr/lib/python2.2/email/Header.py", line 188, in __init__
self.append(s, charset, errors)
File "/usr/lib/python2.2/email/Header.py", line 272, in append
ustr = unicode(s, incodec, errors)
LookupError: unknown encoding: gb2312

It appears that you don't have the gb2312 codec - maybe it was not
available with your rather old Python version (2.2). Upgrading to a newer
version may help.
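
One way to confirm a diagnosis like this before touching the mail code is to ask the interpreter directly whether it knows the codec. This is a small sketch, assuming Python 3; the helper name codec_available is illustrative.

```python
import codecs


def codec_available(name):
    """Return True if this interpreter can look up the named codec."""
    try:
        codecs.lookup(name)
    except LookupError:
        return False
    return True


for candidate in ('gb2312', 'utf-8', 'no-such-codec'):
    print(candidate, codec_available(candidate))
```

If the lookup fails for gb2312, the LookupError in the traceback comes from the Python installation rather than from the data or the Header call.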

bugmagnet · Jun 25, 2007

I'm an idiot! Gabriel, you're right! Turns out the ISP was running
Python 2.3, which has known issues with the GB2312 codec. They've
upgraded to 2.4 and now everything runs smoothly!
