Internationalised email subjects


bugmagnet

I am writing a simple email program in Python that will send out
emails containing Chinese characters in the message subject and body.
I am not having any trouble getting the email body displayed correctly
in Chinese inside the email client. However, the email subject and
sender name (which are also in Chinese) are garbled in the email
client.

Here is the code snippet:

writer = MimeWriter.MimeWriter(out)
headers = {"From": senderName + ' <' + senderEmail + '>',
           "To": recipientEmail, "Reply-to": senderEmail}

writer.addheader("Subject", subject)
writer.addheader("MIME-Version", "1.0")
writer.addheader('From', headers['From'])
writer.addheader('To', headers['To'])
writer.addheader('Reply-to', headers['Reply-to'])

I'm quite new to Python (and programming in general) and am having a
hard time wrapping my head around the internationalization functions
of Python, so was hoping someone could point me in the right
direction. Is there a different method I need to use in order for
the sender name and subject to be displayed correctly? Is there an
extra step I am missing? Some sample code would be very helpful.

Thanks!
 

bugmagnet

Thanks Martin, I actually have read that page before. The part that
confuses me is the line:

h = Header('p\xf6stal', 'iso-8859-1')

I have tried using:

h = Header(' ', 'GB2312')

but when I run the code, I get the following error:

UnicodeDecodeError: 'gb2312' codec can't decode bytes in position 2-3:
illegal multibyte sequence

Is there something I need to do in order to encode the Chinese
characters into the GB2312 character set?
 

bugmagnet

Seems some characters are missing from my last post. The line that
says:

h = Header(' ', 'GB2312')

should say:

h = Header(' ', 'GB2312')
 

bugmagnet

That's really strange. The Chinese characters I am inputting into the
post are not being displayed. Basically, what I am doing is this:

h = Header('(Some Chinese characters inserted here', 'GB2312')

And when I run this code, I receive the following error message:

UnicodeDecodeError: 'gb2312' codec can't decode bytes in position 2-3:
illegal multibyte sequence

Any idea what I may be doing wrong? How do I convert Chinese
characters into something like p\xf6stal in the original code posted
by Martin? Can someone point me in the right direction? I'm not even
sure what class/method to look into for this.
 

Gabriel Genellina

bugmagnet said:
That's really strange. The Chinese characters I am inputting into the
post are not being displayed. Basically, what I am doing is this:

h = Header('(Some Chinese characters inserted here', 'GB2312')

And when I run this code, I receive the following error message:

UnicodeDecodeError: 'gb2312' codec can't decode bytes in position 2-3:
illegal multibyte sequence

If you execute: print "some chinese characters", do you get the right
results?
Are you sure your system is using gb2312? In case you don't know and don't
trust autodetection, try something like this:

py> from unicodedata import *
py> name("á".decode("latin-1"))
'NO-BREAK SPACE'
py> name("á".decode("cp850"))
'LATIN SMALL LETTER A WITH ACUTE'

The first attempt shows the wrong name, so my console *cannot* be using
latin-1. With cp850 I got the right results, so it *might* be cp850 (it
may also be another encoding that happens to match this single character).
Further tests may reveal that it is actually cp850.
You should try with "some chinese characters" and see if your encoding is
actually gb2312.
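
A rough Python 3 version of the same check, offered only as a sketch (the session above is Python 2). It decodes one byte string with several candidate codecs and prints the Unicode names of the resulting characters, so a wrong guess shows up as an unexpected name. The helper name describe_decodings and the sample byte are illustrative assumptions, not something from the thread.

```python
import unicodedata


def describe_decodings(raw, codecs_to_try):
    """Map each codec name to the character names it yields for raw.

    If a codec cannot decode the bytes (or does not exist), the value is
    a short error string instead of a list of names.
    """
    results = {}
    for codec in codecs_to_try:
        try:
            text = raw.decode(codec)
        except (UnicodeDecodeError, LookupError) as exc:
            results[codec] = "cannot decode: %s" % exc.__class__.__name__
            continue
        results[codec] = [unicodedata.name(ch, "UNNAMED") for ch in text]
    return results


# 0xA0 is the byte a cp850 console produces for the letter a-acute.
for codec, names in describe_decodings(b"\xa0", ["latin-1", "cp850"]).items():
    print(codec, names)
```

With real data, the bytes read from the console or database would replace the sample byte, and "gb2312" would be one of the candidates.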
 

Evan Klitzke

bugmagnet said:
That's really strange. The Chinese characters I am inputting into the
post are not being displayed. Basically, what I am doing is this:

You're not sending your email in UTF-8 (or another encoding that would
permit Chinese characters). Your email header shows:

Content-Type: text/plain; charset="us-ascii"

You probably need to reconfigure your mail client to send Chinese characters.
 

Ben Finney

bugmagnet said:
Seems some characters are missing from my last post. The line that
says:

h = Header(' ', 'GB2312')

should say:

h = Header(' ', 'GB2312')

Your message has this field in the header:

Content-Type: text/plain; charset="us-ascii"

which is why the non-ASCII characters don't appear. This is the fault
of Google's charset munging.

Please, people who use Google for mail and Usenet, kick them until
they present "utf-8" as the default encoding, instead of downgrading
to "us-ascii".
 

Evan Klitzke

Ben Finney said:
Your message has this field in the header:

Content-Type: text/plain; charset="us-ascii"

which is why the non-ASCII characters don't appear. This is the fault
of Google's charset munging.

Please, people who use Google for mail and Usenet, kick them until
they present "utf-8" as the default encoding, instead of downgrading
to "us-ascii".

Ironically, you're sending out us-ascii encoded emails as well. Like
it or not, 7-bit ASCII is the standard for SMTP, so it's a reasonable
default character encoding to send MIME encoded messages in -- and
it's trivial to change the outgoing character set to UTF-8 in
Gmail/Google Apps.
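
To illustrate how MIME keeps a header within 7-bit ASCII, here is a small sketch using Python 3's email.header module (the thread itself uses Python 2). It takes the 'p\xf6stal' example quoted earlier in the thread, encodes it as an RFC 2047 encoded word, and decodes it back. The variable names are illustrative.

```python
from email.header import Header, decode_header

# Encode a non-ASCII header value as an RFC 2047 "encoded word".
encoded = Header("p\xf6stal", "iso-8859-1").encode()
print(encoded)  # contains only ASCII characters

# decode_header returns a list of (bytes, charset) pairs.
raw, charset = decode_header(encoded)[0]
print(raw.decode(charset))
```

The encoded form names its own charset, which is how a mail client can display the original characters even though the header travels as plain ASCII.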
 

Ben Finney

Evan Klitzke said:
Ironically, you're sending out us-ascii encoded emails as well.

Yes, because I was (a) replying to a message already in that encoding,
and (b) that encoding was sufficient to encode all the characters in
my message.

The original poster, by contrast, says that he posted a message with
Chinese characters, and Google munged that message to the "us-ascii"
charset.
 
Martin v. Löwis

bugmagnet said:
That's really strange. The Chinese characters I am inputting into the
post are not being displayed. Basically, what I am doing is this:

h = Header('(Some Chinese characters inserted here', 'GB2312')

What encoding do "Some Chinese characters" have at that point?

1. Don't try this at the interactive prompt. It will completely confuse
you. Instead, use IDLE.
2. In IDLE, put
# -*- coding: utf-8 -*-
into the top of the source code file.
3. Write the header as a Unicode string, i.e. with a u prefix
4. Explicitly encode it, such as

h = Header(u'(Some Chinese characters inserted here'.encode('GB2312'), 'GB2312')

If you are *not* inserting the characters from the Python source
code directly, go back to my original question: What are the
characters encoded in?
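
For readers on Python 3, the same idea might look like the sketch below. It is not the Python 2 code discussed in this thread: in Python 3 every str is already Unicode, so only bytes from an external source need decoding. The sample text, the address sender@example.com, and all variable names are made up for illustration, and the sketch swaps in UTF-8 as the header charset where the thread's code passes 'GB2312'.

```python
from email.header import Header, decode_header
from email.utils import formataddr

# Case 1: text written in the source file is already a Unicode str.
subject_text = "\u4e2d\u6587\u4e3b\u9898"  # four Chinese characters, as escapes

# Case 2: bytes from an external source (for example a database) must be
# decoded with the encoding they were actually stored in. GB2312 is assumed.
subject_bytes = subject_text.encode("gb2312")  # stand-in for a database value
subject_from_db = subject_bytes.decode("gb2312")

# Build an ASCII-safe Subject value from the decoded text.
subject_header = Header(subject_from_db, "utf-8").encode()

# formataddr encodes a non-ASCII display name and leaves the address alone.
sender_name = "\u5f20\u4e09"  # hypothetical sender name
from_header = formataddr((sender_name, "sender@example.com"), charset="utf-8")

print(subject_header)
print(from_header)
```

The key point is the one made above: know which encoding the incoming bytes use, decode them once, and let the email package do the header encoding.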

HTH,
Martin
 

bugmagnet

Thanks Martin,

The "Some Chinese characters" are loaded from a MySQL table and are
encoded in GB2312 format.

I've added the following line at the top of the code:

# -*- coding: GB2312 -*-

I've also added the following line into the code:

h = Header(subject.encode('GB2312'), 'GB2312')

Note that the 'subject' variable consists of GB2312 encoded text, so I
am not sure if it is necessary to call the subject.encode('GB2312')
method. When I try to execute this code, I get the following error:

File "/home/web88/html/app/test.py", line 17,
in Header(subject.encode('GB2312'), 'GB2312')
LookupError: unknown encoding: GB2312

Any idea what may be wrong?
 

bugmagnet

Thanks Richie,

I've tried removing the encode('GB2312') line, so the code looks like
this:

h = Header(subject, 'GB2312')

However, this line still causes the following error message:

Traceback (most recent call last):
File "/home/web88/html/app/sendmail.py", line 314, in
h = Header(subject, 'GB2312')
File "/usr/lib/python2.2/email/Header.py", line 188, in __init__
self.append(s, charset, errors)
File "/usr/lib/python2.2/email/Header.py", line 272, in append
ustr = unicode(s, incodec, errors)
LookupError: unknown encoding: gb2312 )

Any ideas?
 

Gabriel Genellina

bugmagnet said:
I've tried removing the encode('GB2312') line, so the code looks like
this:

h = Header(subject, 'GB2312')

However, this line still causes the following error message:

Traceback (most recent call last):
File "/home/web88/html/app/sendmail.py", line 314, in
h = Header(subject, 'GB2312')
File "/usr/lib/python2.2/email/Header.py", line 188, in __init__
self.append(s, charset, errors)
File "/usr/lib/python2.2/email/Header.py", line 272, in append
ustr = unicode(s, incodec, errors)
LookupError: unknown encoding: gb2312 )

It appears that you don't have the gb2312 codec - maybe it was not
available with your rather old Python version (2.2). Upgrading to a newer
version may help.
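
One quick way to confirm whether a codec is present on a given installation is to ask the codecs module directly. The sketch below is written for Python 3, and the helper name codec_available is an illustrative assumption.

```python
import codecs


def codec_available(name):
    """Return True if this Python installation has a codec called name."""
    try:
        codecs.lookup(name)
    except LookupError:
        return False
    return True


for name in ("gb2312", "utf-8", "no-such-codec"):
    print(name, codec_available(name))
```

Running this on the server that raises the LookupError would show whether the problem is the installation rather than the calling code.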
 

bugmagnet

I'm an idiot! Gabriel, you're right! Turns out the ISP was running
Python 2.3, which has known issues with the GB2312 codec. They've
upgraded to 2.4 and now everything runs smoothly!
 
