MIME encoding change in Python 2.4.3 (or 2.4.2? 2.4.1?) - problem and solution



I have an application that processes MIME messages. It reads a message from a file,
looks for the text/html and text/plain parts in it, performs some processing on these
parts, and outputs the new message.

Since I upgraded my Python to 2.4.3, the output messages have been coming out
garbled, as a block of junk characters.

I traced the problem back to a few lines that were removed from the email package:
the new Python no longer encodes the payload when converting the MIME message to a string.

Since my program must work on several computers, each having a different version of
Python, I had to find a way to make it work correctly no matter if msg.as_string()
encodes the payload or not.

Here is a piece of code that demonstrates how to work around this problem:

................... code start ................
import email
import email.MIMEText
import email.Charset

def do_some_processing(s):
    """Return the input text or HTML string after processing it in some way."""
    # For the sake of this example, we only do some trivial processing.
    return s.replace('foo','bar')

msg = email.message_from_string(file('input_mime_msg','r').read())
utf8 = email.Charset.Charset('UTF-8')
for part in msg.walk():
    if part.is_multipart():
        # Multipart containers carry no text of their own; their sub-parts are visited separately.
        continue
    if part.get_content_type() in ('text/plain','text/html'):
        s = part.get_payload(None, True) # True means decode the payload, which is normally base64-encoded.
        # s is now a string containing just the text or HTML of the part, not encoded in any way.

        s = do_some_processing(s)

        # Starting with Python 2.4.3 or so, msg.as_string() no longer encodes the payload
        # according to the charset, so we have to do it ourselves here.
        # The trick is to create a message-part with 'x' as payload and see if it got
        # encoded or not.
        should_encode = (email.MIMEText.MIMEText('x', 'html', 'UTF-8').get_payload() != 'x')
        if should_encode:
            s = utf8.body_encode(s)

        part.set_payload(s, utf8)
        # The next two lines may be necessary if the original input message uses a different
        # encoding than the one used in the email package. In that case we have to replace the
        # Content-Transfer-Encoding header to indicate the new encoding.
        del part['Content-Transfer-Encoding']
        part['Content-Transfer-Encoding'] = utf8.get_body_encoding()

................... code end ................
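
For anyone reading this on a current Python 3, here is a rough sketch of the same probe idea. It assumes the Python 3 module names email.mime.text and email.charset, which are not what the code above uses. The names library_encodes_on_construction and replace_text_payload are made up for this illustration. I have not checked it against every Python version, so treat it as a starting point rather than a drop-in fix.

```python
from email.charset import Charset
from email.mime.text import MIMEText


def library_encodes_on_construction():
    """Probe: does building a text part already transfer-encode its payload?

    Same trick as in the post: create a part whose payload is 'x' and check
    whether the stored payload is still the literal 'x'.
    """
    probe = MIMEText('x', 'html', 'UTF-8')
    return probe.get_payload() != 'x'


def replace_text_payload(part, text):
    """Store `text` in an existing text part as UTF-8 (hypothetical helper)."""
    utf8 = Charset('UTF-8')
    # Remove the old header first. With no Content-Transfer-Encoding header
    # present, set_payload() encodes the body itself and adds a matching header.
    del part['Content-Transfer-Encoding']
    part.set_payload(text, utf8)
    return part


if __name__ == '__main__':
    part = replace_text_payload(MIMEText('foo', 'plain', 'UTF-8'), 'bar')
    print(library_encodes_on_construction())
    print(part['Content-Transfer-Encoding'])
    print(part.get_payload(decode=True).decode('utf-8'))
```

Deleting the header before calling set_payload() lets the library choose the encoding and write the header in one step, so the manual body_encode() call from the original workaround should not be needed in this variant.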

Hope this helps someone out there.
(Permission is hereby granted for anybody to use this piece of code for any purpose whatsoever)

