Problems with email.Generator.Generator

Chris Withers

Hi All,

The following piece of code is giving me issues:

from email.Charset import Charset,QP
from email.MIMEText import MIMEText
charset = Charset('utf-8')
charset.body_encoding = QP
msg = MIMEText(
    u'Some text with chars that need encoding: \xa3',
    'plain',
    )
msg.set_charset(charset)
print msg.as_string()

Under Python 2.4.2, this produces the following output, as I'd expect:

MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Content-Type: text/plain; charset="utf-8"

Some text with chars that need encoding: =A3

However, under Python 2.4.3, I now get:

Traceback (most recent call last):
  File "test_encoding.py", line 14, in ?
    msg.as_string()
  File "c:\python24\lib\email\Message.py", line 129, in as_string
    g.flatten(self, unixfrom=unixfrom)
  File "c:\python24\lib\email\Generator.py", line 82, in flatten
    self._write(msg)
  File "c:\python24\lib\email\Generator.py", line 113, in _write
    self._dispatch(msg)
  File "c:\python24\lib\email\Generator.py", line 139, in _dispatch
    meth(msg)
  File "c:\python24\lib\email\Generator.py", line 182, in _handle_text
    self._fp.write(payload)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xa3' in position 41: ordinal not in range(128)

This seems to be the result of this change:

http://svn.python.org/view/python/b...mail/Generator.py?rev=42272&r1=37910&r2=42272

...which is referenced as part of a fix for this bug:

http://sourceforge.net/tracker/?func=detail&aid=1409455&group_id=5470&atid=105470

Now, is this change to Generator.py in error or am I doing something wrong?

If the latter, how can I change my code such that it works as I'd expect?

cheers,

Chris
 
Manlio Perillo

Chris Withers wrote:
[...]
However, under Python 2.4.3, I now get:

Try with:

msg = MIMEText(
    u'Some text with chars that need encoding: \xa3',
    _charset='utf-8',
    )


and you will obtain the error:


Traceback (most recent call last):
  File "<pyshell#4>", line 3, in -toplevel-
    _charset='utf-8',
  File "C:\Python2.4\lib\email\MIMEText.py", line 28, in __init__
    self.set_payload(_text, _charset)
  File "C:\Python2.4\lib\email\Message.py", line 218, in set_payload
    self.set_charset(charset)
  File "C:\Python2.4\lib\email\Message.py", line 260, in set_charset
    self._payload = charset.body_encode(self._payload)
  File "C:\Python2.4\lib\email\Charset.py", line 366, in body_encode
    return email.base64MIME.body_encode(s)
  File "C:\Python2.4\lib\email\base64MIME.py", line 136, in encode
    enc = b2a_base64(s[i:i + max_unencoded])
UnicodeEncodeError: 'ascii' codec can't encode character u'\xa3' in position 41: ordinal not in range(128)




Regards Manlio Perillo
 
Chris Withers

Manlio said:
Try with:

[...]

OK, but I fail to see how replacing one unicode error with another is
any help... :-S

Chris
 
Peter Otten

Chris said:
[...]

Now, is this change to Generator.py in error or am I doing something
wrong?

I'm not familiar enough with the email package to answer that.
If the latter, how can I change my code such that it works as I'd expect?

email.Generator and email.Message use cStringIO.StringIO internally, which
can't cope with unicode. A quick fix might be to monkey-patch:

from StringIO import StringIO
from email import Generator, Message
Generator.StringIO = Message.StringIO = StringIO
# your code here

Peter
 
Chris Withers

Peter said:
http://sourceforge.net/tracker/?func=detail&aid=1409455&group_id=5470&atid=105470

I'm not familiar enough with the email package to answer that.

I'm hoping someone around here is ;-)
email.Generator and email.Message use cStringIO.StringIO internally, which
can't cope with unicode. A quick fix might be to monkey-patch:

I'm not sure that's correct, but I'm happy to stand corrected.

My understanding is that the StringIOs don't mind as long as the type
is consistent, i.e. don't mix unicode and encoded strings, 'cos that
forces Python's default ascii codec to kick in and spew unicode errors.
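That rule of thumb (keep the types consistent, and encode before text reaches a byte-oriented sink) can be illustrated with a short sketch. It assumes Python 3's io module rather than the Python 2.4 StringIO/cStringIO discussed here: a text buffer takes str, a byte buffer takes only bytes, and the conversion has to be an explicit encode.

```python
import io

text = "Some text with chars that need encoding: \xa3"

# A text buffer accepts str as-is.
text_buf = io.StringIO()
text_buf.write(text)

# A byte buffer refuses str outright instead of guessing a codec...
byte_buf = io.BytesIO()
try:
    byte_buf.write(text)
    rejected_str = False
except TypeError:
    rejected_str = True

# ...so the text has to be encoded explicitly before it is written.
byte_buf.write(text.encode("utf-8"))
print(rejected_str, byte_buf.getvalue()[-2:])
```

Python 3 makes the mismatch a hard TypeError, whereas Python 2 silently tried the ASCII codec, which is where the "spew unicode errors" behaviour comes from.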

Now, I want to know what I'm supposed to do when I have unicode source
and want it to end up as either a text/plain or text/html mime part.

Is there a how-to for this anywhere? The email package's docs are short
on examples involving charsets, unicode and the like :-(

Chris
 
Steve Holden

Chris said:
[...]

Now, I want to know what I'm supposed to do when I have unicode source
and want it to end up as either a text/plain or text/html mime part.

Is there a how-to for this anywhere? The email package's docs are short
on examples involving charsets, unicode and the like :-(
Well, it would seem like the easiest approach is to monkey-patch the use
of cStringIO to StringIO as recommended and see if that fixes your
problem. Wouldn't it?

regards
Steve
 
Gerard Flanagan

Chris said:
Now, I want to know what I'm supposed to do when I have unicode source
and want it to end up as either a text/plain or text/html mime part.

Is there a how-to for this anywhere? The email package's docs are short
on examples involving charsets, unicode and the like :-(

no expert in this, but have you tried the codecs module?

http://docs.python.org/lib/codec-objects.html

(with 'xmlcharrefreplace' for the HTML)?
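A minimal sketch of that suggestion, assuming Python 3: encoding with the 'xmlcharrefreplace' error handler turns every character the target codec can't represent into a numeric character reference, which is safe inside an HTML part. The sample string is illustrative only.

```python
import codecs

html = "<p>Price: \xa3</p>"

# Any character the target codec can't represent becomes a numeric
# character reference (&#NNN;) instead of raising UnicodeEncodeError.
ascii_html = codecs.encode(html, "ascii", "xmlcharrefreplace")
print(ascii_html.decode("ascii"))
```

This only helps the text/html part; in a text/plain part the reference would be shown literally, so plain text still needs a real charset.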

Gerard
 
Manlio Perillo

Chris Withers wrote:
[...]

OK, but I fail to see how replacing one unicode error with another is
any help... :-S


The problem is simple: the email package does not support Unicode strings well.

For now I'm using this:

from email import Header
from email import MIMEText as _MIMEText
from email import MIMEMultipart as _MIMEMultipart

charset = "utf-8" # the charset to be used for email


class HeadersMixin(object):
    """A custom mixin, for automatic internationalized headers
    support.
    """

    def __setitem__(self, name, val, **_params):
        if isinstance(val, str):
            try:
                # only 7 bit ascii
                val.decode("us-ascii")
            except UnicodeDecodeError:
                raise ValueError("8 bit strings not accepted")

            return self.add_header(name, val)
        else:
            try:
                # to avoid unnecessary trash
                val = val.encode('us-ascii')
            except UnicodeEncodeError:
                # non-ASCII unicode: emit an RFC 2047 encoded header
                val = Header.Header(val, charset).encode()

            return self.add_header(name, val)


class MIMEText(HeadersMixin, _MIMEText.MIMEText):
    """A MIME Text message that allows only Unicode strings, or plain
    ascii (7 bit) ones.
    """

    def __init__(self, _text, _subtype="plain"):
        _charset = charset

        if isinstance(_text, str):
            try:
                # only 7 bit ascii
                _text.decode("us-ascii")
                _charset = "us-ascii"
            except UnicodeDecodeError:
                raise ValueError("8 bit strings not accepted")
        else:
            _text = _text.encode(charset)

        _MIMEText.MIMEText.__init__(self, _text, _subtype, _charset)


class MIMEMultipart(HeadersMixin, _MIMEMultipart.MIMEMultipart):
    def __init__(self):
        _MIMEMultipart.MIMEMultipart.__init__(self)



This only accepts Unicode strings or plain ascii strings.
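The core of the header handling above (leave 7-bit ASCII values alone, otherwise fall back to an RFC 2047 encoded word) can be sketched with Python 3's email.header module. This is a translation under assumptions, not Manlio's code: encode_header_value is a hypothetical helper name, and the module names differ from the 2.4 ones.

```python
from email.header import Header, decode_header, make_header


def encode_header_value(val, charset="utf-8"):
    """Hypothetical helper: return val unchanged if it is pure ASCII,
    otherwise an RFC 2047 encoded word in the given charset."""
    try:
        val.encode("us-ascii")
        return val
    except UnicodeEncodeError:
        return Header(val, charset).encode()


plain = encode_header_value("Hello")
encoded = encode_header_value("Price: \xa3")
print(plain)
print(encoded)

# Round trip: decoding the encoded word should give back the original text.
roundtrip = str(make_header(decode_header(encoded)))
print(roundtrip == "Price: \xa3")
```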




Regards Manlio Perillo
 
Chris Withers

Steve said:
Well, it would seem like the easiest approach is to monkey-patch the use
of cStringIO to StringIO as recommended and see if that fixes your
problem. Wouldn't it?

No, not really, since at best that's a nasty (and I mean really nasty)
hack. I'm using the email package as part of a library that I'm building
which is to be used with various frameworks. Monkey patching modules is
about as bad as it gets in that situation...

At worst, and most likely based on my past experience of (c)StringIO
being used to accumulate output, it won't make a jot of difference...

Chris
 
Chris Withers

Manlio said:
The problem is simple: the email package does not support Unicode strings well.

Really? All the character set support seems to indicate a fair bit of
thought went into this aspect, although it does appear that no-one
bothered to document it :-(

Chris
 
Peter Otten

Chris said:
At worst, and most likely based on my past experience of (c)StringIO
being used to accumulate output, it won't make a jot of difference...

What past experience?
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeEncodeError: 'ascii' codec can't encode character u'\x80' in position 0: ordinal not in range(128)

Peter
 
Steve Holden

Chris said:
No, not really, since at best that's a nasty (and I meant really nasty)
hack. I'm using the email package as part of a library that I'm building
which is to be used with various frameworks. Monkey patching modules is
about as bad as it gets in that situation...

At worst, and most likely based on my past experience of (c)StringIO
being used to accumulate output, it won't make a jot of difference...
Under those circumstances you probably know best ...

regards
Steve
 
Chris Withers

Peter said:
What past experience?

Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeEncodeError: 'ascii' codec can't encode character u'\x80' in position 0: ordinal not in range(128)

OK, I stand corrected, although I suspect the bug is actually in
StringIO.StringIO in that it doesn't barf on unicodes.

(Python 3000 and all that)

Which again leads us back to the email package: it used to do the right
thing from what I can see, and now it doesn't, and ends up trying to
write a unicode to a cStringIO, which (rightly, I guess) barfs...

Barry, Barry, where are you? ;-)

Chris
 
Chris Withers

Peter said:
[...]

Okay, more out of desperation than anything else, let's try this:

from email.Charset import Charset,QP
from email.MIMEText import MIMEText
from StringIO import StringIO
from email import Generator,Message
Generator.StringIO = Message.StringIO = StringIO
charset = Charset('utf-8')
charset.body_encoding = QP
msg = MIMEText(u'Some text with chars that need encoding: \xa3','plain')
msg.set_charset(charset)
print repr(msg.as_string())
u'MIME-Version: 1.0\nContent-Transfer-Encoding: 8bit\nContent-Type: text/plain; charset="utf-8"\n\nSome text with chars that need encoding: \xa3'

Yay! No unicode error, but also no use:

  File "c:\python24\lib\smtplib.py", line 692, in sendmail
    (code,resp) = self.data(msg)
  File "c:\python24\lib\smtplib.py", line 489, in data
    self.send(q)
  File "c:\python24\lib\smtplib.py", line 316, in send
    self.sock.sendall(str)
  File "<string>", line 1, in sendall
UnicodeEncodeError: 'ascii' codec can't encode character u'\xa3' in position 297: ordinal not in range(128)

The other variant I've tried is:

from email.Charset import Charset,QP
from email.MIMEText import MIMEText
charset = Charset('utf-8')
charset.body_encoding = QP
msg = MIMEText('','plain',)
msg.set_charset(charset)
msg.set_payload(charset.body_encode(u'Some text with chars that need encoding: \xa3'))
print msg.as_string()

Which is sort of okay:

MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Type: text/plain; charset="utf-8"

Some text with chars that need encoding: =A3

...except it gets the transfer encoding wrong, which means Thunderbird
shows =A3 instead of the pound sign that it should :-(

...this is down to a pretty lame bit of code in Encoders.py which
basically checks for a unicode error *sigh*
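Why the header comes out as 7bit can be shown with a simplified, hypothetical mirror of the check described above. guess_cte is not the library function; it only reproduces the "does it encode as ASCII?" idea, written for Python 3 bytes. Once the body has already been quoted-printable encoded by hand, every byte is ASCII, so the check can only conclude 7bit, and nothing tells the mail client to decode the =A3 escape.

```python
def guess_cte(payload):
    """Hypothetical, simplified 7bit/8bit guess: if the payload bytes
    are all ASCII, call it 7bit, otherwise 8bit."""
    try:
        payload.decode("ascii")
    except UnicodeDecodeError:
        return "8bit"
    return "7bit"


already_qp = b"Some text with chars that need encoding: =A3"
raw_utf8 = "Some text with chars that need encoding: \xa3".encode("utf-8")

print(guess_cte(already_qp))  # 7bit: the QP escape is plain ASCII
print(guess_cte(raw_utf8))    # 8bit: raw UTF-8 bytes are not
```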

Chris
 
Chris Withers

Chris said:
...except it gets the transfer encoding wrong, which means Thunderbird
shows =A3 instead of the pound sign that it should :-(

...this is down to a pretty lame bit of code in Encoders.py which
basically checks for a unicode error *sigh*

OK, slight progress... here's a new version that actually works:

from email.Charset import Charset,QP
from email.MIMEText import MIMEText
charset = Charset('utf-8')
charset.body_encoding = QP
msg = MIMEText('','plain',None)
msg.set_payload(u'Some text with chars that need encoding:\xa3',charset)
print msg.as_string()

MIME-Version: 1.0
Content-Type: text/plain; charset; charset="utf-8"
Content-Transfer-Encoding: quoted-printable

Some text with chars that need encoding:=A3

Okay, so this actually does the right thing... wahey!

...but hold your horses: if the Charset isn't set to quoted-printable, then
you end up with problems:

charset = Charset('utf-8')
msg = MIMEText('','plain',None)
msg.set_payload(u'Some text with chars that need encoding:\xa3',charset)

Traceback (most recent call last):
  File "C:\test_encoding.py", line 5, in ?
    msg.set_payload(u'Some text with chars that need encoding:\xa3',charset)
  File "c:\python24\lib\email\Message.py", line 218, in set_payload
    self.set_charset(charset)
  File "c:\python24\lib\email\Message.py", line 260, in set_charset
    self._payload = charset.body_encode(self._payload)
  File "c:\python24\lib\email\Charset.py", line 366, in body_encode
    return email.base64MIME.body_encode(s)
  File "c:\python24\lib\email\base64MIME.py", line 136, in encode
    enc = b2a_base64(s[i:i + max_unencoded])
UnicodeEncodeError: 'ascii' codec can't encode character u'\xa3' in position 40: ordinal not in range(128)

Now what?

*sigh*
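The traceback above bottoms out in b2a_base64, which operates on byte strings; given a unicode object, Python 2.4 apparently falls back to the default ASCII codec and fails on the pound sign. A Python 3 sketch of the encode-first fix (an assumption: the modern base64 module, where the same mix-up surfaces as a TypeError instead):

```python
import base64

text = "\xa3"

# Base64 is defined over bytes, so a str is rejected...
try:
    base64.b64encode(text)
    accepted_str = True
except TypeError:
    accepted_str = False

# ...while the UTF-8 bytes of the same text encode fine.
encoded = base64.b64encode(text.encode("utf-8"))
print(accepted_str, encoded)
```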

Chris
 
Chris Withers

Chris said:
print msg.as_string()

MIME-Version: 1.0
Content-Type: text/plain; charset; charset="utf-8"
                          ^^^^^^^
Actually, even this isn't correct as you can see above...
[...]

...and I'm still left with this problem...

Has no-one ever successfully generated a correctly formatted email with
email.MIMEText where the message includes non-ascii characters?!

Chris
 
Chris Withers

Chris said:
Has no-one ever successfully generated a correctly formatted email with
email.MIMEText where the message includes non-ascii characters?!

I'm guessing not ;-)

Well, I think I have a winner, but it required me to subclass MIMEText:

from email.Charset import Charset,QP
from email.MIMEText import MIMEText as OriginalMIMEText
from email.MIMENonMultipart import MIMENonMultipart

class MIMEText(OriginalMIMEText):

    def __init__(self, _text, _subtype='plain', _charset='us-ascii'):
        if isinstance(_charset,Charset):
            cs = _charset.input_charset
        else:
            cs = _charset
        if isinstance(_text,unicode):
            # encode with the charset passed in, not a module-level one
            _text = _text.encode(cs)
        MIMENonMultipart.__init__(self, 'text', _subtype,
                                  **{'charset': cs})
        self.set_payload(_text, _charset)

charset = Charset('utf-8')
charset.body_encoding = QP
txt = u'Some text with chars that need encoding:\xa3'
msg = MIMEText(txt,'plain',charset)
print msg.as_string()

Which gives:

Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable

Some text with chars that need encoding:=C2=A3

It also works with non-QP charsets.
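The =C2=A3 here, versus the =A3 in the 2.4.2 output at the top of the thread, is worth a note: the pound sign (U+00A3) is the two bytes C2 A3 in UTF-8 but the single byte A3 in Latin-1, so the earlier output, although labelled utf-8, carried the Latin-1 byte. A Python 3 sketch using binascii.b2a_qp (chosen for illustration; it is not necessarily what the email package calls internally):

```python
import binascii

text = "Some text with chars that need encoding:\xa3"

# Quoted-printable works on bytes, so the charset choice decides the escapes.
qp_utf8 = binascii.b2a_qp(text.encode("utf-8"))
qp_latin1 = binascii.b2a_qp(text.encode("latin-1"))

print(qp_utf8.decode("ascii"))    # ...encoding:=C2=A3
print(qp_latin1.decode("ascii"))  # ...encoding:=A3
```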

The subclass is needed because MIMENonMultipart.__init__ cannot handle
a charset which isn't a simple string. Since it's needed for that
reason, it seems like the right place to encode any incoming unicode.

So, by my count, there are two bugs:

1. email.MIMEText.MIMEText can't take a real Charset object to its
__init__ method.

2. email.Message.Message.set_payload has no clue about unicode.
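For comparison only, a sketch assuming Python 3, where the modules were renamed (email.charset, email.mime.text) and MIMEText appears to accept both a str body and a Charset instance. It is not a patch for 2.4; the behaviour relied on here is that of the modern package.

```python
from email.charset import Charset, QP
from email.mime.text import MIMEText

charset = Charset("utf-8")
charset.body_encoding = QP  # ask for quoted-printable instead of base64

# Python 3: a str body plus a Charset instance, no manual encoding step.
msg = MIMEText("Some text with chars that need encoding:\xa3", "plain", charset)
print(msg.as_string())
```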

Does that sound fair? If so, should I open SF issues for them?

cheers,

Chris
 
Max M

Chris said:
Actually, even this isn't correct as you can see above...
Has no-one ever successfully generated a correctly formatted email with
email.MIMEText where the message includes non-ascii characters?!

What is the problem with encoding the message as utf-8 before setting
the payload? That has always worked for me.


pl = u'Some text with chars that need encoding:\xa3'.encode('utf-8')
msg.set_payload(pl, charset)

From the docs:

"""
The payload is either a string in the case of simple message objects or
a list of Message objects for MIME container documents (e.g. multipart/*
and message/rfc822)
"""

Message objects are always encoded strings. I don't remember seeing that
it should be possible to use a unicode string as a message.

The charset passed in set_payload(pl, charset) is the charset the
string *is* encoded in, not the charset it *should* be encoded in.
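The point that the charset argument describes the bytes you already have can be seen without the email package at all: the same encoded payload reads correctly only when decoded with the charset it was actually encoded in. A Python 3 sketch (pl mirrors the variable above; the Latin-1 mislabelling is hypothetical):

```python
pl = "Some text with chars that need encoding:\xa3".encode("utf-8")

# Labelled with the charset it really is in: the pound sign survives.
right = pl.decode("utf-8")

# Labelled with a different charset: the two UTF-8 bytes C2 A3 are read
# as two separate Latin-1 characters, producing mojibake.
wrong = pl.decode("latin-1")

print(ascii(right[-1]), ascii(wrong[-2:]))
```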

--

hilsen/regards Max M, Denmark

 
Peter Otten

Chris said:
[...]

Yay! No unicode error, but also no use:

[...]

Yes, it seemed to work with your original example, but of course you have to
encode unicode somehow before sending it over the wire. A severe case of
peephole debugging, sorry. I've looked into the email package source once
more, but I fear that to understand the relevant parts you have to understand
it wholesale.

As Max suggested, your safest choice is probably passing in UTF-8-encoded
strings instead of unicode.

Peter
 
Chris Withers

Max said:
From the docs:

"""
The payload is either a string in the case of simple message objects or
a list of Message objects for MIME container documents (e.g. multipart/*
and message/rfc822)
"""

Where'd you find that? I must have missed it in my digging :-S
Message objects are always encoded strings. I don't remember seeing that
it should be possible to use a unicode string as a message.

Yes, I guess I just find that surprising in today's "everything should
be unicode" world.
The charset passed in set_payload(pl, charset) is the charset the
string *is* encoded in, not the charset it *should* be encoded in.

Indeed, although there's still the bug that while set_payload can accept
a Charset instance for its _charset parameter, the __init__ method for
MIMENonMultipart cannot.

Incidentally, here's the class I finally ended up with:

from email.Charset import Charset
from email.MIMEText import MIMEText as OriginalMIMEText
from email.MIMENonMultipart import MIMENonMultipart

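class MTText(OriginalMIMEText):

    def __init__(self, _text, _subtype='plain', _charset='us-ascii'):
        # accept either a charset name or an email.Charset.Charset instance
        if not isinstance(_charset,Charset):
            _charset = Charset(_charset)
        # encode unicode input with the requested charset up front
        if isinstance(_text,unicode):
            _text = _text.encode(_charset.input_charset)
        # MIMENonMultipart only copes with a plain charset name here
        MIMENonMultipart.__init__(self, 'text', _subtype,
                                  **{'charset': _charset.input_charset})
        self.set_payload(_text, _charset)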

cheers,

Chris
 
