smtplib MIMEText charset weirdness


Adam W.

Can someone explain to me why I can't set the charset after the fact and still have it work?

For example:

Traceback (most recent call last):
  File "<pyshell#53>", line 1, in <module>
    text.as_string()
  File "C:\Python32\lib\email\message.py", line 168, in as_string
    g.flatten(self, unixfrom=unixfrom)
  File "C:\Python32\lib\email\generator.py", line 91, in flatten
    self._write(msg)
  File "C:\Python32\lib\email\generator.py", line 137, in _write
    self._dispatch(msg)
  File "C:\Python32\lib\email\generator.py", line 163, in _dispatch
    meth(msg)
  File "C:\Python32\lib\email\generator.py", line 192, in _handle_text
    raise TypeError('string payload expected: %s' % type(payload))
TypeError: string payload expected: <class 'bytes'>

As opposed to:

'Content-Type: text/html; charset="utf-8"\nMIME-Version: 1.0\nContent-Transfer-Encoding: base64\n\n4p2kwqU=\n'


Side question:

'MIME-Version: 1.0\nContent-Transfer-Encoding: 8bit\nContent-Type: text/html; charset="utf-8"\n\n❤¥'

Why is it now 8-bit encoding?
 

Steven D'Aprano

Can someone explain to me why I can't set the charset after the fact and
still have it work.

For example:


It would help if you tell us where this MIMEText function came from.
Based on the error messages you provide later, I'm going to assume it is
the one in the Python 3.2 email package:

from email.mime.text import MIMEText

The documentation for MIMEText is rather terse, but it implies that the
parameter given should be a string, not bytes:

http://docs.python.org/3.2/library/email.mime#email.mime.text.MIMEText

If I provide a string, it seems to work fine:


py> msg = '❤¥'
py> blob = MIMEText(msg, _charset='utf-8')
py> blob.as_string()
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: base64

4p2kwqU=



But if I provide bytes, as you do, I get the same error you do:


py> msg_as_bytes = msg.encode('utf-8')
py> print(msg_as_bytes)
b'\xe2\x9d\xa4\xc2\xa5'
py> blob = MIMEText(msg_as_bytes)
py> blob.as_string()
Traceback (most recent call last):
[...]
TypeError: string payload expected: <class 'bytes'>


So it pays to read the error message. It tells you that the payload was
expected to be a string, but was bytes instead.
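
If the data starts out as bytes, one way around the error is to decode it to a string before handing it to MIMEText. This is only a minimal sketch, assuming the bytes are UTF-8 encoded; the names raw and body are made up for illustration:

```python
from email.mime.text import MIMEText

# Hypothetical input: the same UTF-8 bytes printed above.
raw = b'\xe2\x9d\xa4\xc2\xa5'

# Decode to str first, then name the charset explicitly.
body = raw.decode('utf-8')
msg = MIMEText(body, 'html', 'utf-8')

# Inspect the transfer encoding MIMEText picked, and check that
# decoding the payload gives back the original bytes.
print(msg['Content-Transfer-Encoding'])
print(msg.get_payload(decode=True) == raw)
```

Decoding up front keeps the payload a string, which is what the generator's _handle_text check in the traceback is asking for.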

As opposed to:

'Content-Type: text/html; charset="utf-8"\nMIME-Version:
1.0\nContent-Transfer-Encoding: base64\n\n4p2kwqU=\n'


My wild guess is that it is an accident (possibly a bug) that the above
works at all. I think it shouldn't; MIMEText is expecting a string, and
you provide a bytes object. The documentation for the email package
states:


Here are the major differences between email version 5.0 and version 4:

All operations are on unicode strings. Text inputs must be strings,
text outputs are strings. Outputs are limited to the ASCII character set
and so can be encoded to ASCII for transmission. Inputs are also limited
to ASCII; this is an acknowledged limitation of email 5.0 and means it
can only be used to parse email that is 7bit clean.
[end quote]

http://docs.python.org/3.2/library/email.html



but frankly, I'm not an expert on the email package. It may be that the
behaviour you describe is deliberate.
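
One possible reading of the quoted passage is that the ASCII limit applies to the serialized message, not to the text you pass in. A quick way to check that reading, assuming a reasonably recent Python 3 (the names wire and wire_bytes are just for illustration):

```python
from email.mime.text import MIMEText

# Non-ASCII text goes in as an ordinary str.
msg = MIMEText('❤¥', 'plain', 'utf-8')
wire = msg.as_string()

# This would raise UnicodeEncodeError if any non-ASCII character
# had made it into the serialized message.
wire_bytes = wire.encode('ascii')
print(wire)
```

The body is base64-encoded before it reaches the output, which would explain how the output can stay ASCII while the charset parameter still matters.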
 

Adam W.

On Mon, 25 Feb 2013 20:00:24 -0800, Adam W. wrote:

The documentation for MIMEText is rather terse, but it implies that the
parameter given should be a string, not bytes:

http://docs.python.org/3.2/library/email.mime#email.mime.text.MIMEText

If I provide a string, it seems to work fine:


OK, working under the assumption that you need to provide it a string, that still leaves the question of why adding the header after the fact (to a string input) does not produce the same result as declaring the encoding inline.

As opposed to:
'Content-Type: text/html; charset="utf-8"\nMIME-Version:
1.0\nContent-Transfer-Encoding: base64\n\n4p2kwqU=\n'





My wild guess is that it is an accident (possibly a bug) that the above
works at all. I think it shouldn't; MIMEText is expecting a string, and
you provide a bytes object. The documentation for the email package
states:

Here are the major differences between email version 5.0 and version 4:

All operations are on unicode strings. Text inputs must be strings,
text outputs are strings. Outputs are limited to the ASCII character set
and so can be encoded to ASCII for transmission. Inputs are also limited
to ASCII; this is an acknowledged limitation of email 5.0 and means it
can only be used to parse email that is 7bit clean.
[end quote]

http://docs.python.org/3.2/library/email.html

I find this limitation hard to believe. Why bother with encoding flags if it can only ever accept ASCII anyway?

The reason this issue came up was that I was adding the header afterwards, as in my examples, and it wasn't working, so I Googled around and found this Stack Overflow question: http://stackoverflow.com/questions/...-charset-in-email-using-smtplib-in-python-2-7

It seemed to be doing exactly what I wanted, the only difference being the inline declaration of utf-8. With that change it started working as desired...
 

Terry Reedy

Can someone explain to me why I can't set the charset after the fact.

Email was revised to v.6 for 3.3, so the immediate answer to both your
why questions is 'because email was not revised yet'.
text = MIMEText('❤¥'.encode('utf-8'), 'html')

In 3.3 this fails immediately with

    AttributeError: 'bytes' object has no attribute 'encode'

because when _charset is not given, MIMEText.__init__ test-encodes the
text to discover what the charset should be:

    if _charset is None:
        try:
            _text.encode('us-ascii')
            _charset = 'us-ascii'
        except UnicodeEncodeError:
            _charset = 'utf-8'

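
The same check can be tried on its own, outside MIMEText. This is just a sketch of the logic quoted above; guess_charset is a made-up helper name, not something the email package provides:

```python
def guess_charset(text):
    """Mimic the 3.3 fallback: us-ascii if the text fits, else utf-8."""
    try:
        text.encode('us-ascii')
        return 'us-ascii'
    except UnicodeEncodeError:
        return 'utf-8'


print(guess_charset('plain old text'))
print(guess_charset('❤¥'))

# bytes objects have no .encode method, hence the AttributeError.
try:
    guess_charset('❤¥'.encode('utf-8'))
except AttributeError as exc:
    print('bytes input:', exc)
```
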
text = MIMEText('❤¥'.encode('utf-8'), 'html', 'utf-8')

If one provides bytes, one must provide the charset and MIMEText assumes
you are not lying.
text.as_string()
Content-Type: text/html; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: base64

4p2kwqU=

Side question:
text = MIMEText('❤¥', 'html')
text.set_charset('utf-8')

This is redundant here. This method is inherited from Message and
appears pretty useless for the subclass.
text.as_string()
'MIME-Version: 1.0\nContent-Transfer-Encoding: 8bit\nContent-Type:
text/html;charset="utf-8"\n\n❤¥'

Why is it now 8-bit encoding?

Bug fixed in 3.3. Output now same as above. Use 3.3 for email unless you
cannot due to other dependencies not yet being available.
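
To see whether your own interpreter has the fixed behaviour, you can compare the two styles side by side. A small sketch, assuming Python 3.3 or later; the names inline and after are only for illustration:

```python
from email.mime.text import MIMEText

# Charset declared inline.
inline = MIMEText('❤¥', 'html', 'utf-8')

# Charset set after the fact, as in the side question.
after = MIMEText('❤¥', 'html')
after.set_charset('utf-8')

# Show the transfer encoding and charset each message ended up with.
for msg in (inline, after):
    print(msg['Content-Transfer-Encoding'], msg.get_content_charset())
```

If both lines of output agree, the set_charset call is redundant on that version, as described above.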
 
