smtplib MIMEText charset weirdness

Discussion in 'Python' started by Adam W., Feb 26, 2013.

  1. Adam W.

    Adam W. Guest

    Can someone explain to me why I can't set the charset after the fact and still have it work?

    For example:
    >>> text = MIMEText('❤¥'.encode('utf-8'), 'html')
    >>> text.set_charset('utf-8')
    >>> text.as_string()

    Traceback (most recent call last):
      File "<pyshell#53>", line 1, in <module>
        text.as_string()
      File "C:\Python32\lib\email\message.py", line 168, in as_string
        g.flatten(self, unixfrom=unixfrom)
      File "C:\Python32\lib\email\generator.py", line 91, in flatten
        self._write(msg)
      File "C:\Python32\lib\email\generator.py", line 137, in _write
        self._dispatch(msg)
      File "C:\Python32\lib\email\generator.py", line 163, in _dispatch
        meth(msg)
      File "C:\Python32\lib\email\generator.py", line 192, in _handle_text
        raise TypeError('string payload expected: %s' % type(payload))
    TypeError: string payload expected: <class 'bytes'>

    As opposed to:
    >>> text = MIMEText('❤¥'.encode('utf-8'), 'html', 'utf-8')
    >>> text.as_string()

    'Content-Type: text/html; charset="utf-8"\nMIME-Version: 1.0\nContent-Transfer-Encoding: base64\n\n4p2kwqU=\n'


    Side question:
    >>> text = MIMEText('❤¥', 'html')
    >>> text.set_charset('utf-8')
    >>> text.as_string()

    'MIME-Version: 1.0\nContent-Transfer-Encoding: 8bit\nContent-Type: text/html; charset="utf-8"\n\n❤¥'

    Why is it now 8-bit encoding?
    Adam W., Feb 26, 2013
    #1

  2. On Mon, 25 Feb 2013 20:00:24 -0800, Adam W. wrote:

    > Can someone explain to me why I can't set the charset after the fact and
    > still have it work.
    >
    > For example:
    > >>> text = MIMEText('❤¥'.encode('utf-8'), 'html')



    It would help if you tell us where this MIMEText function came from.
    Based on the error messages you provide later, I'm going to assume it is
    the one in the Python 3.2 email package:

    from email.mime.text import MIMEText

    The documentation for MIMEText is rather terse, but it implies that the
    parameter given should be a string, not bytes:

    http://docs.python.org/3.2/library/email.mime#email.mime.text.MIMEText

    If I provide a string, it seems to work fine:


    py> msg = '❤¥'
    py> blob = MIMEText(msg, _charset='utf-8')
    py> blob.as_string()
    Content-Type: text/plain; charset="utf-8"
    MIME-Version: 1.0
    Content-Transfer-Encoding: base64

    4p2kwqU=



    But if I provide bytes, as you do, I get the same error you do:


    py> msg_as_bytes = msg.encode('utf-8')
    py> print(msg_as_bytes)
    b'\xe2\x9d\xa4\xc2\xa5'
    py> blob = MIMEText(msg_as_bytes)
    py> blob.as_string()
    Traceback (most recent call last):
    [...]
    TypeError: string payload expected: <class 'bytes'>


    So it pays to read the error message. It tells you that the payload was
    expected to be a string, but was bytes instead.
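A minimal sketch of the string-versus-bytes point above, assuming Python 3.3 or later and data that arrives as UTF-8 bytes. The names `raw`, `body` and `msg` are illustrative only. The idea is to decode to `str` before handing the text to `MIMEText`, and to let the constructor do the encoding:

```python
from email.mime.text import MIMEText

# Bytes as they might arrive from a file or socket (assumed UTF-8 encoded).
raw = b'\xe2\x9d\xa4\xc2\xa5'

# Decode to str first; MIMEText then encodes the text with the given charset.
body = raw.decode('utf-8')
msg = MIMEText(body, 'html', 'utf-8')

print(msg['Content-Type'])
print(msg['Content-Transfer-Encoding'])

# decode=True undoes the transfer encoding and returns the payload as bytes.
print(msg.get_payload(decode=True) == raw)
```

Passing the charset to the constructor keeps the payload and the headers consistent from the start, so nothing has to be patched up afterwards.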


    > As opposed to:
    >
    > >>> text = MIMEText('❤¥'.encode('utf-8'), 'html', 'utf-8')
    > >>> text.as_string()

    > 'Content-Type: text/html; charset="utf-8"\nMIME-Version:
    > 1.0\nContent-Transfer-Encoding: base64\n\n4p2kwqU=\n'



    My wild guess is that it is an accident (possibly a bug) that the above
    works at all. I think it shouldn't; MIMEText is expecting a string, and
    you provide a bytes object. The documentation for the email package
    states:


    Steven D'Aprano, Feb 26, 2013
    #2

  3. Adam W.

    Adam W. Guest

    On Tuesday, February 26, 2013 2:10:28 AM UTC-5, Steven D'Aprano wrote:
    > On Mon, 25 Feb 2013 20:00:24 -0800, Adam W. wrote:
    >
    > The documentation for MIMEText is rather terse, but it implies that the
    > parameter given should be a string, not bytes:
    >
    > http://docs.python.org/3.2/library/email.mime#email.mime.text.MIMEText
    >
    > If I provide a string, it seems to work fine:



    OK, working under the assumption that you need to provide it a string, that still leaves the question of why setting the charset after the fact (on a string input) does not produce the same result as declaring the charset inline.


    > > As opposed to:
    > >
    > > >>> text = MIMEText('❤¥'.encode('utf-8'), 'html', 'utf-8')
    > > >>> text.as_string()
    > > 'Content-Type: text/html; charset="utf-8"\nMIME-Version:
    > > 1.0\nContent-Transfer-Encoding: base64\n\n4p2kwqU=\n'
    >
    > My wild guess is that it is an accident (possibly a bug) that the above
    > works at all. I think it shouldn't; MIMEText is expecting a string, and
    > you provide a bytes object. The documentation for the email package
    > states:
    Adam W., Feb 26, 2013
    #3
  4. Terry Reedy

    Terry Reedy Guest

    On 2/25/2013 11:00 PM, Adam W. wrote:
    > Can someone explain to me why I can't set the charset after the fact.


    Email was revised to v.6 for 3.3, so the immediate answer to both your
    why questions is 'because email was not revised yet'.

    > text = MIMEText('❤¥'.encode('utf-8'), 'html')


    In 3.3 this fails immediately with
        AttributeError: 'bytes' object has no attribute 'encode'
    because when _charset is not given, MIMEText.__init__ test-encodes the
    text to discover what the charset should be:
        if _charset is None:
            try:
                _text.encode('us-ascii')
                _charset = 'us-ascii'
            except UnicodeEncodeError:
                _charset = 'utf-8'
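The fallback quoted above can be tried in isolation. This is only a sketch: `guess_charset` is a hypothetical helper written for illustration, not part of the email package. It mirrors the quoted logic and shows why bytes input fails before any charset is chosen:

```python
def guess_charset(text):
    """Hypothetical helper mirroring the quoted fallback logic."""
    try:
        # Pure-ASCII text encodes cleanly, so 'us-ascii' is enough.
        text.encode('us-ascii')
        return 'us-ascii'
    except UnicodeEncodeError:
        # Anything outside ASCII falls back to UTF-8.
        return 'utf-8'


print(guess_charset('plain ascii'))
print(guess_charset('❤¥'))

# bytes objects have no .encode() method, hence the AttributeError.
try:
    guess_charset(b'\xe2\x9d\xa4\xc2\xa5')
except AttributeError as exc:
    print('bytes input fails:', exc)
```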

    > text = MIMEText('❤¥'.encode('utf-8'), 'html', 'utf-8')


    If one provides bytes, one must provide the charset and MIMEText assumes
    you are not lying.

    > text.as_string()
    > Content-Type: text/html; charset="utf-8"
    > MIME-Version: 1.0
    > Content-Transfer-Encoding: base64
    >
    > 4p2kwqU=


    > Side question:
    > text = MIMEText('❤¥', 'html')
    > text.set_charset('utf-8')


    This is redundant here. This method is inherited from Message and
    appears pretty useless for the subclass.

    > text.as_string()
    > 'MIME-Version: 1.0\nContent-Transfer-Encoding: 8bit\nContent-Type:
    > text/html;charset="utf-8"\n\n❤¥'
    >
    > Why is it now 8-bit encoding?


    Bug fixed in 3.3. Output now same as above. Use 3.3 for email unless you
    cannot due to other dependencies not yet being available.
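For anyone checking this on their own install, here is a small sketch, assuming Python 3.3 or later. It builds the message from a plain string with no charset argument and no set_charset() call, then inspects what the constructor chose:

```python
from email.mime.text import MIMEText

# Non-ASCII str, no _charset argument, no set_charset() afterwards.
msg = MIMEText('❤¥', 'html')

# The charset is detected by the constructor.
print(msg.get_content_charset())

# The transfer encoding is set at construction time as well.
print(msg['Content-Transfer-Encoding'])

print(msg.as_string())
```

If the charset printed is already utf-8, a later set_charset('utf-8') adds nothing, which matches the remark above that the call is redundant for MIMEText.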

    --
    Terry Jan Reedy
    Terry Reedy, Feb 26, 2013
    #4