Problems with email.Generator.Generator

Discussion in 'Python' started by Chris Withers, Sep 11, 2006.

  1. Hi All,

    The following piece of code is giving me issues:

    from email.Charset import Charset,QP
    from email.MIMEText import MIMEText
    charset = Charset('utf-8')
    charset.body_encoding = QP
    msg = MIMEText(
    u'Some text with chars that need encoding: \xa3',
    'plain',
    )
    msg.set_charset(charset)
    print msg.as_string()

    Under Python 2.4.2, this produces the following output, as I'd expect:

    MIME-Version: 1.0
    Content-Transfer-Encoding: 8bit
    Content-Type: text/plain; charset="utf-8"

    Some text with chars that need encoding: =A3

    However, under Python 2.4.3, I now get:

    Traceback (most recent call last):
      File "test_encoding.py", line 14, in ?
        msg.as_string()
      File "c:\python24\lib\email\Message.py", line 129, in as_string
        g.flatten(self, unixfrom=unixfrom)
      File "c:\python24\lib\email\Generator.py", line 82, in flatten
        self._write(msg)
      File "c:\python24\lib\email\Generator.py", line 113, in _write
        self._dispatch(msg)
      File "c:\python24\lib\email\Generator.py", line 139, in _dispatch
        meth(msg)
      File "c:\python24\lib\email\Generator.py", line 182, in _handle_text
        self._fp.write(payload)
    UnicodeEncodeError: 'ascii' codec can't encode character u'\xa3' in position 41: ordinal not in range(128)

    This seems to be as a result of this change:

    http://svn.python.org/view/python/b...mail/Generator.py?rev=42272&r1=37910&r2=42272

    ...which is referred to as part of a fix for this bug:

    http://sourceforge.net/tracker/?func=detail&aid=1409455&group_id=5470&atid=105470

    Now, is this change to Generator.py in error or am I doing something wrong?

    If the latter, how can I change my code such that it works as I'd expect?

    cheers,

    Chris

    --
    Simplistix - Content Management, Zope & Python Consulting
    - http://www.simplistix.co.uk
     
    Chris Withers, Sep 11, 2006
    #1

  2. Chris Withers wrote:
    > Hi All,
    >
    > The following piece of code is giving me issues:
    >
    > from email.Charset import Charset,QP
    > from email.MIMEText import MIMEText
    > charset = Charset('utf-8')
    > charset.body_encoding = QP
    > msg = MIMEText(
    > u'Some text with chars that need encoding: \xa3',
    > 'plain',
    > )
    > msg.set_charset(charset)
    > print msg.as_string()
    >
    > Under Python 2.4.2, this produces the following output, as I'd expect:
    >


    > [...]
    > However, under Python 2.4.3, I now get:
    >


    Try with:

    msg = MIMEText(
    u'Some text with chars that need encoding: \xa3',
    _charset='utf-8',
    )


    and you will obtain the error:


    Traceback (most recent call last):
      File "<pyshell#4>", line 3, in -toplevel-
        _charset='utf-8',
      File "C:\Python2.4\lib\email\MIMEText.py", line 28, in __init__
        self.set_payload(_text, _charset)
      File "C:\Python2.4\lib\email\Message.py", line 218, in set_payload
        self.set_charset(charset)
      File "C:\Python2.4\lib\email\Message.py", line 260, in set_charset
        self._payload = charset.body_encode(self._payload)
      File "C:\Python2.4\lib\email\Charset.py", line 366, in body_encode
        return email.base64MIME.body_encode(s)
      File "C:\Python2.4\lib\email\base64MIME.py", line 136, in encode
        enc = b2a_base64(s[i:i + max_unencoded])
    UnicodeEncodeError: 'ascii' codec can't encode character u'\xa3' in position 41: ordinal not in range(128)




    Regards Manlio Perillo
     
    Manlio Perillo, Sep 11, 2006
    #2

  3. Manlio Perillo wrote:
    > Try with:
    >
    > msg = MIMEText(
    > u'Some text with chars that need encoding: \xa3',
    > _charset='utf-8',
    > )
    >
    >
    > and you will obtain the error:
    >
    > Traceback (most recent call last):
    > File "<pyshell#4>", line 3, in -toplevel-
    > _charset='utf-8',
    > File "C:\Python2.4\lib\email\MIMEText.py", line 28, in __init__
    > self.set_payload(_text, _charset)
    > File "C:\Python2.4\lib\email\Message.py", line 218, in set_payload
    > self.set_charset(charset)
    > File "C:\Python2.4\lib\email\Message.py", line 260, in set_charset
    > self._payload = charset.body_encode(self._payload)
    > File "C:\Python2.4\lib\email\Charset.py", line 366, in body_encode
    > return email.base64MIME.body_encode(s)
    > File "C:\Python2.4\lib\email\base64MIME.py", line 136, in encode
    > enc = b2a_base64(s[i:i + max_unencoded])
    > UnicodeEncodeError: 'ascii' codec can't encode character u'\xa3' in
    > position 41: ordinal not in range(128)


    OK, but I fail to see how replacing one unicode error with another is
    any help... :-S

    Chris

    --
    Simplistix - Content Management, Zope & Python Consulting
    - http://www.simplistix.co.uk
     
    Chris Withers, Sep 11, 2006
    #3
  4. Chris Withers wrote:

    > The following piece of code is giving me issues:
    >
    > from email.Charset import Charset,QP
    > from email.MIMEText import MIMEText
    > charset = Charset('utf-8')
    > charset.body_encoding = QP
    > msg = MIMEText(
    > u'Some text with chars that need encoding: \xa3',
    > 'plain',
    > )
    > msg.set_charset(charset)
    > print msg.as_string()
    >
    > Under Python 2.4.2, this produces the following output, as I'd expect:
    >
    > MIME-Version: 1.0
    > Content-Transfer-Encoding: 8bit
    > Content-Type: text/plain; charset="utf-8"
    >
    > Some text with chars that need encoding: =A3
    >
    > However, under Python 2.4.3, I now get:
    >
    > Traceback (most recent call last):
    >   File "test_encoding.py", line 14, in ?
    >     msg.as_string()
    >   File "c:\python24\lib\email\Message.py", line 129, in as_string
    >     g.flatten(self, unixfrom=unixfrom)
    >   File "c:\python24\lib\email\Generator.py", line 82, in flatten
    >     self._write(msg)
    >   File "c:\python24\lib\email\Generator.py", line 113, in _write
    >     self._dispatch(msg)
    >   File "c:\python24\lib\email\Generator.py", line 139, in _dispatch
    >     meth(msg)
    >   File "c:\python24\lib\email\Generator.py", line 182, in _handle_text
    >     self._fp.write(payload)
    > UnicodeEncodeError: 'ascii' codec can't encode character u'\xa3' in position 41: ordinal not in range(128)
    >
    > This seems to be as a result of this change:
    >
    >

    http://svn.python.org/view/python/b...mail/Generator.py?rev=42272&r1=37910&r2=42272
    >
    > ...which is referred to as part of a fix for this bug:
    >
    >

    http://sourceforge.net/tracker/?func=detail&aid=1409455&group_id=5470&atid=105470
    >
    > Now, is this change to Generator.py in error or am I doing something
    > wrong?


    I'm not familiar enough with the email package to answer that.

    > If the latter, how can I change my code such that it works as I'd expect?


    email.Generator and email.Message use cStringIO.StringIO internally, which
    can't cope with unicode. A quick fix might be to monkey-patch:

    from StringIO import StringIO
    from email import Generator, Message
    Generator.StringIO = Message.StringIO = StringIO
    # your code here

    Peter
     
    Peter Otten, Sep 11, 2006
    #4
  5. Peter Otten wrote:
    > http://sourceforge.net/tracker/?func=detail&aid=1409455&group_id=5470&atid=105470
    >> Now, is this change to Generator.py in error or am I doing something
    >> wrong?

    >
    > I'm not familiar enough with the email package to answer that.


    I'm hoping someone around here is ;-)

    >> If the latter, how can I change my code such that it works as I'd expect?

    >
    > email.Generator and email.Message use cStringIO.StringIO internally, which
    > can't cope with unicode. A quick fix might be to monkey-patch:


    I'm not sure that's correct, but I'm happy to stand corrected.

    My understanding is that the StringIOs don't mind as long as the type
    is consistent, i.e. don't mix unicode and encoded strings, 'cos that
    forces Python's default ascii codec to kick in and spew unicode errors.

    Now, I want to know what I'm supposed to do when I have unicode source
    and want it to end up as either a text/plain or text/html mime part.

    Is there a how-to for this anywhere? The email package's docs are short
    on examples involving charsets, unicode and the like :-(

    Chris

    --
    Simplistix - Content Management, Zope & Python Consulting
    - http://www.simplistix.co.uk
     
    Chris Withers, Sep 11, 2006
    #5
  6. Chris Withers wrote:
    > Peter Otten wrote:
    >
    >>http://sourceforge.net/tracker/?func=detail&aid=1409455&group_id=5470&atid=105470
    >>
    >>>Now, is this change to Generator.py in error or am I doing something
    >>>wrong?

    >>
    >>I'm not familiar enough with the email package to answer that.

    >
    >
    > I'm hoping someone around here is ;-)
    >
    >
    >>>If the latter, how can I change my code such that it works as I'd expect?

    >>
    >>email.Generator and email.Message use cStringIO.StringIO internally, which
    >>can't cope with unicode. A quick fix might be to monkey-patch:

    >
    >
    > I'm not sure that's correct, but I'm happy to stand corrected.
    >
    > My understanding is that the StringIOs don't mind as long as the type
    > is consistent, i.e. don't mix unicode and encoded strings, 'cos that
    > forces Python's default ascii codec to kick in and spew unicode errors.
    >
    > Now, I want to know what I'm supposed to do when I have unicode source
    > and want it to end up as either a text/plain or text/html mime part.
    >
    > Is there a how-to for this anywhere? The email package's docs are short
    > on examples involving charsets, unicode and the like :-(
    >

    Well, it would seem like the easiest approach is to monkey-patch the use
    of cStringIO to StringIO as recommended and see if that fixes your
    problem. Wouldn't it?

    regards
    Steve
    --
    Steve Holden +44 150 684 7255 +1 800 494 3119
    Holden Web LLC/Ltd http://www.holdenweb.com
    Skype: holdenweb http://holdenweb.blogspot.com
    Recent Ramblings http://del.icio.us/steve.holden
     
    Steve Holden, Sep 11, 2006
    #6
  7. Chris Withers wrote:
    >
    > Now, I want to know what I'm supposed to do when I have unicode source
    > and want it to end up as either a text/plain or text/html mime part.
    >
    > Is there a how-to for this anywhere? The email package's docs are short
    > on examples involving charsets, unicode and the like :-(


    no expert in this, but have you tried the codecs module?

    http://docs.python.org/lib/codec-objects.html

    ( with 'xmlcharrefreplace' for the html )?
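
As a rough illustration of the `xmlcharrefreplace` suggestion: that error handler replaces any character the target codec cannot represent with a numeric character reference, which is safe inside HTML. The sketch below uses Python 3 syntax (an assumption, since the thread is about Python 2.4) and a made-up sample string.

```python
# Sample text containing a pound sign (U+00A3), as in the original example.
text = "Price: \u00a35"

# Characters outside ASCII become numeric character references.
html_safe = text.encode("ascii", "xmlcharrefreplace")

print(html_safe)
```

This only helps for the text/html part; a text/plain part still needs a real charset such as utf-8.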

    Gerard
     
    Gerard Flanagan, Sep 11, 2006
    #7
  8. Chris Withers wrote:
    > [...]
    >
    > OK, but I fail to see how replacing one unicode error with another is
    > any help... :-S
    >



    The problem is simple: the email package does not support Unicode strings well.

    For now I'm using this:

    # Imports assumed here; the original post omitted them.
    from email import Header
    from email import MIMEText as _MIMEText
    from email import MIMEMultipart as _MIMEMultipart

    charset = "utf-8" # the charset to be used for email


    class HeadersMixin(object):
        """A custom mixin, for automatic internationalized headers
        support.
        """

        def __setitem__(self, name, val, **_params):
            if isinstance(val, str):
                try:
                    # only 7 bit ascii
                    val.decode("us-ascii")
                except UnicodeDecodeError:
                    raise ValueError("8 bit strings not accepted")

                return self.add_header(name, val)
            else:
                try:
                    # to avoid unnecessary trash
                    val = val.encode('us-ascii')
                except UnicodeError:
                    # non-ascii unicode: fall back to an encoded header
                    val = Header.Header(val, charset).encode()

                return self.add_header(name, val)


    class MIMEText(HeadersMixin, _MIMEText.MIMEText):
        """A MIME Text message that allows only Unicode strings, or plain
        ascii (7 bit) ones.
        """

        def __init__(self, _text, _subtype="plain"):
            _charset = charset

            if isinstance(_text, str):
                try:
                    # only 7 bit ascii
                    _text.decode("us-ascii")
                    _charset = "us-ascii"
                except UnicodeDecodeError:
                    raise ValueError("8 bit strings not accepted")
            else:
                # unicode input: encode it to the configured charset
                _text = _text.encode(charset)

            return _MIMEText.MIMEText.__init__(self, _text, _subtype, _charset)


    class MIMEMultipart(HeadersMixin, _MIMEMultipart.MIMEMultipart):
        def __init__(self):
            _MIMEMultipart.MIMEMultipart.__init__(self)



    This only accepts Unicode strings or plain ascii strings.
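
For readers who want to see what the header fallback in the mixin produces, here is a minimal sketch of `Header(...).encode()` on its own. It uses the Python 3 module names (`email.header`), which is an assumption on top of the Python 2.4 code above, and the subject string is made up.

```python
from email.header import Header, decode_header, make_header

subject = "\u00a3100"  # hypothetical header value with a pound sign

# Header.encode() returns an ASCII-only RFC 2047 "encoded word".
encoded = Header(subject, "utf-8").encode()
print(encoded)

# Decoding it again should give back the original text.
decoded = str(make_header(decode_header(encoded)))
print(decoded)
```

The point of the round trip is that the non-ASCII character never travels as a raw 8-bit byte in the header.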




    Regards Manlio Perillo
     
    Manlio Perillo, Sep 11, 2006
    #8
  9. Steve Holden wrote:
    >> Is there a how-to for this anywhere? The email package's docs are short
    >> on examples involving charsets, unicode and the like :-(
    >>

    > Well, it would seem like the easiest approach is to monkey-patch the use
    > of cStringIO to StringIO as recommended and see if that fixes your
    > problem. Wouldn't it?


    No, not really, since at best that's a nasty (and I meant really nasty)
    hack. I'm using the email package as part of a library that I'm building
    which is to be used with various frameworks. Monkey patching modules is
    about as bad as it gets in that situation...

    At worst, and most likely based on my past experience of (c)StringIO
    being used to accumulate output, it won't make a jot of difference...

    Chris

    --
    Simplistix - Content Management, Zope & Python Consulting
    - http://www.simplistix.co.uk
     
    Chris Withers, Sep 11, 2006
    #9
  10. Manlio Perillo wrote:
    >
    > The problem is simple: the email package does not support Unicode strings well.


    Really? All the character set support seems to indicate a fair bit of
    thought went into this aspect, although it does appear that no-one
    bothered to document it :-(

    Chris

    --
    Simplistix - Content Management, Zope & Python Consulting
    - http://www.simplistix.co.uk
     
    Chris Withers, Sep 11, 2006
    #10
  11. Chris Withers wrote:

    > At worst, and most likely based on my past experience of (c)StringIO
    > being used to accumulate output, it won't make a jot of difference...


    What past experience?

    >>> StringIO.StringIO().write(unichr(128))
    >>> cStringIO.StringIO().write(unichr(128))

    Traceback (most recent call last):
    File "<stdin>", line 1, in ?
    UnicodeEncodeError: 'ascii' codec can't encode character u'\x80' in position
    0: ordinal not in range(128)
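
The same split between a buffer that takes text and one that insists on bytes exists in Python 3's `io` module, where it is explicit rather than hidden behind the default ascii codec. The following is a sketch by analogy only (Python 3 is an assumption; it is not the Python 2.4 `StringIO`/`cStringIO` pair shown above).

```python
import io

text_buffer = io.StringIO()
text_buffer.write("\u0080")  # text goes into a text buffer without complaint

byte_buffer = io.BytesIO()
try:
    byte_buffer.write("\u0080")  # text into a byte buffer is refused
    refused = False
except TypeError:
    refused = True

# Encoding explicitly first is what makes the write acceptable.
byte_buffer.write("\u0080".encode("utf-8"))
print(refused, byte_buffer.getvalue())
```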

    Peter
     
    Peter Otten, Sep 11, 2006
    #11
  12. Chris Withers wrote:
    > Steve Holden wrote:
    >
    >>> Is there a how-to for this anywhere? The email package's docs are
    >>> short on examples involving charsets, unicode and the like :-(
    >>>

    >> Well, it would seem like the easiest approach is to monkey-patch the
    >> use of cStringIO to StringIO as recommended and see if that fixes your
    >> problem. Wouldn't it?

    >
    >
    > No, not really, since at best that's a nasty (and I meant really nasty)
    > hack. I'm using the email package as part of a library that I'm building
    > which is to be used with various frameworks. Monkey patching modules is
    > about as bad as it gets in that situation...
    >
    > At worst, and most likely based on my past experience of (c)StringIO
    > being used to accumulate output, it won't make a jot of difference...
    >

    Under those circumstances you probably know best ...

    regards
    Steve
    --
    Steve Holden +44 150 684 7255 +1 800 494 3119
    Holden Web LLC/Ltd http://www.holdenweb.com
    Skype: holdenweb http://holdenweb.blogspot.com
    Recent Ramblings http://del.icio.us/steve.holden
     
    Steve Holden, Sep 11, 2006
    #12
  13. Peter Otten wrote:
    > What past experience?
    >
    >>>> StringIO.StringIO().write(unichr(128))
    >>>> cStringIO.StringIO().write(unichr(128))

    > Traceback (most recent call last):
    > File "<stdin>", line 1, in ?
    > UnicodeEncodeError: 'ascii' codec can't encode character u'\x80' in position
    > 0: ordinal not in range(128)


    OK, I stand corrected, although I suspect the bug is actually in
    StringIO.StringIO in that it doesn't barf on unicodes.

    (Python 3000 and all that)

    Which again leads us back to the email package: it used to do the right
    thing from what I can see, and now it doesn't, and ends up trying to
    write a unicode to a cStringIO, which (rightly, I guess) barfs...

    Barry, Barry, where are you? ;-)

    Chris

    --
    Simplistix - Content Management, Zope & Python Consulting
    - http://www.simplistix.co.uk
     
    Chris Withers, Sep 11, 2006
    #13
  14. Peter Otten wrote:
    > Chris Withers wrote:
    >
    >> At worst, and most likely based on my past experience of (c)StringIO
    >> being used to accumulate output, it won't make a jot of difference...

    >
    > What past experience?
    >
    >>>> StringIO.StringIO().write(unichr(128))
    >>>> cStringIO.StringIO().write(unichr(128))

    > Traceback (most recent call last):
    > File "<stdin>", line 1, in ?
    > UnicodeEncodeError: 'ascii' codec can't encode character u'\x80' in position
    > 0: ordinal not in range(128)


    Okay, more out of desperation than anything else, let's try this:

    from email.Charset import Charset,QP
    from email.MIMEText import MIMEText
    from StringIO import StringIO
    from email import Generator,Message
    Generator.StringIO = Message.StringIO = StringIO
    charset = Charset('utf-8')
    charset.body_encoding = QP
    msg = MIMEText(u'Some text with chars that need encoding: \xa3','plain')
    msg.set_charset(charset)
    print repr(msg.as_string())

    u'MIME-Version: 1.0\nContent-Transfer-Encoding: 8bit\nContent-Type: text/plain; charset="utf-8"\n\nSome text with chars that need encoding: \xa3'

    Yay! No unicode error, but also no use:

    File "c:\python24\lib\smtplib.py", line 692, in sendmail
    (code,resp) = self.data(msg)
    File "c:\python24\lib\smtplib.py", line 489, in data
    self.send(q)
    File "c:\python24\lib\smtplib.py", line 316, in send
    self.sock.sendall(str)
    File "<string>", line 1, in sendall
    UnicodeEncodeError: 'ascii' codec can't encode character u'\xa3' in
    position 297: ordinal not in range(128)

    The other variant I've tried is:

    from email.Charset import Charset,QP
    from email.MIMEText import MIMEText
    charset = Charset('utf-8')
    charset.body_encoding = QP
    msg = MIMEText('','plain',)
    msg.set_charset(charset)
    msg.set_payload(charset.body_encode(u'Some text with chars that need encoding: \xa3'))
    print msg.as_string()

    Which is sort of okay:

    MIME-Version: 1.0
    Content-Transfer-Encoding: 7bit
    Content-Type: text/plain; charset="utf-8"

    Some text with chars that need encoding: =A3

    ...except it gets the transfer encoding wrong, which means Thunderbird
    shows =A3 instead of the pound sign that it should :-(

    ...this is down to a pretty lame bit of code in Encoders.py which
    basically checks for a unicode error *sigh*
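
To make the complaint concrete, here is a sketch of the kind of check being described: try to treat the payload as ASCII and pick "7bit" or "8bit" from whether that fails. `guess_transfer_encoding` is a hypothetical helper written for this illustration in Python 3, not the code in Encoders.py.

```python
def guess_transfer_encoding(payload):
    """Hypothetical helper: label a byte payload '7bit' or '8bit'.

    This mimics the "does it decode as ASCII?" check described above;
    it is not the library's own code.
    """
    try:
        payload.decode("ascii")
    except UnicodeDecodeError:
        return "8bit"
    return "7bit"


raw = "Pound: \u00a3".encode("utf-8")   # contains non-ASCII bytes
already_qp = b"Pound: =C2=A3"            # quoted-printable text is pure ASCII

print(guess_transfer_encoding(raw))
print(guess_transfer_encoding(already_qp))
```

A body that has already been quoted-printable encoded is pure ASCII, so a check like this labels it 7bit, and the mail client then has no reason to decode the =XX sequences.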

    Chris

    --
    Simplistix - Content Management, Zope & Python Consulting
    - http://www.simplistix.co.uk
     
    Chris Withers, Sep 11, 2006
    #14
  15. Chris Withers wrote:
    > ...except it gets the transfer encoding wrong, which means Thunderbird
    > shows =A3 instead of the pound sign that it should :-(
    >
    > ...this is down to a pretty lame bit of code in Encoders.py which
    > basically checks for a unicode error *sigh*


    OK, slight progress... here's a new version that actually works:

    from email.Charset import Charset,QP
    from email.MIMEText import MIMEText
    charset = Charset('utf-8')
    charset.body_encoding = QP
    msg = MIMEText('','plain',None)
    msg.set_payload(u'Some text with chars that need encoding:\xa3',charset)
    print msg.as_string()

    MIME-Version: 1.0
    Content-Type: text/plain; charset; charset="utf-8"
    Content-Transfer-Encoding: quoted-printable

    Some text with chars that need encoding:=A3

    Okay, so this actually does the right thing... wahey!

    ...but hold your horses, if Charset isn't set to quoted printable, then
    you end up with problems:

    charset = Charset('utf-8')
    msg = MIMEText('','plain',None)
    msg.set_payload(u'Some text with chars that need encoding:\xa3',charset)

    Traceback (most recent call last):
      File "C:\test_encoding.py", line 5, in ?
        msg.set_payload(u'Some text with chars that need encoding:\xa3',charset)
      File "c:\python24\lib\email\Message.py", line 218, in set_payload
        self.set_charset(charset)
      File "c:\python24\lib\email\Message.py", line 260, in set_charset
        self._payload = charset.body_encode(self._payload)
      File "c:\python24\lib\email\Charset.py", line 366, in body_encode
        return email.base64MIME.body_encode(s)
      File "c:\python24\lib\email\base64MIME.py", line 136, in encode
        enc = b2a_base64(s[i:i + max_unencoded])
    UnicodeEncodeError: 'ascii' codec can't encode character u'\xa3' in position 40: ordinal not in range(128)

    Now what?

    *sigh*

    Chris

    --
    Simplistix - Content Management, Zope & Python Consulting
    - http://www.simplistix.co.uk
     
    Chris Withers, Sep 11, 2006
    #15
  16. Chris Withers wrote:
    > print msg.as_string()
    >
    > MIME-Version: 1.0
    > Content-Type: text/plain; charset; charset="utf-8"

    ^^^^^^^
    Actually, even this isn't correct as you can see above...

    > charset = Charset('utf-8')
    > msg = MIMEText('','plain',None)
    > msg.set_payload(u'Some text with chars that need encoding:\xa3',charset)
    >
    > Traceback (most recent call last):
    > File "C:\test_encoding.py", line 5, in ?
    > msg.set_payload(u'Some text with chars that need
    > encoding:\xa3',charset)
    > File "c:\python24\lib\email\Message.py", line 218, in set_payload
    > self.set_charset(charset)
    > File "c:\python24\lib\email\Message.py", line 260, in set_charset
    > self._payload = charset.body_encode(self._payload)
    > File "c:\python24\lib\email\Charset.py", line 366, in body_encode
    > return email.base64MIME.body_encode(s)
    > File "c:\python24\lib\email\base64MIME.py", line 136, in encode
    > enc = b2a_base64(s[i:i + max_unencoded])
    > UnicodeEncodeError: 'ascii' codec can't encode character u'\xa3' in
    > position 40: ordinal not in range(128)


    ...and I'm still left with this problem...

    Has no-one ever successfully generated a correctly formatted email with
    email.MIMEText where the message includes non-ascii characters?!

    Chris

    --
    Simplistix - Content Management, Zope & Python Consulting
    - http://www.simplistix.co.uk
     
    Chris Withers, Sep 12, 2006
    #16
  17. Problems with email.Generator.Generator - Solved?

    Chris Withers wrote:
    > Has no-one ever successfully generated a correctly formatted email with
    > email.MIMEText where the message includes non-ascii characters?!


    I'm guessing not ;-)

    Well, I think I have a winner, but it required me to subclass MIMEText:

    from email.Charset import Charset,QP
    from email.MIMEText import MIMEText as OriginalMIMEText
    from email.MIMENonMultipart import MIMENonMultipart

    class MIMEText(OriginalMIMEText):

        def __init__(self, _text, _subtype='plain', _charset='us-ascii'):
            if isinstance(_charset,Charset):
                cs = _charset.input_charset
            else:
                cs = _charset
            if isinstance(_text,unicode):
                # encode with the charset passed in, not the module-level one
                _text = _text.encode(cs)
            MIMENonMultipart.__init__(self, 'text', _subtype,
                                      **{'charset': cs})
            self.set_payload(_text, _charset)

    charset = Charset('utf-8')
    charset.body_encoding = QP
    txt = u'Some text with chars that need encoding:\xa3'
    msg = MIMEText(txt,'plain',charset)
    print msg.as_string()

    Which gives:

    Content-Type: text/plain; charset="utf-8"
    MIME-Version: 1.0
    Content-Transfer-Encoding: quoted-printable

    Some text with chars that need encoding:=C2=A3

    It also works with non-QP charsets.

    The reason the subclass is needed is because the
    MIMENonMultipart.__init__ cannot handle a charset which isn't a simple
    string. Since it's needed for that reason, it seems like the right place
    to encode any incoming unicode.

    So, by my count, there are two bugs:

    1. email.MIMEText.MIMEText can't take a real Charset object to its
    __init__ method.

    2. email.Message.Message.set_payload has no clue about unicode.

    Does that sound fair? If so, should I open SF issues for them?

    cheers,

    Chris

    --
    Simplistix - Content Management, Zope & Python Consulting
    - http://www.simplistix.co.uk
     
    Chris Withers, Sep 12, 2006
    #17
  18. Chris Withers wrote:
    > Chris Withers wrote:
    >> print msg.as_string()
    >>
    >> MIME-Version: 1.0
    >> Content-Type: text/plain; charset; charset="utf-8"

    > ^^^^^^^
    > Actually, even this isn't correct as you can see above...
    >
    >> charset = Charset('utf-8')
    >> msg = MIMEText('','plain',None)
    >> msg.set_payload(u'Some text with chars that need encoding:\xa3',charset)


    > Has no-one ever successfully generated a correctly formatted email with
    > email.MIMEText where the message includes non-ascii characters?!


    What is the problem with encoding the message as utf-8 before setting
    the payload? That has always worked for me.


    pl = u'Some text with chars that need encoding:\xa3'.encode('utf-8')
    msg.set_payload(pl ,charset)

    From the docs:

    """
    The payload is either a string in the case of simple message objects or
    a list of Message objects for MIME container documents (e.g. multipart/*
    and message/rfc822)
    """

    Message objects are always encoded strings. I don't remember seeing that
    it should be possible to use a unicode string as a message.

    The charset passed in set_payload(pl, charset) is the charset that the
    string *is* encoded in. Not the charset it *should* be encoded in.
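
A small sketch of why the encoding step matters, using the standard `quopri` module in Python 3 (an assumption; the thread itself is on Python 2.4). The same pound sign becomes different bytes depending on the charset it is encoded in, and therefore different quoted-printable text.

```python
import quopri

text = "\u00a3"  # the pound sign from the example

utf8_bytes = text.encode("utf-8")      # two bytes: C2 A3
latin1_bytes = text.encode("latin-1")  # one byte: A3

print(quopri.encodestring(utf8_bytes))
print(quopri.encodestring(latin1_bytes))
```

So a body that reads =A3 is only correct if the declared charset is a Latin-1 style one; for charset="utf-8" the body needs to read =C2=A3, which is what you get by encoding to utf-8 before setting the payload.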

    --

    hilsen/regards Max M, Denmark

    http://www.mxm.dk/
    IT's Mad Science

    Phone: +45 66 11 84 94
    Mobile: +45 29 93 42 96
     
    Max M, Sep 12, 2006
    #18
  19. Chris Withers wrote:

    > Okay, more out of desperation than anything else, lets try this:
    >
    > from email.Charset import Charset,QP
    > from email.MIMEText import MIMEText
    > from StringIO import StringIO
    > from email import Generator,Message
    > Generator.StringIO = Message.StringIO = StringIO
    > charset = Charset('utf-8')
    > charset.body_encoding = QP
    > msg = MIMEText(u'Some text with chars that need encoding: \xa3','plain')
    > msg.set_charset(charset)
    > print repr(msg.as_string())
    > u'MIME-Version: 1.0\nContent-Transfer-Encoding: 8bit\nContent-Type:
    > text/plain; charset="utf-8"\n\nSome text with chars that need encoding:
    > \xa3'
    >
    > Yay! No unicode error, but also no use:
    >
    > File "c:\python24\lib\smtplib.py", line 692, in sendmail
    > (code,resp) = self.data(msg)
    > File "c:\python24\lib\smtplib.py", line 489, in data
    > self.send(q)
    > File "c:\python24\lib\smtplib.py", line 316, in send
    > self.sock.sendall(str)
    > File "<string>", line 1, in sendall
    > UnicodeEncodeError: 'ascii' codec can't encode character u'\xa3' in
    > position 297: ordinal not in range(128)


    Yes, it seemed to work with your original example, but of course you have to
    encode unicode somehow before sending it through a wire. A severe case of
    peephole debugging, sorry. I've looked into the email package source once
    more, but I fear that to understand the relevant parts you have to
    understand it wholesale.

    As Max suggested, your safest choice is probably passing in utf-8 instead of
    unicode.

    Peter
     
    Peter Otten, Sep 12, 2006
    #19
  20. Max M wrote:
    > From the docs:
    >
    > """
    > The payload is either a string in the case of simple message objects or
    > a list of Message objects for MIME container documents (e.g. multipart/*
    > and message/rfc822)
    > """


    Where'd you find that? I must have missed it in my digging :-S

    > Message objects are always encoded strings. I don't remember seeing that
    > it should be possible to use a unicode string as a message.


    Yes, I guess I just find that surprising in today's "everything should
    be unicode" world.

    > The charset passed in set_payload(pl, charset) is the charset that the
    > string *is* encoded in. Not the charset it *should* be encoded in.


    Indeed, although there's still the bug that while set_payload can accept
    a Charset instance for its _charset parameter, the __init__ method for
    MIMENonMultipart cannot.

    Incidentally, here's the class I finally ended up with:

    from email.Charset import Charset
    from email.MIMEText import MIMEText as OriginalMIMEText
    from email.MIMENonMultipart import MIMENonMultipart

    class MTText(OriginalMIMEText):

        def __init__(self, _text, _subtype='plain', _charset='us-ascii'):
            if not isinstance(_charset,Charset):
                _charset = Charset(_charset)
            if isinstance(_text,unicode):
                _text = _text.encode(_charset.input_charset)
            MIMENonMultipart.__init__(self, 'text', _subtype,
                                      **{'charset': _charset.input_charset})
            self.set_payload(_text, _charset)
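
For comparison, a sketch of the same goal on a recent Python 3, where (as far as I know, from 3.5 onwards) `MIMEText` accepts a `Charset` instance directly and encodes text itself, so no subclass should be needed. The module names `email.charset` and `email.mime.text` are the Python 3 spellings and are an assumption relative to the Python 2.4 code above.

```python
from email.charset import Charset, QP
from email.mime.text import MIMEText

charset = Charset("utf-8")
charset.body_encoding = QP  # ask for quoted-printable instead of base64

msg = MIMEText("Some text with chars that need encoding:\u00a3", "plain", charset)
rendered = msg.as_string()
print(rendered)
```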

    cheers,

    Chris

    --
    Simplistix - Content Management, Zope & Python Consulting
    - http://www.simplistix.co.uk
     
    Chris Withers, Sep 12, 2006
    #20
