Newline interpretation issue with MIMEApplication with binary data, Python 3.3.2

nilsbunger

Hi,

I'm having trouble encoding a MIME message with a binary file. Newline characters are being interpreted even though the content is supposed to be binary. This is with Python 3.3.2.

Small test case:

app = MIMEApplication(b'Q\x0dQ', _encoder=encode_noop)
b = io.BytesIO()
g = BytesGenerator(b)
g.flatten(app)
for i in b.getvalue()[-3:]:
    print("%02x " % i, end="")
print()

This prints 51 0a 51, meaning the 0x0d character got reinterpreted as a newline.

I've tried using the HTTP email policy, but that goes even further, converting \r to \r\n.

This is for HTTP transport, so binary encoding is normal.

Any thoughts how I can do this properly?
 
Chris Angelico

app = MIMEApplication(b'Q\x0dQ', _encoder=encode_noop)

What is MIMEApplication? It's not a builtin, so your test case is
missing an import, at least. Is this email.mime.MIMEApplication?

ChrisA
 
Nils Bunger

Chris,

Thanks for answering.

Yes, it's MIMEApplication from the email.mime package (email.mime.application). I've pasted a snippet with the imports below.

I'm trying to use this to build a multi-part MIME message, with this as one part.

I really can't figure out any way to attach a binary part like this to a multi-part MIME message without the encoding issue... any help would be greatly appreciated!

Nils

---------

import io
from email.mime.application import MIMEApplication
from email.generator import BytesGenerator
from email.encoders import encode_noop

app = MIMEApplication(b'Q\x0dQ', _encoder=encode_noop)
b = io.BytesIO()
g = BytesGenerator(b)
g.flatten(app)
for i in b.getvalue()[-3:]:
    print("%02x " % i, end="")
print()
 
Chris Angelico

Yes, it's MIMEApplication from the email.mime package (email.mime.application). I've pasted a snippet with the imports below.

I'm trying to use this to build a multi-part MIME message, with this as one part.

I really can't figure out any way to attach a binary part like this to a multi-part MIME message without the encoding issue... any help would be greatly appreciated!

I partly responded just to ping your thread, as I'm not particularly
familiar with the email.mime module. But a glance at the docs suggests
that MIMEApplication is a "subclass of MIMENonMultipart", so might it
be a problem to use that for multipart??

It's designed to handle text, so you may want to use an encoder (like
the default base64 one) rather than trying to push binary data through
it.
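
A minimal sketch of the base64 suggestion, assuming the same b'Q\x0dQ' test data from the original post. It only shows that the bytes survive a flatten-and-parse round trip when the default encoder is left in place; it does not help if the receiving side insists on raw binary.

```python
import io
from email import message_from_bytes
from email.generator import BytesGenerator
from email.mime.application import MIMEApplication

raw = b"Q\x0dQ"

# Leaving out _encoder keeps the default, which base64-encodes the payload
# and sets the Content-Transfer-Encoding header accordingly.
app = MIMEApplication(raw)

buf = io.BytesIO()
BytesGenerator(buf).flatten(app)

# Parse the flattened bytes back and decode the payload again.
parsed = message_from_bytes(buf.getvalue())
recovered = parsed.get_payload(decode=True)

print(app["Content-Transfer-Encoding"])
print(recovered == raw)
```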

Random ideas, hopefully someone who actually knows the module can respond.

ChrisA
 
Neil Cerutti

I partly responded just to ping your thread, as I'm not
particularly familiar with the email.mime module. But a glance
at the docs suggests that MIMEApplication is a "subclass of
MIMENonMultipart", so might it be a problem to use that for
multipart??

It's designed to handle text, so you may want to use an encoder
(like the default base64 one) rather than trying to push binary
data through it.

Random ideas, hopefully someone who actually knows the module
can respond.

I got interested in it since I have never used any of the
modules. So I played with it enough to discover that the part of
the code above that converts the \r to \n is the flatten call.
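
One way to check this, as a rough sketch: compare the payload stored on the message object with the tail of the flattened output. The variable names are only illustrative, and the exact flattened bytes may depend on the Python version (the thread reports 51 0a 51 on 3.3.2).

```python
import io
from email.encoders import encode_noop
from email.generator import BytesGenerator
from email.mime.application import MIMEApplication

raw = b"Q\x0dQ"
app = MIMEApplication(raw, _encoder=encode_noop)

# Before flattening, the payload held by the message still decodes
# to the original bytes, including the bare CR.
stored = app.get_payload(decode=True)
print("stored payload:", stored)

# Flatten and look at the last bytes of the generated output.
buf = io.BytesIO()
BytesGenerator(buf).flatten(app)
flattened_tail = buf.getvalue()[-len(raw):]
print("flattened tail:", flattened_tail)
print("changed by flatten:", flattened_tail != raw)
```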

I got as far as this passage in RFC 2049 and gave up:

The following guidelines may be useful to anyone devising a data
format (media type) that is supposed to survive the widest range of
networking technologies and known broken MTAs unscathed. Note that
anything encoded in the base64 encoding will satisfy these rules, but
that some well-known mechanisms, notably the UNIX uuencode facility,
will not. Note also that anything encoded in the Quoted-Printable
encoding will survive most gateways intact, but possibly not some
gateways to systems that use the EBCDIC character set.

(1) Under some circumstances the encoding used for data may
change as part of normal gateway or user agent
operation. In particular, conversion from base64 to
quoted-printable and vice versa may be necessary. This
may result in the confusion of CRLF sequences with line
breaks in text bodies. As such, the persistence of
CRLF as something other than a line break must not be
relied on.

(2) Many systems may elect to represent and store text data
using local newline conventions. Local newline
conventions may not match the RFC822 CRLF convention --
systems are known that use plain CR, plain LF, CRLF, or
counted records. The result is that isolated CR and LF
characters are not well tolerated in general; they may
be lost or converted to delimiters on some systems, and
hence must not be relied on.

So putting a raw CR in a binary chunk may be intolerable, and
you need to use a different encoder. But I'm out of my element.
 
Nils Bunger

Hi Neil,

Thanks for looking at this.

I'm trying to create a multipart MIME for an HTTP POST request, not an email. This is for a third-party API that requires a multipart POST with a binary file, so I don't have the option to just use a different encoding.

Multipart HTTP is standardized in HTTP 1.0 and supports binary parts. Also, no one will re-interpret contents of HTTP on the wire, as binary is quite normal in HTTP.

The issue seems to be that some parts of the Python MIME encoder still assume it's for email only, where everything would be base64-encoded.

Maybe I have to roll my own to create a multipart msg with a binary file? I was hoping to avoid that.
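
For reference, rolling your own is not much code. The following is a minimal sketch, not taken from the thread: the function name encode_multipart_formdata, the single-file layout, and the default content type are all assumptions for illustration, and the field name and filename are assumed to be plain ASCII without quotes.

```python
import uuid


def encode_multipart_formdata(field_name, filename, data,
                              content_type="application/octet-stream"):
    """Build a one-part multipart/form-data body around raw bytes.

    Hypothetical helper: returns (Content-Type header value, body bytes).
    """
    # Pick a boundary that does not occur inside the binary data.
    boundary = uuid.uuid4().hex
    while boundary.encode("ascii") in data:
        boundary = uuid.uuid4().hex

    # Part headers use CRLF line endings; the data itself is never touched.
    head = (
        "--{b}\r\n"
        'Content-Disposition: form-data; name="{n}"; filename="{f}"\r\n'
        "Content-Type: {t}\r\n"
        "\r\n"
    ).format(b=boundary, n=field_name, f=filename, t=content_type).encode("ascii")
    tail = "\r\n--{b}--\r\n".format(b=boundary).encode("ascii")

    header_value = "multipart/form-data; boundary={b}".format(b=boundary)
    return header_value, head + data + tail


header_value, body = encode_multipart_formdata("file", "data.bin", b"Q\x0dQ")
print(header_value)
print(body)
```

The returned header value would go into the request's Content-Type header and the body into the POST data.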

Nils

ps. You probably know this, but in case anyone else reads this thread, HTTP requires all headers to end with CRLF, not native line endings. The Python MIME modules can do that properly as of Python 3.2 (fixed as of this change: http://hg.python.org/cpython/rev/ebf6741a8d6e/)
 
Nils Bunger

Hi all,

I was able to work around this problem by encoding a unique 'marker' in the binary part, then replacing the marker with the actual binary content after generating the MIME message.
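
Roughly, the idea looks like the sketch below. This is a simplified illustration rather than the code from the linked answer: the function name, the "file" and "data.bin" values, and the use of a UUID as the marker are assumptions.

```python
import io
import uuid
from email.encoders import encode_noop
from email.generator import BytesGenerator
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart


def build_multipart_with_marker(binary_data):
    """Generate the MIME structure with a placeholder, then swap in the bytes."""
    # A random ASCII marker contains no CR or LF, so flatten() leaves it alone.
    marker = uuid.uuid4().hex.encode("ascii")

    outer = MIMEMultipart("form-data")
    part = MIMEApplication(marker, _encoder=encode_noop)
    part.add_header("Content-Disposition", "form-data",
                    name="file", filename="data.bin")
    outer.attach(part)

    buf = io.BytesIO()
    # mangle_from_=False avoids ">From " rewriting; linesep gives CRLF headers.
    BytesGenerator(buf, mangle_from_=False).flatten(outer, linesep="\r\n")

    # Replace the placeholder with the real content, untouched by the generator.
    return buf.getvalue().replace(marker, binary_data, 1)


message_bytes = build_multipart_with_marker(b"Q\x0dQ")
print(message_bytes[-60:])
```

One caveat with this approach: the boundary is chosen before the real content is inserted, so nothing checks that the boundary string does not occur inside the binary data.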

See my answer on Stack Overflow http://stackoverflow.com/a/19033750/526098 for the code.

Thanks, your suggestions helped me think of this.

Nils
 
Piet van Oostrum

Nils Bunger said:
Hi Neil,

Thanks for looking at this.

I'm trying to create a multipart MIME for an HTTP POST request, not an
email. This is for a third-party API that requires a multipart POST
with a binary file, so I don't have the option to just use a different
encoding.

Multipart HTTP is standardized in HTTP 1.0 and supports binary parts.
Also, no one will re-interpret contents of HTTP on the wire, as binary
is quite normal in HTTP.

The issue seems to be some parts of the python MIME encoder still
assume it's for email only, where everything would be b64 encoded.

Maybe I have to roll my own to create a multipart msg with a binary
file? I was hoping to avoid that.

The email MIME stuff is not really adapted for HTTP. I would advise
using the Requests package (http://docs.python-requests.org/en/latest/) or
the "Uploading Files" part of Doug Hellmann's page
(http://doughellmann.com/2009/07/pymotw-urllib2-library-for-opening-urls.html).
That page is for Python 2; I can send you a Python 3 version if you want.
 
