Parsing email attachments: get_payload() produces unsaveable data

dpapathanasiou

I'm using Python to access an email account via POP and, for each
incoming message, save any attachments.

This is the function which scans the message for attachments:

def save_attachments(local_folder, msg_text):
    """Scan the email message text and save the attachments (if any) in the local_folder"""
    if msg_text:
        for part in email.message_from_string(msg_text).walk():
            if part.is_multipart() or part.get_content_maintype() == 'text':
                continue
            filename = part.get_filename(None)
            if filename:
                filedata = part.get_payload(decode=True)
                if filedata:
                    write_file(local_folder, filename, filedata)

All the way up to write_file(), it's working correctly.

The filename variable matches the name of the attached file, and the
filedata variable contains binary data corresponding to the file's
contents.

When I try to write the filedata to a file system folder, though, I
get an AttributeError in the stack trace.

Here is my write_file() function:

def write_file(folder, filename, f, chunk_size=4096):
    """Write the file data f to the folder and filename combination"""
    result = False
    if confirm_folder(folder):
        try:
            file_obj = open(os.path.join(folder, file_base_name(filename)), 'wb', chunk_size)
            for file_chunk in read_buffer(f, chunk_size):
                file_obj.write(file_chunk)
            file_obj.close()
            result = True
        except (IOError):
            print "file_utils.write_file: could not write '%s' to '%s'" % (file_base_name(filename), folder)
    return result

I also tried applying this regex:

filedata = re.sub(r'\r(?!=\n)', '\r\n', filedata) # Bare \r becomes \r\n

after reading this post (http://stackoverflow.com/questions/787739/python-email-getpayload-decode-fails-when-hitting-equal-sign),
but it hasn't resolved the problem.

Is there any way of correcting the output of get_payload() so I can
save it to a file?
 
Albert Hopkins

When I try to write the filedata to a file system folder, though, I
get an AttributeError in the stack trace.

And where might we be able to see that stack trace?

-a
 
dpapathanasiou

And where might we be able to see that stack trace?

This is it:

Exception: ('AttributeError', '<no args>', [' File "/opt/server/smtp/smtps.py", line 213, in handle\n email_replier.post_reply(recipient_mbox, \'\'.join(data))\n',
' File "/opt/server/smtp/email_replier.py", line 108, in post_reply\n save_attachments(result[2], msg_text)\n',
' File "/opt/server/smtp/email_replier.py", line 79, in save_attachments\n data_manager.upload_file(item_id, filename, filedata)\n',
' File "../db/data_manager.py", line 697, in upload_file\n if docs_db.save_file(item_id, file_name, file_data):\n',
' File "../db/docs_db.py", line 102, in save_file\n result = file_utils.write_file(saved_file_path, saved_file_name + saved_file_ext, file_data)\n'])

If you're wondering, I'm using this to capture the exception:

def formatExceptionInfo(maxTBlevel=5):
    """For displaying exception information"""
    cla, exc, trbk = sys.exc_info()
    excName = cla.__name__
    try:
        excArgs = exc.__dict__["args"]
    except KeyError:
        excArgs = "<no args>"
    excTb = traceback.format_tb(trbk, maxTBlevel)
    return (excName, excArgs, excTb)
 
Albert Hopkins

Which is *really* difficult (for me) to read. Any chance of providing a
"normal" traceback?
 
dpapathanasiou

Which is *really* difficult (for me) to read.  Any chance of providing a
"normal" traceback?

  File "/opt/server/smtp/smtps.py", line 213, in handle
    email_replier.post_reply(recipient_mbox, ''.join(data))
  File "/opt/server/smtp/email_replier.py", line 108, in post_reply
    save_attachments(result[2], msg_text)
  File "/opt/server/smtp/email_replier.py", line 79, in save_attachments
    data_manager.upload_file(item_id, filename, filedata)
  File "../db/data_manager.py", line 697, in upload_file
    if docs_db.save_file(item_id, file_name, file_data):
  File "../db/docs_db.py", line 102, in save_file
    result = file_utils.write_file(saved_file_path, saved_file_name + saved_file_ext, file_data)

AttributeError
 
Albert Hopkins

Are you sure this is the complete traceback? Usually an AttributeError
returns a text message such as:

AttributeError: foo has no such attribute bar

Also, the traceback says the exception happened in "save_file", but the
code you posted was a function called "save_attachments" and the
function call is different.

Would be nice if we could get the full traceback with the exact matching
code. Otherwise we have to make guesses. But I've given up. Perhaps
someone else is better off helping you.
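
For what it's worth, here is a minimal sketch of one way to get a
complete traceback, message line included, using
traceback.format_exc(). Two things about the formatExceptionInfo()
helper posted earlier may explain the output so far: format_tb() with a
limit of 5 keeps only the first five frames, so the frame that actually
raised can be cut off, and as far as I can tell "args" is normally not
stored in the exception's __dict__, which would give '<no args>' every
time. The failing call below is a hypothetical stand-in, not the code
from this thread, and the function names are made up for illustration.

```python
import traceback


def describe_current_exception():
    """Return the standard traceback text for the exception being handled."""
    # format_exc() renders the same text the interpreter would print,
    # ending with the "ExceptionType: message" line.
    return traceback.format_exc()


def demo():
    """Trigger an AttributeError and return its full traceback text."""
    try:
        # Hypothetical failure: a str has no read() method.
        "not a file object".read()
    except AttributeError:
        return describe_current_exception()


if __name__ == "__main__":
    print(demo())
```

Logging that text instead of the tuple should show both the deepest
frame and the attribute name that was missing.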

-a
 
dpapathanasiou

An update for the record (and in case anyone else also has this
problem):

The regex suggested in the StackOverflow post (i.e.,
filedata = re.sub(r'\r(?!=\n)', '\r\n', filedata) # Bare \r becomes \r\n)
is necessary but not sufficient.

It turns out that get_payload(decode=True) returns the decoded contents
as a single string of bytes rather than a file-like object, so the
right way to save those bytes to a file is to use a function like
this:

def write_binary_file(folder, filename, filedata):
    """Write the binary file data to the folder and filename combination"""
    result = False
    if confirm_folder(folder):
        try:
            file_obj = open(os.path.join(folder, file_base_name(filename)), 'wb')
            file_obj.write(filedata)
            file_obj.close()
            result = True
        except (IOError):
            print "file_utils.write_file: could not write '%s' to '%s'" % (file_base_name(filename), folder)
    return result

I.e., filedata, the output of get_payload(), can be written all at
once, without reading and writing in 4k chunks.
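
For anyone on Python 3, the same idea might look roughly like the
sketch below. It is only an illustration under a few assumptions: the
raw message is available as bytes, os.makedirs() and os.path.basename()
stand in for the confirm_folder() and file_base_name() helpers (which
are not shown in this thread), and build_sample_message() is a
hypothetical helper that exists only to make the example runnable.

```python
import email
import os
import tempfile
from email.message import EmailMessage


def save_attachments(local_folder, raw_message):
    """Save each named, non-text part of raw_message (bytes) into local_folder.

    Returns the list of file paths that were written.
    """
    saved = []
    os.makedirs(local_folder, exist_ok=True)  # stand-in for confirm_folder()
    for part in email.message_from_bytes(raw_message).walk():
        if part.is_multipart() or part.get_content_maintype() == "text":
            continue
        filename = part.get_filename()
        if not filename:
            continue
        # decode=True undoes the transfer encoding (e.g. base64) and
        # returns a bytes object, not a file-like object.
        filedata = part.get_payload(decode=True)
        if not filedata:
            continue
        # Stand-in for file_base_name(): drop any directory components
        # from the sender-supplied name.
        path = os.path.join(local_folder, os.path.basename(filename))
        with open(path, "wb") as file_obj:
            file_obj.write(filedata)  # one write, no chunking needed
        saved.append(path)
    return saved


def build_sample_message(payload):
    """Hypothetical helper: build a message with one binary attachment."""
    msg = EmailMessage()
    msg["Subject"] = "demo"
    msg.set_content("body text")
    msg.add_attachment(payload, maintype="application",
                       subtype="octet-stream", filename="sample.bin")
    return msg.as_bytes()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        print(save_attachments(folder, build_sample_message(b"\x00\x01\xff")))
```

Opening the file in 'wb' mode and writing the bytes in one call is the
important part; the bytes are written exactly as decoded. Presumably the
chunked write_file() failed because read_buffer() expected an object
with a read() method, though that helper is not shown here, so this is
a guess.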
 
