Parsing email attachments: get_payload() produces unsaveable data

Discussion in 'Python' started by dpapathanasiou, Oct 4, 2009.

  1. I'm using python to access an email account via POP, then for each
    incoming message, save any attachments.

    This is the function which scans the message for attachments:

    def save_attachments (local_folder, msg_text):
        """Scan the email message text and save the attachments (if any)
        in the local_folder"""
        if msg_text:
            for part in email.message_from_string(msg_text).walk():
                if part.is_multipart() or part.get_content_maintype() == 'text':
                    continue
                filename = part.get_filename(None)
                if filename:
                    filedata = part.get_payload(decode=True)
                    if filedata:
                        write_file(local_folder, filename, filedata)

    All the way up to write_file(), it's working correctly.

    The filename variable matches the name of the attached file, and the
    filedata variable contains binary data corresponding to the file's
    contents.

    When I try to write the filedata to a file system folder, though, I
    get an AttributeError in the stack trace.

    Here is my write_file() function:

    def write_file (folder, filename, f, chunk_size=4096):
        """Write the file data f to the folder and filename
        combination"""
        result = False
        if confirm_folder(folder):
            try:
                file_obj = open(os.path.join(folder, file_base_name(filename)), 'wb', chunk_size)
                for file_chunk in read_buffer(f, chunk_size):
                    file_obj.write(file_chunk)
                file_obj.close()
                result = True
            except (IOError):
                print "file_utils.write_file: could not write '%s' to '%s'" % (file_base_name(filename), folder)
        return result

    I also tried applying this regex:

    filedata = re.sub(r'\r(?!=\n)', '\r\n', filedata) # Bare \r becomes \r\n

    after reading this post
    (http://stackoverflow.com/questions/787739/python-email-getpayload-decode-fails-when-hitting-equal-sign),
    but it hasn't resolved the problem.

    Is there any way of correcting the output of get_payload() so I can
    save it to a file?
     
    dpapathanasiou, Oct 4, 2009
    #1

  2. On Sun, 2009-10-04 at 07:27 -0700, dpapathanasiou wrote:
    > When I try to write the filedata to a file system folder, though, I
    > get an AttributeError in the stack trace.


    And where might we be able to see that stack trace?

    -a
     
    Albert Hopkins, Oct 4, 2009
    #2


  3. > And where might we be able to see that stack trace?


    This is it:

    Exception: ('AttributeError', '<no args>', [
    ' File "/opt/server/smtp/smtps.py", line 213, in handle\n email_replier.post_reply(recipient_mbox, \'\'.join(data))\n',
    ' File "/opt/server/smtp/email_replier.py", line 108, in post_reply\n save_attachments(result[2], msg_text)\n',
    ' File "/opt/server/smtp/email_replier.py", line 79, in save_attachments\n data_manager.upload_file(item_id, filename, filedata)\n',
    ' File "../db/data_manager.py", line 697, in upload_file\n if docs_db.save_file(item_id, file_name, file_data):\n',
    ' File "../db/docs_db.py", line 102, in save_file\n result = file_utils.write_file(saved_file_path, saved_file_name + saved_file_ext, file_data)\n'])

    If you're wondering, I'm using this to capture the exception:

    def formatExceptionInfo(maxTBlevel=5):
        """For displaying exception information"""
        cla, exc, trbk = sys.exc_info()
        excName = cla.__name__
        try:
            excArgs = exc.__dict__["args"]
        except KeyError:
            excArgs = "<no args>"
        excTb = traceback.format_tb(trbk, maxTBlevel)
        return (excName, excArgs, excTb)
     
    dpapathanasiou, Oct 4, 2009
    #3
  4. On Sun, 2009-10-04 at 08:16 -0700, dpapathanasiou wrote:
    > > And where might we be able to see that stack trace?

    >
    > This is it:
    >
    > Exception: ('AttributeError', '<no args>', [
    > ' File "/opt/server/smtp/smtps.py", line 213, in handle\n email_replier.post_reply(recipient_mbox, \'\'.join(data))\n',
    > ' File "/opt/server/smtp/email_replier.py", line 108, in post_reply\n save_attachments(result[2], msg_text)\n',
    > ' File "/opt/server/smtp/email_replier.py", line 79, in save_attachments\n data_manager.upload_file(item_id, filename, filedata)\n',
    > ' File "../db/data_manager.py", line 697, in upload_file\n if docs_db.save_file(item_id, file_name, file_data):\n',
    > ' File "../db/docs_db.py", line 102, in save_file\n result = file_utils.write_file(saved_file_path, saved_file_name + saved_file_ext, file_data)\n'])


    Which is *really* difficult (for me) to read. Any chance of providing a
    "normal" traceback?
     
    Albert Hopkins, Oct 4, 2009
    #4

  5. > Which is *really* difficult (for me) to read.  Any chance of providing a
    > "normal" traceback?


      File "/opt/server/smtp/smtps.py", line 213, in handle
        email_replier.post_reply(recipient_mbox, ''.join(data))
      File "/opt/server/smtp/email_replier.py", line 108, in post_reply
        save_attachments(result[2], msg_text)
      File "/opt/server/smtp/email_replier.py", line 79, in save_attachments
        data_manager.upload_file(item_id, filename, filedata)
      File "../db/data_manager.py", line 697, in upload_file
        if docs_db.save_file(item_id, file_name, file_data):
      File "../db/docs_db.py", line 102, in save_file
        result = file_utils.write_file(saved_file_path, saved_file_name + saved_file_ext, file_data)

    AttributeError
     
    dpapathanasiou, Oct 4, 2009
    #5
  6. On Sun, 2009-10-04 at 09:17 -0700, dpapathanasiou wrote:
    > > Which is *really* difficult (for me) to read. Any chance of providing a
    > > "normal" traceback?

    >
    >   File "/opt/server/smtp/smtps.py", line 213, in handle
    >     email_replier.post_reply(recipient_mbox, ''.join(data))
    >   File "/opt/server/smtp/email_replier.py", line 108, in post_reply
    >     save_attachments(result[2], msg_text)
    >   File "/opt/server/smtp/email_replier.py", line 79, in save_attachments
    >     data_manager.upload_file(item_id, filename, filedata)
    >   File "../db/data_manager.py", line 697, in upload_file
    >     if docs_db.save_file(item_id, file_name, file_data):
    >   File "../db/docs_db.py", line 102, in save_file
    >     result = file_utils.write_file(saved_file_path, saved_file_name + saved_file_ext, file_data)
    >
    > AttributeError


    Are you sure this is the complete traceback? Usually an AttributeError
    comes with a message such as:

    AttributeError: foo has no such attribute bar
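
    A minimal sketch of one way to get that message along with the
    traceback, assuming only the standard traceback module. The names
    format_exception_info and demo are made up for illustration, and
    calling .read() on a string is just a stand-in that raises an
    AttributeError, not necessarily the one in your code:

```python
import traceback


def format_exception_info():
    """Return the current exception's traceback as text, message included."""
    return traceback.format_exc()


def demo():
    """Trigger a sample AttributeError and report what can be recovered."""
    try:
        "some bytes".read(4096)  # a str has no read() method
    except AttributeError as exc:
        # args is an attribute of the exception object, but it is not
        # stored in the instance __dict__, so a __dict__ lookup misses it.
        in_dict = "args" in exc.__dict__
        return in_dict, exc.args, format_exception_info()
```

    In this sketch, in_dict comes back False, which would explain why
    a helper that looks up exc.__dict__["args"] ends up reporting
    "<no args>" while exc.args and traceback.format_exc() still carry
    the message.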

    Also, the traceback says the exception happened in "save_file", but the
    code you posted was a function called "save_attachments" and the
    function call is different.

    It would be nice if we could get the full traceback together with the
    exact matching code. Otherwise we have to make guesses. But I've given
    up. Perhaps someone else will have better luck helping you.

    -a
     
    Albert Hopkins, Oct 4, 2009
    #6
  7. On Oct 4, 10:27 am, dpapathanasiou wrote:
    > Is there any way of correcting the output of get_payload() so I can
    > save it to a file?


    An update for the record (and in case anyone else also has this
    problem):

    The regex suggested in the StackOverflow post (i.e., filedata =
    re.sub(r'\r(?!=\n)', '\r\n', filedata) # Bare \r becomes \r\n) is
    necessary but not sufficient.
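
    One thing worth checking in that regex: (?!=\n) is a negative
    lookahead for the two characters "=" and "\n", so it also matches a
    \r that is already followed by \n. The sketch below compares it with
    (?!\n); the function names are hypothetical and the sample strings
    are only illustrative:

```python
import re


def fix_bare_cr_as_posted(text):
    # (?!=\n) rejects a \r only when it is followed by "=" and then "\n",
    # so a \r that already precedes \n is still rewritten.
    return re.sub(r'\r(?!=\n)', '\r\n', text)


def fix_bare_cr(text):
    # (?!\n) matches a \r only when the next character is not \n.
    return re.sub(r'\r(?!\n)', '\r\n', text)
```

    With the posted pattern, 'a\r\nb' gains an extra line feed, while
    the (?!\n) variant leaves it alone. Both turn 'a\rb' into 'a\r\nb'.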

    It turns out that because get_payload(decode=True) returns the decoded
    attachment as a single byte string, not a file-like object to read
    from, the right way to save those bytes to a file is to use a function
    like this:

    def write_binary_file (folder, filename, filedata):
        """Write the binary file data to the folder and filename
        combination"""
        result = False
        if confirm_folder(folder):
            try:
                file_obj = open(os.path.join(folder, file_base_name(filename)), 'wb')
                file_obj.write(filedata)
                file_obj.close()
                result = True
            except (IOError):
                print "file_utils.write_binary_file: could not write '%s' to '%s'" % (file_base_name(filename), folder)
        return result

    I.e., filedata, the output of get_payload(), can be written all at
    once, without reading and writing in 4k chunks.
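
    For anyone on Python 3, here is a rough, self-contained sketch of
    the same idea. It assumes the message arrives as bytes and uses
    os.path.basename in place of my file_base_name helper. The demo
    function and its sample attachment are made up for illustration:

```python
import email
import os
import tempfile
from email.message import EmailMessage


def save_attachments(local_folder, msg_bytes):
    """Save every named, non-text part into local_folder; return the paths."""
    saved = []
    for part in email.message_from_bytes(msg_bytes).walk():
        if part.is_multipart() or part.get_content_maintype() == "text":
            continue
        filename = part.get_filename(None)
        if not filename:
            continue
        filedata = part.get_payload(decode=True)  # decoded bytes
        if not filedata:
            continue
        # basename() drops any directory components in the sender's filename
        path = os.path.join(local_folder, os.path.basename(filename))
        with open(path, "wb") as file_obj:
            file_obj.write(filedata)  # written in one call, no chunking
        saved.append(path)
    return saved


def demo():
    """Build a message with one binary attachment and save it."""
    payload = b"\x00\x01\r\n\xff"
    msg = EmailMessage()
    msg.set_content("see attached")
    msg.add_attachment(payload, maintype="application",
                       subtype="octet-stream", filename="blob.bin")
    with tempfile.TemporaryDirectory() as folder:
        (path,) = save_attachments(folder, msg.as_bytes())
        with open(path, "rb") as saved_file:
            return os.path.basename(path), saved_file.read() == payload
```

    The with block closes the file even if the write fails, which the
    explicit close() in my version above does not guarantee.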
     
    dpapathanasiou, Oct 14, 2009
    #7