Extract zip file from email attachment

Discussion in 'Python' started by erikcw, Apr 6, 2007.

  1. erikcw

    erikcw Guest

    Hi all,

    I'm trying to extract a zip file (containing an xml file) from an email
    so I can process it. But I'm running up against some brick walls.
    I've been googling and reading all afternoon, and can't seem to figure
    it out.

    Here is what I have so far.

    from poplib import POP3
    import email

    p = POP3("mail.server.com")
    print p.getwelcome()
    # authentication, etc.
    print p.user("USER")
    print p.pass_("PASS")
    print "This mailbox has %d messages, totaling %d bytes." % p.stat()
    msg_list = p.list()
    print msg_list
    if not msg_list[0].startswith('+OK'):
        # Handle error
        exit(1)

    for msg in msg_list[1]:
        msg_num, _ = msg.split()
        resp = p.retr(msg_num)
        if resp[0].startswith('+OK'):
            #print resp, '=======================\n'
            #extract message body and attachment.
            parsed_msg = email.message_from_string('\n'.join(resp[1]))
            payload = parsed_msg.get_payload(decode=True)
            print payload  # doesn't seem to work
        else:
            pass  # Deal with error retrieving message.

    How do I:
    a) retrieve the body of the email into a string so I can do some
    processing? (I can get at the header attributes without any trouble)
    b) retrieve the zip file attachment, and unzip into a string for xml
    processing?

    Thanks so much for your help!
    Erik
     
    erikcw, Apr 6, 2007
    #1
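For question (a), here is a minimal sketch in Python 3 (the thread itself uses Python 2.5). On a multipart message, `get_payload(decode=True)` called on the top-level message returns `None`, which is probably why the `print payload` line above shows nothing useful. Walking the parts and decoding the first `text/plain` part yields the body as a string. The helper name `get_body_text` and the sample message are made up for illustration.

```python
import email
from email.message import EmailMessage


def get_body_text(raw_bytes):
    """Return the first text/plain part of a message as a str, or None."""
    msg = email.message_from_bytes(raw_bytes)
    for part in msg.walk():
        # Skip multipart containers, non-text parts, and named attachments.
        if part.get_content_type() != "text/plain" or part.get_filename():
            continue
        data = part.get_payload(decode=True)  # undo base64/quoted-printable
        charset = part.get_content_charset() or "utf-8"
        return data.decode(charset, errors="replace")
    return None


# Hypothetical sample message: a text body plus one binary attachment.
sample = EmailMessage()
sample["Subject"] = "report"
sample.set_content("Here is the report.")
sample.add_attachment(b"PK-not-a-real-zip", maintype="application",
                      subtype="zip", filename="report.zip")
raw = sample.as_bytes()

body = get_body_text(raw)
# On the multipart container itself, decode=True gives None.
top_level = email.message_from_bytes(raw).get_payload(decode=True)
print(repr(body))
print(top_level)
```

With a real POP3 session, `raw` would be the lines returned by `retr()` joined with `b"\r\n"`.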

  2. hlubenow

    hlubenow Guest

    Hi,

    Some weeks ago I wrote some code to extract attachments from emails.
    It's not that long, so maybe it could be of help to you:

    -------------------------------------------

    #!/usr/bin/env python

    import poplib
    import email
    import os
    import sys
    import string

    #
    # attsave.py
    # Check emails at PROVIDER for attachments and save them to SAVEDIR.
    #

    PROVIDER = "pop.YourMailProvider.de"
    USER = "YourUserName"
    PASSWORD = "YourPassword"

    SAVEDIR = "/home/YourUserDirectory"


    def saveAttachment(mstring):

        filenames = []
        attachedcontents = []

        msg = email.message_from_string(mstring)

        for part in msg.walk():

            fn = part.get_filename()

            if fn <> None:
                filenames.append(fn)
                # Note: without decode=True the payload is still
                # transfer-encoded (e.g. base64).
                attachedcontents.append(part.get_payload())

        for i in range(len(filenames)):
            fp = file(SAVEDIR + "/" + filenames[i], "wb")
            fp.write(attachedcontents[i])
            print 'Found and saved attachment "' + filenames[i] + '".'
            fp.close()

    try:
        client = poplib.POP3(PROVIDER)
    except:
        print "Error: Provider not found."
        sys.exit(1)

    client.user(USER)
    client.pass_(PASSWORD)

    anzahl_mails = len(client.list()[1])  # anzahl_mails = number of mails

    for i in range(anzahl_mails):
        lines = client.retr(i + 1)[1]
        mailstring = string.join(lines, "\n")
        saveAttachment(mailstring)

    client.quit()

    -------------------------------------------

    See you

    H.
     
    hlubenow, Apr 6, 2007
    #2
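A rough Python 3 counterpart to the `saveAttachment` idea above, offered as a sketch and not as hlubenow's method. It assumes the raw message is available as bytes, passes `decode=True` so that the saved file holds the decoded data, and keeps only the base name of the attachment's filename. The function name `save_attachments` and the demo message are hypothetical.

```python
import email
import os
import tempfile
from email.message import EmailMessage


def save_attachments(raw_bytes, save_dir):
    """Write every part that has a filename into save_dir; return the paths."""
    msg = email.message_from_bytes(raw_bytes)
    saved = []
    for part in msg.walk():
        name = part.get_filename()
        if name is None:
            continue
        # decode=True reverses the Content-Transfer-Encoding (base64 etc.).
        data = part.get_payload(decode=True)
        if data is None:
            continue
        # basename() drops directory components (by the local path convention).
        path = os.path.join(save_dir, os.path.basename(name))
        with open(path, "wb") as fp:
            fp.write(data)
        saved.append(path)
    return saved


# Hypothetical message used only to exercise the function.
demo = EmailMessage()
demo.set_content("see attachment")
demo.add_attachment(b"\x00\x01binary\xff", maintype="application",
                    subtype="octet-stream", filename="data.bin")

with tempfile.TemporaryDirectory() as tmp:
    paths = save_attachments(demo.as_bytes(), tmp)
    with open(paths[0], "rb") as fp:
        restored = fp.read()

print(os.path.basename(paths[0]), restored)
```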

  3. erikcw

    erikcw Guest

    Thanks H!

    I'm now able to get the name of the zip file, and the contents (is it
    still encoded?).

    I now need to be able to unzip the zip file into a string and get the
    body of the email into a string.

    Here is my updated code:
    p = POP3("mail.**********.com")
    print p.getwelcome()
    # authentication, etc.
    print p.user("USER")
    print p.pass_("PASS")
    print "This mailbox has %d messages, totaling %d bytes." % p.stat()
    msg_list = p.list()
    print msg_list
    if not msg_list[0].startswith('+OK'):
        # Handle error in listings
        exit(1)

    for msg in msg_list[1]:
        msg_num, _ = msg.split()
        resp = p.retr(msg_num)
        if resp[0].startswith('+OK'):
            #print resp, '=======================\n'
            parsed_msg = email.message_from_string('\n'.join(resp[1]))
            for part in parsed_msg.walk():
                fn = part.get_filename()
                if fn <> None:
                    fileObj = StringIO.StringIO()
                    fileObj.write( part.get_payload() )
                    #attachment = zlib.decompress(part.get_payload())
                    #print zipfile.is_zipfile(fileObj)
                    attachment = zipfile.ZipFile(fileObj)
                    print fn, '\n', attachment
            payload = parsed_msg.get_payload(decode=True)
            print payload

        else:
            pass  # Deal with error retrieving message.

    I get this error:

    Traceback (most recent call last):
      File "wa.py", line 208, in <module>
        attachment = zipfile.ZipFile(fileObj)
      File "/usr/lib/python2.5/zipfile.py", line 346, in __init__
        self._GetContents()
      File "/usr/lib/python2.5/zipfile.py", line 366, in _GetContents
        self._RealGetContents()
      File "/usr/lib/python2.5/zipfile.py", line 378, in _RealGetContents
        raise BadZipfile, "File is not a zip file"
    zipfile.BadZipfile: File is not a zip file

    Is the zip file still encoded? Or am I passing in the wrong arguments
    to the zipfile module?

    Thanks for your help!
    Erik
     
    erikcw, Apr 6, 2007
    #3
  4. Gabriel Genellina

    erikcw wrote:


    > resp = p.retr(msg_num)
    > if resp[0].startswith('+OK'):


    You don't have to check this; errors are transformed into exceptions.

    > fileObj = StringIO.StringIO()


    cStringIO is faster.

    > fileObj.write( part.get_payload() )


    You have to reset the file pointer to the beginning: fileObj.seek(0),
    else ZipFile will not be able to read the contents.

    --
    Gabriel Genellina
     
    Gabriel Genellina, Apr 6, 2007
    #4
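To illustrate the remark that errors are turned into exceptions: `poplib` raises `poplib.error_proto` whenever the server's reply does not start with `+`, so a loop can rely on `try`/`except` and never inspect `'+OK'` itself. The sketch below is Python 3 and runs offline; `FakePOP3` is a hypothetical stand-in that only mimics the return shape of `list()` and `retr()`, and `fetch_all` is an illustrative name.

```python
import poplib


class FakePOP3:
    """Hypothetical offline stand-in with the list()/retr() shape of poplib.POP3."""

    def __init__(self, messages, broken=()):
        self._messages = messages   # {message number: [line, ...]} as bytes
        self._broken = set(broken)  # numbers whose RETR should fail

    def list(self):
        listing = [b"%d %d" % (num, sum(len(line) for line in lines))
                   for num, lines in sorted(self._messages.items())]
        return b"+OK", listing, 0

    def retr(self, which):
        if which in self._broken:
            # The real client raises error_proto for any -ERR reply.
            raise poplib.error_proto(b"-ERR cannot retrieve message")
        lines = self._messages[which]
        return b"+OK", lines, sum(len(line) for line in lines)


def fetch_all(client):
    """Return (raw_messages, failed_numbers) without checking for '+OK'."""
    raw_messages, failed = [], []
    for entry in client.list()[1]:
        num = int(entry.split()[0])
        try:
            lines = client.retr(num)[1]
        except poplib.error_proto:
            failed.append(num)
            continue
        raw_messages.append(b"\r\n".join(lines))
    return raw_messages, failed


client = FakePOP3({1: [b"Subject: a", b"", b"one"],
                   2: [b"Subject: b", b"", b"two"]}, broken={2})
messages, failed = fetch_all(client)
print(len(messages), failed)
```

Swapping `FakePOP3(...)` for a logged-in `poplib.POP3` instance should leave `fetch_all` unchanged.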
  5. erikcw

    erikcw Guest

    Hi Gabriel,

    I added fileObj.seek(0) on the line directly after
    fileObj.write( part.get_payload() ) and I'm still getting the
    following error.

    Traceback (most recent call last):
      File "wa.py", line 209, in <module>
        attachment = zipfile.ZipFile(fileObj)
      File "/usr/lib/python2.5/zipfile.py", line 346, in __init__
        self._GetContents()
      File "/usr/lib/python2.5/zipfile.py", line 366, in _GetContents
        self._RealGetContents()
      File "/usr/lib/python2.5/zipfile.py", line 378, in _RealGetContents
        raise BadZipfile, "File is not a zip file"
    zipfile.BadZipfile: File is not a zip file

    Could the file-like object still be encoded in MIME or something?

    Thanks!
    Erik
     
    erikcw, Apr 6, 2007
    #5
  6. Basilisk96

    Basilisk96 Guest

    >
    > Could the file like object still be encoded in MIME or something?
    >


    Yes, it is. You don't need to seek(0).
    Try this:

    decoded = email.base64mime.decode(part.get_payload())
    fileObj.write(decoded)


    -Basilisk96
     
    Basilisk96, Apr 7, 2007
    #6
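Both claims can be checked offline with a small Python 3 sketch (here `io.BytesIO` takes the place of `StringIO`/`cStringIO`). It builds a throwaway zip in memory, base64-encodes it the way a mail attachment would be, and compares what `zipfile` makes of the encoded and decoded bytes. In current CPython, `zipfile` seeks to the end of the file object on its own to locate the archive directory, which is why no `seek(0)` appears. The archive contents and variable names are made up.

```python
import base64
import io
import zipfile

# Build a small zip archive in memory to stand in for the attachment.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("data.xml", "<root/>")
zip_bytes = buf.getvalue()

# Roughly what get_payload() hands back without decode=True: base64 text.
encoded = base64.encodebytes(zip_bytes)

still_encoded = io.BytesIO()
still_encoded.write(encoded)

decoded = io.BytesIO()
decoded.write(base64.decodebytes(encoded))  # file position is now at the end

encoded_ok = zipfile.is_zipfile(still_encoded)  # base64 text is not a zip
decoded_ok = zipfile.is_zipfile(decoded)
names = zipfile.ZipFile(decoded).namelist()     # opened without seek(0)
print(encoded_ok, decoded_ok, names)
```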
  8. Gabriel Genellina

    Basilisk96 wrote:
    > >
    > > Could the file like object still be encoded in MIME or something?
    > >

    >
    > Yes it is. You don't need to seek(0).
    > Try this:
    >
    > decoded = email.base64mime.decode(part.get_payload())
    > fileObj.write(decoded)


    Or better:

    # decode=True undoes the transfer encoding (base64 here)
    decoded = part.get_payload(decode=True)
    fileObj.write(decoded)
    fileObj.seek(0)
    zf = zipfile.ZipFile(fileObj)
    zf.printdir()
     
    Gabriel Genellina, Apr 7, 2007
    #8
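Putting the thread's answer together for question (b), here is a self-contained Python 3 sketch: decode the attachment with `get_payload(decode=True)`, wrap the bytes in `io.BytesIO`, open it with `zipfile.ZipFile`, and read each XML member into a string. The function name `xml_from_zip_attachments`, the sample message, and the UTF-8 assumption are illustrative and not from the original posts.

```python
import email
import io
import zipfile
from email.message import EmailMessage


def xml_from_zip_attachments(raw_bytes):
    """Return {member name: XML text} for every zip attachment in the message."""
    msg = email.message_from_bytes(raw_bytes)
    result = {}
    for part in msg.walk():
        if part.get_filename() is None:
            continue
        data = part.get_payload(decode=True)  # bytes with base64 undone
        if not data or not zipfile.is_zipfile(io.BytesIO(data)):
            continue
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            for name in archive.namelist():
                if name.lower().endswith(".xml"):
                    # Assumes UTF-8; a real file may declare another encoding.
                    result[name] = archive.read(name).decode("utf-8")
    return result


# Hypothetical sample: a message carrying feed.zip, which contains feed.xml.
xml_text = "<items><item id='1'/></items>"
zip_buf = io.BytesIO()
with zipfile.ZipFile(zip_buf, "w") as archive:
    archive.writestr("feed.xml", xml_text)

message = EmailMessage()
message["Subject"] = "daily feed"
message.set_content("Feed attached.")
message.add_attachment(zip_buf.getvalue(), maintype="application",
                       subtype="zip", filename="feed.zip")

docs = xml_from_zip_attachments(message.as_bytes())
print(docs)
```

The resulting strings can then be handed to an XML parser such as `xml.etree.ElementTree.fromstring`.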