simple script to read and parse mailbox

Discussion in 'Python' started by chuck amadi, Jun 5, 2004.

  1. chuck amadi

    chuck amadi Guest

    Hi, I'm trying to parse a specific user's mailbox (testwwws) and output
    the body of the messages to a file; that file will then be loaded into a
    PostgreSQL DB at some point.

    I have read the email posts and been advised to use the email module
    and the mailbox module.

    Below is the blurb from a member of this list. I'm not at work at the
    moment, so I can't test this out, but could someone take a look and check
    that I'm on the right track? This Monday I need to show my boss and get
    the body data out of the user's mailbox.

    **Blurb from a member who's directed me**

    Start with the mailbox and email modules. Mailbox lets you iterate over a
    mailbox, yielding individual messages of type email. The email object lets
    you parse and operate on the message components. From there you should be
    able to extract your data.




    #!/usr/bin/env python
    ## The email message is read as flat text from a file or other source;
    ## the text is parsed to produce the object structure of the email message.
    import mboxutils
    import mailbox
    import email
    import sys
    import os
    import rfc822
    import StringIO
    import email.Parser
    import types

    # email package for managing email messages
    # Open Users Mailbox
    # class Message()
    #mbox = mailbox.UnixMailbox(open("/var/spool/mail/chucka"))

    def main():

        # The Directory that will contain the Survey Results

        dir = "/tmp/SurveyResults/"

        # The Web Survey User Inbox
        # Mailbox /home/testwwws/Mail/inbox

        maildir = "/home/testwwws/Mail/inbox"

        # Numbered across all messages so later files do not overwrite earlier ones
        counter = 1
        for file in os.listdir(maildir):

            print os.path.join(maildir, file)

            fp = open(os.path.join(maildir, file), "rb")
            p = email.Parser.Parser()
            msg = p.parse(fp)
            fp.close()
            #print msg.get("From")
            #print msg.get("Content-Type")

            for part in msg.walk():
                # multipart/* parts are just containers
                if part.get_content_maintype() == 'multipart':
                    continue

                filename = part.get_param("name")
                if filename is None:
                    filename = "part-%i" % counter
                counter += 1

                fp = open(os.path.join(dir, filename), 'wb')
                print os.path.join(dir, filename)
                fp.write(part.get_payload(decode=1))
                fp.close()


    if __name__ == '__main__':
        main()

    Cheers all this list has been very helpful.
     
    chuck amadi, Jun 5, 2004
    #1

  2. chuck amadi

    fishboy Guest

    On Sat, 05 Jun 2004 15:27:36 +0100, chuck amadi wrote:

    >Hi, I'm trying to parse a specific user's mailbox (testwwws) and output
    >the body of the messages to a file; that file will then be loaded into a
    >PostgreSQL DB at some point.
    >
    >I have read the email posts and been advised to use the email module
    >and the mailbox module.
    >
    >Below is the blurb from a member of this list. I'm not at work at the
    >moment, so I can't test this out, but could someone take a look and check
    >that I'm on the right track? This Monday I need to show my boss and get
    >the body data out of the user's mailbox.
    >
    >**Blurb from a member who's directed me**
    >
    >Start with the mailbox and email modules. Mailbox lets you iterate over a
    >mailbox, yielding individual messages of type email. The email object lets
    >you parse and operate on the message components. From there you should be
    >able to extract your data.
    >


    Hi again Chuck,

    I've been reading a few of your posts and I'm wondering: do the
    emails that you're parsing have binary attachments, like pictures and
    stuff, or are you just trying to get the text of the body?

    Or is it a little of both? It looks like you're expecting emails with
    multiple binary attachments.

    Other than that it looks good. You can access the header fields
    directly, like:

    print msg['From']

    Saves you a little typing.
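
As a minimal sketch of the two ways to read a header, written for Python 3 (an assumption; the thread itself uses Python 2 `print` statements). The sample message and address are made up for illustration.

```python
import email

# Hypothetical sample message, made up for illustration.
raw = "From: survey@example.com\nSubject: Survey result\n\nQ1: yes\n"
msg = email.message_from_string(raw)

sender_a = msg["From"]         # mapping-style access
sender_b = msg.get("From")     # equivalent method call
missing = msg["X-Not-There"]   # an absent header gives None instead of raising

print(sender_a, sender_b, missing)
```

Both forms return the same string; the mapping style is just shorter.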

    ><{{{*>
     
    fishboy, Jun 6, 2004
    #2

  3. chuck amadi

    chuck amadi Guest

    Just trying to get the body of the messages.

    I have built and developed a DTML Zope web form that encapsulates the
    survey data.
    Thus the mailbox of the created user testwwws will have the results that I
    must parse and process to a file that I can then use to populate a
    database.

    Cheers Chuck
     
    chuck amadi, Jun 6, 2004
    #3
  4. chuck amadi

    chuck amadi Guest

    Well, I did hack most of the code. I was trying to use the mboxutils
    module but I could only get the headers. I assume from this script I
    can get the text of the body. The reason I haven't tested it is that while
    at work I started to write (oops, hack) the script and then emailed it home.
    Because I use a POP3 account I only have /var/spool/mail/Chucka, not
    /home/User/Mail/inbox as at work, which I usually scan to view data in the inbox.

    So please re-affirm that my hack script will be able to parse the text
    of the body. (No binary attachments will exist within the email
    messages.)

    Cheers for your help.

    print msg['Body']

    I just need the text of the body. But from your psi I can -
     
    chuck amadi, Jun 6, 2004
    #4
  5. chuck amadi

    fishboy Guest

    On Sun, 06 Jun 2004 11:15:22 +0100, chuck amadi wrote:

    >Well, I did hack most of the code. I was trying to use the mboxutils
    >module but I could only get the headers. I assume from this script I
    >can get the text of the body. The reason I haven't tested it is that while
    >at work I started to write (oops, hack) the script and then emailed it home.
    >Because I use a POP3 account I only have /var/spool/mail/Chucka, not
    >/home/User/Mail/inbox as at work, which I usually scan to view data in the inbox.
    >
    >So please re-affirm that my hack script will be able to parse the text
    >of the body. (No binary attachments will exist within the email
    >messages.)
    >
    >Cheers for your help.
    >
    >print msg['Body']
    >
    >I just need the text of the body. But from your psi I can -
    >


    Ah, the problem is far too simple for our complicated minds.
    Just do:

    body = msg.get_payload()

    That will give you the plain text message body of an email.

    get_payload(decode=True) is for binary stuff (or maybe unicode, maybe).
    All that get_content_type(), get_param() stuff can be ignored if you're
    just doing plain text.
    The script you are adapting is for multiple binary (like pictures)
    attachments.
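
To make the difference concrete, here is a small sketch, assuming Python 3 (the thread uses Python 2). `decode=True` undoes the Content-Transfer-Encoding (base64 or quoted-printable) and returns bytes; without it you get the payload as stored. The message below is made up for illustration.

```python
import base64
import email

# Hypothetical message whose text body is base64-encoded, made up for illustration.
encoded = base64.b64encode(b"Q1: yes").decode("ascii")
raw = (
    "From: survey@example.com\n"
    'Content-Type: text/plain; charset="us-ascii"\n'
    "Content-Transfer-Encoding: base64\n"
    "\n"
    + encoded + "\n"
)
msg = email.message_from_string(raw)

as_stored = msg.get_payload()            # still the base64 text
decoded = msg.get_payload(decode=True)   # transfer encoding undone; bytes in Python 3

print(as_stored.strip(), decoded)
```

For plain, unencoded text/plain mail the two calls carry the same content, which is why the simple `get_payload()` is enough here.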

    So, looking at the doc page for mailbox there's an interesting code
    fragment:

    import email
    import mailbox
    mbox = mailbox.UnixMailbox(fp, email.message_from_file)

    So if your emails are all text/plain you could just write:

    import email
    import mailbox
    fp = open("/var/spool/mail/chucka")
    mbox = mailbox.UnixMailbox(fp, email.message_from_file)
    bodies = []
    for msg in mbox:
        body = msg.get_payload()
        bodies.append(body)

    Which will leave you with a list of strings, each one a message body.

    msg = email.message_from_file(fileobj) does the same thing as

    p = email.Parser.Parser()
    msg = p.parse(fileobj)

    It's just a shortcut,
    as is passing email.message_from_file to UnixMailbox as a handler.

    You could also do

    fp = open("/var/spool/mail/chucka")
    mbox = mailbox.UnixMailbox(fp) # no handler
    for mail in mbox:
        msg = email.message_from_file(mail) # handle here
        body = msg.get_payload()
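
For anyone on Python 3, where `mailbox.UnixMailbox` no longer exists, a rough equivalent of the loop above can be sketched with `mailbox.mbox`. This is an assumption about how one might port it, not what the thread ran; the `read_bodies` helper and the two sample messages are hypothetical, and the sketch assumes a POSIX system.

```python
import mailbox
import os
import tempfile


def read_bodies(path):
    """Return the payload of every message in the mbox file at `path`."""
    box = mailbox.mbox(path, create=False)
    try:
        return [msg.get_payload() for msg in box]
    finally:
        box.close()


# Hypothetical two-message mbox, written to a temp file so the sketch runs anywhere.
sample = (
    "From survey@example.com Sat Jun  5 15:27:36 2004\n"
    "From: survey@example.com\n"
    "Subject: one\n"
    "\n"
    "first body\n"
    "\n"
    "From survey@example.com Sun Jun  6 11:15:22 2004\n"
    "From: survey@example.com\n"
    "Subject: two\n"
    "\n"
    "second body\n"
)
path = os.path.join(tempfile.mkdtemp(), "inbox")
with open(path, "w", newline="\n") as fh:
    fh.write(sample)

bodies = read_bodies(path)
print(bodies)
```

Each message in an mbox file starts at a line beginning with "From " (no colon), which is how the module finds the message boundaries.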


    Hth,
    ><{{{*>
     
    fishboy, Jun 7, 2004
    #5
  6. chuck amadi

    Chuck Amadi Guest

    Hi all, especially fishboy. Here's the script; I'm just waiting to get
    confirmation of where I'm going to run the script from.

    I have added output =('/tmp/SurveyResults','w+a'), which I believe will
    write the message body data to this file for future work, i.e. database
    loading.

    Also, I understand that adding 'a' opens the file for appending, so any
    data written to the file is automatically added to the end. Is this
    logical? I have tried to add # comments to aid my learning process, so
    bear with me.
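
One thing worth noting about the line quoted above: `output =('/tmp/SurveyResults','w+a')` builds a tuple of two strings, not an open file, because there is no call to `open()`. A minimal sketch of appending bodies to a results file might look like this, assuming Python 3 and that plain append mode `'a'` is what was wanted. The `append_bodies` helper and the temp path are hypothetical; the thread's own target is /tmp/SurveyResults.

```python
import os
import tempfile


def append_bodies(bodies, path):
    """Append each body to `path`, one per line block, and return how many were written."""
    # "a" creates the file if it does not exist and always writes at the end.
    with open(path, "a") as out:
        for body in bodies:
            out.write(body.rstrip("\n") + "\n")
    return len(bodies)


# Hypothetical output location used only for this sketch.
results_path = os.path.join(tempfile.mkdtemp(), "SurveyResults")
append_bodies(["Q1: yes"], results_path)
append_bodies(["Q1: no"], results_path)   # a second run adds to the first

with open(results_path) as fh:
    contents = fh.read()
print(contents)
```

Using `with` closes the file automatically, so there is no separate `output.close()` to remember.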


    chuck@sevenofnine:~/pythonScript> cat getSurveyMail.py
    ###############################################################
    ## This script will open and parse email messages body content.
    ## This Python script will reside on Mail Server on ds9:
    ## Emails are all plain/text you could just write the following
    ## Which will leave a list of strings , each one a message body.
    ## The Survey User is testwws and the .procmailrc file folder is
    ## Survey . i.e /home/testwws/Mail/inbox/Survey .
    ###############################################################
    ## file:getSurveyMail.py Created : 06/06/04 Amended date: 07/06/04
    ###############################################################

    #The following line makes it run itself(executable script on UN*X)
    #!/usr/bin/env python

    import sys
    import os
    import email
    import mailbox

    # Open the testwws user mailbox (tmp user chuck)
    # fp is the open file object for the mailbox

    output =('/tmp/SurveyResults','w+a')
    fp = open("/var/spool/mail/chuck")

    #fp = open("/var/spool/mail/testwws")

    # message_from_file returns a message object struct tree from an
    # open file object.

    mbox = mailbox.UnixMailbox(fp, email.message_from_file)
    # list of body messages.
    bodies = []

    # for loop iterates through the msg in the mbox(mailbox).
    # Subparts of messages can be accessed via the -
    # get_payload() method will return a string object.
    # If it is multipart, use the "walk" method to iterate through each part and
    # then get the payload. In our case it's not multipart, so ignore.
    # for part in msg.walk():
    # msg = part.get_payload()
    # # do something(print)

    for msg in mbox:
        body = msg.get_payload()
        bodies.append(body)
    # Print to screen for testing purposes.
    # print the bodies list of the messages.
    print bodies
    chuck@sevenofnine:~/pythonScript> vi getSurveyMail.py
    chuck@sevenofnine:~/pythonScript> python getSurveyMail.py
    []

    The last line, I assume, would list all the message bodies within the bodies list [].

    Cheers for all your help list.
     
    Chuck Amadi, Jun 7, 2004
    #6
  7. chuck amadi

    Chuck Amadi Guest

    Sorry to bother you again; here's the script.

    I still can't see why get_payload() doesn't produce
    the plain text message body of the emails in the testwwws user's mailbox.
    As you can see I have tried a few things but no joy. What am I missing?

    Cheers

    Chuck

    ds9:[pythonScriptMail] % cat getSurveyMail.py
    ###############################################################
    ## This script will open and parse email messages body content.
    ## This Python script will reside on Mail Server on ds9:
    ## Emails are all plain/text you could just write the following
    ## Which will leave a list of strings , each one a message body.
    ## The Survey User is testwws and the .procmailrc file folder is
    ## Survey . i.e /home/testwws/Mail/inbox/Survey .
    ###############################################################
    ## file:getSurveyMail.py Created : 06/06/04 Amended date: 07/06/04
    ###############################################################

    #The following line makes it run itself(executable script on UN*X)
    #!/usr/bin/env python

    import sys
    import os
    import email
    import mailbox

    # Open the testwws user mailbox (tmp user chuck)
    # fp is the open file object for the mailbox
    # mode can be 'r' when the file will only be read, 'w' for only writing
    # (an existing file with the same name will be erased), and 'a' opens the file
    # for appending; any data written to the file is automatically added to the end.
    # 'r+' opens the file for both reading and writing. The mode.
    output =("/tmp/SurveyResults", "w+a")
    #output =('/tmp/SurveyResults','w')

    # open() returns a file object, and is most commonly used with two arguments:
    # "open(filename, mode)".
    # /home/testwwws/Mail/work
    #
    # fp: the file or file-like object passed at instantiation time. This can be
    # used to read the message content.
    fp = open("/var/spool/mail/testwwws")

    #fp = open("/home/testwwws/Mail/work")

    # message_from_file returns a message object struct tree from an
    # open file object.

    mbox = mailbox.UnixMailbox(fp, email.message_from_file)
    # list of body messages.
    bodies = []

    msg = email.message_from_file(fp)
    # for loop iterates through the msg in the mbox(mailbox).
    # Subparts of messages can be accessed via the -
    # get_payload() method will return a string object.
    # If it is multipart, use the "walk" method to iterate through each part and
    # then get the payload. In our case it's not multipart, so ignore.
    # for part in msg.walk():
    # msg = part.get_payload()
    # # do something(print)

    for msg in mbox:
        body = msg.get_payload()
        bodies.append(body)
    # output.close() to close it and free up any system resources taken up by the
    # open file.
    # After calling output.close(), attempts to use the file object will
    # automatically fail.
    #print bodies
    print fp
    print msg
    print msg['body']
    # print body - NameError: name 'msg' is not defined
    #
    #print >> output,bodies
    #output.close()
    #print the bodies list of the messages.
    print bodies
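
A note on `print msg['body']` in the listing above: indexing a message looks up a header by name, and the body is not a header, so that lookup gives None. A small sketch, assuming Python 3 and a made-up sample message:

```python
import email

# Hypothetical sample message, made up for illustration.
raw = "From: survey@example.com\nSubject: Survey result\n\nQ1: yes\nQ2: no\n"
msg = email.message_from_string(raw)

header_lookup = msg["body"]     # there is no "body" header, so this is None
body_text = msg.get_payload()   # everything after the blank line

print(header_lookup)
print(body_text)
```

So the body always comes from `get_payload()`, never from the mapping interface.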
     
    Chuck Amadi, Jun 7, 2004
    #7