Simple script to read and parse a mailbox


chuck amadi

Hi, I'm trying to parse a specific user's mailbox (testwwws) and output
the body of the messages to a file; that file will then be loaded into a
PostgreSQL DB at some point.

I have read the email posts and been advised to use the email module
and the mailbox module.

The blurb from a member of this list is below. I'm not at work at the
moment, so I can't test this out, but could someone take a look and check
that I'm on the right track? This Monday I need to show my boss and get the
body data out of the user's mailbox.

**Blurb from a member who's directed me**

Start with the mailbox and email modules. Mailbox lets you iterate over a
mailbox, yielding individual messages of type email. The email object lets
you parse and operate on the message components. From there you should be
able to extract your data.
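To make the blurb concrete, here is a minimal sketch in Python 3 (the thread itself uses Python 2). The message text is made up for illustration, and `part_types` / `plain_text_parts` are hypothetical helper names. `walk()` visits the message itself and then each of its sub-parts, which is how you "operate on the message components".

```python
import email

# Hypothetical two-part survey message, made up for illustration only.
RAW = (
    "From: survey@example.com\n"
    "Subject: Survey result\n"
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/mixed; boundary="XYZ"\n'
    "\n"
    "--XYZ\n"
    "Content-Type: text/plain\n"
    "\n"
    "q1=yes\n"
    "--XYZ--\n"
)


def part_types(message):
    """Content type of the message and of every sub-part, in walk() order."""
    return [part.get_content_type() for part in message.walk()]


def plain_text_parts(message):
    """Payload strings of all text/plain parts."""
    return [part.get_payload() for part in message.walk()
            if part.get_content_type() == "text/plain"]


msg = email.message_from_string(RAW)
print(part_types(msg))  # ['multipart/mixed', 'text/plain']
print(plain_text_parts(msg))
```

The multipart container itself has no text of its own; only the leaf parts carry payload strings.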




#!/usr/bin/env python
## The email message is read as flat text from a file or other source;
## the text is parsed to produce the object structure of the email message.
import mailbox
import email
import email.Parser
import os

# email package for managing email messages
# Open Users Mailbox
# class Message()
#mbox = mailbox.UnixMailbox(open("/var/spool/mail/chucka"))

def main():

    # The Directory that will contain the Survey Results
    dir = "/tmp/SurveyResults/"

    # The Web Survey User Inbox
    # Mailbox /home/testwwws/Mail/inbox
    maildir = "/home/testwwws/Mail/inbox"

    # Number unnamed parts across ALL messages, so that one message's
    # "part-1" does not overwrite the previous message's "part-1".
    counter = 1
    for file in os.listdir(maildir):

        print os.path.join(maildir, file)

        fp = open(os.path.join(maildir, file), "rb")
        p = email.Parser.Parser()
        msg = p.parse(fp)
        fp.close()
        #print msg.get("From")
        #print msg.get("Content-Type")

        for part in msg.walk():
            # multipart containers only hold other parts; skip them
            if part.get_content_maintype() == 'multipart':
                continue

            filename = part.get_param("name")
            if filename is None:
                filename = "part-%i" % counter
                counter += 1

            fp = open(os.path.join(dir, filename), 'wb')
            print os.path.join(dir, filename)
            fp.write(part.get_payload(decode=1))
            fp.close()


if __name__ == '__main__':
    main()

Cheers all, this list has been very helpful.
 

fishboy


Hi again Chuck,

I've been reading a few of your posts and I'm wondering: do the
emails that you're parsing have binary attachments, like pictures and
stuff, or are you just trying to get the text of the body?

Or is it a little of both? It looks like you're expecting emails with
multiple binary attachments.

Other than that it looks good. You can access the header fields
directly, like:

print msg['From']

Saves you a little typing.
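Header lookup by name behaves like a dictionary lookup that returns None for a header that isn't there. A small Python 3 sketch (the message is invented for illustration); note that the body is not a header, so `msg["Body"]` gives None and the text has to come from `get_payload()`.

```python
import email

# Hypothetical single-part message, for illustration only.
msg = email.message_from_string(
    "From: chuck@example.com\n"
    "Subject: hello\n"
    "\n"
    "body text\n"
)

print(msg["From"])        # chuck@example.com
print(msg["X-Missing"])   # None: an absent header is not a KeyError
print(msg["Body"])        # None: the body is not a header
print(msg.get_payload())  # body text
```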
 

chuck amadi



Just trying to get the body of the messages.

I have built and developed a DTML Zope web form that encapsulates the
survey data.
So the newly created user testwwws's mailbox will hold the results, which I
must parse and process into a file that I can then use to populate a
database.

Cheers Chuck
 

chuck amadi



Well, I did hack most of the code. I was trying to use the mboxutils
module, but I could only get the headers. I assume from this script I
can get the text of the body. The reason I haven't tested it is that I
started writing (oops, hacking) the script at work and then emailed it home.
Because I use a POP3 account, at home I only have /var/spool/mail/Chucka, not
/home/User/Mail/inbox as at work, which I usually scan to view data in the inbox.

So please reaffirm that my hack script will be able to parse the text
of the body (no attachments or binaries will exist within the email
messages).

Cheers for your help.

print msg['Body']

I just need the text of the body. But from your psi I can -
 

fishboy


Ah, the problem is far too simple for our complicated minds.
Just do:
body = msg.get_payload()
That will give you the plain-text message body of an email.

get_payload(decode=True) undoes the Content-Transfer-Encoding (base64 or
quoted-printable), so it matters for binary stuff and for non-ASCII text
sent in one of those encodings. All that get_content_type(), get_param()
stuff can be ignored if you're just doing plain text.
The script you are adapting is for multiple binary attachments (like
pictures).
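The difference between the two calls shows up once a body is transfer-encoded. A minimal Python 3 sketch with an invented quoted-printable message: plain `get_payload()` returns the encoded text unchanged, while `decode=True` returns bytes with the encoding undone (in Python 3 those are `bytes`, which you then decode with the declared charset).

```python
import email

# Hypothetical message whose body uses quoted-printable transfer encoding.
msg = email.message_from_string(
    "Subject: accent\n"
    "Content-Type: text/plain; charset=utf-8\n"
    "Content-Transfer-Encoding: quoted-printable\n"
    "\n"
    "caf=C3=A9\n"
)

raw_body = msg.get_payload()             # still encoded text
decoded = msg.get_payload(decode=True)   # bytes with the QP encoding undone
text = decoded.decode(msg.get_content_charset())  # bytes -> str via declared charset

print(repr(raw_body))  # 'caf=C3=A9\n'
print(text)            # café
```

For plain 7bit ASCII survey mail like the one described here, both calls yield the same characters, which is why plain `get_payload()` is enough.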

So, looking at the doc page for mailbox there's an interesting code
fragment:

import email
import mailbox
mbox = mailbox.UnixMailbox(fp, email.message_from_file)

So if your emails are all text/plain you could just write:

import email
import mailbox
fp = open("/var/spool/mail/chucka")
mbox = mailbox.UnixMailbox(fp, email.message_from_file)
bodies = []
for msg in mbox:
    body = msg.get_payload()
    bodies.append(body)

Which will leave you with a list of strings, each one a message body.
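`UnixMailbox` no longer exists in Python 3; the `mailbox.mbox` class (added in Python 2.5) reads the same Unix mbox format. A minimal Python 3 sketch of the same loop, assuming the spool file is a standard mbox file; `collect_bodies` is a hypothetical helper name.

```python
import mailbox


def collect_bodies(path):
    """Return the body of every message in the Unix mbox file at `path`.

    create=False makes a missing file raise mailbox.NoSuchMailboxError
    instead of silently creating an empty mailbox (and returning []).
    """
    box = mailbox.mbox(path, create=False)
    try:
        # Each item is a mailbox.mboxMessage, a subclass of email.message.Message.
        return [msg.get_payload() for msg in box]
    finally:
        box.close()


# Usage (path taken from the thread):
# bodies = collect_bodies("/var/spool/mail/chucka")
```

The `create=False` choice means a wrong path fails loudly rather than quietly producing an empty list.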

msg = email.message_from_file(fileobj) does the same thing as

p = email.Parser.Parser()
msg = p.parse(fileobj)

It's just a shortcut,
as is passing email.message_from_file to UnixMailbox as a handler (its factory argument).
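A quick Python 3 check of that equivalence, using an invented one-line message; in Python 3 the module is spelled `email.parser` rather than Python 2's `email.Parser`.

```python
import email
import io
from email.parser import Parser  # Python 3 name for Python 2's email.Parser module

RAW = "Subject: same\n\nidentical body\n"  # made-up message

via_shortcut = email.message_from_file(io.StringIO(RAW))
via_parser = Parser().parse(io.StringIO(RAW))

# Both routes build the same message structure.
print(via_shortcut.as_string() == via_parser.as_string())  # True
```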

You could also do

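fp = open("/var/spool/mail/chucka")
# With no factory, UnixMailbox yields rfc822.Message objects, which
# email.message_from_file() cannot read. An identity factory hands back
# each message as a raw file-like object instead.
mbox = mailbox.UnixMailbox(fp, lambda f: f)
for mail in mbox:
    msg = email.message_from_file(mail)  # handle here
    body = msg.get_payload()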


Hth,
 

Chuck Amadi

Hi all, especially fishboy. Here's the script; I'm just waiting to get
confirmation of where I'm going to run the script from.

I have added output =('/tmp/SurveyResults','w+a'), which I believe will
write the body message data to this file for future work, i.e. database
loading.

Also, I understand that mode 'a' opens the file for appending: any data
written to the file is automatically added to the end. Is this logical? I
have tried to add comments to aid my learning process, so bear with me.
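The reading of mode 'a' is right, but as written, `output = ('/tmp/SurveyResults','w+a')` only builds a tuple of two strings; no file is opened until those values are passed to `open()`. Also, `'w+a'` mixes two modes, and Python 3's `open()` rejects it with a ValueError; `'a'` alone gives append behaviour. A minimal Python 3 sketch, with `append_bodies` as a hypothetical helper name:

```python
def append_bodies(path, bodies):
    """Append each message body to the text file at `path` (hypothetical helper)."""
    # "a" creates the file if it is missing and sends every write to the end,
    # so results from earlier runs are kept; "w" would truncate the file first.
    with open(path, "a") as output:
        for body in bodies:
            output.write(body)
            if not body.endswith("\n"):
                output.write("\n")  # keep one body from running into the next
```

The `with` block closes the file automatically, which replaces the separate `output.close()` call.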


chuck@sevenofnine:~/pythonScript> cat getSurveyMail.py
#!/usr/bin/env python
# The line above makes the script run itself (executable script on UN*X);
# it only works as the very first line of the file.
###############################################################
## This script will open and parse email message body content.
## This Python script will reside on the mail server, ds9.
## The emails are all text/plain, so the code below leaves a
## list of strings, each one a message body.
## The survey user is testwws and the .procmailrc folder is
## Survey, i.e. /home/testwws/Mail/inbox/Survey.
###############################################################
## file:getSurveyMail.py Created : 06/06/04 Amended date: 07/06/04
###############################################################

import sys
import os
import email
import mailbox

# Open the testwws user mailbox (tmp user chuck)
# fp is the open file object for the mailbox

output =('/tmp/SurveyResults','w+a')
fp = open("/var/spool/mail/chuck")

#fp = open("/var/spool/mail/testwws")

# message_from_file returns a message object structure tree from an
# open file object.

mbox = mailbox.UnixMailbox(fp, email.message_from_file)
# list of body messages.
bodies = []

# The for loop iterates through the messages in the mbox (mailbox).
# For a non-multipart message, the get_payload() method returns
# a string object. If it is multipart, use the "walk" method to
# iterate through each part and then get the payload. In our case
# it's not multipart, so ignore this:
# for part in msg.walk():
#     msg = part.get_payload()
#     # do something(print)

for msg in mbox:
    body = msg.get_payload()
    bodies.append(body)
# Print to screen for testing purposes.
# print the bodies list of the messages.
print bodies
chuck@sevenofnine:~/pythonScript> vi getSurveyMail.py
chuck@sevenofnine:~/pythonScript> python getSurveyMail.py
[]

I assumed the last line would list all the message bodies in the bodies list, but it just printed [].

Cheers for all your help, list.
 

Chuck Amadi

Sorry to bovver you again (again); here's the script.

I still can't see why get_payload() doesn't produce
the plain-text message body of the emails in the testwwws user's mailbox.
As you can see, I have tried a few things but no joy. What am I missing?

Cheers

Chuck

ds9:[pythonScriptMail] % cat getSurveyMail.py
#!/usr/bin/env python
# The line above makes the script run itself (executable script on UN*X);
# it only works as the very first line of the file.
###############################################################
## This script will open and parse email message body content.
## This Python script will reside on the mail server, ds9.
## The emails are all text/plain, so the code below leaves a
## list of strings, each one a message body.
## The survey user is testwws and the .procmailrc folder is
## Survey, i.e. /home/testwws/Mail/inbox/Survey.
###############################################################
## file:getSurveyMail.py Created : 06/06/04 Amended date: 07/06/04
###############################################################

import sys
import os
import email
import mailbox

# Open the testwws user mailbox (tmp user chuck)
# fp is the open file object for the mailbox.
# mode can be 'r' when the file will only be read, 'w' for only writing
# (an existing file with the same name will be erased), and 'a' opens the file
# for appending; any data written to the file is automatically added to the end.
# 'r+' opens the file for both reading and writing.
output =("/tmp/SurveyResults", "w+a")
#output =('/tmp/SurveyResults','w')

# open() returns a file object, and is most commonly used with two arguments:
# "open(filename, mode)".
# /home/testwwws/Mail/work
#
# fp: the file or file-like object passed at instantiation time. This can be
# used to read the message content.
fp = open("/var/spool/mail/testwwws")

#fp = open("/home/testwwws/Mail/work")

# message_from_file returns a message object structure tree from an
# open file object.

mbox = mailbox.UnixMailbox(fp, email.message_from_file)
# list of body messages.
bodies = []

msg = email.message_from_file(fp)
# The for loop iterates through the messages in the mbox (mailbox).
# For a non-multipart message, the get_payload() method returns
# a string object. If it is multipart, use the "walk" method to
# iterate through each part and then get the payload. In our case
# it's not multipart, so ignore this:
# for part in msg.walk():
#     msg = part.get_payload()
#     # do something(print)

for msg in mbox:
    body = msg.get_payload()
    bodies.append(body)
# output.close() closes the file and frees up any system resources taken up
# by the open file.
# After calling output.close(), attempts to use the file object will
# automatically fail.
#print bodies
print fp
print msg
print msg['body']
# print body - NameError: name 'msg' is not defined
#
#print >> output,bodies
#output.close()
#print the bodies list of the messages.
print bodies
 
