Help on Email Parsing

dont bother

Hey,
I have been trying to parse emails, but I could not find any
examples or snippets of parsing emails in Python in the
documentation, and Google did not help me much either.
I am trying to understand the 'email' module and the
functions it provides for parsing email, but it seems
difficult.
Can anyone help me find some pointers or snippets on this?
Thanks a ton,
Dont

Jeremy Sanders

dont bother said:
But I could not find any examples or snippets of parsing emails in
python from the documentation.

Here is a simple program (a bit of a hack) I wrote to count the number of
messages in a mailbox received on each day (I use it for counting spam). It
may be of some use to you, although it only looks at the headers and
doesn't parse the message body itself.

Jeremy

# Released under the GPL (version 2 or greater)
# Copyright (C) 2003 Jeremy Sanders

import mailbox
import email.utils
import time
import sys

# check we were given a mailbox filename
if len(sys.argv) != 2:
    sys.exit('usage: %s MBOXFILE' % sys.argv[0])

# open mailbox from file (create=False: raise an error instead of
# silently creating an empty mailbox if the file doesn't exist)
mbox = mailbox.mbox(sys.argv[1], create=False)

secsinday = 86400
counts = {}

# get current time
nowtime = time.time()

# iterate over mail messages
for msg in mbox:
    # get (topmost) received header
    received = msg.get('received')
    # skip messages with no received header
    if received is None:
        continue

    # get unix time of email: the date follows the last ';'
    date = str(received).rsplit(';', 1)[-1]
    pd = email.utils.parsedate(date.strip())

    # skip messages we can't parse the date on
    if pd is None:
        continue

    # get time between received date and now, in whole days
    # (0 = within the last 24 hours, -1 = 24-48 hours ago, and so on)
    unixtime = time.mktime(pd)
    day = int((unixtime - nowtime) / secsinday)

    # increment counter for day
    # (using a dict allows us to parse the messages only once)
    counts[day] = counts.get(day, 0) + 1

# print out counts, with days in numerical order
for d in sorted(counts):
    print(d, counts[d])
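The date-extraction step can be pulled out into a function and checked on its own. A minimal Python 3 sketch; `received_date` is a hypothetical name, and it swaps in `email.utils.parsedate_to_datetime`, which keeps the time-zone offset that `parsedate` discards. The sample message is made up for illustration.

```python
import email
import email.utils

def received_date(msg):
    """Return the date from the topmost Received header as a datetime,
    or None if there is no Received header or its date can't be parsed."""
    received = msg.get('received')
    if received is None:
        return None
    # the timestamp is whatever follows the last ';' in the header
    date = str(received).rsplit(';', 1)[-1].strip()
    try:
        return email.utils.parsedate_to_datetime(date)
    except (TypeError, ValueError):
        # older Pythons raise TypeError, 3.10+ raises ValueError on bad dates
        return None

raw = (
    "Received: from mx.example.org by mail.example.com;\n"
    " Tue, 04 Nov 2003 10:15:00 +0000\n"
    "Subject: hi\n"
    "\n"
    "body\n"
)
msg = email.message_from_string(raw)
print(received_date(msg))   # 2003-11-04 10:15:00+00:00
```

Because the result carries its offset, subtracting two such datetimes gives the real elapsed time even when the messages were stamped in different time zones.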
 
deelan

This script extracts all the images from the email message whose
filename is given as its argument.

Hope this helps.



"""Extracts all images from given rfc822-compliant email message.
A quick hack by deelan

python extract.py filename
"""

import email
import os
import sys

# good MIME types
mimes = 'image/gif', 'image/jpeg', 'image/png'

def main(filename):
    # parse the whole message into a tree of Message objects
    with open(filename, 'rb') as f:
        m = email.message_from_binary_file(f)

    # walk every part (including nested multiparts) and look for
    # JPEG, GIF and PNG images
    images = [(part.get_filename(), part.get_payload(decode=True))
              for part in m.walk() if part.get_content_type() in mimes]

    for i, (name, data) in enumerate(images):
        # drop any directory components from a sender-supplied name,
        # and use a generated name if the part has none
        name = os.path.basename(name or '') or 'image%d' % i
        print('writing', name, '...')
        with open(name, 'wb') as out:
            out.write(data)

    print('done %d image(s).' % len(images))

if __name__ == '__main__':
    if len(sys.argv) > 1:
        main(sys.argv[1])
    else:
        print(__doc__)
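The extraction logic can be tried without a real message on disk by building one in code. A minimal Python 3 sketch (3.6+ for `EmailMessage.add_attachment`); `find_images` is a hypothetical helper, and the "PNG" bytes are a placeholder, not a real image:

```python
import email
from email.message import EmailMessage

# same MIME types the script treats as images
IMAGE_TYPES = ('image/gif', 'image/jpeg', 'image/png')

def find_images(msg):
    """Return (filename, bytes) for every image part, at any nesting depth."""
    return [(part.get_filename(), part.get_payload(decode=True))
            for part in msg.walk()
            if part.get_content_type() in IMAGE_TYPES]

# build a small multipart message with one PNG attachment
msg = EmailMessage()
msg['Subject'] = 'holiday pictures'
msg.set_content('See attached.')
fake_png = b'\x89PNG\r\n\x1a\n' + b'\x00' * 8   # placeholder bytes, not a real image
msg.add_attachment(fake_png, maintype='image', subtype='png', filename='beach.png')

# round-trip through bytes, as if the message had been read from a file
parsed = email.message_from_bytes(msg.as_bytes())
for name, data in find_images(parsed):
    print(name, len(data))   # beach.png 16
```

`get_payload(decode=True)` undoes the base64 transfer encoding, so `data` is the original attachment bytes.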
 
John Roth

You may want to study the MIME format a
bit first. It's not a particularly simple format.
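To see why MIME takes some study, it helps to look at the part tree of a typical message. A minimal Python 3 sketch (3.6+ for `EmailMessage`); `show_structure` is a hypothetical helper, and the message is built in code purely for illustration:

```python
import email
from email.message import EmailMessage

def show_structure(part, depth=0):
    """Print the MIME tree of a message, one line per part, indented by depth."""
    print('  ' * depth + part.get_content_type())
    if part.is_multipart():
        # a multipart's payload is a list of sub-messages
        for sub in part.get_payload():
            show_structure(sub, depth + 1)

# a typical message: plain-text and HTML alternatives plus an attachment
msg = EmailMessage()
msg['Subject'] = 'example'
msg.set_content('plain text body')
msg.add_alternative('<p>HTML body</p>', subtype='html')
msg.add_attachment(b'some bytes', maintype='application',
                   subtype='octet-stream', filename='data.bin')

parsed = email.message_from_bytes(msg.as_bytes())
show_structure(parsed)
# multipart/mixed
#   multipart/alternative
#     text/plain
#     text/html
#   application/octet-stream
```

Even this small message is three levels deep, which is why code that only looks at the top-level payload misses parts.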

The final example in the email documentation
seems to be fairly straightforward. The line:

msg = email.message_from_file(fp)

parses the whole message and leaves the result in
memory as a tree of Message objects.
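For example, once the message is parsed, the headers and body can be read straight off the resulting object. A minimal Python 3 sketch; it uses `email.message_from_string` instead of `message_from_file(fp)` only so that no file is needed, the sample message is made up, and `first_text_part` is a hypothetical helper, not part of the `email` package:

```python
import email

raw = """\
From: Alice <alice@example.com>
To: bob@example.com
Subject: Lunch?
Content-Type: text/plain; charset="us-ascii"

Are you free at noon?
"""

# parse the whole message into a Message object
msg = email.message_from_string(raw)

# headers behave like a case-insensitive dictionary
print(msg['from'])               # Alice <alice@example.com>
print(msg['Subject'])            # Lunch?
print(msg.get_content_type())    # text/plain

def first_text_part(msg):
    """Return the decoded text of the first text/plain part, or None."""
    for part in msg.walk():
        if part.get_content_type() == 'text/plain':
            # decode=True undoes any transfer encoding and returns bytes
            payload = part.get_payload(decode=True)
            charset = part.get_content_charset() or 'us-ascii'
            return payload.decode(charset, errors='replace')
    return None

print(first_text_part(msg))
```

`walk()` also visits the message itself, so the same helper works for single-part and multipart messages.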

Of course, this is the *new* email package that
ships with Python 2.2.3 and later. I don't believe the
old one was particularly easy to work with.

John Roth
