Parsing downloaded mail via POP3

Kevin F

I have the following script:

emails = []
for msg in messagesInfo:
    msgNum = int(msg.split()[0])
    msgSize = int(msg.split()[1])
    if msgSize < 20000:
        message = server.retr(msgNum)[1]
        message = "\n".join(message)
        emails.append(message)


It downloads messages from my POP3 server. However, the message format
(sample below) includes a huge number of headers, and I just want to
return the from, subject, and body. Any pointers on how to do this?

/// sample message downloaded

fe5.bluebottle.com (fe5 [209.144.225.81])', '\t by
bluebottle-be1.bluebottle.com (Cyrus v2.2.8) with LMTPA;', '\t Tue, 21
Mar 2006 23:47:22 -0600', 'X-Sieve: CMU Sieve 2.2', 'Received: from
fe7.bluebottle.com (fe7 [209.144.225.70])', '\tby fe5.bluebottle.com
(8.13.4/8.13.4) with ESMTP id k2M5hhkd023264', '\tfor
<[email protected]>; Tue, 21 Mar 2006 23:44:35 -0600',
'Received: from smtp-relay.wharton.upenn.edu
(smtp-relay.wharton.upenn.edu [130.91.161.218])', '\tby
fe7.bluebottle.com (8.13.4/8.13.4) with ESMTP id k2M5hea4022775',
'\t(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO)',
'\tfor <[email protected]>; Tue, 21 Mar 2006 23:43:41 -0600',
'Received: from FAIRMOUNT.wharton.upenn.edu
(fairmount2.wharton.Upenn.Edu [128.91.87.58])', '\tby
smtp-relay.wharton.upenn.edu (8.13.1/8.13.1) with ESMTP id
k2M5heQv007094', '\tfor <[email protected]>; Wed, 22 Mar 2006
00:43:40 -0500', 'X-DomainKeys: Sendmail DomainKeys Filter v0.3.2
smtp-relay.wharton.upenn.edu k2M5heQv007094', 'DomainKey-Signature:
a=rsa-sha1; s=smtp-relay; d=wharton.upenn.edu; c=nofws; q=dns;',
'\tb=TZ7xn8PLJNMsq8iCl7eqlME0EDnCC7fKUvpKmALqe1FQ5gG/fG+V/bomQMKyblplJ',
'\tlg6wTqPoeao6lkM4yu+Rw==', 'Received: from webmail1.wharton.upenn.edu
([172.16.32.58]) by FAIRMOUNT.wharton.upenn.edu with Microsoft
SMTPSVC(6.0.3790.1830);', '\t Wed, 22 Mar 2006 00:43:39 -0500',
'Received: from [165.123.150.168] ([165.123.150.168]) by
webmail1.wharton.upenn.edu over TLS secured channel with Microsoft
SMTPSVC(6.0.3790.1830);', '\t Wed, 22 Mar 2006 00:43:39 -0500',
'User-Agent: Microsoft-Entourage/11.0.0.040405', 'Date: Wed, 22 Mar 2006
00:43:37 -0500', 'Subject: KNOCKITY-----KNOCK-----WHOS-----THERE',
'From: Kevin Feng <[email protected]>', 'To:
"(e-mail address removed)" <[email protected]>',
'Message-ID: <C0464E39.4E34%[email protected]>', 'Mime-version:
1.0', 'Content-type: text/plain;', '\tcharset="US-ASCII"',
'Content-transfer-encoding: 7bit', 'X-OriginalArrivalTime: 22 Mar 2006
05:43:39.0441 (UTC) FILETIME=[921A4210:01C64D73]', 'X-Virus-Scanned:
ClamAV version 0.88, clamav-milter version 0.87 on fe7.bluebottle.com',
'X-Virus-Status: Clean', 'Trusted-Delivery-Validation-State: Not
validated', '', 'ANITA-----ANITA WHO----ANITA BETTER JOKE', '', ''], 2266)
 
Gerard Flanagan

Have you tried server.top?

import poplib

MAX_SUMMARY_LINES = 20

def get_headers(self):
    server = poplib.POP3(self.server_name)
    server.user(self.user_name)
    server.pass_(self.password)
    hdrs = []
    try:
        msgCount, msgBytes = server.stat()
        for i in range(msgCount):
            msgNum = i + 1  # POP3 message numbers start at 1
            # top() returns the headers plus the first MAX_SUMMARY_LINES body lines
            hdr, message, octets = server.top(msgNum, MAX_SUMMARY_LINES)
            hdrs.append(message)
    finally:
        server.quit()
    return hdrs
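
The list of lines that top (or retr) returns can be handed to the standard email package to pick out individual headers. What follows is only a minimal sketch, assuming Python 3, where poplib returns each line as bytes. The helper name summarize_headers is made up for illustration, and the address in the sample is an example.com placeholder because the real one is not shown in the thread:

```python
import email
from email.utils import parseaddr


def summarize_headers(lines):
    """Return (sender_name, sender_address, subject) from raw message lines.

    `lines` is the list of bytes objects found at index 1 of the tuple
    returned by poplib's top() or retr().
    """
    # Rebuild the raw message text and let the email package parse it.
    msg = email.message_from_bytes(b"\r\n".join(lines))
    name, address = parseaddr(msg.get("From", ""))
    return name, address, msg.get("Subject", "")


# Hypothetical sample, shortened from the message posted above.
sample = [
    b"Subject: KNOCKITY-----KNOCK-----WHOS-----THERE",
    b"From: Kevin Feng <kevin@example.com>",
    b"Content-type: text/plain;",
    b'\tcharset="US-ASCII"',
    b"",
    b"ANITA-----ANITA WHO----ANITA BETTER JOKE",
]

print(summarize_headers(sample))
```

Header lookup on the parsed message is case-insensitive, so msg["from"] and msg["From"] give the same value.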

Alternatively, something like:


import poplib, email
from email.Utils import getaddresses, parseaddr

def download_mail(self):
    server = poplib.POP3(self.server_name)
    server.user(self.user_name)
    server.pass_(self.password)
    try:
        msgCount, msgBytes = server.stat()
        self.messages = []
        for i in range(msgCount):
            msgNum = i + 1
            hdr, message, octets = server.retr(msgNum)
            mail_msg = '\n'.join(message)
            self.messages.append(email.message_from_string(mail_msg))
    finally:
        server.quit()

def print_headers(self):
    for message in self.messages:
        print '#' * 80
        print parseaddr(message['from'])
        print message['subject']
        print message['date']
        print getaddresses(message.get_all('to', []))
        print getaddresses(message.get_all('cc', []))
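
To get the body along with the from and subject, one option is to walk the parsed message and take the first text/plain part. This is a rough sketch rather than a complete solution, assuming Python 3 (bytes lines from poplib, email.utils instead of email.Utils). The name extract_summary and the sample data are invented for illustration, and an unknown charset name in a message would still raise an error here:

```python
import email


def extract_summary(lines):
    """Return a dict with the sender, subject and plain-text body.

    `lines` is the list of bytes objects at index 1 of the tuple
    returned by poplib's retr().
    """
    msg = email.message_from_bytes(b"\r\n".join(lines))
    body = ""
    # walk() yields the message itself for single-part mail and every
    # sub-part for multipart mail; keep the first text/plain part found.
    for part in msg.walk():
        if part.get_content_type() == "text/plain":
            payload = part.get_payload(decode=True)  # bytes, transfer-decoded
            charset = part.get_content_charset() or "us-ascii"
            body = payload.decode(charset, errors="replace")
            break
    return {
        "from": msg.get("From", ""),
        "subject": msg.get("Subject", ""),
        "body": body.strip(),
    }


# Hypothetical sample, shortened from the message posted above;
# the address is a placeholder.
sample = [
    b"Subject: KNOCKITY-----KNOCK-----WHOS-----THERE",
    b"From: Kevin Feng <kevin@example.com>",
    b"Content-type: text/plain;",
    b'\tcharset="US-ASCII"',
    b"Content-transfer-encoding: 7bit",
    b"",
    b"ANITA-----ANITA WHO----ANITA BETTER JOKE",
    b"",
    b"",
]

summary = extract_summary(sample)
print(summary["from"])
print(summary["subject"])
print(summary["body"])
```

Passing decode=True to get_payload undoes any base64 or quoted-printable transfer encoding, which matters once messages are not plain 7bit text like the sample.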

Gerard
 
