way to extract only the message from pop3

flit · Apr 3, 2007

Hello All,

Using poplib in Python I can extract only the headers using .top().
Is there a way to extract only the message text without the headers?

For example, remove the fields below:
"
Return-Path:
X-Original-To:
Received: from [
by (Postfix) with ESMTP id B32382613C
for Tue, 3 Apr 2007 09:54:28 -0300 (BRT)
Date: Tue, 03 Apr 2007 09:52:15 -0300
From: <@>
To:
Subject: test
Message-Id:
MIME-Version: 1.0
Content-Type: text/plain; charset="US-ASCII"
Content-Transfer-Encoding: 7bit
X-Mailer: Becky! ver. 2.24.02 [en]
X-UIDL: !Dn!!HKT!!/k
Status: RO
"
and only get this:

this is a text message..
...

Thanks

kyosohma · Apr 3, 2007

I found a tutorial on parsing email that should help you:

http://www.devshed.com/c/a/Python/Python-Email-Libraries-SMTP-and-Email-Parsing/

Also see the email module:

http://www.python.org/doc/2.3.5/lib/module-email.html

Mike

hlubenow · Apr 3, 2007

Well, I couldn't get anywhere with that material, especially that tutorial,
when I tried some time ago. It went better once I took a look at the code
that user "rogen" wrote in his first posting here:

http://www.python-forum.de/topic-7507.html

It's really not beautiful code, but as soon as you understand what he does,
you're almost there.

See You

H.

Tim Roberts · Apr 4, 2007

flit said:
Using poplib in python I can extract only the headers using the .top,
there is a way to extract only the message text without the headers?

Only by using Python code. The other responses gave you good pointers
toward that path, but I thought I would point out the reason why.

The POP3 protocol is surprisingly primitive. There are only two commands
to fetch a message: RETR, which fetches the headers and the entire message,
and TOP, which fetches the headers and the first N lines of the message.
The key point is that both commands fetch the headers.
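
Since both commands hand back the headers, the stripping has to happen on the client. A minimal sketch of that step, assuming Python 3 (where poplib returns each line as bytes) and a single-part message; the helper name body_from_retr_lines and the sample lines are made up for illustration and stand in for what client.retr(n)[1] would return:

```python
import email


def body_from_retr_lines(lines):
    """Return the decoded text body of a single-part message.

    `lines` is a list of bytes objects, the shape of the second element
    of the tuple returned by poplib.POP3.retr() in Python 3.
    """
    msg = email.message_from_bytes(b"\r\n".join(lines))
    # decode=True undoes the Content-Transfer-Encoding; it returns None
    # for multipart containers, which this sketch does not handle.
    payload = msg.get_payload(decode=True)
    if payload is None:
        return None
    charset = msg.get_content_charset() or "us-ascii"
    return payload.decode(charset, errors="replace")


# Hypothetical sample standing in for client.retr(1)[1]
sample_lines = [
    b"Date: Tue, 03 Apr 2007 09:52:15 -0300",
    b"Subject: test",
    b"MIME-Version: 1.0",
    b'Content-Type: text/plain; charset="US-ASCII"',
    b"Content-Transfer-Encoding: 7bit",
    b"",
    b"this is a text message..",
]

body = body_from_retr_lines(sample_lines)
print(body)
```

The same function should work on the lines returned by top(), because the header block has the same shape in both responses.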

flit · Apr 5, 2007

Yep, you are right.
I made a filter to get the data I want from the message.
So it's not the most beautiful code, but it works.

hlubenow · Apr 5, 2007

flit said:
Hello All,

Using poplib in python I can extract only the headers using the .top,
there is a way to extract only the message text without the headers?

As mentioned before, you should use module "email":

------------------------------------

#!/usr/bin/env python

import poplib
import email
import sys

PROVIDER = "pop.YourMailProvider.de"
USER = "YourUserName"
PASSWORD = "YourPassword"

try:
    client = poplib.POP3(PROVIDER)
except Exception:
    print("Error: Provider not found.")
    sys.exit(1)

client.user(USER)
client.pass_(PASSWORD)

nrof_mails = len(client.list()[1])

for i in range(nrof_mails):
    # POP3 message numbers start at 1
    lines = client.retr(i + 1)[1]
    mailstring = "\n".join(lines)

    msg = email.message_from_string(mailstring)

    for part in msg.walk():
        # blockit is reset for every part here, so the check below
        # never blocks anything and every text part gets printed
        blockit = 0

        if part.get_content_maintype() == "text" and blockit == 0:
            blockit = 1
            # decode=True undoes the Content-Transfer-Encoding
            # (quoted-printable or base64) instead of assuming one
            mycontent = part.get_payload(decode=True)
            print(mycontent)
            print("")

client.quit()

hlubenow · Apr 5, 2007

flit said:
Hello All,

Using poplib in python I can extract only the headers using the .top,
there is a way to extract only the message text without the headers?

As mentioned before, you should use module "email":

------------------------------------

#!/usr/bin/env python

import poplib
import email
import sys

PROVIDER = "pop.YourMailProvider.de"
USER = "YourUserName"
PASSWORD = "YourPassword"

try:
    client = poplib.POP3(PROVIDER)
except Exception:
    print("Error: Provider not found.")
    sys.exit(1)

client.user(USER)
client.pass_(PASSWORD)

nrof_mails = len(client.list()[1])

for i in range(nrof_mails):
    # POP3 message numbers start at 1
    lines = client.retr(i + 1)[1]
    mailstring = "\n".join(lines)
    # reset once per message, so only the first text part is printed
    blockit = 0

    msg = email.message_from_string(mailstring)

    for part in msg.walk():

        if part.get_content_maintype() == "text" and blockit == 0:
            blockit = 1
            # decode=True undoes the Content-Transfer-Encoding
            # (quoted-printable or base64) instead of assuming one
            mycontent = part.get_payload(decode=True)
            print(mycontent)
            print("")

client.quit()

Gabriel Genellina · Apr 5, 2007

message=whole_message[len(headers):None]

You can omit the word None: it is just there for clarity purposes.

Uhm... I can't find any usage of slices including an explicit None in
code.google.com (except on the Python test suite), and really I don't
consider that to be more readable than whole_message[len(headers):]
But of course this is just a stylistic issue.
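
For anyone who wants to check that the two spellings really are the same, here is a quick sketch. The two strings are made-up stand-ins, and it assumes headers is an exact prefix of whole_message:

```python
# Made-up stand-ins for the two downloaded strings
headers = "Subject: test\r\n\r\n"
whole_message = headers + "this is a text message.."

# An explicit None as the stop index means "up to the end",
# which is exactly what leaving the stop index out means.
with_none = whole_message[len(headers):None]
without_none = whole_message[len(headers):]

print(with_none == without_none)
```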

Tim Williams · Apr 6, 2007

so get two strings: only headers, and the whole message.
find the length of the headers, and chop that off the beginning of the whole
message:

message=whole_message[len(headers):None]

This way you have to perform 2 downloads: the headers and the whole
message. Then you join them both into strings and subtract one from the
other by slicing or other means.

(Other means? body = whole_message.replace(headers, '') or maybe not!)

The body starts after the first blank line in the message, which is
where the headers end. This is a good starting point
for something simple like my earlier suggestion:

msg = '\r\n'.join(M.retr(i + 1)[1])   # retrieve the email into a string
hdrs, body = msg.split('\r\n\r\n', 1)  # split it into hdrs & body

If the original poster required the body to be separated from the
headers (and I received a private reply from the OP to my original
post that suggested it probably was), then splitting a joined whole
message at the first blank line is sufficient and only requires 1
download, without using the email module.
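
A self-contained sketch of that split, assuming Python 3, where retr() returns the lines as bytes rather than str. The helper name split_headers_body and the sample lines are hypothetical, and partition() is used in place of split() so that a message with no body does not raise:

```python
def split_headers_body(lines):
    """Split a retrieved message at the first blank line.

    `lines` is a list of bytes, as in poplib.POP3.retr(n)[1] on Python 3.
    Returns (headers, body) as bytes; body is b"" if there is no blank line.
    """
    raw = b"\r\n".join(lines)
    hdrs, _sep, body = raw.partition(b"\r\n\r\n")
    return hdrs, body


# Hypothetical sample standing in for M.retr(1)[1]
sample = [
    b"Subject: test",
    b"MIME-Version: 1.0",
    b"",
    b"this is a text message..",
    b"...",
]

hdrs, body = split_headers_body(sample)
print(body.decode("ascii"))
```

This leaves the body exactly as transmitted, so any quoted-printable or base64 encoding is still in place.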

If the OP required just the text parts extracted from the message, then
it gets a bit trickier. The email module is the way to go, but not
quite how a previous poster used it.

Consider an email that was routed through my (Python) SMTP servers and
filters today.

Content: ['text/plain', 'text/html', 'message/delivery-status',
'text/plain', 'text/plain', 'text/plain', 'unknown', 'message/rfc822',
'text/plain', 'text/html']

Is text/html a text part or an html part for this exercise ?

You need to walk the parts and use something like

# part.get_content_maintype() requires a further call
# to get_content_subtype(), so use
# part.get_content_type() instead.

required = ['text/plain', 'text/tab-separated-values']
text_parts = []
for part in EMAIL_OBJ.walk():
    if part.get_content_type() in required:
        # collect the decoded payload, since join() needs strings
        text_parts.append(part.get_payload(decode=True))

# print all the text parts separated by a line of '='
print(('\r\n' + '=' * 76 + '\r\n').join(text_parts))
# end

Whether you use the email module or not, you need to join the
retrieved message into a string. You can use '\n', but if you plan to
push the text back out in an email, '\r\n' is required for the SMTP
sending part. Your client may or may not convert '\n' to '\r\n' at
sending time.
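
The part-walking idea can be tried without a mail server. The sketch below assumes Python 3; the helper name collect_text_parts and the two-part demo message are invented for illustration, while the email calls themselves are standard library:

```python
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

REQUIRED = ("text/plain", "text/tab-separated-values")


def collect_text_parts(msg, required=REQUIRED):
    """Return the decoded text of every part whose type is in `required`."""
    text_parts = []  # created once, before the loop
    for part in msg.walk():
        if part.get_content_type() in required:
            payload = part.get_payload(decode=True)  # bytes
            charset = part.get_content_charset() or "us-ascii"
            text_parts.append(payload.decode(charset, errors="replace"))
    return text_parts


# Invented demo message with one plain-text and one HTML alternative
demo = MIMEMultipart("alternative")
demo.attach(MIMEText("plain body", "plain"))
demo.attach(MIMEText("<p>html body</p>", "html"))

parts = collect_text_parts(demo)
print(("\r\n" + "=" * 76 + "\r\n").join(parts))
```

Matching on the full content type keeps text/html out; adding "text/html" to the required tuple would be the way to include it.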

HTH

Tim Williams · Apr 6, 2007

Content: ['text/plain', 'text/html', 'message/delivery-status',
'text/plain', 'text/plain', 'text/plain', 'unknown', 'message/rfc822',
'text/plain', 'text/html']

I should explain that this was the content in a single email

required = ['text/plain', 'text/tab-separated-values']
text_parts = []  # oops, this has to be above the for loop, not inside it
for part in EMAIL_OBJ.walk():
    if part.get_content_type() in required:
        # collect the decoded payload, since join() needs strings
        text_parts.append(part.get_payload(decode=True))

# print all the text parts separated by a line of '='
print(('\r\n' + '=' * 76 + '\r\n').join(text_parts))
# end
