way to extract only the message from pop3

flit

Hello All,

Using poplib in Python I can extract only the headers using .top().
Is there a way to extract only the message text without the headers?

That is, remove fields like the ones below:
"
Return-Path:
X-Original-To:
Received: from [
by (Postfix) with ESMTP id B32382613C
for Tue, 3 Apr 2007 09:54:28 -0300 (BRT)
Date: Tue, 03 Apr 2007 09:52:15 -0300
From: <@>
To:
Subject: test
Message-Id:
MIME-Version: 1.0
Content-Type: text/plain; charset="US-ASCII"
Content-Transfer-Encoding: 7bit
X-Mailer: Becky! ver. 2.24.02 [en]
X-UIDL: !Dn!!HKT!!/k
Status: RO
"
and only get this:

this is a text message..
...

Thanks
 
kyosohma

I found a tutorial on parsing email that should help you:

http://www.devshed.com/c/a/Python/Python-Email-Libraries-SMTP-and-Email-Parsing/

Also see the email module:

http://www.python.org/doc/2.3.5/lib/module-email.html

Mike
 
hlubenow


Well, I couldn't get anywhere with that material, especially that tutorial,
some time ago. It went better once I took a look at the code that user
"rogen" wrote in his first posting here:

http://www.python-forum.de/topic-7507.html

It's really not beautiful code, but as soon as you understand what he does,
you're almost there.

See You

H.
 
Tim Roberts

flit said:
Using poplib in python I can extract only the headers using the .top,
there is a way to extract only the message text without the headers?

Only by using Python code. The other responses gave you good pointers
toward that path, but I thought I would point out the reason why.

The POP3 protocol is surprisingly primitive. There are only two commands
to fetch a message: RETR, which fetches the headers and the entire message,
and TOP, which fetches the headers and the first N lines of the message.
The key point is that both commands fetch the headers.
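
Since both commands return the headers, the separation has to happen on the client side. A minimal sketch of that idea in Python 3 (the code elsewhere in this thread is Python 2), assuming `client` is an already logged-in `poplib.POP3` object. The helper name `fetch_body_lines` is made up for illustration and is not part of poplib:

```python
def fetch_body_lines(client, msg_num):
    """Return only the body lines of message `msg_num` (1-based).

    `client` is assumed to be a logged-in poplib.POP3 instance. Its
    retr() method returns (response, lines, octets), where `lines` is a
    list of bytes objects with the line endings already stripped.
    """
    lines = client.retr(msg_num)[1]
    try:
        # The first empty line separates the headers from the body.
        blank = lines.index(b"")
    except ValueError:
        # No blank line at all: the message consists of headers only.
        return []
    return lines[blank + 1:]
```

The same slicing works on the result of `client.top(msg_num, n)`, which simply carries fewer body lines.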
 
flit

Yep, you are right.
I made a filter to get the data I want out of the message.
So it's not the most beautiful code, but it works. :)
 
hlubenow

flit said:
Hello All,

Using poplib in python I can extract only the headers using the .top,
there is a way to extract only the message text without the headers?

As mentioned before, you should use module "email":

------------------------------------

#!/usr/bin/env python

import poplib
import email
import os
import sys
import string


PROVIDER = "pop.YourMailProvider.de"
USER = "YourUserName"
PASSWORD = "YourPassword"

try:
    client = poplib.POP3(PROVIDER)
except:
    print "Error: Provider not found."
    sys.exit(1)

client.user(USER)
client.pass_(PASSWORD)

nrof_mails = len(client.list()[1])

for i in range(nrof_mails):
    lines = client.retr(i + 1)[1]
    mailstring = string.join(lines, "\n")

    msg = email.message_from_string(mailstring)

    for part in msg.walk():
        blockit = 0

        if part.get_content_maintype() == "text" and blockit == 0:
            blockit = 1
            mycontent = part.get_payload()
            mycontent = mycontent.decode("quopri_codec")
            print mycontent
            print

client.quit()
 
hlubenow

flit said:
Hello All,

Using poplib in python I can extract only the headers using the .top,
there is a way to extract only the message text without the headers?

As mentioned before, you should use module "email":

------------------------------------

#!/usr/bin/env python

import poplib
import email
import os
import sys
import string


PROVIDER = "pop.YourMailProvider.de"
USER = "YourUserName"
PASSWORD = "YourPassword"

try:
    client = poplib.POP3(PROVIDER)
except:
    print "Error: Provider not found."
    sys.exit(1)

client.user(USER)
client.pass_(PASSWORD)

# client.list()[1] holds one entry per message in the mailbox
nrof_mails = len(client.list()[1])

for i in range(nrof_mails):
    # POP3 message numbers start at 1
    lines = client.retr(i + 1)[1]
    mailstring = string.join(lines, "\n")
    # reset once per message, so only the first text part is printed
    blockit = 0

    msg = email.message_from_string(mailstring)

    for part in msg.walk():

        if part.get_content_maintype() == "text" and blockit == 0:
            blockit = 1
            mycontent = part.get_payload()
            mycontent = mycontent.decode("quopri_codec")
            print mycontent
            print

client.quit()
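
For readers on Python 3, the parsing step of the script above might look roughly like the sketch below. It is not hlubenow's code: the function name `first_text_part` is invented here, and it swaps the unconditional `quopri_codec` decode for `get_content()`, which follows whatever `Content-Transfer-Encoding` and charset the part declares. The input is assumed to be the whole message as bytes.

```python
import email
from email import policy


def first_text_part(raw_message):
    """Return the decoded text of the first text/* part, or None.

    `raw_message` is the complete message as bytes, for example
    b"\r\n".join(client.retr(i + 1)[1]) with a poplib.POP3 client.
    """
    msg = email.message_from_bytes(raw_message, policy=policy.default)
    for part in msg.walk():
        if part.get_content_maintype() == "text":
            # get_content() undoes the transfer encoding (quoted-printable,
            # base64, ...) and decodes the bytes with the part's charset.
            return part.get_content()
    return None
```

Returning from inside the loop plays the role of the `blockit` flag: only the first text part is used.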
 
Gabriel Genellina

message=whole_message[len(headers):None]

You can omit the word None: it is just there for clarity purposes.

Uhm... I can't find any usage of slices including an explicit None in
code.google.com (except on the Python test suite), and really I don't
consider that to be more readable than whole_message[len(headers):]
But of course this is just a stylistic issue.
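
For what it's worth, the two spellings behave identically. A tiny check with made-up sample strings (Python 3 syntax, though the slicing is the same in Python 2):

```python
# Made-up sample data: the headers string includes the blank separator line.
whole_message = "Subject: test\r\n\r\nthis is a text message.."
headers = "Subject: test\r\n\r\n"

# An omitted slice bound and an explicit None mean the same thing.
with_none = whole_message[len(headers):None]
without_none = whole_message[len(headers):]
```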
 
Tim Williams

so get two strings: only headers, and the whole message.
find the length of the headers, and chop that off the beginning of the whole
message:
message=whole_message[len(headers):None]

This way you have to perform 2 downloads, the headers and the whole
message. Then join them both into strings and subtract one from the
other by slicing or other means.

(other means? body = whole_message.replace(headers,'' ) or maybe not ! :) )

The body starts after the first blank line that follows the headers; in
practice this is simply the first blank line in the message. This is a good
starting point for something simple like my earlier suggestion:

msg = '\r\n'.join( M.retr(i+1)[1] ) # retrieve the email into string
hdrs,body = msg.split('\r\n\r\n',1) # split it into hdrs & body

If the original poster needed the body to be separated from the
headers (and a private reply from the OP to my original post suggested
that was probably the case), then splitting the joined whole message at
the first blank line is sufficient and requires only one download,
without using the email module.
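
The one-download split could be wrapped up along these lines in Python 3. This is only a sketch: the name `split_message` is hypothetical, `lines` is assumed to be the list of bytes that `retr()` returns, and latin-1 is chosen just so that every byte decodes, not because it is the message's real charset.

```python
def split_message(lines):
    """Split a retrieved message into (headers, body) strings.

    `lines` is the list of bytes lines from poplib's retr(), that is
    client.retr(n)[1]. latin-1 is used only because it can decode any
    byte value; it says nothing about the message's actual charset.
    """
    msg = b"\r\n".join(lines).decode("latin-1")
    # partition() splits at the first blank line and, unlike
    # split(..., 1) with tuple unpacking, does not raise when the
    # message has no body at all.
    hdrs, _sep, body = msg.partition("\r\n\r\n")
    return hdrs, body
```

Using `partition` instead of `split` is a small design choice: unpacking `msg.split('\r\n\r\n', 1)` into two names fails with a ValueError for a message that contains no blank line.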

If the OP needed just the text parts extracted from the message, then
it gets a bit trickier. The email module is the way to go, but not
quite how a previous poster used it.

Consider an email that was routed through my (Python) SMTP servers and
filters today:

Content: ['text/plain', 'text/html', 'message/delivery-status',
'text/plain', 'text/plain', 'text/plain', 'unknown', 'message/rfc822',
'text/plain', 'text/html']

Is text/html a text part or an html part for this exercise ? :)

You need to walk the parts and use something like

# part.get_content_maintype() requires a further call
# to get_content_subtype() , so use
# part.get_content_type() instead.

required = ['text/plain', 'text/tab-separated-values']
for part in EMAIL_OBJ.walk():
    text_parts = []
    if part.get_content_type() in required:
        # collect the part's text; join() needs strings, not Message objects
        text_parts.append(part.get_payload())

print ('\r\n' + '='*76 +'\r\n').join(text_parts)
# print all the text parts separated by a line of '='
# end

Whether you use the email module or not, you need to join the
retrieved message into a string. You can use \n but if you plan to
push the text back out in an email '\r\n' is required for the SMTP
sending part. Your client may or may not convert \n to \r\n at
sending time :)

HTH :)
 
Tim Williams

Content: ['text/plain', 'text/html', 'message/delivery-status',
'text/plain', 'text/plain', 'text/plain', 'unknown', 'message/rfc822',
'text/plain', 'text/html']

I should explain that this was the content of a single email.
You need to walk the parts and use something like

# part.get_content_maintype() requires a further call
# to get_content_subtype() , so use
# part.get_content_type() instead.

required = ['text/plain', 'text/tab-separated-values']
text_parts = []  # oops, this should have been above the for loop
for part in EMAIL_OBJ.walk():
    if part.get_content_type() in required:
        # collect the part's text; join() needs strings, not Message objects
        text_parts.append(part.get_payload())

print ('\r\n' + '='*76 +'\r\n').join(text_parts)
# print all the text parts separated by a line of '='
# end
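
A Python 3 take on the corrected loop, offered as a sketch rather than a drop-in replacement. The function name `plain_text_parts` and its `wanted` parameter are made up for this example, and the input is assumed to be the whole message as bytes (for instance the joined lines from `retr()`):

```python
import email
from email import policy


def plain_text_parts(raw_message,
                     wanted=("text/plain", "text/tab-separated-values")):
    """Return the decoded content of every part whose type is in `wanted`."""
    msg = email.message_from_bytes(raw_message, policy=policy.default)
    text_parts = []  # created once, before the loop
    for part in msg.walk():
        if part.get_content_type() in wanted:
            # get_content() returns the part's text with the transfer
            # encoding and charset already decoded.
            text_parts.append(part.get_content())
    return text_parts
```

The parts can then be printed separated by a line of '=' as before, for example with `print(("\r\n" + "=" * 76 + "\r\n").join(plain_text_parts(raw)))`, where `raw` stands for the message bytes.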
 
