Flattening an email message


Florian Lindner


I want to use some machine learning stuff on mail messages. The first step is to
get some flattened text from a mail message, but Python's email package does not
work as automatically as I wish. Right now I have:
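import email

def mail_preprocessor(raw):
    msg = email.message_from_string(raw)
    msg_body = ""

    # Keep only inline text/plain parts; attachments are skipped
    for part in msg.walk():
        if part.get_content_type() == "text/plain" and not part.is_attachment():
            # decode=True undoes base64/quoted-printable and returns bytes
            payload = part.get_payload(decode=True)
            charset = part.get_content_charset() or "us-ascii"
            msg_body += payload.decode(charset, errors="replace")

    msg_body = msg_body.lower()
    msg_body = msg_body.replace("\n", " ")
    msg_body = msg_body.replace("\t", " ")
    return msg_body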
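The two replace() calls only catch "\n" and "\t". Mail on the wire uses CRLF line endings, so a "\r" can survive into the flattened text. A minimal sketch of a one-pass alternative, assuming collapsing every whitespace run to a single space is acceptable (the helper name normalize_text is hypothetical):

```python
def normalize_text(text):
    """Lowercase text and collapse every whitespace run to one space."""
    # str.split() with no arguments splits on any run of whitespace
    # (spaces, tabs, \n, \r, form feeds, ...), so CRLF line endings
    # are handled as well. Leading/trailing whitespace is dropped.
    return " ".join(text.split()).lower()


print(normalize_text("Hello\r\n\tWorld  Again\n"))  # hello world again
```

Unlike the replace() chain, this also trims leading and trailing whitespace, which is usually what a tokenizer wants anyway.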

To get text from HTML I could use BeautifulSoup. Right now, though, I'm still one
layer down, dealing with the RFC 2822 structure (encapsulation etc.).
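BeautifulSoup is a third-party package. If a dependency is unwanted, the standard library's html.parser can do a rough HTML-to-text pass. This is a swapped-in technique, not BeautifulSoup's API; the class and function names below are hypothetical:

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        # convert_charrefs=True turns entities like &amp; into characters
        super().__init__(convert_charrefs=True)
        self.chunks = []
        self._skip = 0  # nesting depth inside script/style elements

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.chunks.append(data)


def html_to_text(html):
    """Return the visible text of an HTML string, whitespace-collapsed."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    return " ".join(" ".join(parser.chunks).split())


print(html_to_text("<p>Hi <b>there</b></p><style>p{}</style>&amp; bye"))
```

Joining text nodes with spaces keeps block elements from running together, at the cost of splitting words that are broken up by inline tags (e.g. "<b>w</b>ord").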

Does anybody know of a package or code I can throw an email message at and get
some kind of text back? Attachments can be discarded, and HTML I can take care
of myself...
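On Python 3.6 or later, the standard library can already do most of this: parsing with policy=email.policy.default yields an EmailMessage whose get_body() picks the best text part and skips attachments, and get_content() undoes the transfer encoding and charset. A minimal sketch, assuming a preference for text/plain over text/html (the function name flatten_message is hypothetical):

```python
import email
import email.policy


def flatten_message(raw):
    """Return the main text body of an RFC 2822 message as a string.

    Prefers text/plain and falls back to text/html; parts marked as
    attachments are never chosen. Returns "" if no text body exists.
    """
    msg = email.message_from_string(raw, policy=email.policy.default)
    body = msg.get_body(preferencelist=("plain", "html"))
    if body is None:
        return ""
    # Decodes base64/quoted-printable and the declared charset
    return body.get_content()


print(flatten_message("Subject: hi\n\nHello there\n"))
```

If get_body() falls back to an HTML part, the returned string is still markup and needs a separate stripping step (BeautifulSoup or similar) before it goes into the ML pipeline.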



