Extract data from email - Tmail, Hpricot

George Cooper

Hi all,

I have an HTML email that I would like to parse.

The problem I'm having is removing all the HTML tags and getting past the
header information. After that, I want to extract the information from each
row and put it into a database.

The email is pasted here: http://pastie.textmate.org/265259

I have tried TMail, but can't seem to extract just the body. Then I
tried Hpricot and wasn't sure what to use before the .inner_html. So
basically I'm very lost on where to start.

Any help is appreciated.

Thanks!
 
Michael Morin

It would help if you posted some code that didn't work, so people can
have a better idea of what you're trying to do. TMail should have been
able to parse that without a problem; however, extracting the body is
easy. The body follows the first empty line. You could use something like
split, but duplicating such huge strings could be slow. When you read the
mail, try reading a line at a time until you get to the empty line, then
read the rest into a buffer for Hpricot.
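
A minimal sketch of that line-at-a-time approach, using only the Ruby standard library. The method name read_body and the sample message are made up for illustration; with a real file you would pass an open File instead of a StringIO. It assumes the body is not encoded (no base64 or quoted-printable) and the message is not multipart.

```ruby
require 'stringio'

# Skip header lines up to the first blank line, then return everything
# that is left (the body) as one string. Works on any IO-like object.
def read_body(io)
  io.each_line do |line|
    break if line.strip.empty?   # the blank line separates headers from body
  end
  io.read.to_s                   # the remainder of the stream is the body
end

# Hypothetical sample message, not the email from the pastie link.
raw = "From: sender@example.com\r\n" \
      "Subject: Report\r\n" \
      "\r\n" \
      "<html><body><p>Hello</p></body></html>\r\n"

body = read_body(StringIO.new(raw))
puts body
```

With a file on disk this would look something like File.open('emailhtml.eml') { |f| read_body(f) }, and the resulting string can then be handed to the HTML parser.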

--
Michael Morin
Guide to Ruby
http://ruby.about.com/
Become an About.com Guide: beaguide.about.com
About.com is part of the New York Times Company
 
Geo _C

I got TMail to extract the body of my email. The solution (very simple
and embarrassing) is below. Now I'm trying to figure out Hpricot, but
examples seem to be fairly thin. If anyone knows of a good tutorial for
beginners, please post it. I have been using
http://code.whytheluckystiff.net/doc/hpricot/, but could use something
more basic.

Thanks for the help!
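
For the row extraction asked about at the start of the thread, here is a rough, dependency-free sketch. It is not Hpricot: it swaps in plain regular expressions as a stand-in, which is only reasonable for simple, well-formed tables without nesting. A real HTML parser is the safer choice for arbitrary mail. The method name table_rows and the sample HTML are made up for illustration.

```ruby
# Return one array of cell strings per <tr>, with inner tags stripped.
# Regex-based stand-in for an HTML parser; breaks on nested tables.
def table_rows(html)
  html.scan(%r{<tr\b[^>]*>(.*?)</tr>}mi).flatten.map do |row_html|
    row_html.scan(%r{<t[dh]\b[^>]*>(.*?)</t[dh]>}mi).flatten.map do |cell_html|
      # Drop any remaining tags inside the cell and tidy the whitespace.
      cell_html.gsub(/<[^>]+>/, '').gsub('&nbsp;', ' ').strip
    end
  end
end

# Hypothetical sample table, not the email from the pastie link.
html = <<~HTML
  <table>
    <tr><th>Name</th><th>Qty</th></tr>
    <tr><td><b>Widget</b></td><td> 3 </td></tr>
  </table>
HTML

table_rows(html).each { |cells| p cells }
```

Each inner array corresponds to one table row, so it can be mapped onto the columns of a database insert.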

Thomas said:
It would help if you posted some code that didn't work, so people can
require 'rubygems'
require 'tmail'

email = TMail::Mail.load( 'emailhtml.eml' )

puts email['body'] # comes back nil

Don't see why it would be nil. I would contact Mikel.
http://lindsaar.net/

I needed to use email.body instead of email['body'] to return the body.
Thanks, Peter!
puts email['from']
puts email['Delivered-To']
puts email['to'] # comes back nil

I don't see a 'to' in the header, so is this a surprise?

My mistake there. You are correct, there is no 'to' for me to use.
Tue, 2 Sep 2008 19:05:00 -0400
02 Sep 2008 23:10:35.0578 (UTC) FILETIME=[1B2659A0:01C90D51]

T.
 
