A little help with Perl & Email Messages

artmerar

Hi,

Ok, our server runs Qmail. In the user's home directory we can put a file, .qmail, which acts like a .forward. So, we can take the incoming messages and forward them to a script:

| /home/johndoe/filter.pl

I want to take the emails and extract the From, To, Subject & Body. I was using Email::Filter and was getting everything, but the body contains all the tags & MIME information, etc.

I'm really looking to only get the body text, without all the tags, etc.

I tried Email::Filter, Email::MIME, MIME::Parser, all with no luck. When I use Email::Filter I start like this: my $mail = Email::Filter->new();

That gives some type of Hash Array: Email::Filter=HASH(0x1f220730)

But all those Perl modules are expecting the input to come from 'somewhere'. But I'm not sure in this case where 'somewhere' is. It is not STDIN, it is not a file, it was forwarded via the .qmail file: | /home/johndoe/filter.pl

So very long story short, any example where I can extract the body text without all the MIME stuff?

Many thanks!
 
artmerar

Ben said:
It gives you an Email::Filter object, which you can manipulate using the methods in the documentation for that module.

Invoking a command from .qmail like that sends the mail through the command's STDIN.

I can't see anything obvious, but with a half-decent understanding of how MIME works you should be able to do this with Email::MIME.

Ben

I hate to be a dork, but do you have any sample code? When I look at Email::MIME, it talks about other modules needed and such and I really get lost. Maybe I am not understanding the example, etc........

Thanks in advance.
 
artmerar

Example from website:

use Email::MIME;
my $parsed = Email::MIME->new($message);

my @parts = $parsed->parts; # These will be Email::MIME objects, too.
my $decoded = $parsed->body;
my $non_decoded = $parsed->body_raw;

my $content_type = $parsed->content_type;

Now, where is $message coming from? I do not have a variable with $message. If I try $mail = Email::Filter->new();, that does not work......
 
artmerar

Henry Law said:
When you say "forward them to a script", what do you mean? What exactly will the system do with the message and the program? My guess would be that, as instructed by this ".qmail" file, the mail client program will invoke filter.pl and *pass it the text of the message* in some well-defined way, most likely via STDIN.

Er yes, that's a Perl object of type "Email::Filter". I sense that you need to read up on how to use Perl objects such as that one.

See my earlier question: are you sure? From what you've written I do think it is STDIN. If that's the case then you merely open STDIN inside "filter.pl", read in the text and pass it to your Email::Filter object.

I've no idea about Email::Filter but someone else suggested Email::MIME which I do know a bit about, having used it. If you use that module then something like this might be a place to start:

#!/usr/bin/perl
use strict;
use warnings;
use Email::MIME;    # needed for Email::MIME->new below

# Read the whole message from STDIN
my $msg;
$msg .= $_ while <>;
chomp $msg;

my $email = Email::MIME->new($msg);
# ... etc



There are better ways of slurping in the whole of a STDIN stream but I don't have time to look them up right now; sorry. I hope this helps.

--
Henry Law Manchester, England
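
One of the "better ways of slurping" mentioned above is to undefine Perl's input record separator, $/, for the duration of the read, so that a single read returns the whole stream. The following is a minimal sketch rather than a definitive implementation: slurp_handle and the sample message are hypothetical, made up for illustration, and an in-memory filehandle stands in for the STDIN that qmail would provide.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return everything readable from a filehandle.
sub slurp_handle {
    my ($fh) = @_;
    local $/;                  # undef $/ means slurp mode; restored when the sub returns
    my $text = <$fh>;          # a single read now returns the whole stream
    return defined $text ? $text : '';
}

# Made-up sample message; in filter.pl you would call slurp_handle(\*STDIN).
my $sample = "From: sender\@example.com\nSubject: TEST\n\nHello\n";
open my $fh, '<', \$sample or die "Cannot open in-memory handle: $!";
my $raw_message = slurp_handle($fh);
close $fh;

printf "Read %d characters\n", length $raw_message;
```

Because $/ is localized inside the sub, line-by-line reads elsewhere in the script keep working as usual.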


I tried the following code:

use MIME::Parser;
my $parser = new MIME::Parser;
$parser->decode_headers(1);
my $mail = $parser->parse(\*STDIN) or die "parse failed\n";

$mail = MIME::Entity=HASH(0x12d4f420)

Not sure what that means......
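
A value such as MIME::Entity=HASH(0x12d4f420) is what Perl prints when an object is interpolated into a string: the class name, the underlying data type, and a memory address. The sketch below illustrates this with a hypothetical stand-in class, Demo::Message (not part of any real module), and shows that the useful data comes from calling methods on the object rather than printing it.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Scalar::Util qw(blessed reftype);

# Hypothetical stand-in class, for illustration only.
{
    package Demo::Message;
    sub new     { my ($class, %args) = @_; return bless {%args}, $class }
    sub subject { my ($self) = @_; return $self->{subject} }
}

my $mail = Demo::Message->new(subject => 'TEST');

print "Interpolated: $mail\n";                   # class name, =HASH, and an address
print "Class:        ", blessed($mail), "\n";    # package the object was blessed into
print "Data type:    ", reftype($mail), "\n";    # underlying reference type
print "Subject:      ", $mail->subject, "\n";    # real data comes from method calls
```

The same applies to the MIME::Entity object above: its documented methods are the way to get at headers and body.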
 
Jürgen Exner

[using full-quote to demonstrate my point]
I tried the following code:

use MIME::Parser;
my $parser = new MIME::Parser;
$parser->decode_headers(1);
my $mail = $parser->parse(\*STDIN) or die "parse failed\n";

$mail = MIME::Entity=HASH(0x12d4f420)

Not sure what that means......

Would you mind not adding an additional blank line after every
single line that you quote, and not claiming that the previous poster
wrote those? This really does not improve readability one bit....

jue
 
artmerar

Henry Law said:
Please make a clear distinction between Perl code (down as far as "my $mail =") and comment, that is, your last two lines.

You're nearly there! You've got a MIME::Entity object containing the text of your email. You can do with it anything that the methods of MIME::Entity allow you to do. Read it up. Try perldoc MIME::Entity, or find the same material on the web. Look; I've done a little bit for you, starting with your code above:

$ cat tryout
#!/usr/bin/perl
use strict;
use warnings;
use 5.010;

use MIME::Parser;

my $parser = new MIME::Parser;
$parser->decode_headers(1);
my $mail = $parser->parse(\*STDIN) or die "parse failed\n";

$mail->dump_skeleton;

$ cat TestEmail.txt | ./tryout
Content-type: text/plain
Effective-type: text/plain
Body-file: ./msg-7497-1.txt
Subject: This is a test email

--
henry@eris:~/Perl/tryout$

--
Henry Law Manchester, England

Sigh, I feel so helpless. LOL.

I tried another example:

use MIME::Lite;
use MIME::Parser;
use MIME::Body;
use Email::MIME;

my $parser = new MIME::Parser;
my $entity = $parser->parse(\*STDIN);
my $body = $entity->bodyhandle;
print LOG "HERE: $body\n";

And the log is empty......I also see a MIME::Body, do I need to reference that?
 
Jürgen Exner

Henry Law said:
You're right, of course, and I'll do that.

Henry, it's not you who is doubling the number of lines, it is artmerar.
He/she has to fix that, nothing _you_ can do about it.
But don't you have something more useful to add to the discussion? I
know from previous history that you know orders of magnitude more Perl
than I do; I should have thought that your time might have been better
spent advising the OP.

Well, thanks for the flattery, but MIME is not my strong side.

jue
 
artmerar

Henry Law said:
You're right, of course, and I'll do that.

But don't you have something more useful to add to the discussion? I know from previous history that you know orders of magnitude more Perl than I do; I should have thought that your time might have been better spent advising the OP.

--
Henry Law Manchester, England


I tried this, and while it is dumping the body to msg* files on disk, it dumps many of them, from all previous messages. Not sure why:

my $parser = new MIME::Parser;
my $entity = $parser->parse(\*STDIN);
my $body = $entity->bodyhandle;
$body = new MIME::Body::File "/home";

my $IO = $body->open("r") || die "open body: $!";
while (defined($_ = $IO->getline)) {
print LOG "HERE: $_\n";
}
$IO->close || die "close I/O handle: $!";
 
artmerar

Ben said:
Unless you're likely to be dealing with messages that won't fit into memory, I'd recommend Email::MIME over MIME::Parser. It's a lot easier to work with.

Don't use that syntax; from time to time it will do something unexpected.

my $parser = MIME::Parser->new;

perldoc perltoot or perldoc perlboot

Ben

I'm experiencing 3 conditions:

1) It writes msg* files to disk, LOTS OF THEM, many of previous messages sent to that address.

2) I get the body with all the MIME junk in it.

3) I get nothing.

I'm looking for the text only......I've honestly tried about 30 examples with no luck. So, either I'm a dork, or I'm just missing something.
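
On condition 1: by default MIME::Parser writes each decoded body to a msg-* file in its output directory and does not remove the files afterwards, which would explain files from earlier deliveries piling up. A minimal sketch, assuming MIME::Parser (from the MIME-tools distribution) is installed: output_to_core(1) keeps the bodies in memory, and parts_DFS visits every part so the first text/plain one can be picked out. The sample message and its boundary string are made up for illustration; a real filter would call $parser->parse(\*STDIN) instead of parse_data.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use MIME::Parser;

# Made-up multipart message standing in for what qmail pipes to STDIN.
my $raw = join "\n",
    'From: sender@example.com',
    'To: johndoe@example.com',
    'Subject: TEST',
    'MIME-Version: 1.0',
    'Content-Type: multipart/alternative; boundary="XYZ"',
    '',
    'This is a multipart message in MIME format.',
    '',
    '--XYZ',
    'Content-Type: text/plain; charset="us-ascii"',
    'Content-Transfer-Encoding: 7bit',
    '',
    'Hello in plain text',
    '',
    '--XYZ',
    'Content-Type: text/html; charset="us-ascii"',
    'Content-Transfer-Encoding: quoted-printable',
    '',
    '<html><body><p>Hello in <b>HTML</b></p></body></html>',
    '',
    '--XYZ--',
    '';

my $parser = MIME::Parser->new;
$parser->output_to_core(1);      # keep bodies in memory: no msg-* files on disk

my $entity = $parser->parse_data($raw) or die "parse failed\n";

# Walk the message tree depth-first and keep the first text/plain body.
my $text;
for my $part ($entity->parts_DFS) {
    next unless $part->effective_type eq 'text/plain';
    my $body = $part->bodyhandle or next;   # multipart containers have no body
    $text = $body->as_string;
    last;
}

print defined $text ? $text : "No text/plain part found\n";
```

Keeping everything in memory is only reasonable for messages of modest size; for very large mail the on-disk default plus explicit cleanup may be the better trade-off.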
 
artmerar

Henry Law said:
I don't know why you're finding this so difficult.

Why don't you start with this? As far as I can tell it does what you want, as you described in your original post.



$ cat art.pl
#!/usr/bin/perl
use strict;
use warnings;

use Email::Simple; # Why not? Your requirement is Simple.

# Read the message from STDIN
my $txt;
$txt .= $_ while <>;

# Create the message object and extract its headers
my $msg = Email::Simple->new($txt)
    or die "Couldn't create an email from that text\n";
my %headers = $msg->header_pairs();

# Now you probably have all the things you said you needed
print "This message is from '$headers{From}' to '$headers{To}'\n";
print "Here is the subject: '$headers{Subject}'.\n";
print "------BODY-------\n" , $msg->body() , "------BODY END-------\n";

$



I've done your qmail homework for you too. Look in qmail-command(8) and you'll find this:

So make your own version of my sample program the one that's named in the .qmail file for the user, and you'll be able to do whatever you want with the mail. (Read the rest of qmail-command for details of return codes and their meanings).

NB: If security is any concern then you need to look carefully at ownership of the "forwarding" program. I've not checked but it could run with elevated privileges of some kind, thus providing a back door.

--
Henry Law Manchester, England
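
On the return codes: going by qmail-command(8), exit code 0 means the delivery succeeded, 100 is a permanent failure (the message bounces) and 111 is a temporary failure (delivery is retried later). A minimal sketch of mapping a filter's outcome onto those codes follows; process_message and exit_code_for are hypothetical names, and the codes should be checked against the qmail-command man page on your own system.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Exit codes as described in qmail-command(8); verify against your own man page.
use constant {
    QMAIL_OK         => 0,     # delivery succeeded
    QMAIL_HARD_ERROR => 100,   # permanent failure: message bounces
    QMAIL_SOFT_ERROR => 111,   # temporary failure: qmail retries later
};

# Hypothetical processing step: dies if the message cannot be handled.
sub process_message {
    my ($raw) = @_;
    die "empty message\n" unless defined $raw && length $raw;
    return 1;
}

# Hypothetical helper: run the processing and map the outcome to an exit code.
sub exit_code_for {
    my ($raw) = @_;
    my $ok = eval { process_message($raw); 1 };
    return $ok ? QMAIL_OK : QMAIL_SOFT_ERROR;   # retry rather than bounce on errors
}

print "Non-empty message -> exit code ", exit_code_for("Subject: TEST\n\nHello\n"), "\n";
print "Empty message     -> exit code ", exit_code_for(''), "\n";

# In filter.pl the last line would be: exit exit_code_for($raw_message);
```

Returning the soft-error code on an unexpected failure is a design choice in this sketch: the mail stays queued while the script is fixed, instead of bouncing back to the sender.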

Henry,

That does work, but is there a way to get rid of all the header information:

Here is the subject: 'TEST'.
------BODY-------
This is a multipart message in MIME format.

------=_NextPart_000_12F3_01CE045C.72E20340
Content-Type: text/plain;
charset="us-ascii"
Content-Transfer-Encoding: 7bit

Fdsfdsasda

Fdsafdasdfas

------=_NextPart_000_12F3_01CE045C.72E20340
Content-Type: text/html;
charset="us-ascii"
Content-Transfer-Encoding: quoted-printable
<html xmlns:v=3D"urn:schemas-microsoft-com:vml" =
xmlns:o=3D"urn:schemas-microsoft-com:office:office" =
xmlns:w=3D"urn:schemas-microsoft-com:office:word" =
..
..
..
 
Jürgen Exner

And so are the readers of comp.lang.perl.misc. They are not interested
in empty lines unless they contribute to the readability of the text in
a meaningful way.
And they are not interested in lines exceeding the established Usenet
standard of ~75 characters, either.
I've honestly tried about 30 examples with no luck. So, either I'm a dork, or I'm just missing something.

Well, go figure

*PLONK*

jue
 
artmerar

Henry Law said:
If it's in MIME format then you need Email::MIME. It extends Email::Simple to allow you to separate the different MIME parts, and fiddle around with their Content-Type and Transfer-Encoding and so forth. It will also (as I recall) return the body (with or without decoding), which presumably is what you want.

Please have a go at that -- write some code -- and if it doesn't do what you expect then you can get help here.

--
Henry Law Manchester, England

Henry,

Thanks for the pointer. It is almost working, but still have the MIME content:

my $txt;
$txt .= $_ while <>;

my $parsed = Email::MIME->new($txt);
my $decoded = $parsed->body;
my $non_decoded = $parsed->body_raw;

print "DECODED: $decoded\n";
print "NON: $non_decoded\n";


DECODED: This is a multipart message in MIME format.


NON: This is a multipart message in MIME format.

------=_NextPart_000_1500_01CE053E.2ADE0E30
Content-Type: text/plain;
charset="us-ascii"
Content-Transfer-Encoding: 7bit

STUFF

MORE STUFF


------=_NextPart_000_1500_01CE053E.2ADE0E30
Content-Type: text/html;
charset="us-ascii"
Content-Transfer-Encoding: quoted-printable
 
artmerar

Henry Law said:
<snipped>

It is working perfectly. Look up the structure of multipart MIME messages and read the documentation.

Hint: you'll find some help in the third line of the code snippet in the Synopsis section of the perldoc for Email::MIME. There's another broad hint in the section describing the "body" method.

--
Henry Law Manchester, England

Thanks for all the pointers Henry. I did the reading and changed the code. Looking at the @parts array, it contains some hashes:

PARTS: Email::MIME=HASH(0x16465de0) Email::MIME=HASH(0x164660c0)

I looked at the content type and got this:

CONTENT: multipart/alternative; boundary="----=_NextPart_000_1574_01CE0591.17983190"

Still looking to get just the text, but making some slow progress. This stuff is a bit more complex than I originally thought.
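
The two Email::MIME objects in @parts are the alternatives inside the multipart/alternative container: typically one text/plain part and one text/html part. A minimal sketch of pulling out the plain text, assuming Email::MIME is installed: walk_parts visits every part, and body_str returns the decoded body of a leaf part. The helper name plain_text_body, the sample message, and the crude tag-stripping fallback for HTML-only mail are assumptions made for illustration; a real HTML parser would be a better fallback.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Email::MIME;

# Hypothetical helper: first text/plain body, else crude text from the HTML part.
sub plain_text_body {
    my ($email) = @_;
    my ($plain, $html);
    $email->walk_parts(sub {
        my ($part) = @_;
        return if $part->subparts;                    # skip multipart containers
        my $type = $part->content_type;
        $type = 'text/plain' unless defined $type;    # MIME default when header is absent
        if    ($type =~ m{^text/plain}i) { $plain = $part->body_str unless defined $plain }
        elsif ($type =~ m{^text/html}i)  { $html  = $part->body_str unless defined $html }
    });
    return $plain if defined $plain;
    return '' unless defined $html;
    $html =~ s/<[^>]*>//g;                            # crude: strips tags, not a real parser
    return $html;
}

# Made-up message standing in for what qmail pipes to STDIN.
my $raw = join "\n",
    'From: sender@example.com',
    'To: johndoe@example.com',
    'Subject: TEST',
    'MIME-Version: 1.0',
    'Content-Type: multipart/alternative; boundary="XYZ"',
    '',
    'This is a multipart message in MIME format.',
    '',
    '--XYZ',
    'Content-Type: text/plain; charset="us-ascii"',
    'Content-Transfer-Encoding: 7bit',
    '',
    'Hello in plain text',
    '',
    '--XYZ',
    'Content-Type: text/html; charset="us-ascii"',
    'Content-Transfer-Encoding: quoted-printable',
    '',
    '<html><body><p>Hello in <b>HTML</b></p></body></html>',
    '',
    '--XYZ--',
    '';

my $email = Email::MIME->new($raw);
for my $name (qw(From To Subject)) {
    my $value = $email->header($name);
    print "$name: $value\n";
}
print "Body:\n", plain_text_body($email);
```

Calling body on the top-level multipart object only yields the preamble ("This is a multipart message in MIME format."), which matches the DECODED output seen earlier; the text lives in the sub-parts.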
 
artmerar

Well, what I found out was this:

If I send the email, like from Outlook, and specifically choose Plain Text, the script works fine. But, if the email is in HTML, which is the default, I just cannot get plain text from it.

So, I give up. I'm going to try looping through the content to find the info I need. Thanks for all the pointers though, it did help quite a bit.
 
