Are there alternatives to Mail::Box and Email::Folder?

  • Thread starter Trond Michelsen

Trond Michelsen

Hi.

I'm working on a mod_perl-based webmail system, which currently uses
MIME::Parser to decode the individual messages. This works pretty well
today, but we feel the need for a more abstract, object-oriented
interface to the mail folders themselves. Well, any library would do, to be
honest; the main point is to let support systems access the folders through
the same library as the webmail does, instead of cut'n'pasting code all over
the place. But it would also be nice to be able to access other types of
folders, including POP and IMAP accounts, so a common interface is clearly
desirable.

I've been looking at Mail::Box and Email::Folder, and they both look
interesting. Unfortunately, they use Mail::Message and Email::Simple,
respectively, for the individual messages. And since both of these modules
prefer to have the entire mail in memory, they're pretty much useless in a
mod_perl environment.

So, are there any other interfaces like Email::Folder or Mail::Box that
use less memory? I'm tempted to make something that borrows the
interface from Email::Folder but returns MIME::Entity objects, but if
something else already exists, I'll have a look at that instead.
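Borrowing Email::Folder's iterator shape while returning MIME::Entity objects could look roughly like this. It is a minimal sketch, assuming a plain local Maildir (one message per file under cur/ and new/); it does not cover POP or IMAP. The package name My::MaildirEntities and its next_file method are hypothetical; only MIME::Parser's new, output_dir and parse_open are real MIME-tools calls. Nothing is parsed until next_message is called, and MIME::Parser writes the decoded parts to output_dir instead of keeping them in memory.

```perl
package My::MaildirEntities;   # hypothetical name, not a CPAN module
use strict;
use warnings;
use File::Spec;

# Collect the message file names of a Maildir; nothing is parsed yet.
sub new {
    my ($class, $maildir, %opt) = @_;
    my @files;
    for my $sub (qw(cur new)) {
        my $dir = File::Spec->catdir($maildir, $sub);
        opendir(my $dh, $dir) or next;            # skip missing subfolders
        my @names = grep { !/^\./ } readdir $dh;  # ignore . .. and dotfiles
        closedir $dh;
        push @files, map { File::Spec->catfile($dir, $_) } sort @names;
    }
    return bless {
        files      => \@files,
        output_dir => $opt{output_dir} || '/tmp',
    }, $class;
}

# Path of the next message file, or undef when the folder is exhausted.
sub next_file {
    my ($self) = @_;
    return shift @{ $self->{files} };
}

# Email::Folder-style iterator that returns one MIME::Entity per message.
# MIME::Parser is loaded on first use and writes decoded parts to disk.
sub next_message {
    my ($self) = @_;
    my $file = $self->next_file;
    return unless defined $file;
    require MIME::Parser;
    my $parser = MIME::Parser->new;
    $parser->output_dir($self->{output_dir});
    return $parser->parse_open($file);
}

package main;

# Same loop shape as the Email::Folder measurement further down.
my $folder = My::MaildirEntities->new("Maildir", output_dir => "/tmp/");
while (my $msg = $folder->next_message) {
    print "Subject: ", $msg->head->get("subject");
}
```

Keeping the file list separate from the parsing means a folder listing could use a lighter header reader per file and only hand a message to MIME::Parser when the user actually opens it.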


BTW: Here is a small test showing the memory usage of Mail::Box and
Email::Folder.
The folder contains a single message with a size of 22486134 bytes (about 21.4MB).

--8<--
use GTop;
use Mail::Box::Manager;
# Total process size in bytes before the folder is opened
my $start = GTop->new->proc_mem($$)->size;
printf "Startup: %.1fMB\n", $start/1024/1024;
my $f = Mail::Box::Manager->new->open(folder => "Maildir");
# Only the Subject header is read from each message
foreach my $msg ($f->messages) {
    print "Subject: ", $msg->subject, "\n";
}
my $end = GTop->new->proc_mem($$)->size;
printf "Usage: %.1fMB\n", ($end-$start)/1024/1024
__END__
--8<--
Startup: 5.0MB
Subject: big mail
Usage: 1.2MB

Not that bad, but we're only accessing the header. Once we try to look at
the body, memory usage goes up.

--8<--
use GTop;
use Mail::Box::Manager;
my $start = GTop->new->proc_mem($$)->size;
printf "Startup: %.1fMB\n", $start/1024/1024;
my $f = Mail::Box::Manager->new->open(folder => "Maildir");
# Counting the parts means the body has to be loaded as well
foreach my $msg ($f->messages) {
    print "Subject: ", $msg->subject, " (", scalar $msg->parts, " parts)\n";
}
my $end = GTop->new->proc_mem($$)->size;
printf "Usage: %.1fMB\n", ($end-$start)/1024/1024
__END__
--8<--
Startup: 5.0MB
Subject: big mail (5 parts)
Usage: 205.3MB


Finally, there's Email::Folder. It's a lot leaner than Mail::Box during
startup, but it gets really fat once you access the message.

--8<--
use GTop;
use Email::Folder;
my $start = GTop->new->proc_mem($$)->size;
printf "Startup: %.1fMB\n", $start/1024/1024;
my $f = Email::Folder->new("Maildir");
# next_message reads the whole message, even though only a header is used
while (my $msg = $f->next_message) {
    print "Subject: ", $msg->header("subject"), "\n";
}
my $end = GTop->new->proc_mem($$)->size;
printf "Usage: %.1fMB\n", ($end-$start)/1024/1024;
__END__
--8<--
Startup: 2.0MB
Subject: big mail
Usage: 214.2MB


Obviously, if there's something I've missed, and there are other ways of
accessing the messages that don't require all this memory, or if my
measurement of the usage is insanely wrong, then I'll be very happy to have
it pointed out :)

Oh, and just as a comparison, here's what it looks like if I access that
particular message through MIME::Parser:

--8<--
use GTop;
use MIME::Parser;
my $start = GTop->new->proc_mem($$)->size;
printf "Startup: %.1fMB\n", $start/1024/1024;
my $parser = MIME::Parser->new;
# Decoded parts are written as files under /tmp/ rather than kept in memory
$parser->output_dir("/tmp/");
my $file = "Maildir/cur/1094480066.29295.localhost,S=22486134:2,";
my $msg = $parser->parse_open($file);
# head->get returns the raw field value, trailing newline included
print "Subject: ", $msg->head->get("subject"), " (", scalar $msg->parts, " parts)\n";
my $end = GTop->new->proc_mem($$)->size;
printf "Usage: %.1fMB\n", ($end-$start)/1024/1024
__END__
--8<--
Startup: 4.8MB
Subject: big mail
(5 parts)
Usage: 0.5MB
 

Gunnar Hjalmarsson

Trond said:
> And since both of these modules prefer to have the entire mail in
> memory, they're pretty much useless in a mod_perl environment.

Even if I'm not able to advise you as regards the best modules, I
couldn't help wondering what you mean by that. Why would mod_perl
preclude modules that read messages into memory? A (normal) email
message is hardly a huge amount of data. Isn't rather the opposite
true, i.e. in order to be happy with mod_perl, you'd better not be
short of memory?
 

Trond Michelsen

Gunnar Hjalmarsson said:
> Even if I'm not able to advise you as regards the best modules, I
> couldn't help wondering what you mean by that. Why would mod_perl
> preclude modules that read messages into memory?

Because when you have 50 httpd processes, it's unfortunate if they all use
200MB of non-shared memory.

> A (normal) email message is hardly a huge amount of data.

But with mod_perl that's not really relevant. Once the process has been
expanded to 200MB, this memory won't be made available to other processes
(like the other httpd children) until that httpd child is terminated. If you
have something like 100 simultaneous users, you are likely to hit a big
message pretty often.
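Putting the thread's own numbers together: 50 httpd children, each of which may grow by the roughly 205MB measured above for Mail::Box after touching the single 22MB message, compared with 0.5MB for MIME::Parser. The snippet only does the arithmetic on those reported figures; it is not a new measurement.

```perl
use strict;
use warnings;

# Figures taken from the measurements earlier in the thread (in MB).
my $children     = 50;
my $per_child_mb = 205.3;   # Mail::Box after accessing the parts
my $mime_parser  = 0.5;     # MIME::Parser on the same message

# Worst case: every child has touched a big message at some point.
my $total_mb = $children * $per_child_mb;
printf "Worst case: %d children x %.1fMB = %.0fMB (%.1fGB)\n",
    $children, $per_child_mb, $total_mb, $total_mb / 1024;

# Growth per child relative to the current MIME::Parser setup.
printf "Ratio vs MIME::Parser: %.0fx\n", $per_child_mb / $mime_parser;
```

That comes to roughly 10GB of unshared memory in the worst case, and a per-process ratio of about 411, which is where the "more than 400 times" figure below comes from.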

Besides, when there's an attachment in the message, we're not going to show
it inline; we'll provide a link to it. We never need to know what the data
actually is, so there's absolutely no benefit in having it readily
accessible in memory. And if we don't write the decoded attachment to disk
first, we'll have to re-read the message when the user wants to download the
attachment. Since it's possible to tell MIME::Parser where to put the
decoded parts of the message, we can leave the downloading to Apache.

There's also the issue of performance. It takes MIME::Parser about 1.3
seconds to parse the 20MB message; Mail::Box spends 8.5 seconds. I've only
tested Email::Folder with Email::Simple, which takes 1.5 seconds, but that's
without any decoding of attachments.

> Isn't rather the opposite true, i.e. in order to be happy with mod_perl,
> you'd better not be short of memory?

Sure, but when one solution uses more than 400 times as much memory as our
current solution, it will create problems. Email::Folder is particularly
problematic here, as it will read the entire message even if you just want
the headers. So if somebody has just a single 20MB mail in their mailbox,
merely listing the contents of that mailbox will require 200+MB.
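Listing a mailbox needs only each message's header, which ends at the first empty line, so the rest of the file never has to be read. A minimal core-Perl sketch of that, assuming one RFC 2822 message per file as in a Maildir; the function name read_header_only is hypothetical and not part of Email::Folder or any other module. It unfolds continuation lines and keeps only the first occurrence of each field, but it does not decode MIME encoded-words, so it is a starting point for a folder listing rather than a full parser.

```perl
use strict;
use warnings;

# Read only the header block of a message file and return a hash ref of
# lowercased field name => value. Reading stops at the first empty line,
# so memory use depends on the header size, not the message size.
sub read_header_only {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Cannot open $path: $!";
    my (%header, $last);
    while (my $line = <$fh>) {
        $line =~ s/\r?\n\z//;
        last if $line eq '';                    # end of the header
        if ($line =~ /^[ \t]/) {                # folded continuation line
            next unless defined $last;
            $line =~ s/^[ \t]+/ /;
            $header{$last} .= $line;
        }
        elsif ($line =~ /^([^:\s]+):[ \t]*(.*)\z/) {
            my ($name, $value) = (lc $1, $2);
            if (exists $header{$name}) {        # keep first occurrence only
                $last = undef;
                next;
            }
            $header{$name} = $value;
            $last = $name;
        }
    }
    close $fh;
    return \%header;
}
```

With something like this, listing a folder costs one small read per message; the full message is only handed to a MIME parser when the user actually opens it.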

BTW: The max message size on our system is 40MB, and many users do take
advantage of that, so this isn't just a theoretical problem.
 
