OutOfMemoryError with JavaMail getMessage

Patrick Hahn

Hello

In a Java program, I want to read all messages from a mail folder one by one
for statistical purposes (it has a million messages, but each of them is less
than 5 KB). However, after the program has retrieved a number of messages (in
my case about 20000), I get an OutOfMemoryError. Even when I force a Runtime or
System gc every 1000 messages, the result is the same; it seems that garbage
collection is not working and the previously retrieved (and no longer used)
messages or strings do not get freed.

Does anyone know how the program could be rewritten so that GC works and I do
not get the OutOfMemoryError?
The only workaround I found is, after every 1000 messages, to close the mail
Folder and immediately open it again to read the next part of the messages.

Below is a portion of my program.

import java.io.*;
import java.util.*;
import javax.mail.*;
import javax.mail.internet.*;
import javax.activation.*;

public class Mail {
    public static void main(String[] argv) {
        int mTotal = 0, mCount;
        Properties mProps = System.getProperties();
        mProps.put("mail.imap.timeout", "300000");
        Session mSession = Session.getInstance(mProps, null);
        Store mStore = null;
        Folder mFolder = null;
        try {
            mStore = mSession.getStore(new URLName("imap://user:p[email protected]"));
            mStore.connect();
            mFolder = mStore.getFolder("Inbox");
            mFolder.open(Folder.READ_ONLY);
            mTotal = mFolder.getMessageCount();
        } catch (Exception e) {
            System.out.println("" + e);
            System.exit(1);
        }
        System.out.println("Total msg:" + mTotal);
        // Calendar months are 0-based (5 = June, 6 = July), so this
        // range selects messages sent during June 2008.
        Calendar mCalBegin = Calendar.getInstance();
        mCalBegin.set(2008, 5, 1, 0, 0, 0);
        Calendar mCalEnd = Calendar.getInstance();
        mCalEnd.set(2008, 6, 1, 0, 0, 0);
        for (mCount = 1; mCount <= mTotal; mCount++) {
            // Workaround: every 1000 messages, close and reopen the folder
            // so that the memory it holds for retrieved messages is released.
            if (mCount % 1000 == 0) {
                try {
                    mFolder.close(false);
                    mFolder = mStore.getFolder("Inbox");
                    mFolder.open(Folder.READ_ONLY);
                } catch (Exception e) {
                    System.out.println("Error\n" + e);
                    System.exit(1);
                }
                System.out.println("Cnt " + mCount + ", Free " + Runtime.getRuntime().freeMemory());
            }
            try {
                Message mMsg = mFolder.getMessage(mCount);
                Date mSent = mMsg.getSentDate();
                if (mSent == null) {
                    continue; // message has no Date: header, so it cannot be in the range
                }
                Calendar mCalMsg = Calendar.getInstance();
                mCalMsg.setTime(mSent);
                if (mCalBegin.before(mCalMsg) && mCalEnd.after(mCalMsg)) {
                    String mSubject = mMsg.getSubject();
                    String mFrom = mMsg.getFrom()[0].toString().toLowerCase();
                    String mTo = mMsg.getRecipients(Message.RecipientType.TO)[0].toString().toLowerCase();
                    System.out.println(mFrom + ";" + mTo + ";" + mSubject);
                }
            } catch (Exception e) {
                System.out.println("Error\n" + e);
                System.exit(1);
            }
        }
        try {
            mFolder.close(false);
            mStore.close();
        } catch (Exception e) {
            System.out.println("Error\n" + e);
        }
    }
}

PS: if you want to reply to my mail address, remove the xs.
 
Martin Gregorie

Patrick said:
Hello

In a Java program, I want to read all messages from a mail folder one by one
for statistical purposes (it has a million messages, but each of them is less
than 5 KB). However, after the program has retrieved a number of messages (in
my case about 20000), I get an OutOfMemoryError. Even when I force a Runtime
or System gc every 1000 messages, the result is the same; it seems that
garbage collection is not working and the previously retrieved (and no longer
used) messages or strings do not get freed.

I had a similar problem reading a 1 GB mbox file via the mstor provider. My
logic is similar to yours. As you say, gc calls don't help. I eventually got
round it by setting the -Xmx option to -Xmx350m.

This worked OK, rather to my surprise, since the JVM grew to a bit over 400
MB while at the time I only had 256 MB of RAM installed. The JVM heap
overflowed into swap space, but this had much less effect on performance
than I'd expected. During all this, both named (a caching DNS server) and
PostgreSQL were active; my application uses both to check and store each
message.

My guess is that something inside JavaMail holds large chunks of messages
for a while but eventually releases them. It's not obvious what is going
on, just that the JVM memory requirement both grows and shrinks while the
mbox file is processed. It may be connected with buffering the mbox file,
since JavaMail seems to read a long way into the file before my application
starts to process messages: diagnostic logging in the application shows
when it starts to be passed messages. The application doesn't get its first
message until it has run for several minutes. During this phase the JVM
uses 90% CPU and its memory grows to about 200 MB. The length of this phase
is definitely related to the file size: when reading a mailbox of 10-20
test messages, it starts up about as fast as any other Java application.

Patrick said:
Does anyone know how the program could be rewritten so that GC works and
I do not get the OutOfMemoryError?

If you're seeing the same problem as I did, you should also see a
significant pause between (re)opening the mailbox and your application
receiving the next message.

Try increasing the JVM memory with the -Xmx option and commenting out the
periodic close/open cycle. Note that this memory increase works in the
Linux environment but may not be so useful with other operating systems.
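One way to confirm that an -Xmx setting actually took effect is to print the heap limits the running JVM reports. A minimal sketch using only java.lang.Runtime; the class name HeapInfo and the MB formatting are illustrative choices, not something from the posts above:

```java
public class HeapInfo {
    /** Returns the heap figures the running JVM reports, in megabytes. */
    static String heapSummary() {
        Runtime rt = Runtime.getRuntime();
        long mb = 1024L * 1024L;
        // maxMemory():   upper bound the heap may grow to (governed by -Xmx)
        // totalMemory(): heap currently reserved by the JVM
        // freeMemory():  unused part of totalMemory()
        return String.format("max=%d MB, total=%d MB, free=%d MB",
                rt.maxMemory() / mb, rt.totalMemory() / mb, rt.freeMemory() / mb);
    }

    public static void main(String[] args) {
        System.out.println(heapSummary());
    }
}
```

Run it with, for example, `java -Xmx350m HeapInfo`; the reported max should be close to 350 MB, although the exact figure varies between JVMs.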
 
Dave Miller

Patrick said:
<snip>

Even when I force a Runtime or System gc every 1000 messages,
the result is the same; it seems that garbage collection is not working and the retrieved
(and no longer used) previous messages or strings do not get freed.

If GC were ever to fail, all hell would break loose for everyone in
this group; presume it's your code and not the GC. An article on GC:

http://java.sun.com/docs/books/performance/1st_edition/html/JPAppGC.fm.html

If the above doesn't help:
The only workaround I found is, after every 1000 messages, to close the mail Folder and
immediately open it again to read the next part of the messages.

close() evidently releases references so that the GC can reclaim the memory.
The source will tell you what close() is doing. The JavaMail source is linked from here:

http://java.sun.com/products/javamail/
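Dave's point can be illustrated without JavaMail: the collector only reclaims objects that are no longer reachable, so anything a cache still references survives any number of System.gc() calls. A minimal sketch; the static List standing in for a folder's internal cache is a hypothetical stand-in, not JavaMail's actual data structure:

```java
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.List;

public class ReachabilityDemo {
    // Hypothetical stand-in for a cache that an open Folder might keep.
    static final List<byte[]> cache = new ArrayList<>();

    /** Caches a 5 KB "message" and returns a weak reference to it. */
    static WeakReference<byte[]> fetchAndCache() {
        byte[] message = new byte[5 * 1024];
        cache.add(message);                 // strong reference held by the cache
        return new WeakReference<>(message);
    }

    public static void main(String[] args) {
        WeakReference<byte[]> ref = fetchAndCache();
        System.gc();                        // only a hint, and it never frees reachable objects
        // Still strongly reachable through the cache, so it is guaranteed to survive.
        System.out.println("cached message alive: " + (ref.get() != null));

        cache.clear();                      // analogous to close() dropping its references
        System.gc();
        // The array is now eligible for collection; whether it has
        // actually been reclaimed at this point is up to the JVM.
        System.out.println("after clear, reclaimed: " + (ref.get() == null));
    }
}
```

Only the first line's result is guaranteed: an object that is still strongly reachable is never collected, no matter how often gc is requested.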
 
John W Kennedy

Lew said:
Why does anybody ever expect that calls to 'System.gc()' will ever help
a tight memory situation?

That depends on what you mean by "help". Calling "System.gc()" between
human transactions can improve apparent response time if it forestalls
system-initiated collections.
 
Martin Gregorie

Lew said:
Why does anybody ever expect that calls to 'System.gc()' will ever help a
tight memory situation?

Pass. I tried it, found it didn't work and moved on. Now I know that it
doesn't help, and that the Linux implementation of the JVM doesn't thrash
if it oversubscribes RAM: the latter point was quite a surprise.
 
Patrick Hahn

Hello,
thank you for your answers.

I found the reason for, and a solution to, the apparent memory leak in JavaMail.
The reason is that Folder.getMessage(int) creates a cache in which the headers of
all retrieved messages are stored; this memory is not released until the Folder
is closed, so it cannot be garbage-collected when I retrieve the next message.
However, there is a method invalidateHeaders() in the (undocumented) class IMAPMessage
that frees this cache, and with it the free heap memory no longer falls to zero when
I read a lot of messages.

Two lines of code need to be added to my example program:

import com.sun.mail.imap.IMAPMessage;
(otherwise the IMAPMessage class is not found), and
((IMAPMessage) mMsg).invalidateHeaders();
at the end of the try block inside the for-loop (mMsg is declared there, so
it is out of scope after that block).
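The mechanism described above can be modeled in a few lines: the open folder keeps every message object it has handed out, and each message keeps its fetched headers, so memory grows with the number of messages read until the headers are invalidated or the folder is closed. This is a hypothetical, simplified model (ModelFolder and ModelMessage are invented for illustration); it is not JavaMail's implementation, where the real call is ((IMAPMessage) mMsg).invalidateHeaders().

```java
import java.util.HashMap;
import java.util.Map;

public class HeaderCacheModel {
    /** Hypothetical message that keeps its headers once they have been fetched. */
    static class ModelMessage {
        private byte[] headers;            // null until first access

        byte[] getHeaders() {
            if (headers == null) {
                headers = new byte[1024];  // stands in for ~1 KB of header data from the server
            }
            return headers;
        }

        void invalidateHeaders() {
            headers = null;                // drop the cached copy; it would be refetched on next access
        }

        boolean hasCachedHeaders() {
            return headers != null;
        }
    }

    /** Hypothetical folder that keeps every message it has returned until close(). */
    static class ModelFolder {
        private final Map<Integer, ModelMessage> messages = new HashMap<>();

        ModelMessage getMessage(int msgnum) {
            return messages.computeIfAbsent(msgnum, k -> new ModelMessage());
        }

        long cachedHeaderCount() {
            return messages.values().stream().filter(ModelMessage::hasCachedHeaders).count();
        }

        void close() {
            messages.clear();              // releases every message and its headers at once
        }
    }

    /** Reads messages 1..n, optionally invalidating each message's headers after use. */
    static long readAll(int n, boolean invalidate) {
        ModelFolder folder = new ModelFolder();
        for (int i = 1; i <= n; i++) {
            ModelMessage m = folder.getMessage(i);
            m.getHeaders();                // e.g. read subject, sender and recipients
            if (invalidate) {
                m.invalidateHeaders();
            }
        }
        return folder.cachedHeaderCount();
    }

    public static void main(String[] args) {
        System.out.println("without invalidateHeaders: " + readAll(20000, false)); // 20000
        System.out.println("with invalidateHeaders:    " + readAll(20000, true));  // 0
    }
}
```

In this model the small per-message objects still stay in the folder's map after invalidation; only the bulky header data is dropped, so memory still grows slowly with the message count until close() is called.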
 
Patrick Hahn

Martin said:
(...) I eventually got round it by setting the -Xmx option to -Xmx350m
(...) My guess is that something internally in JavaMail seems to hold large
chunks of messages for a while, but eventually releases them.
(...) If you're seeing the same problem as I did, you should also see a
significant pause between (re)opening the mailbox and your application
receiving the next message.

Thank you, Martin.
In my case, I could avoid the OutOfMemoryError by setting -Xmx500m.

I don't know whether the memory used internally by JavaMail will eventually be
released (before closing the Folder); I think the larger heap is now
sufficient to hold the headers of all my 500000 messages.

I took measurements in my program: the time between closing the folder and
reading the first message after reopening it is about 500 ms, and the time
to read 1000 messages is 2 seconds.

So the extra time my program needs to periodically close and reopen the
Folder is less than 5 minutes, but the extra time it took me to completely
solve the mystery of the memory leak in Folder.getMessage was several days.
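As a rough check of that estimate, assuming the 500000 messages mentioned above, one close/reopen cycle per 1000 messages, and the measured 500 ms per cycle and 2 s per 1000 messages read:

$$\frac{500\,000}{1000} \times 0.5\ \mathrm{s} = 250\ \mathrm{s} \approx 4.2\ \mathrm{min} \quad \text{(close/reopen overhead)}$$

$$\frac{500\,000}{1000} \times 2\ \mathrm{s} = 1000\ \mathrm{s} \approx 16.7\ \mathrm{min} \quad \text{(reading the messages)}$$

So the workaround adds roughly a quarter on top of the time spent actually reading messages.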

Patrick.
 
