How to use JavaMail to handle a mail of MIME type application/vnd.ms-excel



emrefan

Somebody sent me an email (using good old Outlook '97 - don't
ask) and this email has the following headers:

Received: by chk_exchange6.whatever.local
id <[email protected]_exchange6.whatever.local>; Wed, 5 Dec 2007
16:27:24 +0800
Message-ID:
<[email protected]_exchange6.whatever.local>
From: "Some guy" <[email protected]>
To: "me" <[email protected]>
Subject: excel attached
Date: Wed, 5 Dec 2007 16:27:24 +0800
MIME-Version: 1.0
Content-Type: application/vnd.ms-excel;
name="nevermind.xls"
Content-Transfer-Encoding: base64
Content-Disposition: attachment;
filename="nevermind.xls"

Now, I have a Java program that uses the JavaMail API to receive a
mail of mime type multipart/mixed and extracts the attached Excel file
from it and it's working nicely. But if the email in question is one
like the above-mentioned, my program fails. So, has anybody got
experience handling a mail of the above type?

The following code is what I used for handling a multipart/mixed mail:

Multipart mimeMsg = (Multipart) aryInMsg[ msgNdx ].getContent();

for (int partNdx=0; partNdx < mimeMsg.getCount(); partNdx++) {
    Part part = mimeMsg.getBodyPart( partNdx );
    String disp = part.getDisposition();
    String attachFilename = part.getFileName();
    if (disp != null && disp.equals( Part.ATTACHMENT ) &&
        (attachFilename.endsWith( ".xls" ) ||
         attachFilename.endsWith( ".XLS" ))) {

        System.err.println( "From " + primeSender +
                            ", attached file " +
                            attachFilename );

        File ordFil = new File( uplFilDirPath +
                                "/ord-" + chkoutDateY4md + ".xls" );
        if (ordFil.exists()) {
            System.err.println( "Order file " + uplFilDirPath + "/" +
                                attachFilename + " is a duplicate -> rejected" );
        }
        OutputStream os = new FileOutputStream( ordFil );
        part.writeTo( os );
    }
}
 

Andrew Thompson

emrefan said:
Somebody sent me an email (using good old Outlook '97 - don't
ask) and this email has the following headers: ..
...But if the email in question is one
like the above-mentioned, my program fails.

Fails in what way?
Perhaps a ServesWarmUnmixedMartiniException?
No? Maybe you can elucidate.

BTW - looked over that code snippet and could not
understand why the first println was to System.err.
That might be more clear if you can produce an
SSCCE* that fetches an example multi-part email
from a publicly available URL (a bit munged, if
needed, for privacy).

* <http://www.physci.org/codes/sscce.html>

--
Andrew Thompson
http://www.physci.org/

Message posted via JavaKB.com
http://www.javakb.com/Uwe/Forums.aspx/java-general/200712/1
 

Manish Pandit

Somebody sent me an email (using good old Outlook '97 - don't
String attachFilename = part.getFileName();
if (disp != null && disp.equals( Part.ATTACHMENT ) &&
(attachFilename.endsWith( ".xls" ) ||
attachFilename.endsWith( ".XLS" ))) {

Can you print the attachFilename and see if you're getting the correct
file name to begin with?

-cheers,
Manish
 

Martin Gregorie

emrefan said:
Now, I have a Java program that uses the JavaMail API to receive a
mail of mime type multipart/mixed and extracts the attached Excel file
from it and it's working nicely. But if the email in question is one
like the above-mentioned, my program fails. So, has anybody got
experience handling a mail of the above type?
No, but I'm working on a database-based mail archive and using JavaMail
to prepare messages for archiving (i.e. picking out the headers used for
indexing and the main message body for content searches). I've noticed
that JavaMail is intolerant of non-standard headers and MIME structures.
Once JavaMail has found an error that's it: I don't know of any way to
recover or even to persuade it to give me the bits it can't handle.

I'm about to see if I can persuade it to write messages that cause parse
errors to a rejects file for manual correction and re-input. I'm
intending to try using mstor to send the entire message to an mbox file
and, if that doesn't work, to see if Part.getAllHeaders() and
Part.getInputStream() can retrieve the complete message, which I can
then write to a file.

I have a collection of 25,480 messages that I'm using for performance
and system testing. Of these, 408 are being rejected because of parsing
errors and the majority of these seem to be coming from an older version
of Outlook belonging to one of my friends.

Similarly, another friend who also uses Outlook regularly sends me
malformed messages with attached images. I use Evolution as my MUA. It's
pretty stable but occasionally it is unable to open the image and almost
always it is unable to remove the attachment when I hit reply.

Bottom line: some versions of Outlook are just plain buggy. They have a
habit of sending munged address headers and/or MIME structures. The best
move is to persuade their users to install another MUA (maybe just a
later copy of Outlook). Otherwise be prepared to spend a LOT of time
manually extracting the attachment and feeding it through a base64 decoder.
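
As a rough illustration of that manual route, here is a minimal sketch that assumes a single-part message whose body is base64-encoded, like the one quoted at the top of the thread. The class name RawBase64Body, the method decodeBody and the sample payload are all made up for this example; it uses only java.util.Base64 from the JDK and does not touch JavaMail at all.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Hypothetical helper: splits a raw single-part message and decodes its base64 body. */
class RawBase64Body {

    /**
     * Returns the decoded body of a raw message whose body is base64-encoded.
     * Assumes the headers end at the first blank line.
     */
    static byte[] decodeBody(String rawMessage) {
        // Normalize line endings so the blank-line search works for CRLF and LF.
        String normalized = rawMessage.replace("\r\n", "\n");
        int headerEnd = normalized.indexOf("\n\n");
        if (headerEnd < 0) {
            throw new IllegalArgumentException("no blank line between headers and body");
        }
        String body = normalized.substring(headerEnd + 2);
        // The MIME decoder skips line breaks and other non-base64 characters.
        return Base64.getMimeDecoder().decode(body);
    }

    public static void main(String[] args) {
        // Made-up message; the body is the base64 form of a short ASCII string,
        // standing in for real spreadsheet bytes.
        String raw = String.join("\r\n",
            "Subject: excel attached",
            "MIME-Version: 1.0",
            "Content-Type: application/vnd.ms-excel;",
            "\tname=\"nevermind.xls\"",
            "Content-Transfer-Encoding: base64",
            "",
            "aGVsbG8sIHNw",
            "cmVhZHNoZWV0",
            "");
        byte[] decoded = decodeBody(raw);
        System.out.println(new String(decoded, StandardCharsets.US_ASCII));
    }
}
```

This only covers the simplest case. A message with several parts would also need the boundary lines split out before decoding, which is the work a MIME library normally does for you.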
 

emrefan

Fails in what way?
Perhaps a ServesWarmUnmixedMartiniException?
No? Maybe you can elucidate.

BTW - looked over that code snippet and could not
understand why the first println was to System.err.
That might be more clear if you can produce an
SSCCE* that fetches an example multi-part email
from a publicly available URL (a bit munged, if
needed, for privacy).

I didn't quote the error I got because I thought my code was doomed to
fail if it's handed an email of MIME type application/vnd.ms-excel.
It does work when the MIME type is multipart/mixed. So I am asking
how I have to adapt my code to also handle the
application/vnd.ms-excel MIME type.

The printing to System.err is for debugging.

But I will quote the error message for completeness:

Exception in thread "main" java.lang.ClassCastException:
com.sun.mail.util.BASE64DecoderStream
at GetEmailAttachment.main(GetEmailAttachment.java:63)
 

emrefan

Martin Gregorie wrote:

Bottom line: some versions of Outlook are just plain buggy. They have a
habit of sending munged address headers and/or MIME structures. The best
move is to persuade their users to install another MUA (maybe just a
later copy of Outlook). Otherwise be prepared to spend a LOT of time
manually extracting the attachment and feeding it through a base64 decoder.

I understand that the Outlook we are dealing with is ancient, but can
we claim an email with the header fields I quoted is invalid? If not,
I still wish to know how to handle an email of such type.
 

emrefan

Can you print the attachFilename and see if you're getting the correct
file name to begin with?

I don't think that's the problem because if the email it is handed is
of MIME type multipart/mixed, the program works fine. When the email
is of MIME type application/vnd.ms-excel, I think I am not going to
get a Multipart object at all.
 

Martin Gregorie

emrefan said:
I understand that the Outlook we are dealing with is ancient, but can
we claim an email with the header fields I quoted is invalid? If not,
I still wish to know how to handle an email of such type.

Don't forget that RFC 2045 is definitive for what is acceptable and what
is not. The RFC finder is at http://www.rfc-editor.org/rfcsearch.html
From a close look at this definition your MIME message looks OK.

I think the problem is that you're starting to parse the message by
assuming it is a multi-part message when it isn't. Instead, start by
referencing the Message as a Part (Message implements Part) and then
parse that in the usual recursive manner. Look at the dumpPart() method
in the first example in Appendix B of the JavaMail API Design
Specification. My content parser was derived from that method, so I
lightly modified your message (changed the part to text/plain, added
part boundaries):

From "Some guy" <[email protected]>
Received: by chk_exchange6.whatever.local id
<[email protected]_exchange6.whatever.local>; Wed, 5 Dec 2007
16:27:24 +0800
Message-ID:
<[email protected]_exchange6.whatever.local>
From: "Some guy" <[email protected]>
To: "me" <[email protected]>
Subject: Single part
Date: Wed, 5 Dec 2007 16:27:24 +0800
MIME-Version: 1.0

------------=_4757F0D8.9F8C1F5C
Content-Type: text/plain; charset=iso-8859-1
Content-Disposition: inline
Content-Transfer-Encoding: 8bit

A line of content text.

------------=_4757F0D8.9F8C1F5C--

...and this parsed and loaded into my database without any problems.
 

Martin Gregorie

Martin Gregorie wrote:

Following myself up - I'm not convinced that I correctly munged the
message that appears below. If you can send me the complete original or
a complete test message created by the same user but containing just a
small test spreadsheet I'd be most grateful.

My actual e-mail address is in the sig at the end of this message.
 

emrefan

I think the problem is that you're starting to parse the message by
assuming it is a multi-part message when it isn't. Instead, start by
referencing the Message as a Part (Message implements Part) and then
parse that in the usual recursive manner. Look at the dumpPart() method
in the first example in Appendix B of the JavaMail API Design
Specification. My content parser was derived from that method, so I
lightly modified your message (changed the part to text/plain, added
part boundaries):

You are right, and I realize I shouldn't be treating it as a Multipart
thing when it isn't. Now I am treating the Message (once I know its
ContentType is application/vnd.ms-excel) as a Part and calling
getContent() on it, and that gives me an InputStream (since the
DataHandler doesn't know how to handle a Part of MIME type
application/vnd.ms-excel). And once I have this InputStream, the rest
is pretty easy. I really should have looked more closely at the
documentation. Thanks Martin and all who tried to help.
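
For anyone landing here later, the "rest" could look roughly like the sketch below. It is only an illustration under some assumptions: the class AttachmentSaver, the method save and the file name are invented for this example, and a byte array stands in for the stream that JavaMail would hand back (for instance from getContent() as described above, or from Part.getInputStream()). It also refuses to overwrite an existing order file, which is what the duplicate check in the original code seemed to be aiming at.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper for writing a decoded attachment stream to disk. */
class AttachmentSaver {

    /**
     * Copies the decoded attachment bytes to target and closes the stream.
     * Returns false, leaving the existing file untouched, if target already exists.
     */
    static boolean save(InputStream decodedContent, Path target) throws IOException {
        try (InputStream in = decodedContent) {
            // Without REPLACE_EXISTING, Files.copy refuses to overwrite a file.
            Files.copy(in, target);
            return true;
        } catch (FileAlreadyExistsException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("orders");
        Path target = dir.resolve("ord-20071205.xls"); // made-up file name

        // Stand-in for the InputStream obtained from the single-part message.
        byte[] fakeSpreadsheet = {1, 2, 3};

        System.out.println(save(new ByteArrayInputStream(fakeSpreadsheet), target)); // first copy is written
        System.out.println(save(new ByteArrayInputStream(fakeSpreadsheet), target)); // duplicate is rejected
    }
}
```

Copying from the content stream, rather than calling writeTo() on the part, is a deliberate choice here: the stream carries only the decoded attachment bytes.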
 
