Reading Outlook .msg file using Python

John Henry

Hello all:

I need to read .msg files exported from Outlook. A Google search
turned up a few very old posts on the topic but nothing really
useful. The email module in Python is no help: everything comes back
blank and it can't even tell whether there are attachments. I did find a
Java library that does the job, and I suppose when push comes to shove I
would have to learn Jython and see if it can invoke the Java library. But
before I do that, can somebody point me to a Python/COM solution?

I don't need to gain access to Exchange or Outlook. I just need to
read the .msg file and extract information + attachments from it.

Thanks,
 
John Henry

In message


Try using EML format instead. That’s plain text.

Thanks for the reply. I would have to check to see if my client's
Outlook can export in EML format directly. I don't want to use a
converter.
 
Tim Golden

.msg files are Compound Documents -- a file format which obviously
seemed like a jolly good idea at the time, but which frustrates
me every time I have to do anything with it :)
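
As a quick sanity check before involving MAPI at all, a file can be tested for the compound-document signature. This is only a sketch: the helper name looks_like_compound_file is made up for illustration, and a matching signature says the file is some compound document (an old .doc or .xls would match too), not that it is definitely an Outlook message.

```python
# First eight bytes of every OLE2 / structured-storage compound file
CFB_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")

def looks_like_compound_file(path):
    """Return True if the file starts with the compound-file signature."""
    with open(path, "rb") as handle:
        return handle.read(len(CFB_MAGIC)) == CFB_MAGIC
```

A file that fails this check is probably not a binary .msg at all (it might be an .eml saved with the wrong extension), and the Structured Storage API would not be able to open it.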

Hopefully this code snippet will get you going. The idea is to open
the compound document using the Structured Storage API. That gives
you an IStorage-ish object which you can then convert to an IMessage-ish
object with the convenience function OpenIMsgOnIStg. At that point you
enter the marvellous world of Extended MAPI. The get_body_from_stream
function does a Q&D job of pulling the body text out. You can get
attachments as well: look at the PyIMessage docs, but come back if
you need help with that:

<code>
import sys

from win32com.mapi import mapi, mapitags
from win32com.storagecon import STGM_DIRECT, STGM_READ, STGM_SHARE_EXCLUSIVE
import pythoncom

def get_body_from_stream (message):
    # Open the PR_BODY property as an IStream and read it chunk by chunk
    CHUNK_SIZE = 10000
    stream = message.OpenProperty (mapitags.PR_BODY, pythoncom.IID_IStream, 0, 0)
    text = b""
    while True:
        chunk = stream.read (CHUNK_SIZE)
        if chunk:
            text += chunk
        else:
            break
    # Decode once, after all the bytes have been collected
    return text.decode ("utf16")

def main (filepath):
    mapi.MAPIInitialize ((mapi.MAPI_INIT_VERSION, 0))
    # Open the .msg file as a structured-storage compound document
    storage_flags = STGM_DIRECT | STGM_READ | STGM_SHARE_EXCLUSIVE
    storage = pythoncom.StgOpenStorage (filepath, None, storage_flags, None, 0)
    mapi_session = mapi.OpenIMsgSession ()
    # Wrap the IStorage in an IMessage so the MAPI properties are available
    message = mapi.OpenIMsgOnIStg (mapi_session, None, storage, None, 0, mapi.MAPI_UNICODE)
    print (get_body_from_stream (message))

if __name__ == '__main__':
    main (*sys.argv[1:])

</code>
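
The read loop in get_body_from_stream can be tried without Outlook or pywin32 by using an in-memory stream as a stand-in. This is a sketch under assumptions: read_all is a hypothetical helper name, io.BytesIO only mimics the read method of the real stream, and the "utf-16-le" encoding is a guess at how the bytes arrive on Windows (the code above uses "utf16").

```python
import io

def read_all(stream, chunk_size=10000):
    """Collect everything from an object that has a read(n) method."""
    chunks = []
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        chunks.append(chunk)
    return b"".join(chunks)

# Stand-in for the MAPI stream: UTF-16 little-endian bytes without a BOM
fake_stream = io.BytesIO("Hello from a .msg body".encode("utf-16-le"))

# An odd chunk size splits a two-byte character across reads on purpose
body = read_all(fake_stream, chunk_size=7).decode("utf-16-le")
print(body)
```

Decoding only after the loop matters: a chunk boundary can fall in the middle of a two-byte character, so decoding each chunk separately could fail or garble the text.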

TJG
 
John Henry

Thank you for your reply.

I am trying your code, but when it gets to the line:

mapi.MAPIInitialize ((mapi.MAPI_INIT_VERSION, 0))

I get the error message:

Either there is no default mail client or the current mail client
cannot fulfill the messaging request. Please run Microsoft
Outlook ... client.

I have Outlook (not Express - Outlook 2002) running and I did set it
to be the default mail client. Does MAPI work with Exchange only?
(And why do I need MAPI to read the file?)

Regards,
 
Tim Golden

I am trying your code, but when it gets to the line:

mapi.MAPIInitialize ((mapi.MAPI_INIT_VERSION, 0))

I get the error message:

Either there is no default mail client or the current mail client
cannot fulfill the messaging request. Please run Microsoft
Outlook ... client.

I have Outlook (not Express - Outlook 2002) running and I did set it
to be the default mail client. Does MAPI work with Exchange only?

No. I was running it with Outlook 2003 installed (not running,
in fact, although it is the default mail client on that machine).

(And why do I need MAPI to read the file?)

Basically because the MAPI subsystem already contains all
the code to interpret that particular format of structured
storage. If you can find some source of info which tells
you how to parse the format directly, then you can sidestep
MAPI. Presumably this is what is done by the Java code you
mentioned.

I'm afraid I'm not at work at the moment, and I don't run
Outlook on this machine. (So I can't even save an .msg file
to test). FWIW the code did run successfully on my work
machine and produced the plain text of the email, so it
is just a configuration sort of issue. If no-one chips
in with a suggestion in a few hours, might be worth
posting to python-win32; there might be people there who
don't watch this (higher-traffic) list.

I have a vague memory that when I set this kind of thing
up to run on our Helpdesk server where I use this to
ingest incoming emails I did have to install a sort
of server-only alternative to Outlook. I'll try to remote
into the server later to see if I can spot it. But that
still wouldn't explain what the problem was if you were
actually running Outlook in any case.

TJG
 
John Henry

According to:

http://support.microsoft.com/kb/813745

I need to reset my Outlook registry keys. Unfortunately, I don't have
my Office install CD with me, so this will have to wait.

Thanks,
 
John Henry

Thanks for the information; I'm keen to see if you're able
to use the solution I posted once this fix is in place.

TJG

Okay, after fixing the Outlook reg entries as described above, I am
able to go further. Now, the code stops at:

message = mapi.OpenIMsgOnIStg (mapi_session, None, storage, None, 0,
mapi.MAPI_UNICODE)

with an error message:

pywintypes.com_error: (-2147221242, 'OLE error 0x80040106', None,
None)
 
Tim Golden

Strange. That's MAPI_E_UNKNOWN_FLAGS. Try the call without MAPI_UNICODE,
i.e. make the last param zero. Maybe there's something with Outlook 2002...
I've never tried it myself.
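
For anyone trying to match the number in the traceback against the MAPI headers: pywin32 reports the HRESULT as a signed 32-bit integer, while the headers list it in unsigned hex. A minimal sketch of the conversion, where hresult_to_hex is a name made up for illustration:

```python
def hresult_to_hex(code):
    """Convert a signed 32-bit COM error code to its unsigned hex form."""
    return "0x%08X" % (code & 0xFFFFFFFF)

# The value reported in the com_error traceback
print(hresult_to_hex(-2147221242))  # 0x80040106
```

The result, 0x80040106, is the same value that appears in the text of the error message.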

TJG
 
John Henry

Okay, omitting the MAPI_UNICODE works!

Now, I have to search and see how I get the header info, and extract
the attachment.
 
John Henry

Not knowing anything about MAPI, I tried a number of the MAPI property
tags; the only one that works appears to be PR_SUBJECT.
PR_CLIENT_SUBMIT_TIME, PR_CREATION_TIME and so forth don't work.
 
Tim Golden

I'll try to fish out some of the code we use, but for most
of the fields, having got the body, I simply used the email
module to parse it. (Obviously that doesn't give you anything
which isn't included in the MIME version of the email).
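
A rough sketch of that approach, using only the standard library. The RAW text is a made-up MIME message standing in for whatever text was recovered from the real email, and treating "any part with a filename" as an attachment is a simplifying assumption.

```python
import email

# Hypothetical MIME source, standing in for text recovered from a message
RAW = (
    "From: sender@example.com\r\n"
    "To: helpdesk@example.com\r\n"
    "Subject: Printer is on fire\r\n"
    "MIME-Version: 1.0\r\n"
    'Content-Type: multipart/mixed; boundary="XX"\r\n'
    "\r\n"
    "--XX\r\n"
    "Content-Type: text/plain\r\n"
    "\r\n"
    "Please send help.\r\n"
    "--XX\r\n"
    "Content-Type: text/plain\r\n"
    'Content-Disposition: attachment; filename="log.txt"\r\n'
    "\r\n"
    "error 42\r\n"
    "--XX--\r\n"
)

message = email.message_from_string(RAW)

# Header fields are available by name
fields = {name: message[name] for name in ("From", "To", "Subject")}

# Parts that carry a filename are treated as attachments here
attachments = {
    part.get_filename(): part.get_payload(decode=True)
    for part in message.walk()
    if part.get_filename()
}
print(fields["Subject"], sorted(attachments))
```

This only works when the text handed to the email module really is MIME, which is the limitation mentioned in the parenthesis above.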

I have a lightweight wrapper that does some of the MAPI
spadework. If you're interested, let me know and I can
send it across or post it somewhere.

TJG
 
John Henry

In case you didn't receive my message sent via "reply to author",
please send the wrapper to e c s 1 7 4 9 (at) gmail (dot) com.

Thanks,
 
John Henry

Strange. That's UNKNOWN_FLAGS. Try the call without the MAPI_UNICODE,
i.e. make the last param zero. Maybe there's something with Outlook 2002...
I've never tried it myself.

TJG

Looks like this flag is valid only if you are getting messages
directly from Outlook. When reading the .msg file, the flag is
invalid.

Same issue when accessing attachments. In addition, the MAPITable
method does not seem to work at all when trying to get attachments out
of the .msg file (it works when dealing with a message in an Outlook
mailbox). Either way, display_name doesn't work when trying to
display the filename of the attachment.

I was able to get the date by using the PR_TRANSPORT_MESSAGE_HEADERS
tag from mapitags.
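
For reference, once the transport headers have been read out as text, the date can be pulled from them with the standard library. This is a sketch only: the HEADERS string is invented for illustration, and real header blocks will carry many more lines.

```python
from email import message_from_string
from email.utils import parsedate_to_datetime

# Hypothetical header block, shaped like PR_TRANSPORT_MESSAGE_HEADERS text
HEADERS = (
    "Received: from mail.example.com by mx.example.com;"
    " Mon, 04 Jan 2010 09:15:00 -0800\r\n"
    "From: sender@example.com\r\n"
    "Date: Mon, 04 Jan 2010 09:14:58 -0800\r\n"
    "Subject: Test\r\n"
    "\r\n"
)

parsed = message_from_string(HEADERS)

# Turn the RFC 2822 date string into a timezone-aware datetime
sent = parsedate_to_datetime(parsed["Date"])
print(sent.isoformat())
```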
 
John Henry

By "this flag" I mean the mapi.MAPI_UNICODE flag.
 
Tim Golden

Ah, thanks. As you will have realised, my code is basically geared
to reading an Outlook/Exchange message box. I hadn't really tried
it on individual message files, except my original excerpt. If it
were opportune, I'd be interested in seeing your working code.

TJG
 
John Henry

When (and if) I finally figure out how to get it done, I surely will
make the code available. It's pretty close. All I need is to figure
out how to extract the attachments.

Too bad I don't know (and don't have) C#. This guy did it so cleanly:

http://www.codeproject.com/KB/office/reading_an_outlook_msg.aspx?msg=3639675#xx3639675xx

Maybe somebody who knows both C# and Python can convert the code
(it's not much code) and then the Python community will have it. As it
stands, it seems the solution is available in Java, C#, VB .... but
not Python.
 
