Reading Outlook .msg file using Python

John Henry

When (and if) I finally figure out how to get it done, I surely will
make the code available.  It's pretty close.  All I need is to figure
out how to extract the attachments.

Too bad I don't know (and don't have) C#.  This guy did it so cleanly:

http://www.codeproject.com/KB/office/reading_an_outlook_msg.aspx?msg=....

Maybe somebody who knows both C# and Python can convert the code
(it's not much code) and then the Python community will have it. As it
stands, it seems the solution is available in Java, C#, VB .... but
not Python.

BTW: For the benefit of future searches on this topic, in the code
listed above, where it says:

storage_flags = STGM_DIRECT | STGM_READ | STGM_SHARE_EXCLUSIVE

I had to change it to:

storage_flags = STGM_DIRECT | STGM_READ | STGM_SHARE_DENY_NONE | STGM_TRANSACTED

otherwise I get a sharing violation (see
http://efreedom.com/Question/1-1086814/Opening-OLE-Compound-Documents-Read-StgOpenStorage).

For now, I am using a brute force method
(http://mail.python.org/pipermail/python-win32/2009-February/008825.html)
to get the names of the attachments and if I need to extract the
attachments, I pop up the message in Outlook and let Outlook extract
the files. Ugly but fits my client's need for now. Hopefully there
will be a cleaner solution down the road.

Here's my code for brute forcing attachments out of the msg file (very
ugly):

def get_attachments(self, fileID):
    from win32com import storagecon
    import pythoncom

    # Open the .msg compound file without locking out other readers.
    flags = (storagecon.STGM_READ | storagecon.STGM_SHARE_DENY_NONE |
             storagecon.STGM_TRANSACTED)
    try:
        storage = pythoncom.StgOpenStorage(fileID, None, flags)
    except pythoncom.com_error:
        return []

    # Streams inside the storage are opened with exclusive access.
    flags = storagecon.STGM_READ | storagecon.STGM_SHARE_EXCLUSIVE
    attachments = []
    for data in storage.EnumElements():
        # data is a STATSTG tuple: (name, type, size, ...)
        print(data[0], data[1])
        # Type 2 is a stream; 007D001F holds the message headers.
        if data[1] == 2 or data[0] == "__substg1.0_007D001F":
            stream = storage.OpenStream(data[0], None, flags)
            try:
                msg = stream.Read(data[2])
            except pythoncom.com_error:
                pass
            else:
                # Strip the NUL bytes of the wide-character text.
                msg = repr(msg).replace("\\x00", "").strip("'").replace("%23", "#")
                if data[0] == "__substg1.0_007D001F":
                    try:
                        # Only the first name="..." in the headers is picked up.
                        attachments.append(msg.split("name=\"")[1].split("\"")[0])
                    except IndexError:
                        pass

    return attachments
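
The repr/replace trick above works because the 001F suffix in the stream name marks a Unicode (UTF-16-LE) string property, so for ASCII text every other byte is a NUL. A minimal sketch of the same name extraction done by decoding the bytes instead; the helper name attachment_names_from_headers is hypothetical, and it assumes the bytes come from stream.Read(...) on the __substg1.0_007D001F stream:

```python
import re


def attachment_names_from_headers(raw):
    """Return the attachment names found in a UTF-16-LE header stream."""
    # Streams whose name ends in 001F hold UTF-16-LE text; decoding it
    # replaces the "strip every \x00" step.
    text = raw.decode("utf-16-le", errors="replace")
    # Same fix-up as the original code: undo the URL-escaped '#'.
    text = text.replace("%23", "#")
    # Matches both name="..." and filename="..." parameters.
    names = re.findall(r'name="([^"]*)"', text)
    # Drop duplicates while keeping first-seen order.
    return list(dict.fromkeys(names))
```

Unlike split(...)[1], this returns every quoted name it finds rather than only the first one. It still only sees names that are present in the header text, so it is no more reliable than the brute force method itself.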
 
Jon Clements

I only just noticed this thread, and I had a similar problem. I took
the following approach:

(I'm thinking this might be relevant as you mentioned checking whether
your client's Outlook could export .EML directly, which indicates (to
me at least) that you have some control over that...)

- Set up an IMAP email server on a machine (in this case Linux and
Dovecot)
- Got the client to set up a new account in Outlook for the new server
- Got the client to use the Outlook interface to copy the relevant
emails (or the whole lot) to the new server
- Used the standard imaplib and related modules to do what was needed

From my POV I didn't have to mess around with proprietary formats or
deal with files. From the client's POV, they were able to, with an
interface familiar to them, add/remove what needed processing. It also
enabled multiple people at the client's site to contribute their
emails that might have been relevant for the task.

The program created a sub-folder on the new server, did the
processing, and injected the results into that folder. The client
could then drag 'n' drop them to whatever folder they personally used
for filing at their end.
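
A rough sketch of what the imaplib side of this can look like, not the actual program. The function names and the folder name are made up for illustration, and conn is assumed to be an already logged-in imaplib.IMAP4 or imaplib.IMAP4_SSL object:

```python
import email
import imaplib
import time
from email import policy


def fetch_messages(conn, mailbox="INBOX"):
    """Yield every message in `mailbox` as a parsed EmailMessage."""
    conn.select(mailbox, readonly=True)
    typ, data = conn.search(None, "ALL")
    for num in data[0].split():
        typ, parts = conn.fetch(num, "(RFC822)")
        # parts[0] is an (envelope, raw_bytes) tuple for the message.
        yield email.message_from_bytes(parts[0][1], policy=policy.default)


def attachment_names(msg):
    """Return the filenames of the attachments in a parsed message."""
    return [part.get_filename() for part in msg.iter_attachments()]


def file_result(conn, folder, raw_message):
    """Create `folder` and append a processed message (bytes) to it."""
    # If the folder already exists the server just answers NO.
    conn.create(folder)
    conn.append(folder, None, imaplib.Time2Internaldate(time.time()),
                raw_message)
```

Something like file_result(conn, "Processed", result_bytes) would then put a result where the client can see it in Outlook. The folder name is only an example, and the hierarchy separator depends on how the server is configured.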

They felt in control, and I didn't have to bugger about with
maildir/mbox/pst/eml, or with whether it was
Outlook/Thunderbird/Evolution etc...

If you're only doing "an email here or an email there" and don't want
to (or can't) go the full-blown mail server route, then a possible
option would be to mock an IMAP server (most likely using the Twisted
framework) that, upon an 'APPEND', processes the 'received' email
appropriately... (kind of a server/procmail route...)


Just a couple of ideas.

Cheers,

Jon.
 
Tim Golden

Nice lateral approach. It would also be possible to do this same
kind of thing via the native Microsoft toolset alone if the OP
has access to the appropriate Outlook / Exchange accounts. (Indeed,
Exchange itself can act as an IMAP server which might be another
approach). I confess I was starting from the original "Can I read an
.msg file?" question.

TJG
 
John Henry

Found some useful information:

http://www.fileformat.info/format/outlookmsg/index.htm

At least it takes some of the mystery out of the msg file. It explains
why my attempt to read the msg file fails sometimes. It appears some
messages don't have header info (or at least not in the format
described). I need to keep trying and see how I can get the header
info.
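
For anyone else poking at this: as far as I can tell, the stream names encode what is in them. After the "__substg1.0_" prefix come 4 hex digits of property ID and 4 hex digits of property type, so __substg1.0_007D001F is property 0x007D (the transport headers) stored as a Unicode string (0x001F). One possible reason for the "missing" headers is that a file stores them as an 8-bit string (type 0x001E) instead, or that the message never had transport headers at all. A small sketch for checking which case applies; the function names are my own, and the names passed in are assumed to be the data[0] values from EnumElements():

```python
import re

# "__substg1.0_" + 4 hex digits of property ID + 4 hex digits of type.
_SUBSTG_RE = re.compile(r"^__substg1\.0_([0-9A-Fa-f]{4})([0-9A-Fa-f]{4})")

HEADERS_PROP_ID = 0x007D  # transport message headers


def parse_substg_name(name):
    """Return (property_id, property_type), or None for other names."""
    match = _SUBSTG_RE.match(name)
    if match is None:
        return None
    return int(match.group(1), 16), int(match.group(2), 16)


def find_header_streams(stream_names):
    """Return the stream names that carry the transport headers."""
    found = []
    for name in stream_names:
        parsed = parse_substg_name(name)
        if parsed is not None and parsed[0] == HEADERS_PROP_ID:
            found.append(name)
    return found
```

An empty result means the file has no header stream to parse. A name ending in 001E means the bytes should be read as an 8-bit string rather than as wide characters.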
 
