Question about email-handling modules

Robert Latest

Hello,

I'm new to Python but have lots of programming experience in C, C++ and
Perl. Browsing through the docs, the email handling modules caught my eye
because I'd always wanted to write a script to handle my huge, ancient, and
partially corrupted email archives.

Of course I know that this kind of project shouldn't be tackled by a
beginner in a language, but I still thought I'd give it a spin.

So I wrote the stuff at the bottom. It lists senders, subjects and
addressees of all messages in an mbox.

Things that I don't understand:

1. Why can I get the 'subject' and 'from' header fields using the []
notation, but not 'to'? When I print Message.keys I get a list of all header
fields of the message, including 'to'. What's the difference between
message['to'] and message.get('to')?

2. Why can't I call the get_payload() method on the message? What I get is
this cryptic error: "AttributeError: Message instance has no attribute
'get_payload'". I'm trying to call a method here, not an attribute. It makes
no difference if I put parentheses after get_payload or not. I looked into
the email/Message module and found get_payload defined there.

I don't want to waste your time by requesting that you pick apart my silly
example. But maybe you can give me a pointer in the right direction. This is
python 2.4 on a Debian box.

---------------------------

#!/usr/bin/python
import mailbox
import email # doesn't make a difference
from email import Message # neither does this

mbox = file("mail.old/friends")

for message in mailbox.UnixMailbox(mbox):
    subject = message['subject']
    frm = message['from']
    # to = message['to'] # this throws a "Key Error"
    to = message.get('to'); # ...but this works
    print frm, "writes about", subject, "to", to
    # print message.get_payload() # this doesn't work
 
Matt Nordhoff

Robert said:
1. Why can I get the 'subject' and 'from' header fields using the []
notation, but not 'to'? When I print Message.keys I get a list of all header
fields of the message, including 'to'. What's the difference between
message['to'] and message.get('to')?

On dicts, and presumably on Messages too, .get returns a default value
(None, or another value that you specify with .get("key", "default"))
if the key doesn't exist.

I can't say why ['to'] doesn't work when it's in the list of keys, though.

2. Why can't I call the get_payload() method on the message? What I get is
this cryptic error: "AttributeError: Message instance has no attribute
'get_payload'". I'm trying to call a method here, not an attribute. It makes
no difference if I put parentheses after get_payload or not. I looked into
the email/Message module and found get_payload defined there.

Methods are attributes. When you do "obj.method()", "obj.method" and
"()" are really two separate things: It gets the "method" attribute of
"obj", and then calls it.

(Oops, I wrote this like half an hour ago, but I never sent it.)

Steven D'Aprano

1. Why can I get the 'subject' and 'from' header fields using the []
notation, but not 'to'? When I print Message.keys I get a list of all
header fields of the message, including 'to'. What's the difference
between message['to'] and message.get('to')?

message['to'] looks up the key 'to', raising an exception if it doesn't
exist. message.get('to') looks up the key and returns a default value if
it doesn't exist.

See help(message.get) for more detail.
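
The difference between the two lookups can be sketched with a plain dict, which is how the 2.4-era message object behaves according to the explanation above. This is a minimal illustration written for a current Python 3 interpreter, not the original 2.4 setup; the header data and the helper name `bracket_lookup` are made up for the example. As far as I know, the newer `email.message.Message` class is more lenient and gives back None when a missing header is indexed, which the last lines show.

```python
from email.message import Message

# Made-up header data standing in for one parsed message.
headers = {"subject": "hello", "from": "alice@example.org"}


def bracket_lookup(mapping, key):
    """Return mapping[key], or the string 'KeyError' if the key is absent."""
    try:
        return mapping[key]
    except KeyError:
        return "KeyError"


# [] raises for a missing key; .get() falls back to a default instead.
missing_with_brackets = bracket_lookup(headers, "to")
missing_with_get = headers.get("to")
missing_with_default = headers.get("to", "(no recipient)")

# The newer email package behaves differently from a dict here:
# indexing a Message with an absent header does not raise.
msg = Message()
msg["Subject"] = "hello"
absent_header = msg["To"]
```

So a script that must survive messages without a To header can either use .get() with a default or wrap the [] lookup in try/except KeyError.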

2. Why can't I call the get_payload() method on the message? What I get
is this cryptic error: "AttributeError: Message instance has no
attribute 'get_payload'". I'm trying to call a method here, not an
attribute. It makes no difference if I put parentheses after get_payload
or not. I looked into the email/Message module and found get_payload
defined there.

All methods are attributes (although the opposite is not the case), so if
a method doesn't exist, you will get an AttributeError.

The email.Message.Message class has a get_payload, but you're not using
that class. You're using mailbox.UnixMailbox, which returns an instance
of rfc822.Message which *doesn't* have a get_payload method.
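
To make the "methods are attributes" point concrete, here is a small sketch in Python 3. The two classes are hypothetical stand-ins (they are not the real rfc822 or email classes); one lacks get_payload and the other defines it. The lookup and the call are split into two steps so it is visible that the AttributeError comes from the lookup, before anything is called.

```python
class HeadersOnly:
    """Hypothetical stand-in for a message class that lacks get_payload."""

    def get(self, name, default=None):
        return default


class WithPayload(HeadersOnly):
    """Hypothetical stand-in for a class that does define get_payload."""

    def get_payload(self):
        return "body text"


def payload_or_reason(message):
    """Call get_payload if it exists, otherwise report why the lookup failed."""
    try:
        method = message.get_payload  # attribute lookup; this is what can fail
    except AttributeError as exc:
        return "lookup failed: %s" % exc
    return method()  # the call only happens once the lookup has succeeded


without = payload_or_reason(HeadersOnly())
with_payload = payload_or_reason(WithPayload())
```

This also explains why adding or removing the parentheses made no difference: the error is raised while fetching the attribute, so the call is never reached.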

Damned if I can work out how to actually *use* the email module to read
an mbox mail box. I might have to RTFM :(

http://docs.python.org/lib/module-email.html
http://docs.python.org/lib/module-mailbox.html


*later*

Ah! The Fine Manual is some help after all. Try this:

# copied from http://docs.python.org/lib/mailbox-deprecated.html
import email
import email.Errors
import mailbox

def msgfactory(fp):
    try:
        return email.message_from_file(fp)
    except email.Errors.MessageParseError:
        # Don't return None since that will
        # stop the mailbox iterator
        return ''

fp = file('mymailbox', 'rb')
mbox = mailbox.UnixMailbox(fp, msgfactory)
for message in mbox:
    print message.get_payload()



But note that message.get_payload() will return either a string (for
single part emails) or a list of Messages (for multi-part messages).
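
One way to cope with the two possible return types is to check is_multipart() and recurse into the parts. The following is a rough sketch for Python 3 using the standard email package; the helper name `collect_bodies` and the sample messages are invented for illustration, and it ignores transfer encodings and attachments.

```python
import email
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def collect_bodies(message):
    """Return a flat list of string payloads from a single- or multi-part message."""
    if message.is_multipart():
        bodies = []
        for part in message.get_payload():  # here the payload is a list of Messages
            bodies.extend(collect_bodies(part))
        return bodies
    return [message.get_payload()]  # here the payload is a plain string


# A single-part message parsed from text.
single = email.message_from_string("Subject: hi\n\njust text\n")

# A multi-part message built from two text parts.
multi = MIMEMultipart()
multi.attach(MIMEText("first part"))
multi.attach(MIMEText("second part"))

single_bodies = collect_bodies(single)
multi_bodies = collect_bodies(multi)
```

For real archives, message.walk() is another option for visiting every part without writing the recursion by hand.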
 
tinnews

Steven D'Aprano said:
On Thu, 20 Dec 2007 09:31:10 +0000, Robert Latest wrote:
[snip most of question and helpful answer]
But note that message.get_payload() will return either a string (for
single part emails) or a list of Messages (for multi-part messages).

Note also that the mailbox module in python 2.5 is quite unlike the
mailbox module in python 2.4 so code written for the 2.4 mailbox will
be most unlikely to work under 2.5 without at least some changes.

At least that's my experience/understanding.

Also, from the way things currently work in the 2.5 version I think
there will (hopefully) be some more quite significant changes.
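
For comparison, this is roughly what the original script looks like with the newer-style mailbox API, as found in current Python 3 (mailbox.mbox rather than mailbox.UnixMailbox). It is only a sketch: the sample mbox content, the addresses, and the function name `summarize` are made up, and a temporary file stands in for the real archive. Messages yielded by mailbox.mbox are email-package Message objects, so both header access and get_payload() are available.

```python
import mailbox
import os
import tempfile

# Two made-up messages in mbox format; the second has no To header.
SAMPLE = (
    "From alice@example.org Thu Dec 20 09:31:10 2007\n"
    "From: alice@example.org\n"
    "Subject: first\n"
    "To: bob@example.org\n"
    "\n"
    "body one\n"
    "\n"
    "From carol@example.org Thu Dec 20 10:00:00 2007\n"
    "From: carol@example.org\n"
    "Subject: second\n"
    "\n"
    "body two\n"
)


def summarize(path):
    """Return (sender, subject, recipient) tuples for every message in an mbox file."""
    rows = []
    box = mailbox.mbox(path, create=False)
    try:
        for message in box:
            recipient = message.get("to", "(no To header)")
            rows.append((message["from"], message["subject"], recipient))
    finally:
        box.close()
    return rows


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.mbox")
    with open(path, "w", newline="\n") as handle:
        handle.write(SAMPLE)
    rows = summarize(path)
```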
 
Robert Latest

Steven said:
message['to'] looks up the key 'to', raising an exception if it doesn't
exist. message.get('to') looks up the key and returns a default value if
it doesn't exist.

Ah, so the [] notation got hung up on some message right at the beginning
and didn't even let the script continue. Makes sense.

All methods are attributes (although the opposite is not the case), so if
a method doesn't exist, you will get an AttributeError.

I see. I've already gathered that Python likes to use different words for
common things (attribute instead of member or method).

Damned if I can work out how to actually *use* the email module to read
an mbox mail box. I might have to RTFM :(

Yeah, I think I haven't picked the right module to get started with.

But note that message.get_payload() will return either a string (for
single part emails) or a list of Messages (for multi-part messages).

Yes, I did note that.

Thanks for the tips (also to the others who have answered).

Python looks like fun though. Maybe I should try to tackle some other
problem first.

robert
 
thebjorn

Steven D'Aprano wrote: [...]
All methods are attributes (although the opposite is not the case), so if
a method doesn't exist, you will get an AttributeError.

I see. I've already gathered that Python likes to use different words for
common things (attribute instead of member or method).

...we were hoping it would make you feel comfy when coming from
Perl ;-)

On a more serious note, Python is actually quite semantically regular:
a dot (.) always means the same thing, as do parens. It might not be
immediately obvious exactly _what_ it means if you're coming from
languages that confuse issues with syntactic sweetness.

When you see code that says

foo.bar(baz)

there are two distinct operations happening, namely

tmp = foo.bar # attribute lookup (the dot-operator)
tmp(baz) # function call (the paren-operator)

This gives you insight into one of the first optimizations you can
try if a loop is a bottleneck:

for i in range(100000):
    foo.bar(i) # profiling says this is a bottleneck

With the attribute lookup hoisted out of the loop:

tmp = foo.bar # move attribute lookup outside the loop
for i in range(100000):
    tmp(i)
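
A quick way to convince yourself that the hoisted form does the same work is to run both variants on a small object and compare the results. This sketch uses Python 3 and a made-up `Accumulator` class; it checks equivalence only and makes no claim about how much time the hoisting saves, which depends on the interpreter and the loop body.

```python
class Accumulator:
    """Made-up class whose add method stands in for foo.bar."""

    def __init__(self):
        self.total = 0

    def add(self, value):
        self.total += value


# Variant 1: the attribute lookup happens on every iteration.
plain = Accumulator()
for i in range(1000):
    plain.add(i)

# Variant 2: the bound method is looked up once and reused.
hoisted = Accumulator()
add = hoisted.add
for i in range(1000):
    add(i)
```

The bound method stored in `add` remembers which object it belongs to, so calling it updates `hoisted` just as `hoisted.add(i)` would.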

In the interest of full disclosure, I should probably mention that I'm
of course lying to you ;-) You can override both attribute lookup and
function call in your own classes, but (a) that shouldn't be important
to you at this point *wink*, and (b) at that level Python is quite
semantically regular.

-- bjorn
 
