Sorting Unix mailboxes

Discussion in 'Python' started by sfeil@io.com, Sep 13, 2005.

  1. Guest

    I'm writing a program in Python to sort the mail in standard Unix
    email boxes. In my proof-of-concept example I am copying a letter to a
    second mailbox if the letter was sent from a particular email
    address. When I read the destination mailbox with cat, I can see that
    something is getting copied to it, but the mail program does not
    recognize any new letters in the destination mailbox. It would seem
    that the "OutFile.write(Message.get_unixfrom())" line is
    essential. However, if I run with this line uncommented, I get
    the following error: "TypeError: argument 1 must be string or
    read-only character buffer, not None". I created this program by
    following an example posted somewhere on the Internet that I can't
    seem to find anymore. At one time I was able to get Python to put new
    letters in a mailbox.

    Also, I was wondering if there is a way to use Python to delete items
    from a mailbox. I could create a temp box of non-deleted letters, then
    copy it to the source box, but that seems messy.

    Here is my example program:


    def CopyToBox(Source,Address,Destination):
        AddressRE=re.compile(
            "([a-zA-Z0-9._-]+)@([a-zA-Z0-9._-]+)\.([a-zA-Z0-9]+)")
        InFile = open("/home/stevef/mail/%s" % Source)
        OutFile = open("/home/stevef/mail/%s" % Destination,"a")
        Box = mailbox.PortableUnixMailbox(InFile)
        Envelope=Box.next()
        while 1:
            Envelope=Box.next()
            if Envelope == None:
                break
            print Envelope.getallmatchingheaders("from")[0]
            Match=AddressRE.search(
                Envelope.getallmatchingheaders("from")[0])
            if Match:
                Set=Match.groups()
                if "%s@%s.%s" % Set == Address:
                    print "Copy letter from %s@%s.%s" % Set
                    Message = email.message_from_file(Envelope.fp)
                    #OutFile.write(Message.get_unixfrom()) ##error
                    OutFile.write("\n")
                    OutFile.write(Message.as_string())
        InFile.close()
        OutFile.close()
        return
     
    , Sep 13, 2005
    #1

  2. Tom Anderson Guest

    [posted and mailed, in case the OP has given up on reading the group!]

    On Tue, 13 Sep 2005, wrote:

    > I'm writing a program in Python to sort the mail in standard Unix
    > email boxes. In my proof-of-concept example I am copying a letter to a
    > second mailbox if the letter was sent from a particular email
    > address. When I read the destination mailbox with cat, I can see that
    > something is getting copied to it, but the mail program does not
    > recognize any new letters in the destination mailbox. It would seem
    > that the "OutFile.write(Message.get_unixfrom())" line is
    > essential.


    Absolutely! The From line is the key element in mailbox structure.

    > However, if I run with this line uncommented, I get the following
    > error: "TypeError: argument 1 must be string or read-only character
    > buffer, not None".


    This is happening because Message.get_unixfrom is returning None, rather
    than a proper From line. According to its documentation, this method
    "defaults to None if the envelope header was never set". Since you've
    never set the envelope header, this behaviour is not surprising.
    But didn't the envelope header get set when you created the message?
    Actually, no. You created it with "email.message_from_file(Envelope.fp)",
    which reads the contents of the email from the file Envelope.fp.
    Envelope.fp, however, isn't the complete text of the mailbox entry; it's
    just (AFAICT) the payload of the message. Therefore, the message you
    create has no headers or envelope, just the body.
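
    A minimal sketch of that behaviour, written against the Python 3 email
    module rather than the Python 2 modules used in this thread. The sample
    address and text are made up for illustration. It parses the same message
    with and without its "From " envelope line, then sets the envelope by hand:

```python
import email

# Hypothetical sample data, not taken from the thread.
ENVELOPE = "From alice@example.com Tue Sep 13 09:23:35 2005"
HEADERS_AND_BODY = "From: alice@example.com\nSubject: hello\n\nJust the body.\n"

# Parsed from text that starts at the headers: the parser never sees an envelope line.
without_envelope = email.message_from_string(HEADERS_AND_BODY)

# Parsed from the complete mailbox entry, envelope line included.
with_envelope = email.message_from_string(ENVELOPE + "\n" + HEADERS_AND_BODY)

print(without_envelope.get_unixfrom())  # None
print(with_envelope.get_unixfrom())     # the envelope line

# The envelope can also be set explicitly before writing the message out.
repaired = email.message_from_string(HEADERS_AND_BODY)
repaired.set_unixfrom(ENVELOPE)
mbox_entry = repaired.as_string(unixfrom=True)
```

    Writing None to a file is what raises the TypeError quoted above, so the
    message has to carry an envelope line before get_unixfrom() is useful.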

    > I created this program by following an example posted somewhere on the
    > Internet that I can't seem to find anymore. At one time I was able to
    > get Python to put new letters in a mailbox.
    >
    > Also, I was wondering if there is a way to use Python to delete items
    > from a mailbox.


    Not really. This is a universal problem which affects all programs,
    regardless of language, which work with file formats consisting of
    variable-sized records: there's no easy way to delete them in place.

    > I could create a temp box of non-deleted letters, then copy it to the
    > source box, but that seems messy.


    A cleaner way would be to copy the non-deleted messages to a new file,
    then to throw away the old file and rename the new one to replace it. This
    would avoid the second copy. Alternatively, you could read and write
    simultaneously with one file, then truncate at the end; this takes a bit
    more care, though.
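
    The copy-then-rename approach can be sketched as follows. This is an
    illustration in Python 3 using the standard mailbox.mbox class, not code
    from the thread; the function name delete_by_rewrite is made up, and the
    sketch assumes nothing else writes to the mailbox while it runs (it takes
    no locks).

```python
import mailbox
import os
import tempfile

def delete_by_rewrite(path, addr):
    """Rewrite the mbox at `path`, dropping messages whose From header mentions addr."""
    # Create the new file next to the old one so the final rename stays on one filesystem.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    os.close(fd)

    src = mailbox.mbox(path, create=False)
    dst = mailbox.mbox(tmp_path)
    removed = 0
    try:
        for msg in src:
            # msg["From"] is None when the header is missing.
            if addr in str(msg["From"] or ""):
                removed += 1      # "deleted": simply not copied
            else:
                dst.add(msg)      # kept: copied with its envelope line
        dst.flush()
    finally:
        dst.close()
        src.close()

    # Throw away the old file by renaming the new one over it.
    os.replace(tmp_path, path)
    return removed
```

    Note that tempfile.mkstemp creates the new file with restrictive
    permissions, so a real tool might want to copy the original file's mode
    across before the rename.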

    > Here is my example program:


    Right. Some of this makes sense to me, but there's quite a lot here that I
    don't get. Perhaps some of this is a result of the code being excised from
    its natural context, though.

    > def CopyToBox(Source,Address,Destination):
    >     AddressRE=re.compile(
    >         "([a-zA-Z0-9._-]+)@([a-zA-Z0-9._-]+)\.([a-zA-Z0-9]+)")


    Why did you write the regexp to capture the address as three groups? It
    seems like the only thing you ever do with the groups is put them back
    together again!

    Also, it's better to define the regexp once, at global scope, to avoid
    having to compile it every time the function runs.
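
    Both points can be illustrated with a short Python 3 sketch. The names
    ADDRESS_RE and sender_address are made up for this example; the pattern is
    the one from the original program with the three capturing groups removed.

```python
import re

# Compiled once, at module level, and reused by every call.
ADDRESS_RE = re.compile(r"[a-zA-Z0-9._-]+@[a-zA-Z0-9._-]+\.[a-zA-Z0-9]+")

def sender_address(header_value):
    """Return the first address-like substring of a header value, or None."""
    match = ADDRESS_RE.search(header_value)
    # group(0) is the whole match, so nothing has to be glued back together.
    return match.group(0) if match else None

print(sender_address("From: Steve F <stevef@example.com>"))
```

    The standard library's email.utils.parseaddr is another option when the
    header is well formed.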

    >     InFile = open("/home/stevef/mail/%s" % Source)
    >     OutFile = open("/home/stevef/mail/%s" % Destination,"a")
    >     Box = mailbox.PortableUnixMailbox(InFile)
    >     Envelope=Box.next()


    Why 'Envelope'? That object is a Message, not an Envelope!

    And did you really mean to throw away the first message in the box like
    this?

    >     while 1:
    >         Envelope=Box.next()
    >         if Envelope == None:
    >             break


    Why an infinite loop with a break and an explicit next call? Why not a for
    loop over the mailbox?

    >         print Envelope.getallmatchingheaders("from")[0]
    >         Match=AddressRE.search(
    >             Envelope.getallmatchingheaders("from")[0])


    Why getallmatchingheaders("from")[0] rather than
    getfirstmatchingheader("from")?

    >         if Match:
    >             Set=Match.groups()
    >             if "%s@%s.%s" % Set == Address:
    >                 print "Copy letter from %s@%s.%s" % Set
    >                 Message = email.message_from_file(Envelope.fp)


    Message now contains the email's payload, but not its headers or envelope
    details, so ...

    >                 #OutFile.write(Message.get_unixfrom()) ##error


    That doesn't work.

    >                 OutFile.write("\n")
    >                 OutFile.write(Message.as_string())
    >     InFile.close()
    >     OutFile.close()
    >     return


    There's no need for an explicit return here.

    I have to sympathise with you over Python's mail-handling libraries,
    though; having both the rfc822 and email modules around at the same time
    is quite a headache. Luckily, there's a way to make things simpler and
    much easier to work with, using a trick described in the docs for the
    mailbox module: rather than letting the mailbox module make the message
    objects (using the rfc822 module to do it), we can supply our own message
    factory function, with which we can create email-module messages right
    from the start. You need a function like this:

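    def msgfactory(f):
        # Build an email-module message from the file object the mailbox
        # hands us. Returning None would stop the mailbox iterator, so on
        # a parse error we try again instead.
        while True:
            try:
                return email.message_from_file(f)
            except:
                pass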

    Then you can make a mailbox like this:

    mbox = mailbox.PortableUnixMailbox(f, msgfactory)

    The messages in it will then be email.Message instances.

    I'd then write the main function like this (you'll need to import
    os.path):

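    import email
    import mailbox
    import os.path

    MBOX_DIR = "/home/stevef/mail"

    def CopyToBox(src, addr, dst):
        in_ = file(os.path.join(MBOX_DIR, src))
        out = file(os.path.join(MBOX_DIR, dst), "a")
        for mail in mailbox.PortableUnixMailbox(in_, msgfactory):
            # mail["from"] is None when the header is missing
            if addr in (mail["from"] or ""):
                # True asks for the "From " envelope line to be written too
                out.write(mail.as_string(True))
        in_.close()
        out.close()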

    Simple, eh?
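
    For readers on Python 3, where PortableUnixMailbox and file() no longer
    exist, roughly the same function can be sketched with mailbox.mbox. This
    is an adaptation and not the code posted above: copy_to_box takes full
    paths instead of names under MBOX_DIR, returns a count, and takes no
    locks, all of which are choices made for this illustration.

```python
import mailbox

def copy_to_box(src_path, addr, dst_path):
    """Append every message whose From header mentions addr to the mbox at dst_path."""
    src = mailbox.mbox(src_path, create=False)
    dst = mailbox.mbox(dst_path)  # created if it does not exist yet
    copied = 0
    try:
        for msg in src:
            # msg["From"] is None when the header is missing.
            if addr in str(msg["From"] or ""):
                dst.add(msg)  # the "From " envelope line is written as well
                copied += 1
        dst.flush()
    finally:
        dst.close()
        src.close()
    return copied
```

    On a mailbox that a mail program may be delivering to, calling lock() on
    the destination before adding and unlock() afterwards would be prudent.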

    tom

    --
    Also, a 'dark future where there is only war!' ... have you seen the news lately? -- applez
     
    Tom Anderson, Sep 15, 2005
    #2

  3. William Park Guest

    <> wrote:
    > I'm writing a program in Python to sort the mail in standard Unix
    > email boxes. In my proof-of-concept example I am copying a letter to a
    > second mailbox if the letter was sent from a particular email
    > address. When I read the destination mailbox with cat, I can see that
    > something is getting copied to it, but the mail program does not
    > recognize any new letters in the destination mailbox. It would seem
    > that the "OutFile.write(Message.get_unixfrom())" line is
    > essential. However, if I run with this line uncommented, I get
    > the following error: "TypeError: argument 1 must be string or
    > read-only character buffer, not None". I created this program by
    > following an example posted somewhere on the Internet that I can't
    > seem to find anymore. At one time I was able to get Python to put new
    > letters in a mailbox.
    >
    > Also, I was wondering if there is a way to use Python to delete items
    > from a mailbox. I could create a temp box of non-deleted letters, then
    > copy it to the source box, but that seems messy.


    Before writing a Python script, perhaps you should look at
    man procmailrc
    man formail
    and take the relevant process and implement that in Python.

    --
    William Park <>, Toronto, Canada
    ThinFlash: Linux thin-client on USB key (flash) drive
    http://home.eol.ca/~parkw/thinflash.html
    BashDiff: Super Bash shell
    http://freshmeat.net/projects/bashdiff/
     
    William Park, Sep 16, 2005
    #3
  4. Gregory K. Johnson Guest

    On Tue, Sep 13, 2005 at 09:23:35AM -0700, wrote:
    > I'm writing a program in Python to sort the mail in standard Unix
    > email boxes. In my proof-of-concept example I am copying a letter to a
    > second mailbox if the letter was sent from a particular email
    > address.

    [...]
    > Also, I was wondering if there is a way to use Python to delete items
    > from a mailbox.


    As part of Google's Summer of Code program, I wrote a replacement for
    the mailbox module that would be well suited for this. It might be
    included in the Python 2.5 distribution. Until then, you could get the
    module from Python CVS, under nondist/sandbox/mailbox.

    More info is on the project Web site: http://gkj.freeshell.org/soc/
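
    As an illustration of what deletion looks like with a mailbox class that
    supports it, here is a sketch against the mailbox.mbox class in the
    current Python 3 standard library. The thread does not show the sandbox
    module's interface, so treating the two as equivalent is an assumption,
    and the function name remove_from_sender is made up for this example.

```python
import mailbox

def remove_from_sender(path, addr):
    """Delete messages whose From header mentions addr from the mbox at path."""
    box = mailbox.mbox(path, create=False)
    removed = 0
    try:
        box.lock()  # keep other well-behaved programs out while we modify the file
        # keys() returns a list, so removing entries while looping over it is safe.
        for key in box.keys():
            if addr in str(box[key]["From"] or ""):
                box.remove(key)
                removed += 1
        box.flush()  # write the changes back to disk
    finally:
        box.unlock()
        box.close()
    return removed
```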

    --
    Gregory K. Johnson
     
    Gregory K. Johnson, Sep 16, 2005
    #4
