[script] dis/assembling mbox email

Discussion in 'Python' started by William Park, Jun 10, 2004.

  1. William Park

    William Park Guest

    Crossposted to the Python group, because I think this is a cleaner
    approach. :)

    From time to time, I need to
    - extract the main header/body from a MIME email,
    - parse and extract multipart segments, recursively,
    - walk through the email tree, and edit/delete/add parts,
    - regenerate a new MIME email.

    You can edit the file manually, but it's difficult to keep track of
    where you are. So, I wrote two shell scripts (included below my signature):
    1. unmbox.sh -- to extract email components into a directory tree
    2. mbox.sh -- to generate an email from a directory tree
    So, you can "walk" through a MIME email by simply "walking" through
    the directory tree.

    The analogy is a 'tar' file: you extract files into a directory tree, and
    you create a tarball from the directory tree. Or, if you are using
    Slackware, the analogy is 'explodepkg' and 'makepkg'.


    Usage:
    unmbox.sh dir < email
    mbox.sh dir > email

    'unmbox.sh' extracts the email components into a directory tree. The
    header and body are saved as the 'header' and 'body' files, respectively.
    If it's MIME, then each multipart segment is saved as an 'xx[0-9][0-9]'
    file, which is in turn decomposed recursively into an
    'xx[0-9][0-9].mbox' directory. In reverse, 'mbox.sh' recursively walks
    the directory tree and assembles the email components into mbox format.
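
    For readers without Formail at hand, the first step of 'unmbox.sh' (the
    header/body split) can be approximated with plain Sed. This is only a
    sketch under stated assumptions: 'split_message' is a made-up helper
    name, the header is assumed to end at the first empty line, and no
    attempt is made to reproduce Formail's handling of blank lines around
    the body.

```shell
#!/bin/sh
# Sketch only: split one message read from stdin into 'header' and 'body'
# files inside the given directory. split_message is a hypothetical helper,
# not part of unmbox.sh. Assumes the header ends at the first empty line.
split_message() {
    dir=$1
    mkdir -p "$dir" || return 1
    cat > "$dir/input"
    # Header: every line before the first empty line.
    sed '/^$/,$d' "$dir/input" > "$dir/header"
    # Body: every line after the first empty line.
    sed '1,/^$/d' "$dir/input" > "$dir/body"
}
```

    Called as 'split_message dir < email', it leaves 'input', 'header' and
    'body' in 'dir', mirroring the file names the script uses.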

    Strictly speaking, a MIME boundary pattern consists of 1 to 70
    characters from
    [A-Za-z0-9 '()+_,./:=?-]
    not ending in a space. And a boundary line in the message body is one of
    \n--pattern\n
    \n--pattern--\n
    where 'pattern' is the boundary pattern assigned in the Content-Type:
    header.
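
    The character rule above can be turned into a quick check. The sketch
    below assumes the RFC 2046 grammar (1 to 70 characters, '=' included,
    last character not a space); 'is_valid_boundary' is a hypothetical
    helper name and is not used by the scripts in this post.

```shell
#!/bin/sh
# Sketch only: succeed (exit status 0) if the argument looks like a legal
# MIME boundary: up to 69 boundary characters (space allowed) followed by
# one boundary character that is not a space.
is_valid_boundary() {
    printf '%s\n' "$1" |
        grep -Eq "^[A-Za-z0-9'()+_,./:=? -]{0,69}[A-Za-z0-9'()+_,./:=?-]\$"
}
```

    For example, 'is_valid_boundary "simple boundary"' succeeds, while a
    pattern ending in a space or containing '*' fails.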

    For the sake of sanity,

    1. The script recognizes only
    boundary="..."
    as the MIME boundary parameter, i.e. it must be double-quoted, with no
    spaces around '='.

    2. Only lines consisting of '--pattern' or '--pattern--' are recognized
    as boundary lines, because Formail puts a blank line (if one doesn't
    already exist) at the top and bottom of the email body, undoing the '\n'
    prefix/suffix anyway.

    3. '.' needs to be escaped for Sed and Grep, and '()+.?' need to be
    escaped for Csplit and Egrep.
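
    The escaping in point 3 can be sketched as a small function.
    'escape_ere' is a hypothetical name. One detail worth noting: inside
    backticks the shell collapses '\\&' to '\&', which Sed treats as a
    literal '&', so the sketch keeps the Sed call out of backticks and
    callers should use $(...).

```shell
#!/bin/sh
# Sketch only: backslash-escape the characters ( ) + . ? in a boundary so it
# can be embedded in an extended regular expression (grep -E).
escape_ere() {
    printf '%s\n' "$1" | sed 's/[()+.?]/\\&/g'
}

# Example: match a boundary line literally.
boundary='=_Part(1).x+y?'
pattern=$(escape_ere "$boundary")
printf '%s\n' "--$boundary" | grep -E "^--$pattern\$"
```

    With the escaped pattern, a line such as '--=_PartX1YZxxy' no longer
    matches just because '.' or '?' was read as a regex operator.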


    Use at your own risk, and enjoy.
    --
    William Park, Open Geometry Consulting, <>
    No, I will not fix your computer! I'll reformat your harddisk, though.


    -----------------------------------------------------------------------

    #! /bin/sh
    # Usage: unmbox.sh dir < email

    [ -d "$1" ] || mkdir "$1"

    cd "$1" || exit 1
    cat > input
    formail -f -X '' < input > header    # no blank lines
    formail -I '' < input > body         # blank lines at top/bottom

    # Save the boundary="..." parameter as a sourceable shell assignment.
    if grep -o "boundary=\"[A-Za-z0-9 '()+_,./:=?-]*[A-Za-z0-9'()+_,./:=?-]\"" header > boundary; then
        . ./boundary
        # $(...) instead of backticks, which would collapse '\\&' to '\&'
        eboundary=$(printf '%s\n' "$boundary" | sed 's/[()+.?]/\\&/g')
        csplit body "/^--$eboundary/" '{*}'    # xx00, xx01, ...
        for i in xx??; do
            # Skip the preamble and the closing '--pattern--' piece.
            if head -n 1 "$i" | grep -E "^--$eboundary\$" > /dev/null; then
                sed '1d' "$i" | unmbox.sh "$i.mbox"
            fi
        done
    else
        rm boundary
    fi

    -----------------------------------------------------------------------

    #! /bin/sh
    # Usage: mbox.sh dir > email

    cd "$1" || exit 1
    sed '/^$/ d' header    # NO blank lines in header

    if [ -f boundary ]; then
        . ./boundary
        echo
        for i in xx??.mbox; do
            echo "--$boundary"
            mbox.sh "$i"
        done
        echo "--$boundary--"
        echo
    else
        [ "$(head -n 1 body)" ] && echo    # blank line at top
        cat body
        [ "$(tail -n 1 body)" ] && echo    # blank line at bottom
        :    # dummy, so that return code is 0
    fi

    -----------------------------------------------------------------------
     
    William Park, Jun 10, 2004
    #1

  2. Alan Connor

    Alan Connor Guest

    On 10 Jun 2004 21:34:04 GMT, William Park <> wrote:
    >



    <snip>

    > mbox.sh $i
    > done
    > echo "--$boundary--"
    > echo
    > else
    > [ "`head -1 body`" ] && echo # blank line at top
    > cat body
    > [ "`tail -1 body`" ] && echo # blank line at bottom
    > : # dummy, so that return code is 0
    > fi
    >
    > -----------------------------------------------------------------------


    Thanks, William. Tucked it away. Could come in REAL handy.


    AC

    --
    http://angel.1jh.com./nanae/kooks/alanconnor.html
    http://www.killfile.org./dungeon/why/connor.html
     
    Alan Connor, Jun 11, 2004
    #2

  3. Skip Montanaro

    William> Time to time, I need to
    William> - extract main header/body from a MIME email,
    William> - parse and extract multipart segments, recursively,
    William> - walk through the email tree, and edit/delete/add stuffs
    William> - regenerate new MIME email.

    ...

    William> Usage are
    William> unmbox.sh dir < email
    William> mbox.sh dir > email

    ...

    You might be interested in the splitndirs.py script which is part of the
    Spambayes distribution. There is no joindirs.py script, but it's perhaps a
    five-line script using the mboxutils.getmbox function (also part of
    Spambayes).

    Skip
     
    Skip Montanaro, Jun 11, 2004
    #3
  4. those who know me have no need of my name

    [fu-t set]

    in comp.mail.misc i read:

    > Crossposted to Python group, because I think this is cleaner
    > approach. :)


    but with not an ounce of python in your solution. and no followup-to.
    sad.

    --
    a signature
     
    those who know me have no need of my name, Jun 11, 2004
    #4
  5. William Park

    William Park Guest

    In <comp.unix.shell> Skip Montanaro <> wrote:
    >
    > William> Time to time, I need to
    > William> - extract main header/body from a MIME email,
    > William> - parse and extract multipart segments, recursively,
    > William> - walk through the email tree, and edit/delete/add stuffs
    > William> - regenerate new MIME email.


    > William> Usage are
    > William> unmbox.sh dir < email
    > William> mbox.sh dir > email


    > You might be interested in the splitndirs.py script which is part of
    > the Spambayes distribution. There is no joindirs.py script, but it's
    > perhaps a five-line script using the mboxutils.getmbox function (also
    > part of Spambayes).


    I think splitndirs.py is Python's version of
    formail -s
    Of course, the inverse is simply to concatenate the files, and that
    would be a one-liner. :)
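
    A sketch of that inverse, spelled out slightly longer than one line.
    'join_mbox' is a hypothetical name, and it assumes every file in the
    directory is one complete message that already starts with its 'From '
    line and ends with a blank line, so no separators need to be added.

```shell
#!/bin/sh
# Sketch only: concatenate the per-message files of a directory, in glob
# (lexicographic) order, into one mbox on stdout.
join_mbox() {
    for f in "$1"/*; do
        [ -f "$f" ] || continue    # skip subdirectories and unmatched globs
        cat "$f"
    done
}
```

    Usage would be 'join_mbox dir > mbox'.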

    --
    William Park, Open Geometry Consulting, <>
    No, I will not fix your computer! I'll reformat your harddisk, though.
     
    William Park, Jun 11, 2004
    #5
  6. William Park

    William Park Guest

    In <comp.unix.shell> William Park <> wrote:
    > Strictly speaking, MIME boundary pattern consists of any number of
    > [ A-Za-z0-9'()+_,./:?-]

    ....
    > if grep -o "boundary=\"[ A-Za-z0-9'()+_,./:?-]*[A-Za-z0-9'()+_,./:?-]\"" header > boundary; then


    Typo:
    The character class in the regexp above is missing '=' (equal sign);
    add it.

    --
    William Park, Open Geometry Consulting, <>
    No, I will not fix your computer! I'll reformat your harddisk, though.
     
    William Park, Jun 11, 2004
    #6