Removing base64 from mbox formatted file.

Discussion in 'Perl Misc' started by me at, Dec 26, 2008.

  1. me at

    me at Guest

    Hi,

    I have a text-based email application I use on my personal ISP
    shell account. The mbox file grows rather large because of
    encoded attachments. I have been editing it by searching for
    [Bb][Aa][Ss][Ee]64 in vi/vim, then marking and deleting. It is
    very time-consuming and error-prone.

    I wonder if you might have a trick for the following example that
    would delete all lines between the
    base64 and the
    --0__

    Maybe a little perl script I could run against the file.

    Thanks,

    Vic


    Content-transfer-encoding: base64

    V5cJPPkIjFDdeEabQbd6WgICTxiiz0f5dBKquXF6k4senwEhYGnKEFJeGrxUZy8dB8gmAXI/sPvH
    ESfCwVt5hTgYiqQqtdRNHQIU1PJ33ZqmzgE90OwLaoJcnMop1WiMmgkPHQRIrwgFuNV90A3doNKT
    mrKIN07AnGcI9BQjhCBN4RfA1qIZnMqorJCogKfGQnxSCDilTVIA0yl5ciTovgLuBDKFUDE9aQcw
    9SA+rjSNf9/M1gxrj6VwDTS0IUSElMzBfsj0NFXR2kwsV1A5IF1grLgLL/r1R40BZEnuBWgm
    9SA+QEyb

    --0__=07BBFF96DFCC19C68f9e8a93df938690918c07BBFF96DFCC19C6--
    me at, Dec 26, 2008
    #1

  2. me at wrote:

    > delete all lines between the
    > base64 and the
    > --0__



    > Content-transfer-encoding: base64
    >
    > V5cJPPkIjFDdeEabQbd6WgICTxiiz0f5dBKquXF6k4senwEhYGnKEFJeGrxUZy8dB8gmAXI/sPvH
    > ESfCwVt5hTgYiqQqtdRNHQIU1PJ33ZqmzgE90OwLaoJcnMop1WiMmgkPHQRIrwgFuNV90A3doNKT
    > mrKIN07AnGcI9BQjhCBN4RfA1qIZnMqorJCogKfGQnxSCDilTVIA0yl5ciTovgLuBDKFUDE9aQcw
    > 9SA+rjSNf9/M1gxrj6VwDTS0IUSElMzBfsj0NFXR2kwsV1A5IF1grLgLL/r1R40BZEnuBWgm
    > 9SA+QEyb
    >
    > --0__=07BBFF96DFCC19C68f9e8a93df938690918c07BBFF96DFCC19C6--



    perl -n -i -e 'print unless /^Content-transfer-encoding: base64/ .. /^--0__' file.mbox


    --
    Tad McClellan
    email: perl -le "print scalar reverse qq/moc.noitatibaher\100cmdat/"
    Tad J McClellan, Dec 26, 2008
    #2

  3. On 2008-12-26 22:59, Tad J McClellan wrote:
    > me at wrote:
    >> delete all lines between the
    >> base64 and the
    >> --0__

    >
    >
    >> Content-transfer-encoding: base64
    >>
    >> V5cJPPkIjFDdeEabQbd6WgICTxiiz0f5dBKquXF6k4senwEhYGnKEFJeGrxUZy8dB8gmAXI/sPvH
    >> ESfCwVt5hTgYiqQqtdRNHQIU1PJ33ZqmzgE90OwLaoJcnMop1WiMmgkPHQRIrwgFuNV90A3doNKT
    >> mrKIN07AnGcI9BQjhCBN4RfA1qIZnMqorJCogKfGQnxSCDilTVIA0yl5ciTovgLuBDKFUDE9aQcw
    >> 9SA+rjSNf9/M1gxrj6VwDTS0IUSElMzBfsj0NFXR2kwsV1A5IF1grLgLL/r1R40BZEnuBWgm
    >> 9SA+QEyb
    >>
    >> --0__=07BBFF96DFCC19C68f9e8a93df938690918c07BBFF96DFCC19C6--

    >
    >
    > perl -n -i -e 'print unless /^Content-transfer-encoding: base64/ .. /^--0__' file.mbox


    A perfect example of why it is sometimes not a good idea to answer the
    question as asked. A message-part in a mime-encoded message does not
    always start and end with "--0__". The delimiter has to be extracted
    from the Content-Type header. Also base64 encoding is not limited to
    "attachments". Some mail programs (e.g. Lotus Notes) tend to encode even
    normal ASCII text in base64.

    hp
    Peter J. Holzer, Dec 27, 2008
    #3
  4. me at

    me at Guest

    Sat, 27 Dec 2008 12:09:47 +0100 Peter J. Holzer wrote:
    | A perfect example of why it is sometimes not a good idea to answer the
    | question as asked. A message-part in a mime-encoded message does not
    | always start and end with "--0__". The delimiter has to be extracted
    | from the Content-Type header. Also base64 encoding is not limited to
    | "attachments". Some mail programs (e.g. Lotus Notes) tend to encode even
    | normal ASCII text in base64.


    Yup, you're right, I don't know what I am doing, but a lot of work
    and experimenting shows that not all encodings are the same. I had
    to add the closing / to make the search work, and I changed print
    to printf (not sure about that one).

    I have changed my search to

    perl -n -i -e 'printf unless /^[Cc]ontent-[Tt]ransfer-[Ee]ncoding/ .. /^------_=_/' mbox/inbox

    Are the brackets [Cc] OK in Perl? It seems to work. I also left out
    whether it is base64 or something else; I found 7 different encodings.

    I wonder, do they always end with ------_=_ ?
    Not as easy for me to figure out.

    Thanks,

    Vic
    me at, Dec 27, 2008
    #4
  5. me at

    Tim Greer Guest

    me at wrote:

    > perl -n -i -e 'printf unless /^[Cc]ontent-[Tt]ransfer-[Ee]ncoding/ ..
    > /^------_=_/' mbox/inbox
    >
    > Are the brackets [Cc] ok in perl, it seems to work?


    Yes, they serve the exact purpose you might be used to in some other
    languages. That is, [Cc] in a regular expression will match either an
    upper-case or a lower-case 'c'.
    --
    Tim Greer, CEO/Founder/CTO, BurlyHost.com, Inc.
    Shared Hosting, Reseller Hosting, Dedicated & Semi-Dedicated servers
    and Custom Hosting. 24/7 support, 30 day guarantee, secure servers.
    Industry's most experienced staff! -- Web Hosting With Muscle!
    Tim Greer, Dec 27, 2008
    #5
  6. On 2008-12-27, me at wrote:
    > Sat, 27 Dec 2008 12:09:47 +0100 Peter J. Holzer wrote:
    >| A perfect example of why it is sometimes not a good idea to answer the
    >| question as asked. A message-part in a mime-encoded message does not
    >| always start and end with "--0__". The delimiter has to be extracted
    >| from the Content-Type header. Also base64 encoding is not limited to
    >| "attachments". Some mail programs (e.g. Lotus Notes) tend to encode even
    >| normal ASCII text in base64.
    >
    >
    > Yup, you're right, I don't know what I am doing, but a lot of work
    > and experimenting shows not all encoding is the same. I have
    > changed my search, oh, I had to add the last / to make searches
    > work, and print to printf?
    >
    > I have changed my search to
    >
    > perl -n -i -e 'printf unless /^[Cc]ontent-[Tt]ransfer-[Ee]ncoding/ .. /^------_=_/' mbox/inbox
    >
    > Are the brackets [Cc] ok in perl, it seems to work? And I left out
    > if it is base64 or whatever, I found 7 different encodings,


    Yes, but you'd do better to read perlre, where you'll find the /i
    modifier. Or even better, give CPAN a chance: there are modules ready
    to parse MIME.
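
    As a minimal sketch of the difference, the two patterns below are
    tried against a few header spellings. The variable names and sample
    headers are made up for illustration. The bracket version only covers
    the letters that were bracketed, while /i covers every letter.

```perl
use strict;
use warnings;

# Two ways to match the header regardless of case.
my $char_classes = qr/^[Cc]ontent-[Tt]ransfer-[Ee]ncoding: [Bb][Aa][Ss][Ee]64/;
my $insensitive  = qr/^Content-Transfer-Encoding: base64/i;

# Illustrative header lines, not taken from a real mailbox.
my @headers = (
    "Content-Transfer-Encoding: base64",
    "Content-transfer-encoding: base64",
    "CONTENT-TRANSFER-ENCODING: BASE64",
    "Content-Transfer-Encoding: 7bit",
);

for my $h (@headers) {
    # Print 1 or 0 for each pattern so the two can be compared per line.
    printf "%-36s classes=%d  /i=%d\n", $h,
        ($h =~ $char_classes ? 1 : 0),
        ($h =~ $insensitive  ? 1 : 0);
}
```

    The all-caps spelling is matched only by the /i pattern, because the
    bracket pattern still requires "ontent", "ransfer" and "ncoding" in
    lower case.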

    > I wonder do they always end ------_=_ ?
    > Not as easy for me to figure out.


    RFC 1341. Have a nice read.


    --
    Torvalds' goal for Linux is very simple: World Domination
    Stallman's goal for GNU is even simpler: Freedom
    Eric Pozharski, Dec 28, 2008
    #6
  7. me at

    Bart Lateur Guest

    me at wrote:

    >I wonder do they always end ------_=_ ?
    >Not as easy for me to figure out.


    No. You have to extract the boundary from the MIME header and search for
    that, in the specified usage. Or you can cheat.

    For example, in one particular MIME mail in my mailbox I see the header

    Content-Type: multipart/alternative;
        boundary="b1_20eb16834381951dd528290ba6c2fd76"

    The usage of this boundary I see just before the headers of a new
    section as

    --b1_20eb16834381951dd528290ba6c2fd76

    (which as I said, you can use to cheat)

    and after the last section as

    --b1_20eb16834381951dd528290ba6c2fd76--

    The "=" is just a very popular character in delimiters because its usage
    is restricted in base64 encoding, so the risk of clashes with data is
    very low to non-existent, especially in combination with "_".

    See the MIME RFCs (RFC 2045 and RFC 2046) for the details, in particular
    section 5.1 (Multipart Media Type) in RFC 2046.

    http://tools.ietf.org/html/rfc2045
    http://tools.ietf.org/html/rfc2046

    Some extracts:

    The Content-Type field for multipart entities requires one
    parameter, "boundary". The boundary delimiter line is then
    defined as a line consisting entirely of two hyphen characters
    ("-", decimal value 45) followed by the boundary parameter
    value from the Content-Type header field, optional linear
    whitespace, and a terminating CRLF.

    The body must then contain one or more body parts, each preceded
    by a boundary delimiter line, and the last one followed by a
    closing boundary delimiter line.

    The boundary delimiter line following the last body part is a
    distinguished delimiter that indicates that no further body
    parts will follow. Such a delimiter line is identical to the
    previous delimiter lines, with the addition of two more hyphens
    after the boundary parameter value.
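
    The rules in these extracts can be sketched in a few lines of Perl.
    This is only an illustration under simplifying assumptions: the
    helper names extract_boundary and delimiter_kind are invented here,
    the header is assumed to be already unfolded or passed as one string,
    and not every RFC 2045 parameter syntax is handled. A CPAN MIME
    parser remains the more robust choice.

```perl
use strict;
use warnings;

# Hypothetical helper: return the boundary parameter of a Content-Type
# header, or undef if there is none. Handles a quoted or a bare value.
sub extract_boundary {
    my ($header) = @_;
    return $1 if $header =~ /\bboundary\s*=\s*"([^"]+)"/i;
    return $1 if $header =~ /\bboundary\s*=\s*([^\s;"]+)/i;
    return;
}

# Hypothetical helper: classify a body line against a boundary.
# Returns 'close' for the closing delimiter, 'part' for an ordinary
# delimiter, and '' for any other line.
sub delimiter_kind {
    my ($line, $boundary) = @_;
    # Two hyphens, the boundary, optional whitespace, then end of line.
    return 'close' if $line =~ /^--\Q$boundary\E--[ \t]*\r?$/;
    return 'part'  if $line =~ /^--\Q$boundary\E[ \t]*\r?$/;
    return '';
}

my $header   = qq{Content-Type: multipart/alternative;\n}
             . qq{\tboundary="b1_20eb16834381951dd528290ba6c2fd76"};
my $boundary = extract_boundary($header);

print "boundary: $boundary\n";
for my $line ("--$boundary\n", "--$boundary--\n", "ordinary text\n") {
    my $kind = delimiter_kind($line, $boundary) || 'not a delimiter';
    print "$kind: $line";
}
```

    Quoting the boundary with \Q...\E matters because boundary values may
    contain characters such as "=" or "." that would otherwise be read as
    regex syntax.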

    --
    Bart.
    Bart Lateur, Dec 28, 2008
    #7
  8. me at

    me at Guest

    Hi,

    seems to work.

    perl -n -i -e 'printf unless /^[Cc]ontent-[Tt]ransfer-[Ee]ncoding: [Bb][Aa][Ss][Ee]64/ .. /--$/' filename

    Thanks for all of the tips.

    --
    Vic
    me at, Dec 28, 2008
    #8
  9. me at

    Bart Lateur Guest

    me at wrote:

    >
    >seems to work.
    >
    > perl -n -i -e 'printf unless /^[Cc]ontent-[Tt]ransfer-[Ee]ncoding: [Bb][Aa][Ss][Ee]64/ .. /--$/' filename
    >
    >Thanks for all of the tips.
    >


    You can just make the first regex case insensitive, and for the latter,
    I think maybe you'd better sync on leading hyphens instead of trailing
    hyphens, because now you're throwing away *everything* starting from the
    base64 encoded attachment:

    Oh, and you should use print, not printf, or you'll get *big* trouble if
    the line contains percent signs.
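
    A small sketch of the problem: a bare printf treats the line itself
    as the format string, so any "%" sequence in the mail is interpreted
    rather than copied. The helper as_printf is a made-up name, and it
    uses sprintf in place of printf so that the result can be compared
    with the input.

```perl
use strict;
use warnings;

# Hypothetical helper: format a line the way a bare `printf` would,
# that is, with the line itself as the format and no arguments.
sub as_printf {
    my ($line) = @_;
    no warnings;    # the accidental format triggers warnings we do not need here
    return sprintf($line);
}

# Illustrative lines, not taken from a real mailbox.
my @lines = (
    "plain text survives\n",
    "save 50%% today\n",       # "%%" collapses to a single "%"
    "disk is 100%s full\n",    # "%s" is consumed as a conversion
);

for my $line (@lines) {
    my $out    = as_printf($line);
    my $status = $out eq $line ? 'intact ' : 'mangled';
    print "$status: $out";
}
```

    Only the first line comes back unchanged. With print, every line is
    copied as-is.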

    perl -n -i -e 'print unless /^Content-Transfer-Encoding: base64/i ..
    /^--/' filename
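
    To see what the two end patterns do, here is a sketch that applies
    the same range operator to a small in-memory message instead of a
    file. The sub name strip_base64, the boundary "b1_example" and the
    sample lines are all invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper mimicking `print unless /start/i .. /end/`:
# drop every line from the base64 header through the first later line
# that matches $end_re. Note the range operator keeps its state between
# calls, so each input here is built to end the range before it finishes.
sub strip_base64 {
    my ($end_re, @lines) = @_;
    my @kept;
    for (@lines) {
        push @kept, $_
            unless /^Content-Transfer-Encoding: base64/i .. /$end_re/;
    }
    return @kept;
}

# A made-up three-part message: text, a base64 attachment, more text.
my @message = map { "$_\n" } (
    '--b1_example',
    'Content-Type: text/plain',
    '',
    'See the attached file.',
    '--b1_example',
    'Content-Type: application/octet-stream',
    'Content-transfer-encoding: base64',
    '',
    'V5cJPPkIjFDdeEabQbd6WgICTxiiz0f5',
    '--b1_example',
    'Content-Type: text/plain',
    '',
    'A second text part.',
    '--b1_example--',
);

my @leading  = strip_base64(qr/^--/, @message);   # stop at the next delimiter
my @trailing = strip_base64(qr/--$/, @message);   # stop only at a closing delimiter

printf "leading hyphens keep %d of %d lines\n",  scalar @leading,  scalar @message;
printf "trailing hyphens keep %d of %d lines\n", scalar @trailing, scalar @message;
```

    With /--$/ the second text part is thrown away along with the
    attachment, since only the closing delimiter ends in two hyphens.
    With /^--/ it survives, although the range still swallows the
    delimiter line that ends it, so the part structure of the message is
    not preserved exactly.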

    --
    Bart.
    Bart Lateur, Dec 28, 2008
    #9
  10. me at

    me at Guest

    Sun, 28 Dec 2008 23:29:34 +0100 Bart Lateur wrote:
    | You can just make the first regex case insensitive, and for the latter,
    | I think maybe you'd better sync on leading hyphens instead of trailing
    | hyphens, because now you're throwing away *everything* starting from the
    | base64 encoded attachment:
    |
    | Oh, and you should use print, not printf, or you'll get *big* trouble if
    | the line contains percent signs.
    |
    | perl -n -i -e 'print unless /^Content-Transfer-Encoding: base64/i ..
    | /^--/' filename


    Done,
    Thanks,
    me at, Dec 29, 2008
    #10