Removing base64 from mbox formatted file.

me at

Hi,

I have a text based email application I use on my personal ISP
shell account. The file size of the mbox grows rather large
because of encoded attachments. I have been editing it by
searching for [Bb][Aa][Ss][Ee]64 in vi/vim, then marking and
deleting. It is very time-consuming and error-prone.

I wonder if you might have a trick for the following example that
would delete all lines between the
base64 and the
--0__

Maybe a little perl script I could run against the file.

Thanks,

Vic


Content-transfer-encoding: base64

V5cJPPkIjFDdeEabQbd6WgICTxiiz0f5dBKquXF6k4senwEhYGnKEFJeGrxUZy8dB8gmAXI/sPvH
ESfCwVt5hTgYiqQqtdRNHQIU1PJ33ZqmzgE90OwLaoJcnMop1WiMmgkPHQRIrwgFuNV90A3doNKT
mrKIN07AnGcI9BQjhCBN4RfA1qIZnMqorJCogKfGQnxSCDilTVIA0yl5ciTovgLuBDKFUDE9aQcw
9SA+rjSNf9/M1gxrj6VwDTS0IUSElMzBfsj0NFXR2kwsV1A5IF1grLgLL/r1R40BZEnuBWgm
9SA+QEyb

--0__=07BBFF96DFCC19C68f9e8a93df938690918c07BBFF96DFCC19C6--
 
Tad J McClellan

me at said:
| delete all lines between the
| base64 and the
| --0__
|
| Content-transfer-encoding: base64
|
| V5cJPPkIjFDdeEabQbd6WgICTxiiz0f5dBKquXF6k4senwEhYGnKEFJeGrxUZy8dB8gmAXI/sPvH
| ESfCwVt5hTgYiqQqtdRNHQIU1PJ33ZqmzgE90OwLaoJcnMop1WiMmgkPHQRIrwgFuNV90A3doNKT
| mrKIN07AnGcI9BQjhCBN4RfA1qIZnMqorJCogKfGQnxSCDilTVIA0yl5ciTovgLuBDKFUDE9aQcw
| 9SA+rjSNf9/M1gxrj6VwDTS0IUSElMzBfsj0NFXR2kwsV1A5IF1grLgLL/r1R40BZEnuBWgm
| 9SA+QEyb
|
| --0__=07BBFF96DFCC19C68f9e8a93df938690918c07BBFF96DFCC19C6--


perl -n -i -e 'print unless /^Content-transfer-encoding: base64/ .. /^--0__' file.mbox
 
Peter J. Holzer

| perl -n -i -e 'print unless /^Content-transfer-encoding: base64/ .. /^--0__' file.mbox

A perfect example of why it is sometimes not a good idea to answer the
question as asked. A message part in a MIME-encoded message does not
always start and end with "--0__". The delimiter has to be extracted
from the Content-Type header. Also, base64 encoding is not limited to
"attachments". Some mail programs (e.g. Lotus Notes) tend to encode even
normal ASCII text in base64.

hp
 
me at

| A perfect example of why it is sometimes not a good idea to answer the
| question as asked. A message part in a MIME-encoded message does not
| always start and end with "--0__". The delimiter has to be extracted
| from the Content-Type header. Also, base64 encoding is not limited to
| "attachments". Some mail programs (e.g. Lotus Notes) tend to encode even
| normal ASCII text in base64.


Yup, you're right, I don't know what I am doing, but a lot of work
and experimenting shows that not all encoding is the same. I have
changed my search. Oh, and I had to add the last / to make the search
work, and changed print to printf?

I have changed my search to

perl -n -i -e 'printf unless /^[Cc]ontent-[Tt]ransfer-[Ee]ncoding/ .. /^------_=_/' mbox/inbox

Are the brackets [Cc] OK in Perl? It seems to work. I also left out
whether it is base64 or whatever, since I found 7 different encodings.

I wonder, do they always end with ------_=_ ?
That is not as easy for me to figure out.

Thanks,

Vic
 
Tim Greer

me said:
| perl -n -i -e 'printf unless /^[Cc]ontent-[Tt]ransfer-[Ee]ncoding/ .. /^------_=_/' mbox/inbox
|
| Are the brackets [Cc] OK in Perl? It seems to work.

Yes, they serve the exact purpose you might be used to in some other
languages. I.e., [Cc] in a regular expression will match either an
uppercase or a lowercase 'c'.
 
Eric Pozharski

| | A perfect example of why it is sometimes not a good idea to answer the
| | question as asked. A message part in a MIME-encoded message does not
| | always start and end with "--0__". The delimiter has to be extracted
| | from the Content-Type header. Also, base64 encoding is not limited to
| | "attachments". Some mail programs (e.g. Lotus Notes) tend to encode even
| | normal ASCII text in base64.
|
| Yup, you're right, I don't know what I am doing, but a lot of work
| and experimenting shows that not all encoding is the same. I have
| changed my search. Oh, and I had to add the last / to make the search
| work, and changed print to printf?
|
| I have changed my search to
|
| perl -n -i -e 'printf unless /^[Cc]ontent-[Tt]ransfer-[Ee]ncoding/ .. /^------_=_/' mbox/inbox
|
| Are the brackets [Cc] OK in Perl? It seems to work. I also left out
| whether it is base64 or whatever, since I found 7 different encodings.

Yes, but you'd do better to read perlre, where you'll find the /i
modifier. Or even better, give CPAN a chance: there are modules ready
to parse MIME.

| I wonder, do they always end with ------_=_ ?
| That is not as easy for me to figure out.

See RFC 1341. Happy reading.
 
Bart Lateur

me said:
| I wonder, do they always end with ------_=_ ?
| That is not as easy for me to figure out.

No. You have to extract the boundary from the MIME header and search for
that, in the form the RFC specifies. Or you can cheat.

For example, in one particular MIME mail in my mailbox I see the header

Content-Type: multipart/alternative;
    boundary="b1_20eb16834381951dd528290ba6c2fd76"

I see this boundary used just before the headers of each new
section as

--b1_20eb16834381951dd528290ba6c2fd76

(which as I said, you can use to cheat)

and after the last section as

--b1_20eb16834381951dd528290ba6c2fd76--

The "=" is just a very popular character in delimiters because its usage
is restricted in base64 encoding, so the risk of clashes with data is
very low to non-existent, especially in combination with "_".

See the MIME RFCs (RFC 2045 and RFC 2046) for the details, in particular,
section 5.1 (Multipart Media Type) in RFC 2046.

http://tools.ietf.org/html/rfc2045
http://tools.ietf.org/html/rfc2046

Some extracts:

The Content-Type field for multipart entities requires one
parameter, "boundary". The boundary delimiter line is then
defined as a line consisting entirely of two hyphen characters
("-", decimal value 45) followed by the boundary parameter
value from the Content-Type header field, optional linear
whitespace, and a terminating CRLF.

The body must then contain one or more body parts, each preceded
by a boundary delimiter line, and the last one followed by a
closing boundary delimiter line.

The boundary delimiter line following the last body part is a
distinguished delimiter that indicates that no further body
parts will follow. Such a delimiter line is identical to the
previous delimiter lines, with the addition of two more hyphens
after the boundary parameter value.
 
me at

Hi,

seems to work.

perl -n -i -e 'printf unless /^[Cc]ontent-[Tt]ransfer-[Ee]ncoding: [Bb][Aa][Ss][Ee]64/ .. /--$/' filename

Thanks for all of the tips.
 
Bart Lateur

me said:
| seems to work.
|
| perl -n -i -e 'printf unless /^[Cc]ontent-[Tt]ransfer-[Ee]ncoding: [Bb][Aa][Ss][Ee]64/ .. /--$/' filename
|
| Thanks for all of the tips.

You can just make the first regex case insensitive, and for the latter,
I think maybe you'd better sync on leading hyphens instead of trailing
hyphens, because now you're throwing away *everything* starting from the
base64 encoded attachment:

Oh, and you should use print, not printf, or you'll get *big* trouble if
the line contains percent signs.

perl -n -i -e 'print unless /^Content-Transfer-Encoding: base64/i .. /^--/' filename
 
me at

| You can just make the first regex case insensitive, and for the latter,
| I think maybe you'd better sync on leading hyphens instead of trailing
| hyphens, because now you're throwing away *everything* starting from the
| base64 encoded attachment:
|
| Oh, and you should use print, not printf, or you'll get *big* trouble if
| the line contains percent signs.
|
| perl -n -i -e 'print unless /^Content-Transfer-Encoding: base64/i ..
| /^--/' filename


Done,
Thanks,
 
