Good email parser?

Discussion in 'Perl Misc' started by Jack, Feb 7, 2009.

  1. Jack

    Jack Guest

    Hi, I haven't had any luck with the CPAN email modules. I just want to
    parse multipart, MIME, and base64 messages, but with all the varieties
    of email files out there, these modules just don't work. Does anyone
    know of a free or low-cost command-line email client or parser that
    can do the job?

    Thank you,

    Jack
     
    Jack, Feb 7, 2009
    #1

  2. Jack

    rabbits77 Guest

    Jack wrote:
    > Hi I havent had any luck with the CPAN email modules, I just want to
    > parse multipart and mime and base64, with all the varieties of email
    > files out there, these modules just dont work... does anyone know a
    > free or low cost command line driven email client or parser that can
    > do the job.
    >
    > Thank you,

    I have done some work parsing email in the (fairly distant) past.
    Email really isn't that varied!
    In order for email to work at all, in fact, it needs to be pretty
    predictable!
    I bet that you could do this yourself.
    Where are your sticking points?
    If I understand your question, do you just want to remove all
    email attachments?
     
    rabbits77, Feb 8, 2009
    #2

  3. On 2009-02-07 23:59, Jack <> wrote:
    > Hi I havent had any luck with the CPAN email modules, I just want to
    > parse multipart and mime and base64, with all the varieties of email
    > files out there, these modules just dont work...


    MIME::Parser works for me. It is a bit slow and tends to use ridiculous
    amounts of memory if you want to avoid temporary files, but I have yet
    to find a (syntactically correct) email which it can't parse.

    hp
     
    Peter J. Holzer, Feb 8, 2009
    #3
  4. Jack

    Jack Guest

    On Feb 8, 12:08 pm, "Peter J. Holzer" <> wrote:
    > On 2009-02-07 23:59, Jack <> wrote:
    >
    > > Hi I havent had any luck with the CPAN email modules, I just want to
    > > parse multipart and mime and base64, with all the varieties of email
    > > files out there, these modules just dont work...

    >
    > MIME::Parser works for me. It is a bit slow and tends to use ridiculous
    > amounts of memory if you want to avoid temporary files, but I have yet
    > to find a (syntactically correct) email which it can't parse.
    >
    >         hp


    Thanks for the post, Peter. Can you provide some guidance, then? I
    tried the code below and expected the skeleton to report the base64
    image attachments in a MIME message, but it isn't picking them up. I
    need to be able to deal with a text body, a base64 body, and image
    attachments, and want to parse them out correctly. I can do the base64
    decoding myself. How do I accomplish this with MIME::Parser?

    Code:
    use MIME::Parser;

    if (@ARGV[0] eq undef) {
        $filename1="no dest filename" ;
    } else {
        $filename1=@ARGV[0];
    }

    ### Create a new parser object:
    my $parser = new MIME::Parser;

    ### Tell it where to put things:
    $parser->output_under("e:\\tmp");

    ### Parse an input filehandle:
    $entity = $parser->parse($filename1);

    ### Congratulations: you now have a (possibly multipart) MIME entity!
    $entity->dump_skeleton;

    #### HERE'S THE OUTPUT
    Content-type: text/plain
    Effective-type: text/plain
    Content-encoding: 7bit
    Body-location: (IN CORE)
    Body-size: 0
    --

    ####
    It appears not to be picking this up from the email itself:
    Content-Type: image/jpeg; name="cardamage1.jpg"
    Content-Disposition: attachment; filename="cardamage1.jpg"
    Content-Transfer-Encoding: base64
    X-Attachment-Id: f_fqzhlhly0


    ###
    Also, I tried to build my own parser based on the "boundary" definition,
    but as you can see from the example below, it's not clear why I have
    more than one boundary!

    Date: Sun, 24 Aug 2008 06:46:48 -0700
    From: "Ben Brewster" <>
    To:
    Subject: car for sale two images
    MIME-Version: 1.0
    Content-Type: multipart/mixed;
        boundary="----=_Part_13503_152406.1219585608169"

    ------=_Part_13503_152406.1219585608169
    Content-Type: multipart/alternative;
        boundary="----=_Part_13504_19292996.1219585608169"

    ------=_Part_13504_19292996.1219585608169
    Content-Type: text/plain; charset=ISO-8859-1
    Content-Transfer-Encoding: 7bit
    Content-Disposition: inline

    Hi


    ------=_Part_13504_19292996.1219585608169
    Content-Type: text/html; charset=ISO-8859-1
    Content-Transfer-Encoding: 7bit
    Content-Disposition: inline

    <div dir="ltr"></div>

    ------=_Part_13504_19292996.1219585608169--

    ------=_Part_13503_152406.1219585608169
    Content-Type: image/jpeg; name=masertione.jpg
    Content-Transfer-Encoding: base64
    X-Attachment-Id: f_fk9pr8s20
    Content-Disposition: attachment; filename=masertione.jpg
     
    Jack, Feb 9, 2009
    #4
  5. Jack wrote:
    > On Feb 8, 12:08 pm, "Peter J. Holzer" <> wrote:
    >> On 2009-02-07 23:59, Jack <> wrote:
    >>
    >>> Hi I havent had any luck with the CPAN email modules, I just want to
    >>> parse multipart and mime and base64, with all the varieties of email
    >>> files out there, these modules just dont work...

    >> MIME::Parser works for me. It is a bit slow and tends to use ridiculous
    >> amounts of memory if you want to avoid temporary files, but I have yet
    >> to find a (syntactically correct) email which it can't parse.

    >
    > Thanks Peter for the posting.. can you provide some guidance then.. I
    > tried the below code and figured the skeleton would report the base64
    > image attachments in a MIME message, but isnt picking it up. I need
    > to be able to deal with text body, base64 body, and image attachments,
    > and want to parse them out correctly. I can do the base64 decoding,
    > etc. - how do I accomplish this with MIME::Parser ??
    >
    > Code:


    use warnings;
    use strict;

    > use MIME::Parser;
    >
    > if (@ARGV[0] eq undef) {


    You cannot use undef in a comparison. Perl will just convert it
    internally to a numeric, or in this case, a string representation of
    "false", 0 or '' respectively. You shouldn't use a list in scalar
    context. If you had warnings enabled then perl would have warned about
    this.

    if ( not defined $ARGV[ 0 ] ) {

    > $filename1="no dest filename" ;
    > } else {
    > $filename1=@ARGV[0];


    $filename1 = $ARGV[ 0 ];

    > }


    Or if you have Perl version 5.10 installed you could write that as:

    my $filename1 = $ARGV[ 0 ] // 'no dest filename';

    For older perls that would be:

    my $filename1 = defined $ARGV[ 0 ] ? $ARGV[ 0 ] : 'no dest filename';



    John
    --
    Those people who think they know everything are a great
    annoyance to those of us who do. -- Isaac Asimov
     
    John W. Krahn, Feb 10, 2009
    #5
  6. Jack

    Uri Guttman Guest

    >>>>> "JWK" == John W Krahn <> writes:

    >> use MIME::Parser;
    >> if (@ARGV[0] eq undef) {


    JWK> You cannot use undef in a comparison. Perl will just convert it
    JWK> internally to a numeric, or in this case, a string representation of
    JWK> "false", 0 or '' respectively. You shouldn't use a list in scalar
    JWK> context. If you had warnings enabled then perl would have warned
    JWK> about this.

    couple of nits to pick. undef is coerced to '' with eq since it is
    string context. and @ARGV[0] is a slice but it will return a single
    value here. sure it is incorrect but it will work.

    JWK> if ( not defined $ARGV[ 0 ] ) {

    >> $filename1="no dest filename" ;
    >> } else {
    >> $filename1=@ARGV[0];


    JWK> $filename1 = $ARGV[ 0 ];

    >> }


    JWK> Or if you have Perl version 5.10 installed you could write that as:

    JWK> my $filename1 = $ARGV[ 0 ] // 'no dest filename';

    JWK> For older perls that would be:

    JWK> my $filename1 = defined $ARGV[ 0 ] ? $ARGV[ 0 ] : 'no dest filename';

    you should know better. the best way to check for elements in an array
    is checking its count. since he wants only one arg this should do fine:

    @ARGV or die "missing file name argument" ;
    my $filename = shift ;

    and to the OP, you can never have an undef in @ARGV unless you put it
    there yourself. @ARGV is passed in from the exec call (the shell does
    this for command line programs) and shell doesn't know about undef.
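Both points can be sketched with core Perl only. The helper names `input_filename` and `undef_equals_empty_string` are hypothetical and not from the thread; the first takes the argument list explicitly so the check can be exercised without a real command line.

```perl
use strict;
use warnings;

# Hypothetical helper: return the first argument, or die when none is given.
# Checking the list itself is enough, because the shell never passes undef.
sub input_filename {
    my @args = @_;
    @args or die "missing file name argument\n";
    return $args[0];
}

# With "eq", undef is coerced to the empty string, so comparing against
# undef is really comparing against ''. Warnings would flag the use of an
# uninitialized value, so they are silenced for this one check only.
sub undef_equals_empty_string {
    my $missing;    # deliberately left undef
    no warnings 'uninitialized';
    return $missing eq '' ? 1 : 0;
}

print "file: ", input_filename('message.eml'), "\n";
print "undef eq '' is ",
    (undef_equals_empty_string() ? "true" : "false"), "\n";
```

In a real script the call would be `input_filename(@ARGV)`, which dies with a readable message when no file name was supplied.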

    uri

    --
    Uri Guttman ------ -------- http://www.sysarch.com --
    ----- Perl Code Review , Architecture, Development, Training, Support ------
    --------- Free Perl Training --- http://perlhunter.com/college.html ---------
    --------- Gourmet Hot Cocoa Mix ---- http://bestfriendscocoa.com ---------
     
    Uri Guttman, Feb 10, 2009
    #6
  7. Jack

    Hans Mulder Guest

    Jack wrote:
    > On Feb 8, 12:08 pm, "Peter J. Holzer" <> wrote:
    >> On 2009-02-07 23:59, Jack <> wrote:
    >>
    >>> Hi I havent had any luck with the CPAN email modules, I just want to
    >>> parse multipart and mime and base64, with all the varieties of email
    >>> files out there, these modules just dont work...

    >> MIME::Parser works for me. It is a bit slow and tends to use ridiculous
    >> amounts of memory if you want to avoid temporary files, but I have yet
    >> to find a (syntactically correct) email which it can't parse.
    >>
    >> hp

    >
    > Thanks Peter for the posting.. can you provide some guidance then.. I
    > tried the below code and figured the skeleton would report the base64
    > image attachments in a MIME message, but isnt picking it up.


    The parse() method takes a file handle argument. So you'll have to
    open the file yourself and pass the resulting handle to parse():

    use warnings;
    use strict;

    use MIME::Parser;

    my $dir = "e:\\tmp";

    if (not -d $dir) {
        mkdir $dir or die "Can't create directory $dir: $!\n";
    }

    my $filename1 = $ARGV[0] || "no input filename";

    ### Create a new parser object:
    my $parser = MIME::Parser->new;

    ### Tell it where to put things:
    $parser->output_under($dir);

    ### Open the file:
    open my $fh, '<', $filename1 or die "Can't read $filename1: $!\n";

    ### Parse an input filehandle:
    my $entity = $parser->parse($fh);

    ### Congratulations: you now have a (possibly multipart) MIME entity!
    $entity->dump_skeleton;
    __END__

    This prints:

    Content-type: multipart/mixed
    Effective-type: multipart/mixed
    Body-file: NONE
    Subject: car for sale two images
    Num-parts: 2
    --
        Content-type: multipart/alternative
        Effective-type: multipart/alternative
        Body-file: NONE
        Num-parts: 2
        --
            Content-type: text/plain
            Effective-type: text/plain
            Body-file: e:\tmp/msg-1234304022-16083-0/msg-16083-1.txt
            --
            Content-type: text/html
            Effective-type: text/html
            Body-file: e:\tmp/msg-1234304022-16083-0/msg-16083-2.html
            --
        Content-type: image/jpeg
        Effective-type: image/jpeg
        Body-file: e:\tmp/msg-1234304022-16083-0/masertione.jpg
        Recommended-filename: masertione.jpg
        --

    > I need
    > to be able to deal with text body, base64 body, and image attachments,
    > and want to parse them out correctly. I can do the base64 decoding,
    > etc. -


    MIME::Parser will do the base64 decoding for you.
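A minimal sketch of what that looks like in practice, assuming MIME::Parser (from the MIME-tools distribution) is installed. The sample message and the helper name `leaf_parts` are made up for illustration; the parser is told to keep bodies in memory so the example needs no output directory.

```perl
use strict;
use warnings;
use MIME::Parser;

# Hypothetical sample: one text part plus one base64-encoded attachment.
my $raw = join "\n",
    'MIME-Version: 1.0',
    'Content-Type: multipart/mixed; boundary="outer"',
    '',
    '--outer',
    'Content-Type: text/plain; charset=ISO-8859-1',
    '',
    'Hi',
    '--outer',
    'Content-Type: application/octet-stream; name="hello.bin"',
    'Content-Transfer-Encoding: base64',
    'Content-Disposition: attachment; filename="hello.bin"',
    '',
    'aGVsbG8=',
    '--outer--',
    '';

my $parser = MIME::Parser->new;
$parser->output_to_core(1);    # keep decoded bodies in memory, no temp files
my $entity = $parser->parse_data($raw);

# Walk the entity tree and collect [mime type, decoded body] for each leaf.
sub leaf_parts {
    my ($ent) = @_;
    my @children = $ent->parts;
    if (@children) {
        my @found;
        push @found, leaf_parts($_) for @children;
        return @found;
    }
    my $body = $ent->bodyhandle;
    return [ $ent->mime_type, defined $body ? $body->as_string : '' ];
}

for my $part (leaf_parts($entity)) {
    printf "%s: %d decoded bytes\n", $part->[0], length $part->[1];
}
```

The body handle already returns the decoded content, so no separate MIME::Base64 call is needed for the attachment.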

    > how do I accomplish this with MIME::Parser ??


    Read the documentation carefully:

    parse INSTREAM
        Instance method. Takes a MIME-stream and splits it into its
        component entities.

        The INSTREAM can be given as a readable FileHandle, an IO::File, a
        globref filehandle (like "\*STDIN"), or as any blessed object
        conforming to the IO:: interface (which minimally implements
        getline() and read()).

    It does not mention the possibility of passing a filename and having
    parse() open it on your behalf. This suggests that this feature does
    not exist in this version of MIME::Parser.
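For completeness, MIME::Parser also documents a separate `parse_open` method that accepts a file name and opens it itself, so the explicit `open` is only needed with `parse()`. The sketch below assumes MIME-tools is installed; the helper name `parse_message_file` and the throwaway message are hypothetical.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir tempfile);
use MIME::Parser;

# Hypothetical helper: parse the message stored at $path and file any
# decoded bodies under $out_dir. parse_open() opens the file itself.
sub parse_message_file {
    my ($path, $out_dir) = @_;
    my $parser = MIME::Parser->new;
    $parser->output_under($out_dir);
    return $parser->parse_open($path);
}

# Usage with a throwaway message so the sketch runs anywhere.
my $out_dir = tempdir(CLEANUP => 1);
my ($fh, $path) = tempfile(DIR => $out_dir, SUFFIX => '.eml');
print {$fh} "Subject: parse_open test\n\nHello\n";
close $fh or die "Can't close $path: $!\n";

my $entity = parse_message_file($path, $out_dir);
$entity->dump_skeleton;
```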

    Hope this helps,

    -- HansM
     
    Hans Mulder, Feb 10, 2009
    #7
  8. Jack

    Jack Guest

    On Feb 10, 2:38 pm, Hans Mulder <> wrote:
    > Jack wrote:
    > > On Feb 8, 12:08 pm, "Peter J. Holzer" <> wrote:
    > >> On 2009-02-07 23:59, Jack <> wrote:

    >
    > >>> Hi I havent had any luck with the CPAN email modules, I just want to
    > >>> parse multipart and mime and base64, with all the varieties of email
    > >>> files out there, these modules just dont work...
    > >> MIME::Parser works for me. It is a bit slow and tends to use ridiculous
    > >> amounts of memory if you want to avoid temporary files, but I have yet
    > >> to find a (syntactically correct) email which it can't parse.

    >
    > >>         hp

    >
    > > Thanks Peter for the posting.. can you provide some guidance then.. I
    > > tried the below code and figured the skeleton would report the base64
    > > image attachments in a MIME message, but isnt picking it up.

    >
    > The parse() method takes a file handle argument.  So you'll have to
    > open the file yourself and pass the resulting handle to parse():
    >
    > use warnings;
    > use strict;
    >
    > use MIME::Parser;
    >
    > my $dir = "e:\\tmp";
    >
    > if (not -d $dir) {
    >      mkdir $dir or die "Can't create directory $dir: $!\n";
    >
    > }
    >
    > my $filename1 = $ARGV[0] || "no input filename";
    >
    > ### Create a new parser object:
    > my $parser = new MIME::Parser;
    >
    > ### Tell it where to put things:
    > $parser->output_under($dir);
    >
    > ### Open the file:
    > open my $fh, '<', $filename1 or die "Can't read $filename1: $!\n";
    >
    > ### Parse an input filehandle:
    > my $entity = $parser->parse($fh);
    >
    > ### Congratulations: you now have a (possibly multipart) MIME entity!
    > $entity->dump_skeleton;
    > __END__
    >
    > This prints:
    >
    > Content-type: multipart/mixed
    > Effective-type: multipart/mixed
    > Body-file: NONE
    > Subject: car for sale two images
    > Num-parts: 2
    > --
    >      Content-type: multipart/alternative
    >      Effective-type: multipart/alternative
    >      Body-file: NONE
    >      Num-parts: 2
    >      --
    >          Content-type: text/plain
    >          Effective-type: text/plain
    >          Body-file: e:\tmp/msg-1234304022-16083-0/msg-16083-1.txt
    >          --
    >          Content-type: text/html
    >          Effective-type: text/html
    >          Body-file: e:\tmp/msg-1234304022-16083-0/msg-16083-2.html
    >          --
    >      Content-type: image/jpeg
    >      Effective-type: image/jpeg
    >      Body-file: e:\tmp/msg-1234304022-16083-0/masertione.jpg
    >      Recommended-filename: masertione.jpg
    >      --
    >
    > > I need
    > > to be able to deal with text body, base64 body, and image attachments,
    > > and want to parse them out correctly.  I can do the base64 decoding,
    > > etc. -

    >
    > MIME::Parser will do the base64 decoding for you.
    >
    > > how do I accomplish this with MIME::Parser ??

    >
    > Read the documentation carefully:
    >
    > parse INSTREAM
    >     Instance method.  Takes a MIME-stream and splits it into its
    >     component entities.
    >
    >     The INSTREAM can be given as a readable FileHandle, an IO::File, a
    >     globref filehandle (like "\*STDIN"), or as any blessed object
    >     conforming to the IO:: interface (which minimally implements
    >     getline() and read()).
    >
    > It does not mention the possibility of passing a filename and parse()
    > opening it on your behalf.  This suggests that this feature does not
    > exist in this version of MIME::Parser.
    >
    > Hope this helps,
    >
    > -- HansM


    Thanks, Hans. Can you tell me whether MIME::Parser will handle plain
    RFC (non-MIME) emails?
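One way to check is to feed the parser a message with no MIME headers at all. The sketch below assumes MIME-tools is installed and uses a made-up message; the expectation, worth verifying against your installed version, is that such a message comes back as a single text/plain entity rather than a parse failure.

```perl
use strict;
use warnings;
use MIME::Parser;

# Hypothetical plain message: no MIME-Version and no Content-Type header.
my $raw = join "\n",
    'From: sender@example.com',
    'To: recipient@example.com',
    'Subject: plain message',
    '',
    'Just plain text.',
    '';

my $parser = MIME::Parser->new;
$parser->output_to_core(1);    # keep the body in memory, no temp files
my $entity = $parser->parse_data($raw);

printf "type: %s, multipart: %s\n",
    $entity->mime_type, ($entity->is_multipart ? 'yes' : 'no');
print $entity->bodyhandle->as_string;
```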
     
    Jack, Feb 21, 2009
    #8
  9. Jack

    Jack Guest


    Also, how does one capture the name of the directory it creates on the
    fly into a variable?
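The thread does not answer this. One approach that relies only on the entity itself is to take the directory part of any body file the parser wrote. This is a sketch assuming MIME-tools is installed; `first_body_path` and `message_dir` are hypothetical helper names, and the message is a throwaway example.

```perl
use strict;
use warnings;
use File::Basename qw(dirname);
use File::Temp qw(tempdir);
use MIME::Parser;

# Hypothetical helper: return the path of the first body stored on disk,
# searching nested parts depth-first. Returns undef if nothing is on disk.
sub first_body_path {
    my ($ent) = @_;
    my $body = $ent->bodyhandle;
    return $body->path if $body && defined $body->path;
    for my $part ($ent->parts) {
        my $path = first_body_path($part);
        return $path if defined $path;
    }
    return undef;
}

# Hypothetical helper: the per-message directory is where that body lives.
sub message_dir {
    my ($ent) = @_;
    my $path = first_body_path($ent);
    return defined $path ? dirname($path) : undef;
}

my $base   = tempdir(CLEANUP => 1);
my $parser = MIME::Parser->new;
$parser->output_under($base);
my $entity = $parser->parse_data("Subject: dir test\n\nHello\n");

my $dir = message_dir($entity);
print "bodies were written under: ",
    (defined $dir ? $dir : '(nothing on disk)'), "\n";
```

If you would rather pick the directory yourself than discover it afterwards, `output_dir` files everything directly into a directory you name instead of a generated subdirectory.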
     
    Jack, Feb 21, 2009
    #9