Help with a regexp please

Discussion in 'Perl Misc' started by Nigel Scott, Jan 29, 2004.

  1. Nigel Scott

    Nigel Scott Guest

    Hi

    I am writing a piece of perl for processing emails and part of the
    process involves finding the boundaries of multiple MIME parts.

    I am trying to extract the boundary from the headers using a pattern
    like this:

    my $pattern = ".*boundary *= *[\'\"]*(.*)[\'\"]*.*";

    This is to cover the cases where the boundary itself may be contained in
    double quotes, single quotes or no quotes at all. For some reason
    though, if the boundary is contained in double quotes, e.g.

    Content-Type: multipart/mixed;
    boundary="----=_Part_174034_7372797.1070374686532"

    and I use:
    my $boundary =~ s/$pattern/$1/is;

    $boundary becomes ----=_Part_174034_7372797.1070374686532'
    with an extra single quote on the end.

    I have tried looking at various perl and regexp tutorials, but I can't
    work out what is wrong with my pattern.

    Any help appreciated,
    Nige.
    Nigel Scott, Jan 29, 2004
    #1

  2. Nigel Scott writes:

    > I am writing a piece of perl for processing emails and part of the
    > process involves finding the boundaries of multiple MIME parts.


    There are modules to do that, you know.

    > my $pattern = ".*boundary *= *[\'\"]*(.*)[\'\"]*.*";

    ^^^^^^^^^^^

    Firstly, it's easier to see what's what if you quote regexes using qr//
    not qq().

    my $pattern = qr/.*boundary *= *['"]*(.*)['"]*.*/;

    Now let's focus on just one bit of that

    /['"]*(.*)['"]*/

    If you have two greedy subexpressions in a regex the first one gets
    first bite and the character class ['"] is a subset of the character
    class . so the above is equivalent to:

    /['"]*(.*)/
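A minimal sketch of the effect described above, using the header from the original post. The variable names are illustrative only. The greedy (.*) runs to the end of the line and swallows the closing quote, so the trailing ['"]* is left matching nothing.

```perl
use strict;
use warnings;

# Sample header taken from the original post (boundary in double quotes).
my $header = qq{Content-Type: multipart/mixed;\n}
           . qq{ boundary="----=_Part_174034_7372797.1070374686532"};

# The capture group is greedy: it consumes everything up to the end of
# the line, including the closing quote. The trailing ['"]* then matches
# the empty string, so it contributes nothing.
my ($greedy) = $header =~ /boundary *= *['"]*(.*)['"]*/i;

# Dropping the trailing class gives exactly the same capture.
my ($shorter) = $header =~ /boundary *= *['"]*(.*)/i;

print "with trailing class:    $greedy\n";
print "without trailing class: $shorter\n";
```

Both lines print the boundary with the closing quote still attached, which is the symptom reported in the question.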

    Perhaps you meant

    /(['"]?)(.*)\1/

    For real examples of parsing MIME headers see the source code of the
    modules you should be using anyhow.
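As a rough sketch of how the backreference version behaves, the snippet below wraps it in a hypothetical helper, extract_boundary (not part of any module). Note that the boundary ends up in $2, because $1 now holds the optional opening quote.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the boundary value, or undef if none is found.
sub extract_boundary {
    my ($header) = @_;
    # $1 is the optional opening quote, $2 is the boundary itself.
    # \1 forces the closing quote to be the same character as the
    # opening one (or nothing at all if the value was unquoted).
    return $header =~ /boundary *= *(['"]?)(.*)\1/i ? $2 : undef;
}

for my $h ('boundary="abc"', "boundary='abc'", 'boundary=abc') {
    printf "%-16s => %s\n", $h, extract_boundary($h);
}
```

All three inputs yield abc. Because (.*) is still greedy, this sketch assumes the boundary parameter is the last thing on its line; an unquoted value followed by further parameters would be captured along with them.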

    --
    \\ ( )
    . _\\__[oo
    .__/ \\ /\@
    . l___\\
    # ll l\\
    ###LL LL\\
    Brian McCauley, Jan 29, 2004
    #2

  3. Nigel Scott

    Nige Guest

    Brian McCauley wrote:

    > If you have two greedy subexpressions in a regex the first one gets
    > first bite and the character class ['"] is a subset of the character
    > class . so the above is equivalent to:
    >
    > /['"]*(.*)/
    >
    > Perhaps you meant
    >
    > /(['"]?)(.*)\1/
    >
    > For real examples of parsing MIME headers see the source code of the
    > modules you should be using anyhow.
    >


    Hi Brian - thanks for the reply.

    I have actually installed the MIME::Parser modules and attempted to use
    them, however I end up with empty files for each part of the message,
    and only certain parts are written. The debug output from the module simply
    says something along the lines of "writing to file" and then finishes
    with some timing stats. I don't have the exact output as I am now at
    home. Also, I don't really need the full MIME parsing functionality -
    all I need is to extract the inline text/plain parts from the message,
    hence my attempts you see above.

    I've had a read up about greedy expressions and understand that my
    pattern is wrong, so I'll give it another go with a ? instead, tomorrow at
    work.
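One possible shape for the non-greedy attempt is sketched below. The helper name boundary_nongreedy is hypothetical, and the sketch assumes the value ends at a semicolon or at the end of a line. A bare (.*?) needs something after it to stop at, otherwise it matches as little as possible and captures nothing for an unquoted value.

```perl
use strict;
use warnings;

# Hypothetical helper using a non-greedy capture.
sub boundary_nongreedy {
    my ($header) = @_;
    # (.*?) grows one character at a time until the rest of the pattern
    # fits: the matching closing quote (if any), optional whitespace,
    # then either a semicolon or the end of a line (/m).
    return $header =~ /boundary\s*=\s*(['"]?)(.*?)\1\s*(?:;|$)/im ? $2 : undef;
}

for my $h ('boundary="abc"', "boundary='abc'", 'boundary=abc; charset=us-ascii') {
    printf "%-32s => %s\n", $h, boundary_nongreedy($h);
}
```

Each input yields abc, including the unquoted value that is followed by another parameter.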

    Thanks again,
    Nige
    Nige, Jan 29, 2004
    #3