Simple Regex Doubt

Discussion in 'Perl Misc' started by Donato Azevedo, Jul 16, 2009.

  1. Hi everyone,

    I've got a simple problem that I have, to this point, not been able
    to solve:

    I have these regexes which I want to convert into a single one:

    if ( $raw_content =~ /Doc1(?:=rev)?:(?<document1>.*?)\r\n
    Doc2(?:=rev)?:(?<document2>.*?)\r\n
    Item:(?<item>.*?)\r\n
    Data\s+doc1:(?<data1>.*?)\r\n
    Data\s+doc2:(?<data2>.*?)\r\n
    Obs:(?<observation>.*?)\r\n
    Critic:(?<criticality>.*?)\r\n
    Comments:(?<comments>.*)
    /isx ||
    $raw_content =~ /Doc1(?:=rev)?:(?<document1>.*?)\r\n
    Doc2(?:=rev)?:(?<document2>.*?)\r\n
    Item:(?<item>.*?)\r\n
    Data\s+doc1:(?<data1>.*?)\r\n
    Data\s+doc2:(?<data2>.*?)\r\n
    Obs:(?<observation>.*?)\r\n
    Critic:(?<criticality>.*)
    /isx ) {

    This is to match text that can end in either:

    Critic:foobartext

    or

    Critic:foo
    Comments:bar

    The problem seems to be the greediness of the last captures: I tried
    doing

    Critic:(?<criticality>.*?)(\r\nComments:(?<comments>.*))?

    and

    Critic:(?<criticality>.*)(\r\nComments:(?<comments>.*))?

    but I must be missing something... It must be something quite simple
    I'd say.

    Well, any ideas?
     
    Donato Azevedo, Jul 16, 2009
    #1

  2. Donato Azevedo

    C.DeRykus Guest

    On Jul 16, 9:17 am, Donato Azevedo <> wrote:
    > <snip>



    You might want to post a simple, minimal example to
    demo what is/isn't working. The following worked
    for me:

    $_ = <<'END';
    one line
    another line
    Critic: foobartext
    Comments: bunches of comments
    END
    # /x ignores literal whitespace in the pattern; /s lets . match newlines
    my $regex = qr/.*? Critic: (?<criticality>.*?)\n
                   (?:Comments: (?<comments>.*))?   # optional trailing section
                  /isx;
    if ( /$regex/ ) {
        print "criticality: $+{criticality}", "\n",
              "comments: $+{comments}";
    }
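
For the two endings in the original question, here is a possible explanation of why both attempts misbehave, with a sketch of one way around it. With nothing required after the optional group, the lazy `.*?` is already satisfied by the empty string, so the Comments part is never tried; the greedy `.*` under /s runs to the end of the string and swallows the Comments line. Anchoring the pattern with `\z` forces the lazy capture to stretch as far as needed. The helper name `parse_tail` and the sample strings are made up for illustration, and the sketch assumes the text does not end with a trailing line break.

```perl
use 5.010;
use strict;
use warnings;

# Only the tail of the record is matched here; the Doc1/Doc2/... part of the
# original pattern would go in front unchanged.
my $tail_re = qr/
    Critic: (?<criticality> .*? )
    (?: \r\n Comments: (?<comments> .* ) )?   # optional last field
    \z                                        # anchor: nothing may follow
/isx;

# Hypothetical helper: returns a hash ref of the named captures, or nothing.
sub parse_tail {
    my ($text) = @_;
    return unless $text =~ $tail_re;
    my %fields = %+;    # copy now; %+ changes on the next successful match
    return \%fields;
}

for my $text ( "Obs:x\r\nCritic:foobartext",
               "Obs:x\r\nCritic:foo\r\nComments:bar" ) {
    my $f = parse_tail($text) or next;
    printf "criticality=%s comments=%s\n",
        $f->{criticality}, $f->{comments} // '(none)';
}
```

If the real input can end with a line break, the anchor would need to allow for it (for example `\s*\z`), and the greedy comments capture would still include that final line break.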

    --
    Charles DeRykus
     
    C.DeRykus, Jul 16, 2009
    #2

  3. Donato Azevedo

    Guest

    On Thu, 16 Jul 2009 09:17:59 -0700 (PDT), Donato Azevedo <> wrote:

    > <snip>


    Wow, it looks complicated, but it isn't. Yes, as DeRykus says,
    you need a '?' quantifier (0 or 1) around a non-capturing group
    containing --> Critic:(?<criticality>.*) in the first regex.

    That way $+{criticality} is assigned '' if 'Critic:' is present with no
    data after it (.*), and undef (just like the $n vars, I think) if there is no 'Critic:' at all.

    I haven't checked 5.10 much, but $+{criticality} may not even exist if the
    optional group matches zero times. The regex is satisfied, but who knows how the %+ hash is reset.
    It probably exists but is set to undef, like its unnamed capture counterpart.

    Btw, what's this business: /(.*?)\r\n/s ??
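
On whether the key exists: perlvar documents that the keys of %+ list only the names of groups that actually captured, so a named group inside a skipped optional group should be absent from %+ rather than present with an undef value. A small sketch to check this on a given perl follows; the helper name `capture_state`, the simplified pattern, and the `;` separator (standing in for the line break) are assumptions made for illustration.

```perl
use 5.010;
use strict;
use warnings;

# Simplified stand-in for the tail of the original pattern.
my $re = qr/Critic:(?<criticality>\w*)(?:;Comments:(?<comments>\w*))?/;

# Hypothetical helper: describe each named group after a match attempt.
sub capture_state {
    my ($text) = @_;
    return 'no match' unless $text =~ $re;
    my @report;
    for my $name (qw(criticality comments)) {
        push @report, exists( $+{$name} )
            ? "$name='" . $+{$name} . "'"
            : "$name absent";
    }
    return join ', ', @report;
}

say capture_state('Critic:');                # group matched the empty string
say capture_state('Critic:high');            # optional group skipped
say capture_state('Critic:high;Comments:');  # both groups participated
```

Reading `$+{comments}` for a skipped group still yields undef, so code that prints it under `use warnings` may want a default such as `$+{comments} // ''`.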

    -sln
     
    , Jul 17, 2009
    #3
  4. Donato Azevedo

    Guest

    On Fri, 17 Jul 2009 12:46:17 -0700, wrote:

    > <snip>


    ^^
    Oh, I'm sorry, s/comments/criticality/g in the above reply-post.

    -sln
     
    , Jul 17, 2009
    #4
  5. Donato Azevedo

    Guest

    On Fri, 17 Jul 2009 12:52:22 -0700, wrote:

    > <snip>


    Warning!! Ignore that man behind the curtain..
    The saga continues: s/criticality/comments/g
    Dyslexia is a terrible thing to waste.

    -sln
     
    , Jul 17, 2009
    #5
