Regexp greediness.

Discussion in 'Perl Misc' started by adamomitcheney@kiwis.co.uk, Feb 14, 2006.

  1. Guest

    Hi there Perl gurus,

    I'm using (trying to use) a regexp to extract a path and a comment from
    the output of a 'describe' command in clearcase. I suspect I'm being
    daft, so please go easy on me... I have read what I think is the
    appropriate perldoc (perldoc -q greedy - "What does it mean that
    regexes are greedy? How can I get around it? greedy greediness"), but
    I'm already doing what it suggests - that is, reducing the greediness
    of the '.+' expression with a '?'. I guess I must have missed
    something..

    The input ($hl) should look something like this:
    ->
    M:\adam_admin\eq-gla_playpen\eq-cc-gla_playpen\testforccase\TestTwo.PRM@@
    "This is the to-text"

    I'm trying to get at the path and the comment:

    if ($hl =~ m%^[->|<-]%)
    {
    $hl =~ s%^[->|<-] (.+?)@@ "(.+?)"%$1%;
    $comment = $2;
    }
    else
    {
    $hl = 0;
    }
    print "Target is $hl\n";
    print "Comment is \"$comment\"\n";

    Produces the following output:
    Target is ->
    M:\adam_admin\eq-gla_playpen\eq-cc-gla_playpen\testforccase\TestTwo.PRM@@
    "This is the to-text"
    Comment is ""

    I've also tried escaping the '@@' thus '\@\@' but that hasn't made any
    difference.

    I realise I could probably use split to do this and then substitute out
    the -> or <-, but I'm quite keen to understand what I'm doing wrong.

    Cheers - Adam...
    , Feb 14, 2006
    #1

  2. Paul Lalli Guest

    wrote:

    > I'm using (trying to use) a regexp to extract a path and a comment from
    > the output of a 'describe' command in clearcase. I suspect I'm being
    > daft, so please go easy on me... I have read what I think is the
    > appropriate perldoc (perldoc -q greedy - "What does it mean that
    > regexes are greedy? How can I get around it? greedy greediness"), but
    > I'm already doing what it suggests - that is, reducing the greediness
    > of the '.+' expression with a '?'. I guess I must have missed
    > something..


    Yes... the documentation for what the special characters do in a
    regexp... ;-)

    > The input ($hl) should look something like this:


    In general "something like this" is not good enough. When composing
    your post, you should endeavor to give *actual* sample input, output,
    and desired output.

    > ->
    > M:\adam_admin\eq-gla_playpen\eq-cc-gla_playpen\testforccase\TestTwo.PRM@@
    > "This is the to-text"
    >
    > I'm trying to get at the path and the comment:
    >
    > if ($hl =~ m%^[->|<-]%)


    Are you under the impression that this is looking for either a '->' or
    a '<-' sequence? It's not. It's looking for exactly one of -, >, |,
    or <.

    > {
    > $hl =~ s%^[->|<-] (.+?)@@ "(.+?)"%$1%;


    Is there a space after ] and before (? I'd guess that's (at least part
    of) your problem. You're looking for the beginning of the string,
    exactly one of -, >, |, or <, followed by a space. No such sequence
    exists.

    > $comment = $2;


    Never use a $1, $2, $3 etc. variable without verifying that the
    match succeeded:

    if ($hl =~ s/<your pattern here>/<your replacement here>/) {
    print "s/// succeeded\n";
    $comment = $2;
    } else {
    warn "s/// failed\n";
    }
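
    The reason for the check: a failed match leaves $1, $2 and friends
    holding whatever the last successful match put there. A small sketch
    with made-up strings:

    ```perl
    use strict;
    use warnings;

    # Made-up strings for illustration only.
    my $ok1 = 'first=alpha' =~ /first=(\w+)/;   # succeeds, $1 is "alpha"
    my $ok2 = 'no digits here' =~ /(\d+)/;      # fails, $1 is left untouched

    my $stale = $1;                             # still "alpha"
    print "second match ", ($ok2 ? "succeeded" : "failed"), ", \$1 is '$stale'\n";
    ```

    Testing the return value of the match, as in the if/else above, is what
    keeps the stale value from being mistaken for a fresh capture.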

    > }
    > else
    > {
    > $hl = 0;
    > }
    > print "Target is $hl\n";
    > print "Comment is \"$comment\"\n";
    >
    > Produces the following output:
    > Target is ->
    > M:\adam_admin\eq-gla_playpen\eq-cc-gla_playpen\testforccase\TestTwo.PRM@@
    > "This is the to-text"
    > Comment is ""
    >
    > I've also tried escaping the '@@' thus '\@\@' but that hasn't made any
    > difference.


    The "throw it at the wall and see what sticks" method is rarely a good
    way of programming.

    > I realise I could probably use split to do this


    I doubt it. split() uses regexps too, so you'd probably copy over your
    error.

    > and then substitute out
    > the -> or <-, but I'm quite keen to understand what I'm doing wrong.


    [ ] make up a character class. They look for any ONE character within
    their contained class. You were trying to do an alternation and
    cluster the two alternates together. That is accomplished with
    parentheses and the |, like so:

    (?:->|<-)

    The ?: prevents these parentheses from being recognized as a capturing
    group, so they do not set one of the $1, $2, etc. variables.
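
    To see the difference concretely, here is a minimal sketch. The sample
    strings are made up for illustration and are not real ClearCase output:

    ```perl
    use strict;
    use warnings;

    # Hypothetical sample strings (assumptions, not real 'describe' output).
    my @samples = ('-> path', '<- path', '| path', '>- path');

    my %result;
    for my $s (@samples) {
        # The bracketed class matches any ONE of the characters - > | <
        my $class = $s =~ /^[->|<-]/ ? 'yes' : 'no';
        # The non-capturing group matches one of the two-character arrows
        my $group = $s =~ /^(?:->|<-)/ ? 'yes' : 'no';
        $result{$s} = "class=$class group=$group";
        print "'$s': $result{$s}\n";
    }
    ```

    The class accepts all four strings, including the ones that start with
    '|' or '>-'. The group accepts only the two that start with a real arrow.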

    perldoc perlretut
    perldoc perlre
    perldoc perlreref

    Paul Lalli
    Paul Lalli, Feb 14, 2006
    #2

  3. Guest

    I'd just like to start with a D'Oh!

    >> of the '.+' expression with a '?'. I guess I must have missed
    >> something..

    > Yes... the documentation for what the special characters do in a
    > regexp... ;-)


    No, I was just being daft - I wasn't even looking there (initially
    using '[' when I should have been grouping with '()'), and I should have
    known better.

    > [ ] make up a character class. They look for any ONE character within
    > their contained class. You were trying to do an alternation and
    > cluster the two alternates together. That is accomplished with
    > parentheses and the |, like so:
    >
    > (?:->|<-)
    >
    > The ?: prevents these parentheses from being recognized as a capturing
    > group, so they do not set one of the $1, $2, etc. variables.


    Aye, that was what I was missing.

    Thanks for taking the time to point it out.

    Adam...
    , Feb 14, 2006
    #3
  4. wrote:
    > I'm using (trying to use) a regexp to extract a path and a comment from
    > the output of a 'describe' command in clearcase. I suspect I'm being
    > daft, so please go easy on me... I have read what I think is the
    > appropriate perldoc (perldoc -q greedy - "What does it mean that
    > regexes are greedy? How can I get around it? greedy greediness"), but
    > I'm already doing what it suggests - that is, reducing the greediness
    > of the '.+' expression with a '?'. I guess I must have missed
    > something..


    Well, your problem has nothing to do with greediness.

    > The input ($hl) should look something like this:
    > ->
    > M:\adam_admin\eq-gla_playpen\eq-cc-gla_playpen\testforccase\TestTwo.PRM@@
    > "This is the to-text"
    >
    > I'm trying to get at the path and the comment:
    >
    > if ($hl =~ m%^[->|<-]%)


    You use the notation for a character class, but you probably just want
    to capture the alternate arrows:

    if ($hl =~ m%^(?:->|<-)%)

    > {
    > $hl =~ s%^[->|<-] (.+?)@@ "(.+?)"%$1%;

    ------------------^-----^^-------^

    1. See the above comment
    2. A blank doesn't match a newline
    3. No need to make them non-greedy (even if that doesn't hurt...)

    In other words, this line should do it:

    $hl =~ s%^(?:->|<-)\s+(.+)@@\s+"(.+)"%$1%;
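
    A quick way to check that line against both layouts of the sample
    (wrapped over three lines as posted, and all on one line). In this
    sketch the '@@' is written as '\@\@' purely to rule out any array
    interpolation; otherwise it is the pattern above:

    ```perl
    use strict;
    use warnings;

    my $path = 'M:\adam_admin\eq-gla_playpen\eq-cc-gla_playpen\testforccase\TestTwo.PRM';

    # The sample from the thread: wrapped as posted, and as a single line.
    my @inputs = (
        qq{->\n$path\@\@\n"This is the to-text"},
        qq{-> $path\@\@ "This is the to-text"},
    );

    my @results;
    for my $hl (@inputs) {
        my $comment;
        # Only read $2 if the substitution actually succeeded.
        if ($hl =~ s%^(?:->|<-)\s+(.+)\@\@\s+"(.+)"%$1%) {
            $comment = $2;
        }
        push @results, [$hl, $comment];
        print "Target is $hl\n";
        print "Comment is \"", (defined $comment ? $comment : ''), "\"\n";
    }
    ```

    Because \s+ matches a newline as well as a blank, the same pattern
    handles either layout.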

    If you haven't already, please also study "perldoc perlre".

    --
    Gunnar Hjalmarsson
    Email: http://www.gunnar.cc/cgi-bin/contact.pl
    Gunnar Hjalmarsson, Feb 14, 2006
    #4
  5. Guest

    > 1. See the above comment
    > 2. A blank doesn't match a newline


    No, but it wasn't intended to - I should have specified that the input I
    posted was all one line, but posting it on groups.google munged it a
    bit.

    > 3. No need to make them non-greedy (even if that doesn't hurt...)


    No, quite. I understand that now.

    > In other words, this line should do it:
    >
    > $hl =~ s%^(?:->|<-)\s+(.+)@@\s+"(.+)"%$1%;
    >
    > If you haven't already, please also study "perldoc perlre".


    I have done, but in this case it was a near-terminal case of stupidity
    brought on, I think, by tiredness.

    Thanks Gunnar.

    Adam...
    , Feb 14, 2006
    #5
  6. robic0 Guest

    On Tue, 14 Feb 2006 19:42:22 +0100, Gunnar Hjalmarsson <> wrote:

    > wrote:
    >> I'm using (trying to use) a regexp to extract a path and a comment from
    >> the output of a 'describe' command in clearcase. I suspect I'm being
    >> daft, so please go easy on me... I have read what I think is the
    >> appropriate perldoc (perldoc -q greedy - "What does it mean that
    >> regexes are greedy? How can I get around it? greedy greediness"), but
    >> I'm already doing what it suggests - that is, reducing the greediness
    >> of the '.+' expression with a '?'. I guess I must have missed
    >> something..

    >
    >Well, your problem has nothing to do with greediness.
    >
    >> The input ($hl) should look something like this:
    >> ->
    >> M:\adam_admin\eq-gla_playpen\eq-cc-gla_playpen\testforccase\TestTwo.PRM@@
    >> "This is the to-text"
    >>
    >> I'm trying to get at the path and the comment:
    >>
    >> if ($hl =~ m%^[->|<-]%)

    >
    >You use the notation for a character class, but you probably just want
    >to capture the alternate arrows:
    >
    > if ($hl =~ m%^(?:->|<-)%)
    >
    >> {
    >> $hl =~ s%^[->|<-] (.+?)@@ "(.+?)"%$1%;

    >------------------^-----^^-------^
    >
    >1. See the above comment
    >2. A blank doesn't match a newline
    >3. No need to make them non-greedy (even if that doesn't hurt...)
    >
    >In other words, this line should do it:
    >
    > $hl =~ s%^(?:->|<-)\s+(.+)@@\s+"(.+)"%$1%;


    I didn't read the whole thread yet, but just a note on this line..

    $hl =~ s%^(?:->|<-)\s+(.+)@@\s+"(.+)"%$1%;

    This non-capture grouping '(?:->|<-)' will only match ->- or -<-
    If thats whats needed then the grouping is not really necessary,
    ->|<- does the same thing.

    It might be possible (?:->)|(?:<-) was intended, in this case it
    will match -> or <-
    >
    >If you haven't already, please also study "perldoc perlre".
    robic0, Feb 15, 2006
    #6
  7. robic0 Guest

    On Tue, 14 Feb 2006 19:42:22 +0100, Gunnar Hjalmarsson <> wrote:

    > wrote:
    >> I'm using (trying to use) a regexp to extract a path and a comment from
    >> the output of a 'describe' command in clearcase. I suspect I'm being
    >> daft, so please go easy on me... I have read what I think is the
    >> appropriate perldoc (perldoc -q greedy - "What does it mean that
    >> regexes are greedy? How can I get around it? greedy greediness"), but
    >> I'm already doing what it suggests - that is, reducing the greediness
    >> of the '.+' expression with a '?'. I guess I must have missed
    >> something..

    >
    >Well, your problem has nothing to do with greediness.
    >
    >> The input ($hl) should look something like this:
    >> ->
    >> M:\adam_admin\eq-gla_playpen\eq-cc-gla_playpen\testforccase\TestTwo.PRM@@
    >> "This is the to-text"
    >>
    >> I'm trying to get at the path and the comment:
    >>
    >> if ($hl =~ m%^[->|<-]%)

    >
    >You use the notation for a character class, but you probably just want
    >to capture the alternate arrows:
    >
    > if ($hl =~ m%^(?:->|<-)%)
    >
    >> {
    >> $hl =~ s%^[->|<-] (.+?)@@ "(.+?)"%$1%;

    >------------------^-----^^-------^
    >
    >1. See the above comment
    >2. A blank doesn't match a newline
    >3. No need to make them non-greedy (even if that doesn't hurt...)
    >
    >In other words, this line should do it:
    >
    > $hl =~ s%^(?:->|<-)\s+(.+)@@\s+"(.+)"%$1%;
    >


    This '(.+)' or '(.*)' should never be used without '?' unless you intend to capture
    all until the end of the line (or restriction), otherwise (.+?) must be used mid-string
    when real content follows in the match requirement.

    In this case the '"' would be captured and result in a failed match. Be especially
    careful when the intention of use is mid-string.

    Why would you allow any character but a newline here: "(.+)"%$1% ?
    Use the 's' modifier here.. $hl =~ s%^(?:->|<-)\s+(.+)@@\s+"(.+)"%$1%s;

    Possible new regex:

    $hl =~ s%^(?:->)|(?:<-)\s+(.+)@@\s+"(.+?)"%$1%s;

    -good luck-

    >If you haven't already, please also study "perldoc perlre".
    robic0, Feb 15, 2006
    #7
  8. robic0 Guest

    On Tue, 14 Feb 2006 19:42:22 +0100, Gunnar Hjalmarsson <> wrote:

    > wrote:
    >> I'm using (trying to use) a regexp to extract a path and a comment from
    >> the output of a 'describe' command in clearcase. I suspect I'm being
    >> daft, so please go easy on me... I have read what I think is the
    >> appropriate perldoc (perldoc -q greedy - "What does it mean that
    >> regexes are greedy? How can I get around it? greedy greediness"), but
    >> I'm already doing what it suggests - that is, reducing the greediness
    >> of the '.+' expression with a '?'. I guess I must have missed
    >> something..

    >
    >Well, your problem has nothing to do with greediness.
    >
    >> The input ($hl) should look something like this:
    >> ->
    >> M:\adam_admin\eq-gla_playpen\eq-cc-gla_playpen\testforccase\TestTwo.PRM@@
    >> "This is the to-text"
    >>
    >> I'm trying to get at the path and the comment:
    >>
    >> if ($hl =~ m%^[->|<-]%)

    >
    >You use the notation for a character class, but you probably just want
    >to capture the alternate arrows:
    >
    > if ($hl =~ m%^(?:->|<-)%)
    >
    >> {
    >> $hl =~ s%^[->|<-] (.+?)@@ "(.+?)"%$1%;

    >------------------^-----^^-------^
    >
    >1. See the above comment
    >2. A blank doesn't match a newline
    >3. No need to make them non-greedy (even if that doesn't hurt...)
    >
    >In other words, this line should do it:
    >
    > $hl =~ s%^(?:->|<-)\s+(.+)@@\s+"(.+)"%$1%;
    >
    >If you haven't already, please also study "perldoc perlre".


    Haven't studied it but I'm not seeing the need for special delimiters '%', maybe you could
    instruct me.

    Final note, this needs to be done globally. Just as an exercise, to do this en masse,
    assuming from a file...

    $hl = join ('', <DATA>);
    $hl =~ s/^(?:->)|(?:<-)\s+(.+)@@\s+"(.+?)"/$1/sg;

    Should you be doing this perpetually...

    $RxHl = qr/^(?:->)|(?:<-)\s+(.+)@@\s+"(.+?)"/;
    while ($hl = <DATA>) {
    $hl =~ s/$RxHl/$1/g;
    }
    robic0, Feb 15, 2006
    #8
  9. robic0 wrote:
    > Gunnar Hjalmarsson wrote:
    >>
    >> $hl =~ s%^(?:->|<-)\s+(.+)@@\s+"(.+)"%$1%;

    >
    > I didn't read the whole thread yet,


    That's a breach of netiquette. OTOH, in your case it probably
    wouldn't have made a difference.

    > but just a note on this line..
    >
    > $hl =~ s%^(?:->|<-)\s+(.+)@@\s+"(.+)"%$1%;
    >
    > This non-capture grouping '(?:->|<-)' will only match ->- or -<-


    Wrong, but what else could we expect from that robic0 character?

    > If thats whats needed then the grouping is not really necessary,
    > ->|<- does the same thing.


    There is more in the regex than those arrows, so your discussion is out
    of context and thus irrelevant.

    > It might be possible (?:->)|(?:<-) was intended, in this case it
    > will match -> or <-


    Sigh.

    > This '(.+)' or '(.*)' should never be used without '?' unless you intend to capture
    > all until the end of the line (or restriction), otherwise (.+?) must be used mid-string
    > when real content follows in the match requirement.
    >
    > In this case the '"' would be captured and result in a failed match. Be especially
    > careful when the intention of use is mid-string.


    More BS statements. Fact is that greediness _never_ affects whether a
    regex matches or not.
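
    A small sketch of that point, using a made-up string: both quantifiers
    let the overall match succeed, and only the captured text differs.

    ```perl
    use strict;
    use warnings;

    # Made-up string with two quoted parts.
    my $s = 'say "one" and "two" now';

    my ($greedy) = $s =~ /"(.+)"/;    # longest possible capture
    my ($lazy)   = $s =~ /"(.+?)"/;   # shortest possible capture

    print "greedy: $greedy\n";   # one" and "two
    print "lazy:   $lazy\n";     # one
    ```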

    > Why would you allow any character but a newline here: "(.+)"%$1% ?
    > Use the 's' modifier here.. $hl =~ s%^(?:->|<-)\s+(.+)@@\s+"(.+)"%$1%s;


    LOL, robic0 commenting on the /s modifier again. Maybe you could explain
    how it would make a difference in this case? (Second thought: Please
    don't!!)

    --
    Gunnar Hjalmarsson
    Email: http://www.gunnar.cc/cgi-bin/contact.pl
    Gunnar Hjalmarsson, Feb 15, 2006
    #9
  10. robic0 wrote:
    > Gunnar Hjalmarsson wrote:
    >>
    >>In other words, this line should do it:
    >>
    >> $hl =~ s%^(?:->|<-)\s+(.+)@@\s+"(.+)"%$1%;
    >>
    >>If you haven't already, please also study "perldoc perlre".

    >
    > Haven't studied it


    There was absolutely no need to tell us that explicitly.

    --
    Gunnar Hjalmarsson
    Email: http://www.gunnar.cc/cgi-bin/contact.pl
    Gunnar Hjalmarsson, Feb 15, 2006
    #10
  11. robic0 Guest

    On Tue, 14 Feb 2006 19:42:22 +0100, Gunnar Hjalmarsson <> wrote:

    > wrote:
    >> I'm using (trying to use) a regexp to extract a path and a comment from
    >> the output of a 'describe' command in clearcase. I suspect I'm being
    >> daft, so please go easy on me... I have read what I think is the
    >> appropriate perldoc (perldoc -q greedy - "What does it mean that
    >> regexes are greedy? How can I get around it? greedy greediness"), but
    >> I'm already doing what it suggests - that is, reducing the greediness
    >> of the '.+' expression with a '?'. I guess I must have missed
    >> something..

    >
    >Well, your problem has nothing to do with greediness.
    >
    >> The input ($hl) should look something like this:
    >> ->
    >> M:\adam_admin\eq-gla_playpen\eq-cc-gla_playpen\testforccase\TestTwo.PRM@@
    >> "This is the to-text"
    >>
    >> I'm trying to get at the path and the comment:
    >>
    >> if ($hl =~ m%^[->|<-]%)

    >
    >You use the notation for a character class, but you probably just want
    >to capture the alternate arrows:
    >
    > if ($hl =~ m%^(?:->|<-)%)
    >
    >> {
    >> $hl =~ s%^[->|<-] (.+?)@@ "(.+?)"%$1%;

    >------------------^-----^^-------^
    >
    >1. See the above comment
    >2. A blank doesn't match a newline
    >3. No need to make them non-greedy (even if that doesn't hurt...)
    >
    >In other words, this line should do it:
    >
    > $hl =~ s%^(?:->|<-)\s+(.+)@@\s+"(.+)"%$1%;
    >
    >If you haven't already, please also study "perldoc perlre".


    Didn't see the other '(.+)', yes that needs a '?' as well...
    $hl =~ s/^(?:->)|(?:<-)\s+(.+?)@@\s+"(.+?)"/$1/sg; # should be done globally unless you know otherwise

    You should avoid repetitive substitution if processing large data strings.
    In this case the qr// will do no good since it will
    not pre-compile the regexp with substitution unknowns.

    The faster alternative when processing large strings is to capture and continue...
    (this is just my opinion)

    while ($hl =~ /(?:(?:->)|(?:<-)\s+(.+?)@@\s+"(.+?)")|(.*?)/sg) {
    # capture groups: $1 = (.+?) before @@, $2 = quoted (.+?), $3 = (.*?)
    if (defined($1)) {
    $hl_new .= $2;
    } else {$hl_new .= $3;}
    }

    I'm sure I've made mistakes in the other posts (actually 1)
    -good luck-
    robic0, Feb 15, 2006
    #11
  12. robic0 Guest

    On Tue, 14 Feb 2006 19:42:22 +0100, Gunnar Hjalmarsson <> wrote:

    > wrote:
    >> I'm using (trying to use) a regexp to extract a path and a comment from
    >> the output of a 'describe' command in clearcase. I suspect I'm being
    >> daft, so please go easy on me... I have read what I think is the
    >> appropriate perldoc (perldoc -q greedy - "What does it mean that
    >> regexes are greedy? How can I get around it? greedy greediness"), but
    >> I'm already doing what it suggests - that is, reducing the greediness
    >> of the '.+' expression with a '?'. I guess I must have missed
    >> something..

    >
    >Well, your problem has nothing to do with greediness.
    >
    >> The input ($hl) should look something like this:
    >> ->
    >> M:\adam_admin\eq-gla_playpen\eq-cc-gla_playpen\testforccase\TestTwo.PRM@@
    >> "This is the to-text"
    >>
    >> I'm trying to get at the path and the comment:
    >>
    >> if ($hl =~ m%^[->|<-]%)

    >
    >You use the notation for a character class, but you probably just want
    >to capture the alternate arrows:
    >
    > if ($hl =~ m%^(?:->|<-)%)
    >
    >> {
    >> $hl =~ s%^[->|<-] (.+?)@@ "(.+?)"%$1%;

    >------------------^-----^^-------^
    >
    >1. See the above comment
    >2. A blank doesn't match a newline
    >3. No need to make them non-greedy (even if that doesn't hurt...)
    >
    >In other words, this line should do it:
    >
    > $hl =~ s%^(?:->|<-)\s+(.+)@@\s+"(.+)"%$1%;
    >
    >If you haven't already, please also study "perldoc perlre".



    Here's just something to bust Gunnar's balls, its the
    anti-greedy formula, if you can understand it...

    $_ =
    qr/(?:<(?:(?:(\/*)($Name)\s*(\/*))|(?:META(.*?))|(?:($Name)((?:\s+$Name\s*=\s*["'][^<]*['"])+)\s*(\/*))|(?:\?(.*?)\?)|(?:!(?:(?:DOCTYPE(.*?))|(?:\[CDATA\[(.*?)\]\])|(?:--(.*?[^-])--)|(?:ATTLIST(.*?))|(?:ELEMENT(.*?))|(?:ENTITY(.*?)))))>)|(.+?)/s;

    ... Gunnar, float some iceburgs ...
    robic0, Feb 15, 2006
    #12
  13. Lukas Mai Guest

    robic0 schrob:
    >
    > $hl = join ('', <DATA>);


    Eww. $hl = do {local $/; <DATA>};
    Or just File::Slurp.
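
    A minimal sketch of that slurp idiom. It reads from an in-memory
    filehandle (an assumption, so that the example is self-contained)
    rather than from DATA:

    ```perl
    use strict;
    use warnings;

    my $text = "line one\nline two\n";

    # Open a read handle on the string itself (stand-in for a real file).
    open my $fh, '<', \$text or die "cannot open in-memory handle: $!";

    # Localizing $/ to undef makes <$fh> return the rest of the file at once.
    my $hl = do { local $/; <$fh> };
    close $fh;

    print length($hl), " characters read\n";
    ```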

    > $hl =~ s/^(?:->)|(?:<-)\s+(.+)@@\s+"(.+?)"/$1/sg;


    This regex doesn't make sense. It's parsed as:

    ( ^-> ) | ( <-\s+(.+)@@\s+"(.+?)" )

    because | has very low precedence. (?:->) by itself is always the same
    as -> alone. This also means $1 is undef if the first part succeeds.
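
    To make the precedence point concrete, a sketch with a made-up input
    line (the path and comment are invented, and '@@' is escaped as '\@\@'
    to keep it literal):

    ```perl
    use strict;
    use warnings;

    # Hypothetical one-line input; the path and comment are invented.
    my $hl = '-> C:\some\path@@ "a comment"';

    # Without grouping, | splits the WHOLE pattern into two alternatives.
    my $ok_split = $hl =~ /^(?:->)|(?:<-)\s+(.+)\@\@\s+"(.+?)"/;
    my ($whole_split, $path_split) = ($&, $1);

    # With the arrows grouped, the rest of the pattern applies to both.
    my $ok_group = $hl =~ /^(?:->|<-)\s+(.+)\@\@\s+"(.+?)"/;
    my ($whole_group, $path_group) = ($&, $1);

    print "split:   matched '$whole_split', \$1 is ",
          (defined $path_split ? "'$path_split'" : 'undef'), "\n";
    print "grouped: matched '$whole_group', \$1 is '$path_group'\n";
    ```

    In the ungrouped form only the arrow itself is matched and $1 stays
    undefined, so a substitution with $1 as the replacement would just
    delete the arrow.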

    Please read perldoc perlretut and perldoc perlre.

    HTH, Lukas
    Lukas Mai, Feb 15, 2006
    #13
  14. robic0 Guest

    On Tue, 14 Feb 2006 19:42:22 +0100, Gunnar Hjalmarsson <> wrote:

    > wrote:
    >> I'm using (trying to use) a regexp to extract a path and a comment from
    >> the output of a 'describe' command in clearcase. I suspect I'm being
    >> daft, so please go easy on me... I have read what I think is the
    >> appropriate perldoc (perldoc -q greedy - "What does it mean that
    >> regexes are greedy? How can I get around it? greedy greediness"), but
    >> I'm already doing what it suggests - that is, reducing the greediness
    >> of the '.+' expression with a '?'. I guess I must have missed
    >> something..

    >
    >Well, your problem has nothing to do with greediness.
    >
    >> The input ($hl) should look something like this:
    >> ->
    >> M:\adam_admin\eq-gla_playpen\eq-cc-gla_playpen\testforccase\TestTwo.PRM@@
    >> "This is the to-text"
    >>
    >> I'm trying to get at the path and the comment:
    >>
    >> if ($hl =~ m%^[->|<-]%)

    >
    >You use the notation for a character class, but you probably just want
    >to capture the alternate arrows:
    >
    > if ($hl =~ m%^(?:->|<-)%)
    >
    >> {
    >> $hl =~ s%^[->|<-] (.+?)@@ "(.+?)"%$1%;

    >------------------^-----^^-------^
    >
    >1. See the above comment
    >2. A blank doesn't match a newline
    >3. No need to make them non-greedy (even if that doesn't hurt...)
    >
    >In other words, this line should do it:
    >
    > $hl =~ s%^(?:->|<-)\s+(.+)@@\s+"(.+)"%$1%;
    >
    >If you haven't already, please also study "perldoc perlre".


    On the greedy issue, given

    "this string \"some other\"" =~ m/"(.+)"/;

    would match and $1 would equal <this string "some other">

    it matches the very first doubl quote and the very last double quote.

    To match 'this string ' use m/"(.+?)"/
    This is preferred since greedy is rarely the intention and it prevents
    over-match on imperfect (unknown) text sample data.

    As a general rule, always tack on a '?' (anti-greedy) when using wildcard like constructs that
    could match multiple characters, or a large set of characters, that would blurr or distort a
    specifically intended match construct.

    Some examples:
    ..+?
    ..*?
    [^']*?

    etc...
    robic0, Feb 15, 2006
    #14
  15. Paul Lalli Guest

    robic0 wrote:

    > On the greedy issue, given
    >
    > "this string \"some other\"" =~ m/"(.+)"/;
    >
    > would match and $1 would equal <this string "some other">


    No it wouldn't.

    > it matches the very first doubl quote and the very last double quote.


    Yes. There are only two double-quote characters in that string. One
    before 'some' and one after 'other'. The other " characters that you
    typed delimit the string, and are not a part of it.

    > To match 'this string ' use m/"(.+?)"/


    Nope. That would match the exact same thing the greedy version
    matched.
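
    This is easy to check. The string in the quoted example contains only
    two double-quote characters, so both patterns capture the same text:

    ```perl
    use strict;
    use warnings;

    # The string from the quoted example; the \" escapes produce plain quotes.
    my $s = "this string \"some other\"";

    my ($greedy) = $s =~ /"(.+)"/;
    my ($lazy)   = $s =~ /"(.+?)"/;

    print "greedy: <$greedy>\n";   # <some other>
    print "lazy:   <$lazy>\n";     # <some other>
    ```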

    > This is preferred since greedy is rarely the intention


    It's the intention when it is the intention. There is no general rule.
    Regexps do what they're needed to do when they're needed to do it.

    > As a general rule, always tack on a '?' (anti-greedy) when using wildcard like
    > constructs that
    > could match multiple characters, or a large set of characters, that would blurr or distort


    No, as a general rule, write the right regexp for the given situation.
    If you need greediness, use greediness. If you don't, don't.

    Paul Lalli
    Paul Lalli, Feb 15, 2006
    #15
  16. Lukas Mai Guest

    robic0 schrob:
    >
    > Here's just something to bust Gunnar's balls, its the

    ^ it's
    > anti-greedy formula, if you can understand it...


    > $_ =
    > qr/(?:<(?:(?:(\/*)($Name)\s*(\/*))|(?:META(.*?))|(?:($Name)((?:\s+$Name\s*=\s*["'][^<]*['"])+)\s*(\/*))|(?:\?(.*?)\?)|(?:!(?:(?:DOCTYPE(.*?))|(?:\[CDATA\[(.*?)\]\])|(?:--(.*?[^-])--)|(?:ATTLIST(.*?))|(?:ELEMENT(.*?))|(?:ENTITY(.*?)))))>)|(.+?)/s;


    OK, let's see:
    The last (.+?) doesn't make sense because it's not followed by any
    pattern, which means +? will never backtrack to consume more. It should
    be equivalent to (.).
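
    A quick sketch of that equivalence on a made-up string:

    ```perl
    use strict;
    use warnings;

    my $text = 'abcdef';

    # With nothing after it in the pattern, a lazy .+? stops after one character.
    my ($lazy)   = $text =~ /(.+?)/;
    my ($single) = $text =~ /(.)/;

    print "lazy: '$lazy', single: '$single'\n";
    ```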

    The whole thing looks like a horribly broken regex for HTML parsing. It
    produces weird results for input like '<META content=">foo">' or '<img
    alt="foo"> this is not part of "foo">'. The last one is due to
    inappropriate greediness.

    > .. Gunnar, float some iceburgs ...

    I don't understand that but it's "icebergs".

    HTH, Lukas
    Lukas Mai, Feb 15, 2006
    #16
  17. <> wrote:

    > I'm using (trying to use) a regexp to extract a path and a comment from
    > the output of a 'describe' command in clearcase.


    > I have read what I think is the
    > appropriate perldoc (perldoc -q greedy



    Greediness has no application to the problem you specify.


    > I guess I must have missed
    > something..



    The "Using character classes" section in:

    perldoc perlretut


    > The input ($hl) should look something like this:



    Developing a regular expression requires an *exact* understanding
    of the format of the string to be matched against.

    In what ways can your data be different from what we have
    been shown?


    > ->
    > M:\adam_admin\eq-gla_playpen\eq-cc-gla_playpen\testforccase\TestTwo.PRM@@
    > "This is the to-text"



    You should speak Perl whenever possible.

    Have you seen the Posting Guidelines that are posted here frequently?


    > if ($hl =~ m%^[->|<-]%)



    That is exactly equivalent to:

    if ($hl =~ m%^[<>|-]%)

    Your string does start with a hyphen, so that part should be matching OK.


    > {
    > $hl =~ s%^[->|<-] (.+?)@@ "(.+?)"%$1%;



    Your pattern matches when the string starts with one of the four
    characters in the char class, followed by a space.

    Your string does not start with one of those characters followed
    by a space, so the pattern fails to match.

    You probably wanted grouping rather than a character class:

    $hl =~ s%^(->|<-) (.+?)@@ "(.+?)"%$2%;

    or, if you don't want to mess up the numbering of the captures:

    $hl =~ s%^(?:->|<-) (.+?)@@ "(.+?)"%$1%;

    But your string will _still_ not match because the pattern requires
    a space following the arrow, but your data above has a newline
    following the arrow.


    > but I'm quite keen to understand what I'm doing wrong.



    Using [brackets] instead of (parentheses).



    This short and complete program that you can run may help:

    ------------------------------------
    #!/usr/bin/perl
    use warnings;
    use strict;

    my $hl = '-> M:\adam_admin\eq-gla_playpen\eq-cc-gla_playpen'
    . '\testforccase\TestTwo.PRM@@ "This is the to-text"';

    my($path, $comment) = $hl =~ m/^-[><] (.+)@@ "(.+)"/; # m// in list context

    print qq(path="$path"\n);
    print qq(comment="$comment"\n);
    ------------------------------------


    --
    Tad McClellan SGML consulting
    Perl programming
    Fort Worth, Texas
    Tad McClellan, Feb 15, 2006
    #17
  18. Xicheng Guest

    wrote:
    > Hi there Perl gurus,
    >
    > I'm using (trying to use) a regexp to extract a path and a comment from
    > the output of a 'describe' command in clearcase. I suspect I'm being
    > daft, so please go easy on me... I have read what I think is the
    > appropriate perldoc (perldoc -q greedy - "What does it mean that
    > regexes are greedy? How can I get around it? greedy greediness"), but
    > I'm already doing what it suggests - that is, reducing the greediness
    > of the '.+' expression with a '?'. I guess I must have missed
    > something..
    >
    > The input ($hl) should look something like this:
    > ->
    > M:\adam_admin\eq-gla_playpen\eq-cc-gla_playpen\testforccase\TestTwo.PRM@@
    > "This is the to-text"
    >
    > I'm trying to get at the path and the comment:
    >
    > if ($hl =~ m%^[->|<-]%)
    > {
    > $hl =~ s%^[->|<-] (.+?)@@ "(.+?)"%$1%;
    > $comment = $2;
    > }
    > else
    > {
    > $hl = 0;
    > }
    > print "Target is $hl\n";
    > print "Comment is \"$comment\"\n";
    >
    > Produces the following output:
    > Target is ->
    > M:\adam_admin\eq-gla_playpen\eq-cc-gla_playpen\testforccase\TestTwo.PRM@@
    > "This is the to-text"
    > Comment is ""
    >
    > I've also tried escaping the '@@' thus '\@\@' but that hasn't made any
    > difference.


    My suggestion for you is to use something more specific, ([^@]*) and ([^"]*),
    instead of always swaying between greedy and non-greedy things, (.*) or
    (.*?). For your case, if you can make sure that there is not any '@' in
    your pathname, you can do it this way:
    =============================
    $hl =q(->
    M:\adam_admin\eq-gla_playpen\eq-cc-gla_playpen\testforccase\TestTwo.PRM@@
    "This is the to-text");

    if ($hl =~ m%^(?:->|<-)%)
    {
    $hl =~ s%^(?:->|<-)\s*([^@]*)@@\s*"([^"]*)"%$1%x;
    $comment = $2;
    }

    else
    {
    $hl = 0;
    }

    print "Target is $hl\n";
    print "Comment is \"$comment\"\n";
    =============================
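
    As a rough illustration of why a negated class can be the safer choice,
    here is a sketch on a made-up string in which a lazy dot still runs
    across a quote:

    ```perl
    use strict;
    use warnings;

    # Made-up string with two quoted parts, followed by " z".
    my $s = 'x "a" y "b" z';

    # A lazy dot may still run across a quote if that is the only way to match.
    my ($lazy)  = $s =~ /"(.+?)" z/;
    # A negated class can never cross a quote, whatever the backtracking.
    my ($class) = $s =~ /"([^"]*)" z/;

    print "lazy:  $lazy\n";    # a" y "b
    print "class: $class\n";   # b
    ```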
    Best,
    Xicheng

    > I realise I could probably use split to do this and then substitute out
    > the -> or <-, but I'm quite keen to understand what I'm doing wrong.
    >
    > Cheers - Adam...
    Xicheng, Feb 15, 2006
    #18
  19. robic0 <> wrote:

    > $hl =~ s%^(?:->|<-)\s+(.+)@@\s+"(.+)"%$1%;
    >
    > This non-capture grouping '(?:->|<-)' will only match ->- or -<-



    No it won't.

    perl -le 'print "matched" if "->no hyphen" =~ /(?:->|<-)/'

    (prints "matched")


    > If thats whats needed then the grouping is not really necessary,
    > ->|<- does the same thing.



    No it doesn't.

    perl -le 'print "matched" if "->" =~ /(?:->|<-)\s+/'

    (makes no output)

    perl -le 'print "matched" if "->" =~ /->|<-\s+/'

    (prints "matched")


    > It might be possible (?:->)|(?:<-) was intended,



    It was not possible that that was intended, as there is more
    stuff to match after the right side of the alternation that
    needn't be there for the left side to match.


    >>If you haven't already, please also study "perldoc perlre".



    If you haven't already, please also study "perldoc perlre"


    --
    Tad McClellan SGML consulting
    Perl programming
    Fort Worth, Texas
    Tad McClellan, Feb 16, 2006
    #19
  20. Tad McClellan wrote:
    > robic0 wrote:
    >> Gunnar Hjalmarsson wrote:
    >>>

    >>
    >> <various crap>
    >>
    >>>If you haven't already, please also study "perldoc perlre".

    >
    > If you haven't already, please also study "perldoc perlre"


    Please quote using proper attributions to prevent confusion.

    --
    Gunnar Hjalmarsson
    Email: http://www.gunnar.cc/cgi-bin/contact.pl
    Gunnar Hjalmarsson, Feb 16, 2006
    #20