if ('A:B:C' =~ /:(.*?)$/) then why the heck is $1 'B:C' and not just 'C'

Discussion in 'Perl Misc' started by OwlHoot, Nov 12, 2010.

  1. OwlHoot

    OwlHoot Guest

    To repeat the title, in case it is munged by Google Groups:

    if ('A:B:C' =~ /:(.*?)$/) then why the heck is $1 'B:C' and not just 'C'

    I've been developing with perl for years; but even simple things in it
    still sometimes throw up surprises.

    The regexp /:(.*?)$/ is anchored on the right by $, then comes a
    non-greedy match which, AIUI, is the "shortest string it can get away
    with", preceded by a colon. So I would expect this to pick up just the
    "C", as it does with /([^:]*)$/.

    Am I assuming/doing something silly? It is Friday afternoon after all.


    Cheers

    John R Ramsden
     
    OwlHoot, Nov 12, 2010
    #1

  2. Re: if ('A:B:C' =~ /:(.*?)$/) then why the heck is $1 'B:C' and not just 'C'

    On 12.11.2010 15:38, OwlHoot wrote:
    > To repeat the title, in case it is munged by Google Groups:
    >
    > if ('A:B:C' =~ /:(.*?)$/) then why the heck is $1 'B:C' and not just 'C'
    >
    > I've been developing with perl for years; but even simple things in it
    > still sometimes throw up surprises.
    >
    > The regexp /:(.*?)$/ is anchored on the right by $, then comes a
    > non-greedy match which, AIUI, is the "shortest string it can get away
    > with", preceded by a colon. So I would expect this to pick up just the
    > "C", as it does with /([^:]*)$/.


    The regexp matches from the left to the right, even if there is an
    anchor on the right side of the string.

    Thus the : first tries to match the first : in your string, i.e. the
    one between A and B. Then .*? tries to match any number of chars,
    starting from zero because of the ?. But if zero chars are matched,
    the $ fails. So the regexp engine makes the number of characters
    matched by the .*? longer and longer until finally the $ matches. The
    regexp does not need to go back and select the next : in this case.

    .*? means: take as few chars as possible _at this position_.
    It does not mean: backtrack and check whether it could match
    fewer chars at some other place in the string.
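
    A minimal sketch of the point above, using @- to see where the match
    begins. The helper name match_info is made up for illustration; only
    the string and the pattern come from the original question.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (start offset of the whole match, $1),
# or an empty list if the pattern does not match.
sub match_info {
    my ($string, $regex) = @_;
    return unless $string =~ $regex;
    my ($start, $capture) = ($-[0], $1);   # copy before leaving the sub
    return ($start, $capture);
}

my ($start, $capture) = match_info('A:B:C', qr/:(.*?)$/);
print "match starts at offset $start, \$1 = '$capture'\n";
# prints: match starts at offset 1, $1 = 'B:C'
```

    The match starts at the first colon (offset 1); the ? only affects how
    far .*? extends from there.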

    So if you add .* to the beginning, you will get the last : in your string.
    /.*:(.*?)$/
    In this case the .* would try to eat as many chars as possible, then
    search for a :. So this would try the last : first.
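
    As a sketch of the greedy-prefix variant (the sub name is an
    assumption, not something from this thread):

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the pattern suggested above.
sub last_field_greedy_prefix {
    my ($string) = @_;
    # The leading .* grabs as much as it can and then gives characters
    # back until a ':' can match, so the ':' found is the last one.
    return $string =~ /.*:(.*?)$/ ? $1 : undef;
}

print last_field_greedy_prefix('A:B:C'), "\n";   # prints: C
```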

    Anyway, you could also use (split /:/, 'A:B:C')[-1] here.
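
    A small sketch of the split approach, with one caveat worth knowing:
    by default split drops trailing empty fields, so a string ending in
    ':' gives a different answer than the regexes do unless a negative
    limit is passed. The sub names are made up for illustration.

```perl
use strict;
use warnings;

# Last field using split's default behaviour (trailing empty fields dropped).
sub last_field_split {
    my ($string) = @_;
    return (split /:/, $string)[-1];
}

# Same, but a limit of -1 keeps trailing empty fields.
sub last_field_split_keep_empty {
    my ($string) = @_;
    return (split /:/, $string, -1)[-1];
}

print last_field_split('A:B:C'), "\n";                    # prints: C
print last_field_split('A:B:'), "\n";                     # prints: B
print "'", last_field_split_keep_empty('A:B:'), "'\n";    # prints: ''
```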

    Cheers, Wolf
     
    Wolf Behrenhoff, Nov 12, 2010
    #2

  3. OwlHoot

    Guest

    On Fri, 12 Nov 2010 06:38:08 -0800 (PST), OwlHoot <> wrote:

    >To repeat the title, in case it is munged by Google Groups:
    >
    > if ('A:B:C' =~ /:(.*?)$/) then why the heck is $1 'B:C' and not just 'C'
    >
    >I've been developing with perl for years; but even simple things in it
    >still sometimes throw up surprises.
    >
    >The regexp /:(.*?)$/ is anchored on the right by $, then comes a
    >non-greedy match which, AIUI, is the "shortest string it can get away
    >with", preceded by a colon. So I would expect this to pick up just the
    >"C", as it does with /([^:]*)$/.
    >


    It's not the shortest, it's the first to satisfy it.
    It is anchored on the left (by the ':') and on the right (by the $).
    The regex allows another ':' inside the capture when it traverses the
    string from the left.
    /:(.*)$/ gives the same result; it just runs straight to the end of
    the string instead of testing each position after the first ':'.

    Notice that /:(.*?):/ does the same thing, it says get all between
    the first ':' and the next ':'. However,
    'A:B:C:D' =~ /:(.*):/
    greedily grabs all between the first and last ':', but
    'A:B:C:D' =~ /:(.*?):/
    grabs only that between the first 2 ':'s.

    Since there is only one end of line, it gets all between the first ':'
    and end of line regardless of ?.
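
    The four cases above can be checked side by side. This is only a
    sketch; first_capture is a hypothetical helper, the patterns and the
    string are the ones discussed in this post.

```perl
use strict;
use warnings;

# Hypothetical helper: return the first capture group, or undef on no match.
sub first_capture {
    my ($string, $regex) = @_;
    my ($capture) = $string =~ $regex;
    return $capture;
}

my $s = 'A:B:C:D';
print first_capture($s, qr/:(.*):/),  "\n";   # prints: B:C   (greedy, up to the last ':')
print first_capture($s, qr/:(.*?):/), "\n";   # prints: B     (lazy, up to the next ':')
print first_capture($s, qr/:(.*)$/),  "\n";   # prints: B:C:D (greedy, to end of string)
print first_capture($s, qr/:(.*?)$/), "\n";   # prints: B:C:D (lazy, but only one end exists)
```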

    -sln
     
    , Nov 12, 2010
    #3
  4. Wolf Behrenhoff <> writes:
    > On 12.11.2010 15:38, OwlHoot wrote:
    >> To repeat the title, in case it is munged by Google Groups:
    >>
    >> if ('A:B:C' =~ /:(.*?)$/) then why the heck is $1 'B:C' and not just
    >> 'C'


    You should ask your question in the body of your message anyway.
    Newsreaders vary in how they display subject lines.

    >> I've been developing with perl for years; but even simple things in
    >> it still sometimes throw up surprises.
    >>
    >> The regexp /:(.*?)$/ is anchored on the right by $, then comes a non-
    >> greedy match which, AIUI, is the "shortest string it can get away
    >> with", preceded by a colon. So I would expect this to pick up just
    >> the "C", as it does with
    >> /([^:]*)$/.

    >
    > The regexp matches from the left to the right, even if there is an
    > anchor on the right side of the string.
    >

    [more explanation snipped]
    >
    > Anyway, you could also use (split /:/, 'A:B:C')[-1] here.


    Another possibility is
    if ('A:B:C' =~ /:([^:]*)$/)

    --
    Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
    Nokia
    "We must do something. This is something. Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
     
    Keith Thompson, Nov 12, 2010
    #4
  5. OwlHoot

    C.DeRykus Guest

    Re: if ('A:B:C' =~ /:(.*?)$/) then why the heck is $1 'B:C' and not just 'C'

    On Nov 12, 8:44 am, Keith Thompson <> wrote:

    ....

    >
    > > Anyway, you could also use (split /:/, 'A:B:C')[-1] here.

    >
    > Another possibility is
    >     if ('A:B:C' =~ /:([^:]*)$/)
    >


    Yet another:

    'A:B:C' =~ /.*:(.*)/;
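
    For comparison, a rough sketch that runs this pattern and the
    /:([^:]*)$/ one quoted above over a few inputs. The sub names are
    invented for the example, and the // operator assumes Perl 5.10 or
    later.

```perl
use strict;
use warnings;

# Negated character class: the capture cannot contain a ':'.
sub last_field_negated {
    my ($string) = @_;
    return $string =~ /:([^:]*)$/ ? $1 : undef;
}

# Greedy prefix: the leading .* pushes the ':' to the last colon.
sub last_field_greedy {
    my ($string) = @_;
    return $string =~ /.*:(.*)/ ? $1 : undef;
}

for my $input ('A:B:C', 'A:B:', 'ABC') {
    my $negated = last_field_negated($input) // '(no match)';
    my $greedy  = last_field_greedy($input)  // '(no match)';
    print "$input: negated='$negated' greedy='$greedy'\n";
}
# prints:
# A:B:C: negated='C' greedy='C'
# A:B:: negated='' greedy=''
# ABC: negated='(no match)' greedy='(no match)'
```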



    --
    Charles DeRykus
     
    C.DeRykus, Nov 12, 2010
    #5
  6. OwlHoot

    Uri Guttman Guest

    >>>>> "O" == OwlHoot <> writes:

    O> The regexp /:(.*?)$/ is anchored on the right by $, then comes a
    O> non-greedy match which, AIUI, is the "shortest string it can get
    O> away with", preceded by a colon. So I would expect this to pick up
    O> just the "C", as it does with /([^:]*)$/.

    as others have said, you didn't get what ? does for quantifiers. perl
    will match the leftmost working match. with a greedy quantifier, it will
    continue to match chars until it fails and then stop. with the
    non-greedy modifier ? it will stop after the first (and locally
    shortest) match. it will not globally find the shortest possible match
    anywhere in the string. so the key is remembering leftmost correct match
    first and then short or greedy based on the modifier.
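
    the "leftmost first, then shortest at that spot" order can be spelled
    out as two nested loops. this is a deliberately simplified model of
    /:(.*?)$/ and not how the real engine is implemented; it assumes the
    string has no newlines, and model_lazy_match is a made-up name.

```perl
use strict;
use warnings;

# Simplified model of /:(.*?)$/ for strings without newlines.
# Outer loop: try start positions from the left.
# Inner loop: at that start, try the shortest capture first.
sub model_lazy_match {
    my ($string) = @_;
    my $end = length $string;
    for my $start (0 .. $end - 1) {
        next unless substr($string, $start, 1) eq ':';
        for my $len (0 .. $end - $start - 1) {
            # '$' is modelled as "the capture must reach the end of the string"
            return substr($string, $start + 1, $len)
                if $start + 1 + $len == $end;
        }
    }
    return;   # no colon found: no match
}

my ($real) = 'A:B:C' =~ /:(.*?)$/;
print model_lazy_match('A:B:C'), " / $real\n";   # prints: B:C / B:C
```

    the outer loop never gets to the second colon, because the first
    colon already leads to a successful match.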

    uri

    --
    Uri Guttman ------ -------- http://www.sysarch.com --
    ----- Perl Code Review , Architecture, Development, Training, Support ------
    --------- Gourmet Hot Cocoa Mix ---- http://bestfriendscocoa.com ---------
     
    Uri Guttman, Nov 12, 2010
    #6
  7. Re: if ('A:B:C' =~ /:(.*?)$/) then why the heck is $1 'B:C' and not just 'C'

    OwlHoot wrote:
    > To repeat the title, in case it is munged by Google Groups:
    >
    > if ('A:B:C' =~ /:(.*?)$/) then why the heck is $1 'B:C' and not just 'C'
    >
    > I've been developing with perl for years; but even simple things in it
    > still sometimes throw up surprises.
    >
    > The regexp /:(.*?)$/ is anchored on the right by $, then


    There is no "then". Being anchored at the end does not change the order
    of evaluation (or at least, does not do so in a way that affects the
    outcome: the optimized engine can do things in whatever order it wants,
    as long as it behaves as if it were done left to right).


    > comes a non-greedy


    Really it is not non-greedy. It is still greedy, it is just greedy for
    less, rather than greedy for more. It is still greedy because it
    satisfies itself, without looking around at the "wants" of others.

    > match which, AIUI, is the "shortest string it can get away with",
    > preceded by a colon.


    The colon is also greedy. It is greedy to match as far left as it can
    get away with. And because it comes before the .*? does, its greed wins.

    Xho
     
    Xho Jingleheimerschmidt, Nov 13, 2010
    #7
  8. Re: if ('A:B:C' =~ /:(.*?)$/) then why the heck is $1 'B:C' and not just 'C'

    On 2010-11-13 03:37, Xho Jingleheimerschmidt <> wrote:
    > Really it is not non-greedy. It is still greedy, it is just greedy for
    > less, rather than greedy for more. It is still greedy because it
    > satisfies itself, without looking around at the "wants" of others.


    > The colon is also greedy. It is greedy to match as far left as it can
    > get away with. And because it comes before the .*? does, its greed wins.


    Please. "Greedy" in the context of regular expressions is a technical
    term with a precisely defined meaning. You are not helping by inventing
    a different meaning for the word based on its meaning in common English.

    hp
     
    Peter J. Holzer, Nov 14, 2010
    #8
  9. Re: if ('A:B:C' =~ /:(.*?)$/) then why the heck is $1 'B:C' and not just 'C'

    Peter J. Holzer wrote:
    > On 2010-11-13 03:37, Xho Jingleheimerschmidt <> wrote:
    >> Really it is not non-greedy. It is still greedy, it is just greedy for
    >> less, rather than greedy for more. It is still greedy because it
    >> satisfies itself, without looking around at the "wants" of others.

    >
    >> The colon is also greedy. It is greedy to match as far left as it can
    >> get away with. And because it comes before the .*? does, its greed wins.

    >
    > Please. "Greedy" in the context of regular expressions is a technical
    > term with a precisely defined meaning. You are not helping by inventing
    > a different meaning for the word based on its meaning in common English.


    Greedy is well defined in the field of computer science, and I am not
    the one inventing new meanings for it.

    Xho
     
    Xho Jingleheimerschmidt, Nov 15, 2010
    #9