Regex not matching

Discussion in 'Perl Misc' started by Andrew DeFaria, May 15, 2005.

  1. I thought I understood Perl regexes pretty well but this one confuses me.
    What am I doing wrong here?

    #!/usr/bin/perl

    use strict;
    use warnings;

    $_ = "#if __LDBL_SIZE == 80
    block";

    if (/^#.*= (\d*)/)
    {
        print "Pattern matched \$1 = \"$1\"\n";
    } else
    {
        print "Pattern did not match\n";
    }

    if (/^#.*(\d*)/)
    {
        print "Pattern matched \$1 = \"$1\"\n";
    } else
    {
        print "Pattern did not match\n";
    }

    Outputs:

    Pattern matched $1 = "80"
    Pattern matched $1 = ""

    Why does the second pattern fail?!?
    --
    Experience is something you don't get until just after you need it.
     
    Andrew DeFaria, May 15, 2005

  2. Andrew DeFaria wrote:

    Ugh, that messed up pretty bad. Let me try again.
    I thought I understood Perl regexes pretty well but this one confuses me.
    What am I doing wrong here?

    #!/usr/bin/perl

    use strict;
    use warnings;

    $_ = "#if __LDBL_SIZE == 80
    block";

    if (/^#.*= (\d*)/) {
        print "Pattern matched \$1 = \"$1\"\n";
    } else {
        print "Pattern did not match\n";
    }

    if (/^#.*(\d*)/) {
        print "Pattern matched \$1 = \"$1\"\n";
    } else {
        print "Pattern did not match\n";
    }

    Outputs:

    Pattern matched $1 = "80"
    Pattern matched $1 = ""

    Why does the second pattern fail?!?
    --
    Not one shred of evidence supports the notion that life is serious.
     
    Andrew DeFaria, May 15, 2005

  3. Ignoramus4744 wrote:

    > On Sun, 15 May 2005 03:37:23 GMT, Andrew DeFaria <>
    > wrote:
    >
    >> I thought I understood Perl regexes pretty well but this one confuses me.
    >> What am I doing wrong here?
    >>
    >> #!/usr/bin/perl
    >>
    >> use strict;
    >> use warnings;
    >>
    >> $_ = "#if __LDBL_SIZE == 80
    >> block";
    >>
    >> if (/^#.*= (\d*)/)
    >> {
    >>     print "Pattern matched \$1 = \"$1\"\n";
    >> } else
    >> {
    >>     print "Pattern did not match\n";
    >> }
    >>
    >> if (/^#.*(\d*)/)
    >> {
    >>     print "Pattern matched \$1 = \"$1\"\n";
    >> } else
    >> {
    >>     print "Pattern did not match\n";
    >> }
    >>
    >> Outputs:
    >>
    >> Pattern matched $1 = "80"
    >> Pattern matched $1 = ""
    >>
    >> Why does the second pattern fail?!?

    >
    > it did not fail, it successfully mapped \d* to an empty string.


    OK why didn't it match it to 80 like the first pattern did? Why is it
    required to have the "=" and the space in the pattern? Why wouldn't ".*"
    suck that up?
    --
    42.7 percent of all statistics are made up on the spot.
     
    Andrew DeFaria, May 15, 2005
  4. Lars Eighner

    In our last episode,
    <Ttzhe.522$>,
    the lovely and talented Andrew DeFaria
    broadcast on comp.lang.perl.misc:

    > I thought I understood Perl regexes pretty well but this one confuses me.
    > What am I doing wrong here?
    >
    > #!/usr/bin/perl
    >
    > use strict;
    > use warnings;
    >
    > $_ = "#if __LDBL_SIZE == 80
    > block";
    >
    > if (/^#.*= (\d*)/)
    > {
    >     print "Pattern matched \$1 = \"$1\"\n";
    > } else
    > {
    >     print "Pattern did not match\n";
    > }
    >
    > if (/^#.*(\d*)/)
    > {
    >     print "Pattern matched \$1 = \"$1\"\n";
    > } else
    > {
    >     print "Pattern did not match\n";
    > }
    >
    > Outputs:
    >
    > Pattern matched $1 = "80"
    > Pattern matched $1 = ""
    >
    > Why does the second pattern fail?!?


    Regular expression quantifiers are greedy. The .* in ^#.* matched the
    whole line, including everything after the "=". That left nothing for
    \d* to match. (Or rather, it left \d* to match zero digits, which it
    is allowed to do.)

    Just a guess.
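Lars's guess can be checked directly. The sketch below reruns the second pattern with the .* part wrapped in its own capture group, so you can see how much .* consumed before \d* got a turn. It uses only the first line of the test string, and split_greedy is a hypothetical helper name, not something from the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: the second pattern from the thread, with the .*
# wrapped in its own capture group so we can see what it swallowed.
# In list context a successful match returns ($1, $2); a failed one returns ().
sub split_greedy {
    my ($text) = @_;
    return $text =~ /^#(.*)(\d*)/;
}

my ($prefix, $digits) = split_greedy("#if __LDBL_SIZE == 80");
print "dot-star took: \"$prefix\"\n";   # dot-star took: "if __LDBL_SIZE == 80"
print "digits got:    \"$digits\"\n";   # digits got:    ""
```

The greedy .* reaches the end of the line on its first try, and since \d* can match nothing there, the engine never needs to backtrack.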

    --
    Lars Eighner http://www.larseighner.com/
    Save the whales! Collect the whole set!
     
    Lars Eighner, May 15, 2005
  5. Andrew DeFaria wrote:

    > I thought I understood Perl regexes pretty well but this one confuses me.
    > What am I doing wrong here?
    >
    > #!/usr/bin/perl
    >
    > use strict;
    > use warnings;
    >
    > $_ = "#if __LDBL_SIZE == 80
    > block";
    >
    > if (/^#.*= (\d*)/)


    Here, the ".*" matches "if __LDBL_SIZE =", because
    it must leave the "= " to match the literal. The
    "\d*" then scoops up the "80".
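Capturing the .* in the first pattern shows the backtracking Chris describes: .* first runs to the end of the line, then gives characters back one at a time until a literal "= " follows what it kept. A minimal sketch; split_first is a hypothetical helper name, and only the first line of the test string is used.

```perl
use strict;
use warnings;

# Hypothetical helper: the first pattern from the thread, with the .*
# captured as well. .* initially grabs the whole line, then backtracks
# one character at a time until a literal "= " follows what it kept.
sub split_first {
    my ($text) = @_;
    return $text =~ /^#(.*)= (\d*)/;
}

my ($prefix, $digits) = split_first("#if __LDBL_SIZE == 80");
print "dot-star took: \"$prefix\"\n";   # dot-star took: "if __LDBL_SIZE ="
print "digits got:    \"$digits\"\n";   # digits got:    "80"
```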

    > {
    >     print "Pattern matched \$1 = \"$1\"\n";
    > } else
    > {
    >     print "Pattern did not match\n";
    > }
    >
    > if (/^#.*(\d*)/)


    Because it is *greedy*, and there is no literal
    to limit it, the ".*" here matches "if __LDBL_SIZE == 80".
    Since "\d*" can be satisfied with no characters, no
    characters is all the ".*" leaves it.

    > {
    >     print "Pattern matched \$1 = \"$1\"\n";
    > } else
    > {
    >     print "Pattern did not match\n";
    > }
    >
    > Outputs:
    >
    > Pattern matched $1 = "80"
    > Pattern matched $1 = ""
    >
    > Why does the second pattern fail?!?


    It doesn't fail. It does exactly what you told it
    to and successfully matches. This, of course, may
    not be what you *wanted*, but that's why we debug
    programs :).
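The point here (and Ignoramus4744's earlier) is that "the match succeeded" and "$1 holds the digits" are two separate questions. A sketch that checks them separately; describe is a hypothetical helper, and the second call uses a made-up string without the leading '#' just to show what a real failure looks like.

```perl
use strict;
use warnings;

# Hypothetical helper: reports whether the second pattern matched and,
# separately, how many characters ended up in $1.
sub describe {
    my ($text) = @_;
    if ($text =~ /^#.*(\d*)/) {
        # The match succeeded; $1 is defined but may be the empty string.
        return sprintf "matched, \$1 has %d character(s)", length $1;
    }
    return "no match";
}

print describe("#if __LDBL_SIZE == 80"), "\n";   # matched, $1 has 0 character(s)
print describe("if __LDBL_SIZE == 80"), "\n";    # no match
```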

    --
    Christopher Mattern

    "Which one you figure tracked us?"
    "The ugly one, sir."
    "...Could you be more specific?"
     
    Chris Mattern, May 15, 2005
  6. Joe Smith

    Andrew DeFaria wrote:
    > Ignoramus4744 wrote:
    >> it did not fail, it successfully mapped \d* to an empty string.
    >
    > OK why didn't it match it to 80 like the first pattern did? Why is it
    > required to have the "=" and the space in the pattern? Why wouldn't ".*"
    > suck that up?


    /^#.*(\d*)/ = Match '#' at the beginning of the line, followed
    by any number of characters (as many as possible) followed by
    zero or more digits. The greediness of the '.*' ate up everything,
    including '80', which allowed (forced) \d* to match zero digits.

    /^#.*(\d+)/ = Match '#' at the beginning of the line, followed
    by any number of characters (as many as possible) followed by
    one or more digits. The greedy '.*' grabs everything possible,
    as long as it leaves one digit over for \d+. This means that
    '8' is included in .* and only '0' is picked up by \d+.

    /^#.*?(\d+)/ = Match '#' at the beginning of the line, followed
    by as few characters as possible to allow the rest of the regex
    to match. This means that \d+ will match '80'. (Using \d* instead
    would not work here: .*? would then match zero characters and
    \d* would match the empty string right after the '#'.)

    Two things to remember:
    1) Use \d+ instead of \d* to avoid matching zero digits.
    2) .* is greedy; it will swallow up everything possible. Only if
    there is more to match will it accept less than everything.
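The patterns above can be compared side by side. The sketch below prints what ends up in $1 for each one, and adds a fourth case, lazy .*? followed by \d*, which captures the empty string rather than '80': once .*? is allowed to match nothing, \d* is satisfied immediately after the '#'. first_capture is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the first capture group, or undef if no match.
sub first_capture {
    my ($re, $text) = @_;
    my ($got) = $text =~ $re;
    return $got;
}

my $line = "#if __LDBL_SIZE == 80";

my @patterns = (
    qr/^#.*(\d*)/,     # greedy .*, \d* may match nothing      -> ""
    qr/^#.*(\d+)/,     # greedy .*, \d+ needs at least a digit -> "0"
    qr/^#.*?(\d+)/,    # lazy .*?, \d+ needs at least a digit  -> "80"
    qr/^#.*?(\d*)/,    # lazy .*?, \d* may match nothing       -> ""
);

for my $re (@patterns) {
    my $got = first_capture($re, $line);
    printf "%-16s \$1 = \"%s\"\n", $re, defined $got ? $got : "(no match)";
}
```

Only the combination of a lazy .*? and a quantifier that demands at least one digit (\d+) pulls out the whole '80'.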

    -Joe
     
    Joe Smith, May 15, 2005
    #6
  7. Joe Smith wrote:

    > Andrew DeFaria wrote:
    >
    >> Ignoramus4744 wrote:
    >>
    >>> it did not fail, it successfully mapped \d* to an empty string.
    >>
    >> OK why didn't it match it to 80 like the first pattern did? Why is it
    >> required to have the "=" and the space in the pattern? Why wouldn't
    >> ".*" suck that up?

    >
    > /^#.*(\d*)/ = Match '#' at the beginning of the line, followed by any
    > number of characters (as many as possible) followed by
    > zero or more digits. The greediness of the '.*' ate up everything,
    > including '80', which allowed (forced) \d* to match zero digits.
    >
    > /^#.*(\d+)/ = Match '#' at the beginning of the line, followed by any
    > number of characters (as many as possible) followed by
    > one or more digits. The greedy '.*' grabs everything
    > possible, as long as it leaves one digit over for \d+. This
    > means that '8' is included in .* and only '0' is picked up by \d+.
    >
    > /^#.*?(\d+)/ = Match '#' at the beginning of the line, followed by as
    > few characters as possible to allow the rest of the regex to
    > match. This means that \d+ will match '80'. (Using \d* instead
    > would not work here: .*? would then match zero characters and
    > \d* would match the empty string right after the '#'.)
    >
    > Two things to remember:
    > 1) Use \d+ instead of \d* to avoid matching zero digits.
    > 2) .* is greedy; it will swallow up everything possible. Only if
    > there is more to match will it accept less than everything.


    Thanks. Good explanation. The matching zero digits got me here. And
    thanks for the 2nd paragraph too. Got caught by that too.
    --
    If a cow laughed, would milk come out her nose?
     
    Andrew DeFaria, May 15, 2005
