Regex not matching

Andrew DeFaria · May 15, 2005

I thought I understood Perl regexes pretty well but this one confuses me.
What am I doing wrong here?

#!/usr/bin/perl

use strict;
use warnings;

$_ = "#if __LDBL_SIZE == 80
block";

if (/^#.*= (\d*)/) {
    print "Pattern matched \$1 = \"$1\"\n";
} else {
    print "Pattern did not match\n";
}

if (/^#.*(\d*)/) {
    print "Pattern matched \$1 = \"$1\"\n";
} else {
    print "Pattern did not match\n";
}

Outputs:

Pattern matched $1 = "80"
Pattern matched $1 = ""

Why does the second pattern fail?!?

Andrew DeFaria · May 15, 2005

Andrew DeFaria wrote:

Ugh, that messed up pretty bad. Let me try again.
I thought I understood Perl regexes pretty well but this one confuses me.
What am I doing wrong here?

#!/usr/bin/perl

use strict;
use warnings;

$_ = "#if __LDBL_SIZE == 80
block";

if (/^#.*= (\d*)/) {
    print "Pattern matched \$1 = \"$1\"\n";
} else {
    print "Pattern did not match\n";
}

if (/^#.*(\d*)/) {
    print "Pattern matched \$1 = \"$1\"\n";
} else {
    print "Pattern did not match\n";
}

Outputs:

Pattern matched $1 = "80"
Pattern matched $1 = ""

Why does the second pattern fail?!?

Andrew DeFaria · May 15, 2005

Ignoramus4744 said:
It did not fail; it successfully matched \d* to an empty string.

OK why didn't it match it to 80 like the first pattern did? Why is it
required to have the "=" and the space in the pattern? Why wouldn't ".*"
suck that up?

Lars Eighner · May 15, 2005

In our last episode,
the lovely and talented Andrew DeFaria
broadcast on comp.lang.perl.misc:

I thought I understood Perl regexes pretty well but this one confuses me.
What am I doing wrong here?

$_ = "#if __LDBL_SIZE == 80
block";

if (/^#.*= (\d*)/) {
    print "Pattern matched \$1 = \"$1\"\n";
} else {
    print "Pattern did not match\n";
}

if (/^#.*(\d*)/) {
    print "Pattern matched \$1 = \"$1\"\n";
} else {
    print "Pattern did not match\n";
}

Pattern matched $1 = "80"
Pattern matched $1 = ""

Why does the second pattern fail?!?

Quantifiers such as * are greedy: ^#.* matched the whole line,
including everything after the =. That left nothing for \d* to match.
(Or, more precisely, it left \d* free to match zero digits, which it
is perfectly happy to do.)

Just a guess.
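The guess is easy to confirm by capturing what .* itself consumed. A minimal sketch, using the sample string from the original post; split_greedy is a hypothetical helper name, and @- / @+ are Perl's arrays of match start and end offsets:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: capture both the greedy .* and the \d* after it,
# plus the start/end offsets of the \d* group.
sub split_greedy {
    my ($line) = @_;
    return unless $line =~ /^#(.*)(\d*)/;
    # $-[2] and $+[2] are the start and end offsets of capture group 2.
    return ($1, $2, $-[2], $+[2]);
}

my ($dot_star, $digits, $start, $end) =
    split_greedy("#if __LDBL_SIZE == 80\nblock");

print "'.*' matched  \"$dot_star\"\n";
print "'\\d*' matched \"$digits\" at offsets $start..$end\n";
```

Since . does not match a newline (no /s flag), .* stops right after "80", and \d* matches the empty string at that position (start and end offsets are equal).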

Chris Mattern · May 15, 2005

Andrew said:
I thought I understood Perl regexes pretty well but this one confuses me.
What am I doing wrong here?

#!/usr/bin/perl

use strict;
use warnings;

$_ = "#if __LDBL_SIZE == 80
block";

if (/^#.*= (\d*)/)

Here, the ".*" matches "if __LDBL_SIZE =", because
it must leave the "= " to match the literal. The
"\d*" then scoops up the "80".

{
    print "Pattern matched \$1 = \"$1\"\n";
} else {
    print "Pattern did not match\n";
}

if (/^#.*(\d*)/)

Because it is *greedy*, and there is no literal
to limit it, the ".*" here matches "if __LDBL_SIZE == 80".
Since "\d*" can be satisfied with no characters, no
characters is all the ".*" leaves it.

{
    print "Pattern matched \$1 = \"$1\"\n";
} else {
    print "Pattern did not match\n";
}

Outputs:

Pattern matched $1 = "80"
Pattern matched $1 = ""

Why does the second pattern fail?!?

It doesn't fail. It does exactly what you told it
to and successfully matches. This, of course, may
not be what you *wanted*, but that's why we debug
programs.
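To see what the ".*" grabs in each of the two patterns, you can add a capture group around it. This is a sketch against the sample string from the thread; captures is a hypothetical helper name:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the capture groups of $re against $line
# (an empty list if the match fails).
sub captures {
    my ($re, $line) = @_;
    return $line =~ $re;    # list context: returns ($1, $2, ...)
}

my $line = "#if __LDBL_SIZE == 80\nblock";

# Pattern 1: .* must back off until the literal "= " can match.
my @first = captures(qr/^#(.*)= (\d*)/, $line);
# Pattern 2: nothing after .* forces it to give anything back.
my @second = captures(qr/^#(.*)(\d*)/, $line);

print "pattern 1: .* = \"$first[0]\", \\d* = \"$first[1]\"\n";
print "pattern 2: .* = \"$second[0]\", \\d* = \"$second[1]\"\n";
```

Both matches succeed; only the split between .* and \d* differs.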

--
Christopher Mattern

"Which one you figure tracked us?"
"The ugly one, sir."
"...Could you be more specific?"

Joe Smith · May 15, 2005

Andrew said:
OK why didn't it match it to 80 like the first pattern did? Why is it
required to have the "=" and the space in the pattern? Why wouldn't ".*"
suck that up?

/^#.*(\d*)/ = Match '#' at the beginning of the line, followed
by any number of characters (as much as possible) followed by
zero or more digits. The greediness of the '.*' ate up everything,
including '80', which allowed (forced) \d* to match zero digits.

/^#.*(\d+)/ = Match '#' at the beginning of the line, followed
by any number of characters (as much as possible) followed by
one or more digits. The greediness of the '.*' grabs everything
possible, as long as it leaves one digit left over for \d+. This
means that '8' is included in .* and only '0' is picked up by \d+.

/^#.*?(\d+)/ = Match '#' at the beginning of the line, followed
by as few characters as possible to allow the rest of the regex
to match. This means that \d+ will match '80'. (Using \d* here would
not work: .*? would match nothing at all and \d* would then match
zero digits right after the '#'.)
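The three patterns can be checked against the sample string with a short script. This is a sketch; first_capture is a hypothetical helper name, and the fourth pattern pairs the lazy .*? with \d* for comparison. Because .*? first tries to match nothing, \d* is tried immediately after the '#' and succeeds with zero digits.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return what capture group 1 of $pattern grabs in
# $line, or undef if the pattern does not match at all.
sub first_capture {
    my ($pattern, $line) = @_;
    my ($captured) = $line =~ /$pattern/;
    return $captured;
}

my $line = "#if __LDBL_SIZE == 80\nblock";

# The three patterns discussed above, plus .*? paired with \d*.
my @patterns = ('^#.*(\d*)', '^#.*(\d+)', '^#.*?(\d+)', '^#.*?(\d*)');

for my $pattern (@patterns) {
    my $captured = first_capture($pattern, $line);
    printf "/%s/ => %s\n", $pattern,
        defined $captured ? "\"$captured\"" : "(no match)";
}
```

Run as-is, it should print "", "0", "80" and "" for the four patterns, in that order.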

Two things to remember:
1) Use \d+ instead of \d* to avoid matching zero digits.
2) .* is greedy; it will swallow up everything possible. Only if
there is more to match will it accept less than everything.
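One further option, not mentioned in the thread, is to sidestep .* entirely and use the negated class \D* (any non-digit), which cannot swallow the number. A minimal sketch, assuming the value you want is the first run of digits after the '#'; first_number is a hypothetical name. Note that \D, unlike ., also matches a newline, so on multi-line input it can run past the end of the line.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# \D* matches only non-digits, so it cannot eat into the number, and \d+
# insists on at least one digit. Assumes the wanted value is the first
# run of digits after the '#'.
sub first_number {
    my ($line) = @_;
    return $line =~ /^#\D*(\d+)/ ? $1 : undef;
}

my $n = first_number("#if __LDBL_SIZE == 80\nblock");
print defined $n ? "Pattern matched \$1 = \"$n\"\n" : "Pattern did not match\n";
```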

-Joe

Andrew DeFaria · May 15, 2005

Joe said:
/^#.*(\d*)/ = Match '#' at the beginning of the line, followed by any
number of characters (as much as possible) followed by
zero or more digits. The greediness of the '.*' ate up everything,
including '80', which allowed (forced) \d* to match zero digits.

/^#.*(\d+)/ = Match '#' at the beginning of the line, followed by any
number of characters (as much as possible) followed by
one or more digits. The greediness of the '.*' grabs everything
possible, as long as it leaves one digit left over for \d+. This
means that '8' is included in .* and only '0' is picked up by \d+.

/^#.*?(\d+)/ = Match '#' at the beginning of the line, followed by as
few characters as possible to allow the rest of the regex to
match. This means that \d+ will match '80'. (Using \d* here would not
work: .*? would match nothing at all and \d* would then match zero
digits right after the '#'.)

Two things to remember:
1) Use \d+ instead of \d* to avoid matching zero digits.
2) .* is greedy; it will swallow up everything possible. Only if
there is more to match will it accept less than everything.

Thanks. Good explanation. The matching zero digits got me here. And
thanks for the 2nd paragraph too. Got caught by that too.
