Regex not matching

Andrew DeFaria

I thought I understood Perl regexes pretty well, but this one confuses me.
What am I doing wrong here?

#!/usr/bin/perl

use strict;
use warnings;

$_ = "#if __LDBL_SIZE == 80
block";

if (/^#.*= (\d*)/) {
    print "Pattern matched \$1 = \"$1\"\n";
} else {
    print "Pattern did not match\n";
}

if (/^#.*(\d*)/) {
    print "Pattern matched \$1 = \"$1\"\n";
} else {
    print "Pattern did not match\n";
}

Outputs:

Pattern matched $1 = "80"
Pattern matched $1 = ""

Why does the second pattern fail?!?
 
Andrew DeFaria

Andrew DeFaria wrote:

Ugh, that messed up pretty bad. Let me try again.
I thought I understood Perl regexes pretty well, but this one confuses me.
What am I doing wrong here?

#!/usr/bin/perl

use strict;
use warnings;

$_ = "#if __LDBL_SIZE == 80
block";

if (/^#.*= (\d*)/) {
    print "Pattern matched \$1 = \"$1\"\n";
} else {
    print "Pattern did not match\n";
}

if (/^#.*(\d*)/) {
    print "Pattern matched \$1 = \"$1\"\n";
} else {
    print "Pattern did not match\n";
}

Outputs:

Pattern matched $1 = "80"
Pattern matched $1 = ""

Why does the second pattern fail?!?
 
Andrew DeFaria

Ignoramus4744 said:
> It did not fail; it successfully matched \d* to an empty string.

OK why didn't it match it to 80 like the first pattern did? Why is it
required to have the "=" and the space in the pattern? Why wouldn't ".*"
suck that up?
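To make Ignoramus4744's point concrete: a successful match and an empty capture are two different things. The following is a minimal sketch that reuses the test string from the original post (its literal spans two lines, so it is written here with an explicit "\n"); the variable names are illustrative only. In list context a match returns the captures on success and an empty list on failure, so `@caps` separates "did not match" from "matched zero digits".

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Test string from the original post; its literal spans two lines,
# i.e. it contains a newline before "block".
my $str = "#if __LDBL_SIZE == 80\nblock";

# List context: the capture list on success, an empty list on failure.
my @caps = $str =~ /^#.*(\d*)/;

my $matched = @caps ? 1 : 0;   # 1 means the pattern did match
my $capture = $caps[0];        # defined, but zero characters long

print "matched = $matched\n";                           # matched = 1
print "defined = ", (defined $capture ? 1 : 0), "\n";   # defined = 1
print "length  = ", length($capture), "\n";             # length  = 0
```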
 
Lars Eighner

In our last episode,
the lovely and talented Andrew DeFaria
broadcast on comp.lang.perl.misc:
> Why does the second pattern fail?!?

Regular expression quantifiers are greedy. ^#.* matched the whole
line, including everything after the "=". That left nothing for
\d* to match. (More precisely, \d* matched zero digits, which it
is allowed to do.)

Just a guess.
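Lars's explanation can be checked by capturing the `.*` part as well. This is a small sketch under the same assumption about the test string (a newline before "block"). Wrapping `.*` in parentheses does not change how it matches; it only records what it took. Because `.` does not match a newline without the `/s` modifier, "the whole line" here means everything up to that newline.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Newline before "block" assumed, as in the original post's literal.
my $str = "#if __LDBL_SIZE == 80\nblock";

# Same as /^#.*(\d*)/, but with .* captured too.
my ($dotstar, $digits) = $str =~ /^#(.*)(\d*)/;

print "dot-star: \"$dotstar\"\n";   # dot-star: "if __LDBL_SIZE == 80"
print "digits:   \"$digits\"\n";    # digits:   ""
```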
 
Chris Mattern

Andrew said:
> I thought I understood Perl regexes pretty well but this one confuses me.
> What am I doing wrong here?
>
> #!/usr/bin/perl
>
> use strict;
> use warnings;
>
> $_ = "#if __LDBL_SIZE == 80
> block";
>
> if (/^#.*= (\d*)/)

Here, the ".*" matches "if __LDBL_SIZE =", because
it must leave the "= " to match the literal. The
"\d*" then scoops up the "80".

> {
> print "Pattern matched \$1 = \"$1\"\n";
> } else
> {
> print "Pattern did not match\n";
> }
>
> if (/^#.*(\d*)/)

Because it is *greedy*, and there is no literal
to limit it, the ".*" here matches "if __LDBL_SIZE == 80".
Since "\d*" can be satisfied with no characters, no
characters is all the ".*" leaves it.

> {
> print "Pattern matched \$1 = \"$1\"\n";
> } else
> {
> print "Pattern did not match\n";
> }
>
> Outputs:
>
> Pattern matched $1 = "80"
> Pattern matched $1 = ""
>
> Why does the second pattern fail?!?

It doesn't fail. It does exactly what you told it
to and successfully matches. This, of course, may
not be what you *wanted*, but that's why we debug
programs :).
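Chris's two annotations can be checked directly. This sketch again assumes the newline before "block"; the `(.*)` groups are added purely for inspection. The first match shows the literal "= " forcing `.*` to give characters back. The second half shows why the unconstrained `.*` stops at "80": `.` does not match a newline unless `/s` is given.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $str = "#if __LDBL_SIZE == 80\nblock";

# Pattern 1: .* must give characters back until a "= " follows it;
# the rightmost such spot is the second "=", so .* keeps the first one.
my ($head1, $num1) = $str =~ /^#(.*)= (\d*)/;
print "pattern 1: .* = \"$head1\", \\d* = \"$num1\"\n";

# '.' does not match a newline unless /s is used, so even an
# unconstrained .* stops at the end of the first line.
my ($line_only) = $str =~ /^#(.*)/;
my ($with_s)    = $str =~ /^#(.*)/s;
print "without /s: \"$line_only\"\n";
print "with /s:    \"$with_s\"\n";
```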

--
Christopher Mattern

"Which one you figure tracked us?"
"The ugly one, sir."
"...Could you be more specific?"
 
Joe Smith

Andrew said:
> OK why didn't it match it to 80 like the first pattern did? Why is it
> required to have the "=" and the space in the pattern? Why wouldn't ".*"
> suck that up?

/^#.*(\d*)/ = Match '#' at the beginning of the line, followed
by any number of characters (as much as possible) followed by
zero or more digits. The greediness of the '.*' ate up everything,
including '80', which allowed (forced) \d* to match zero digits.

/^#.*(\d+)/ = Match '#' at the beginning of the line, followed
by any number of characters (as much as possible) followed by
one or more digits. The greediness of the '.*' grabs everything
possible, as long as it leaves one digit left over for \d+. This
means that '8' is included in .* and only '0' is picked up by \d+.

/^#.*?(\d+)/ = Match '#' at the beginning of the line, followed
by as few characters as possible to allow the rest of the regex
to match. This means that \d+ will match '80'. (Using \d* would not
work here: .*? would first try matching nothing, and \d* would then
match zero digits right after the '#', leaving $1 empty.)
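Joe's three patterns can be run side by side against the thread's test string (assumed, as above, to contain a newline before "block"). The fourth entry combines the lazy `.*?` with `\d*`. It is worth checking because it does not yield "80": `.*?` first tries to match nothing, and `\d*` then succeeds with zero digits right after the "#". The `@cases` table and its labels are illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Test string from the original post (newline before "block").
my $str = "#if __LDBL_SIZE == 80\nblock";

# Each entry: printable pattern text, compiled pattern.
my @cases = (
    [ '/^#.*(\d*)/',  qr/^#.*(\d*)/  ],
    [ '/^#.*(\d+)/',  qr/^#.*(\d+)/  ],
    [ '/^#.*?(\d+)/', qr/^#.*?(\d+)/ ],
    [ '/^#.*?(\d*)/', qr/^#.*?(\d*)/ ],
);

my @got;
for my $case (@cases) {
    my ($text, $re) = @$case;
    my ($cap) = $str =~ $re;    # list context: the capture on success
    push @got, $cap;
    printf "%-14s \$1 = %s\n", $text, defined $cap ? "\"$cap\"" : '(no match)';
}
```

With the assumed string this prints "", "0", "80" and "" for the four patterns, in that order.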

Two things to remember:
1) Use \d+ instead of \d* to avoid matching zero digits.
2) .* is greedy; it will swallow up everything possible. Only if
there is more to match will it accept less than everything.
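The two rules of thumb can be sketched together. The sample lines are hypothetical: one has a number, and one (#endif) has none. With `\d*` the second line still "matches" with an empty capture, while `\d+` makes the match fail, so the capture is undef and the missing number is easy to detect. The `/^#\D*(\d+)/` variant is an alternative not suggested in the thread: since `\D*` can never consume a digit, no backtracking is needed to hand all of "80" to `\d+`.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample lines: one with a number, one without.
my @lines = ("#if __LDBL_SIZE == 80", "#endif");
my (%star, %plus, %nondigit);

# Show a capture, or mark a failed match (capture left undef).
sub show { my ($v) = @_; return defined $v ? "\"$v\"" : "(no match)" }

for my $line (@lines) {
    # Rule 1: greedy .* plus \d* succeeds on any line starting with '#',
    # often with an empty capture.
    ($star{$line}) = $line =~ /^#.*(\d*)/;
    # \d+ demands at least one digit; lazy .*? lets it have all of "80".
    ($plus{$line}) = $line =~ /^#.*?(\d+)/;
    # Alternative (not from the thread): \D* can never swallow a digit.
    ($nondigit{$line}) = $line =~ /^#\D*(\d+)/;

    printf "%-22s \\d*: %-6s .*?\\d+: %-12s \\D*\\d+: %s\n",
        $line, show($star{$line}), show($plus{$line}), show($nondigit{$line});
}
```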

-Joe
 
Andrew DeFaria

Joe said:
> Two things to remember:
> 1) Use \d+ instead of \d* to avoid matching zero digits.
> 2) .* is greedy; it will swallow up everything possible. Only if
> there is more to match will it accept less than everything.

Thanks. Good explanation. Matching zero digits is what got me here. And
thanks for the 2nd paragraph too; I got caught by that as well.
 
