Difference of * and + in regular expression


Peng Yu

Hi,

If I used the uncommented if-statement, I would get no match. If I
used the commented-out if-statement instead, I would get the following
string as the output. I'm wondering why the regular expression with *
does not match anything?

namespace a { namespace b { namespace c {

Thanks,
Peng

$string="a namespace a { namespace b { namespace c { ";

#if ($string =~ /\s*((namespace\s+\w(\w|\d)*\s*\{\s*)+)/) {
if ($string =~ /\s*((namespace\s+\w(\w|\d)*\s*\{\s*)*)/) {
print "$1\$\n";
}
 

Gunnar Hjalmarsson

Peng said:
If I used the uncommented if-statement, I would get no match.

Not true. $1 is defined, so the regex does match.
$string="a namespace a { namespace b { namespace c { ";

#if ($string =~ /\s*((namespace\s+\w(\w|\d)*\s*\{\s*)+)/) {
if ($string =~ /\s*((namespace\s+\w(\w|\d)*\s*\{\s*)*)/) {
print "$1\$\n";
}

With the * quantifier, the regex seems to behave non-greedy, though.
 

John W. Krahn

Peng said:
Hi,

If I used the uncommented if-statement, I would get no match. If I
used the commented-out if-statement instead, I would get the following
string as the output. I'm wondering why the regular expression with *
does not match anything?

It does match, it just doesn't match what you expected it to match.
namespace a { namespace b { namespace c {

Thanks,
Peng

$string="a namespace a { namespace b { namespace c { ";

#if ($string =~ /\s*((namespace\s+\w(\w|\d)*\s*\{\s*)+)/) {
if ($string =~ /\s*((namespace\s+\w(\w|\d)*\s*\{\s*)*)/) {
print "$1\$\n";
}

$ perl -e'
use re qw/ debug /;

my $string = "a namespace a { namespace b { namespace c { ";

if ($string =~ /\s*((namespace\s+\w(\w|\d)*\s*\{\s*)*)/) {
print "$1\$\n";
}
'
Compiling REx `\s*((namespace\s+\w(\w|\d)*\s*\{\s*)*)'
size 40 Got 324 bytes for offset annotations.
first at 1
1: STAR(3)
2: SPACE(0)
3: OPEN1(5)
5: CURLYX[0] {0,32767}(37)
7: OPEN2(9)
9: EXACT <namespace>(13)
13: PLUS(15)
14: SPACE(0)
15: ALNUM(16)
16: CURLYM[3] {0,32767}(28)
20: BRANCH(22)
21: ALNUM(26)
22: BRANCH(24)
23: DIGIT(26)
26: SUCCEED(0)
27: NOTHING(28)
28: STAR(30)
29: SPACE(0)
30: EXACT <{>(32)
32: STAR(34)
33: SPACE(0)
34: CLOSE2(36)
36: WHILEM[1/2](0)
37: NOTHING(38)
38: CLOSE1(40)
40: END(0)
minlen 0
Offsets: [40]
3[1] 1[2] 4[1] 0[0] 37[1] 0[0] 5[1] 0[0] 6[9] 0[0] 0[0] 0[0]
17[1] 15[2] 18[2] 27[1] 0[0] 20[1] 0[0] 20[1] 21[2] 23[1] 24[2] 26[1]
0[0] 27[0] 27[0] 30[1] 28[2] 31[2] 0[0] 35[1] 33[2] 36[1] 0[0] 37[0]
37[0] 38[1] 0[0] 39[0]
Matching REx "\s*((namespace\s+\w(\w|\d)*\s*\{\s*)*)" against "a namespace a { namespace b { namespace c { "
Setting an EVAL scope, savestack=5
0 <> <a namespace > | 1: STAR
SPACE can match 0 times out of 2147483647...
Setting an EVAL scope, savestack=5
0 <> <a namespace > | 3: OPEN1
0 <> <a namespace > | 5: CURLYX[0] {0,32767}
0 <> <a namespace > | 36: WHILEM[1/2]
0 out of 0..32767 cc=bfa0d330
Setting an EVAL scope, savestack=15
0 <> <a namespace > | 7: OPEN2
0 <> <a namespace > | 9: EXACT <namespace>
failed...
restoring \1 to -1(0)..-1(no)
restoring \1..\3 to undef
failed, try continuation...
0 <> <a namespace > | 37: NOTHING
0 <> <a namespace > | 38: CLOSE1
0 <> <a namespace > | 40: END
Match successful!
$
Freeing REx: `"\\s*((namespace\\s+\\w(\\w|\\d)*\\s*\\{\\s*)*)"'


Where it says "Match successful!", the regex matched at offset 0: the
expression (namespace\s+\w(\w|\d)*\s*\{\s*)* matched zero times, so $1
is the empty string.
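To see the same thing without reading the debug trace, here is a minimal sketch that reports where the * version matches. The variable names $start, $end and $captured are only for illustration; @- and @+ are Perl's built-in match offset arrays.

```perl
use strict;
use warnings;

my $string = "a namespace a { namespace b { namespace c { ";

my ($start, $end, $captured);
if ($string =~ /\s*((namespace\s+\w(\w|\d)*\s*\{\s*)*)/) {
    # $-[0] and $+[0] are the start and end offsets of the whole match.
    ($start, $end, $captured) = ($-[0], $+[0], $1);
    print "matched at offsets $start..$end, \$1 = '$captured'\n";
}
```

On the string above this should report a zero-length match at offset 0, with $1 defined but empty.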

Also, the expression \w(\w|\d)* could be simplified to \w+.
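A quick way to check that simplification, as a sketch: run the + version of the pattern both ways and compare the captures. The inner groups are written as non-capturing (?:...) here, which is my change and not in the original code; $long and $short are illustrative names.

```perl
use strict;
use warnings;

my $string = "a namespace a { namespace b { namespace c { ";

# Original identifier pattern: \w followed by any number of \w or \d.
my ($long)  = $string =~ /\s*((?:namespace\s+\w(?:\w|\d)*\s*\{\s*)+)/;

# Simplified identifier pattern: \w already includes digits.
my ($short) = $string =~ /\s*((?:namespace\s+\w+\s*\{\s*)+)/;

print "same capture\n" if $long eq $short;
print "$short\$\n";
```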


John
 

Ben Morrow

Quoth Peng Yu said:
If I used the uncommented if-statement, I would get no match. If I
used the commented-out if-statement instead, I would get the following
string as the output. I'm wondering why the regular expression with *
does not match anything?

namespace a { namespace b { namespace c {

$string="a namespace a { namespace b { namespace c { ";

#if ($string =~ /\s*((namespace\s+\w(\w|\d)*\s*\{\s*)+)/) {
if ($string =~ /\s*((namespace\s+\w(\w|\d)*\s*\{\s*)*)/) {

'Match earlier in the string' beats 'match longest', even with greedy
matching, and since your regex will match the empty string the first
match is right before the first 'a'.
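The same effect on a smaller string, as a rough sketch (the string "xxaaa" and the variable names are made up for illustration):

```perl
use strict;
use warnings;

my $text = "xxaaa";

# a* may match zero characters, so it succeeds immediately at offset 0.
$text =~ /(a*)/ or die "unexpected: a* can always match";
my ($star_pos, $star_match) = ($-[0], $1);

# a+ needs at least one 'a', so the engine has to advance to offset 2.
$text =~ /(a+)/ or die "unexpected: no 'a' found";
my ($plus_pos, $plus_match) = ($-[0], $1);

printf "a*: offset %d, matched '%s'\n", $star_pos, $star_match;
printf "a+: offset %d, matched '%s'\n", $plus_pos, $plus_match;
```

The * version stops at the first position where it can succeed, even though a longer match is available two characters later.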

Ben
 

Peng Yu

Not true. $1 is defined, so the regex does match.



With the * quantifier, the regex seems to behave non-greedy, though.

According to the manual, *? is non-greedy.
Why is * also non-greedy?

Thanks,
Peng
 

Gunnar Hjalmarsson

Peng said:
According to the manual, *? is non-greedy.
Why is * also non-greedy?

I don't know, sorry. Maybe the answer can be derived from John's more
extensive explanation.
 

Tad J McClellan

Peng Yu said:
According to the manual, *? is non-greedy.
Why is * also non-greedy?


Greediness is not involved here.

(Greedy vs. non-greedy never changes whether a match will succeed or fail.
It is simply a "tie breaker" used when the regex engine can match more
than one way at the current pos()ition.)
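A small sketch of that tie-breaking, using a made-up string: both patterns succeed at the same position, and the quantifier only decides how much is taken there.

```perl
use strict;
use warnings;

my $text = "aaa";

my ($greedy) = $text =~ /(a+)/;     # takes as many 'a's as it can
my ($lazy)   = $text =~ /(a+?)/;    # takes as few 'a's as it can

print "greedy: '$greedy', lazy: '$lazy'\n";
```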

There are 2 primary issues with this OP's problem: writing a pattern
where everything is optional, and that regexes match as early as possible
from left to right.

If you write a pattern where everything is optional, then it will match
the empty string, which in turn means that it would match *every* string
you can think of.
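For instance, with the all-optional shape of the OP's pattern (simplified a little here, and tried against some arbitrary strings I made up), every input matches:

```perl
use strict;
use warnings;

# Every piece of this pattern is allowed to match zero times.
my $all_optional = qr/\s*(?:namespace\s+\w+\s*\{\s*)*/;

my @inputs  = ("", "abc", "123", "no braces here");
my @matched = grep { $_ =~ $all_optional } @inputs;

printf "%d of %d strings matched\n", scalar @matched, scalar @inputs;
```

So testing whether such a pattern matched tells you nothing; you have to look at what it captured, or require at least one repetition.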

The left-to-right evaluation of the pattern seems to be buried
a bit in perlre.pod:

The above recipes describe the ordering of matches I<at a given position>.
One more rule is needed to understand how a match is determined for the
whole regular expression: a match at an earlier position is always better
than a match at a later position.
 

comp.llang.perl.moderated

Tad J McClellan said:
Greediness is not involved here.

(Greedy vs. non-greedy never changes whether a match will succeed or fail.
It is simply a "tie breaker" used when the regex engine can match more
than one way at the current pos()ition.)

There are 2 primary issues with this OP's problem: writing a pattern
where everything is optional, and that regexes match as early as possible
from left to right.

If you write a pattern where everything is optional, then it will match
the empty string, which in turn means that it would match *every* string
you can think of.

The left-to-right evaluation of the pattern seems to be buried
a bit in perlre.pod:

The above recipes describe the ordering of matches I<at a given position>.
One more rule is needed to understand how a match is determined for the
whole regular expression: a match at an earlier position is always better
than a match at a later position.

I still prefer to think of this as another aspect of greediness: * can be
greedy, but only as greedy as needed to get the earliest match. Thus, even
greed embraces the cardinal Perl virtue of laziness....
 

Ted Zlatanov

clpm> I still prefer to think of this as another aspect of greediness: *
clpm> can be greedy but only as greedy as needed to get the earliest
clpm> match. Thus, even greed embraces the cardinal Perl virtue of
clpm> laziness....

I'd call that opportunism, not laziness.

"The two cardinal virtues of Perl are TMTOWTDI and laziness and
opportunism... No, no. The THREE cardinal virtues of Perl are TMTOWTDI
and laziness and opportunism and DWIM... DAMN IT... The FOUR cardinal
virtues of Perl are... etc."

Ted
 

xhoster

Peng Yu said:
According to the manual, *? is non-greedy.
Why is * also non-greedy?

It depends on what you mean. "Greedy" in CS generally means you make
locally optimal decisions, rather than looking for globally optimal ones.
But what is considered "optimal" in the local matching of a regex?

In this sense, it is greedy either way, in that it still optimizes locally
rather than globally. It is just that what we consider optimal changes
with the addition of ?.

At this point, perhaps they revert from a CS meaning to a moral/political
meaning--greedy no longer means local vs. global, now it means as much as
possible vs. as little as possible.

Xho

--
-------------------- http://NewsReader.Com/ --------------------
The costs of publication of this article were defrayed in part by the
payment of page charges. This article must therefore be hereby marked
advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate
this fact.
 

MSwanberg

Peng said:
Hi,

If I used the uncommented if-statement, I would get no match. If I
used the commented-out if-statement instead, I would get the following
string as the output. I'm wondering why the regular expression with *
does not match anything?

 namespace a { namespace b { namespace c {

Thanks,
Peng

$string="a namespace a { namespace b { namespace c { ";

#if ($string =~ /\s*((namespace\s+\w(\w|\d)*\s*\{\s*)+)/) {
if ($string =~ /\s*((namespace\s+\w(\w|\d)*\s*\{\s*)*)/) {
  print "$1\$\n";
}


I changed it to

if ($string =~ /\s*(namespace\s+\w(\w|\d)*\s*\{\s*)/) {
print "$1\$\n";
}

and it seems to work okay.
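For comparison, a sketch of what that version captures next to the + version from the original post. Without any quantifier on the group, only the first namespace block ends up in $1. The names $single and $all are just for illustration.

```perl
use strict;
use warnings;

my $string = "a namespace a { namespace b { namespace c { ";

# No quantifier on the outer group: exactly one block is captured.
# In list context the match returns all captures; we keep only the first.
my ($single) = $string =~ /\s*(namespace\s+\w(\w|\d)*\s*\{\s*)/;

# + on the inner group: one or more consecutive blocks are captured.
my ($all) = $string =~ /\s*((namespace\s+\w(\w|\d)*\s*\{\s*)+)/;

print "one block:  '$single'\n";
print "all blocks: '$all'\n";
```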

What exactly are you trying to do?

-Mike
 
