Difference of * and + in regular expression

Discussion in 'Perl Misc' started by Peng Yu, Jun 22, 2008.

  1. Peng Yu

    Peng Yu Guest

    Hi,

    If I use the uncommented if-statement, I get no match. If I
    use the commented-out if-statement instead, I get the following
    string as the output. I'm wondering why the regular expression with *
    does not match anything?

    namespace a { namespace b { namespace c {

    Thanks,
    Peng

    $string = "a namespace a { namespace b { namespace c { ";

    #if ($string =~ /\s*((namespace\s+\w(\w|\d)*\s*\{\s*)+)/) {
    if ($string =~ /\s*((namespace\s+\w(\w|\d)*\s*\{\s*)*)/) {
        print "$1\$\n";
    }
    Peng Yu, Jun 22, 2008
    #1

  2. Peng Yu wrote:
    > If I used the uncommented if-statement, I would get no match.


    Not true. $1 is defined, so the regex does match.

    > $string="a namespace a { namespace b { namespace c { ";
    >
    > #if ($string =~ /\s*((namespace\s+\w(\w|\d)*\s*\{\s*)+)/) {
    > if ($string =~ /\s*((namespace\s+\w(\w|\d)*\s*\{\s*)*)/) {
    > print "$1\$\n";
    > }


    With the * quantifier, the regex seems to behave non-greedy, though.

    --
    Gunnar Hjalmarsson
    Email: http://www.gunnar.cc/cgi-bin/contact.pl
    Gunnar Hjalmarsson, Jun 22, 2008
    #2

  3. Peng Yu wrote:
    > Hi,
    >
    > If I used the uncommented if-statement, I would get no match. If I
    > used the commend if statement otherwise, I would have the following
    > string as the output. I'm wondering why the regular expression with *
    > does not match anything?


    It does match, it just doesn't match what you expected it to match.

    > namespace a { namespace b { namespace c {
    >
    > Thanks,
    > Peng
    >
    > $string="a namespace a { namespace b { namespace c { ";
    >
    > #if ($string =~ /\s*((namespace\s+\w(\w|\d)*\s*\{\s*)+)/) {
    > if ($string =~ /\s*((namespace\s+\w(\w|\d)*\s*\{\s*)*)/) {
    > print "$1\$\n";
    > }


    $ perl -e'
    use re qw/ debug /;

    my $string = "a namespace a { namespace b { namespace c { ";

    if ($string =~ /\s*((namespace\s+\w(\w|\d)*\s*\{\s*)*)/) {
    print "$1\$\n";
    }
    '
    Compiling REx `\s*((namespace\s+\w(\w|\d)*\s*\{\s*)*)'
    size 40 Got 324 bytes for offset annotations.
    first at 1
    1: STAR(3)
    2: SPACE(0)
    3: OPEN1(5)
    5: CURLYX[0] {0,32767}(37)
    7: OPEN2(9)
    9: EXACT <namespace>(13)
    13: PLUS(15)
    14: SPACE(0)
    15: ALNUM(16)
    16: CURLYM[3] {0,32767}(28)
    20: BRANCH(22)
    21: ALNUM(26)
    22: BRANCH(24)
    23: DIGIT(26)
    26: SUCCEED(0)
    27: NOTHING(28)
    28: STAR(30)
    29: SPACE(0)
    30: EXACT <{>(32)
    32: STAR(34)
    33: SPACE(0)
    34: CLOSE2(36)
    36: WHILEM[1/2](0)
    37: NOTHING(38)
    38: CLOSE1(40)
    40: END(0)
    minlen 0
    Offsets: [40]
    3[1] 1[2] 4[1] 0[0] 37[1] 0[0] 5[1] 0[0] 6[9] 0[0] 0[0] 0[0]
    17[1] 15[2] 18[2] 27[1] 0[0] 20[1] 0[0] 20[1] 21[2] 23[1] 24[2] 26[1]
    0[0] 27[0] 27[0] 30[1] 28[2] 31[2] 0[0] 35[1] 33[2] 36[1] 0[0] 37[0]
    37[0] 38[1] 0[0] 39[0]
    Matching REx "\s*((namespace\s+\w(\w|\d)*\s*\{\s*)*)" against "a
    namespace a { namespace b { namespace c { "
    Setting an EVAL scope, savestack=5
    0 <> <a namespace > | 1: STAR
    SPACE can match 0 times out of 2147483647...
    Setting an EVAL scope, savestack=5
    0 <> <a namespace > | 3: OPEN1
    0 <> <a namespace > | 5: CURLYX[0] {0,32767}
    0 <> <a namespace > | 36: WHILEM[1/2]
    0 out of 0..32767 cc=bfa0d330
    Setting an EVAL scope, savestack=15
    0 <> <a namespace > | 7: OPEN2
    0 <> <a namespace > | 9: EXACT <namespace>
    failed...
    restoring \1 to -1(0)..-1(no)
    restoring \1..\3 to undef
    failed, try continuation...
    0 <> <a namespace > | 37: NOTHING
    0 <> <a namespace > | 38: CLOSE1
    0 <> <a namespace > | 40: END
    Match successful!
    $
    Freeing REx: `"\\s*((namespace\\s+\\w(\\w|\\d)*\\s*\\{\\s*)*)"'


    See where it says "Match successful!"? That means the expression
    (namespace\s+\w(\w|\d)*\s*\{\s*)* matched zero times, so $1 is the
    empty string.

    Also, the expression \w(\w|\d)* could be simplified to \w+.
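
    As a rough sketch of the same result without the debug trace, the offsets in @- and @+ show where the match landed. This uses the simplified \w+ form suggested above; the variable names are only illustrative.

```perl
use strict;
use warnings;

my $string = "a namespace a { namespace b { namespace c { ";

# The * version: every part of the pattern is optional.
my $matched = $string =~ /\s*((namespace\s+\w+\s*\{\s*)*)/;

# $-[0] and $+[0] are the start and end offsets of the whole match.
my ($start, $end) = ($-[0], $+[0]);
my $capture = $1;

printf "matched=%d start=%d end=%d length(\$1)=%d\n",
    $matched, $start, $end, length $capture;
```

    The match succeeds at offset 0 with a zero-length $1, which is why the print statement shows only the trailing "$".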


    John
    --
    Perl isn't a toolbox, but a small machine shop where you
    can special-order certain sorts of tools at low cost and
    in short order. -- Larry Wall
    John W. Krahn, Jun 22, 2008
    #3
  4. Ben Morrow

    Ben Morrow Guest

    Quoth Peng Yu <>:
    >
    > If I used the uncommented if-statement, I would get no match. If I
    > used the commend if statement otherwise, I would have the following
    > string as the output. I'm wondering why the regular expression with *
    > does not match anything?
    >
    > namespace a { namespace b { namespace c {
    >
    > $string="a namespace a { namespace b { namespace c { ";
    >
    > #if ($string =~ /\s*((namespace\s+\w(\w|\d)*\s*\{\s*)+)/) {
    > if ($string =~ /\s*((namespace\s+\w(\w|\d)*\s*\{\s*)*)/) {


    'Match earlier in the string' beats 'match longest', even with greedy
    matching, and since your regex will match the empty string the first
    match is right before the first 'a'.
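
    To make the "earlier beats longer" rule concrete, here is a minimal sketch that runs both quantifiers against the same string and records where each match starts. It uses \w+ in place of \w(\w|\d)*; the variable names are made up for the example.

```perl
use strict;
use warnings;

my $string = "a namespace a { namespace b { namespace c { ";

# With +, the group must match at least once, so the engine has to move
# forward until it finds "namespace".
my ($plus_start, $plus_text);
if ($string =~ /\s*((namespace\s+\w+\s*\{\s*)+)/) {
    ($plus_start, $plus_text) = ($-[0], $1);
}

# With *, zero repetitions are allowed, so the empty match at offset 0 wins.
my ($star_start, $star_text);
if ($string =~ /\s*((namespace\s+\w+\s*\{\s*)*)/) {
    ($star_start, $star_text) = ($-[0], $1);
}

printf "+ starts at %d: '%s'\n", $plus_start, $plus_text;
printf "* starts at %d: '%s'\n", $star_start, $star_text;
```

    The + version starts at offset 1 (the leading \s* takes the space after 'a') and captures all three namespace clauses; the * version stops at offset 0 with nothing captured.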

    Ben

    --
    You poor take courage, you rich take care:
    The Earth was made a common treasury for everyone to share
    All things in common, all people one.
    'We come in peace'---the order came to cut them down. []
    Ben Morrow, Jun 22, 2008
    #4
  5. Peng Yu

    Peng Yu Guest

    On Jun 21, 9:39 pm, Gunnar Hjalmarsson <> wrote:
    > Peng Yu wrote:
    > > If I used the uncommented if-statement, I would get no match.

    >
    > Not true. $1 is defined, so the regex does match.
    >
    > > $string="a namespace a { namespace b { namespace c { ";

    >
    > > #if ($string =~ /\s*((namespace\s+\w(\w|\d)*\s*\{\s*)+)/) {
    > > if ($string =~ /\s*((namespace\s+\w(\w|\d)*\s*\{\s*)*)/) {
    > > print "$1\$\n";
    > > }

    >
    > With the * quantifier, the regex seems to behave non-greedy, though.


    According to the manual, *? is non-greedy.
    Why is * also non-greedy?

    Thanks,
    Peng
    Peng Yu, Jun 22, 2008
    #5
  6. Peng Yu wrote:
    > On Jun 21, 9:39 pm, Gunnar Hjalmarsson <> wrote:
    >> Peng Yu wrote:
    >>> If I used the uncommented if-statement, I would get no match.

    >> Not true. $1 is defined, so the regex does match.
    >>
    >>> $string="a namespace a { namespace b { namespace c { ";
    >>> #if ($string =~ /\s*((namespace\s+\w(\w|\d)*\s*\{\s*)+)/) {
    >>> if ($string =~ /\s*((namespace\s+\w(\w|\d)*\s*\{\s*)*)/) {
    >>> print "$1\$\n";
    >>> }

    >> With the * quantifier, the regex seems to behave non-greedy, though.

    >
    > According to the manual, *? is non-greedy.
    > Why * is also non-greedy?


    I don't know, sorry. Maybe the answer can be derived from John's more
    extensive explanation.

    --
    Gunnar Hjalmarsson
    Email: http://www.gunnar.cc/cgi-bin/contact.pl
    Gunnar Hjalmarsson, Jun 22, 2008
    #6
  7. Peng Yu <> wrote:
    > On Jun 21, 9:39 pm, Gunnar Hjalmarsson <> wrote:
    >> Peng Yu wrote:
    >> > If I used the uncommented if-statement, I would get no match.

    >>
    >> Not true. $1 is defined, so the regex does match.
    >>
    >> > $string="a namespace a { namespace b { namespace c { ";

    >>
    >> > #if ($string =~ /\s*((namespace\s+\w(\w|\d)*\s*\{\s*)+)/) {
    >> > if ($string =~ /\s*((namespace\s+\w(\w|\d)*\s*\{\s*)*)/) {
    >> > print "$1\$\n";
    >> > }

    >>
    >> With the * quantifier, the regex seems to behave non-greedy, though.

    >
    > According to the manual, *? is non-greedy.
    > Why * is also non-greedy?



    Greediness is not involved here.

    (Greedy vs. non-greedy never changes whether a match will succeed or fail.
    It is simply a "tie breaker" used when the regex engine can match more
    than one way at the current pos()ition.)

    There are two primary issues behind the OP's problem: the pattern makes
    everything optional, and regexes match as early as possible, scanning
    from left to right.

    If you write a pattern where everything is optional, then it will match
    the empty string, which in turn means that it would match *every* string
    you can think of.

    The left-to-right evaluation of the pattern seems to be buried
    a bit in perlre.pod:

    The above recipes describe the ordering of matches I<at a given position>.
    One more rule is needed to understand how a match is determined for the
    whole regular expression: a match at an earlier position is always better
    than a match at a later position.
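
    A small sketch of both points, using a toy string rather than the OP's data: greedy and non-greedy forms of the same quantifier start at the same offset and differ only in how much they take, while the all-optional forms match the empty string at offset 0. The helper first_match is made up for this example.

```perl
use strict;
use warnings;

my $text = "xaaa";

# Returns "offset:capture" for the first match, or "no match".
sub first_match {
    my ($re) = @_;
    return "no match" unless $text =~ $re;
    return sprintf "%d:%s", $-[0], $1;
}

my %first;
for my $quant ('a*', 'a*?', 'a+', 'a+?') {
    $first{$quant} = first_match(qr/($quant)/);
    printf "%-4s -> %s\n", $quant, $first{$quant};
}
```

    Both a* and a*? report "0:" (an empty match before the 'x'), a+ reports "1:aaa", and a+? reports "1:a".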


    --
    Tad McClellan
    email: perl -le "print scalar reverse qq/moc.noitatibaher\100cmdat/"
    Tad J McClellan, Jun 22, 2008
    #7
  8. On Jun 22, 8:00 am, Tad J McClellan <> wrote:
    > Peng Yu <> wrote:
    > > On Jun 21, 9:39 pm, Gunnar Hjalmarsson <> wrote:
    > >> Peng Yu wrote:
    > >> > If I used the uncommented if-statement, I would get no match.

    >
    > >> Not true. $1 is defined, so the regex does match.

    >
    > >> > $string="a namespace a { namespace b { namespace c { ";

    >
    > >> > #if ($string =~ /\s*((namespace\s+\w(\w|\d)*\s*\{\s*)+)/) {
    > >> > if ($string =~ /\s*((namespace\s+\w(\w|\d)*\s*\{\s*)*)/) {
    > >> > print "$1\$\n";
    > >> > }

    >
    > >> With the * quantifier, the regex seems to behave non-greedy, though.

    >
    > > According to the manual, *? is non-greedy.
    > > Why * is also non-greedy?

    >
    > Greediness is not involved here.
    >
    > (Greedy vs. non-greedy never changes whether a match will succeed or fail.
    > It is simply a "tie breaker" used when the regex engine can match more
    > than one way at the current pos()ition.
    > )
    >
    > There are 2 primary issues with this OP's problem: writing a pattern
    > where everything is optional, and that regexes match as early as possible
    > from left to right.
    >
    > If you write a pattern where everything is optional, then it will match
    > the empty string, which in turn means that it would match *every* string
    > you can think of.
    >
    > The left-to-right evaluation of the pattern seems to be buried
    > a bit in perlre.pod:
    >
    > The above recipes describe the ordering of matches I<at a given position>.
    > One more rule is needed to understand how a match is determined for the
    > whole regular expression: a match at an earlier position is always better
    > than a match at a later position.
    >


    I still prefer to think of this as another
    aspect of greediness: * can be greedy
    but only as greedy as needed to get the
    earliest match. Thus, even greed embraces the cardinal Perl virtue of
    laziness....

    --
    Charles DeRykus
    comp.llang.perl.moderated, Jun 23, 2008
    #8
  9. Ted Zlatanov

    Ted Zlatanov Guest

    On Sun, 22 Jun 2008 20:41:02 -0700 (PDT) "comp.llang.perl.moderated" <> wrote:

    clpm> I still prefer to think of this as another aspect of greediness: *
    clpm> can be greedy but only as greedy as needed to get the earliest
    clpm> match. Thus, even greed embraces the cardinal Perl virtue of
    clpm> laziness....

    I'd call that opportunism, not laziness.

    "The two cardinal virtues of Perl are TMTOWTDI and laziness and
    opportunism... No, no. The THREE cardinal virtues of Perl are TMTOWTDI
    and laziness and opportunism and DWIM... DAMN IT... The FOUR cardinal
    virtues of Perl are... etc."

    Ted
    Ted Zlatanov, Jun 23, 2008
    #9
  10. Xho

    Guest

    Peng Yu <> wrote:
    > On Jun 21, 9:39 pm, Gunnar Hjalmarsson <> wrote:
    > > Peng Yu wrote:
    > > > If I used the uncommented if-statement, I would get no match.

    > >
    > > Not true. $1 is defined, so the regex does match.
    > >
    > > > $string="a namespace a { namespace b { namespace c { ";

    > >
    > > > #if ($string =~ /\s*((namespace\s+\w(\w|\d)*\s*\{\s*)+)/) {
    > > > if ($string =~ /\s*((namespace\s+\w(\w|\d)*\s*\{\s*)*)/) {
    > > > print "$1\$\n";
    > > > }

    > >
    > > With the * quantifier, the regex seems to behave non-greedy, though.

    >
    > According to the manual, *? is non-greedy.
    > Why * is also non-greedy?


    It depends on what you mean. "Greedy" in CS generally means you make
    locally optimal decisions, rather than looking for globally optimal ones.
    But what is considered "optimal" in the local matching of a regex?

    In this sense, it is greedy either way, in that it still optimizes locally
    rather than globally. It is just that what we consider optimal changes
    with the addition of ?.

    At this point, perhaps they revert from a CS meaning to a moral/political
    meaning--greedy no longer means local vs. global, now it means as much as
    possible vs. as little as possible.

    Xho

    --
    -------------------- http://NewsReader.Com/ --------------------
    The costs of publication of this article were defrayed in part by the
    payment of page charges. This article must therefore be hereby marked
    advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate
    this fact.
    , Jun 23, 2008
    #10
  11. MSwanberg

    MSwanberg Guest

    On Jun 21, 9:04 pm, Peng Yu <> wrote:
    > Hi,
    >
    > If I used the uncommented if-statement, I would get no match. If I
    > used the commend if statement otherwise, I would have the following
    > string as the output. I'm wondering why the regular expression with *
    > does not match anything?
    >
    >  namespace a { namespace b { namespace c {
    >
    > Thanks,
    > Peng
    >
    > $string="a namespace a { namespace b { namespace c { ";
    >
    > #if ($string =~ /\s*((namespace\s+\w(\w|\d)*\s*\{\s*)+)/) {
    > if ($string =~ /\s*((namespace\s+\w(\w|\d)*\s*\{\s*)*)/) {
    >   print "$1\$\n";
    > }



    I changed it to

    if ($string =~ /\s*(namespace\s+\w(\w|\d)*\s*\{\s*)/) {
    print "$1\$\n";
    }

    and it seems to work okay.
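
    Note that this version captures only the first namespace clause. If the goal is the list of namespace names (that is a guess on my part, the original post doesn't say), a sketch along these lines might be closer:

```perl
use strict;
use warnings;

my $string = "a namespace a { namespace b { namespace c { ";

# In list context with /g, a pattern with one capture group returns
# every captured value, scanning left to right.
my @names = $string =~ /namespace\s+(\w+)\s*\{/g;

print join("::", @names), "\n";
```

    For the sample string this prints a::b::c.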

    What exactly are you trying to do?

    -Mike
    MSwanberg, Jun 25, 2008
    #11
