Question about Quantifiers in java Regular expression

Discussion in 'Java' started by NeoGeoSNK, Mar 2, 2008.

  1. NeoGeoSNK

    NeoGeoSNK Guest

    Hello,
    I have learned Java Regular expression for a long time, but still
    confused about Quantifiers:

    import java.util.regex.*;
    public class NRGRegex{
    public static void main(String[] args){
    Pattern p = Pattern.compile("a??");
    String a = "aaa";
    Matcher m = p.matcher(a);
    while(m.find()){
    System.out.println("found char = " + m.group() + " at " + m.start()
    + " and " + m.end()); }
    }
    }

    the output result is:
    found char = at 0 and 0
    found char = at 1 and 1
    found char = at 2 and 2
    found char = at 3 and 3
    here "a??" is Reluctant quantifiers but why all char 'a' not match
    successful?

    when I use greedy quantifiers Pattern p = Pattern.compile("a?");
    the output result is:
    found char = a at 0 and 1
    found char = a at 1 and 2
    found char = a at 2 and 3
    found char = at 3 and 3

    I think greedy quantifiers first eat whole string "aaa" at a time,
    but why the emtry char at (0,0) (1,1) (2,2) can't match successful
    compare with Reluctant quantifiers ?

    Thanks!
     
    NeoGeoSNK, Mar 2, 2008
    #1
    1. Advertising

  2. NeoGeoSNK wrote:
    > Hello,
    > I have learned Java Regular expression for a long time, but still
    > confused about Quantifiers:
    >
    > import java.util.regex.*;
    > public class NRGRegex{
    > public static void main(String[] args){
    > Pattern p = Pattern.compile("a??");
    > String a = "aaa";
    > Matcher m = p.matcher(a);
    > while(m.find()){
    > System.out.println("found char = " + m.group() + " at " + m.start()
    > + " and " + m.end()); }
    > }
    > }
    >
    > the output result is:
    > found char = at 0 and 0
    > found char = at 1 and 1
    > found char = at 2 and 2
    > found char = at 3 and 3
    > here "a??" is Reluctant quantifiers but why all char 'a' not match
    > successful?


    The definition of "a?" means that either a is matched or it isn't.
    Without a quantifier, it attempts to match a first and only omit the a
    when it can't match. However, you specified the reluctant quantifier,
    which makes the `?' operator attempt to not match first.

    Psuedocode for "a?":
    try to match `a' and then the rest of the regex
    if match fails:
    try to match nothing and rest of regex
    return result of match
    else:
    return true

    For "a??":
    try to match nothing and then the rest of the regex
    if match fails:
    try to match `a' and rest of regex
    return result of match
    else:
    return true

    Since "a??" is the full regex, the first attempt (to match nothing) will
    succeed at every point, and the fall back of matching `a' will never occur.

    > when I use greedy quantifiers Pattern p = Pattern.compile("a?");
    > the output result is:
    > found char = a at 0 and 1
    > found char = a at 1 and 2
    > found char = a at 2 and 3
    > found char = at 3 and 3
    >
    > I think greedy quantifiers first eat whole string "aaa" at a time,
    > but why the emtry char at (0,0) (1,1) (2,2) can't match successful
    > compare with Reluctant quantifiers ?


    Greedy means, essentially, to assume that a match will work and only
    unmatch a character if it doesn't work. Reluctant quantifiers will
    attempt to match the rest of the regex and only match more if it has to.

    A typical example is this:
    Finding a closing parenthesis in an arithmetic expression (can't handle
    nested):
    "(1+4)*5-6/(1+9)": the obvious regex "\\(.*\\)" will match the entire
    string, whereas "\\(.*?\\)" will match only "(1+4)".

    If you want to match "aaa", the regex "a*" or "a+" will do so.

    Finally, there is the possessive quantifier, which refuses to backtrack
    on failed matches. I can imagine that there are times when this would be
    helpful, but none that I can think of off the top of my head...

    --
    Beware of bugs in the above code; I have only proved it correct, not
    tried it. -- Donald E. Knuth
     
    Joshua Cranmer, Mar 2, 2008
    #2
    1. Advertising

  3. NeoGeoSNK

    NeoGeoSNK Guest

    On Mar 3, 2:40 am, Joshua Cranmer <> wrote:
    > NeoGeoSNK wrote:
    > > Hello,
    > > I have learned Java Regular expression for a long time, but still
    > > confused about Quantifiers:

    >
    > > import java.util.regex.*;
    > > public class NRGRegex{
    > > public static void main(String[] args){
    > > Pattern p = Pattern.compile("a??");
    > > String a = "aaa";
    > > Matcher m = p.matcher(a);
    > > while(m.find()){
    > > System.out.println("found char = " + m.group() + " at " + m.start()
    > > + " and " + m.end()); }
    > > }
    > > }

    >
    > > the output result is:
    > > found char = at 0 and 0
    > > found char = at 1 and 1
    > > found char = at 2 and 2
    > > found char = at 3 and 3
    > > here "a??" is Reluctant quantifiers but why all char 'a' not match
    > > successful?

    >
    > The definition of "a?" means that either a is matched or it isn't.
    > Without a quantifier, it attempts to match a first and only omit the a
    > when it can't match. However, you specified the reluctant quantifier,
    > which makes the `?' operator attempt to not match first.
    >
    > Psuedocode for "a?":
    > try to match `a' and then the rest of the regex
    > if match fails:
    > try to match nothing and rest of regex
    > return result of match
    > else:
    > return true
    >
    > For "a??":
    > try to match nothing and then the rest of the regex
    > if match fails:
    > try to match `a' and rest of regex
    > return result of match
    > else:
    > return true
    >
    > Since "a??" is the full regex, the first attempt (to match nothing) will
    > succeed at every point, and the fall back of matching `a' will never occur.
    >
    > > when I use greedy quantifiers Pattern p = Pattern.compile("a?");
    > > the output result is:
    > > found char = a at 0 and 1
    > > found char = a at 1 and 2
    > > found char = a at 2 and 3
    > > found char = at 3 and 3

    >
    > > I think greedy quantifiers first eat whole string "aaa" at a time,
    > > but why the emtry char at (0,0) (1,1) (2,2) can't match successful
    > > compare with Reluctant quantifiers ?

    >
    > Greedy means, essentially, to assume that a match will work and only
    > unmatch a character if it doesn't work. Reluctant quantifiers will
    > attempt to match the rest of the regex and only match more if it has to.
    >
    > A typical example is this:
    > Finding a closing parenthesis in an arithmetic expression (can't handle
    > nested):
    > "(1+4)*5-6/(1+9)": the obvious regex "\\(.*\\)" will match the entire
    > string, whereas "\\(.*?\\)" will match only "(1+4)".
    >
    > If you want to match "aaa", the regex "a*" or "a+" will do so.
    >
    > Finally, there is the possessive quantifier, which refuses to backtrack
    > on failed matches. I can imagine that there are times when this would be
    > helpful, but none that I can think of off the top of my head...
    >
    > --
    > Beware of bugs in the above code; I have only proved it correct, not
    > tried it. -- Donald E. Knuth



    Thanks, It's very clear,
    > The definition of "a?" means that either a is matched or it isn't.
    > Without a quantifier, it attempts to match a first and only omit the a
    > when it can't match. However, you specified the reluctant quantifier,
    > which makes the `?' operator attempt to not match first.

    so do you mean:
    X? meaning X,once or not at all
    but
    X?? meaning not at all or X,once

    one question is:
    > "(1+4)*5-6/(1+9)": the obvious regex "\\(.*\\)" will match the entire
    > string, whereas "\\(.*?\\)" will match only "(1+4)".
    >

    I have test it, and "\\(.*?\\)" match both (1+4) and (1+9), why do you
    think it only match (1+4) ?

    Thanks for your repay again.
     
    NeoGeoSNK, Mar 3, 2008
    #3
  4. NeoGeoSNK

    Lars Enderin Guest

    NeoGeoSNK skrev:
    > On Mar 3, 2:40 am, Joshua Cranmer <> wrote:
    >> NeoGeoSNK wrote:
    >>> Hello,
    >>> I have learned Java Regular expression for a long time, but still
    >>> confused about Quantifiers:
    >>> import java.util.regex.*;
    >>> public class NRGRegex{
    >>> public static void main(String[] args){
    >>> Pattern p = Pattern.compile("a??");
    >>> String a = "aaa";
    >>> Matcher m = p.matcher(a);
    >>> while(m.find()){
    >>> System.out.println("found char = " + m.group() + " at " + m.start()
    >>> + " and " + m.end()); }
    >>> }
    >>> }

    >>

    > one question is:
    >> "(1+4)*5-6/(1+9)": the obvious regex "\\(.*\\)" will match the entire
    >> string, whereas "\\(.*?\\)" will match only "(1+4)".
    >>

    > I have test it, and "\\(.*?\\)" match both (1+4) and (1+9), why do you
    > think it only match (1+4) ?
    >

    That regexp matches first (1+4), then (1+9). The other regexp matches
    from the first ( up to and including the last ), once.
     
    Lars Enderin, Mar 3, 2008
    #4
  5. NeoGeoSNK wrote:
    > On Mar 3, 2:40 am, Joshua Cranmer <> wrote:
    >> The definition of "a?" means that either a is matched or it isn't.
    >> Without a quantifier, it attempts to match a first and only omit the a
    >> when it can't match. However, you specified the reluctant quantifier,
    >> which makes the `?' operator attempt to not match first.

    > so do you mean:
    > X? meaning X,once or not at all
    > but
    > X?? meaning not at all or X,once


    Right.

    >> "(1+4)*5-6/(1+9)": the obvious regex "\\(.*\\)" will match the entire
    >> string, whereas "\\(.*?\\)" will match only "(1+4)".
    >>

    > I have test it, and "\\(.*?\\)" match both (1+4) and (1+9), why do you
    > think it only match (1+4) ?


    Oops, I should have been clearer. "(1+9)" will be matched as well. What
    I had intended to say was that the first match would not match the whole
    string but merely the indicated substring.

    --
    Beware of bugs in the above code; I have only proved it correct, not
    tried it. -- Donald E. Knuth
     
    Joshua Cranmer, Mar 3, 2008
    #5
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. VSK
    Replies:
    2
    Views:
    2,307
  2. Replies:
    1
    Views:
    387
    Joshua Cranmer
    Sep 9, 2007
  3. Nathan Sokalski

    JavaScript RegExp Quantifiers

    Nathan Sokalski, Jun 11, 2008, in forum: ASP .Net
    Replies:
    2
    Views:
    358
    Hans Kesting
    Jun 13, 2008
  4. Johnathan Smith

    rexular expression quantifiers

    Johnathan Smith, Jan 7, 2008, in forum: Ruby
    Replies:
    4
    Views:
    114
    Giles Bowkett
    Jan 7, 2008
  5. Replies:
    0
    Views:
    140
Loading...

Share This Page