Question about quantifiers in Java regular expressions

NeoGeoSNK

Hello,
I have been learning Java regular expressions for a long time, but I am
still confused about quantifiers:

import java.util.regex.*;

public class NRGRegex {
    public static void main(String[] args) {
        Pattern p = Pattern.compile("a??");
        String a = "aaa";
        Matcher m = p.matcher(a);
        // Print every match together with its start and end index.
        while (m.find()) {
            System.out.println("found char = " + m.group() + " at " + m.start()
                    + " and " + m.end());
        }
    }
}

the output result is:
found char = at 0 and 0
found char = at 1 and 1
found char = at 2 and 2
found char = at 3 and 3
Here "a??" is a reluctant quantifier, but why is none of the 'a'
characters matched?

When I use the greedy quantifier, Pattern p = Pattern.compile("a?");
the output is:
found char = a at 0 and 1
found char = a at 1 and 2
found char = a at 2 and 3
found char = at 3 and 3

I thought a greedy quantifier first eats the whole string "aaa" at once,
so why are the empty strings at (0,0), (1,1) and (2,2) not matched, as
they are with the reluctant quantifier?

Thanks!
 
Joshua Cranmer

NeoGeoSNK said:
Here "a??" is a reluctant quantifier, but why is none of the 'a'
characters matched?

The definition of "a?" means that either `a' is matched or it isn't.
In the plain (greedy) form, the matcher attempts to match `a' first and
only omits it when the match can't succeed. However, you specified the
reluctant form, which makes the `?' operator attempt to not match first.

Pseudocode for "a?":
    try to match `a' and then the rest of the regex
    if match fails:
        try to match nothing and then the rest of the regex
        return result of that match
    else:
        return true

For "a??":
    try to match nothing and then the rest of the regex
    if match fails:
        try to match `a' and then the rest of the regex
        return result of that match
    else:
        return true

Since "a??" is the full regex, the first attempt (to match nothing) will
succeed at every position, and the fallback of matching `a' will never
occur.
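
To make the pseudocode above concrete, here is a minimal sketch that runs both forms on "aaa". The class name OptionalQuantifierDemo and the helper findAll are made up for this illustration; the extra pattern "a??b" is an added case (not from the original post) showing that the fallback does happen once the rest of the regex needs the `a'.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OptionalQuantifierDemo {
    // Hypothetical helper: records each match as 'text'[start,end)
    // so that empty matches remain visible in the output.
    public static List<String> findAll(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add("'" + m.group() + "'[" + m.start() + "," + m.end() + ")");
        }
        return out;
    }

    public static void main(String[] args) {
        // Greedy: tries to consume one 'a' first at each position.
        System.out.println(findAll("a?", "aaa"));
        // Reluctant: tries the empty match first, which always succeeds here.
        System.out.println(findAll("a??", "aaa"));
        // Reluctant with more regex after it: the empty attempt fails
        // because 'b' does not follow, so the matcher falls back to 'a'.
        System.out.println(findAll("a??b", "ab"));
    }
}
```

The third call is the interesting one: with something after the quantifier, "not at all" is only the first thing tried, not the final answer.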

NeoGeoSNK said:
When I use the greedy quantifier, Pattern p = Pattern.compile("a?");
the output is:
found char = a at 0 and 1
found char = a at 1 and 2
found char = a at 2 and 3
found char = at 3 and 3

I thought a greedy quantifier first eats the whole string "aaa" at once,
so why are the empty strings at (0,0), (1,1) and (2,2) not matched, as
they are with the reluctant quantifier?

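Greedy means, essentially, to match as much as possible first and to give
a character back only if the rest of the regex then fails. A reluctant
quantifier matches as little as possible first, tries the rest of the
regex, and only matches more if it has to. Note that "a?" can consume at
most one `a', so it never eats all of "aaa" in a single match; each
find() takes one `a' and the next search starts after it.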

A typical example is this:
Finding a closing parenthesis in an arithmetic expression (can't handle
nested):
"(1+4)*5-6/(1+9)": the obvious regex "\\(.*\\)" will match the entire
string, whereas "\\(.*?\\)" will match only "(1+4)".

If you want to match "aaa", the regex "a*" or "a+" will do so.

Finally, there is the possessive quantifier, which refuses to backtrack
on failed matches. I can imagine that there are times when this would be
helpful, but none that I can think of off the top of my head...
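
For what it's worth, a small sketch of what "refuses to backtrack" looks like in practice. The class name PossessiveDemo and the helper matchesWhole are invented for this example; the patterns are illustrative choices, not taken from the thread.

```java
import java.util.regex.Pattern;

public class PossessiveDemo {
    // Hypothetical helper: true only if the regex matches the whole input.
    public static boolean matchesWhole(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        // Greedy: a* takes all three characters, then gives one back
        // so that the trailing 'a' can match.
        System.out.println(matchesWhole("a*a", "aaa"));
        // Reluctant: a*? starts with nothing and grows until the
        // trailing 'a' lands on the last character.
        System.out.println(matchesWhole("a*?a", "aaa"));
        // Possessive: a*+ takes all three characters and never gives
        // any back, so the trailing 'a' has nothing left to match.
        System.out.println(matchesWhole("a*+a", "aaa"));
    }
}
```

The first two calls succeed and the third fails, which is the whole difference: the possessive form keeps what it grabbed even when that makes the overall match impossible.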
 
NeoGeoSNK

Thanks, it's very clear.

Joshua Cranmer said:
The definition of "a?" means that either `a' is matched or it isn't.
In the plain (greedy) form, the matcher attempts to match `a' first and
only omits it when the match can't succeed. However, you specified the
reluctant form, which makes the `?' operator attempt to not match first.

So do you mean:
X? means X, once or not at all
but
X?? means not at all, or X once?

One more question:

Joshua Cranmer said:
"(1+4)*5-6/(1+9)": the obvious regex "\\(.*\\)" will match the entire
string, whereas "\\(.*?\\)" will match only "(1+4)".

I have tested it, and "\\(.*?\\)" matches both (1+4) and (1+9). Why do
you think it matches only (1+4)?

Thanks again for your reply.
 
Lars Enderin

NeoGeoSNK said:
One more question:
"(1+4)*5-6/(1+9)": the obvious regex "\\(.*\\)" will match the entire
string, whereas "\\(.*?\\)" will match only "(1+4)".
I have tested it, and "\\(.*?\\)" matches both (1+4) and (1+9). Why do
you think it matches only (1+4)?

That regexp, "\\(.*?\\)", matches first (1+4), then (1+9). The other
regexp, "\\(.*\\)", matches once, from the first ( up to and including
the last ).
 
Joshua Cranmer

NeoGeoSNK said:
So do you mean:
X? means X, once or not at all
but
X?? means not at all, or X once?

Right.

NeoGeoSNK said:
I have tested it, and "\\(.*?\\)" matches both (1+4) and (1+9). Why do
you think it matches only (1+4)?

Oops, I should have been clearer. "(1+9)" will be matched as well. What
I had intended to say was that the first match would not match the whole
string but merely the indicated substring.
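
A quick sketch that lists every match for both patterns on that expression, in case it helps anyone reading later. The class name ParenMatchDemo and the helper findAll are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ParenMatchDemo {
    // Hypothetical helper: returns the text of every match found by find().
    public static List<String> findAll(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String expr = "(1+4)*5-6/(1+9)";
        // Greedy: .* runs to the end and backs off only as far as the last ')'.
        System.out.println(findAll("\\(.*\\)", expr));
        // Reluctant: .*? stops at the first ')' it can reach, so find()
        // reports each parenthesized group separately.
        System.out.println(findAll("\\(.*?\\)", expr));
    }
}
```

The greedy pattern yields a single match covering the whole string, while the reluctant one yields "(1+4)" and then "(1+9)".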
 
