Extracting a substring with a regexp



Alex

How can I extract a substring with a regexp when I know the strings
that mark its start and end?
For example, I want to find what lies between "abc" and "xyz" in a
string.

Pattern p = Pattern.compile("abc*xyz");
Matcher m = p.matcher("aaaaaaaaaabc123xyzzzzzzzzzzzzz");

But m.matches() returns false and doesn't find my pattern.

How can I get the "123" in this example?

Alex Kizub.
 


Lord Zoltar

You could try grouping:

(a+bc)(.+)(xyz+)

Group 2 is the one that would hold what you want.
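A minimal sketch of the grouping suggestion, using the pattern from the reply above together with matches(). The class name GroupingDemo and the helper middle() are just for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupingDemo {
    // Group 1 = leading a's plus "bc", group 2 = the middle,
    // group 3 = "xy" plus the trailing z's.
    private static final Pattern P = Pattern.compile("(a+bc)(.+)(xyz+)");

    // Hypothetical helper: returns group 2, or null if the whole input does not match.
    static String middle(String input) {
        Matcher m = P.matcher(input);
        // matches() succeeds here because a+ and z+ absorb the padding at both ends
        return m.matches() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        System.out.println(middle("aaaaaaaaaabc123xyzzzzzzzzzzzzz")); // prints 123
    }
}
```

Because a+ and z+ soak up the padding, this only works when the input is shaped exactly like the example; for arbitrary surrounding text, find() is the more general tool.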
 

Eric Sosman

First, correct your regexp: As written, it looks for
an a, a b, any number of c's (including zero), then x,
y, and z. You probably want "abc.*xyz" instead.
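To see the difference concretely, here is a small sketch using Pattern.matches(), which tests whether the entire string matches; the class name StarDemo is illustrative.

```java
import java.util.regex.Pattern;

class StarDemo {
    public static void main(String[] args) {
        // "c*" means "zero or more c's", not "anything"
        System.out.println(Pattern.matches("abc*xyz", "abxyz"));      // true  (zero c's)
        System.out.println(Pattern.matches("abc*xyz", "abcccxyz"));   // true  (three c's)
        System.out.println(Pattern.matches("abc*xyz", "abc123xyz"));  // false ("123" is not c's)
        // ".*" means "any characters, any number of times"
        System.out.println(Pattern.matches("abc.*xyz", "abc123xyz")); // true
    }
}
```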

Second, realize that matches() tries to match the
entire input sequence. So it will fail, because the "aa"
at the start does not match either the original or the
corrected regexp. You probably want the find() method
instead.
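A quick sketch contrasting matches() and find() on the original input with the corrected regexp; the class name FindDemo is illustrative.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindDemo {
    public static void main(String[] args) {
        Pattern p = Pattern.compile("abc.*xyz");
        String input = "aaaaaaaaaabc123xyzzzzzzzzzzzzz";

        // matches() requires the whole input to match: the leading a's break it
        System.out.println(p.matcher(input).matches()); // false

        // find() looks for the pattern anywhere in the input
        Matcher m = p.matcher(input);
        if (m.find()) {
            System.out.println(m.group()); // abc123xyz
        }
    }
}
```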

Third, since what you are interested in is the stuff
between the abc and the xyz, you should indicate your
interest by making that part into a "group." Change the
regexp yet again, this time to "abc(.*)xyz". When find()
returns true, you can then use m.group(1) to retrieve the
part between the parentheses.
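Putting the three fixes together gives something like the following sketch. The helper name between() is hypothetical; it returns null when there is no match.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupOneDemo {
    // Capture whatever lies between "abc" and "xyz" in group 1
    private static final Pattern P = Pattern.compile("abc(.*)xyz");

    static String between(String input) {
        Matcher m = P.matcher(input);
        // find() searches anywhere in the input; group(1) is the parenthesized part
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(between("aaaaaaaaaabc123xyzzzzzzzzzzzzz")); // prints 123
    }
}
```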

Finally, it would be a Really Good Idea for you to read
the Javadoc on the Pattern and Matcher classes, where all
this and more is described, with reasonably comprehensible
examples, too.
 

Alex

Eric:
Thanks a lot. You are right.
But here is another gap in my knowledge:
I assumed that this

Pattern p = Pattern.compile("abc(.*)xyz");
Matcher m = p.matcher("xxxxxabc123xyz789xyzxxxxx");
if (m.find()) System.out.println(m.group(1));

should print "123" but instead it prints "123xyz789".
How can I force the regexp to stop at the first "xyz"?

I agree that I should read the Sun documentation. And, trust me, I
did...

Thanks in advance.
Alex Kizub.
 

Lord Zoltar

If you're having trouble with regular expressions, it's a lot easier
to test them on their own before adding them to your code and
compiling and running it. I like to use a program called Expresso to
test regular expressions.
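If you would rather stay in Java, a throwaway harness does a similar job. This is a sketch only; the class RegexTester and its describe() helper are hypothetical names, not part of any library. It lists every find() match with its start/end offsets and captured groups.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexTester {
    // Hypothetical helper: describes every match of regex in input, with its groups
    static List<String> describe(String regex, String input) {
        List<String> lines = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            lines.add("match [" + m.start() + "," + m.end() + "): " + m.group());
            for (int g = 1; g <= m.groupCount(); g++) {
                lines.add("  group " + g + ": " + m.group(g));
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        // Usage: java RegexTester <regex> <input>
        if (args.length != 2) {
            System.err.println("usage: RegexTester <regex> <input>");
            return;
        }
        describe(args[0], args[1]).forEach(System.out::println);
    }
}
```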
 

Joshua Cranmer

Short answer: By default, quantifiers are greedy, so .* matches the
longest string it can. Use the reluctant "abc(.*?)xyz" instead.
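A sketch comparing the greedy and reluctant patterns on Alex's second input; the helper name firstGroup() is illustrative.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReluctantDemo {
    // Returns group 1 of the first find() match, or null if there is none
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String input = "xxxxxabc123xyz789xyzxxxxx";
        // Greedy: .* runs to the end, then backs off only as far as the LAST "xyz"
        System.out.println(firstGroup("abc(.*)xyz", input));  // 123xyz789
        // Reluctant: .*? grows one character at a time, stopping at the FIRST "xyz"
        System.out.println(firstGroup("abc(.*?)xyz", input)); // 123
    }
}
```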

Long answer: The unqualified *, +, and ? quantifiers are greedy: they
first consume as much input as they can, then backtrack, giving back
one repetition at a time, until the rest of the pattern matches.
Appending `?' to a quantifier makes it reluctant: it first tries to
match as little as possible and consumes more only when the rest of
the pattern fails. Appending `+' makes it possessive: it consumes as
much as it can and never backtracks.

For the pattern "(a*"+operator+")a" applied with find() to the string
"aaa", group 1 matches:
"": aa
"?": (the empty string)
"+": <failure>
 
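The three-row comparison can be checked with a short sketch that builds each pattern and runs find() on "aaa"; the class and helper names are illustrative. Note that with matches() instead of find(), the reluctant case would also capture "aa", because the whole string must then be consumed.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuantifierDemo {
    // Returns group 1 of the first find() match, or null if nothing matches
    static String group1(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // "" = greedy, "?" = reluctant, "+" = possessive
        for (String op : new String[] {"", "?", "+"}) {
            String regex = "(a*" + op + ")a";
            String g = group1(regex, "aaa");
            System.out.println(regex + " -> " + (g == null ? "no match" : "\"" + g + "\""));
        }
        // (a*)a -> "aa"
        // (a*?)a -> ""
        // (a*+)a -> no match
    }
}
```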

Roedy Green

There are two ways I can think of:

1. Find each piece, then use an ordinary substring to extract the
middle.

2. Use groups, i.e. put (...) around the middle piece you want.
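Both approaches can be sketched side by side. This assumes the delimiters are literal strings; the class TwoWays and its method names are hypothetical. Pattern.quote() keeps any regex metacharacters in the delimiters from being interpreted.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TwoWays {
    // Approach 1: locate each delimiter, then take an ordinary substring
    static String bySubstring(String s, String start, String end) {
        int from = s.indexOf(start);
        if (from < 0) return null;
        from += start.length();               // skip past the start delimiter
        int to = s.indexOf(end, from);        // first end delimiter after it
        return to < 0 ? null : s.substring(from, to);
    }

    // Approach 2: put a reluctant group (...) around the middle piece
    static String byGroup(String s, String start, String end) {
        Pattern p = Pattern.compile(Pattern.quote(start) + "(.*?)" + Pattern.quote(end));
        Matcher m = p.matcher(s);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String s = "aaaaaaaaaabc123xyzzzzzzzzzzzzz";
        System.out.println(bySubstring(s, "abc", "xyz")); // 123
        System.out.println(byGroup(s, "abc", "xyz"));     // 123
    }
}
```

One difference to keep in mind: by default . does not match line terminators, so the two approaches can disagree when the middle piece spans several lines.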

see http://mindprod.com/jgloss/regex.html
 
