java pattern matcher

Moiristo

I have a problem with the java Matcher.

Consider the following regex: <<<(.)+>>>
Now, consider the following String containing two occurrences of the
above regex:

SELECT TOP <<<Upper limit>>> FROM (SELECT TOP <<<Lower limit>>> FROM person)

However, when I use Matcher.find(), it does not do what I expect.
Instead of matching the two substrings, it finds only one, namely:

'Upper limit>>> FROM (SELECT TOP <<<Lower limit'

This is of course correct, but not what I intended. Can someone help me
to solve this?
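
For reference, the behaviour described above can be reproduced with a small sketch like the following. The class name GreedyDemo and the helper findAll are made up for illustration; only standard java.util.regex calls are used.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyDemo {
    // Collects every full match (group 0) that find() reports for the given regex.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String sql = "SELECT TOP <<<Upper limit>>> FROM (SELECT TOP <<<Lower limit>>> FROM person)";
        // The greedy '+' runs on to the last ">>>" in the input,
        // so find() reports a single match spanning both placeholders.
        System.out.println(findAll("<<<(.)+>>>", sql));
    }
}
```

With this input the list holds one element, which starts at the first "<<<" and ends at the last ">>>".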
 
Oliver Wong

Moiristo said:
Stefan said:
Try
<<<(.)+?>>>
or
<<<([^>])+>>>

But you actually might want <<<(.+?)>>>

or

<<<([^>]+)>>>

, respectively.

Thank you! Could you please explain why this works and not the regex I
used?

I guess Java's regular expression engine is greedy by default (most RE
engines are). In the first alternative, (.+?), the '?' acts as a modifier
on '+', telling it not to be greedy. That is, instead of matching the
longest possible substring, it tries to match the shortest possible one.

In the second alternative, instead of accepting "<<<" followed by
anything, followed by ">>>", it accepts "<<<" followed by anything except
">", followed by ">>>".

- Oliver
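
To see the two alternatives side by side, here is a minimal sketch run against the string from the original post. The class name AlternativesDemo and the helper captures are hypothetical names chosen for illustration; the regex calls themselves are the standard java.util.regex ones.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AlternativesDemo {
    // Returns the text captured by group 1 for every match of regex in input.
    static List<String> captures(String regex, String input) {
        List<String> result = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            result.add(m.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        String sql = "SELECT TOP <<<Upper limit>>> FROM (SELECT TOP <<<Lower limit>>> FROM person)";
        // Reluctant quantifier: stops at the first ">>>" that completes a match.
        System.out.println(captures("<<<(.+?)>>>", sql));
        // Negated character class: cannot consume '>' at all.
        System.out.println(captures("<<<([^>]+)>>>", sql));
    }
}
```

Both calls print [Upper limit, Lower limit] for this input. Note that the group encloses the quantifier, (.+?) rather than (.)+?; with the quantifier outside the group, group(1) would hold only the last character matched.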
 
Moiristo

Stefan said:
Moiristo said:
<<<(.+?)>>>
<<<([^>]+)>>>
Thank you! Could you please explain why this works and not the regex I used?

The first suggestion should be used, because the second one
fails to match "<<<ab<c>def>>>".
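
A short sketch of that difference, assuming the placeholder text may itself contain a single '>'. The class name DelimiterDemo and the helper firstCapture are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DelimiterDemo {
    // Returns group 1 of the first match, or null if the regex does not match at all.
    static String firstCapture(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String input = "<<<ab<c>def>>>";
        // [^>]+ stops before the '>' after "c", and ">>>" does not follow there,
        // so there is no match.
        System.out.println(firstCapture("<<<([^>]+)>>>", input));
        // .+? may consume the lone '>' and keeps expanding until ">>>" follows.
        System.out.println(firstCapture("<<<(.+?)>>>", input));
    }
}
```

The first call returns null and the second returns ab<c>def, so the choice depends on whether a '>' can ever appear inside a placeholder.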

Oliver by now has explained the expressions. See also:

http://download.java.net/jdk6/docs/api/java/util/regex/Pattern.html

Thank you both. I knew it was something like that, but I didn't know
about modifiers in regexes; I only knew that '?' stood for 'once or not
at all'.
 
lordy

Moiristo said:
Stefan said:
Try
<<<(.)+?>>>
or
<<<([^>])+>>>

But you actually might want <<<(.+?)>>>

or

<<<([^>]+)>>>

, respectively.

Thank you! Could you please explain why this works and not the regex I
used?

I guess Java's regular expression engine is greedy by default (most RE
engines are). In the first alternative, (.+?), the '?' acts as a modifier
on '+', telling it not to be greedy. That is, instead of matching the
longest possible substring, it tries to match the shortest possible one.

In the second alternative, instead of accepting "<<<" followed by
anything, followed by ">>>", it accepts "<<<" followed by anything except
">", followed by ">>>".

- Oliver

And the latter is more efficient. The first will do a lot of
backtracking given the expected input strings.

Lordy
 
