RegEx: Is there such a thing as "non-greedy backwards"?

mrclean_ii

Let me explain: If I have a text like
"...target1.....target2...target2..." the pattern
"target1[\s\S]*target2" will match from "target1" to the LAST
"target2". If we slightly change the pattern to
"target1[\s\S]*?target2" the expression becomes non-greedy and the
pattern will match from "target1" to the FIRST "target2".
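
The greedy/non-greedy difference described above can be checked with a small Java sketch. The sample text, the class name and the firstMatch helper are made up for illustration; only the two patterns come from the post.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyVsReluctant {
    // Hypothetical helper: returns the first match of regex in text, or null if none.
    static String firstMatch(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Made-up sample text with one "target1" and two "target2".
        String text = "..target1..A..target2..B..target2..";

        // Greedy: runs to the LAST "target2".
        System.out.println(firstMatch("target1[\\s\\S]*target2", text));
        // prints target1..A..target2..B..target2

        // Non-greedy: stops at the FIRST "target2".
        System.out.println(firstMatch("target1[\\s\\S]*?target2", text));
        // prints target1..A..target2
    }
}
```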

Now suppose the text is "...target1.....target1...target2..." and I
want to match from the LAST "target1" to the "target2" (what I would
call "non-greedy backwards"). Any help would be appreciated, thank you!
 
Alan Moore

There's no built-in mechanism that does this, but you can do it
yourself like this:

Pattern p = Pattern.compile("target1(?:[^t]++|t(?!arget1))*+target2");

In other words, after matching the first token, you look for any
character that's not the first letter of the token, OR that letter as
long as it's not followed by the rest of the token. It's important to
use the possessive quantifiers ("++" and "*+", which I call
"aggressive"), because the regex could be prohibitively slow without
them.
 
mrclean_ii

Thanks!

It seems to work on very small strings for me without the aggressive
quantifiers. I need to use it in VBScript, which has no Compile
method, and the aggressive quantifiers cause runtime errors there. I
posted the question here because the regex syntax is pretty similar and
I didn't find anything else for VBScript. Is there a workaround?
 
Alan Moore

First, a correction. The regex that I posted wouldn't have worked
anyway (the middle part would gobble up the second token and never give
it back). That technique only works if the first token and the second
token are the same or, with a small modification, if they start with
the same letter. In the case of your example, with tokens of "target1"
and "target2", the regex would be

"target1(?:[^t]++|t(?!arget[12]))*+target2"

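As a rough check of the correction above, here is a Java sketch that runs both the originally posted regex and the corrected one. The sample text, the class name and the firstMatch helper are invented for illustration; the two patterns are the ones from this thread.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LastStartToken {
    // Regex from the first reply: only guards against "target1" in the middle part.
    static final String ORIGINAL = "target1(?:[^t]++|t(?!arget1))*+target2";
    // Corrected regex: the middle part may not run into either token.
    static final String CORRECTED = "target1(?:[^t]++|t(?!arget[12]))*+target2";

    // Hypothetical helper: returns the first match of regex in text, or null if none.
    static String firstMatch(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Made-up sample text with two "target1" and one "target2".
        String text = "..target1..A..target1..B..target2..";

        // The possessive middle part swallows "target2" and cannot give it back.
        System.out.println(firstMatch(ORIGINAL, text));   // prints null

        // The match starts at the LAST "target1" before "target2".
        System.out.println(firstMatch(CORRECTED, text));  // prints target1..B..target2
    }
}
```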
For completely different tokens, a more elaborate regex is needed:

"foo(?:[^fb]++|f(?!oo)|b(?!ar))*+bar"

If you don't have aggressive quantifiers, try this version:

"foo(?:[^fb]|f(?!oo)|b(?!ar))*bar"

It's much less efficient because "[^fb]" now matches only a single
character each time through the loop, but it will probably be fast
enough.
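
To compare the two foo/bar versions, here is a small Java sketch. The sample strings, the class name and the firstMatch helper are made up for illustration. Java accepts both patterns, so it can show they find the same match; this says nothing about how fast either one is in VBScript.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FooBarLastStart {
    // Version with possessive quantifiers ("++" and "*+").
    static final String POSSESSIVE = "foo(?:[^fb]++|f(?!oo)|b(?!ar))*+bar";
    // Version with ordinary quantifiers, for engines that lack possessive ones.
    static final String PLAIN = "foo(?:[^fb]|f(?!oo)|b(?!ar))*bar";

    // Hypothetical helper: returns the first match of regex in text, or null if none.
    static String firstMatch(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Two "foo" before the first "bar": the match starts at the second "foo".
        String text = "foo 1 foo 2 bar 3 bar";
        System.out.println(firstMatch(POSSESSIVE, text)); // prints foo 2 bar
        System.out.println(firstMatch(PLAIN, text));      // prints foo 2 bar

        // A stray 'f' or 'b' that does not start a token is allowed in between.
        System.out.println(firstMatch(PLAIN, "foo fib bar")); // prints foo fib bar
    }
}
```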
 
mrclean_ii

Thanks Alan, it works like a charm. Even the less efficient one is fast
on long strings. You guessed it, I really was looking for the foo/bar
pattern. It looks pretty complicated and I think that the non-greedy
specifier ? should work in both directions (see the first message in
thread).

Thanks again!
 
