Examples of using "reluctant" subexpressions in regexps?

david.karr

I'm preparing a presentation about regular expressions to a group of my
colleagues. I'm looking for examples of expressions using the
"reluctant" quantifier (the "?"). I'm not asking about the syntax,
just real (or simulated) examples of why you would want to use this.
Note that I'm not referring to using this on "simple" expressions, like
"a?", but on already quantified expressions, like "a+?" or "a*?", or
even "a{n}?". I wouldn't even mention the latter one, as it doesn't
seem to make sense to me, but it's used as an example (with no real
explanation) in the "perlre" documentation page.
 
Lasse Reichstein Nielsen

First of all, I don't believe there is any regular expression written
with a non-greedy match that can't also be written using only greedy
matches (although perhaps with look-ahead).

One example is matching all HTML tags (e.g., for removal):
/<.*?>/g
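
As a rough sketch of that removal in Python's re module (my choice for illustration, the thread itself names no language here; the sample string is invented), re.sub replaces every match, which plays the role of the /g flag:

```python
import re

# Sample input invented for illustration.
html = "This is <b>bold</b> and <i>italic</i> text."

# The reluctant .*? stops at the first '>' after each '<',
# so every tag is matched (and removed) separately.
stripped = re.sub(r"<.*?>", "", html)
print(stripped)
```

This only handles simple markup; a '>' inside an attribute value would end the match early.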

Generally, non-greedy matching is useful when you want to match a
delimited substring several times inside a string. With a greedy match,
you match from the first start marker to the last end marker, so you
have to write your regexp to avoid matching more than one end marker.
With a non-greedy match, as above, you just match from the start marker
to the first end marker.
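
To make the difference concrete, here is a small sketch in Python (the quoted-string input is made up, and Python is only an assumption for illustration). It also shows a greedy-only rewrite using a negated character class, in line with the claim that a non-greedy match can usually be avoided:

```python
import re

# Invented sample: two delimited substrings in one string.
text = 'say "hello" and then "goodbye"'

# Greedy: runs from the first quote to the last quote.
greedy = re.findall(r'"(.*)"', text)

# Reluctant: each match ends at the first closing quote.
reluctant = re.findall(r'"(.*?)"', text)

# Greedy-only alternative: forbid the end marker inside the match.
negated = re.findall(r'"([^"]*)"', text)

print(greedy)     # ['hello" and then "goodbye']
print(reluctant)  # ['hello', 'goodbye']
print(negated)    # ['hello', 'goodbye']
```

The negated class works here because the end marker is a single character; with a multi-character end marker the greedy version needs look-ahead, which is where the reluctant form is more readable.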

Good luck
/L
 
HK

It is sometimes handy to match a whole XML element,
nested elements included, as long as the element does
not contain another element of the same name.

There are, however, circumstances where this
does not work. The example given on

http://www.ebi.ac.uk/Rebholz-srv/whatizit/monq-doc/monq/jfa/doc-files/resyntax.html#greed

under "Non Greedy Matching vs. Shortest Match"
may clarify it a bit. The "shortest match"
operator is, however, not available in
java.util.regex.

That page has a nasty typo; it should read:

"just because the longer match satisfies
the regular expression, while stopping
at the first </tag> would **not** match."

You'll note where the 'not' is missing ;-)
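
A small sketch of both points, using Python's re module as an assumption (the element names and sample strings are invented and are not the example from the linked page). The first pattern matches each whole element; the second shows that reluctant is not the same as shortest, because the match still starts at the leftmost position and grows until the rest of the pattern succeeds:

```python
import re

# Invented sample: each <item> contains a nested <name> element.
xml = "<item><name>a</name></item><item><name>b</name></item>"
items = re.findall(r"<item>.*?</item>", xml)
print(items)

# Reluctant is not "shortest": stopping at the first </tag> would not
# satisfy the trailing '!', so the match extends to the second </tag>.
doc = "<tag>one</tag> <tag>two</tag>!"
m = re.search(r"<tag>.*?</tag>!", doc)
print(m.group(0))
```

A true shortest-match operator would return only "<tag>two</tag>!" for the second pattern, whereas the reluctant quantifier returns the whole string.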

Harald.
 
Alan Moore

david.karr wrote:
Note that I'm not referring to using this on "simple" expressions, like
"a?", but on already quantified expressions, like "a+?" or "a*?", or
even "a{n}?". I wouldn't even mention the latter one, as it doesn't
seem to make sense to me, but it's used as an example (with no real
explanation) in the "perlre" documentation page.

Heh! I'm surprised they put that in there with no disclaimer. True,
it's correct *syntax*, but the question mark has no effect. In the
javadoc for java.util.regex.Pattern, they do the same thing with the
possessive '+':

X{n}+ X, exactly n times

...which is just as pointless as X{n}?.
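
A quick check of the reluctant case, sketched in Python's re module (an assumption for illustration; the possessive form is left out because older Python versions do not accept it). With an exact count there is nothing to be reluctant about, while a range does behave differently:

```python
import re

s = "aaaaaaa"  # seven a's, invented sample

# Exact count: the '?' changes nothing.
exact = re.findall(r"a{3}", s)
exact_reluctant = re.findall(r"a{3}?", s)

# Range: greedy takes as many as allowed, reluctant as few.
ranged = re.findall(r"a{2,4}", s)
ranged_reluctant = re.findall(r"a{2,4}?", s)

print(exact, exact_reluctant)    # ['aaa', 'aaa'] ['aaa', 'aaa']
print(ranged, ranged_reluctant)  # ['aaaa', 'aaa'] ['aa', 'aa', 'aa']
```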
 
.

The best example I have seen is:

This is <bold>an example</bold> of HTML.

If we were looking for the tag <bold>, the regular expression <.*> would
match <bold>an example</bold>, which is too long because .* is greedy. By
changing the regular expression to <.*?> we get the desired result: <bold>.
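
The same example as a runnable sketch, assuming Python's re module for illustration:

```python
import re

line = "This is <bold>an example</bold> of HTML."

# Greedy: .* runs to the last '>' on the line.
greedy = re.search(r"<.*>", line).group(0)

# Reluctant: .*? stops at the first '>'.
reluctant = re.search(r"<.*?>", line).group(0)

print(greedy)     # <bold>an example</bold>
print(reluctant)  # <bold>
```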

This is actually the only place I've seen the reluctant quantifier used.
 
