java.util.regex and multiple matches

argabalala

Hi,

I'm trying to capture all the values matched by the subgroups
of the following regex (it's a simplified version; x and y
are originally regexes as well).


(?:(?:(x)|(y)),)*(x)(?:,(?:(x)|(y)))*|(a)

This expression means either:
- I must get an "x" somewhere, together with a (possibly empty)
  list of "x" and "y" around it, or
- I get an "a".

Sadly, I only capture the last element matched by each
subgroup, so I lose some "x" and some "y".
Therefore, I currently use a trick to do the job:
I match the whole expression using java.util.regex,
then split the string and match each resulting
element against "x", "y" and "a".
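The two-pass trick described above might look like the following minimal sketch. The class name and the single-letter patterns `X`, `Y` and `A` are hypothetical placeholders for the real sub-expressions, and splitting on `,` assumes those sub-expressions can never match a comma.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class TwoPassDemo {
    // First pass: the OP's simplified pattern, used only to validate the whole input.
    static final Pattern WHOLE =
        Pattern.compile("(?:(?:(x)|(y)),)*(x)(?:,(?:(x)|(y)))*|(a)");
    // Second pass: hypothetical per-element patterns standing in for the real x, y and a regexes.
    static final Pattern X = Pattern.compile("x");
    static final Pattern Y = Pattern.compile("y");
    static final Pattern A = Pattern.compile("a");

    // Returns one label per element, or null if the whole input is rejected.
    // Assumes the real element expressions never match a comma; the first matching pattern wins.
    static List<String> classify(String input) {
        if (!WHOLE.matcher(input).matches()) {
            return null;
        }
        List<String> labels = new ArrayList<>();
        for (String element : input.split(",")) {
            if (X.matcher(element).matches()) {
                labels.add("x");
            } else if (Y.matcher(element).matches()) {
                labels.add("y");
            } else if (A.matcher(element).matches()) {
                labels.add("a");
            }
        }
        return labels;
    }

    public static void main(String[] args) {
        for (String input : Arrays.asList("y,x,y,x,y", "a", "y,y")) {
            System.out.println(input + " -> " + classify(input));
        }
    }
}
```

The price is exactly the one described in the post: every element is examined once by `WHOLE` and again by the per-element patterns.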

So I'm certainly doing the job twice, because the
first global match must already have parsed each "x", "y"
and "a" individually.

Is there a better way to achieve this with java.util.regex?
Or with any other robust package?

Thanks for your help,
 
R

Robert Klemme

Roedy Green said:
Adding more () is the first thing to try.

This doesn't help with repetition operators; you always get either one or
all of the repeated parts but not all individually. IMHO the OP's approach
is the best he can do: match the overall regexp and then use a second regexp
to match all occurrences of something in a group.
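Robert's "one or all" distinction, and the second-pass idea, can be sketched as follows. This is an illustration rather than code from the thread: it assumes x and y are single characters so that `[xy]` can stand in for `x|y`, and the names `OneOrAllDemo`, `WHOLE` and `ELEMENT` are made up. Group 2 sits inside the repetition and keeps only its most recent capture; group 1 wraps the whole repetition and captures all of it as one string, which a second pattern then walks with `Matcher.find()`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OneOrAllDemo {
    // Same language as the OP's pattern, assuming x and y are single characters.
    // Group 1 wraps the prefix repetition; group 2 sits inside it;
    // group 3 is the mandatory x; group 4 wraps the suffix repetition; group 5 is "a".
    static final Pattern WHOLE = Pattern.compile("((?:([xy]),)*)(x)((?:,[xy])*)|(a)");
    // Second-pass pattern, applied only to the text captured by one group.
    static final Pattern ELEMENT = Pattern.compile("[xy]");

    // "One": a group inside a repetition reports only its most recent capture.
    static String lastPrefixElement(String input) {
        Matcher m = WHOLE.matcher(input);
        return m.matches() ? m.group(2) : null;
    }

    // "All": a group around the repetition captures every repeated part as one string.
    static String wholePrefix(String input) {
        Matcher m = WHOLE.matcher(input);
        return m.matches() ? m.group(1) : null;
    }

    // Second pass: find() walks every element inside the captured prefix.
    static List<String> prefixElements(String input) {
        List<String> found = new ArrayList<>();
        String prefix = wholePrefix(input);
        if (prefix != null) {
            Matcher e = ELEMENT.matcher(prefix);
            while (e.find()) {
                found.add(e.group());
            }
        }
        return found;
    }

    public static void main(String[] args) {
        String input = "y,y,x,x,y";
        System.out.println("group 2 (one): " + lastPrefixElement(input));
        System.out.println("group 1 (all): " + wholePrefix(input));
        System.out.println("second pass:   " + prefixElements(input));
    }
}
```

The suffix repetition (group 4) can be walked the same way. With real sub-expressions, the second-pass pattern must not be able to match across a separator or straddle two elements.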

Kind regards

robert
 
J

John C. Bollinger

Robert said:
This doesn't help with repetition operators; you always get either one
or all of the repeated parts but not all individually. IMHO the OP's
approach is the best he can do: match the overall regexp and then use a
second regexp to match all occurrences of something in a group.

Yes, and with that being the case I'd write the regex itself like this:

a|(?:(?:[xy],)*x(?:,y)*)

It matches exactly the same strings and is much simpler. With fewer
alternations and the fixed alternative first it may also run a bit
faster, if that happened to be an issue. You can always get the matched
input subsequence from the Matcher's group(0), if you happen to be
accepting subsequence matches (Matcher.find(), Matcher.lookingAt()), so
you don't need to group the whole pattern. There might be more pattern
optimizations available if this were really performance critical; I
applied only those that also simplify and clarify the pattern.
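To gain some confidence in the claim that the simpler pattern matches exactly the same strings, a brute-force comparison over short inputs is easy to write. This is a sketch, not a proof: it treats x, y and a as single characters, checks only comma-separated lists of one to six elements, writes the simplified pattern as `a|(?:(?:[xy],)*x(?:,y)*)`, and the class and method names are hypothetical. The trailing `(?:,y)*` suffices because the mandatory `x` can always be taken to be the last `x` in the list. The sketch also shows `group()` (the same as `group(0)`) returning the matched subsequence after `find()` without any extra outer group.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SimplifiedPatternCheck {
    static final Pattern ORIGINAL =
        Pattern.compile("(?:(?:(x)|(y)),)*(x)(?:,(?:(x)|(y)))*|(a)");
    static final Pattern SIMPLIFIED =
        Pattern.compile("a|(?:(?:[xy],)*x(?:,y)*)");
    static final String[] ALPHABET = {"x", "y", "a"};

    // Builds every comma-separated list of 1..maxLen elements drawn from {x, y, a}.
    static List<String> candidates(int maxLen) {
        List<String> out = new ArrayList<>();
        List<String> level = new ArrayList<>();
        for (String s : ALPHABET) {
            level.add(s);
        }
        for (int len = 1; len <= maxLen; len++) {
            out.addAll(level);
            List<String> next = new ArrayList<>();
            for (String prefix : level) {
                for (String s : ALPHABET) {
                    next.add(prefix + "," + s);
                }
            }
            level = next;
        }
        return out;
    }

    // Returns the first input on which the two patterns disagree under matches(), or null.
    static String firstDisagreement(int maxLen) {
        for (String s : candidates(maxLen)) {
            if (ORIGINAL.matcher(s).matches() != SIMPLIFIED.matcher(s).matches()) {
                return s;
            }
        }
        return null;
    }

    // With find(), group() already holds the matched subsequence; no outer group is needed.
    static String firstMatch(String text) {
        Matcher m = SIMPLIFIED.matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println("first disagreement: " + firstDisagreement(6));
        System.out.println("found: " + firstMatch("list: y,x,y;"));
    }
}
```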
 
A

argabalala

Robert Klemme wrote:
IMHO the OP's approach
is the best he can do: match the overall regexp and then use a second regexp
to match all occurrences of something in a group.
I'm disappointed to see my approach confirmed to be a valid one :)

Thanks to all posters for their comments.