JAVA Regular Expression


Remko Silvis

Hi, can anyone help me understand the following?
I have a regular expression that has to match, for example:

<format zipped="false" encrypted="false" />

My expression:
Pattern singleTag = Pattern.compile("< *(\\w+) *(\\w+\\=\"\\w+\" *)*/>");
matches, but gives only three groups:

group(0) = <format zipped="false" encrypted="false" />
group(1) = format
group(2) = encrypted="false"

It skips the zipped="false" attribute.
When I change the regexp to fit this string exactly, like this:
Pattern singleTag = Pattern.compile("< *(\\w+) *(\\w+\\=\"\\w+\" *)(\\w+\\=\"\\w+\" *)/>");
I get the four groups I expect:

group(0) = <format zipped="false" encrypted="false" />
group(1) = format
group(2) = zipped="false"
group(3) = encrypted="false"

It seems that repeating a group with '()*' does not create a new group for
each repetition, but overwrites the previous capture.

Can anyone confirm this?

Thanx,
Remko
 

Robert Klemme

Remko said:
> Hi, can anyone help me understand the following?
> I have a regular expression that has to match, for example:
>
> <format zipped="false" encrypted="false" />
>
> My expression:
> Pattern singleTag = Pattern.compile("< *(\\w+) *(\\w+\\=\"\\w+\" *)*/>");
> matches, but gives only three groups:
>
> group(0) = <format zipped="false" encrypted="false" />
> group(1) = format
> group(2) = encrypted="false"

Your regexp has only two pairs of capturing parentheses, so you get only
two groups (plus group 0 for the whole match).

> It seems that repeating a group with '()*' does not create a new group for
> each repetition, but overwrites the previous capture.
>
> Can anyone confirm this?

This is standard regexp behavior: a capturing group that is repeated only
remembers the text from its last repetition.
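A small sketch that makes this visible: it runs the pattern from the question against the example tag and prints every group. The class name `RepeatedGroupDemo` and the helper `groups` are made up for this illustration. Only groups 0 to 2 exist, and group 2 holds just the last attribute, including the trailing space matched by ` *` inside the parentheses.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RepeatedGroupDemo {
    // The pattern from the question: the attribute group (group 2) is quantified with '*'
    static final Pattern SINGLE_TAG =
            Pattern.compile("< *(\\w+) *(\\w+\\=\"\\w+\" *)*/>");

    /** Returns the value of every group after a full match, or null if the tag does not match. */
    static String[] groups(String tag) {
        Matcher m = SINGLE_TAG.matcher(tag);
        if (!m.matches()) {
            return null;
        }
        // groupCount() is fixed by the pattern (2 here), not by how often group 2 repeated
        String[] result = new String[m.groupCount() + 1];
        for (int i = 0; i <= m.groupCount(); i++) {
            result[i] = m.group(i);
        }
        return result;
    }

    public static void main(String[] args) {
        String[] g = groups("<format zipped=\"false\" encrypted=\"false\" />");
        System.out.println("groupCount = " + (g.length - 1));
        for (int i = 0; i < g.length; i++) {
            // Brackets make the trailing space captured by " *" visible
            System.out.println("group(" + i + ") = [" + g[i] + "]");
        }
    }
}
```

This prints `groupCount = 2`, then `group(1) = [format]` and `group(2) = [encrypted="false" ]`; the zipped attribute was matched by the first repetition but its capture was overwritten by the second.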

Two solutions:

1. If you know there are only two attributes you can change your regexp to
match two attributes and directly access them.

2. If there are 0 to n attributes (most likely there are), the easiest
approach is to create a regexp that matches the whole <...> expression and
has one group capture all attributes. Then create a second regexp which
recognizes a single attribute and use it to iterate over the text captured
by that group.
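Solution 2 could look roughly like the sketch below. Assumptions, not from the thread: attributes are written as name="value" with double quotes, whitespace around = is allowed, and values contain no double quotes. The patterns, the class name `TwoPassAttributes`, and the helper `attributes` are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TwoPassAttributes {
    // Pass 1: element name in group 1, the whole run of attributes in group 2
    static final Pattern TAG =
            Pattern.compile("<\\s*(\\w+)((?:\\s+\\w+\\s*=\\s*\"[^\"]*\")*)\\s*/>");
    // Pass 2: a single name="value" pair
    static final Pattern ATTRIBUTE =
            Pattern.compile("(\\w+)\\s*=\\s*\"([^\"]*)\"");

    /** Returns the attributes of a self-closing tag in document order, or null if it does not match. */
    static Map<String, String> attributes(String tag) {
        Matcher tagMatcher = TAG.matcher(tag);
        if (!tagMatcher.matches()) {
            return null;
        }
        Map<String, String> attrs = new LinkedHashMap<>();
        // find() walks through the captured attribute text, so every pair is seen, not just the last
        Matcher attrMatcher = ATTRIBUTE.matcher(tagMatcher.group(2));
        while (attrMatcher.find()) {
            attrs.put(attrMatcher.group(1), attrMatcher.group(2));
        }
        return attrs;
    }

    public static void main(String[] args) {
        System.out.println(attributes("<format zipped=\"false\" encrypted=\"false\" />"));
    }
}
```

For the example tag this prints `{zipped=false, encrypted=false}`. The outer group is written as `((?:...)*)`, so the repetition sits inside one capturing group and that group holds the whole attribute list rather than the last attribute.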

Kind regards

robert
 

Greg R. Broderick

> Hi, can anyone help me understand the following?
> I have a regular expression that has to match, for example:
>
> <format zipped="false" encrypted="false" />

Why are you using regular expressions to parse XML, when there is an
abundance of XML-specific parsers (DOM, SAX, StAX) available for you to
use, which would very likely be simpler and more reliable?

Cheers
GRB
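For comparison, a minimal sketch of the parser route using StAX (`javax.xml.stream`, part of the JDK since Java 6). The class name `StaxAttributes` and the helper `firstElementAttributes` are hypothetical, and the helper only reads the attributes of the first element it meets. Unlike the regexps above, the parser also copes with single-quoted values and entity references.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class StaxAttributes {
    /** Returns the attributes of the first element in the given XML text. */
    static Map<String, String> firstElementAttributes(String xml) throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        try {
            while (reader.hasNext()) {
                // Skip events until the first start tag
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    Map<String, String> attrs = new LinkedHashMap<>();
                    for (int i = 0; i < reader.getAttributeCount(); i++) {
                        attrs.put(reader.getAttributeLocalName(i), reader.getAttributeValue(i));
                    }
                    return attrs;
                }
            }
            return new LinkedHashMap<>();
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws XMLStreamException {
        Map<String, String> attrs =
                firstElementAttributes("<format zipped=\"false\" encrypted=\"false\" />");
        System.out.println(attrs.get("zipped") + " " + attrs.get("encrypted"));
    }
}
```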
 
