Parens Matching

K

Ken Kast

Does anyone have a regular expression pattern that would include a test for
balanced parens of arbitrary nestedness?
 
I

IchBin

Ken said:
Does anyone have a regular expression pattern that would include a test for
balanced parens of arbitrary nestedness?

Is arbitrary 'nestedness' some new type of epistemological terminology?

Sorry, just could not stop myself.

--
Thanks in Advance... http://weconsultants.prophp.org
IchBin, Pocono Lake, Pa, USA http://ichbinquotations.awardspace.com
______________________________________________________________________
'If there is one, Knowledge is the "Fountain of Youth"'
-William E. Taylor, Regular Guy (1952-)
 
M

Michael

Does anyone have a regular expression pattern that would include a test for
balanced parens of arbitrary nestedness?

No. No one does.

There's a well known proof in Computer Science theory that it is not
possible to create a regular expression for balanced parentheses.

Look up "the pumping lemma" for details.

Michael
 
M

Mark Jeffcoat

Michael said:
No. No one does.

There's a well known proof in Computer Science theory that it is not
possible to create a regular expression for balanced parentheses.

Michael is quite right. Of course, that proof is silent
on the subject of creating a regular expression that accepts
balanced strings with at most Long.MAX_VALUE open parens.
(And really, most texts will have fewer than half that many.)

While you're waiting for the program you wrote to write that
regular expression to finish, you can pass the time with a
counter that increments on '(', and decrements on ')'.
 
T

Tim Slattery

Michael said:
No. No one does.

There's a well known proof in Computer Science theory that it is not
possible to create a regular expression for balanced parentheses.

OTOH, it would be pretty simple to write a loop that cycles through a
string and keeps track of left and right parens. Add 1 to the counter
when you find a left paren, subtract one for a right paren. If the
counter ever goes negative, throw an error. If it's not zero at the
end of the string, throw an error.

--
Tim Slattery
(e-mail address removed)
http://members.cox.net/slatteryt
 
M

Michael

There's a well known proof in Computer Science theory that it is not
OTOH, it would be pretty simple to write a loop that cycles through a
string and keeps track of left and right parens. Add 1 to the counter
when you find a left paren, subtract one for a right paren. If the
counter ever goes negative, throw an error. If it's not zero at the
end of the string, throw an error.

I guess I should have spelled it out further - you can't match
balanced parentheses *with a regular expression.* Of course, there
are other ways to do it. Either write the kind or program above, or
write a recursive decent parser, (or use a regular expression that
only matches up to N balanced parentheses) etc. Basically, balanced
parentheses aren't a regular grammar, they're a context free grammar,
so you need a more powerful tool. In particular, they're a grammar
somthing like this:
Expression -> nothing | ( Expression )

Sorry if I caused more confusion that I removed. It was late.

Michael
 
D

Daniel Pitts

Does anyone have a regular expression pattern that would include a test for
balanced parens of arbitrary nestedness?

Two ways:

public class NestedParens {
public static void main(String[] args) {
String[] strings = {
"Test case 1",
"(Test case 2)",
"((Test) case ) 3",
")test case 4 (",
};
for (String string: strings) {
System.out.println(string +
" is " + (isBalancedRe(string)
?
"Balanced" : "Mis-matched"));
}

for (String string: strings) {
System.out.println(string +
" is " + (isBalancedCount(string)
?
"Balanced" : "Mis-matched"));
}
}

public static boolean isBalancedRe(String string) {
String last;
do {
last = string;
string = string.replaceAll("\\([^()]*\\)", "");
} while (!last.equals(string));
return !string.matches(".*[()].*");
}

private static boolean isBalancedCount(String string) {
int parens = 0;
for (char c : string.toCharArray()) {
switch (c) {
case ')':
if (--parens < 0) {
return false;
}
break;
case '(':
++parens;
}
}
return parens == 0;
}
}
 
A

Alex Hunsley

Michael said:
No. No one does.

There's a well known proof in Computer Science theory that it is not
possible to create a regular expression for balanced parentheses.

Look up "the pumping lemma" for details.

That thang always made me think of either pumping lemons or pumping
lemmings.

The matched parenthesis problem can be done with a non-deterministic
state machine, but not with a deterministic one.... hence no regular
expression can do it.
 
A

Alex Hunsley

Tim said:
OTOH, it would be pretty simple to write a loop that cycles through a
string and keeps track of left and right parens. Add 1 to the counter
when you find a left paren, subtract one for a right paren. If the
counter ever goes negative, throw an error. If it's not zero at the
end of the string, throw an error.

Although even this isn't correct, in theory... due to the finite memory
of the computer, it is possible to give it a sequence that should pass
(the brackets match), but won't pass, because the computer runs out of
memory. Ok, a little silly, but...
 
D

Daniel Pitts

Does anyone have a regular expression pattern that would include a test for
balanced parens of arbitrary nestedness?

Two ways:

public class NestedParens {
public static void main(String[] args) {
String[] strings = {
"Test case 1",
"(Test case 2)",
"((Test) case ) 3",
")test case 4 (",
};
for (String string: strings) {
System.out.println(string +
" is " + (isBalancedRe(string)
?
"Balanced" : "Mis-matched"));
}

for (String string: strings) {
System.out.println(string +
" is " + (isBalancedCount(string)
?
"Balanced" : "Mis-matched"));
}
}

public static boolean isBalancedRe(String string) {
String last;
do {
last = string;
string = string.replaceAll("\\([^()]*\\)", "");
} while (!last.equals(string));
return !string.matches(".*[()].*");
}

private static boolean isBalancedCount(String string) {
int parens = 0;
for (char c : string.toCharArray()) {
switch (c) {
case ')':
if (--parens < 0) {
return false;
}
break;
case '(':
++parens;
}
}
return parens == 0;
}

}


Or, a little more concisely:
public static boolean isBalancedRe(String string) {
while (string.matches(".*\\([^()]*\\).*")) {
string = string.replaceAll("\\([^()]*\\)", "");
}
return !string.matches(".*[()].*");
}

private static boolean isBalancedCount(String string) {
int parens = 0;
for (char c : string.toCharArray()) {
if (c == '(') {
++parens;
}
if (c == ')' && --parens < 0) {
return false;
}
}
return parens == 0;
}
....

Or, to get down right esoteric:
private static boolean isBalancedCount(String string) {
int parens = 0;
for (char c : string.toCharArray()) {
if ((c == '(' && ++ parens < 0) || c==')' && --parens < 0)
{
return false;
}
}
return parens == 0;
}

CafeBabe points to those who know why that works the way it does.
 
D

Daniel Pitts

That thang always made me think of either pumping lemons or pumping
lemmings.

The matched parenthesis problem can be done with a non-deterministic
state machine, but not with a deterministic one.... hence no regular
expression can do it.

Hmm, I don't think your definition is correct.
Regex isn't always implementedwith a DSM, it might use NDA, which
still doesn't solve it.

In either case, the following is a deterministic state machine solves
the problem.

private static boolean isBalancedCount(String string) {
int parens = 0;
for (char c : string.toCharArray()) {
if (c == '(') {
++parens;
}
if (c == ')' && --parens < 0) {
return false;
}
}
return parens == 0;
}
 
K

Ken Kast

Absolutely never thought I would generate so much conversation. I need to
sit down with a glass of wine and digest this.

I probably should have been more transparent in my original post. I'm not
trying to test a string for balanced parens. I need it to be a match
termination condition. In other words, I have a fairly complex pattern. So
it tells when a match is done. I need to say the match is done if the
character pattern is played out or the match will play out before I balance
parens.

I first wrote this in C#. .Net has extended RegExp that lets you swap named
groups on condition. This effectively lets you model a stack. So you push
left parens on the stack and pop them with right ones. Now I want to
implement the same app in Java. I can certainly assume a low level nesting,
on the order of 3 or 4, so running out of memory or running out of time
aren't concerns. I haven't thought it through, but I could probably do it
brute force by iterating thru a set of patterns. That would solve 6 sigma's
worth of real world situations. I was hoping for something more elegant.

Thanks to everyone for your thoughts. I certainly learned how lemmings have
influenced computer science.

Ken
 
L

Lew

Alex said:
Although even this isn't correct, in theory... due to the finite memory
of the computer, it is possible to give it a sequence that should pass
(the brackets match), but won't pass, because the computer runs out of
memory. Ok, a little silly, but...

Not so silly, really. It points to a valuable principle of workaday
programming, nay, engineering overall. When the theory says there is no
perfect algorithm, the practice suggests to use an imperfect one. You code to
handle the realistic cases and discontinuously reject inputs that exceed the
algorithm's ability, like the memory-hog example.

This way of thinking also leads one to jump out of "What regular expression?"
into "What algorithm?"

- Lew
 
A

Alex Hunsley

Daniel said:
Hmm, I don't think your definition is correct.
Regex isn't always implementedwith a DSM, it might use NDA, which
still doesn't solve it.

What do you mean by NDA? Non-deterministic automaton?
In either case, the following is a deterministic state machine solves
the problem.

private static boolean isBalancedCount(String string) {
int parens = 0;
for (char c : string.toCharArray()) {
if (c == '(') {
++parens;
}
if (c == ')' && --parens < 0) {
return false;
}
}
return parens == 0;
}

Nope, it doesn't solve the problem. The int type in Java has finite
capacity - see Integer.MAX_VALUE - so more accurately this a finite
deterministic state machine. I'm actually trying to remember if
'deterministic state machine' usually implies the 'finite' part in
common usage. A look at wikipedia seems to suggest that, but I wasn't
looking at it for long...

So, anyway, I can think of an input that causes the above code to fail -
and hence problem not solved.

To give a simplified example of this, suppose we have a Java where
Integer.MAX_VALUE = 1, and .MIN_VALUE = -2. So an int can have 4
distinct values for int. Assuming this, would would your code save about
the following inputs?

(((( - this would be a false positive
(((((((( - another false positive
((())) - this would be a false negative


Similarly, your code above in the world of real Java would give a false
positive for the input:

'(' * [(Integer.MAX_VALUE - Integer.MIN_VALUE + 1) * n]

where n is any positive integer.

Basically, true parens matching is scuppered, because of the need for
infinite memory!

lex
 
A

Alex Hunsley

Lew said:
Not so silly, really. It points to a valuable principle of workaday
programming, nay, engineering overall. When the theory says there is no
perfect algorithm, the practice suggests to use an imperfect one. You
code to handle the realistic cases and discontinuously reject inputs
that exceed the algorithm's ability, like the memory-hog example.

This way of thinking also leads one to jump out of "What regular
expression?" into "What algorithm?"

Good points. I have also touched on this subject with a reply I just
wrote to a post from Daniel. In Comp Sci. proper, when you're talking
about bracket matching, you want a machine that solves it for *any*
input. So in that sense, it can't be done by a normal real world machine...

lex
 
A

Alex Hunsley

Mark said:
Michael is quite right. Of course, that proof is silent
on the subject of creating a regular expression that accepts
balanced strings with at most Long.MAX_VALUE open parens.

Although in theory you could actually count this many with a single long:

Long.MAX_VALUE - Long.MIN_VALUE + 1

I have a feeling this expression is equal to:

- (2 * Long.MIN_VALUE)
 
C

Chris Uppal

Alex said:
Although even this isn't correct, in theory... due to the finite memory
of the computer, it is possible to give it a sequence that should pass
(the brackets match), but won't pass, because the computer runs out of
memory. Ok, a little silly, but...

You can do it in constant space for arbitrary length inputs /if/ you are
allowed to use the input as working space too -- just replace '()' by '' until
there's nothing left.

A more elegant method would be to keep the unclosed-bracket count in unbounded
integer representation (BigInteger-style), stored in (overwriting) the portion
of the input sequence which has already been scanned.

-- chris
 
C

Chris Uppal

Alex said:
The matched parenthesis problem can be done with a non-deterministic
state machine, but not with a deterministic one.... hence no regular
expression can do it.

I would understand "non-deterministic state machine" to mean the
non-deterministic variant of finite state automaton). And the deterministic
and non-deterministic versions of FSAs are equivalent in power. Do you use the
term to mean something other form of automaton ?

-- chris
 
C

Chris Uppal

Daniel said:
private static boolean isBalancedCount(String string) {
int parens = 0;
for (char c : string.toCharArray()) {
if ((c == '(' && ++ parens < 0) || c==')' && --parens < 0)
{
return false;
}
}
return parens == 0;
}

I have rarely seen code I liked less. Well done ;-)

-- chris
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,754
Messages
2,569,521
Members
44,995
Latest member
PinupduzSap

Latest Threads

Top