Regular expression to match surrounding parenthesis

B

Bob

Hi,

I am trying to create a regular expression to verify that user entered
data is surrounded by the same number of open and closed parenthesis.

For example: if 'a' was the expression I was trying to match then a,
(a), ((a)), (((a)))... (((((((a))))))) would all be valid.

I am not new to regular expressions but I am also not an expert. I
have spent hours searching for a solution but no luck.

Is this possible and if so any help would be appreciated?

Thanks
Bob
 
G

Gunnar Hjalmarsson

Bob said:
I am trying to create a regular expression to verify that user
entered data is surrounded by the same number of open and closed
parenthesis.

For example: if 'a' was the expression I was trying to match then
a, (a), ((a)), (((a)))... (((((((a))))))) would all be valid.

What about nesting?

The CPAN module Text::Balanced might be helpful.
 
P

Peter J. Acklam

I am trying to create a regular expression to verify that user entered
data is surrounded by the same number of open and closed parenthesis.

For example: if 'a' was the expression I was trying to match then a,
(a), ((a)), (((a)))... (((((((a))))))) would all be valid.

I am not new to regular expressions but I am also not an expert. I
have spent hours searching for a solution but no luck.

In stead of verifying directly that the input is correct, it it
probably simpler to remove everything you know is correct and see
if there is anything left:

1 while $input =~ s/\([^()]*\)//g;
print $input =~ /[()]/ ? "bad" : "ok";

The first line removes all matching, possibly nested, parantheses
and the second line checks to see if there are any parentheses
left.

Peter
 
P

Peter J. Acklam

Abigail said:
Peter J. Acklam ([email protected]) wrote:
``
`` In stead of verifying directly that the input is correct, it it
`` probably simpler to remove everything you know is correct and see
`` if there is anything left:
``
`` 1 while $input =~ s/\([^()]*\)//g;
`` print $input =~ /[()]/ ? "bad" : "ok";
``
`` The first line removes all matching, possibly nested, parantheses
`` and the second line checks to see if there are any parentheses
`` left.

The algorithm may be 'simpler' by some standard, it's not
efficient. Worst case, the algorithm takes time quadratic in
the length of the input.

True, but the context here was validating user input, suggesting
interactive use, which in turn means that this is probably going
to be run relatively rarely and not thousands of time inside a
loop.

Of course, a better regex or a module could be used, or...

sub isvalid {
local $_ = shift;
my $level = 0;
while (length) {
s/^[^()]+//; # non-parantheses
if (s/^\(//) { # opening parenthesis
++$level;
next;
}
if (s/^\)//) { # closing parenthesis
return '' if --$level < 0;
next;
}
}
return $level == 0;
}

Peter
 
J

Josef Moellers

Bob said:
Hi,

I am trying to create a regular expression to verify that user entered
data is surrounded by the same number of open and closed parenthesis.

For example: if 'a' was the expression I was trying to match then a,
(a), ((a)), (((a)))... (((((((a))))))) would all be valid.

I am not new to regular expressions but I am also not an expert. I
have spent hours searching for a solution but no luck.

Is this possible and if so any help would be appreciated?

In a nutshell: no, this is impossible. (<- that is a definite period!)

Regular expressions cannot parse this kind of things since finite state
automata, which implement the matching of strings to regular
expressions, cannot count and have no storage (that's why they are
called "finite state") and you need to count the number of opening
parens before you can validate that there are as many closing parens.

This is from the theory of automata. Practical implementations may allow
this but the question was whether regular expressions can do this.
 
A

Anno Siegel

[checking paren balance]
Of course, a better regex or a module could be used, or...

sub isvalid {
local $_ = shift;
my $level = 0;
while (length) {
s/^[^()]+//; # non-parantheses
if (s/^\(//) { # opening parenthesis
++$level;
next;
}
if (s/^\)//) { # closing parenthesis
return '' if --$level < 0;
next;
}
}
return $level == 0;
}

Instead of consuming the string bit by bit you could extract just the
parens and loop over them. That simplifies things a bit:

sub isvalid {
my $level = 0;
( $level += $_ eq '(' ? 1 : -1 ) >= 0 or return !!0 for
shift =~ /([()])/g;
$level == 0;
}

Anno
 
A

Anno Siegel

Abigail said:
Josef Moellers ([email protected]) wrote on MMMCMLXIV
September MCMXCIII in <URL:-- Bob wrote:
-- > I am trying to create a regular expression to verify that user entered
-- > data is surrounded by the same number of open and closed parenthesis.
[...]

-- In a nutshell: no, this is impossible. (<- that is a definite period!)
--
-- Regular expressions cannot parse this kind of things since finite state
-- automata, which implement the matching of strings to regular
-- expressions, cannot count and have no storage (that's why they are
-- called "finite state") and you need to count the number of opening
-- parens before you can validate that there are as many closing parens.
--
-- This is from the theory of automata. Practical implementations may allow
-- this but the question was whether regular expressions can do this.


It all depends on your definition of 'regular expressions'. In the context
of Perl, a 'regular expression' is something that can be grokked by Perls
regular expression engine. And with those regular expressions, you *can*
matched strings with balanced parenthesis.

It would be very inpractical if discussion on the forum would refer to
"expressions that can be grokked by Perl's regular expression engine"
instead of what's understood by everyone but a few pedants, "regular
expressions".

The mathematicians who work with regular expressions are just a club of
pedants?

"Regular expression" has two significantly different definitions in
mathematics and in CS. Of course, the CS meaning is the primary one
on clpm, but pointing out the difference is not merely pedantry.

Anno
 
S

Sam Holden

Abigail said:
Josef Moellers ([email protected]) wrote on MMMCMLXIV
September MCMXCIII in <URL:-- Bob wrote:
-- > I am trying to create a regular expression to verify that user entered
-- > data is surrounded by the same number of open and closed parenthesis.
[...]

-- In a nutshell: no, this is impossible. (<- that is a definite period!) [snip FSA explanation]
-- This is from the theory of automata. Practical implementations may allow
-- this but the question was whether regular expressions can do this.


It all depends on your definition of 'regular expressions'. In the context
of Perl, a 'regular expression' is something that can be grokked by Perls
regular expression engine. And with those regular expressions, you *can*
matched strings with balanced parenthesis.

It would be very inpractical if discussion on the forum would refer to
"expressions that can be grokked by Perl's regular expression engine"
instead of what's understood by everyone but a few pedants, "regular
expressions".

The mathematicians who work with regular expressions are just a club of
pedants?

I'm sure those mathematicians who happen to read clpm know the
difference between the two regular expression definitions and never
think for even a moment that a post on clpm that mentions "regular
expression" means anything but "expressions that can be grokked by
Perl's regular expression engine", hence they aren't in the "few
pedants" set above.
"Regular expression" has two significantly different definitions in
mathematics and in CS. Of course, the CS meaning is the primary one
on clpm, but pointing out the difference is not merely pedantry.

But answering "No" to can a regular expression match X, where X can be
done by perl's non-regular regular expressions but not by regular
regular expressions seems pedantic to me.

Sure, it's fun to show off that you know all about FSAs and PDAs and TMs
but since this is clpm they are all pretty much irrelevant to the
question. Hence the showing off should be accompanied by an answer in
perl, after all someone else will know an answer and can do the showing
themselves.

The OP clearly didn't mean "can a mathematical regular expression do
this?", they meant "can a perl regular expression do this?", hence the
answer of "No" is deceptive. I seriously doubt anyone would ask in clpm
about the properties of a real honest to goodness regular expressions.

In fact I have seen people criticised for asking about less powerful
regular expressions here, asking for a regular expression to match X and
then saying that the answer won't work because grep doesn't understand
all that. Which seems to be the other side of the coin.
 
I

Ilya Zakharevich

[A complimentary Cc of this posting was sent to
Anno Siegel
The mathematicians who work with regular expressions are just a club of
pedants?

Can't answer this question; never saw a mathematician who works with
regular expressions. 1/5 ;-)

But in general, [with a few exceptions] mathematicians do not mind the
same word having different meanings in different contexts. Too few
words, too many things to work over....

[For an extreme example, 'less than' may have a different meaning if
written by a French.]

Hope this helps,
Ilya
 
A

Anno Siegel

Ilya Zakharevich said:
[A complimentary Cc of this posting was sent to
Anno Siegel
The mathematicians who work with regular expressions are just a club of
pedants?

Can't answer this question; never saw a mathematician who works with
regular expressions. 1/5 ;-)

Okay, but some work with grammars and like to distinguish regular grammars,
which are defined using regular expressions.

This is, BTW, the reason why the mathematical notion of regular expressions
was never amended with backreferences, the way computer implementations are.
In the theory of grammars and languages, it is the distinguishing property
of regularity that some constructs (like nested parentheses, but even
simpler ones) cannot be parsed. Extending the capabilities of regular
expressions would spoil the game.
But in general, [with a few exceptions] mathematicians do not mind the
same word having different meanings in different contexts. Too few
words, too many things to work over....

Not at all. Mathematics has practiced operator overloading long before
it was named thus.

Nor do I have a problem with "regular expression" meaning two different
things in different disciplines. But the case is not quite like "index"
meaning two essentially unrelated things to a mathematician and a librarian.
The computer notion of "regex" was derived from the earlier mathematical
model and is essentially (down to the notation) the same thing.

Being practitioners, computer people soon invented shortcuts and extended
notations in their hackish ways. Many of these (like character classes,
or {m,n} quantifiers) are inessential and don't change the power of
what a regex can do, only the ease of expressing it. The introduction
of backreferences *does* change the expressive power of regexes, in a
way that was, and still is, deliberately excluded from the mathematical
definition. That can lead to contradictory statements about "regular
expressions", which, depending on who is speaking, are both correct.

This particular relationship deserves an explanation when it comes up.

Anno
 
A

A. Sinan Unur

[A complimentary Cc of this posting was sent to

I think you mean 'complementary' rather than 'complimentary'.

Just a friendly heads-up from one nonnative speaker to another :)
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,744
Messages
2,569,484
Members
44,903
Latest member
orderPeak8CBDGummies

Latest Threads

Top