Regular expression to match surrounding parenthesis

Discussion in 'Perl Misc' started by Bob, Jul 7, 2004.

  1. Bob

    Bob Guest

    Hi,

    I am trying to create a regular expression to verify that user entered
    data is surrounded by the same number of open and closed parenthesis.

    For example: if 'a' was the expression I was trying to match then a,
    (a), ((a)), (((a)))... (((((((a))))))) would all be valid.

    I am not new to regular expressions but I am also not an expert. I
    have spent hours searching for a solution but no luck.

    Is this possible and if so any help would be appreciated?

    Thanks
    Bob
    Bob, Jul 7, 2004
    #1
    1. Advertising

  2. Bob wrote:
    > I am trying to create a regular expression to verify that user
    > entered data is surrounded by the same number of open and closed
    > parenthesis.
    >
    > For example: if 'a' was the expression I was trying to match then
    > a, (a), ((a)), (((a)))... (((((((a))))))) would all be valid.


    What about nesting?

    The CPAN module Text::Balanced might be helpful.

    --
    Gunnar Hjalmarsson
    Email: http://www.gunnar.cc/cgi-bin/contact.pl
    Gunnar Hjalmarsson, Jul 7, 2004
    #2
    1. Advertising

  3. (Bob) wrote:

    > I am trying to create a regular expression to verify that user entered
    > data is surrounded by the same number of open and closed parenthesis.
    >
    > For example: if 'a' was the expression I was trying to match then a,
    > (a), ((a)), (((a)))... (((((((a))))))) would all be valid.
    >
    > I am not new to regular expressions but I am also not an expert. I
    > have spent hours searching for a solution but no luck.


    In stead of verifying directly that the input is correct, it it
    probably simpler to remove everything you know is correct and see
    if there is anything left:

    1 while $input =~ s/\([^()]*\)//g;
    print $input =~ /[()]/ ? "bad" : "ok";

    The first line removes all matching, possibly nested, parantheses
    and the second line checks to see if there are any parentheses
    left.

    Peter

    --
    #!/local/bin/perl5 -wp -*- mode: cperl; coding: iso-8859-1; -*-
    # matlab comment stripper (strips comments from Matlab m-files)
    s/^((?:(?:[])}\w.]'+|[^'%])+|'[^'\n]*(?:''[^'\n]*)*')*).*/$1/x;
    Peter J. Acklam, Jul 7, 2004
    #3
  4. Abigail <> wrote:

    > Peter J. Acklam () wrote:
    > ``
    > `` In stead of verifying directly that the input is correct, it it
    > `` probably simpler to remove everything you know is correct and see
    > `` if there is anything left:
    > ``
    > `` 1 while $input =~ s/\([^()]*\)//g;
    > `` print $input =~ /[()]/ ? "bad" : "ok";
    > ``
    > `` The first line removes all matching, possibly nested, parantheses
    > `` and the second line checks to see if there are any parentheses
    > `` left.
    >
    > The algorithm may be 'simpler' by some standard, it's not
    > efficient. Worst case, the algorithm takes time quadratic in
    > the length of the input.


    True, but the context here was validating user input, suggesting
    interactive use, which in turn means that this is probably going
    to be run relatively rarely and not thousands of time inside a
    loop.

    Of course, a better regex or a module could be used, or...

    sub isvalid {
    local $_ = shift;
    my $level = 0;
    while (length) {
    s/^[^()]+//; # non-parantheses
    if (s/^\(//) { # opening parenthesis
    ++$level;
    next;
    }
    if (s/^\)//) { # closing parenthesis
    return '' if --$level < 0;
    next;
    }
    }
    return $level == 0;
    }

    Peter

    --
    #!/local/bin/perl5 -wp -*- mode: cperl; coding: iso-8859-1; -*-
    # matlab comment stripper (strips comments from Matlab m-files)
    s/^((?:(?:[])}\w.]'+|[^'%])+|'[^'\n]*(?:''[^'\n]*)*')*).*/$1/x;
    Peter J. Acklam, Jul 8, 2004
    #4
  5. Bob wrote:
    > Hi,
    >
    > I am trying to create a regular expression to verify that user entered
    > data is surrounded by the same number of open and closed parenthesis.
    >
    > For example: if 'a' was the expression I was trying to match then a,
    > (a), ((a)), (((a)))... (((((((a))))))) would all be valid.
    >
    > I am not new to regular expressions but I am also not an expert. I
    > have spent hours searching for a solution but no luck.
    >
    > Is this possible and if so any help would be appreciated?


    In a nutshell: no, this is impossible. (<- that is a definite period!)

    Regular expressions cannot parse this kind of things since finite state
    automata, which implement the matching of strings to regular
    expressions, cannot count and have no storage (that's why they are
    called "finite state") and you need to count the number of opening
    parens before you can validate that there are as many closing parens.

    This is from the theory of automata. Practical implementations may allow
    this but the question was whether regular expressions can do this.
    --
    Josef Möllers (Pinguinpfleger bei FSC)
    If failure had no penalty success would not be a prize
    -- T. Pratchett
    Josef Moellers, Jul 8, 2004
    #5
  6. Bob

    Anno Siegel Guest

    Peter J. Acklam <> wrote in comp.lang.perl.misc:

    [checking paren balance]

    > Of course, a better regex or a module could be used, or...
    >
    > sub isvalid {
    > local $_ = shift;
    > my $level = 0;
    > while (length) {
    > s/^[^()]+//; # non-parantheses
    > if (s/^\(//) { # opening parenthesis
    > ++$level;
    > next;
    > }
    > if (s/^\)//) { # closing parenthesis
    > return '' if --$level < 0;
    > next;
    > }
    > }
    > return $level == 0;
    > }


    Instead of consuming the string bit by bit you could extract just the
    parens and loop over them. That simplifies things a bit:

    sub isvalid {
    my $level = 0;
    ( $level += $_ eq '(' ? 1 : -1 ) >= 0 or return !!0 for
    shift =~ /([()])/g;
    $level == 0;
    }

    Anno
    Anno Siegel, Jul 8, 2004
    #6
  7. Bob

    Anno Siegel Guest

    Abigail <> wrote in comp.lang.perl.misc:
    > Josef Moellers () wrote on MMMCMLXIV
    > September MCMXCIII in <URL:news:ccipqq$2gp$-siemens.com>:
    > -- Bob wrote:


    > -- > I am trying to create a regular expression to verify that user entered
    > -- > data is surrounded by the same number of open and closed parenthesis.


    [...]

    > -- In a nutshell: no, this is impossible. (<- that is a definite period!)
    > --
    > -- Regular expressions cannot parse this kind of things since finite state
    > -- automata, which implement the matching of strings to regular
    > -- expressions, cannot count and have no storage (that's why they are
    > -- called "finite state") and you need to count the number of opening
    > -- parens before you can validate that there are as many closing parens.
    > --
    > -- This is from the theory of automata. Practical implementations may allow
    > -- this but the question was whether regular expressions can do this.
    >
    >
    > It all depends on your definition of 'regular expressions'. In the context
    > of Perl, a 'regular expression' is something that can be grokked by Perls
    > regular expression engine. And with those regular expressions, you *can*
    > matched strings with balanced parenthesis.
    >
    > It would be very inpractical if discussion on the forum would refer to
    > "expressions that can be grokked by Perl's regular expression engine"
    > instead of what's understood by everyone but a few pedants, "regular
    > expressions".


    The mathematicians who work with regular expressions are just a club of
    pedants?

    "Regular expression" has two significantly different definitions in
    mathematics and in CS. Of course, the CS meaning is the primary one
    on clpm, but pointing out the difference is not merely pedantry.

    Anno
    Anno Siegel, Jul 8, 2004
    #7
  8. Bob

    Sam Holden Guest

    On 8 Jul 2004 10:13:02 GMT,
    Anno Siegel <-berlin.de> wrote:
    > Abigail <> wrote in comp.lang.perl.misc:
    >> Josef Moellers () wrote on MMMCMLXIV
    >> September MCMXCIII in <URL:news:ccipqq$2gp$-siemens.com>:
    >> -- Bob wrote:

    >
    >> -- > I am trying to create a regular expression to verify that user entered
    >> -- > data is surrounded by the same number of open and closed parenthesis.

    >
    > [...]
    >
    >> -- In a nutshell: no, this is impossible. (<- that is a definite period!)

    [snip FSA explanation]
    >> -- This is from the theory of automata. Practical implementations may allow
    >> -- this but the question was whether regular expressions can do this.
    >>
    >>
    >> It all depends on your definition of 'regular expressions'. In the context
    >> of Perl, a 'regular expression' is something that can be grokked by Perls
    >> regular expression engine. And with those regular expressions, you *can*
    >> matched strings with balanced parenthesis.
    >>
    >> It would be very inpractical if discussion on the forum would refer to
    >> "expressions that can be grokked by Perl's regular expression engine"
    >> instead of what's understood by everyone but a few pedants, "regular
    >> expressions".

    >
    > The mathematicians who work with regular expressions are just a club of
    > pedants?


    I'm sure those mathematicians who happen to read clpm know the
    difference between the two regular expression definitions and never
    think for even a moment that a post on clpm that mentions "regular
    expression" means anything but "expressions that can be grokked by
    Perl's regular expression engine", hence they aren't in the "few
    pedants" set above.

    >
    > "Regular expression" has two significantly different definitions in
    > mathematics and in CS. Of course, the CS meaning is the primary one
    > on clpm, but pointing out the difference is not merely pedantry.


    But answering "No" to can a regular expression match X, where X can be
    done by perl's non-regular regular expressions but not by regular
    regular expressions seems pedantic to me.

    Sure, it's fun to show off that you know all about FSAs and PDAs and TMs
    but since this is clpm they are all pretty much irrelevant to the
    question. Hence the showing off should be accompanied by an answer in
    perl, after all someone else will know an answer and can do the showing
    themselves.

    The OP clearly didn't mean "can a mathematical regular expression do
    this?", they meant "can a perl regular expression do this?", hence the
    answer of "No" is deceptive. I seriously doubt anyone would ask in clpm
    about the properties of a real honest to goodness regular expressions.

    In fact I have seen people criticised for asking about less powerful
    regular expressions here, asking for a regular expression to match X and
    then saying that the answer won't work because grep doesn't understand
    all that. Which seems to be the other side of the coin.

    --
    Sam Holden
    Sam Holden, Jul 8, 2004
    #8
  9. [A complimentary Cc of this posting was sent to
    Anno Siegel
    <-berlin.de>], who wrote in article <ccj6ne$f4b$-Berlin.DE>:
    > The mathematicians who work with regular expressions are just a club of
    > pedants?


    Can't answer this question; never saw a mathematician who works with
    regular expressions. 1/5 ;-)

    But in general, [with a few exceptions] mathematicians do not mind the
    same word having different meanings in different contexts. Too few
    words, too many things to work over....

    [For an extreme example, 'less than' may have a different meaning if
    written by a French.]

    Hope this helps,
    Ilya
    Ilya Zakharevich, Jul 9, 2004
    #9
  10. Bob

    Anno Siegel Guest

    Ilya Zakharevich <> wrote in comp.lang.perl.misc:
    > [A complimentary Cc of this posting was sent to
    > Anno Siegel
    > <-berlin.de>], who wrote in article
    > <ccj6ne$f4b$-Berlin.DE>:
    > > The mathematicians who work with regular expressions are just a club of
    > > pedants?

    >
    > Can't answer this question; never saw a mathematician who works with
    > regular expressions. 1/5 ;-)


    Okay, but some work with grammars and like to distinguish regular grammars,
    which are defined using regular expressions.

    This is, BTW, the reason why the mathematical notion of regular expressions
    was never amended with backreferences, the way computer implementations are.
    In the theory of grammars and languages, it is the distinguishing property
    of regularity that some constructs (like nested parentheses, but even
    simpler ones) cannot be parsed. Extending the capabilities of regular
    expressions would spoil the game.

    > But in general, [with a few exceptions] mathematicians do not mind the
    > same word having different meanings in different contexts. Too few
    > words, too many things to work over....


    Not at all. Mathematics has practiced operator overloading long before
    it was named thus.

    Nor do I have a problem with "regular expression" meaning two different
    things in different disciplines. But the case is not quite like "index"
    meaning two essentially unrelated things to a mathematician and a librarian.
    The computer notion of "regex" was derived from the earlier mathematical
    model and is essentially (down to the notation) the same thing.

    Being practitioners, computer people soon invented shortcuts and extended
    notations in their hackish ways. Many of these (like character classes,
    or {m,n} quantifiers) are inessential and don't change the power of
    what a regex can do, only the ease of expressing it. The introduction
    of backreferences *does* change the expressive power of regexes, in a
    way that was, and still is, deliberately excluded from the mathematical
    definition. That can lead to contradictory statements about "regular
    expressions", which, depending on who is speaking, are both correct.

    This particular relationship deserves an explanation when it comes up.

    Anno
    Anno Siegel, Jul 9, 2004
    #10
  11. Ilya Zakharevich <> wrote in
    news:ccml0r$1bqb$:

    > [A complimentary Cc of this posting was sent to


    I think you mean 'complementary' rather than 'complimentary'.

    Just a friendly heads-up from one nonnative speaker to another :)

    --
    A. Sinan Unur
    (reverse each component for email address)
    A. Sinan Unur, Jul 10, 2004
    #11
  12. Bob

    David Combs Guest

    In article <ccml0r$1bqb$>,
    Ilya Zakharevich <> wrote:
    >...


    >[For an extreme example, 'less than' may have a different meaning if
    >written by a French.]


    Please do say more about this.

    Thanks,

    David
    David Combs, Jul 13, 2004
    #12
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. VSK
    Replies:
    2
    Views:
    2,272
  2. puzzlecracker
    Replies:
    0
    Views:
    438
    puzzlecracker
    Jan 25, 2006
  3. Replies:
    3
    Views:
    475
  4. Daniel Fac
    Replies:
    3
    Views:
    134
    Daniel Fac
    Nov 2, 2008
  5. Carl Cunningham

    Regular expressions and parenthesis in match text

    Carl Cunningham, Sep 15, 2003, in forum: Perl Misc
    Replies:
    2
    Views:
    107
    Carl Cunningham
    Sep 15, 2003
Loading...

Share This Page