Parens Matching

Discussion in 'Java' started by Ken Kast, Feb 16, 2007.

  1. Ken Kast

    Ken Kast Guest

    Does anyone have a regular expression pattern that would include a test for
    balanced parens of arbitrary nestedness?
    Ken Kast, Feb 16, 2007
    #1
    1. Advertising

  2. Ken Kast

    IchBin Guest

    Ken Kast wrote:
    > Does anyone have a regular expression pattern that would include a test for
    > balanced parens of arbitrary nestedness?


    Is arbitrary 'nestedness' some new type of epistemological terminology?

    Sorry, just could not stop myself.

    --
    Thanks in Advance... http://weconsultants.prophp.org
    IchBin, Pocono Lake, Pa, USA http://ichbinquotations.awardspace.com
    ______________________________________________________________________
    'If there is one, Knowledge is the "Fountain of Youth"'
    -William E. Taylor, Regular Guy (1952-)
    IchBin, Feb 16, 2007
    #2
    1. Advertising

  3. Ken Kast

    Ken Kast Guest

    Nope. Something I made up in order to make the message not so dry.

    "IchBin" <> wrote in message
    news:...
    > Ken Kast wrote:
    >> Does anyone have a regular expression pattern that would include a test
    >> for balanced parens of arbitrary nestedness?

    >
    > Is arbitrary 'nestedness' some new type of epistemological terminology?
    >
    > Sorry, just could not stop myself.
    >
    > --
    > Thanks in Advance... http://weconsultants.prophp.org
    > IchBin, Pocono Lake, Pa, USA http://ichbinquotations.awardspace.com
    > ______________________________________________________________________
    > 'If there is one, Knowledge is the "Fountain of Youth"'
    > -William E. Taylor, Regular Guy (1952-)
    Ken Kast, Feb 16, 2007
    #3
  4. Ken Kast

    Michael Guest

    On Feb 15, 7:34 pm, "Ken Kast" <> wrote:
    > Does anyone have a regular expression pattern that would include a test for
    > balanced parens of arbitrary nestedness?


    No. No one does.

    There's a well known proof in Computer Science theory that it is not
    possible to create a regular expression for balanced parentheses.

    Look up "the pumping lemma" for details.

    Michael
    Michael, Feb 16, 2007
    #4
  5. "Michael" <> writes:

    > On Feb 15, 7:34 pm, "Ken Kast" <> wrote:
    >> Does anyone have a regular expression pattern that would include a test for
    >> balanced parens of arbitrary nestedness?

    >
    > No. No one does.
    >
    > There's a well known proof in Computer Science theory that it is not
    > possible to create a regular expression for balanced parentheses.
    >


    Michael is quite right. Of course, that proof is silent
    on the subject of creating a regular expression that accepts
    balanced strings with at most Long.MAX_VALUE open parens.
    (And really, most texts will have fewer than half that many.)

    While you're waiting for the program you wrote to write that
    regular expression to finish, you can pass the time with a
    counter that increments on '(', and decrements on ')'.

    --
    Mark Jeffcoat
    Austin, TX
    Mark Jeffcoat, Feb 16, 2007
    #5
  6. Ken Kast

    Tim Slattery Guest

    "Michael" <> wrote:

    >On Feb 15, 7:34 pm, "Ken Kast" <> wrote:
    >> Does anyone have a regular expression pattern that would include a test for
    >> balanced parens of arbitrary nestedness?

    >
    >No. No one does.
    >
    >There's a well known proof in Computer Science theory that it is not
    >possible to create a regular expression for balanced parentheses.


    OTOH, it would be pretty simple to write a loop that cycles through a
    string and keeps track of left and right parens. Add 1 to the counter
    when you find a left paren, subtract one for a right paren. If the
    counter ever goes negative, throw an error. If it's not zero at the
    end of the string, throw an error.

    --
    Tim Slattery

    http://members.cox.net/slatteryt
    Tim Slattery, Feb 16, 2007
    #6
  7. Ken Kast

    Michael Guest

    > >There's a well known proof in Computer Science theory that it is not
    > >possible to create a regular expression for balanced parentheses.

    >
    > OTOH, it would be pretty simple to write a loop that cycles through a
    > string and keeps track of left and right parens. Add 1 to the counter
    > when you find a left paren, subtract one for a right paren. If the
    > counter ever goes negative, throw an error. If it's not zero at the
    > end of the string, throw an error.


    I guess I should have spelled it out further - you can't match
    balanced parentheses *with a regular expression.* Of course, there
    are other ways to do it. Either write the kind or program above, or
    write a recursive decent parser, (or use a regular expression that
    only matches up to N balanced parentheses) etc. Basically, balanced
    parentheses aren't a regular grammar, they're a context free grammar,
    so you need a more powerful tool. In particular, they're a grammar
    somthing like this:
    Expression -> nothing | ( Expression )

    Sorry if I caused more confusion that I removed. It was late.

    Michael
    Michael, Feb 16, 2007
    #7
  8. Ken Kast

    Daniel Pitts Guest

    On Feb 15, 7:34 pm, "Ken Kast" <> wrote:
    > Does anyone have a regular expression pattern that would include a test for
    > balanced parens of arbitrary nestedness?


    Two ways:

    public class NestedParens {
    public static void main(String[] args) {
    String[] strings = {
    "Test case 1",
    "(Test case 2)",
    "((Test) case ) 3",
    ")test case 4 (",
    };
    for (String string: strings) {
    System.out.println(string +
    " is " + (isBalancedRe(string)
    ?
    "Balanced" : "Mis-matched"));
    }

    for (String string: strings) {
    System.out.println(string +
    " is " + (isBalancedCount(string)
    ?
    "Balanced" : "Mis-matched"));
    }
    }

    public static boolean isBalancedRe(String string) {
    String last;
    do {
    last = string;
    string = string.replaceAll("\\([^()]*\\)", "");
    } while (!last.equals(string));
    return !string.matches(".*[()].*");
    }

    private static boolean isBalancedCount(String string) {
    int parens = 0;
    for (char c : string.toCharArray()) {
    switch (c) {
    case ')':
    if (--parens < 0) {
    return false;
    }
    break;
    case '(':
    ++parens;
    }
    }
    return parens == 0;
    }
    }
    Daniel Pitts, Feb 17, 2007
    #8
  9. Ken Kast

    Alex Hunsley Guest

    Michael wrote:
    > On Feb 15, 7:34 pm, "Ken Kast" <> wrote:
    >> Does anyone have a regular expression pattern that would include a test for
    >> balanced parens of arbitrary nestedness?

    >
    > No. No one does.
    >
    > There's a well known proof in Computer Science theory that it is not
    > possible to create a regular expression for balanced parentheses.
    >
    > Look up "the pumping lemma" for details.


    That thang always made me think of either pumping lemons or pumping
    lemmings.

    The matched parenthesis problem can be done with a non-deterministic
    state machine, but not with a deterministic one.... hence no regular
    expression can do it.
    Alex Hunsley, Feb 17, 2007
    #9
  10. Ken Kast

    Alex Hunsley Guest

    Tim Slattery wrote:
    > "Michael" <> wrote:
    >
    >> On Feb 15, 7:34 pm, "Ken Kast" <> wrote:
    >>> Does anyone have a regular expression pattern that would include a test for
    >>> balanced parens of arbitrary nestedness?

    >> No. No one does.
    >>
    >> There's a well known proof in Computer Science theory that it is not
    >> possible to create a regular expression for balanced parentheses.

    >
    > OTOH, it would be pretty simple to write a loop that cycles through a
    > string and keeps track of left and right parens. Add 1 to the counter
    > when you find a left paren, subtract one for a right paren. If the
    > counter ever goes negative, throw an error. If it's not zero at the
    > end of the string, throw an error.


    Although even this isn't correct, in theory... due to the finite memory
    of the computer, it is possible to give it a sequence that should pass
    (the brackets match), but won't pass, because the computer runs out of
    memory. Ok, a little silly, but...


    >
    > --
    > Tim Slattery
    >
    > http://members.cox.net/slatteryt
    Alex Hunsley, Feb 17, 2007
    #10
  11. Ken Kast

    Daniel Pitts Guest

    On Feb 16, 5:14 pm, "Daniel Pitts" <>
    wrote:
    > On Feb 15, 7:34 pm, "Ken Kast" <> wrote:
    >
    > > Does anyone have a regular expression pattern that would include a test for
    > > balanced parens of arbitrary nestedness?

    >
    > Two ways:
    >
    > public class NestedParens {
    > public static void main(String[] args) {
    > String[] strings = {
    > "Test case 1",
    > "(Test case 2)",
    > "((Test) case ) 3",
    > ")test case 4 (",
    > };
    > for (String string: strings) {
    > System.out.println(string +
    > " is " + (isBalancedRe(string)
    > ?
    > "Balanced" : "Mis-matched"));
    > }
    >
    > for (String string: strings) {
    > System.out.println(string +
    > " is " + (isBalancedCount(string)
    > ?
    > "Balanced" : "Mis-matched"));
    > }
    > }
    >
    > public static boolean isBalancedRe(String string) {
    > String last;
    > do {
    > last = string;
    > string = string.replaceAll("\\([^()]*\\)", "");
    > } while (!last.equals(string));
    > return !string.matches(".*[()].*");
    > }
    >
    > private static boolean isBalancedCount(String string) {
    > int parens = 0;
    > for (char c : string.toCharArray()) {
    > switch (c) {
    > case ')':
    > if (--parens < 0) {
    > return false;
    > }
    > break;
    > case '(':
    > ++parens;
    > }
    > }
    > return parens == 0;
    > }
    >
    > }



    Or, a little more concisely:
    public static boolean isBalancedRe(String string) {
    while (string.matches(".*\\([^()]*\\).*")) {
    string = string.replaceAll("\\([^()]*\\)", "");
    }
    return !string.matches(".*[()].*");
    }

    private static boolean isBalancedCount(String string) {
    int parens = 0;
    for (char c : string.toCharArray()) {
    if (c == '(') {
    ++parens;
    }
    if (c == ')' && --parens < 0) {
    return false;
    }
    }
    return parens == 0;
    }
    ....

    Or, to get down right esoteric:
    private static boolean isBalancedCount(String string) {
    int parens = 0;
    for (char c : string.toCharArray()) {
    if ((c == '(' && ++ parens < 0) || c==')' && --parens < 0)
    {
    return false;
    }
    }
    return parens == 0;
    }

    CafeBabe points to those who know why that works the way it does.
    Daniel Pitts, Feb 17, 2007
    #11
  12. Ken Kast

    Daniel Pitts Guest

    On Feb 16, 5:26 pm, Alex Hunsley <> wrote:
    > Michael wrote:
    > > On Feb 15, 7:34 pm, "Ken Kast" <> wrote:
    > >> Does anyone have a regular expression pattern that would include a test for
    > >> balanced parens of arbitrary nestedness?

    >
    > > No. No one does.

    >
    > > There's a well known proof in Computer Science theory that it is not
    > > possible to create a regular expression for balanced parentheses.

    >
    > > Look up "the pumping lemma" for details.

    >
    > That thang always made me think of either pumping lemons or pumping
    > lemmings.
    >
    > The matched parenthesis problem can be done with a non-deterministic
    > state machine, but not with a deterministic one.... hence no regular
    > expression can do it.


    Hmm, I don't think your definition is correct.
    Regex isn't always implementedwith a DSM, it might use NDA, which
    still doesn't solve it.

    In either case, the following is a deterministic state machine solves
    the problem.

    private static boolean isBalancedCount(String string) {
    int parens = 0;
    for (char c : string.toCharArray()) {
    if (c == '(') {
    ++parens;
    }
    if (c == ')' && --parens < 0) {
    return false;
    }
    }
    return parens == 0;
    }
    Daniel Pitts, Feb 17, 2007
    #12
  13. Ken Kast

    Ken Kast Guest

    Absolutely never thought I would generate so much conversation. I need to
    sit down with a glass of wine and digest this.

    I probably should have been more transparent in my original post. I'm not
    trying to test a string for balanced parens. I need it to be a match
    termination condition. In other words, I have a fairly complex pattern. So
    it tells when a match is done. I need to say the match is done if the
    character pattern is played out or the match will play out before I balance
    parens.

    I first wrote this in C#. .Net has extended RegExp that lets you swap named
    groups on condition. This effectively lets you model a stack. So you push
    left parens on the stack and pop them with right ones. Now I want to
    implement the same app in Java. I can certainly assume a low level nesting,
    on the order of 3 or 4, so running out of memory or running out of time
    aren't concerns. I haven't thought it through, but I could probably do it
    brute force by iterating thru a set of patterns. That would solve 6 sigma's
    worth of real world situations. I was hoping for something more elegant.

    Thanks to everyone for your thoughts. I certainly learned how lemmings have
    influenced computer science.

    Ken


    "Ken Kast" <> wrote in message
    news:45d525e7$0$28082$...
    > Does anyone have a regular expression pattern that would include a test
    > for balanced parens of arbitrary nestedness?
    >
    Ken Kast, Feb 17, 2007
    #13
  14. Ken Kast

    Lew Guest

    Alex Hunsley wrote:
    > Although even this isn't correct, in theory... due to the finite memory
    > of the computer, it is possible to give it a sequence that should pass
    > (the brackets match), but won't pass, because the computer runs out of
    > memory. Ok, a little silly, but...


    Not so silly, really. It points to a valuable principle of workaday
    programming, nay, engineering overall. When the theory says there is no
    perfect algorithm, the practice suggests to use an imperfect one. You code to
    handle the realistic cases and discontinuously reject inputs that exceed the
    algorithm's ability, like the memory-hog example.

    This way of thinking also leads one to jump out of "What regular expression?"
    into "What algorithm?"

    - Lew
    Lew, Feb 17, 2007
    #14
  15. Ken Kast

    Alex Hunsley Guest

    Daniel Pitts wrote:
    > On Feb 16, 5:26 pm, Alex Hunsley <> wrote:
    >> Michael wrote:
    >>> On Feb 15, 7:34 pm, "Ken Kast" <> wrote:
    >>>> Does anyone have a regular expression pattern that would include a test for
    >>>> balanced parens of arbitrary nestedness?
    >>> No. No one does.
    >>> There's a well known proof in Computer Science theory that it is not
    >>> possible to create a regular expression for balanced parentheses.
    >>> Look up "the pumping lemma" for details.

    >> That thang always made me think of either pumping lemons or pumping
    >> lemmings.
    >>
    >> The matched parenthesis problem can be done with a non-deterministic
    >> state machine, but not with a deterministic one.... hence no regular
    >> expression can do it.

    >
    > Hmm, I don't think your definition is correct.
    > Regex isn't always implementedwith a DSM, it might use NDA, which
    > still doesn't solve it.


    What do you mean by NDA? Non-deterministic automaton?

    >
    > In either case, the following is a deterministic state machine solves
    > the problem.
    >
    > private static boolean isBalancedCount(String string) {
    > int parens = 0;
    > for (char c : string.toCharArray()) {
    > if (c == '(') {
    > ++parens;
    > }
    > if (c == ')' && --parens < 0) {
    > return false;
    > }
    > }
    > return parens == 0;
    > }


    Nope, it doesn't solve the problem. The int type in Java has finite
    capacity - see Integer.MAX_VALUE - so more accurately this a finite
    deterministic state machine. I'm actually trying to remember if
    'deterministic state machine' usually implies the 'finite' part in
    common usage. A look at wikipedia seems to suggest that, but I wasn't
    looking at it for long...

    So, anyway, I can think of an input that causes the above code to fail -
    and hence problem not solved.

    To give a simplified example of this, suppose we have a Java where
    Integer.MAX_VALUE = 1, and .MIN_VALUE = -2. So an int can have 4
    distinct values for int. Assuming this, would would your code save about
    the following inputs?

    (((( - this would be a false positive
    (((((((( - another false positive
    ((())) - this would be a false negative


    Similarly, your code above in the world of real Java would give a false
    positive for the input:

    '(' * [(Integer.MAX_VALUE - Integer.MIN_VALUE + 1) * n]

    where n is any positive integer.

    Basically, true parens matching is scuppered, because of the need for
    infinite memory!

    lex
    Alex Hunsley, Feb 17, 2007
    #15
  16. Ken Kast

    Alex Hunsley Guest

    Lew wrote:
    > Alex Hunsley wrote:
    >> Although even this isn't correct, in theory... due to the finite memory
    >> of the computer, it is possible to give it a sequence that should pass
    >> (the brackets match), but won't pass, because the computer runs out of
    >> memory. Ok, a little silly, but...

    >
    > Not so silly, really. It points to a valuable principle of workaday
    > programming, nay, engineering overall. When the theory says there is no
    > perfect algorithm, the practice suggests to use an imperfect one. You
    > code to handle the realistic cases and discontinuously reject inputs
    > that exceed the algorithm's ability, like the memory-hog example.
    >
    > This way of thinking also leads one to jump out of "What regular
    > expression?" into "What algorithm?"


    Good points. I have also touched on this subject with a reply I just
    wrote to a post from Daniel. In Comp Sci. proper, when you're talking
    about bracket matching, you want a machine that solves it for *any*
    input. So in that sense, it can't be done by a normal real world machine...

    lex


    >
    > - Lew
    Alex Hunsley, Feb 17, 2007
    #16
  17. Ken Kast

    Alex Hunsley Guest

    Mark Jeffcoat wrote:
    > "Michael" <> writes:
    >
    >> On Feb 15, 7:34 pm, "Ken Kast" <> wrote:
    >>> Does anyone have a regular expression pattern that would include a test for
    >>> balanced parens of arbitrary nestedness?

    >> No. No one does.
    >>
    >> There's a well known proof in Computer Science theory that it is not
    >> possible to create a regular expression for balanced parentheses.
    >>

    >
    > Michael is quite right. Of course, that proof is silent
    > on the subject of creating a regular expression that accepts
    > balanced strings with at most Long.MAX_VALUE open parens.


    Although in theory you could actually count this many with a single long:

    Long.MAX_VALUE - Long.MIN_VALUE + 1

    I have a feeling this expression is equal to:

    - (2 * Long.MIN_VALUE)
    Alex Hunsley, Feb 17, 2007
    #17
  18. Ken Kast

    Chris Uppal Guest

    Alex Hunsley wrote:

    > Although even this isn't correct, in theory... due to the finite memory
    > of the computer, it is possible to give it a sequence that should pass
    > (the brackets match), but won't pass, because the computer runs out of
    > memory. Ok, a little silly, but...


    You can do it in constant space for arbitrary length inputs /if/ you are
    allowed to use the input as working space too -- just replace '()' by '' until
    there's nothing left.

    A more elegant method would be to keep the unclosed-bracket count in unbounded
    integer representation (BigInteger-style), stored in (overwriting) the portion
    of the input sequence which has already been scanned.

    -- chris
    Chris Uppal, Feb 17, 2007
    #18
  19. Ken Kast

    Chris Uppal Guest

    Alex Hunsley wrote:

    > The matched parenthesis problem can be done with a non-deterministic
    > state machine, but not with a deterministic one.... hence no regular
    > expression can do it.


    I would understand "non-deterministic state machine" to mean the
    non-deterministic variant of finite state automaton). And the deterministic
    and non-deterministic versions of FSAs are equivalent in power. Do you use the
    term to mean something other form of automaton ?

    -- chris
    Chris Uppal, Feb 17, 2007
    #19
  20. Ken Kast

    Chris Uppal Guest

    Daniel Pitts wrote:

    > private static boolean isBalancedCount(String string) {
    > int parens = 0;
    > for (char c : string.toCharArray()) {
    > if ((c == '(' && ++ parens < 0) || c==')' && --parens < 0)
    > {
    > return false;
    > }
    > }
    > return parens == 0;
    > }


    I have rarely seen code I liked less. Well done ;-)

    -- chris
    Chris Uppal, Feb 17, 2007
    #20
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Stephen Ferg

    How to avoid "f.close" (no parens) bug?

    Stephen Ferg, Feb 11, 2004, in forum: Python
    Replies:
    12
    Views:
    483
    Nick Craig-Wood
    Feb 13, 2004
  2. Batista, Facundo

    RE: How to avoid "f.close" (no parens) bug?

    Batista, Facundo, Feb 11, 2004, in forum: Python
    Replies:
    2
    Views:
    240
    Neil Hodgson
    Feb 11, 2004
  3. Batista, Facundo

    RE: How to avoid "f.close" (no parens) bug?

    Batista, Facundo, Feb 11, 2004, in forum: Python
    Replies:
    2
    Views:
    272
    Peter Hansen
    Feb 11, 2004
  4. ivo welch
    Replies:
    5
    Views:
    101
    Robert Klemme
    Jan 24, 2009
  5. ivo welch

    matching balanced parens

    ivo welch, Dec 26, 2003, in forum: Perl Misc
    Replies:
    3
    Views:
    95
    Ben Morrow
    Dec 27, 2003
Loading...

Share This Page