regex match for same number of opening and closing brackets

Discussion in 'Perl Misc' started by Sascha Bendix, Sep 18, 2009.

  1. Hi,

    I got a little parsing problem right here: I got some strings and want
    to ensure, that there are as many opening as closing brackets via a
    regex without specifying the exact number.

    This regex would be part of a bigger one, so it can't be done in two steps.

    Can anybody give me a hint how to do this?

    Here are some boundary conditions of the strings I have:

    * the number of occurrences differs between 0 an 67
    * the strings always start with [a-z]
    * there is always [a-z,]+ between two opening brackets

    Thanks in advance for every hint.

    Regards,

    Sascha
     
    Sascha Bendix, Sep 18, 2009
    #1
    1. Advertising

  2. Sascha Bendix <> wrote:
    >I got a little parsing problem right here: I got some strings and want
    >to ensure, that there are as many opening as closing brackets via a
    >regex without specifying the exact number.


    It cannot be done with pure regular expressions because balanced
    brackets require (at least) a context-free language. And a context-free
    language cannot be parsed with regular expressions. For more information
    please see "Chomsky Hierarchy" in your favourite book about formal
    languages.

    However Perl's regular expressions are enhanced and there are some means
    now.

    >Can anybody give me a hint how to do this?


    See "perldoc -q balanced"

    jue
     
    Jürgen Exner, Sep 18, 2009
    #2
    1. Advertising

  3. Sascha Bendix

    Guest

    On Fri, 18 Sep 2009 14:22:05 +0200, Sascha Bendix <> wrote:

    >Hi,
    >
    >I got a little parsing problem right here: I got some strings and want
    >to ensure, that there are as many opening as closing brackets via a
    >regex without specifying the exact number.
    >
    >This regex would be part of a bigger one, so it can't be done in two steps.
    >
    >Can anybody give me a hint how to do this?
    >
    >Here are some boundary conditions of the strings I have:
    >
    > * the number of occurrences differs between 0 an 67
    > * the strings always start with [a-z]
    > * there is always [a-z,]+ between two opening brackets
    >
    >Thanks in advance for every hint.
    >
    >Regards,
    >
    >Sascha


    - Try with(out) debug option.
    - It can be set to (*FAIL) when first $2 is populated.
    - Requires Perl 5.10
    - Max 50 recursive levels (shouldn't be a problem).
    - Is only based on char delimeters, can be tweeked for
    string delimeters.


    Good Luck!
    -sln
    ================

    use strict;
    use warnings;

    my $debug = 0;
    my $string = " asdf[ ( () ) (((( ] ))) ( ) () a )";
    my ($x,$y) =
    (
    quotemeta '(',
    quotemeta ')'
    );

    my $rxbal = qr {
    (?: ($x(?:(?>[^$x$y]+)|(?1))*$y) | ([$x$y]) | . )+
    }x;

    if ($debug)
    {
    use re 'eval';
    $rxbal = qr {
    (?:
    ( # group 1
    $x
    (?{ print "(",pos()," "; })
    (?:
    (?> [^$x$y]+ ) # not $x nor $y and no backtracking
    |
    (?{ print "-",pos(),"\n"; })
    (?1) # Recurse to start of group 1
    )*
    (?{ print ")",pos()," "; })
    $y
    )
    (?{ print "\n\$1 => $^N",pos()," \n"; })
    |
    ([$x$y]) # $x or $y
    (?{ print "\n\$2 => (!!!!!!!!!!bad) $^N",pos()," \n"; })
    |
    (.) # any char
    (?{ print "\n\$3 => $^N",pos()," \n"; })
    )+
    }x;
    }

    print "$string\n";

    if ($string =~ /$rxbal/)
    {
    if (defined $2)
    { print "** This is NOT balanced\n" }
    else
    { print "** This is balanced\n" }
    }
    else
    { print "** Nothing to balance\n" }
     
    , Sep 18, 2009
    #3
  4. Sascha Bendix

    Guest

    On Fri, 18 Sep 2009 14:06:08 -0700, wrote:

    >On Fri, 18 Sep 2009 14:22:05 +0200, Sascha Bendix <> wrote:
    >
    >>Hi,
    >>
    >>I got a little parsing problem right here: I got some strings and want
    >>to ensure, that there are as many opening as closing brackets via a
    >>regex without specifying the exact number.
    >>
    >>This regex would be part of a bigger one, so it can't be done in two steps.
    >>
    >>Can anybody give me a hint how to do this?
    >>


    Try this modified version. This regex accepts strings
    as open and close balanced text. It will still do single
    character.
    '$x,$y' are the open,close text variables below.

    -sln

    ----------------------------
    use strict;
    use warnings;

    my $debug = 1;

    my $string = "ga <a>this<a>{s<a>{ds}</a>g}</a>that-this</a> g <a>dn gd</a> that{fn} asdf </a>ga";

    my ($x,$y) =
    (
    '<a>',
    '</a>'
    );
    my ($xp,$yp) = ($x,$y);
    $_ = quotemeta for ($x,$y);


    my $rxbal = qr {
    (?:
    ($x (?:(?>(?:(?!$x|$y).)+) | (?1))* $y) # Group 1
    | ($x|$y) # Group 2
    | .
    )+
    }xs;


    if ($debug)
    {
    use re 'eval';
    $rxbal = qr {
    (?:
    ( # Group 1
    $x
    (?{ print "\n x => '$xp'",pos(); })
    (?:
    (?> (?: (?!$x|$y) . )+ ) # no backtracking and not ($x|$y)
    |
    (?{ print "-",pos(); })
    (?1) # Recurse to start of group 1
    )*
    $y
    (?{ print "\n y => '$yp'",pos(); })
    )
    (?{ print "\n* \$1 => '$^N'",pos(); })
    |
    ($x|$y) # Group 2, $x or $y
    (?{ print "\n! \$2 => (____bad____) '$^N'",pos(); })
    |
    (.) # Group 3, any char
    (?{ print "\n \$3 => '$^N'",pos(); })
    )+
    (?{ print "\n"; })
    }xs;
    }

    print "\nx = '$xp'\ny = '$yp'\n",'-'x20,"\n'$string'\n";

    if ($string =~ /$rxbal/)
    {
    print "\n";
    if (defined $2)
    { print "** This is NOT balanced\n" }
    elsif (not defined $1)
    { print "** Nothing to balance\n" }
    else
    { print "** This is balanced\n" }
    }
    else
    { print "** The string is empty\n" }

    __END__

    Output:

    x = '<a>'
    y = '</a>'
    --------------------
    'ga <a>this<a>{s<a>{ds}</a>g}</a>that-this</a> g <a>dn gd</a> that{fn} asdf </a>
    ga'

    $3 => 'g'1
    $3 => 'a'2
    $3 => ' '3
    x => '<a>'6-10
    x => '<a>'13-15
    x => '<a>'18-22
    y => '</a>'26-28
    y => '</a>'32-41
    y => '</a>'45
    * $1 => '<a>this<a>{s<a>{ds}</a>g}</a>that-this</a>'45
    $3 => ' '46
    $3 => 'g'47
    $3 => ' '48
    x => '<a>'51-56
    y => '</a>'60
    * $1 => '<a>dn gd</a>'60
    $3 => ' '61
    $3 => 't'62
    $3 => 'h'63
    $3 => 'a'64
    $3 => 't'65
    $3 => '{'66
    $3 => 'f'67
    $3 => 'n'68
    $3 => '}'69
    $3 => ' '70
    $3 => 'a'71
    $3 => 's'72
    $3 => 'd'73
    $3 => 'f'74
    $3 => ' '75
    ! $2 => (____bad____) '</a>'79
    $3 => 'g'80
    $3 => 'a'81

    ** This is NOT balanced
     
    , Sep 19, 2009
    #4
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. hiwa
    Replies:
    0
    Views:
    648
  2. NevilleDNZ

    RegEx for matching brackets

    NevilleDNZ, May 2, 2008, in forum: Python
    Replies:
    5
    Views:
    1,105
    NevilleDNZ
    May 4, 2008
  3. Marcus Brody

    Brackets () in variable used for pattern match

    Marcus Brody, Sep 3, 2003, in forum: Perl Misc
    Replies:
    6
    Views:
    118
    Tad McClellan
    Sep 5, 2003
  4. Peter Stacy
    Replies:
    1
    Views:
    111
    John W. Krahn
    Nov 8, 2009
  5. jwcarlton
    Replies:
    23
    Views:
    500
    ccc31807
    Feb 22, 2011
Loading...

Share This Page