Help with nested pattern.

Discussion in 'Perl Misc' started by somedeveloper@gmail.com, Apr 14, 2007.

  1. Guest

    Hi,

    Would appreciate some hints on a 'smart' / 'nifty' solution to this
    problem.

    The problem:
    I need to extract a block of text lying between -- let's say -- a
    pair of brackets.
    There can be an arbitrary # of such [] blocks nested one inside the
    other.
    I know how to mark my first '[' to start the matching process.

    Example:
    abc [ def .*
    [ .* ]
    [ .*
    [ .* ]
    ]
    uvw ] xyz

    Desired output: [ def .* uvw ]


    1. Now, I don't know if this is something Perl regexps can handle. I
    read somewhere (possibly incorrectly) that nested patterns are in
    general constructs that are handled via grammars (flex/bison combo)
    and not regexps.

    2. But since Perl provides features like match-time-code-evaluation in
    regexps, I thought incrementing a count variable on each '[',
    decrementing it on each ']', and printing the current pattern when the
    count goes to zero would do the job... but I'm not so sure how.

    3. If there's really no solution via regexps or grammars, I would
    have to use the brute-force approach of processing each character in a
    loop looking for ['s and ]'s. (yuck!)
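
    For reference, the counting idea from points 2 and 3 can be sketched
    as a plain character loop. This is only an illustration, not code from
    the thread: the helper name outer_block is made up, and it assumes
    that only the first top-level block is wanted and that whitespace may
    be collapsed at the end.

```perl
use strict;
use warnings;

# Hypothetical helper: return the first top-level [...] block with every
# nested block dropped, or nothing if the brackets never balance.
sub outer_block {
    my ($text) = @_;
    my $depth = 0;     # current bracket nesting level
    my $out   = '';    # text collected at nesting level 1 only

    for my $ch (split //, $text) {
        if ($ch eq '[') {
            $depth++;
            $out .= $ch if $depth == 1;    # keep only the outermost '['
        }
        elsif ($ch eq ']') {
            next if $depth == 0;           # stray ']' before any '[': skip it
            $out .= $ch if $depth == 1;    # keep only the outermost ']'
            $depth--;
            if ($depth == 0) {
                $out =~ s/\s+/ /g;         # normalize whitespace
                return $out;
            }
        }
        elsif ($depth == 1) {
            $out .= $ch;                   # ordinary text directly inside the outer block
        }
    }
    return;    # ran out of input while a block was still open
}

my $in = "abc [ def .*\n[ .* ]\n[ .*\n[ .* ]\n]\nuvw ] xyz";
print outer_block($in), "\n";
```

    The loop is linear in the input length and needs no regexp features
    beyond the final whitespace cleanup.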

    Regards...
     
    , Apr 14, 2007
    #1

  2. On 14 Apr, 11:31, wrote:
    > Hi,
    >
    > Would appreciate some hints on a 'smart' / 'nifty' solution to this
    > problem.
    >
    > The problem:
    > I need to extract a block of text lying between -- let's say --
    > a pair of brackets.
    > There can be an arbitrary # of such [] blocks nested one inside
    > the other.


    This is a FAQ (perlfaq6): "How do I find matching/nesting anything?"
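
    One of the routes that FAQ entry points to is the Text::Balanced
    module. A minimal sketch, assuming the input from the original post
    and assuming that everything before the first '[' can be skipped as a
    prefix (the variable names are made up for illustration):

```perl
use strict;
use warnings;
use Text::Balanced qw(extract_bracketed);

my $in = "abc [ def .*\n[ .* ]\n[ .*\n[ .* ]\n]\nuvw ] xyz";

# Second argument: the bracket type to balance.
# Third argument: a prefix pattern to skip before the opening bracket
# (here, any run of characters that are not '[').
my ($block, $rest) = extract_bracketed($in, '[]', '[^\[]*');

if (defined $block && length $block) {
    print "block: $block\n";
    print "rest:  $rest\n";
}
else {
    print "no balanced block found\n";
}
```

    Note that this returns the whole outer block with its nested blocks
    still inside; stripping those is a separate step.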
     
    Brian McCauley, Apr 14, 2007
    #2

  3. On 14 Apr, 11:42, "Brian McCauley" <> wrote:

    > This is FAQ: "How do I find matching/nesting anything?"


    Applying the suggestions given there:

    use strict;
    use warnings;

    my $in = ' abc [ def .*
    [ .* ]
    [ .*
    [ .* ]
    ]
    uvw ] xyz';

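    # Declared as a package variable so the (??{ $re }) block below can refer to it
    local our $re;

    # Taken from "perldoc perlre" section dealing with (??{ })
    $re = qr{
        \[
        (?:
            (?> [^\[\]]+ )    # a run of non-bracket characters
          |
            (??{ $re })       # or a nested bracketed group
        )*
        \]
    }x;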

    # Find first top-level bracketed section
    my ($out) = $in =~ /($re)/;

    # Remove sub-brackets
    $out =~ s/(?<!\A)$re//g;

    # Normalize whitespace
    $out =~ s/\s+/ /g;

    print "$out\n";

    __END__
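
    On Perl 5.10 or later, the same pattern can also be written with a
    named recursive group instead of (??{ }). The following is a sketch
    under that version assumption, not part of the posted solution; the
    group name "block" and the variable names are made up.

```perl
use strict;
use warnings;
use 5.010;    # named groups, (?&name) recursion and possessive ++ need 5.10+

my $in = "abc [ def .*\n[ .* ]\n[ .*\n[ .* ]\n]\nuvw ] xyz";

# One balanced [...] block; (?&block) recurses into the named group.
my $block = qr{
    (?<block>
        \[
        (?: [^\[\]]++ | (?&block) )*
        \]
    )
}x;

# Find the first top-level bracketed section
my ($out) = $in =~ /($block)/;

# Remove nested blocks (everything the pattern matches except at offset 0)
$out =~ s/(?!^)$block//g;

# Normalize whitespace
$out =~ s/\s+/ /g;

print "$out\n";
```

    Using a named group rather than (?1) keeps the recursion working when
    the qr// object is interpolated into a larger pattern, where the
    group numbers shift.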
     
    Brian McCauley, Apr 14, 2007
    #3
  4. Guest

    On Apr 14, 4:28 pm, "Brian McCauley" <> wrote:
    > On 14 Apr, 11:42, "Brian McCauley" <> wrote:
    >
    > > This is FAQ: "How do I find matching/nesting anything?"

    >
    > Applying the suggestions given there
    >
    > use strict;
    > use warnings;
    >
    > my $in = ' abc [ def .*
    > [ .* ]
    > [ .*
    > [ .* ]
    > ]
    > uvw ] xyz';
    >
    > local our $re;
    >
    > # Taken from "perldoc perlre" section dealing with (??{ })
    > $re = qr{
    > \[
    > (?:
    > (?> [^\[\]]+ )
    > |
    > (??{ $re })
    > )*
    > \]
    > }x;
    >
    > # Find first top-level bracketed section
    > my ($out) = $in =~ /($re)/;
    >
    > # Remove sub-brackets
    > $out =~ s/(?<!\A)$re//g;
    >
    > # Normalize whitespace
    > $out =~ s/\s+/ /g;
    >
    > print "$out\n";
    >
    > __END__


    Can't thank you enough! It was (really){2,}\.\.\. dumb of me not to
    check the FAQ first!
     
    , Apr 14, 2007
    #4
  5. Mirco Wahab Guest

    wrote:
    > The problem:
    > I need to extract a block of text lying between -- let's say -- a
    > pair of brackets.
    > There can be an arbitrary # of such [] blocks nested one inside the
    > other.
    > I know how to mark my first '[' to start the matching process.
    > Example:
    > abc [ def .*
    > [ .* ]
    > [ .*
    > [ .* ]
    > ]
    > uvw ] xyz
    >
    > Desired output: [ def .* uvw ]


    If the problem stays as simple as your example, i.e. you know in
    advance that you only need to capture the outer part of something,
    you could simply model that outer part directly as a regexp and
    ignore the inner structure (if you don't need it).

    Example (you know you need only the "outer pair"):

    use strict;
    use warnings;

    my $text = '
    abc [ def .*
    [ .* ]
    [ .*
    [ .* ]
    ]
    uvw ] xyz ';

    my $reg;

    $reg = qr/ \A                      # start of string
        .+? (\[ \s+ \w+) \s+ (\S+)     # re-model abc [ def ~~~
        .*                             # be greedy
        \b (\w+ \s+ \]) \s+ \w+ \s+    # re-model backwards
        \z
    /xs;


    if ( $text =~ /$reg/ ) {
        print "$1 $2 $3\n";
    }


    If your real problem is more complicated,
    then you'd go with Brian's solution, IMHO.

    Regards

    Mirco
     
    Mirco Wahab, Apr 14, 2007
    #5
  6. On Apr 14, 12:28 pm, "Brian McCauley" <> wrote:

    > # Remove sub-brackets
    > $out =~ s/(?<!\A)$re//g;


    \A is zero-width, so a negative look-behind on it is the same as a
    negative look-ahead, and without the /m modifier it is equivalent
    to ^. The above is therefore more neatly written as:

    $out =~ s/(?!^)$re//g;
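
    A quick way to check that the two forms agree is to run both on the
    same string. This is only a sketch; the sample string is made up for
    illustration.

```perl
use strict;
use warnings;

# Same recursive pattern as in the earlier post
our $re;
$re = qr{ \[ (?: (?> [^\[\]]+ ) | (??{ $re }) )* \] }x;

my $sample = '[ a [ b [ c ] ] d [ e ] f ]';    # made-up test string

# Copy the sample, then strip nested blocks with each variant
(my $behind = $sample) =~ s/(?<!\A)$re//g;
(my $ahead  = $sample) =~ s/(?!^)$re//g;

print "look-behind: $behind\n";
print "look-ahead:  $ahead\n";
print "same result\n" if $behind eq $ahead;
```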
     
    Brian McCauley, Apr 19, 2007
    #6
