Help with nested pattern.

somedeveloper

Hi,

Would appreciate some hints on a 'smart' / 'nifty' solution to this
problem.

The problem:
I need to extract a block of text lying between -- let's say -- a
pair of brackets.
There can be an arbitrary # of such [] blocks nested one inside the
other.
I know how to mark my first '[' to start the matching process.

Example:
abc [ def .*
[ .* ]
[ .*
[ .* ]
]
uvw ] xyz

Desired output: [ def .* uvw ]


1. Now, I don't know if this is something Perl regexps can handle. I
read somewhere (possibly incorrectly) that nested patterns are in
general constructs that are handled via grammars (flex/bison combo)
and not regexps.

2. But since Perl provides features like match-time-code-evaluation in
regexps, I thought incrementing a count variable on each '[',
decrementing it on each ']', and printing the current pattern when the
count goes to zero would do the job... but I'm not so sure how.

3. If there's really no solution via regexps and grammars, I would
have to use the brute-force approach of processing each character in a
loop looking for ['s and ]'s. (yuck!)
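
For reference, the counting idea from points 2 and 3 can be sketched as a plain character scan. This is only a minimal illustration, not a recommendation: the helper name first_bracket_block is hypothetical, and it assumes brackets are never escaped or quoted in the input.

```perl
use strict;
use warnings;

# Hypothetical helper: return the first top-level [...] block in $text,
# or undef if no bracket pair ever balances.
sub first_bracket_block {
    my ($text) = @_;
    my ($depth, $start) = (0, undef);
    for my $i (0 .. length($text) - 1) {
        my $c = substr($text, $i, 1);
        if ($c eq '[') {
            $start = $i if $depth == 0;   # remember the outermost '['
            $depth++;
        }
        elsif ($c eq ']' && $depth > 0) {
            $depth--;
            # Depth back at zero: the outermost block is complete.
            return substr($text, $start, $i - $start + 1) if $depth == 0;
        }
    }
    return;   # unbalanced input
}

my $sample = "abc [ def .*\n[ .* ]\n[ .*\n[ .* ]\n]\nuvw ] xyz";
my $block  = first_bracket_block($sample);
print defined $block ? "$block\n" : "no balanced block found\n";
```

This returns the whole outer block including the nested ones; stripping the inner blocks would be a separate step.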

Regards...
 
Brian McCauley

This is a FAQ (see "perldoc -q nesting"): "How do I find matching/nesting anything?"
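
As far as I recall, the parsing modules that FAQ entry points to include Text::Balanced, which ships with Perl. A minimal sketch, assuming the text contains plain, unquoted brackets; the variable names and the sample string are just for illustration:

```perl
use strict;
use warnings;
use Text::Balanced qw(extract_bracketed);

my $in = "abc [ def .*\n[ .* ]\n[ .*\n[ .* ]\n]\nuvw ] xyz";

# The third argument is a prefix pattern to skip before the opening
# bracket; here it skips everything that is not a '['.
my ($block, $rest) = extract_bracketed($in, '[]', qr/[^\[]*/);

print defined $block ? "$block\n" : "no balanced block found\n";
```

In list context the first element is the extracted block (nested blocks included) and the second is the remaining text.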
 
Brian McCauley

This is FAQ: "How do I find matching/nesting anything?"

Applying the suggestions given there:

use strict;
use warnings;

my $in = ' abc [ def .*
[ .* ]
[ .*
[ .* ]
]
uvw ] xyz';

local our $re;

# Taken from "perldoc perlre" section dealing with (??{ })
$re = qr{
    \[
    (?:
        (?> [^\[\]]+ )    # run of non-bracket characters, no backtracking
      |
        (??{ $re })       # recurse: a nested bracketed block
    )*
    \]
}x;

# Find first top-level bracketed section
my ($out) = $in =~ /($re)/;

# Remove sub-brackets
$out =~ s/(?<!\A)$re//g;

# Normalize whitespace
$out =~ s/\s+/ /g;

print "$out\n";

__END__
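
On Perl 5.10 or later, the same idea can probably be written with a named recursive group instead of (??{ }), which avoids the package variable. This is a sketch under that version assumption; the names $balanced, block and outer_block are made up for the example.

```perl
use strict;
use warnings;
use 5.010;   # named captures and (?&name) recursion need Perl 5.10+

# One balanced [...] block; the named group recurses into itself
# for every nested block.
my $balanced = qr{
    (?<block>
        \[
        (?:
            [^\[\]]++       # run of non-bracket characters (possessive)
          |
            (?&block)       # a nested bracketed block
        )*
        \]
    )
}x;

# Hypothetical helper: first top-level block with the nested blocks
# removed and whitespace collapsed, or undef if nothing matches.
sub outer_block {
    my ($text) = @_;
    my ($out) = $text =~ /($balanced)/ or return;
    $out =~ s/(?!^)$balanced//g;   # drop every block except the outer one
    $out =~ s/\s+/ /g;             # normalize whitespace
    return $out;
}

my $in = "abc [ def .*\n[ .* ]\n[ .*\n[ .* ]\n]\nuvw ] xyz";
print outer_block($in), "\n";      # prints "[ def .* uvw ]"
```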
 
somedeveloper

Can't thank you enough! It was (really){2,}\.\.\. dumb on my part
not to check the FAQ first!
 
Mirco Wahab

If the problem stays as simple as your example,
that is, you know in advance that you only need
the outer part, you could simply model the outer
layout directly as a regexp and ignore the inner
structure (if you don't need it).

Example (assuming only the outer pair is needed):

use strict;
use warnings;

my $text = '
abc [ def .*
[ .* ]
[ .*
[ .* ]
]
uvw ] xyz ';

my $reg = qr/ \A                     # start of string
    .+? (\[ \s+ \w+) \s+ (\S+)      # opening part: abc [ def .*
    .*                               # greedy: skip all nested content
    \b (\w+ \s+ \]) \s+ \w+ \s+     # closing part: uvw ] xyz
    \z
/xs;

if ( $text =~ $reg ) {
    print "$1 $2 $3\n";
}


If your real problem is more complicated,
then you'd go with Brian's solution, IMHO.

Regards

Mirco
 
Brian McCauley

# Remove sub-brackets
$out =~ s/(?<!\A)$re//g;

\A is zero-width (so look-behind and look-ahead are interchangeable
here), and without the /m modifier it is equivalent to ^, so the above
is more neatly written as:

$out =~ s/(?!^)$re//g;
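
A quick way to check that equivalence, and to see where it stops holding, is to run both forms side by side. The simplified non-nested pattern $br and the sample strings are assumptions made up for this comparison:

```perl
use strict;
use warnings;

# Simplified, non-nested bracket pattern, just for this comparison.
my $br = qr/\[[^\[\]]*\]/;

my $one_line = "[x] [y] [z]";
(my $behind = $one_line) =~ s/(?<!\A)$br//g;   # look-behind form
(my $ahead  = $one_line) =~ s/(?!^)$br//g;     # look-ahead form
print $behind eq $ahead ? "same\n" : "different\n";

# With /m, ^ also matches after a newline, so the two forms diverge.
my $two_lines = "[x]\n[y] [z]";
(my $with_A     = $two_lines) =~ s/(?<!\A)$br//gm;
(my $with_caret = $two_lines) =~ s/(?!^)$br//gm;
print $with_A eq $with_caret ? "same\n" : "different\n";
```

Without /m both substitutions keep only the block at the very start of the string; with /m the ^ form also keeps a block that starts a later line.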
 
