Recursive regular expression (or alternative)

André Hänsel · Apr 12, 2006

Hi,

I'm looking for a pattern to parse the following subject:

blabla(cond1)?{abc}:{def}blabla(cond2)?{qwe(cond3)?{poi}:{iop}}

Probably I can't do that by executing a regexp _once_, so I need some
kind of recursive call to get those two lists (e.g. in an array):
array(
0 => array("cond1","abc","def"),
1 => array("cond2","qwe(cond3)?{poi}:{iop}","")
)
and for the second one then I need the array:
array(
0 => array("cond3","poi","iop")
)

Or in words: In a configuration file I have _nested_ conditional blocks
in the format (condition)?{content_when_true}:{content_when_false}
while the false content ist optional.

I built the following pattern, but thats just a bad guess:
/$([a-zA-Z0-9_ &|]+)$\?\{(.*)\}(\:\{(.*)\})?/
It matches both blocks when it's greedy and ignores the nested block
when it's ungreedy.

If you have any completly different solution I'll be glad to hear it,
too.

Best regards,
André

Tad McClellan · Apr 13, 2006

[ Followup set to a newsgroup that I read ]

André Hänsel said:
In a configuration file I have _nested_ conditional blocks

If you have any completly different solution I'll be glad to hear it,
too.

perldoc -q nest

How do I find matching/nesting anything?

Anno Siegel · Apr 14, 2006

Tad McClellan said:
[ Followup set to a newsgroup that I read ]

André Hänsel said:

In a configuration file I have _nested_ conditional blocks

Click to expand...

If you have any completly different solution I'll be glad to hear it,
too.

Click to expand...

perldoc -q nest

How do I find matching/nesting anything?

Ah, right... I knew there was a FAQ.

Another place to look at is the explanation of "(??{ ... })" in
'perldoc perlre'.

I believe what you really want when you think you want a recursive regex
is a parser (specifically, a recursive-descent parser). That is most
conveniently built with Parse::RecDescent (a standard module).

It is possible to build non-trivial parsers using regex recursion through
the "(??{ ... })" construct. I have appended code that shows this for a
simple set of arithmetic expressions. Debugging this kind of code is
hard. One of the most frequent parsing errors is deep recursion, which
gives you a segfault, no more, no less. It is also not immediately clear
how to extend the parser to build a parse tree or some other result as it
goes along.

Writing the same thing in Parse::RecDescent would be shorter, clearer
(to readers of the Parse::RecDescent language), give you debugging support,
and laid-out methods for building a result while parsing.

Anno

#!/usr/local/bin/perl
use strict; use warnings; $| = 1;

# Build a regex ($expr) that parses arithmetic expressions consisting of
# unsigned numbers, '+', '*', and parentheses.

use re 'eval';

# terminal tokens
my $number = qr/ \s* \d+ \s*/x;
my $plus = qr/ \s* \+ \s* /x;
my $times = qr/ \s* \* \s* /x;
my $open = qr/ \s* $ \s* /x;
my $close = qr/ \s* $ \s* /x;

# non-terminals
my ( $factor, $term, $expr);

$factor = qr/ # a factor of a product is
$number # just a number
| # or
$open (??{ $expr }) $close # any expression in parentheses
/x;

$term = qr/ # a term of a sum is
(??{ $factor }) # a factor
(?: # optionally followed by
$times (??{ $term }) # more factors separated by '*'
)?
/x;

$expr = qr/ # an expression is
$term # a term
(?: # optionally followed by
$plus (??{ $expr }) # more terms separated by '+'
)?
/x;

# see how it works:
for (
'xyz', # no match
'1',
'(((1)))',
'(((1())))', # only '1' matches
'1 + 2',
'1+2+3 + 4 + 5+6 + 7',
'1+(2+(3+(4+5)+6)+7)',

'1*2',
'1*(2+3)',
'1 + 2*(3+4)*(5+6)',
) {
if ( /($expr)/ ) {
print "matched $1 in '$_'\n";
} else {
print "no match in '$_'\n";
}
}

regular expression	6	Nov 18, 2007
anybody help me	1	Feb 10, 2006
comp.lang.c Answers to Frequently Asked Questions (FAQ List)	1	Feb 1, 2004

Recursive regular expression (or alternative)

André Hänsel

Tad McClellan

Anno Siegel

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads