Recursive regular expression (or alternative)

A

André Hänsel

Hi,

I'm looking for a pattern to parse the following subject:

blabla(cond1)?{abc}:{def}blabla(cond2)?{qwe(cond3)?{poi}:{iop}}

Probably I can't do that by executing a regexp _once_, so I need some
kind of recursive call to get those two lists (e.g. in an array):
array(
0 => array("cond1","abc","def"),
1 => array("cond2","qwe(cond3)?{poi}:{iop}","")
)
and for the second one then I need the array:
array(
0 => array("cond3","poi","iop")
)

Or in words: In a configuration file I have _nested_ conditional blocks
in the format (condition)?{content_when_true}:{content_when_false}
while the false content ist optional.

I built the following pattern, but thats just a bad guess:
/\(([a-zA-Z0-9_ &|]+)\)\?\{(.*)\}(\:\{(.*)\})?/
It matches both blocks when it's greedy and ignores the nested block
when it's ungreedy.

If you have any completly different solution I'll be glad to hear it,
too. :)

Best regards,
André
 
T

Tad McClellan

[ Followup set to a newsgroup that I read ]


André Hänsel said:
In a configuration file I have _nested_ conditional blocks
If you have any completly different solution I'll be glad to hear it,
too. :)


perldoc -q nest

How do I find matching/nesting anything?
 
A

Anno Siegel

Tad McClellan said:
[ Followup set to a newsgroup that I read ]


André Hänsel said:
In a configuration file I have _nested_ conditional blocks
If you have any completly different solution I'll be glad to hear it,
too. :)


perldoc -q nest

How do I find matching/nesting anything?

Ah, right... I knew there was a FAQ.

Another place to look at is the explanation of "(??{ ... })" in
'perldoc perlre'.

I believe what you really want when you think you want a recursive regex
is a parser (specifically, a recursive-descent parser). That is most
conveniently built with Parse::RecDescent (a standard module).

It is possible to build non-trivial parsers using regex recursion through
the "(??{ ... })" construct. I have appended code that shows this for a
simple set of arithmetic expressions. Debugging this kind of code is
hard. One of the most frequent parsing errors is deep recursion, which
gives you a segfault, no more, no less. It is also not immediately clear
how to extend the parser to build a parse tree or some other result as it
goes along.

Writing the same thing in Parse::RecDescent would be shorter, clearer
(to readers of the Parse::RecDescent language), give you debugging support,
and laid-out methods for building a result while parsing.

Anno


#!/usr/local/bin/perl
use strict; use warnings; $| = 1;

# Build a regex ($expr) that parses arithmetic expressions consisting of
# unsigned numbers, '+', '*', and parentheses.

use re 'eval';

# terminal tokens
my $number = qr/ \s* \d+ \s*/x;
my $plus = qr/ \s* \+ \s* /x;
my $times = qr/ \s* \* \s* /x;
my $open = qr/ \s* \( \s* /x;
my $close = qr/ \s* \) \s* /x;

# non-terminals
my ( $factor, $term, $expr);

$factor = qr/ # a factor of a product is
$number # just a number
| # or
$open (??{ $expr }) $close # any expression in parentheses
/x;

$term = qr/ # a term of a sum is
(??{ $factor }) # a factor
(?: # optionally followed by
$times (??{ $term }) # more factors separated by '*'
)?
/x;

$expr = qr/ # an expression is
$term # a term
(?: # optionally followed by
$plus (??{ $expr }) # more terms separated by '+'
)?
/x;


# see how it works:
for (
'xyz', # no match
'1',
'(((1)))',
'(((1())))', # only '1' matches
'1 + 2',
'1+2+3 + 4 + 5+6 + 7',
'1+(2+(3+(4+5)+6)+7)',

'1*2',
'1*(2+3)',
'1 + 2*(3+4)*(5+6)',
) {
if ( /($expr)/ ) {
print "matched $1 in '$_'\n";
} else {
print "no match in '$_'\n";
}
}
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Similar Threads


Members online

No members online now.

Forum statistics

Threads
473,755
Messages
2,569,536
Members
45,013
Latest member
KatriceSwa

Latest Threads

Top