Recursive regular expression (or alternative)

Discussion in 'Perl Misc' started by André Hänsel, Apr 12, 2006.

  1. Hi,

    I'm looking for a pattern to parse the following subject:

    blabla(cond1)?{abc}:{def}blabla(cond2)?{qwe(cond3)?{poi}:{iop}}

    Probably I can't do that by executing a regexp _once_, so I need some
    kind of recursive call to get those two lists (e.g. in an array):
    array(
    0 => array("cond1","abc","def"),
    1 => array("cond2","qwe(cond3)?{poi}:{iop}","")
    )
    and for the second one then I need the array:
    array(
    0 => array("cond3","poi","iop")
    )

    Or in words: In a configuration file I have _nested_ conditional blocks
    in the format (condition)?{content_when_true}:{content_when_false}
    while the false content ist optional.

    I built the following pattern, but thats just a bad guess:
    /\(([a-zA-Z0-9_ &|]+)\)\?\{(.*)\}(\:\{(.*)\})?/
    It matches both blocks when it's greedy and ignores the nested block
    when it's ungreedy.

    If you have any completly different solution I'll be glad to hear it,
    too. :)

    Best regards,
    André
     
    André Hänsel, Apr 12, 2006
    #1
    1. Advertising

  2. [ Followup set to a newsgroup that I read ]


    André Hänsel <> wrote:

    > In a configuration file I have _nested_ conditional blocks


    > If you have any completly different solution I'll be glad to hear it,
    > too. :)



    perldoc -q nest

    How do I find matching/nesting anything?


    --
    Tad McClellan SGML consulting
    Perl programming
    Fort Worth, Texas
     
    Tad McClellan, Apr 13, 2006
    #2
    1. Advertising

  3. André Hänsel

    Anno Siegel Guest

    Tad McClellan <> wrote in comp.lang.perl.misc:
    >
    > [ Followup set to a newsgroup that I read ]
    >
    >
    > André Hänsel <> wrote:
    >
    > > In a configuration file I have _nested_ conditional blocks

    >
    > > If you have any completly different solution I'll be glad to hear it,
    > > too. :)

    >
    >
    > perldoc -q nest
    >
    > How do I find matching/nesting anything?


    Ah, right... I knew there was a FAQ.

    Another place to look at is the explanation of "(??{ ... })" in
    'perldoc perlre'.

    I believe what you really want when you think you want a recursive regex
    is a parser (specifically, a recursive-descent parser). That is most
    conveniently built with Parse::RecDescent (a standard module).

    It is possible to build non-trivial parsers using regex recursion through
    the "(??{ ... })" construct. I have appended code that shows this for a
    simple set of arithmetic expressions. Debugging this kind of code is
    hard. One of the most frequent parsing errors is deep recursion, which
    gives you a segfault, no more, no less. It is also not immediately clear
    how to extend the parser to build a parse tree or some other result as it
    goes along.

    Writing the same thing in Parse::RecDescent would be shorter, clearer
    (to readers of the Parse::RecDescent language), give you debugging support,
    and laid-out methods for building a result while parsing.

    Anno


    #!/usr/local/bin/perl
    use strict; use warnings; $| = 1;

    # Build a regex ($expr) that parses arithmetic expressions consisting of
    # unsigned numbers, '+', '*', and parentheses.

    use re 'eval';

    # terminal tokens
    my $number = qr/ \s* \d+ \s*/x;
    my $plus = qr/ \s* \+ \s* /x;
    my $times = qr/ \s* \* \s* /x;
    my $open = qr/ \s* \( \s* /x;
    my $close = qr/ \s* \) \s* /x;

    # non-terminals
    my ( $factor, $term, $expr);

    $factor = qr/ # a factor of a product is
    $number # just a number
    | # or
    $open (??{ $expr }) $close # any expression in parentheses
    /x;

    $term = qr/ # a term of a sum is
    (??{ $factor }) # a factor
    (?: # optionally followed by
    $times (??{ $term }) # more factors separated by '*'
    )?
    /x;

    $expr = qr/ # an expression is
    $term # a term
    (?: # optionally followed by
    $plus (??{ $expr }) # more terms separated by '+'
    )?
    /x;


    # see how it works:
    for (
    'xyz', # no match
    '1',
    '(((1)))',
    '(((1())))', # only '1' matches
    '1 + 2',
    '1+2+3 + 4 + 5+6 + 7',
    '1+(2+(3+(4+5)+6)+7)',

    '1*2',
    '1*(2+3)',
    '1 + 2*(3+4)*(5+6)',
    ) {
    if ( /($expr)/ ) {
    print "matched $1 in '$_'\n";
    } else {
    print "no match in '$_'\n";
    }
    }
    --
    If you want to post a followup via groups.google.com, don't use
    the broken "Reply" link at the bottom of the article. Click on
    "show options" at the top of the article, then click on the
    "Reply" at the bottom of the article headers.
     
    Anno Siegel, Apr 14, 2006
    #3
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. VSK
    Replies:
    2
    Views:
    2,353
  2. =?iso-8859-1?B?bW9vcJk=?=

    Matching abitrary expression in a regular expression

    =?iso-8859-1?B?bW9vcJk=?=, Dec 1, 2005, in forum: Java
    Replies:
    8
    Views:
    872
    Alan Moore
    Dec 2, 2005
  3. lisong
    Replies:
    6
    Views:
    301
    lisong
    Nov 26, 2007
  4. n00m
    Replies:
    12
    Views:
    1,133
  5. vamsi
    Replies:
    21
    Views:
    2,135
    Keith Thompson
    Mar 9, 2009
Loading...

Share This Page