FAQ 4.23 How do I find matching/nesting anything?

Discussion in 'Perl Misc' started by PerlFAQ Server, Apr 2, 2011.

  1. This is an excerpt from the latest version of perlfaq4.pod, which
    comes with the standard Perl distribution. These postings aim to
    reduce the number of repeated questions as well as allow the community
    to review and update the answers. The latest version of the complete
    perlfaq is at http://faq.perl.org .

    --------------------------------------------------------------------

    4.23: How do I find matching/nesting anything?

    This isn't something that can be done with one plain regular
    expression, no matter how complicated. To find something between two
    single characters, a pattern like "/x([^x]*)x/" will get the
    intervening bits in $1. For multi-character delimiters, something more
    like "/alpha(.*?)omega/" would be needed. But neither of these deals
    with nested patterns. For balanced expressions using "(", "{", "[" or
    "<" as delimiters, use the CPAN module Regexp::Common, or see
    "(??{ code })" in perlre. For other cases, you'll have to write a
    parser.
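
    As a rough illustration of the three cases above, the sketch below
    applies the two flat patterns and then a self-referencing pattern
    built with "(??{ code })", along the lines of the example in perlre.
    The sample strings and the variable names ($single, $multi, $balanced
    and so on) are made up for this example.

```perl
use strict;
use warnings;

# Single-character delimiters: capture everything between two 'x' characters.
my $single = 'aaxhelloxbb';
my ($between_x) = $single =~ /x([^x]*)x/;

# Multi-character delimiters: non-greedy match between 'alpha' and 'omega'.
my $multi = 'alpha one omega alpha two omega';
my @between_words = $multi =~ /alpha(.*?)omega/g;

# Balanced parentheses via the (??{ code }) construct described in perlre.
# A package variable is used so that the pattern can refer to itself.
our $balanced;
$balanced = qr{
    \(
    (?:
        (?> [^()]+ )        # a run of non-parenthesis characters
      |
        (??{ $balanced })   # or a nested, balanced group
    )*
    \)
}x;

my ($group) = 'f(a, (b + c) * d) + g(e)' =~ /($balanced)/;

print "between x:     [$between_x]\n";
print "between words: [", join('|', @between_words), "]\n";
print "first group:   [$group]\n";
```

    Note that the flat patterns know nothing about nesting; only the
    last pattern keeps track of matching parentheses.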

    If you are serious about writing a parser, there are a number of modules
    or oddities that will make your life a lot easier. There are the CPAN
    modules "Parse::RecDescent", "Parse::Yapp", and "Text::Balanced"; and
    the "byacc" program. Starting with perl 5.8, "Text::Balanced" is part
    of the standard distribution.
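
    For a first taste of "Text::Balanced", a minimal sketch using its
    extract_bracketed function might look like the following. The input
    string is invented for the example, and the bracketed group is
    assumed to sit at the start of the text (leading whitespace is
    skipped by default).

```perl
use strict;
use warnings;
use Text::Balanced qw(extract_bracketed);

# Made-up input: the bracketed part comes first, followed by other text.
my $text = '(outer (inner) text) and the rest';

# In list context extract_bracketed returns the extracted substring
# (delimiters included), the remainder of the input, and any skipped prefix.
my ($extracted, $remainder) = extract_bracketed($text, '()');

if (defined $extracted) {
    print "extracted: [$extracted]\n";
    print "remainder: [$remainder]\n";
}
else {
    print "no balanced group found at the start of the text\n";
}
```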

    One simple destructive, inside-out approach that you might try is to
    pull out the smallest nesting parts one at a time:

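    # Without /g, each pass removes exactly one innermost BEGIN...END
    # block, so $1 is available for every block in turn.
    while (s/BEGIN((?:(?!BEGIN)(?!END).)*)END//s) {
        # do something with $1
    }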
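
    To see the inside-out idea at work, here is a small sketch that runs
    such a loop over a made-up string and collects the pieces. The
    substitution is applied without /g so that each pass removes one
    innermost block and every $1 is seen; the names $text and @pieces
    are only for illustration.

```perl
use strict;
use warnings;

# Made-up input with two levels of nesting.
my $text = 'a BEGIN b BEGIN c END d BEGIN e END f END g';

my @pieces;

# Each pass removes the leftmost BEGIN...END pair that contains no other
# BEGIN or END, so inner blocks are collected before the block around them.
while ($text =~ s/BEGIN((?:(?!BEGIN)(?!END).)*)END//s) {
    push @pieces, $1;
}

print "piece: [$_]\n" for @pieces;
print "left over: [$text]\n";
```

    Because the approach is destructive, work on a copy if the original
    string is still needed afterwards.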

    A more complicated and sneaky approach is to make Perl's regular
    expression engine do it for you. This is courtesy of Dean Inada, and
    rather has the nature of an Obfuscated Perl Contest entry, but it
    really does work:

    # $_ contains the string to parse
    # BEGIN and END are the opening and closing markers for the
    # nested text.

    @( = ('(','');
    @) = (')','');
    ($re=$_)=~s/((BEGIN)|(END)|.)/$)[!$3]\Q$1\E$([!$2]/gs;
    @$ = (eval{/$re/},$@!~/unmatched/i);
    print join("\n",@$[0..$#$]) if( $$[-1] );
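
    The trick in the code above is to turn the string itself into a
    pattern: every BEGIN gets an opening parenthesis after it, every END
    gets a closing parenthesis before it, and everything else is quoted
    literally. Matching the string against that pattern then captures
    each nested part, and unbalanced markers make the pattern fail to
    compile. A more readable sketch of the same idea, not the original
    author's code, could look like this; the function name nested_parts
    and the sample strings are invented for the example.

```perl
use strict;
use warnings;

# Returns a reference to the list of captured parts, or nothing at all
# if the BEGIN/END markers in $text are not balanced.
sub nested_parts {
    my ($text) = @_;

    # Build a pattern that mirrors the text, with one capture group
    # per BEGIN...END pair.
    my $re = '';
    for my $token ($text =~ /(BEGIN|END|.)/gs) {
        if    ($token eq 'BEGIN') { $re .= 'BEGIN('; }
        elsif ($token eq 'END')   { $re .= ')END'; }
        else                      { $re .= quotemeta $token; }
    }

    # Unbalanced parentheses make the pattern die at compile time,
    # which the eval traps.
    my @parts = eval { $text =~ /$re/ };
    return if $@;
    return \@parts;
}

my $ok  = nested_parts('x BEGIN a BEGIN b END c END y');
my $bad = nested_parts('x BEGIN a BEGIN b END y');

print "part: [$_]\n" for @$ok;
print "second string is ", (defined $bad ? "balanced" : "unbalanced"), "\n";
```

    If the text contains no markers at all, the match has no capture
    groups and the list holds only the true value returned by the match.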



    --------------------------------------------------------------------

    The perlfaq-workers, a group of volunteers, maintain the perlfaq. They
    are not necessarily experts in every domain where Perl might show up,
    so please include as much information as possible and relevant in any
    corrections. The perlfaq-workers also don't have access to every
    operating system or platform, so please include relevant details for
    corrections to examples that do not work on particular platforms.
    Working code is greatly appreciated.

    If you'd like to help maintain the perlfaq, see the details in
    perlfaq.pod.