match nested parenthesis

Discussion in 'Python' started by s99999999s2003, Jan 23, 2007.

  1. hi
    i wish to find an reg exp for matching nested parenthesis of varying
    level like
    string =
    "somewords1(words(somewords2)-(some(some)words3)somestuff)somestuff"
    and be able to evaluate the pair starting from the inner most(the
    deepest level) , ie (some)
    up till the outer most. What is a good reg exp to do this? or is simple
    string manipulations enough?
    thanks
     
    s99999999s2003, Jan 23, 2007
    #1
    1. Advertisements

  2. s99999999s2003

    Terry Reedy Guest

    | hi
    | i wish to find an reg exp for matching nested parenthesis of varying
    | level like
    | string =
    | "somewords1(words(somewords2)-(some(some)words3)somestuff)somestuff"
    | and be able to evaluate the pair starting from the inner most(the
    | deepest level) , ie (some)
    | up till the outer most. What is a good reg exp to do this? or is simple
    | string manipulations enough?
    | thanks

    The presence of indefinitely nested fence markers, like parens,
    differentiates a context-free language, generated from a context-free
    grammar, from a regular language/grammar. You cannot parse such sentences
    with a regex engine or even with an extended regex engine such as Python's.
    Use a more general parser.

    tjr
     
    Terry Reedy, Jan 23, 2007
    #2
    1. Advertisements

  3. s99999999s2003

    Neil Cerutti Guest

    It is not possible, because the set of strings containing
    balanced parenthesis is not regular. Python re's aren't *really*
    regular expressions, so you might be able to hack something up
    using extensions (I'm not sure if it's been proven that you
    can'). I believe regexes are the wrong tool for the job.
    Write a context free grammar and a recognizer for that grammar,
    instead. Python developers haven't been able to agree on any such
    module to include in the standard distribution yet,
    unfortunately. But fortunately (consequently), there are tons of
    nice parser and scanner generators available on the net. Try
    pyparsing for a start, as it's a recent, maintained package with
    excellent support on this group at the moment.

    Or just compose your own little function by hand. A stack-based
    solution (implemented using a list as a stack) should be easy enough.
     
    Neil Cerutti, Jan 23, 2007
    #3
  4. s99999999s2003

    arn.zart Guest

    Evaluation using re.sub() and recursion (3 lines of code):

    def eval_sub(expr):
    r"""Evaluate subexpressions in nested paired delimiters.

    For example,
    ‹ single left-pointing angle quotation ‹
    › single right-pointing angle quotation ›

    >>> eval_sub('‹3 * ‹3 + ‹‹1 + 1› * 2›››')
    '21'
    >>> eval_sub('‹3 * 3 + ‹‹1 + 1› * 2›››') # test
    mismatched delimiters
    '13\x9b'
    >>> '\x9b' == '›' # encoding ISO-8859-1
    True
    >>> eval_sub('3 * ‹‹1 + 1› * 2 + 3›') # test absence of
    outer delimiters
    '3 * 7'
    """
    val, n = re.subn("‹([^›‹]+)›", lambda m:
    str(eval(m.group(1))), expr)
    if n == 0:
    return val

    return eval_sub(val)
    #end

    This is just a proof of concept.
     
    arn.zart, Jan 25, 2007
    #4
    1. Advertisements

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments (here). After that, you can post your question and our members will help you out.