Using regular expressions to analyze Lisp code

Discussion in 'Python' started by Kelie, Oct 4, 2007.

  1. Kelie

    Kelie Guest

    hello,

    I've spent a couple of hours trying to figure out the correct regular
    expression to catch a VisualLisp function body (VisualLisp is for
    AutoCAD and has a syntax similar to Common Lisp). VisualLisp is
    case-insensitive. Any line beginning with ";" is a comment (there can
    be spaces before the ";").

    Here is an example of a VisualLisp function:

    (defun get_obj_app_names (obj / rv)
      (foreach app (get_registered_apps (vla-get-document obj))
        (if (get_xdata obj app)
          (setq rv (cons app rv))
        )
      )
      (if rv
        ;;"This line is comment (comment)"
        ;;) This line is also comment
        (acad_strlsort rv)
        nil
      )
    )

    For a function named foo, it is easy to find the beginning of the
    function, "(defun foo", but it is hard to find the ")" at the end of
    the code block. If I can't come up with a solution using a regular
    expression alone, my plan is this: after finding the beginning,
    "(defun foo" in this case, count the parentheses, ignoring anything
    inside "" and any comment line, until I find the closing ")".
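
For reference, the "easy" half described above (locating the start of a named function) might look like the following in Python. This is only a sketch: the helper name `find_defun_start` is made up for illustration, and it does not check whether the match sits inside a comment or a string.

```python
import re

def find_defun_start(source, name):
    """Return the index of the '(' that opens `(defun name ...`, or -1."""
    # VisualLisp is case-insensitive, so match with re.IGNORECASE.
    # The lookahead stops `foo` from also matching `foobar` or `foo-bar`.
    pattern = re.compile(
        r"\(\s*defun\s+" + re.escape(name) + r"(?=[\s()])",
        re.IGNORECASE,
    )
    match = pattern.search(source)
    return match.start() if match else -1
```

The returned index points at the opening parenthesis, which is where paren counting would begin.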

    Not sure if I've made myself understood. Thanks for reading.

    kelie
    Kelie, Oct 4, 2007
    #1

  2. Dan

    Dan Guest

    So, matching parens is the canonical example of a context-free (that
    is, not regular) problem. Now, many regex libraries have *some*
    not-purely-regular features, but I doubt you're going to find anything
    to match parens in a single regex. If you want to go all out, you can
    use a parser generator (for Python parser generators, see
    http://python.fyxm.net/topics/parsing.html). Otherwise, you can go
    about it the quick-and-dirty way you describe: scan for matching open
    and close parens, and ignore things in quotes and comments.
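
A minimal sketch of that quick-and-dirty scan, in Python. It assumes that ";" outside a string starts a comment running to the end of the line, and that a backslash inside a string escapes the next character; both are assumptions about the dialect, and any block-comment syntax it may have is not handled. The function name `find_matching_paren` is invented for this example.

```python
def find_matching_paren(source, start):
    """Return the index of the ')' closing the '(' at `start`, or -1."""
    if source[start] != "(":
        raise ValueError("start must point at an opening parenthesis")
    depth = 0
    in_string = False
    i = start
    while i < len(source):
        ch = source[i]
        if in_string:
            if ch == "\\":
                i += 1                  # assumed escape: skip next character
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch == ";":
            # Assumed line comment: jump to the end of the current line.
            newline = source.find("\n", i)
            if newline == -1:
                return -1               # comment runs to end of input
            i = newline
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth == 0:
                return i
        i += 1
    return -1                           # unbalanced input
```

Given the index `start` of the "(" in "(defun foo", the function body would be `source[start:end + 1]`, where `end` is the returned index.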

    -Dan
    Dan, Oct 4, 2007
    #2

  3. Tim Chase

    Tim Chase Guest

    > I've spent a couple of hours trying to figure out the correct regular
    > expression to catch a VisualLisp

    [snipped]
    > "(defun foo", but it is hard to find the ")" at the end of
    > the code block. If I can't come up with a solution using a regular
    > expression alone, my plan is this: after finding the beginning,
    > "(defun foo" in this case, count the parentheses, ignoring anything
    > inside "" and any comment line, until I find the closing ")".



    """
    Some people, when confronted with a problem, think
    "I know, I'll use regular expressions!"
    Now they have two problems.
    """


    Regular expressions are a wonderful tool when the domain is
    correct. However, when your domain involves processing
    arbitrarily nested syntax, regexps are not your friend. It is
    sometimes feasible to mung them into a fixed-depth-nesting
    parser, but it's always fairly painful, and the fixed-depth is an
    annoying limitation.
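
To make the fixed-depth limitation concrete, here is a rough Python sketch that builds such a pattern for a chosen maximum depth. The helper name `nested_paren_pattern` is hypothetical, and the pattern knows nothing about strings or comments; it only shows how the regex has to grow with every extra level of nesting.

```python
import re

def nested_paren_pattern(max_depth):
    """Build a regex matching one parenthesized group nested at most
    `max_depth` levels deep (max_depth >= 1)."""
    inner = r"[^()]*"                   # innermost level: no parens allowed
    for _ in range(max_depth - 1):
        # Each extra level allows plain characters or one more nested group.
        inner = r"(?:[^()]|\(" + inner + r"\))*"
    return re.compile(r"\(" + inner + r"\)")

two_levels = nested_paren_pattern(2)
print(bool(two_levels.fullmatch("(a (b) c)")))      # within the depth limit
print(bool(two_levels.fullmatch("(a (b (c)))")))    # one level too deep
```

Any input nested deeper than the depth you picked in advance simply fails to match, which is the annoying part.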

    Use a parsing lib. I've tinkered a bit with PyParsing[1] which
    is fairly easy to pick up, but powerful enough that you're not
    banging your head against limitations. There are a number of
    other parsing libraries[2] with various domain-specific features
    and audiences, but I'd go browsing through them only if PyParsing
    doesn't fill the bill.

    You don't detail what you want to do with the content or how
    pathological the input can be, but you might be able to get away
    with just skimming through the input and counting open-parens and
    close-parens, stopping when they've been balanced, skipping lines
    with comments.

    -tkc

    [1] http://pyparsing.wikispaces.com/
    [2] http://nedbatchelder.com/text/python-parsers.html
    Tim Chase, Oct 4, 2007
    #3
  4. Kelie

    Kelie Guest

    On Oct 4, 7:50 am, Tim Chase <> wrote:
    > Use a parsing lib. I've tinkered a bit with PyParsing[1] which
    > is fairly easy to pick up, but powerful enough that you're not
    > banging your head against limitations. There are a number of
    > other parsing libraries[2] with various domain-specific features
    > and audiences, but I'd go browsing through them only if PyParsing
    > doesn't fill the bill.
    >
    > You don't detail what you want to do with the content or how
    > pathological the input can be, but you might be able to get away
    > with just skimming through the input and counting open-parens and
    > close-parens, stopping when they've been balanced, skipping lines
    > with comments.


    Thanks, Tim. Following your and Dan's advice, I visited
    http://python.fyxm.net/topics/parsing.html and picked pyparsing
    after briefly reading the descriptions of a couple of packages. Now
    that you've recommended it, it seems I made a good choice.

    By the way, the content found will be copied to a new text file.
    Kelie, Oct 4, 2007
    #4
  5. Kelie

    Kelie Guest

    On Oct 4, 7:28 am, Dan <> wrote:
    > So, matching parens is the canonical example of a context-free (that
    > is, not regular) problem. Now, many regex libraries have *some*
    > not-purely-regular features, but I doubt you're going to find anything
    > to match parens in a single regex. If you want to go all out, you can
    > use a parser generator (for Python parser generators, see
    > http://python.fyxm.net/topics/parsing.html). Otherwise, you can go
    > about it the quick-and-dirty way you describe: scan for matching open
    > and close parens, and ignore things in quotes and comments.
    >
    > -Dan


    Dan, thanks for suggesting parser generators.
    Kelie, Oct 4, 2007
    #5