using regular expressions to analyze Lisp code

Discussion in 'Python' started by Kelie, Oct 4, 2007.

  1. Kelie

    Kelie Guest


    I've spent a couple of hours trying to figure out the correct regular
    expression to catch a VisualLisp (it is for AutoCAD and has a syntax
    similar to Common Lisp) function body. VisualLisp is case-insensitive.
    Any line beginning with ";" is a comment (there can be space(s) before
    the ";").

    Here is an example of a VisualLisp function:

    (defun get_obj_app_names (obj / rv)
      (foreach app (get_registered_apps (vla-get-document obj))
        (if (get_xdata obj app)
          (setq rv (cons app rv))))
      (if rv
        ;;"This line is comment (comment)"
        ;;) This line is also comment
        (acad_strlsort rv)))

    For a function named foo, it is easy to find the beginning part,
    "(defun foo", but it is hard to find the ")" at the end of the code
    block. If eventually I can't come up with a solution using a regular
    expression only, what I was thinking is: after finding the beginning
    part, which is "(defun foo" in this case, I can count the parentheses,
    ignoring anything inside "" and any comment line, until I find the
    closing ")".

    Not sure if I've made myself understood. Thanks for reading.
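    For what it's worth, the counting approach described above can be
    sketched in plain Python like this; `extract_defun` is a name made up
    for illustration, and it assumes ";" starts a comment that runs to the
    end of the line:

```python
import re

def extract_defun(source, name):
    """Sketch: return the text of "(defun name ...)" up to its matching
    close paren, ignoring parens inside strings and in comments."""
    # Find the start, case-insensitively (VisualLisp is case-insensitive).
    m = re.search(r'\(\s*defun\s+' + re.escape(name) + r'\b',
                  source, re.IGNORECASE)
    if m is None:
        return None
    depth = 0
    in_string = False
    i = m.start()
    while i < len(source):
        ch = source[i]
        if in_string:
            if ch == '\\':          # skip an escaped char inside a string
                i += 1
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch == ';':             # comment: ignore the rest of the line
            i = source.find('\n', i)
            if i == -1:
                break
        elif ch == '(':
            depth += 1
        elif ch == ')':
            depth -= 1
            if depth == 0:          # found the matching close paren
                return source[m.start():i + 1]
        i += 1
    return None                     # unbalanced input
```

    A quick check against the kind of input in the question: parens inside
    strings and in comment lines do not throw off the count, and the match
    stops at the defun's own closing paren rather than running into the
    next function.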

    Kelie, Oct 4, 2007

  2. Dan

    Dan Guest

    So, paren matching is a canonical context-sensitive algorithm. Now,
    many regex libraries have *some* not-purely-regular features, but I
    doubt you're going to find anything to match parens in a single regex.
    If you want to go all out, you can use a parser generator (for Python
    parser generators, see ...).
    Otherwise, you can go about it the quick-and-dirty way you describe:
    scan for matching open and close parens, and ignore things in quotes
    and comments.
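    As an aside on those "not-purely-regular" features: the stdlib `re`
    module really can't do this, but the third-party `regex` package
    supports recursive patterns via `(?R)`. A sketch, assuming `regex` is
    installed; note that it still knows nothing about Lisp strings or
    comments, so it is only a partial answer here:

```python
import regex  # third-party package; the stdlib "re" has no recursion

# (?R) recursively matches the entire pattern, so this matches one
# balanced group of parens. It does NOT skip strings or comments.
balanced = regex.compile(r'\((?:[^()]|(?R))*\)')

m = balanced.search('noise (defun foo (x) (bar x)) more noise')
print(m.group())  # the whole balanced block
```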

    Dan, Oct 4, 2007

  3. Tim Chase

    Tim Chase Guest

    > i've spent couple of hours trying to figure out the correct regular [...]

    Some people, when confronted with a problem, think
    "I know, I'll use regular expressions!"
    Now they have two problems.

    Regular expressions are a wonderful tool when the domain is
    correct. However, when your domain involves processing
    arbitrarily nested syntax, regexps are not your friend. It is
    sometimes feasible to mung them into a fixed-depth-nesting
    parser, but it's always fairly painful, and the fixed-depth is an
    annoying limitation.

    Use a parsing lib. I've tinkered a bit with PyParsing[1] which
    is fairly easy to pick up, but powerful enough that you're not
    banging your head against limitations. There are a number of
    other parsing libraries[2] with various domain-specific features
    and audiences, but I'd go browsing through them only if PyParsing
    doesn't fill the bill.
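    To illustrate, PyParsing's built-in `nestedExpr` helper matches
    arbitrarily nested parens out of the box (its default already skips
    parens inside quoted strings); a minimal sketch, assuming pyparsing is
    installed, with comment lines stripped beforehand per the rule in the
    question:

```python
from pyparsing import nestedExpr

# nestedExpr matches one arbitrarily nested parenthesized expression;
# its default ignoreExpr already skips parens inside quoted strings.
parser = nestedExpr('(', ')')

source = '''(defun get_obj_app_names (obj / rv)
  ;; comment with an unbalanced ) paren
  (if rv (acad_strlsort rv)))'''

# Strip comment lines first, per the thread: a line whose first
# non-space character is ";" is a comment.
clean = '\n'.join(line for line in source.splitlines()
                  if not line.lstrip().startswith(';'))

# The result is nested Python lists mirroring the Lisp structure.
print(parser.parseString(clean).asList())
```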

    You don't detail what you want to do with the content or how
    pathological the input can be, but you might be able to get away
    with just skimming through the input and counting open-parens and
    close-parens, stopping when they've been balanced and skipping
    comment lines.


    Tim Chase, Oct 4, 2007
  4. Kelie

    Kelie Guest

    Thanks Tim. Following your and Dan's advice I went looking and picked
    up pyparsing after briefly reading the descriptions for a couple of
    packages. Now that you've recommended it, it seems I made a good choice.

    btw, the content found will be copied to a new text file.
    Kelie, Oct 4, 2007
  5. Kelie

    Kelie Guest

    Dan, thanks for suggesting parser generators.
    Kelie, Oct 4, 2007
