replace c-style comments with newlines (regexp)

Discussion in 'Python' started by lex __, Dec 21, 2007.

  1. lex __

    lex __ Guest

    I'm trying to use a regexp to replace multi-line C-style comments (like /* this \n */ ) with \n (newlines).
    I tried something like re.sub('/\*(.*)\*/' , '\n' , file)
    but it doesn't work for multiple lines.

    Besides that, I want to keep all newlines as they were in the original file, so I can still use the original line numbers (I want to use line numbers as a reference later).
    I know that will complicate things a bit more, so this is a bit less important.

    Background: I'm trying to create an 'intelligent' source-code security analysis tool for C/C++, Python and PHP files, but filtering the comments seems to be the biggest problem. :(

    So, if you have an answer to this, please let me know how to do this!

    thanks in advance,
    - Alex



     
    lex __, Dec 21, 2007
    #1


  2. Steven D'Aprano

    Steven D'Aprano Guest

    Regexes won't cross line boundaries unless you make them multiline with
    re.MULTILINE.

    Also, I'm no expert on regexes, but it looks to me that your regex is
    greedy. I think you need the non-greedy version, which by memory (and
    completely untested) is something like this:

    rx = re.compile(r'/\*(.*?)\*/', re.MULTILINE)


    Have you considered what happens when your C code includes a string
    literal containing '/*'?


    "Some people, when confronted with a problem, think 'I know, I'll use
    regular expressions.' Now they have two problems."
    -- Jamie Zawinski, in comp.lang.emacs
     
    Steven D'Aprano, Dec 21, 2007
    #2
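
To illustrate the greedy versus non-greedy point and the string-literal question raised in this post, here is a minimal sketch. The sample strings are made up for illustration and are not from the thread. The closing delimiter is written `\*/`, and only single-line input is used, so no flags are involved.

```python
import re

# Assumed sample input (not from the thread): two comments on one line,
# and a string literal that happens to contain "/*".
line = 'a = 1; /* one */ b = 2; /* two */'
tricky = 'char *s = "/* not a comment"; /* real */'

greedy = re.compile(r'/\*.*\*/')    # .* runs to the LAST "*/" on the line
lazy = re.compile(r'/\*.*?\*/')     # .*? stops at the FIRST "*/"

greedy_result = greedy.sub('', line)
lazy_result = lazy.sub('', line)
tricky_result = lazy.sub('', tricky)

print(repr(greedy_result))   # 'a = 1; '
print(repr(lazy_result))     # 'a = 1;  b = 2; '
print(repr(tricky_result))   # 'char *s = "'
```

The last result shows the pitfall: the pattern treats the `/*` inside the string literal as a comment opener and removes real code along with it.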

  3. Peter Otten

    Peter Otten Guest

    re.MULTILINE affects the behaviour of ^ and $, the relevant flag is re.DOTALL:
    ['a', 'b\nb', 'c/*c']
    ...     return "\n" * match.group(1).count("\n")
    ...
    'A BB \n CCC '
    Indeed.

    Peter
     
    Peter Otten, Dec 21, 2007
    #3
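
The approach in this post (re.DOTALL plus a replacement function that returns one newline per line break inside the comment) could look roughly like the sketch below. This is a reconstruction under assumptions, not the poster's original session: the names `COMMENT`, `newlines`, `source` and `stripped` and the sample input are chosen here for illustration.

```python
import re

# Non-greedy match of a /* ... */ comment; re.DOTALL lets "." match "\n".
COMMENT = re.compile(r'/\*(.*?)\*/', re.DOTALL)

def newlines(match):
    # Emit one "\n" for every line break that was inside the comment.
    return "\n" * match.group(1).count("\n")

# Assumed sample input, chosen for illustration.
source = "int a; /* one\ntwo\nthree */ int b;\nint c; /* single */\n"
stripped = COMMENT.sub(newlines, source)

print(repr(stripped))
# The number of line breaks is unchanged, so line numbers still line up.
print(source.count("\n") == stripped.count("\n"))
```

Because `re.sub` accepts a function as the replacement, each comment can be replaced by exactly as many newlines as it spanned. The string-literal problem still applies to this pattern.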
  4. Neil Cerutti

    Neil Cerutti Guest

    There are free C lexers and parsers available (e.g., gcc). I
    recommend them to you. Gluing a real C parser into your Python
    code might be easier than writing one. Not that it's impossible
    to discover C comments with your own special-purpose, simple
    parser (see Exercise 1-23 in K&R _The C Programming Language 2nd
    Edition_), but it's not remotely doable with a regex.
     
    Neil Cerutti, Dec 21, 2007
    #4
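
For comparison, a special-purpose scanner in the spirit of K&R Exercise 1-23 might be sketched as follows. The function name `strip_c_comments` is hypothetical, and the sketch makes simplifying assumptions: it handles only `/* ... */` comments, string literals and character literals, and ignores `//` comments, trigraphs and backslash line continuations.

```python
def strip_c_comments(src):
    """Remove /* ... */ comments, keeping every newline they contained.

    Hypothetical helper: a small state machine, not a full C lexer.
    """
    out = []
    i, n = 0, len(src)
    state = "code"          # "code", "comment", or the quote character in effect
    while i < n:
        ch = src[i]
        pair = src[i:i + 2]
        if state == "code":
            if pair == "/*":
                state = "comment"
                i += 2
                continue
            if ch in "\"'":
                state = ch              # remember which quote closes the literal
            out.append(ch)
        elif state == "comment":
            if pair == "*/":
                state = "code"
                i += 2
                continue
            if ch == "\n":
                out.append(ch)          # preserve line numbering
        else:                           # inside a string or character literal
            out.append(ch)
            if ch == "\\" and i + 1 < n:
                out.append(src[i + 1])  # copy the escaped character verbatim
                i += 2
                continue
            if ch == state:
                state = "code"
        i += 1
    return "".join(out)


sample = 'char *s = "/* not a comment */"; /* real\ncomment */ int x;\n'
print(repr(strip_c_comments(sample)))
```

Unlike the regex versions, this leaves the `/*` inside the string literal alone, and the output has the same number of lines as the input.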