replace c-style comments with newlines (regexp)

Discussion in 'Python' started by lex __, Dec 21, 2007.

  1. lex __

    lex __ Guest

    I'm trying to use a regexp to replace multi-line C-style comments (like /* this \n */) with \n (newlines).
    I tried something like re.sub('/\*(.*)\*/', '\n', file)
    but it doesn't work for multiple lines.

    Besides that, I want to keep all newlines as they were in the original file, so I can still use the original line numbers as a reference later.
    I know that will complicate things a bit more, so this is a bit less important.

    Background: I'm trying to create an 'intelligent' source-code security analysis tool for C/C++, Python and PHP files, but filtering the comments seems to be the biggest problem. :(

    So, if you have an answer to this, please let me know how to do it!

    thanks in advance,
    - Alex

    lex __, Dec 21, 2007

  2. Steven D'Aprano

    Steven D'Aprano Guest

    Regexes won't cross line boundaries unless you make them multiline with
    a flag, as in the snippet below.

    Also, I'm no expert on regexes, but it looks to me that your regex is
    greedy. I think you need the non-greedy version, which by memory (and
    completely untested) is something like this:

    rx = re.compile(r'/\*(.*?)\*/', re.MULTILINE)
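
    The greedy versus non-greedy difference can be seen on a single line. This is only a small sketch; the sample line and the variable names are made up for illustration. Note that a C comment closes with */, so both patterns end in \*/.

```python
import re

# Hypothetical one-line sample with two comments on it.
line = "a = 1; /* first */ b = 2; /* second */ c = 3;"

greedy = re.compile(r"/\*.*\*/")   # ".*" runs on to the LAST "*/"
lazy = re.compile(r"/\*.*?\*/")    # ".*?" stops at the FIRST "*/"

greedy_result = greedy.sub("", line)
lazy_result = lazy.sub("", line)

print(repr(greedy_result))  # the code between the two comments is lost
print(repr(lazy_result))    # only the comments themselves are removed
```

    With the greedy pattern, everything from the first /* to the last */ is treated as one comment, so the statement between the two comments disappears as well.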

    Have you considered what happens when your C code includes a string
    literal containing '/*'?
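
    To make the string-literal problem concrete, here is a rough sketch. The sample source and the names naive, token and keep_strings are assumptions for illustration. One common workaround, shown here, is to let the same regex also match double-quoted string literals and put them back unchanged; it still ignores character literals, // comments and line continuations, so it is not a complete C lexer.

```python
import re

# Hypothetical C snippet: the string literal contains "/*" but is not a comment.
source = 'char *s = "/* not a comment */"; /* real\ncomment */ int x;'

# Naive approach: any /* ... */ is treated as a comment.
naive = re.compile(r"/\*.*?\*/", re.DOTALL)

# Alternation: try a double-quoted string literal first, then a comment.
token = re.compile(r'"(?:\\.|[^"\\])*"|/\*.*?\*/', re.DOTALL)

def keep_strings(match):
    text = match.group(0)
    if text.startswith('"'):
        return text                     # string literal: leave untouched
    return "\n" * text.count("\n")      # comment: keep only its newlines

naive_result = naive.sub("", source)
careful_result = token.sub(keep_strings, source)

print(repr(naive_result))    # the string literal's contents are destroyed
print(repr(careful_result))  # the string literal survives, the comment is gone
```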

    "Some people, when confronted with a problem, think “I know, I’ll use
    regular expressions.” Now they have two problems."
    -- Jamie Zawinski, in comp.lang.emacs
    Steven D'Aprano, Dec 21, 2007

  3. Peter Otten

    Peter Otten Guest

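    re.MULTILINE only affects the behaviour of ^ and $; the relevant flag is re.DOTALL, which lets "." match newlines as well. With it, a non-greedy pattern finds the comment bodies, including one that spans two lines:

    ['a', 'b\nb', 'c/*c']

    A substitution function that returns "\n" times the number of newlines in each match then removes the comments while keeping the line count:

    'A BB \n CCC '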
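
    A session along these lines produces those two results. This is a sketch under assumptions: the sample string and the names comment and newlines_only are invented here; only the outputs appear in the post.

```python
import re

# DOTALL lets "." match newlines, so a comment may span several lines.
comment = re.compile(r"/\*(.*?)\*/", re.DOTALL)

# Hypothetical sample text, chosen only for illustration.
sample = "A/*a*/ BB /*b\nb*/ CCC /*c/*c*/"

# findall returns the captured group, i.e. the body of each comment.
bodies = comment.findall(sample)

def newlines_only(match):
    # Replace each comment with as many newlines as it contained,
    # so the line numbers of the remaining text stay unchanged.
    return "\n" * match.group(0).count("\n")

stripped = comment.sub(newlines_only, sample)

print(bodies)
print(repr(stripped))
```

    Passing a function as the replacement to sub() is what makes it possible to emit a different number of newlines for each comment.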

    Peter Otten, Dec 21, 2007
  4. Neil Cerutti

    Neil Cerutti Guest

    There are free C lexers and parsers available (e.g., gcc). I
    recommend them to you. Gluing a real C parser into your Python
    code might be easier than writing one. Not that it's impossible
    to discover C comments with your own special-purpose, simple
    parser (see Exercise 1-23 in K&R _The C Programming Language 2nd
    Edition_), but it's not remotely doable with a regex.
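
    For what it's worth, a special-purpose scanner of that kind might look roughly like the following. It is a minimal sketch in the spirit of the K&R exercise, not a solution taken from the book and not a real C lexer: the function name strip_c_comments is made up, and // comments, trigraphs and line continuations are deliberately ignored.

```python
def strip_c_comments(source):
    """Drop /* ... */ comments but keep every newline in place."""
    out = []
    state = "code"  # one of: "code", "comment", "string", "char"
    i = 0
    while i < len(source):
        ch = source[i]
        nxt = source[i + 1] if i + 1 < len(source) else ""
        if state == "code":
            if ch == "/" and nxt == "*":
                state = "comment"
                i += 2
                continue
            if ch == '"':
                state = "string"
            elif ch == "'":
                state = "char"
            out.append(ch)
        elif state == "comment":
            if ch == "*" and nxt == "/":
                state = "code"
                i += 2
                continue
            if ch == "\n":
                out.append(ch)  # preserve line numbering
        else:  # inside a string or character literal
            out.append(ch)
            if ch == "\\" and nxt:
                out.append(nxt)  # copy the escaped character verbatim
                i += 2
                continue
            if (state == "string" and ch == '"') or (state == "char" and ch == "'"):
                state = "code"
        i += 1
    return "".join(out)


# Hypothetical input: one multi-line comment, one string that looks like a comment.
example = 'int a; /* one\ntwo */ char *s = "/* keep */"; /* gone */\nint b;\n'
result = strip_c_comments(example)
print(result)
```

    Because the scanner tracks whether it is inside a literal, the "/* keep */" string is left alone, and the output has the same number of lines as the input.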
    Neil Cerutti, Dec 21, 2007