replace c-style comments with newlines (regexp)

Discussion in 'Python' started by lex __, Dec 21, 2007.

  1. lex __

    lex __ Guest

    I'm trying to use a regexp to replace multi-line C-style comments (like /* this /n */ ) with /n (newlines).
    I tried something like re.sub('/\*(.*)/\*' , '/n' , file)
    but it doesn't work for multiple lines.

    Besides that, I want to keep all newlines as they were in the original file, so I can still use the original line numbers (I want to use line numbers as a reference for later use).
    I know that this will complicate things a bit more, so it is a bit less important.

    Background: I'm trying to create an 'intelligent' source-code security analysis tool for C/C++, Python and PHP files, but filtering the comments seems to be the biggest problem. :(

    So, if you have an answer to this, please let me know how to do it!

    thanks in advance,
    - Alex



    lex __, Dec 21, 2007
    #1

  2. On Fri, 21 Dec 2007 00:00:47 +0000, lex __ wrote:

    > I'm trying to use a regexp to replace multi-line C-style comments (like /*
    > this /n */ ) with /n (newlines). I tried something like
    > re.sub('/\*(.*)/\*' , '/n' , file) but it doesn't work for multiple
    > lines.



    Regexes won't cross line boundaries unless you make them multiline with
    re.MULTILINE.

    Also, I'm no expert on regexes, but it looks to me that your regex is
    greedy. I think you need the non-greedy version, which by memory (and
    completely untested) is something like this:

    rx = re.compile('/\*(.*?)/\*', re.MULTILINE)


    Have you considered what happens when your C code includes a string
    literal containing '/*'?


    "Some people, when confronted with a problem, think “I know, I’ll use
    regular expressions.” Now they have two problems."
    -- Jamie Zawinski, in comp.lang.emacs



    --
    Steven.
     
    Steven D'Aprano, Dec 21, 2007
    #2

  3. Peter Otten

    Peter Otten Guest

    Steven D'Aprano wrote:

    > On Fri, 21 Dec 2007 00:00:47 +0000, lex __ wrote:
    >
    >> I'm trying to use a regexp to replace multi-line C-style comments (like /*
    >> this /n */ ) with /n (newlines). I tried something like
    >> re.sub('/\*(.*)/\*' , '/n' , file) but it doesn't work for multiple
    >> lines.


    > Regexes won't cross line boundaries unless you make them multiline with
    > re.MULTILINE.


    re.MULTILINE only affects the behaviour of ^ and $; the relevant flag is re.DOTALL, which lets . match newlines:

    > Also, I'm no expert on regexes, but it looks to me that your regex is
    > greedy. I think you need the non-greedy version, which by memory (and


    >>> import re
    >>> re.compile(r"/\*(.*?)\*/", re.DOTALL).findall("/*a*/ /*b\nb*/ /*c/*c*/")
    ['a', 'b\nb', 'c/*c']
    >>> def replace(match):
    ...     # keep one newline for every newline inside the comment
    ...     return "\n" * match.group(1).count("\n")
    ...
    >>> re.compile(r"(/\*.*?\*/)", re.DOTALL).sub(replace, "A /*a*/ BB /*b\nb*/ CCC /*c/*c*/")
    'A  BB \n CCC '
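
    A small sketch of the same idea as a reusable function, with the flag difference made visible. The names COMMENT_RE, blank_comments and sample are made up for illustration and are not from the thread; it assumes the whole file has been read into one string.

```python
import re

# Hypothetical names; the pattern itself is the one used in the session above.
COMMENT_RE = re.compile(r"/\*.*?\*/", re.DOTALL)

def blank_comments(source):
    """Replace each /* ... */ comment with as many newlines as it contained."""
    return COMMENT_RE.sub(lambda m: "\n" * m.group(0).count("\n"), source)

sample = "int a; /* one\ntwo\nthree */ int b;\nint c; /* x */\n"

# Without re.DOTALL, "." stops at "\n", so only the one-line comment is found.
# re.MULTILINE does not change that; it only affects ^ and $.
print(re.findall(r"/\*.*?\*/", sample))
print(re.findall(r"/\*.*?\*/", sample, re.MULTILINE))

# With re.DOTALL both comments are found.
print(COMMENT_RE.findall(sample))

stripped = blank_comments(sample)
print(repr(stripped))

# The line count is unchanged, so original line numbers stay valid.
print(stripped.count("\n") == sample.count("\n"))
```

    This still treats any /* ... */ as a comment, including one inside a string literal, so it is only a starting point.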

    > Have you considered what happens when your C code includes a string
    > literal containing '/*'?


    Indeed.
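
    To make the string-literal problem concrete, here is a short sketch (the variable names and the C fragments are invented for illustration). The regex cannot tell a comment from a literal that merely contains '/*':

```python
import re

COMMENT_RE = re.compile(r"/\*.*?\*/", re.DOTALL)

# Case 1: the "comment" is really the contents of a string literal.
literal = 'const char *s = "/* not a comment */";'
emptied = COMMENT_RE.sub("", literal)
print(emptied)

# Case 2: a '/*' inside a literal pairs up with the '*/' of a later,
# genuine comment, so real code in between is removed as well.
mixed = 'puts("/*"); x = 1; /* c */ y = 2;'
swallowed = COMMENT_RE.sub("", mixed)
print(swallowed)
```

    In the second case the assignment x = 1 disappears and the literal is left unterminated, which is the kind of damage a security analysis tool would not want in its input.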

    Peter
     
    Peter Otten, Dec 21, 2007
    #3
  4. Neil Cerutti

    Neil Cerutti Guest

    On 2007-12-21, lex __ wrote:
    > I'm trying to use a regexp to replace multi-line C-style comments
    > (like /* this /n */ ) with /n (newlines). I tried something
    > like re.sub('/\*(.*)/\*' , '/n' , file) but it doesn't
    > work for multiple lines.
    >
    > Besides that, I want to keep all newlines as they were in the
    > original file, so I can still use the original line numbers (I
    > want to use line numbers as a reference for later use). I know
    > that this will complicate things a bit more, so it is a bit
    > less important.
    >
    > Background: I'm trying to create an 'intelligent' source-code
    > security analysis tool for C/C++, Python and PHP files, but
    > filtering the comments seems to be the biggest problem. :(
    >
    > So, if you have an answer to this, please let me know how to
    > do it!


    There are free C lexers and parsers available (e.g., gcc). I
    recommend them to you. Gluing a real C parser into your Python
    code might be easier than writing one. Not that it's impossible
    to discover C comments with your own special-purpose, simple
    parser (see Exercise 1-23 in K&R _The C Programming Language 2nd
    Edition_), but it's not remotely doable with a regex.
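
    For reference, a special-purpose scanner of the kind mentioned above can be quite short. The following is only a sketch under stated assumptions: the function name strip_block_comments is hypothetical, it handles /* ... */ comments plus string and character literals with backslash escapes, and it ignores // comments, trigraphs and C++ raw strings.

```python
def strip_block_comments(source):
    """Remove /* ... */ comments, keeping newlines and string/char literals."""
    out = []
    i, n = 0, len(source)
    state = "code"  # "code", "comment", or the quote character of an open literal
    while i < n:
        ch = source[i]
        nxt = source[i + 1] if i + 1 < n else ""
        if state == "code":
            if ch == "/" and nxt == "*":
                state = "comment"
                i += 2
                continue
            if ch in "\"'":
                state = ch  # remember which quote closes the literal
            out.append(ch)
        elif state == "comment":
            if ch == "*" and nxt == "/":
                state = "code"
                i += 2
                continue
            if ch == "\n":
                out.append(ch)  # keep line numbers intact
        else:  # inside a string or character literal
            out.append(ch)
            if ch == "\\" and nxt:
                out.append(nxt)  # copy the escaped character untouched
                i += 2
                continue
            if ch == state:
                state = "code"
        i += 1
    return "".join(out)


src = 'puts("/*"); x = 1; /* c\nd */ y = 2;\n'
cleaned = strip_block_comments(src)
print(repr(cleaned))
```

    Unlike a C preprocessor, this drops a comment entirely instead of replacing it with a space, so a/**/b becomes ab; emit a space on the closing */ if that matters for later analysis.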

    --
    Neil Cerutti
     
    Neil Cerutti, Dec 21, 2007
    #4
