Regular Expression for Finding and Deleting Comments

Discussion in 'Python' started by Jeremy, Jan 4, 2011.

  1. Jeremy

    Jeremy Guest

    I am trying to write a regular expression that finds and deletes (replaces with nothing) comments in a string or file. A comment is either a line whose first non-whitespace character is a 'c', or everything from a dollar sign to the end of the line. Replacing these comments with nothing isn't too hard. The trouble is that each removed comment leaves a newline behind; in other words, the newline isn't captured by the regular expression.

    Below, I have copied a minimal example. Can someone help?

    Thanks,
    Jeremy


    import re

    text = """ c
    C - Second full line comment (first comment had no text)
    c Third full line comment
    F44:N 2 $ Inline comments start with dollar sign and go to end of line"""

    commentPattern = re.compile("""
    (^\s*?c\s*?.*?| # Comment start with c or C
    \$.*?)$\n # Comment starting with $
    """, re.VERBOSE|re.MULTILINE|re.IGNORECASE)

    found = commentPattern.finditer(text)

    print("\n\nCard:\n--------------\n%s\n------------------" %text)

    if found:
        print("\nI found the following:")
        for f in found: print(f.groups())

    else:
        print("\nNot Found")

    print("\n\nComments replaced with ''")
    replaced = commentPattern.sub('', text)
    print("--------------\n%s\n------------------" %replaced)
     
    Jeremy, Jan 4, 2011
    #1

  2. MRAB

    MRAB Guest

    On 04/01/2011 17:11, Jeremy wrote:
    > I am trying to write a regular expression that finds and deletes (replaces with nothing) comments in a string or file. A comment is either a line whose first non-whitespace character is a 'c', or everything from a dollar sign to the end of the line. Replacing these comments with nothing isn't too hard. The trouble is that each removed comment leaves a newline behind; in other words, the newline isn't captured by the regular expression.
    >
    > Below, I have copied a minimal example. Can someone help?
    >
    > Thanks,
    > Jeremy
    >
    >
    > import re
    >
    > text = """ c
    > C - Second full line comment (first comment had no text)
    > c Third full line comment
    > F44:N 2 $ Inline comments start with dollar sign and go to end of line"""
    >
    > commentPattern = re.compile("""
    > (^\s*?c\s*?.*?| # Comment start with c or C
    > \$.*?)$\n # Comment starting with $
    > """, re.VERBOSE|re.MULTILINE|re.IGNORECASE)
    >

    Part of the problem is that you're not using raw string literals or
    doubling the backslashes.
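
To illustrate why that matters, here is a minimal sketch. The names non_raw, raw and sample are made up for this example and are not from the original post. In an ordinary string literal, Python turns \n into a real newline character before the re module sees it, and under re.VERBOSE unescaped whitespace in the pattern (newlines included) is ignored. In a raw string, the regex engine receives a backslash followed by n and treats it as "match a newline".

```python
import re

flags = re.VERBOSE | re.MULTILINE

# Ordinary string: "\n" is already a newline character when re compiles it,
# and VERBOSE mode discards it as insignificant whitespace.
non_raw = re.compile("^c.*$\n", flags)

# Raw string: re receives backslash + n and interprets it as "match a newline".
raw = re.compile(r"^c.*$\n", flags)

sample = "c a comment line\ndata line\n"

print(repr(non_raw.sub("", sample)))  # '\ndata line\n' (newline left behind)
print(repr(raw.sub("", sample)))      # 'data line\n'
```

The first result shows the symptom described in the question: the comment text is removed but its newline stays.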

    Try something like this:

    commentPattern = re.compile(r"""
    (^[ \t]*c.*\n| # Comment start with c or C
    [ \t]*\$.*) # Comment starting with $
    """, re.VERBOSE|re.MULTILINE|re.IGNORECASE)

    > found = commentPattern.finditer(text)
    >
    > print("\n\nCard:\n--------------\n%s\n------------------" %text)
    >
    > if found:
    >     print("\nI found the following:")
    >     for f in found: print(f.groups())
    >
    > else:
    >     print("\nNot Found")
    >
    > print("\n\nComments replaced with ''")
    > replaced = commentPattern.sub('', text)
    > print("--------------\n%s\n------------------" %replaced)
    >
     
    MRAB, Jan 4, 2011
    #2