simplest way to strip a comment from the end of a line?


Joe Strout

I have lines in a config file which can end with a comment (delimited
by # as in Python), but which may also contain string literals
(delimited by double quotes). A comment delimiter within a string
literal doesn't count. Is there any easy way to strip off such a
comment, or do I need to use a loop to find each # and then count the
quotation marks to its left?

- Joe



If the string literals do not contain escaped quotes (i.e. \"), then a
regexp like

.*?(?:".*?".*?)*#(?P<comment> .*?)$

(not tested)
.*? matches everything, but non-greedily
".*?" matches a string literal that contains no escaped quote

Arnaud Delobelle

FWIW this is what comes to mind.
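>>> def strip_comment(line):  # name assumed; the original def line was lost
...     i = -1
...     while True:
...         i = line.find('#', i+1)
...         if i == -1:
...             return line
...         if line.count('"', 0, i) % 2 == 0:
...             return line[:i]
...
>>> strip_comment('foo="bar\\" baz" # this breaks')
'foo="bar\\" baz" # this breaks'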

As the last example shows, it won't work if there is an escaped double
quote in the string.
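
One way around the escaped-quote limitation is to walk the line once and track whether the scan is inside a string, skipping any character that follows a backslash. This is only a sketch: the name strip_comment_escaped is hypothetical, and it assumes backslash is the only escape mechanism and that strings are delimited by double quotes only.

```python
def strip_comment_escaped(line):
    # Hypothetical helper: strip a trailing # comment, treating \" inside
    # a double-quoted string as an escaped quote rather than a terminator.
    in_string = False
    i = 0
    while i < len(line):
        ch = line[i]
        if in_string:
            if ch == '\\':
                i += 1              # skip the escaped character
            elif ch == '"':
                in_string = False   # closing quote
        elif ch == '"':
            in_string = True        # opening quote
        elif ch == '#':
            return line[:i]         # comment starts outside any string
        i += 1
    return line
```

Like the find/count version, this keeps any whitespace that precedes the comment; call rstrip() on the result if that matters.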



Well, the regexp works too:

import re

test = '''this is a test 1
this is a test 2 #with a comment
this is a '#gnarlier' test #with a comment
this is a "#gnarlier" test #with a comment'''

splitter = re.compile(r'(?m)^(?P<data>.*?(".*?".*?)*)(?:#.*?)?$')

def com_strip(text):
    # findall returns one tuple per line; element 0 is the "data" group
    return [x[0] for x in splitter.findall(text)]

# raw implementation
for line in test.split('\n'):
    data = re.match(r'(?P<data>.*?(".*?".*?)*)(?:#.*?)?$', line).group('data')
    print('%s -> %s' % (line, data))

# with a function
for line in com_strip(test):
    print(line)

and here is the console output

this is a test 1 -> this is a test 1
this is a test 2 #with a comment -> this is a test 2
this is a '#gnarlier' test #with a comment -> this is a '
this is a "#gnarlier" test #with a comment -> this is a "#gnarlier" test
this is a test 1
this is a test 2
this is a '
this is a "#gnarlier" test


Using rsplit('#', 1) works for lines *with* comments:
['this is a test']
['this is a test ', 'with a comment']
["this is a '#gnarlier' test ", 'with a comment']

But not if # occurs in lines without comments:
["this is a '", "gnarlier' test"]

/Jean Brouwers
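
The rsplit behaviour described above can be reproduced with a few sample lines. This is a small sketch; the lines list is made up for illustration, reusing the test strings from earlier in the thread.

```python
lines = [
    "this is a test",
    "this is a test #with a comment",
    "this is a '#gnarlier' test #with a comment",
    "this is a '#gnarlier' test",   # '#' only inside quotes, no comment
]

# Split once, starting from the right-hand end of each line.
results = [line.rsplit('#', 1) for line in lines]

for line, parts in zip(lines, results):
    print(repr(line), '->', parts)
```

The last line is the failure case: rsplit has no notion of quoting, so it splits on the '#' inside the string literal.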

Paul McGuire

Yowza! My eyes glaze over when I see re's like "r'(?m)^(?P<data>.*?

Here's a simple recognizer that reads source code and suppresses
comments. A comment will be a '#' character followed by the rest of
the line. We need the recognizer to also detect quoted strings, so
that any would-be '#' comment introducers that are in a quoted string
*won't* incur the stripping wrath of the recognizer. A quoted string
must be recognized before recognizing a '#' comment introducer.

With our input tests given as:

tests = '''this is a test 1
this is a test 2 #with a comment
this is a '#gnarlier' test #with a comment
this is a "#gnarlier" test #with a comment'''.splitlines()

here is such a recognizer implemented using pyparsing.

from pyparsing import quotedString, Suppress, restOfLine

comment = Suppress('#' + restOfLine)
recognizer = quotedString | comment

for t in tests:
    print(t)
    print(recognizer.transformString(t))


this is a test 1
this is a test 1

this is a test 2 #with a comment
this is a test 2

this is a '#gnarlier' test #with a comment
this is a '#gnarlier' test

this is a "#gnarlier" test #with a comment
this is a "#gnarlier" test

For some added fun, add a parse action to quoted strings, to know when
we've really done something interesting:

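def detectGnarliness(tokens):
    if '#' in tokens[0]:
        print("Ooooh, how gnarly! -> " + tokens[0])

# attach the parse action so it fires on every quoted string
quotedString.setParseAction(detectGnarliness)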

Now our output becomes:

this is a test 1
this is a test 1

this is a test 2 #with a comment
this is a test 2

this is a '#gnarlier' test #with a comment
Ooooh, how gnarly! -> '#gnarlier'
this is a '#gnarlier' test

this is a "#gnarlier" test #with a comment
Ooooh, how gnarly! -> "#gnarlier"
this is a "#gnarlier" test

-- Paul
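
For anyone without pyparsing installed, the same ordering idea (try a quoted string first, and only then a '#' comment) can be approximated with the standard re module. This is a rough sketch rather than an equivalent of the pyparsing recognizer: the names TOKEN and strip_comments are made up here, and escaped quotes inside strings are not handled.

```python
import re

# Alternation order matters: a quoted string is tried before a comment,
# so a '#' inside quotes is consumed as part of the string.
TOKEN = re.compile(r'''("[^"]*"|'[^']*')|#.*''')

def strip_comments(line):
    # Quoted strings (group 1) are put back unchanged;
    # a comment match has no group 1 and is replaced with nothing.
    return TOKEN.sub(lambda m: m.group(1) or '', line)

for t in ["this is a test 2 #with a comment",
          "this is a '#gnarlier' test #with a comment"]:
    print(strip_comments(t))
```

Whitespace before the comment is kept, so the results end in a trailing space unless rstrip() is applied.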


Yowza!  My eyes glaze over when I see re's like "r'(?m)^(?P<data>.*?

Yeah, I know ... :( (I love complicated regexps ... they're like a puzzle
game for me)

I didn't know pyparsing. It's amazing! Thanks.


Maybe you'd rather replace:

splitter = re.compile(r'(?m)^(?P<data>.*?(".*?".*?)*)(?:#.*?)?$')

with:


from reO import *
quote = characters('"') # defining the characters used as string sep
sharp= string('#') # defining the sharp symbol
data = ALL + repeat( group( quote + ALL + quote + ALL ) ) # ALL ( "ALL" ALL )*
comment = group(sharp+ALL+END_LINE) # the comment itself

xp = flag(MULTILINE=True) + START_LINE + group( data, name="data") +

splitter = xp.compile()
