simplest way to strip a comment from the end of a line?

Joe Strout · Dec 4, 2008

I have lines in a config file which can end with a comment (delimited
by # as in Python), but which may also contain string literals
(delimited by double quotes). A comment delimiter within a string
literal doesn't count. Is there any easy way to strip off such a
comment, or do I need to use a loop to find each # and then count the
quotation marks to its left?

Thanks,
- Joe

eric · Dec 4, 2008

Hi,

if the string literal itself contains no escaped quotes (i.e. no \"),
then a regexp like

.*?(?:".*?".*?)*#(?P<comment> .*?)$

should work (not tested), where
.*? matches everything, but non-greedily
".*?" matches a string literal with no escaped quotes

Arnaud Delobelle · Dec 4, 2008

FWIW this is what comes to mind.
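>>> def strip_comment(line):  # def line lost from the post; name assumed
...     i = -1
...     while True:
...         i = line.find('#', i+1)
...         if i == -1:
...             return line
...         if line.count('"', 0, i) % 2 == 0:
...             return line[:i]
...
>>> strip_comment(r'foo="bar\" baz" # this breaks')
'foo="bar\\" baz" # this breaks'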

As the last example shows, it won't work if there is an escaped double
quote in the string.
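
One way around the escaped-quote problem is to scan the line character by character instead of counting quotes. The following is only a sketch: the name strip_comment_scan is made up for illustration, and it assumes that a backslash escapes the next character only inside a double-quoted literal.

```python
def strip_comment_scan(line):
    """Return line without a trailing # comment, honoring "..." literals."""
    in_string = False
    i = 0
    while i < len(line):
        ch = line[i]
        if in_string:
            if ch == '\\':
                i += 1              # assumption: backslash escapes the next char
            elif ch == '"':
                in_string = False   # closing quote
        elif ch == '"':
            in_string = True        # opening quote
        elif ch == '#':
            return line[:i]         # first '#' outside a string starts the comment
        i += 1
    return line


print(repr(strip_comment_scan(r'foo="bar\" baz" # this no longer breaks')))
```

The trailing space before the comment is kept; add .rstrip() if you don't want it.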

eric · Dec 4, 2008

Well, the regexp works too:

import re

test = '''this is a test 1
this is a test 2 #with a comment
this is a '#gnarlier' test #with a comment
this is a "#gnarlier" test #with a comment
'''

splitter = re.compile(r'(?m)^(?P<data>.*?(".*?".*?)*)(?:#.*?)?$')

def com_strip(text):
    # findall returns one tuple per match; element 0 is the "data" group
    return [x[0] for x in splitter.findall(text)]

# raw implementation
for line in test.split('\n'):
    print(line, '->', re.match(r'(?P<data>.*?(".*?".*?)*)(?:#.*?)?$',
                               line).group("data"))

# with a function
for line in com_strip(test):
    print(line)

and here is the console output

this is a test 1 -> this is a test 1
this is a test 2 #with a comment -> this is a test 2
this is a '#gnarlier' test #with a comment -> this is a '
this is a "#gnarlier" test #with a comment -> this is a "#gnarlier" test
->
this is a test 1
this is a test 2
this is a '
this is a "#gnarlier" test

MrJean1 · Dec 4, 2008

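Using rsplit('#', 1) works for lines *with* comments:
>>> 'this is a test'.rsplit('#', 1)
['this is a test']
>>> 'this is a test #with a comment'.rsplit('#', 1)
['this is a test ', 'with a comment']
>>> "this is a '#gnarlier' test #with a comment".rsplit('#', 1)
["this is a '#gnarlier' test ", 'with a comment']

But not if # occurs inside a string on a line without a comment:
>>> "this is a '#gnarlier' test".rsplit('#', 1)
["this is a '", "gnarlier' test"]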

/Jean Brouwers

Paul McGuire · Dec 4, 2008

Yowza! My eyes glaze over when I see re's like
"r'(?m)^(?P<data>.*?(".*?".*?)*)(?:#.*?)?$"!

Here's a simple recognizer that reads source code and suppresses
comments. A comment will be a '#' character followed by the rest of
the line. We need the recognizer to also detect quoted strings, so
that any would-be '#' comment introducers that are in a quoted string
*won't* incur the stripping wrath of the recognizer. A quoted string
must be recognized before recognizing a '#' comment introducer.

With our input tests given as:

tests ='''this is a test 1
this is a test 2 #with a comment
this is a '#gnarlier' test #with a comment
this is a "#gnarlier" test #with a comment
'''.splitlines()

here is such a recognizer implemented using pyparsing.

from pyparsing import quotedString, Suppress, restOfLine

comment = Suppress('#' + restOfLine)
recognizer = quotedString | comment

for t in tests:
    print(t)
    print(recognizer.transformString(t))
    print()

Prints:

this is a test 1
this is a test 1

this is a test 2 #with a comment
this is a test 2

this is a '#gnarlier' test #with a comment
this is a '#gnarlier' test

this is a "#gnarlier" test #with a comment
this is a "#gnarlier" test

For some added fun, add a parse action to quoted strings, to know when
we've really done something interesting:

def detectGnarliness(tokens):
    if '#' in tokens[0]:
        print("Ooooh, how gnarly! ->", tokens[0])
quotedString.setParseAction(detectGnarliness)

Now our output becomes:

this is a test 1
this is a test 1

this is a test 2 #with a comment
this is a test 2

this is a '#gnarlier' test #with a comment
Ooooh, how gnarly! -> '#gnarlier'
this is a '#gnarlier' test

this is a "#gnarlier" test #with a comment
Ooooh, how gnarly! -> "#gnarlier"
this is a "#gnarlier" test

-- Paul
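
For anyone without pyparsing installed, the same "try a quoted string first, then a comment" ordering can be approximated with the standard re module. This is only a sketch under assumptions: the names TOKEN and strip_comment_re are invented here, and the pattern assumes backslash escapes inside either quote style.

```python
import re

# Alternative 1 (captured): a complete quoted string, single or double quotes,
# with backslash escapes allowed. Alternative 2: '#' through the end of the line.
# At each position the quoted string is tried before the comment.
TOKEN = re.compile(r'''("(?:\\.|[^"\\])*"|'(?:\\.|[^'\\])*')|#.*''')

def strip_comment_re(line):
    # Keep quoted strings (group 1) unchanged; replace a comment with nothing.
    return TOKEN.sub(lambda m: m.group(1) or '', line)


for t in ["this is a test 2 #with a comment",
          "this is a '#gnarlier' test #with a comment"]:
    print(repr(strip_comment_re(t)))
```

An unterminated quote is not treated as a string here, so a '#' after it still starts a comment.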

eric · Dec 5, 2008

Paul McGuire said:
Yowza! My eyes glaze over when I see re's like
"r'(?m)^(?P<data>.*?(".*?".*?)*)(?:#.*?)?$"!

yeah, I know ... (I love complicated regexps; it's like a puzzle game for me)

I didn't know pyparsing. It's amazing! Thanks.

eric · Dec 5, 2008

maybe you'd rather replace:

splitter = re.compile(r'(?m)^(?P<data>.*?(".*?".*?)*)(?:#.*?)?$')

by

from reO import *
quote = characters('"')  # defining the characters used as string sep
sharp = string('#')  # defining the sharp symbol
data = ALL + repeat(group(quote + ALL + quote + ALL))  # ALL ( "ALL" ALL)*
comment = group(sharp + ALL + END_LINE)  # the comment itself

xp = (flag(MULTILINE=True) + START_LINE + group(data, name="data") +
      if_exists(comment))

splitter = xp.compile()
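
If the goal is only to make the pattern easier to read, the standard library's re.VERBOSE flag is another option. The sketch below spells out the same pattern with comments; the name splitter_verbose is made up, and the inner group is written as non-capturing, which does not change what the "data" group matches.

```python
import re

splitter_verbose = re.compile(r'''
    ^                       # start of a line (MULTILINE)
    (?P<data>
        .*?                 # any text, as little as possible
        (?: ".*?" .*? )*    # then any number of "string literal" + text pairs
    )
    (?: \# .*? )?           # optional comment; a literal # must be escaped here
    $                       # end of the line
''', re.MULTILINE | re.VERBOSE)


line = 'this is a "#gnarlier" test #with a comment'
print(repr(splitter_verbose.match(line).group('data')))
```

Like the one-line version, this handles only double-quoted literals without escaped quotes.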
