simplest way to strip a comment from the end of a line?

Joe Strout

I have lines in a config file which can end with a comment (delimited
by # as in Python), but which may also contain string literals
(delimited by double quotes). A comment delimiter within a string
literal doesn't count. Is there any easy way to strip off such a
comment, or do I need to use a loop to find each # and then count the
quotation marks to its left?

Thanks,
- Joe
 
eric

Hi,

If the string literals never contain an escaped quote (i.e. no \"),
then a regexp like the following should work (not tested):

.*?(?:".*?".*?)*#(?P<comment> .*?)$

.*? matches everything, non-greedily
".*?" matches a string literal with no escaped quotes
 
Arnaud Delobelle

Joe Strout said:
I have lines in a config file which can end with a comment (delimited
by # as in Python), but which may also contain string literals
(delimited by double quotes). A comment delimiter within a string
literal doesn't count. Is there any easy way to strip off such a
comment, or do I need to use a loop to find each # and then count the
quotation marks to its left?

Thanks,
- Joe

FWIW this is what comes to mind.
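>>> def strip_comment(line):  # name assumed; the original def line was lost
...     i = -1
...     while True:
...         i = line.find('#', i+1)
...         if i == -1:
...             return line
...         # an even number of quotes to the left means we are outside a string
...         if line.count('"', 0, i) % 2 == 0:
...             return line[:i]
...
>>> strip_comment(r'foo="bar\" baz" # this breaks')  # call reconstructed from the output
'foo="bar\\" baz" # this breaks'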

As the last example shows, it won't work if there is an escaped double
quote in the string.
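
One way around the escaped-quote limitation is to walk the line one character at a time and skip whatever follows a backslash inside a string. The sketch below is not from the thread: the name `strip_comment_escaped` is hypothetical, and it assumes that only double-quoted literals exist and that a backslash inside a literal escapes exactly the next character.

```python
def strip_comment_escaped(line):
    """Return line without a trailing # comment, honouring "..." literals.

    Hypothetical helper: assumes backslash escapes the next character
    inside a double-quoted literal, as in Python string syntax.
    """
    in_string = False  # True while we are inside a double-quoted literal
    i = 0
    while i < len(line):
        ch = line[i]
        if in_string and ch == '\\':
            i += 2  # skip the backslash and the character it escapes
            continue
        if ch == '"':
            in_string = not in_string
        elif ch == '#' and not in_string:
            return line[:i]  # first # outside a literal starts the comment
        i += 1
    return line  # no comment found (or the literal was never closed)


print(repr(strip_comment_escaped(r'foo="bar\" baz" # this no longer breaks')))
```

An unterminated literal leaves the line untouched, which seems the safer default for a config reader.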
 
eric

Well, it works too:

import re

test = '''this is a test 1
this is a test 2 #with a comment
this is a '#gnarlier' test #with a comment
this is a "#gnarlier" test #with a comment
'''

splitter = re.compile(r'(?m)^(?P<data>.*?(".*?".*?)*)(?:#.*?)?$')

def com_strip(text):
    # findall returns one tuple per match; item 0 is the "data" group
    return [x[0] for x in splitter.findall(text)]

# raw implementation
for line in test.split('\n'):
    print(line, '->', re.match(r'(?P<data>.*?(".*?".*?)*)(?:#.*?)?$',
                               line).group("data"))

# with a function
for line in com_strip(test):
    print(line)

and here is the console output

this is a test 1 -> this is a test 1
this is a test 2 #with a comment -> this is a test 2
this is a '#gnarlier' test #with a comment -> this is a '
this is a "#gnarlier" test #with a comment -> this is a "#gnarlier" test
->
this is a test 1
this is a test 2
this is a '
this is a "#gnarlier" test
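
The third line of that output is cut at the first # because the pattern only knows about double quotes. A minimal sketch of a variant that also treats '...' as a literal follows. It is not from the thread: `LINE` and `split_comment` are hypothetical names, backslash escapes are still not handled, and a line with an unbalanced quote is assumed to be best returned unchanged.

```python
import re

# data = any run of: ordinary characters, a "..." literal, or a '...' literal;
# whatever follows the first # outside a literal is the comment.
LINE = re.compile(r'''^(?P<data>(?:[^"'#]|"[^"]*"|'[^']*')*)(?:#.*)?$''')


def split_comment(line):
    """Return the part of line before a trailing comment (hypothetical helper)."""
    m = LINE.match(line)
    # No match means an unbalanced quote; leave such lines alone.
    return m.group('data') if m else line


test = '''this is a test 1
this is a test 2 #with a comment
this is a '#gnarlier' test #with a comment
this is a "#gnarlier" test #with a comment
'''

for line in test.splitlines():
    print(line, '->', split_comment(line))
```

Because the three alternatives start with different characters, the pattern has only one way to consume any given input, so it should not backtrack badly on long lines.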
 
MrJean1

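Using rsplit('#', 1) works for lines *with* comments (the input lines
below are reconstructed from the results shown):

>>> 'this is a test'.rsplit('#', 1)
['this is a test']
>>> 'this is a test #with a comment'.rsplit('#', 1)
['this is a test ', 'with a comment']
>>> "this is a '#gnarlier' test #with a comment".rsplit('#', 1)
["this is a '#gnarlier' test ", 'with a comment']


But not if # occurs in lines without comments:

>>> "this is a '#gnarlier' test".rsplit('#', 1)
["this is a '", "gnarlier' test"]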


/Jean Brouwers
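
rsplit fails here because it knows nothing about quotes. If tokens are acceptable instead of the original text, the standard library's shlex module already understands both quotes and # comments. This is a sketch under that assumption, not something proposed in the thread: `config_tokens` is a hypothetical name, and note that shlex removes the quote characters and raises ValueError on an unbalanced quote.

```python
import shlex


def config_tokens(line):
    """Split a config line into tokens, dropping any # comment.

    Hypothetical helper: comments=True makes shlex treat an unquoted #
    as the start of a comment; quoted text is kept as a single token.
    """
    return shlex.split(line, comments=True)


print(config_tokens('name = "a # b"  # trailing comment'))
```

This changes the problem slightly (the caller gets a token list rather than the stripped line), so it only fits when the config lines are going to be tokenized anyway.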
 
Paul McGuire

Yowza! My eyes glaze over when I see re's like
r'(?m)^(?P<data>.*?(".*?".*?)*)(?:#.*?)?$'!

Here's a simple recognizer that reads source code and suppresses
comments. A comment will be a '#' character followed by the rest of
the line. We need the recognizer to also detect quoted strings, so
that any would-be '#' comment introducers that are in a quoted string
*won't* incur the stripping wrath of the recognizer. A quoted string
must be recognized before recognizing a '#' comment introducer.

With our input tests given as:

tests ='''this is a test 1
this is a test 2 #with a comment
this is a '#gnarlier' test #with a comment
this is a "#gnarlier" test #with a comment
'''.splitlines()

here is such a recognizer implemented using pyparsing.


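from pyparsing import quotedString, Suppress, restOfLine

# a comment is '#' plus the rest of the line; Suppress drops it from the output
comment = Suppress('#' + restOfLine)
# quotedString is tried first, so a '#' inside a literal is never a comment
recognizer = quotedString | comment

for t in tests:
    print(t)
    print(recognizer.transformString(t))
    print()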


Prints:

this is a test 1
this is a test 1

this is a test 2 #with a comment
this is a test 2

this is a '#gnarlier' test #with a comment
this is a '#gnarlier' test

this is a "#gnarlier" test #with a comment
this is a "#gnarlier" test


For some added fun, add a parse action to quoted strings, to know when
we've really done something interesting:

def detectGnarliness(tokens):
    if '#' in tokens[0]:
        print("Ooooh, how gnarly! ->", tokens[0])
quotedString.setParseAction(detectGnarliness)

Now our output becomes:

this is a test 1
this is a test 1

this is a test 2 #with a comment
this is a test 2

this is a '#gnarlier' test #with a comment
Ooooh, how gnarly! -> '#gnarlier'
this is a '#gnarlier' test

this is a "#gnarlier" test #with a comment
Ooooh, how gnarly! -> "#gnarlier"
this is a "#gnarlier" test
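
The same "recognize quoted strings before comments" idea can be approximated with the standard library alone, for readers who cannot install pyparsing. The sketch below is an assumption-laden stand-in, not pyparsing's implementation: `TOKEN` and `strip_comments` are hypothetical names, and the quoted-string patterns are a simplified guess at what counts as a literal (backslash escapes allowed, no line continuation).

```python
import re

# Alternatives are tried left to right at each position, so a quoted
# literal is consumed whole before a # inside it can be seen as a comment.
TOKEN = re.compile(r'''
    "(?:\\.|[^"\\])*"    # double-quoted literal, backslash escapes allowed
  | '(?:\\.|[^'\\])*'    # single-quoted literal, same rules
  | \#.*                 # comment: from an unquoted hash to end of line
''', re.VERBOSE)


def strip_comments(text):
    """Drop # comments from text, leaving quoted literals untouched."""
    # Literals are substituted by themselves; comment matches by nothing.
    return TOKEN.sub(
        lambda m: '' if m.group().startswith('#') else m.group(), text)


tests = '''this is a test 1
this is a test 2 #with a comment
this is a '#gnarlier' test #with a comment
this is a "#gnarlier" test #with a comment
'''.splitlines()

for t in tests:
    print(t)
    print(strip_comments(t))
    print()
```

Since `.` does not match a newline, the comment alternative stops at the end of each line, so the function can also be applied to a whole multi-line string at once.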


-- Paul
 
eric

Paul McGuire said:
Yowza! My eyes glaze over when I see re's like
r'(?m)^(?P<data>.*?(".*?".*?)*)(?:#.*?)?$'!

Yeah, I know... :( (I love complicated regexps; they're like a puzzle
game for me.)

I didn't know pyparsing. It's amazing! Thanks.
 
eric

Maybe you'd rather replace:

splitter = re.compile(r'(?m)^(?P<data>.*?(".*?".*?)*)(?:#.*?)?$')

by

from reO import *
quote = characters('"')  # defining the characters used as string sep
sharp = string('#')  # defining the sharp symbol
data = ALL + repeat(group(quote + ALL + quote + ALL))  # ALL ( "ALL" ALL)*
comment = group(sharp + ALL + END_LINE)  # the comment itself

xp = (flag(MULTILINE=True) + START_LINE + group(data, name="data") +
      if_exists(comment))

splitter = xp.compile()
 
