simplest way to strip a comment from the end of a line?

Discussion in 'Python' started by Joe Strout, Dec 4, 2008.

  1. Joe Strout

    Joe Strout Guest

    I have lines in a config file which can end with a comment (delimited
    by # as in Python), but which may also contain string literals
    (delimited by double quotes). A comment delimiter within a string
    literal doesn't count. Is there any easy way to strip off such a
    comment, or do I need to use a loop to find each # and then count the
    quotation marks to its left?

    Thanks,
    - Joe
     
    Joe Strout, Dec 4, 2008
    #1

  2. Joe Strout

    eric Guest

    On Dec 4, 4:50 pm, Joe Strout <> wrote:
    > I have lines in a config file which can end with a comment (delimited  
    > by # as in Python), but which may also contain string literals  
    > (delimited by double quotes).  A comment delimiter within a string  
    > literal doesn't count.  Is there any easy way to strip off such a  
    > comment, or do I need to use a loop to find each # and then count the  
    > quotation marks to its left?
    >
    > Thanks,
    > - Joe


    Hi,

    If the string literals contain no escaped quotes (i.e. no \"), then a
    regexp like

    .*?(?:".*?".*?)*#(?P<comment> .*?)$

    should do it (not tested):
    .*? matches any text, non-greedily (as little as possible)
    ".*?" matches a string literal with no escaped quotes
     
    eric, Dec 4, 2008
    #2
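
For comparison, here is a tested variant of the regexp idea, as a minimal sketch for Python 3. It is not the pattern from the post above: it swaps the lazy `.*?` for a negated character class, so a `#` can only be consumed as part of a complete string literal. The name `strip_comment_re` is made up for illustration, and the sketch assumes double-quoted literals with no escaped quotes.

```python
import re

# The part to keep is a run of pieces, each either a single character that is
# neither a quote nor '#', or one complete "..." literal (no escaped quotes).
DATA = re.compile(r'(?:[^"#]|"[^"]*")*')

def strip_comment_re(line):
    """Return line up to the first '#' that is outside a string literal."""
    return DATA.match(line).group(0)

print(repr(strip_comment_re('foo="bar # set foo" # set foo')))
```

One consequence of this design: on a line with an unterminated quote the match stops just before that quote, so such lines probably need to be rejected or handled separately.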

  3. Joe Strout <> writes:

    > I have lines in a config file which can end with a comment (delimited
    > by # as in Python), but which may also contain string literals
    > (delimited by double quotes). A comment delimiter within a string
    > literal doesn't count. Is there any easy way to strip off such a
    > comment, or do I need to use a loop to find each # and then count the
    > quotation marks to its left?
    >
    > Thanks,
    > - Joe


    FWIW this is what comes to mind.

    >>> def strip_comment(line):
    ...     i = -1
    ...     while True:
    ...         i = line.find('#', i+1)
    ...         if i == -1:
    ...             return line
    ...         if line.count('"', 0, i) % 2 == 0:
    ...             return line[:i]
    ...
    >>> strip_comment('foo=1 # set foo')
    'foo=1 '
    >>> strip_comment('foo="bar" # set foo')
    'foo="bar" '
    >>> strip_comment('foo="bar # set foo"')
    'foo="bar # set foo"'
    >>> strip_comment('foo="bar # set foo" # set foo')
    'foo="bar # set foo" '
    >>> strip_comment('foo="bar # set foo" + "baz ## fubar" # set foo')
    'foo="bar # set foo" + "baz ## fubar" '
    >>> strip_comment('foo="bar # set foo" + "baz ## fubar # set foo"')
    'foo="bar # set foo" + "baz ## fubar # set foo"'
    >>> strip_comment(r'foo="bar\" baz" # this breaks')
    'foo="bar\\" baz" # this breaks'

    As the last example shows, it won't work if there is an escaped double
    quote in the string.

    --
    Arnaud
     
    Arnaud Delobelle, Dec 4, 2008
    #3
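
The escaped-quote case could be covered by scanning the line once and tracking state, rather than counting quotes. This is only a sketch for Python 3 under an assumption the thread does not confirm: that a backslash inside a string literal escapes the next character, as in Python. The name `strip_comment_esc` is hypothetical.

```python
def strip_comment_esc(line):
    """Strip a trailing '#' comment, honouring \\" inside "..." literals."""
    in_string = False   # currently inside a double-quoted literal?
    escaped = False     # was the previous character an escaping backslash?
    for i, ch in enumerate(line):
        if escaped:
            escaped = False          # this character is taken literally
        elif in_string and ch == '\\':
            escaped = True           # assumption: backslash escapes only inside strings
        elif ch == '"':
            in_string = not in_string
        elif ch == '#' and not in_string:
            return line[:i]          # first '#' outside any literal starts the comment
    return line

print(repr(strip_comment_esc(r'foo="bar\" baz" # this no longer breaks')))
```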
  4. Joe Strout

    eric Guest

    On Dec 4, 5:15 pm, eric <> wrote:
    > [...] a regexp like
    >
    > .*?(?:".*?".*?)*#(?P<comment> .*?)$
    >
    > (not tested)


    Well, it works too:

    import re

    test = '''this is a test 1
    this is a test 2 #with a comment
    this is a '#gnarlier' test #with a comment
    this is a "#gnarlier" test #with a comment
    '''

    splitter = re.compile(r'(?m)^(?P<data>.*?(".*?".*?)*)(?:#.*?)?$')

    def com_strip(text):
        # findall returns one tuple per match; element 0 is the "data" group
        return [x[0] for x in splitter.findall(text)]

    # raw implementation
    for line in test.split('\n'):
        print(line, '->', re.match(r'(?P<data>.*?(".*?".*?)*)(?:#.*?)?$',
                                   line).group("data"))

    # with a function
    for line in com_strip(test):
        print(line)

    and here is the console output

    this is a test 1 -> this is a test 1
    this is a test 2 #with a comment -> this is a test 2
    this is a '#gnarlier' test #with a comment -> this is a '
    this is a "#gnarlier" test #with a comment -> this is a "#gnarlier" test
    ->
    this is a test 1
    this is a test 2
    this is a '
    this is a "#gnarlier" test
     
    eric, Dec 4, 2008
    #4
  5. Joe Strout

    MrJean1 Guest

    Using rsplit('#', 1) works for lines *with* comments:

    >>> 'this is a test'.rsplit('#', 1)
    ['this is a test']
    >>> 'this is a test #with a comment'.rsplit('#', 1)
    ['this is a test ', 'with a comment']
    >>> "this is a '#gnarlier' test #with a comment".rsplit('#', 1)
    ["this is a '#gnarlier' test ", 'with a comment']

    But not if # occurs in lines without comments:

    >>> "this is a '#gnarlier' test".rsplit('#', 1)
    ["this is a '", "gnarlier' test"]


    /Jean Brouwers



    On Dec 4, 7:50 am, Joe Strout <> wrote:
    > I have lines in a config file which can end with a comment (delimited  
    > by # as in Python), but which may also contain string literals  
    > (delimited by double quotes).  A comment delimiter within a string  
    > literal doesn't count.  Is there any easy way to strip off such a  
    > comment, or do I need to use a loop to find each # and then count the  
    > quotation marks to its left?
    >
    > Thanks,
    > - Joe
     
    MrJean1, Dec 4, 2008
    #5
  6. Joe Strout

    Paul McGuire Guest

    Yowza! My eyes glaze over when I see re's like
    r'(?m)^(?P<data>.*?(".*?".*?)*)(?:#.*?)?$'!

    Here's a simple recognizer that reads source code and suppresses
    comments. A comment will be a '#' character followed by the rest of
    the line. We need the recognizer to also detect quoted strings, so
    that any would-be '#' comment introducers that are in a quoted string
    *won't* incur the stripping wrath of the recognizer. A quoted string
    must be recognized before recognizing a '#' comment introducer.

    With our input tests given as:

    tests ='''this is a test 1
    this is a test 2 #with a comment
    this is a '#gnarlier' test #with a comment
    this is a "#gnarlier" test #with a comment
    '''.splitlines()

    here is such a recognizer implemented using pyparsing.


    from pyparsing import quotedString, Suppress, restOfLine

    comment = Suppress('#' + restOfLine)
    recognizer = quotedString | comment

    for t in tests:
        print(t)
        print(recognizer.transformString(t))
        print()


    Prints:

    this is a test 1
    this is a test 1

    this is a test 2 #with a comment
    this is a test 2

    this is a '#gnarlier' test #with a comment
    this is a '#gnarlier' test

    this is a "#gnarlier" test #with a comment
    this is a "#gnarlier" test


    For some added fun, add a parse action to quoted strings, to know when
    we've really done something interesting:

    def detectGnarliness(tokens):
        if '#' in tokens[0]:
            print("Ooooh, how gnarly! ->", tokens[0])
    quotedString.setParseAction(detectGnarliness)

    Now our output becomes:

    this is a test 1
    this is a test 1

    this is a test 2 #with a comment
    this is a test 2

    this is a '#gnarlier' test #with a comment
    Ooooh, how gnarly! -> '#gnarlier'
    this is a '#gnarlier' test

    this is a "#gnarlier" test #with a comment
    Ooooh, how gnarly! -> "#gnarlier"
    this is a "#gnarlier" test


    -- Paul
     
    Paul McGuire, Dec 4, 2008
    #6
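
The ordering rule described above (recognize a quoted string before a '#') can also be sketched with the standard library alone. This is a stand-in using `re` alternation, not pyparsing, and it assumes single- or double-quoted literals with no escaped quotes. The name `strip_comment_sub` is invented for the example.

```python
import re

# Alternatives are tried left to right at each position, so a quoted string
# is consumed whole before the comment branch can see a '#' inside it.
TOKEN = re.compile(r'''"[^"]*"|'[^']*'|(?P<comment>#.*)''')

def strip_comment_sub(line):
    """Drop '#' comments; copy quoted strings through unchanged."""
    def keep_or_drop(m):
        # Only the third alternative fills the 'comment' group.
        return '' if m.group('comment') is not None else m.group(0)
    return TOKEN.sub(keep_or_drop, line)

for t in ["this is a '#gnarlier' test #with a comment",
          'this is a "#gnarlier" test #with a comment']:
    print(repr(strip_comment_sub(t)))
```

Because `.` does not match a newline by default, the same function can be applied to a whole multi-line text as well as to single lines.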
  7. Joe Strout

    eric Guest

    On Dec 4, 11:35 pm, Paul McGuire <> wrote:
    > Yowza!  My eyes glaze over when I see re's like "r'(?m)^(?P<data>.*?
    > (".*?".*?)*)(?:#.*?)?$"!
    >


    Yeah, I know... :( (I love complicated regexps; they're like a puzzle
    game for me.)



    I didn't know about pyparsing. It's amazing! Thanks.
     
    eric, Dec 5, 2008
    #7
  8. Joe Strout

    eric Guest


    Maybe you'd rather replace:

    splitter = re.compile(r'(?m)^(?P<data>.*?(".*?".*?)*)(?:#.*?)?$')

    by

    from reO import *
    quote = characters('"')  # defining the characters used as string sep
    sharp = string('#')  # defining the sharp symbol
    data = ALL + repeat(group(quote + ALL + quote + ALL))  # ALL ( "ALL" ALL)*
    comment = group(sharp + ALL + END_LINE)  # the comment itself

    xp = (flag(MULTILINE=True) + START_LINE + group(data, name="data") +
          if_exists(comment))

    splitter = xp.compile()
     
    eric, Dec 5, 2008
    #8
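
A standard-library way to make the same pattern easier to read is `re.VERBOSE`, which allows whitespace and comments inside the pattern. The sketch below is for Python 3 and is meant to be the same regexp as the `splitter` above, just laid out differently. The one change is that `#` is written as `[#]`, since a bare `#` would start a comment in verbose mode.

```python
import re

splitter = re.compile(r'''
    ^(?P<data>
        .*?                  # leading text, as little as possible
        (?: ".*?" .*? )*     # any number of: a "string", then more text
    )
    (?: [#] .*? )?           # optional comment; [#] is a literal hash
    $
''', re.MULTILINE | re.VERBOSE)

line = 'this is a "#gnarlier" test #with a comment'
print(repr(splitter.match(line).group('data')))
```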