Replace string except inside quotes?

Discussion in 'Python' started by beliavsky@aol.com, Dec 3, 2004.

  1. Guest

    The code

    for text in open("file.txt","r"):
        print text.replace("foo","bar")[:-1]

    replaces 'foo' with 'bar' in a file, but how do I avoid changing text
    inside single or double quotes? For making changes to Python code, I
    would also like to avoid changing text in comments, either the '#' or
    '""" ... """' kind.
    beliavsky@aol.com, Dec 3, 2004
    #1

  2. beliavsky@aol.com wrote:

    > The code
    >
    > for text in open("file.txt","r"):
    >     print text.replace("foo","bar")[:-1]
    >
    > replaces 'foo' with 'bar' in a file, but how do I avoid changing text
    > inside single or double quotes? For making changes to Python code, I
    > would also like to avoid changing text in comments, either the '#' or
    > '""" ... """' kind.


    The first part of what you describe isn't too bad; here's some code that
    seems to do what you want:

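    import re

    def replace_unquoted(text, src, dst, quote = '"'):
        # Match one quoted literal: an opening quote, then any run of
        # ordinary characters or backslash escapes (\", \\, \n, ...),
        # then the closing quote.
        r = re.compile(r'%s([^\\%s]|\\.)*%s' %
                       (quote, quote, quote))

        out = ''
        last_pos = 0
        for m in r.finditer(text):
            # Replace only in the unquoted text before this literal.
            out += text[last_pos:m.start()].replace(src, dst)
            # Copy the quoted literal through unchanged.
            out += m.group()
            last_pos = m.end()

        # Handle whatever follows the last quoted literal.
        return out + text[last_pos:].replace(src, dst)

    Example usage:
        print replace_unquoted(file('foo.txt', 'r').read(),
                               "foo", "bar")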

    It's not the most elegant solution in the world. This code does NOT
    deal with the problem of commented text. I think it will handle triple
    quotes, though I haven't tested it on that case.
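
For readers on Python 3, here is a minimal sketch of the same approach that skips both single- and double-quoted literals in one pass. The name `QUOTED` and the combined pattern are assumptions made for illustration, not part of the code above. Comments and triple-quoted strings are still not handled, and an apostrophe inside a comment will confuse it.

```python
import re

# One quoted literal on a single line, with backslash escapes allowed.
# Illustrative pattern only: triple-quoted strings are not recognised.
QUOTED = re.compile(
    r'"(?:[^"\\\n]|\\.)*"'    # double-quoted literal
    r"|'(?:[^'\\\n]|\\.)*'"   # single-quoted literal
)

def replace_unquoted(text, src, dst):
    """Replace src with dst everywhere except inside quoted literals."""
    parts = []
    last = 0
    for m in QUOTED.finditer(text):
        # Text between the previous literal and this one gets replaced.
        parts.append(text[last:m.start()].replace(src, dst))
        # The literal itself is copied through untouched.
        parts.append(m.group())
        last = m.end()
    parts.append(text[last:].replace(src, dst))
    return "".join(parts)

print(replace_unquoted('foo = "foo" + \'foo\' + foo', "foo", "bar"))
# prints: bar = "foo" + 'foo' + bar
```

Collecting the pieces in a list and joining once avoids repeated string concatenation on large inputs.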

    At any rate, I hope it may help you get started.

    Cheers,
    -M

    --
    Michael J. Fromberger | Lecturer, Dept. of Computer Science
    http://www.dartmouth.edu/~sting/ | Dartmouth College, Hanover, NH, USA
    Michael J. Fromberger, Dec 3, 2004
    #2

  3. Jeff Shannon Guest

    Michael J. Fromberger wrote:

    >It's not the most elegant solution in the world. This code does NOT
    >deal with the problem of commented text. I think it will handle triple
    >quotes, though I haven't tested it on that case.
    >
    >


    I believe that it will probably work for triple quotes that begin and
    end on the same line. Of course, the primary usage of triple-quotes is
    for multiline strings, but given that the file is being examined one
    line at a time, you'd need some method of maintaining state in order to
    handle multiline strings properly. (Note that this problem arises
    whether the strings are true triple-quoted multiline strings or
    single-quoted strings continued across two lines of source code
    using '\'.)

    If the entire file is read in and processed as a single chunk, instead
    of line-by-line, then *some* of the problems go away (at the cost of
    potentially very large memory consumption and poor performance, if the
    file is large). The fact that triple-quoted strings work out (mostly)
    correctly when viewed as three pairs of quotes will help. But if a
    triple-quoted string *contains* a normally quoted string (e.g., """My
    "foo" object"""), then things break down again.

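The breakdown described above can be seen by asking a double-quote pattern which spans it considers quoted. This is a small Python 3 sketch; `DQ` and `quoted_spans` are hypothetical names, and the pattern is a simplified stand-in for the one posted earlier in the thread.

```python
import re

# A double-quoted literal with backslash escapes (illustrative pattern).
DQ = re.compile(r'"(?:[^"\\]|\\.)*"')

def quoted_spans(text):
    """Return the substrings the pattern treats as quoted literals."""
    return DQ.findall(text)

line = '"""My "foo" object"""'
print(quoted_spans(line))
# prints: ['""', '"My "', '" object"', '""']
```

The word foo falls between two matched spans, so a replace-outside-quotes pass would rewrite it even though it sits inside the triple-quoted string.
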
    In order to handle this sort of nested structure with anything
    resembling true reliability, it's necessary to step up to a true
    lexing/parsing procedure, instead of mere string matching and regular
    expressions.

    Jeff Shannon
    Technician/Programmer
    Credit International
    Jeff Shannon, Dec 3, 2004
    #3
  4. beliavsky@aol.com wrote:

    > The code
    >
    > for text in open("file.txt","r"):
    >     print text.replace("foo","bar")[:-1]
    >
    > replaces 'foo' with 'bar' in a file, but how do I avoid changing text
    > inside single or double quotes? For making changes to Python code, I
    > would also like to avoid changing text in comments, either the '#' or
    > '""" ... """' kind.


    The source for the tokenize module covers all these bases.
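
As a rough illustration of what a tokenize-based approach can look like in Python 3 (a sketch under assumptions, not code from the tokenize source): the hypothetical helper `replace_names` below finds NAME tokens equal to the search text and splices the replacement into the original lines by position, so strings, comments, and whitespace pass through unchanged.

```python
import io
import tokenize

def replace_names(source, old, new):
    """Replace identifier `old` with `new`, leaving strings and comments alone."""
    # Use the same line splitting that the tokenizer sees.
    lines = io.StringIO(source).readlines()
    readline = io.StringIO(source).readline
    # Token positions are (row, col) with rows starting at 1.
    hits = [tok.start for tok in tokenize.generate_tokens(readline)
            if tok.type == tokenize.NAME and tok.string == old]
    # Apply edits right to left so earlier column offsets stay valid.
    for row, col in reversed(hits):
        line = lines[row - 1]
        lines[row - 1] = line[:col] + new + line[col + len(old):]
    return "".join(lines)

src = 'foo = 1  # foo here\nprint("foo", foo)\n'
print(replace_names(src, "foo", "bar"), end="")
```

Because only whole NAME tokens are compared, an identifier such as `foobar` is left alone, which a plain `str.replace` would not do.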


    Raymond Hettinger
    Raymond Hettinger, Dec 4, 2004
    #4
  5. M.E.Farmer Guest

    Raymond Hettinger wrote:
    > The source for the tokenize module covers all these bases.


    > Raymond Hettinger


    # tokenize text replace

    import keyword, os, sys, traceback
    import string, cStringIO
    import token, tokenize

    ######################################################################

    class Parser:
        """python source code tokenizing text replacer
        """
        def __init__(self, raw, out=sys.stdout):
            ''' Store the source text & set some flags.
            '''
            self.raw = string.strip(string.expandtabs(raw))
            self.out = out

        def format(self, search='', replace='',
                   replacetokentype=token.NAME):
            ''' Parse and send text.
            '''
            # Store line offsets in self.lines
            self.lines = [0, 0]
            pos = 0
            self.temp = cStringIO.StringIO()
            self.searchtext = search
            self.replacetext = replace
            self.replacetokentype = replacetokentype

            # Gather lines
            while 1:
                pos = string.find(self.raw, '\n', pos) + 1
                if not pos: break
                self.lines.append(pos)
            self.lines.append(len(self.raw))

            # Wrap text in a filelike object
            self.pos = 0
            text = cStringIO.StringIO(self.raw)

            # Parse the source.
            ## Tokenize calls the __call__
            ## function for each token till done.
            try:
                tokenize.tokenize(text.readline, self)
            except tokenize.TokenError, ex:
                traceback.print_exc()


        def __call__(self, toktype, toktext,
                     (srow,scol), (erow,ecol), line):
            ''' Token handler.
            '''
            # calculate new positions
            oldpos = self.pos
            newpos = self.lines[srow] + scol
            self.pos = newpos + len(toktext)

            # handle newlines
            if toktype in [token.NEWLINE, tokenize.NL]:
                self.out.write('\n')
                return

            # send the original whitespace, if needed
            if newpos > oldpos:
                self.out.write(self.raw[oldpos:newpos])

            # skip indenting tokens
            if toktype in [token.INDENT, token.DEDENT]:
                self.pos = newpos
                return

            # search for matches to our searchtext
            # customize this for your exact needs
            if (toktype == self.replacetokentype and
                toktext == self.searchtext):
                toktext = self.replacetext

            # write it out
            self.out.write(toktext)
            return

    ######################################################################
    # just an example
    def Main():
        import sys
        if sys.argv[0]:
            filein = open(sys.argv[0]).read()
            Parser(filein, out=sys.stdout).format('tokenize', 'MyNewName')

    ######################################################################

    if __name__ == '__main__':
        Main()

    # end of code


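    This is an example of how to use tokenize to replace names
    that match a search string. If you want to replace strings
    rather than names, change replacetokentype to token.STRING
    instead of token.NAME, and so on. Note that the text of a
    STRING token includes its quotes, so the search string must
    then be the full literal (for example '"foo"').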
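
To check which token type a given piece of source text falls under, and so which type to filter on, it can help to dump the token stream. This is a small Python 3 sketch; `token_kinds` is a hypothetical helper name, not part of the code above.

```python
import io
import tokenize

def token_kinds(source):
    """Return (token type name, token text) pairs for a source string."""
    readline = io.StringIO(source).readline
    return [(tokenize.tok_name[tok.type], tok.string)
            for tok in tokenize.generate_tokens(readline)]

# The identifier, the string literal, and the comment each get their own type.
for kind, text in token_kinds('x = "foo"  # foo\n'):
    print(kind, repr(text))
```

In the output, the string literal appears as a STRING token whose text still carries its quotes, and the comment appears as a single COMMENT token.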
    HTH,
    M.E.Farmer
    M.E.Farmer, Dec 4, 2004
    #5