Replace string except inside quotes?

beliavsky · Dec 3, 2004

The code

for text in open("file.txt","r"):
    print text.replace("foo","bar")[:-1]

replaces 'foo' with 'bar' in a file, but how do I avoid changing text
inside single or double quotes? For making changes to Python code, I
would also like to avoid changing text in comments, either the '#' or
'""" ... """' kind.

Michael J. Fromberger · Dec 3, 2004

The first part of what you describe isn't too bad; here's some code that
seems to do what you want:

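import re

def replace_unquoted(text, src, dst, quote = '"'):
    # A quoted string: the quote, then any run of characters that are
    # neither the quote nor a backslash, or backslash escapes such as
    # \" or \n, then the closing quote.
    r = re.compile(r'%s([^\\%s]|\\.)*%s' % (quote, quote, quote),
                   re.DOTALL)

    out = '' ; last_pos = 0
    for m in r.finditer(text):
        # replace only in the text between quoted strings...
        out += text[last_pos:m.start()].replace(src, dst)
        # ...and copy each quoted string through unchanged
        out += m.group()
        last_pos = m.end()

    return out + text[last_pos:].replace(src, dst)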

Example usage:
print replace_unquoted(file('foo.txt', 'r').read(),
                       "foo", "bar")

It's not the most elegant solution in the world. This code does NOT
deal with the problem of commented text. I think it will handle triple
quotes, though I haven't tested it on that case.
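Michael's function protects one quote character per call and does not protect comments. One way to extend the same split-and-replace idea, sketched here in Python 3 (the thread's code is Python 2), is to put triple-quoted strings, single- and double-quoted strings, and `#` comments into one regex alternation and scan the whole text once. The names `PROTECTED` and `replace_outside_strings` are hypothetical. It is still a regex approximation: it assumes valid source and does not understand f-string expressions.

```python
import re

# Order matters: triple-quoted literals are tried before one-quote
# literals, so '"""' is never read as an empty "" string plus a quote.
PROTECTED = re.compile(
    r'"""(?:\\.|(?!""")[^\\])*"""'   # triple double-quoted string
    r"|'''(?:\\.|(?!''')[^\\])*'''"  # triple single-quoted string
    r'|"(?:\\.|[^"\\\n])*"'          # double-quoted string (no raw newline)
    r"|'(?:\\.|[^'\\\n])*'"          # single-quoted string (no raw newline)
    r"|#[^\n]*",                     # comment, up to end of line
    re.DOTALL,                       # lets \\. match a backslash-newline
)


def replace_outside_strings(text, src, dst):
    """Replace src with dst except inside string literals and # comments.

    A regex approximation over the whole text; assumes valid Python source.
    """
    out = []
    last = 0
    for m in PROTECTED.finditer(text):
        out.append(text[last:m.start()].replace(src, dst))  # plain code
        out.append(m.group())                               # kept verbatim
        last = m.end()
    out.append(text[last:].replace(src, dst))               # trailing code
    return "".join(out)


if __name__ == "__main__":
    sample = 'x = """My "foo" object"""  # foo\nprint(foo, \'foo\')\n'
    print(replace_outside_strings(sample, "foo", "bar"), end="")
```

Listing the triple-quoted branches first is the key design choice: where `"""` starts, the regex engine tries the alternatives in order, so the whole triple-quoted literal is consumed before the plain `"..."` branch can pair up the quotes inside it.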

At any rate, I hope it may help you get started.

Cheers,
-M

Jeff Shannon · Dec 3, 2004

Michael said:
It's not the most elegant solution in the world. This code does NOT
deal with the problem of commented text. I think it will handle triple
quotes, though I haven't tested it on that case.

I believe that it will probably work for triple quotes that begin and
end on the same line. Of course, the primary usage of triple-quotes is
for multiline strings, but given that the file is being examined one
line at a time, you'd need some method of maintaining state in order to
handle multiline strings properly. (Note that this problem applies
regardless of whether the strings are true triple-quoted multiline
strings or single-quoted single-line strings broken across two lines of
source code using '\'.)
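Jeff's point about maintaining state can be made concrete. Below is a minimal Python 3 sketch (the generator name `replace_by_line` is hypothetical, not from the thread): it still reads one line at a time, but the delimiter of any string left open at the end of a line is carried into the next line, so triple-quoted strings and backslash-continued strings stay protected. It assumes syntactically valid Python, treats `#` as a comment only outside strings, and ignores f-string expressions.

```python
def replace_by_line(lines, src, dst):
    """Yield each line with src replaced by dst outside strings and # comments.

    The delimiter of a string still open at the end of a line is carried
    over to the next line, so multiline literals stay protected.
    Assumes syntactically valid Python source.
    """
    delim = None  # '"', "'", '"""' or "'''" while inside a string
    for line in lines:
        out = []   # finished pieces of this line
        code = []  # pending run of ordinary code characters
        i, n = 0, len(line)
        while i < n:
            ch = line[i]
            if delim is not None:              # inside a string: copy verbatim
                if ch == "\\":
                    out.append(line[i:i + 2])  # escape pair, possibly backslash-newline
                    i += 2
                elif line.startswith(delim, i):
                    out.append(delim)          # closing delimiter
                    i += len(delim)
                    delim = None
                else:
                    out.append(ch)
                    i += 1
            elif ch == "#":                    # comment: rest of line untouched
                break
            elif ch in "'\"":                  # opening quote
                out.append("".join(code).replace(src, dst))
                code = []
                delim = ch * 3 if line.startswith(ch * 3, i) else ch
                out.append(delim)
                i += len(delim)
            else:
                code.append(ch)
                i += 1
        out.append("".join(code).replace(src, dst))
        out.append(line[i:])                   # the comment, or "" if none
        yield "".join(out)
```

With the original loop this becomes `for text in replace_by_line(open("file.txt"), "foo", "bar"): print(text, end="")`.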

If the entire file is read in and processed as a single chunk, instead
of line-by-line, then *some* of the problems go away (at the cost of
potentially very large memory consumption and poor performance, if the
file is large). The fact that triple-quoted strings work out (mostly)
correctly when viewed as three pairs of quotes will help. But if a
triple-quoted string *contains* a normally quoted string (e.g., """My
"foo" object"""), then things break down again.

In order to handle this sort of nested structure with anything
resembling true reliability, it's necessary to step up to a true
lexing/parsing procedure, instead of mere string matching and regular
expressions.
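The nested-quote breakdown Jeff describes can be reproduced directly. In this Python 3 sketch, `NAIVE` is a double-quote pattern in the spirit of Michael's regex (escape handling written as `\\.`), `TRIPLE_FIRST` is a hypothetical variant that tries a whole triple-quoted literal first, and `replace_outside` is an illustrative helper, not code from the thread.

```python
import re

# Double-quote-only pattern in the spirit of Michael's replace_unquoted().
NAIVE = re.compile(r'"(?:[^\\"]|\\.)*"')
# Hypothetical variant: try a whole triple-quoted literal first.
TRIPLE_FIRST = re.compile(r'"""(?:\\.|(?!""")[^\\])*"""|"(?:[^\\"]|\\.)*"')


def replace_outside(text, src, dst, pattern):
    """Apply str.replace only to the text between matches of pattern."""
    out, last = [], 0
    for m in pattern.finditer(text):
        out.append(text[last:m.start()].replace(src, dst))
        out.append(m.group())
        last = m.end()
    out.append(text[last:].replace(src, dst))
    return "".join(out)


text = '"""My "foo" object"""'
print([m.group() for m in NAIVE.finditer(text)])  # ['""', '"My "', '" object"', '""']
print(replace_outside(text, "foo", "bar", NAIVE))         # foo is changed
print(replace_outside(text, "foo", "bar", TRIPLE_FIRST))  # foo is kept
```

The naive pattern pairs the quotes as `""`, `"My "`, `" object"` and `""`, so the `foo` between them counts as unquoted. Trying the triple-quoted branch first fixes this particular case, but other combinations (for example a comment containing a stray quote) still call for the lexing approach Jeff recommends.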

Jeff Shannon
Technician/Programmer
Credit International

Raymond Hettinger · Dec 4, 2004

The source for the tokenize module covers all these bases.

Raymond Hettinger
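As Raymond suggests, the standard `tokenize` module already knows where strings and comments begin and end. A minimal Python 3 sketch of the original request (the function name `replace_in_code` is hypothetical; `tokenize.generate_tokens` and `tokenize.untokenize` are the standard-library calls): every token except `STRING` and `COMMENT` gets a plain `str.replace`, and the source is rebuilt from the token positions. Docstrings are `STRING` tokens, so they are protected too.

```python
import io
import tokenize


def replace_in_code(source, src, dst):
    """Replace src with dst in every token except string literals and
    comments (docstrings are STRING tokens, so they are skipped too)."""
    protected = {tokenize.STRING, tokenize.COMMENT}
    tokens = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type not in protected:
            tok = tok._replace(string=tok.string.replace(src, dst))
        tokens.append(tok)
    # Full 5-tuples let untokenize() place each token at its original
    # row/column, so the layout is largely preserved.
    return tokenize.untokenize(tokens)


if __name__ == "__main__":
    code = 'def foo():\n    """Return foo."""\n    return foo  # foo\n'
    print(replace_in_code(code, "foo", "bar"), end="")
```

Because `untokenize` regenerates spacing from token positions, whitespace inside a line and around backslash continuations may not round-trip byte for byte. F-strings, which Python 3.12 and later split into several token types, are not considered here.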

M.E.Farmer · Dec 4, 2004

Raymond Hettinger said:
The source for the tokenize module covers all these bases.

# tokenize text replace

import keyword, os, sys, traceback
import string, cStringIO
import token, tokenize

######################################################################

class Parser:
    """python source code tokenizing text replacer
    """
    def __init__(self, raw, out=sys.stdout):
        ''' Store the source text & set some flags.
        '''
        self.raw = string.strip(string.expandtabs(raw))
        self.out = out

    def format(self, search='', replace='',
               replacetokentype=token.NAME):
        ''' Parse and send text.
        '''
        # Store line offsets in self.lines
        self.lines = [0, 0]
        pos = 0
        self.temp = cStringIO.StringIO()
        self.searchtext = search
        self.replacetext = replace
        self.replacetokentype = replacetokentype

        # Gather lines
        while 1:
            pos = string.find(self.raw, '\n', pos) + 1
            if not pos: break
            self.lines.append(pos)
        self.lines.append(len(self.raw))

        # Wrap text in a filelike object
        self.pos = 0
        text = cStringIO.StringIO(self.raw)

        # Parse the source.
        ## Tokenize calls the __call__
        ## function for each token till done.
        try:
            tokenize.tokenize(text.readline, self)
        except tokenize.TokenError, ex:
            traceback.print_exc()

    def __call__(self, toktype, toktext,
                 (srow,scol), (erow,ecol), line):
        ''' Token handler.
        '''
        # calculate new positions
        oldpos = self.pos
        newpos = self.lines[srow] + scol
        self.pos = newpos + len(toktext)

        # handle newlines
        if toktype in [token.NEWLINE, tokenize.NL]:
            self.out.write('\n')
            return

        # send the original whitespace, if needed
        if newpos > oldpos:
            self.out.write(self.raw[oldpos:newpos])

        # skip indenting tokens
        if toktype in [token.INDENT, token.DEDENT]:
            self.pos = newpos
            return

        # search for matches to our searchtext
        # customize this for your exact needs
        if (toktype == self.replacetokentype and
            toktext == self.searchtext):
            toktext = self.replacetext

        # write it out
        self.out.write(toktext)
        return

######################################################################
# just an example
def Main():
    import sys
    if sys.argv[0]:
        filein = open(sys.argv[0]).read()
        Parser(filein, out=sys.stdout).format('tokenize', 'MyNewName')

######################################################################

if __name__ == '__main__':
    Main()

# end of code

This is an example of how to use tokenize to replace names
that match a search string.
If you want to replace only strings and not
names, change replacetokentype to
token.STRING instead of token.NAME, and so on.
HTH,
M.E.Farmer
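M.E.Farmer's `Parser` swaps whole tokens of one type. A Python 3 sketch of the same idea (`replace_tokens` is a hypothetical name; it rebuilds the text with `tokenize.untokenize` instead of tracking offsets by hand) shows one detail worth knowing when switching to `token.STRING`: a string token's text includes its quotes (and any prefix such as `r`), so the search text has to include them as well.

```python
import io
import tokenize


def replace_tokens(source, search, replace, toktype=tokenize.NAME):
    """Swap every token of type toktype whose text equals search."""
    tokens = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == toktype and tok.string == search:
            tok = tok._replace(string=replace)
        tokens.append(tok)
    return tokenize.untokenize(tokens)


src = 'x = "foo"\nfoo = x\n'
print(replace_tokens(src, "foo", "bar"), end="")                       # renames the NAME only
print(replace_tokens(src, '"foo"', '"bar"', tokenize.STRING), end="")  # quotes are part of the token
```

Searching for `foo` with `tokenize.STRING` would match nothing here, because no STRING token's text is `foo` without its quotes.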

Checking if string inside quotes?	4	May 9, 2007
Weird exception handling behavior -- late evaluation in except clause	6	Dec 2, 2012
FAQ 4.31 How can I split a [character] delimited string except when inside [character]?	0	Apr 13, 2011
Search & Replace	6	Oct 26, 2006
String and replace {placeholder} recursion problem	0	Apr 19, 2011
Replace unknow string varible in file.	4	Feb 10, 2009
String and list error while running a Markov Chain	1	Aug 26, 2020
Python client/server that reads HTML body from server	1	Apr 12, 2023
