Squeezing replacements into strings

Peter Bengtsson

I've got a regular expression that finds certain words from a longer string.
From "Peter Bengtsson PETER, or PeTeR" it finds: 'Peter','PETER','PeTeR'.

What I then want to do is something like this:

def _ok(matchobject):
    # more complicated stuff happens here
    return 1

def _massage(word):
    return "_" + word + "_"

for match in regex.finditer(text):
    if not _ok(match):
        continue
    text = text[:match.start()] +\
           _massage(text[match.start():match.end()]) +\
           text[match.end():]

This code works for a single match: it converts something like "don't allow the
fuck swear word" to "don't allow the _fuck_ swear word".

The problem is when there is more than one match. match.start() and
match.end() refer to positions in the original string, but after the first
iteration of the loop the string has changed (it gains 2 characters in length
because of the two "_" characters), so the later offsets no longer line up.

How can I do this concatenation correctly?
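
To make the drift concrete, here is a minimal sketch. The pattern and the sample string are borrowed from the top of this post; the names spans, broken and wanted are only for illustration. It applies offsets collected from the original string to a string that keeps growing, which is the same mistake as the loop above:

```python
import re

text = "Peter Bengtsson PETER, or PeTeR"
regex = re.compile(r"peter", re.IGNORECASE)

# These offsets describe the string as it was when finditer() ran.
spans = [m.span() for m in regex.finditer(text)]

broken = text
for start, end in spans:
    # Every replacement adds two characters, but start/end are never adjusted,
    # so from the second match on the slices are cut in the wrong place.
    broken = broken[:start] + "_" + broken[start:end] + "_" + broken[end:]

wanted = "_Peter_ Bengtsson _PETER_, or _PeTeR_"
print(broken)
print(broken == wanted)
```

Only the first match comes out right; each later slice lands two characters too early for every replacement made before it.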
 
Peter Otten

Peter said:
I've got a regular expression that finds certain words from a longer
string.
The problem is when there is more than one match. The match.start() and
match.end() are for the original string but after the first iteration in
the loop the original string changes (it gains 2 characters in length due
to the "_"'s).
How can I do this concatenation correctly?

I think sub() is more appropriate than finditer() for your problem, e.g.:

...     return "_%s_" % match.group(1).title()
...
PeTeR")
'_Peter_ Bengtsson _Peter_, or _Peter_'
Peter
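
For anyone who wants to run this from a file rather than the interactive prompt, here is a minimal sketch of the same idea. The pattern and the function name _titled are assumptions (the only thing the snippet above fixes is that group 1 holds the matched word); the input is the sample string from the original question.

```python
import re

# Assumed pattern: one capturing group, case-insensitive, so group(1) is the word.
pattern = re.compile(r"(peter)", re.IGNORECASE)

def _titled(match):
    # sub() calls this once per match and splices the return value into the result.
    return "_%s_" % match.group(1).title()

result = pattern.sub(_titled, "Peter Bengtsson PETER, or PeTeR")
print(result)
```

Because sub() builds the new string itself, the callable never has to deal with offsets.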
 
Adriano Ferreira

As Peter Otten said, sub() is probably what you want. Try:

---------------------------------------------------
import re

def _ok(matchobject):
    # more complicated stuff happens here
    return True

def _massage(word):
    return "_" + word + "_"


def _massage_or_not(matchobj):
    # re.sub() calls this once per match and inserts whatever it returns
    word = matchobj.group(0)
    if not _ok(matchobj):
        return word  # leave rejected matches unchanged
    return _massage(word)


text = "don't allow the fuck swear word"

rtext = re.sub(r'fuck', _massage_or_not, text)
print(rtext)
---------------------------------------------------

No need to hassle with the changing length of the replaced string.
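
If you would rather stay with finditer(), one possible sketch (not the only way) is to leave the original string alone and collect the pieces in a list, so the offsets stay valid throughout. The helper name wrap_matches and its ok parameter are made up for this example:

```python
import re

def wrap_matches(regex, text, ok=lambda m: True):
    """Return text with every accepted match wrapped in underscores."""
    pieces = []
    last = 0  # end of the previous accepted match, in original-string offsets
    for m in regex.finditer(text):
        if not ok(m):
            continue  # rejected matches are copied along with the next gap
        pieces.append(text[last:m.start()])    # untouched text before the match
        pieces.append("_" + m.group(0) + "_")  # the massaged word
        last = m.end()
    pieces.append(text[last:])                 # whatever follows the last match
    return "".join(pieces)

regex = re.compile(r"peter", re.IGNORECASE)
sample = "Peter Bengtsson PETER, or PeTeR"
result = wrap_matches(regex, sample)
skipped = wrap_matches(regex, sample, ok=lambda m: m.group(0) != "PETER")
print(result)
print(skipped)
```

This is roughly the bookkeeping that sub() does for you, which is why the sub() version is shorter.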

Best regards,
Adriano.
 
Peter Bengtsson

Peter Otten said:
I think sub() is more appropriate than finditer() for your problem, e.g.:

...     return "_%s_" % match.group(1).title()
...
PeTeR")
'_Peter_ Bengtsson _Peter_, or _Peter_'

Ahaa! Great. I didn't realise that I can substitute with a callable that gets
the match object. Hadn't thought of it that way.
Will try this now.
 
