change only the nth occurrence of a pattern in a string

TP

Hi everybody,

I would like to change only the nth occurrence of a pattern in a string. The
problem with the "replace" method of strings and with "re.sub" is that we can
only specify how many occurrences to change, counting from the first one.
'ciucou'

What is the best way to change only the nth occurrence (occurrence number n)?

Why is this the default behavior? For the user, it would be easier to put re.sub or
replace in a loop to change the first n occurrences.

Thanks

Julien
--
python -c "print ''.join([chr(154 - ord(c)) for c in '*9(9&(18%.\
9&1+,\'Z4(55l4('])"

"When a distinguished but elderly scientist states that something is
possible, he is almost certainly right. When he states that something is
impossible, he is very probably wrong." (first law of AC Clarke)
 
Roy Smith

TP said:
Hi everybody,

I would like to change only the nth occurrence of a pattern in a string.

It's a little ugly, but the following looks like it works. The gist is to
split the string on your pattern, then re-join the pieces using the
original delimiter everywhere except for the nth splice. Split() is a
wonderful tool. I'm a hard-core regex geek, but I find that most things I
might have written a big hairy regex for are more easily solved by doing split()
and then attacking the pieces.

There may be some fencepost errors here. I got the basics working, and
left the details as an exercise for the reader :)

This version assumes the pattern is a literal string. If it's really a
regex, you'll need to put the pattern in parens when you call split(); this
will return the exact text matched each time as elements of the list. And
then your post-processing gets a little more complicated, but nothing
that's too bad.

This does a couple of passes over the data, but at least all the operations
are O(n), so the whole thing is O(n).


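#!/usr/bin/python

import re

v = "coucoucoucou"

pattern = "o"   # treated as a literal string here
n = 2           # replace the 2nd occurrence
parts = re.split(pattern, v)
print(parts)

# The first n pieces sit before the nth match; the rest sit after it.
first = parts[:n]
last = parts[n:]
print(first)
print(last)

# Re-join each half with the original delimiter...
j1 = pattern.join(first)
j2 = pattern.join(last)
print(j1)
print(j2)
# ...and put the replacement at the single splice between the two halves.
print("i".join([j1, j2]))
print(v)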
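
The regex variant described above (putting the pattern in parens when calling split()) might look something like the sketch below. The function name replace_nth_regex is made up for illustration, and the sketch assumes the pattern has no capturing groups of its own and never matches the empty string.

```python
import re

def replace_nth_regex(pattern, repl, text, n):
    """Replace only the n-th (1-based) match of pattern in text with repl.

    Assumption: pattern contains no capturing groups and cannot match "".
    """
    # Wrapping the pattern in a capturing group makes re.split() keep the
    # matched text, so the list alternates: text, match, text, match, ..., text.
    parts = re.split("(" + pattern + ")", text)
    index = 2 * n - 1  # position of the n-th match inside parts
    if n < 1 or index >= len(parts):
        return text  # fewer than n matches: leave the string alone
    parts[index] = repl
    return "".join(parts)

print(replace_nth_regex("o", "i", "coucoucoucou", 2))
```

Since the matched text is kept in the list, no re-joining with the delimiter is needed; a plain "".join() puts everything back.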
 
Steven D'Aprano

Hi everybody,

I would like to change only the nth occurrence of a pattern in a string.
The problem with "replace" method of strings, and "re.sub" is that we
can only define the number of occurrences to change from the first one.

'ciucou'

What is the best way to change only the nth occurrence (occurrence number
n)?

Step 1: Find the nth occurrence.
Step 2: Change it.


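def findnth(source, target, n):
    # Return the index of the nth occurrence of target, or -1 if there
    # are fewer than n occurrences.
    num = 0
    start = -1
    while num < n:
        start = source.find(target, start+1)
        if start == -1: return -1
        num += 1
    return start

def replacenth(source, old, new, n):
    p = findnth(source, old, n)
    # No nth occurrence: return the string unchanged.
    if p == -1: return source
    return source[:p] + new + source[p+len(old):]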


And in use:
'abcabcWXYZabcabc'
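
A self-contained way to check the two-step approach is sketched below. The names find_nth and replace_nth and the input string are assumptions for illustration; the input was picked so that the result matches the output shown above.

```python
def find_nth(source, target, n):
    """Return the index of the n-th (1-based) occurrence of target, or -1."""
    start = -1
    for _ in range(n):
        # Resume the search one character past the previous hit.
        start = source.find(target, start + 1)
        if start == -1:
            return -1
    return start

def replace_nth(source, old, new, n):
    """Replace only the n-th occurrence of old; return source if there is none."""
    if n < 1:
        return source
    p = find_nth(source, old, n)
    if p == -1:
        return source
    return source[:p] + new + source[p + len(old):]

# Assumed input: five copies of "abc", with the third one replaced.
print(replace_nth("abcabcabcabcabc", "abc", "WXYZ", 3))  # -> abcabcWXYZabcabc
```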


Why this default behavior? For the user, it would be easier to put
re.sub or replace in a loop to change the first n occurrences.

Easier than just calling a function? I don't think so.

I've never needed to replace only the nth occurrence of a string, and I
guess the Python Development team never did either. Or they thought that
the above two functions were so trivial that anyone could write them.
 
Tim Chase

I would like to change only the nth occurrence of a pattern in
a string. The problem with "replace" method of strings, and
"re.sub" is that we can only define the number of occurrences
to change from the first one.

'ciucou'

What is the best way to change only the nth occurrence
(occurrence number n)?

Well, there are multiple ways of doing this, including munging
the regexp to skip over the first instances of a match.
Something like the following untested:

re.sub("((?:[^o]*o){2}[^o]*)o", r"\1i", s, 1)
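
To build that kind of skipping regex for any n, something like the following sketch could work. The helper name sub_nth_by_skipping is hypothetical; it swaps the [^o]* runs for lazy .*? so that the target can be longer than one character, and it treats the target as a literal string via re.escape. It is not tuned for performance on long inputs.

```python
import re

def sub_nth_by_skipping(target, repl, text, n):
    """Replace the n-th (1-based) occurrence of the literal target in text."""
    if n < 1:
        return text
    lit = re.escape(target)
    # Group 1 lazily skips over the first n-1 occurrences plus whatever
    # text precedes the n-th one; the final literal is the n-th occurrence.
    regex = re.compile(r"\A((?:.*?%s){%d}.*?)%s" % (lit, n - 1, lit), re.DOTALL)
    # A function is used as the replacement so repl is inserted verbatim,
    # without backslash escapes in it being interpreted.
    return regex.sub(lambda m: m.group(1) + repl, text, count=1)

print(sub_nth_by_skipping("o", "i", "coucoucoucou", 3))
```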

However, for a more generic solution, you could use something like

import re
class Nth(object):
    def __init__(self, n_min, n_max, replacement):
        #assert n_min <= n_max, \
        #    "Hey, look, I don't know what I'm doing!"
        if n_min > n_max:
            # don't be a dope
            n_min, n_max = n_max, n_min
        self.n_min = n_min
        self.n_max = n_max
        self.replacement = replacement
        self.calls = 0
    def __call__(self, matchobj):
        # re.sub calls this once per match, so self.calls is the match number
        self.calls += 1
        if self.n_min <= self.calls <= self.n_max:
            return self.replacement
        return matchobj.group(0)

s = 'coucoucoucou'
print("Initial:")
print(s)
print("Just positions 3-4:")
print(re.sub('o', Nth(3,4,'i'), s))
for params in [
        (1, 1, 'i'), # just the 1st
        (1, 2, 'i'), # 1-2
        (2, 2, 'i'), # just the 2nd
        (2, 3, 'i'), # 2-3
        (2, 4, 'i'), # 2-4
        (4, 4, 'i'), # just the 4th
        ]:
    print("Nth(%i, %i, %s)" % params)
    print(re.sub('o', Nth(*params), s))
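
The same counting idea can also be written as a closure instead of a class. The sketch below is one way to do it; sub_range is a made-up name, and occurrences are counted from 1 as in the Nth examples above.

```python
import itertools
import re

def sub_range(pattern, repl, text, first, last):
    """Replace matches number first..last (1-based, inclusive) with repl."""
    counter = itertools.count(1)

    def pick(match):
        # re.sub calls this once per match, in left-to-right order.
        k = next(counter)
        if first <= k <= last:
            return repl
        return match.group(0)  # outside the range: keep the matched text

    return re.sub(pattern, pick, text)

print(sub_range('o', 'i', 'coucoucoucou', 3, 4))
```

A fresh counter is created on every call, so the function can be reused without resetting any state.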

Why this default behavior?

Can't answer that one, but with so many easy solutions, it's not
been a big concern of mine.

-tkc
 
Antoon Pardon

Hi everybody,

I would like to change only the nth occurrence of a pattern in a string. The
problem with "replace" method of strings, and "re.sub" is that we can only
define the number of occurrences to change from the first one.

'ciucou'

What is the best way to change only the nth occurrence (occurrence number n)?

Why this default behavior? For the user, it would be easier to put re.sub or
replace in a loop to change the first n occurrences.

I would do it as follows:

1) Change the first n occurrences of the pattern to something that doesn't occur in your string
2) Change the first n-1 of those back
3) Change the remaining one to what you want.
'couciu'
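
Those three steps might be written as follows. This is only a sketch: the function name and the default placeholder "\0" are assumptions, and the placeholder has to be something that occurs in neither the string nor the pattern.

```python
def replace_nth_via_placeholder(s, old, new, n, placeholder="\0"):
    """Replace only the n-th (1-based) occurrence of old using str.replace."""
    if n < 1:
        return s
    if placeholder in s or placeholder in old:
        raise ValueError("placeholder must not occur in the string or pattern")
    # 1) Mark the first n occurrences with the placeholder.
    marked = s.replace(old, placeholder, n)
    # 2) Restore the first n-1 of them.
    marked = marked.replace(placeholder, old, n - 1)
    # 3) Whatever is still marked is the n-th occurrence (if it exists).
    return marked.replace(placeholder, new)

print(replace_nth_via_placeholder('coucou', 'o', 'i', 2))  # -> couciu
```

If the string has fewer than n occurrences, step 2 restores all of them and the string comes back unchanged.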
 
MRAB

Antoon said:
>
> I would do it as follows:
>
> 1) Change the pattern n times to something that doesn't occur in your string
> 2) Change it back n-1 times
> 3) Change the remaining one to what you want.
>
> 'couciu'
>
Sorry for the last posting, but it did occur to me that str.replace()
could grow another parameter 'start', so it would become:

s.replace(old, new[, [start,] end]) -> string

(In Python 2.x the method doesn't accept keyword arguments, so that
isn't a problem.)

If the possible replacements are numbered from 0, then 'start' is the
first one actually to perform and 'end' the one after the last to perform.

The 2-argument form would be s.replace(old, new) with 'start' defaulting
to 0 and 'end' to None => replacing all occurrences, same as now.

The 3-argument form would be s.replace(old, new, end) with 'start'
defaulting to 0 => equivalent to replacing the first 'end' occurrences,
same as now.

The 4-argument form would be s.replace(old, new, start, end) =>
replacing from the 'start'th to before the 'end'th occurrence,
additional behaviour as requested.
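
str.replace() has no such 'start' parameter, so the proposed semantics can only be emulated. A rough sketch with a hypothetical helper, replace_range, numbering the possible replacements from 0 as described above:

```python
def replace_range(s, old, new, start=0, end=None):
    """Emulate the proposed s.replace(old, new, start, end).

    Possible replacements are numbered from 0; those numbered
    start <= k < end are performed (end=None means "to the last one").
    Assumption: old is a non-empty string.
    """
    parts = s.split(old)  # len(parts) - 1 occurrences of old
    out = [parts[0]]
    for k, piece in enumerate(parts[1:]):
        in_range = k >= start and (end is None or k < end)
        out.append(new if in_range else old)
        out.append(piece)
    return "".join(out)

s = 'coucoucoucou'
print(replace_range(s, 'o', 'i', 1, 2))  # only the second 'o' is changed
```

With the defaults it behaves like s.replace(old, new), and with only end given it behaves like s.replace(old, new, end), which mirrors the 2- and 3-argument forms in the proposal.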
 
