Replace Several Items

gjhames · Aug 13, 2008

I want to replace several characters in my string with a single one.
For example, "-", "." and "/" with nothing ("").
I did it like this:
my_string = my_string.replace("-", "").replace(".", "").replace("/", "").replace(")", "").replace("(", "")

But I think it's an ugly way.

What's the better way to do it?

bearophileHUGS · Aug 13, 2008

gjhames:

What's the better way to do it?

Better is a relative term. If by better you mean "faster" (in some
circumstances), then the translate method is your friend: its second
argument is the set of chars to be removed. As its first argument
you can use something like:
"".join(map(chr, xrange(256)))
If your strings are unicode you will need something different (a dict
with None values for the key chars you want to remove).
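
For reference, a minimal sketch of the dict-based form. It assumes Python 3, where every str is Unicode (the thread itself concerns Python 2), and the names DELETE_TABLE and remove_chars are illustrative only:

```python
# Assumes Python 3: str.translate takes a mapping keyed by code point.
# Mapping a code point to None deletes that character from the result.
DELETE_TABLE = {ord(ch): None for ch in "-./()"}


def remove_chars(text):
    """Return text with every character listed in DELETE_TABLE removed."""
    return text.translate(DELETE_TABLE)


print(remove_chars("a-b.c/d(e)"))  # abcde
```

The table is built once and reused, so the per-string cost is a single translate call.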

Bye,
bearophile

Eric Wertman · Aug 13, 2008

I tend to use the re module like so :

import re
my_string = re.sub(r'[\-,./]', '', my_string)

Fredrik Lundh · Aug 13, 2008

Wojtek said:
The regular expression is probably the best way to do it,
but if you really want to use replace, you can also use
the replace method in a loop:

suggested exercise: benchmark re.sub with literal replacement, re.sub
with callback (lambda m: ""), repeated replace, and repeated use of the form

    if ch in my_string:
        my_string = my_string.replace(ch, "")

on representative data.

</F>

bearophileHUGS · Aug 13, 2008

Fredrik Lundh:

suggested exercise: benchmark re.sub with literal replacement, re.sub
with callback (lambda m: ""), repeated replace, and repeated use of the form ....
on representative data.

Please add the translate() solution I suggested, too.

Bye,
bearophile

John Krukoff · Aug 13, 2008

I want to replace several characters in my string with a single one.
For example, "-", "." and "/" with nothing ("").
I did it like this:
my_string = my_string.replace("-", "").replace(".", "").replace("/", "").replace(")", "").replace("(", "")

But I think it's an ugly way.

What's the better way to do it?

The maketrans interface is a bit clunky, but this is what
string.translate is best at:

>>> import string
'other'

It'd be interesting to see where it falls in the benchmarks, though.

It's worth noting that the interface for translate is quite different
for unicode strings.
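
As a rough illustration of the two-argument "table plus characters to delete" shape mentioned above, here is a sketch that assumes Python 3, where that shape survives on bytes objects. It is not the code from this post, and strip_bytes is a hypothetical helper name:

```python
# Assumes Python 3. bytes.translate(table, delete) keeps the two-argument
# shape; passing None as the table means "do not remap any byte".
def strip_bytes(data, unwanted=b"-./()"):
    """Return data with every byte listed in unwanted removed."""
    return data.translate(None, unwanted)


print(strip_bytes(b"-./other"))  # b'other'
```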

Fredrik Lundh · Aug 13, 2008

Wojtek said:
I don't have to, I can anticipate the results.

Chances are that you're wrong.

</F>

John Machin · Aug 14, 2008

On Thu, 14 Aug 2008 00:31:00 +0200, Fredrik Lundh wrote:

At the moment my average is about 0.75 mistakes per
post on comp.lang.python (please, bear with me ;-)).
I strongly believe that the statement I made above won't
make this number rise.

Meta-mistake: Playing poker with the effbot. If he says diffidently
that he'll raise you a dime, it means he's holding five aces.

Clue: The effbot was the original author of the modern (Python 1.6?)
version of the re module.

HTH,
John

Steven D'Aprano · Aug 14, 2008

On Thu, 14 Aug 2008 00:31:00 +0200, Fredrik Lundh wrote:

At the moment my average is about 0.75 mistakes per post on
comp.lang.python (please, bear with me ;-)). I strongly believe that the
statement I made above won't make this number rise.

Okay, is this going to be one of those things where, no matter what the
benchmarks show, you say "I was right, I *did* anticipate the results. I
just anticipated them correctly/incorrectly."?

If so, you get an A+ in pedantry and F- in usefulness *wink*

Knowing full well that in Python it is relatively hard to guess what is
fast and what is slow, I'll make my guess, from fastest to slowest:

1. repeated replace
2. repeated use of the form
   if ch in my_string: my_string = my_string.replace(ch, "")
3. re.sub with literal replacement
4. re.sub with callback (lambda m: "")

Results to follow.

Steven D'Aprano · Aug 14, 2008

Knowing full well that in Python it is relatively hard to guess what is
fast and what is slow, I'll make my guess, from fastest to slowest:

1. repeated replace
2. repeated use of the form
   if ch in my_string: my_string = my_string.replace(ch, "")
3. re.sub with literal replacement
4. re.sub with callback (lambda m: "")

I added an extra test, which I expected to be fastest of all: using the
string.translate() function.

Here are my results, as generated with the timeit module under Python 2.5:

$ python delchars.py
Replacing 72 chars from a string of length 216
[(5.3256440162658691, 'delchars5'), (10.688904047012329, 'delchars2'),
(10.85448694229126, 'delchars1'), (67.739475965499878, 'delchars3'),
(120.5037829875946, 'delchars4')]

Based on these results, the fastest to slowest techniques are:

1. string translate (delchars5)
2. repeated replace with a test (delchars2)
3. repeated replace without a test (delchars1)
4. re.sub with literal replacement (delchars3)
5. re.sub with callback (delchars4)

However the two versions using replace are quite close, and possibly not
significant. I imagine that it would be easy to find test cases where
they were in the opposite order.

While I'm gratified that my prediction was so close to the results I
found, I welcome any suggestions to better/faster/more efficient code.

Test code follows:

==================================================

import re, string

def delchars1(s, chars):
    # repeated replace, no membership test
    for c in chars:
        s = s.replace(c, '')
    return s

def delchars2(s, chars):
    # repeated replace, skipping chars that do not occur
    for c in chars:
        if c in s:
            s = s.replace(c, '')
    return s

def delchars3(s, chars):
    # re.sub with a literal replacement
    chars = re.escape(chars)
    x = re.compile(r'[%s]' % chars)
    return x.sub('', s)

def delchars4(s, chars):
    # re.sub with a callback
    chars = re.escape(chars)
    x = re.compile(r'[%s]' % chars)
    return x.sub(lambda m: '', s)

def delchars5(s, chars):
    # string.translate with an identity table and a delete set (Python 2)
    return string.translate(s, string.maketrans('', ''), chars)

funcs = [delchars1, delchars2, delchars3, delchars4, delchars5]

def test_same(s, chars, known_result):
    results = [f(s, chars) for f in funcs]
    for i in range(len(results)):
        # compare each function's own result, not the whole list
        if results[i] != known_result:
            msg = "function %s incorrectly gives %s" \
                % (funcs[i], results[i])
            raise AssertionError(msg)

s = "abcd.abcd-abcd/abcd"
chars = ".-/?"
test_same(s, chars, "abcd"*4)

# try something a little bigger
s = s*2 + "abcd..--//" + "a.b.c.d.a-b-c-d-a/b/c/d/"
s *= 3
test_same(s, chars, "abcd"*36)

# now do the timing tests

from timeit import Timer
t1 = Timer("delchars1(s, chars)",
           "from __main__ import delchars1, s, chars")
t2 = Timer("delchars2(s, chars)",
           "from __main__ import delchars2, s, chars")
t3 = Timer("delchars3(s, chars)",
           "from __main__ import delchars3, s, chars")
t4 = Timer("delchars4(s, chars)",
           "from __main__ import delchars4, s, chars")
t5 = Timer("delchars5(s, chars)",
           "from __main__ import delchars5, s, chars")

times = [min(t.repeat()) for t in (t1, t2, t3, t4, t5)]
results = zip(times, [f.__name__ for f in funcs])
results.sort()

n = sum(s.count(c) for c in chars)
print "Replacing %d chars from a string of length %d" % (n, len(s))
print results

==================================================

Fredrik Lundh · Aug 14, 2008

Steven said:
> While I'm gratified that my prediction was so close to the results I
> found, I welcome any suggestions to better/faster/more efficient code.

more things to try:

code tweaks:

- Factor out the creation of the regular expression from the tests:
"escape" and "compile" are relatively expensive, and neither throw-away
code (using the RE function forms) nor production code will end up doing
them both for each string.

- Same with the translation table for "translate".

- Use Unicode strings instead of byte strings (we're moving towards 3.0,
after all).

test data variations:

- Try dropping the number of actual replacements and see what happens --
if you're escaping user-provided data (e.g. HTML), for example, it's not
that unlikely that you end up doing only a few replacements for each
string you're processing, or no replacements at all.

- Also try shorter and longer strings ("human-sized" data is often
provided in shorter chunks than 216 characters per string; the typical
size and distribution depends on your actual application, of course).

Unicode will affect translate more than the others; the last two will
most likely affect in-replace instead (that approach gets faster the
shorter the strings are, and the fewer calls to replace that you
actually end up doing).
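
A minimal sketch of how these tweaks might be applied, assuming Python 3 (so all strings are Unicode) rather than the Python 2.5 used for the posted numbers. The function names, the SAMPLES data and the run helper are made up for illustration; the pattern and the translation table are built once, outside the timed calls:

```python
import re
import timeit

CHARS = ".-/?"
# Built once, so the timed calls measure only the substitution itself.
PATTERN = re.compile("[%s]" % re.escape(CHARS))
TABLE = str.maketrans("", "", CHARS)


def with_replace(s):
    for c in CHARS:
        if c in s:
            s = s.replace(c, "")
    return s


def with_regex(s):
    return PATTERN.sub("", s)


def with_translate(s):
    return s.translate(TABLE)


# Hypothetical test data: vary both the length and the number of hits.
SAMPLES = {
    "short, no hits": "abcd",
    "short, few hits": "ab.cd",
    "long, many hits": "a.b-c/d" * 100,
}


def run(number=2000):
    """Time every function on every sample; return (label, name, seconds)."""
    rows = []
    for label, text in SAMPLES.items():
        for func in (with_replace, with_regex, with_translate):
            seconds = timeit.timeit(lambda: func(text), number=number)
            rows.append((label, func.__name__, seconds))
    return rows


if __name__ == "__main__":
    for label, name, seconds in run():
        print("%-16s %-15s %.4f s" % (label, name, seconds))
```

The absolute numbers depend on the interpreter and machine, so only the relative ordering within one run is meaningful.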

Finally, if you want the sub-lambda form to look better, try inserting a
character before or after each special character using a template string
or a lambda (e.g. a backslash).
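
The two forms mentioned here could look roughly like this. This is a sketch assuming Python 3; the character set in SPECIAL and both function names are hypothetical:

```python
import re

# Hypothetical set of "special" characters to be escaped.
SPECIAL = re.compile(r"[-./]")


def escape_with_template(s):
    # In the template, \\ is a literal backslash and \g<0> is the whole match.
    return SPECIAL.sub(r"\\\g<0>", s)


def escape_with_callback(s):
    # The callback builds the same replacement by hand for every match.
    return SPECIAL.sub(lambda m: "\\" + m.group(0), s)


print(escape_with_template("a.b-c/d"))
print(escape_with_callback("a.b-c/d"))
```

Both calls produce the same string, so timing them against each other isolates the cost of the callback.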

</F>

Fredrik Lundh · Aug 14, 2008

John said:
Clue: The effbot was the original author of the modern (Python 1.6?)
version of the re module.

And the author of the "in" and "replace" implementations in Python 2.5.

</F>

M.-A. Lemburg · Aug 14, 2008

The maketrans interface is a bit clunky, but this is what
string.translate is best at:

>>> import string
'other'

It'd be interesting to see where it falls in the benchmarks, though.

It's worth noting that the interface for translate is quite different
for unicode strings.

Right. Unicode .translate() uses a dictionary for defining the
mapping.

Another approach is to use the re module:

>>> import re
>>> re.sub('[-./()]', '', '-./other')
'other'

--
Marc-Andre Lemburg
eGenix.com

