Deleting specific characters from a string

  • Thread starter Behrang Dadsetan

Behrang Dadsetan

Hi all,

I would like to delete specific characters from a string.
As an example, I would like to delete all of the '@' and '&' characters in the string
'You are ben@orange?enter&your&code' so that it becomes
'You are benorange?enteryourcode'.

So far I have been doing it like:
str = 'You are ben@orange?enter&your&code'
str = ''.join([ c for c in str if c not in ('@', '&')])

but that looks so ugly. I am hoping to see nicer examples to achieve
the above.

Thanks.
Ben.
 

Matt Shomphe

Maybe a new method should be added to the str class, called "remove".
It would take a list of characters and remove them from the string:


class RemoveString(str):
    def __init__(self, s=None):
        str.__init__(self, s)
    def remove(self, chars):
        s = self
        for c in chars:
            s = s.replace(c, '')
        return s

if __name__ == '__main__':
    r = RemoveString('abc')
    e = r.remove('c')
    print r, e
    # prints "abc ab" -- it's not "in place" removal

M@
 

John Hunter

Matt> Maybe a new method should be added to the str class, called
Matt> "remove". It would take a list of characters and remove
Matt> them from the string:

You can use string translate for this, which is shorter and faster
than using the loop.

class rstr(str):
    _allchars = "".join([chr(x) for x in range(256)])
    def remove(self, chars):
        return self.translate(self._allchars, chars)

me = rstr('John Hunter')
print me.remove('ohn')

Also, you don't need to define a separate __init__, since you are not
overriding the str default.

JDH
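
For anyone reading this on Python 3: the two-argument `translate(table, deletechars)` call used above is not available on `str` there. A minimal sketch of the same idea, assuming Python 3 and using the three-argument form of `str.maketrans` (the function name `remove_chars` is made up for illustration and is not from the thread):

```python
def remove_chars(s, chars):
    """Return s with every character that appears in chars deleted."""
    # The third argument of str.maketrans lists characters to map to None;
    # str.translate then drops those characters from the result.
    table = str.maketrans('', '', chars)
    return s.translate(table)


print(remove_chars('John Hunter', 'ohn'))
print(remove_chars('You are ben@orange?enter&your&code', '@&'))
```

As with the subclass above, this returns a new string; the original is left untouched.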
 

Donn Cave

> Maybe a new method should be added to the str class, called "remove".
> It would take a list of characters and remove them from the string:

Check out the translate function - that's what its optional
deletions argument is for.

Donn Cave, (e-mail address removed)
 

Behrang Dadsetan

Donn said:
Check out the translate function - that's what its optional
deletions argument is for.
are also ugly...

The first version is completely unreadable. I guess my initial example
''.join([ c for c in str if c not in ('@', '&')]) was easier to read
than the translate (who would guess -without having to peek in the
documentation of translate- that that line deletes @ and &?!) but I am
not sure ;)

while the second becomes acceptable. The examples you gave me use the
string module.
I think I read somewhere that string methods should be preferred over
the functions of the string module. Is that right?

Thanks anyhow, I will go for the replace(something, '') method.
Ben.
 
Walter Dörwald

What about the following:

str = 'You are ben@orange?enter&your&code'
str = filter(lambda c: c not in "@&", str)

Bye,
Walter Dörwald
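
A note for Python 3 readers: there `filter` returns an iterator rather than a string, so the characters have to be joined back together. A minimal sketch under that assumption (the names `text`, `unwanted` and `cleaned` are illustrative only, chosen so the builtin `str` is not rebound):

```python
text = 'You are ben@orange?enter&your&code'
unwanted = set('@&')

# filter() yields the characters that pass the test; join() rebuilds the string.
cleaned = ''.join(filter(lambda c: c not in unwanted, text))
print(cleaned)
```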
 

Behrang Dadsetan

def isAcceptableChar(character):
    return charachter in "@&"

str = filter(isAcceptableChar, str)

is what I am finally going to use.
I do not feel lambdas are so readable, unless one has serious experience in
using them and Python in general. I feel it is acceptable to add a named
function that documents with its name what it is doing there.

But your example would probably have been my choice if I was more
familiar with that type of use and the potential readers of my code were
also familiar with it. Many thanks!

Ben.
 
Walter Dörwald

Behrang said:
> I do not feel lambdas are so readable, unless one has serious experience in
> using them and Python in general. I feel it is acceptable to add a named
> function that documents with its name what it is doing there.

You're not the only one with this feeling. Compare "the eff-bot's
favourite lambda refactoring rule":

http://groups.google.de/[email protected]

Bye,
Walter Dörwald
 

John Hunter

Behrang> is going to finally be what I am going to use. I not
Behrang> feel lambdas are so readable, unless one has serious
Behrang> experience in using them and python in general. I feel it
Behrang> is acceptable to add a named method that documents with
Behrang> its name what it is doing there.

If you want to go the functional programming route, you can generalize
your function somewhat using a callable class:

class remove_char:
    def __init__(self,remove):
        self.remove = dict([ (c,1) for c in remove])

    def __call__(self,c):
        return not self.remove.has_key(c)

print filter(remove_char('on'), 'John Hunter')

Cheers,
Jh Huter
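
On Python 3, `dict.has_key` is gone and `filter` no longer returns a string, so the callable-class idea needs small adjustments. A minimal sketch assuming Python 3 (the class name `KeepUnless` is hypothetical, not from the thread):

```python
class KeepUnless:
    """Callable predicate: true for characters NOT in the removal set."""

    def __init__(self, remove):
        # A frozenset gives constant-time membership tests, like the dict above.
        self.remove = frozenset(remove)

    def __call__(self, c):
        return c not in self.remove


# filter() returns an iterator on Python 3, hence the join().
print(''.join(filter(KeepUnless('on'), 'John Hunter')))
```

The instance can be created once and reused across many strings, which is the main point of wrapping the removal set in an object.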
 

Bengt Richter

Walter said:
> str = 'You are ben@orange?enter&your&code'
> str = filter(lambda c: c not in "@&", str)

Aaack! I cringe seeing builtin str name rebound like that ;-/

Behrang said:
> def isAcceptableChar(character):
>     return charachter in "@&"

    return character not in "@&"

> str = filter(isAcceptableChar, str)
>
> is going to finally be what I am going to use.

That's not going to be anywhere near as fast as Donn's translate version.

> I not feel lambdas are so readable, unless one has serious experience in
> using them and python in general. I feel it is acceptable to add a named
> method that documents with its name what it is doing there.
>
> But your example would probably have been my choice if I was more
> familiar with that type of use and the potential readers of my code were
> also familiar with it. Many thanks!

IMO, if you are going to define a function like isAcceptableChar, only to use it
with filter, why not write a function to do the whole job, and whose invocation
reads well, while hiding Donn's fast translate version? E.g., substituting the literal
value of string.maketrans('',''):

====< removechars.py >========================================================
def removeChars(s, remove=''):
    return s.translate(
        '\x00\x01\x02\x03\x04\x05\x06\x07\x08\t\n\x0b\x0c\r\x0e\x0f'
        '\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f'
        ' !"#$%&\'()*+,-./'
        '0123456789:;<=>?'
        '@ABCDEFGHIJKLMNO'
        'PQRSTUVWXYZ[\\]^_'
        '`abcdefghijklmno'
        'pqrstuvwxyz{|}~\x7f'
        '\x80\x81\x82\x83\x84\x85\x86\x87\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f'
        '\x90\x91\x92\x93\x94\x95\x96\x97\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f'
        '\xa0\xa1\xa2\xa3\xa4\xa5\xa6\xa7\xa8\xa9\xaa\xab\xac\xad\xae\xaf'
        '\xb0\xb1\xb2\xb3\xb4\xb5\xb6\xb7\xb8\xb9\xba\xbb\xbc\xbd\xbe\xbf'
        '\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf'
        '\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd7\xd8\xd9\xda\xdb\xdc\xdd\xde\xdf'
        '\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7\xe8\xe9\xea\xeb\xec\xed\xee\xef'
        '\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf7\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff'
        , remove)

if __name__ == '__main__':
    import sys
    args = sys.argv[1:]
    fin = sys.stdin; fout=sys.stdout; remove='' # defaults
    while args:
        arg = args.pop(0)
        if arg == '-fi': fin = file(args.pop(0))
        elif arg == '-fo': fout = file(args.pop(0))
        else: remove = arg
    for line in fin:
        fout.write(removeChars(line, remove))
==============================================================================
Not tested beyond what you see here ;-)

[16:40] C:\pywk\ut>echo "'You are ben@orange?enter&your&code'" |python removechars.py "@&"
"'You are benorange?enteryourcode'"

[16:41] C:\pywk\ut>echo "'You are ben@orange?enter&your&code'" |python removechars.py aeiou
"'Y r bn@rng?ntr&yr&cd'"

Copying a snip above to the clipboard and filtering that with no removes and then (lower case) vowels:

[16:41] C:\pywk\ut>getclip |python removechars.py
>I not feel lambdas are so readable, unless one has serious experience in
>using them and python in general. I feel it is acceptable to add a named
>method that documents with its name what it is doing there.
>
>But your example would probably have been my choice if I was more
>familiar with that type of use and the potential readers of my code were
>also familiar with it. Many thanks!

[16:42] C:\pywk\ut>getclip |python removechars.py aeiou
>I nt fl lmbds r s rdbl, nlss n hs srs xprnc n
>sng thm nd pythn n gnrl. I fl t s ccptbl t dd nmd
>mthd tht dcmnts wth ts nm wht t s dng thr.
>
>Bt yr xmpl wld prbbly hv bn my chc f I ws mr
>fmlr wth tht typ f s nd th ptntl rdrs f my cd wr
>ls fmlr wth t. Mny thnks!


Regards,
Bengt Richter
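
For Python 3 readers, the closest analogue of the two-argument `translate` call above lives on `bytes`, where passing `None` as the table means "delete only", so no 256-character identity table is needed. A minimal sketch, assuming the input is byte data rather than text (the function name `remove_bytes` is hypothetical):

```python
def remove_bytes(data, remove=b''):
    """Return data with every byte value found in remove deleted."""
    # With table=None, bytes.translate performs no mapping and only
    # drops the bytes listed in the second argument.
    return data.translate(None, remove)


line = b"'You are ben@orange?enter&your&code'"
print(remove_bytes(line, b'@&'))
print(remove_bytes(line, b'aeiou'))
```

For `str` input on Python 3, a table built with `str.maketrans('', '', chars)` plays the same role.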
 

Jeff Hinrichs

I've been following this thread, and on a whim I built a test harness to
time the different ideas that have been put forth in this thread. I will
post complete results tomorrow on the web but the short version is that
using the .replace method is the overall champ by quite a bit. Below is the
function I tested against the others in the harness:

def stringReplace(s,c):
    """Remove any occurrences of characters in c, from string s
    s - string to be filtered, c - characters to filter"""
    for a in c:
        s = s.replace(a,'')
    return s

It wins also by being easy to understand, no filter or lambda. Not that I
have anything against filter or lambda, but when the speediest method is the
most readable, that solution is definitely the Pythonic champ. :)

-Jeff Hinrichs
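
Jeff's harness is not shown in the thread, so the following is only a sketch of how such a comparison could be set up with the standard `timeit` module, assuming Python 3. The function names, the sample input and the repetition count are all made up for illustration, and the ranking it prints will depend on the Python version, the input length and how many characters are removed.

```python
import timeit


def by_replace(s, chars):
    # One str.replace pass per character to remove.
    for ch in chars:
        s = s.replace(ch, '')
    return s


def by_translate(s, chars):
    # Single pass using a deletion table.
    return s.translate(str.maketrans('', '', chars))


def by_join(s, chars):
    # Character-by-character filtering in Python code.
    return ''.join(c for c in s if c not in chars)


SAMPLE = 'You are ben@orange?enter&your&code' * 100
CANDIDATES = (by_replace, by_translate, by_join)


def time_all(number=200):
    """Return {function name: seconds for `number` calls} for each candidate."""
    results = {}
    for func in CANDIDATES:
        results[func.__name__] = timeit.timeit(
            lambda: func(SAMPLE, '@&'), number=number)
    return results


if __name__ == '__main__':
    for name, seconds in sorted(time_all().items(), key=lambda kv: kv[1]):
        print('%-14s %.4f s' % (name, seconds))
```

Checking that all candidates return the same string before timing them is a cheap way to avoid comparing functions that do different work.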
 

Behrang Dadsetan

Well, I really had nothing against the filter, but this solution also
looks acceptable.

Thanks.
Ben.
 

Paul Rudin

> I've been following this thread, and on a whim I built a test
> harness to time the different ideas that have been put forth in
> this thread. I will post complete results tomorrow on the web
> but the short version is that using the .replace method is the
> overall champ by quite a bit. Below is the function I tested
> against the others in the harness:
> def stringReplace(s,c):
>     """Remove any occurrences of characters in c, from string s
>     s - string to be filtered, c - characters to filter"""
>     for a in c:
>         s = s.replace(a,'')
>     return s
> It wins also by being easy to understand, no filter or lambda.
> Not that I have anything against filter or lambda, but when the
> speediest method is the most readable, that solution is
> definitely the Pythonic champ. :)

I haven't been following this thread closely but isn't a regexp the
obvious way to do this? I'd expect it to be faster than your solution
- particularly on large input (although I haven't actually
tried). Arguably it's more pythonic too :)

re.compile(r).sub('',s)

where r is the obvious disjunctive regexp mentioning each of the
characters you want to remove. If you want to construct such a regexp
from a list of characters:

r= reduce(lambda x,y: x+'|'+y, c,'')[1:]

So putting it all together as an alternative version of your function:


!!warning - untested code!!

import re

def stringReplace(s,c):
    r= reduce(lambda x,y: x+'|'+y, c,'')[1:]
    return re.compile(r).sub('',s)
 

Paul Rudin

> r= reduce(lambda x,y: x+'|'+y, c,'')[1:]

It occurs to me that this isn't what you want if c contains special
regexp characters, so really it should be:

r= reduce(lambda x,y: re.escape(x)+'|'+re.escape(y), c,'')[1:]
> So putting it all together as an alternative version of your
> fuction:

> !!warning - untested code!!
> import re
> def stringReplace(s,c):
> r= reduce(lambda x,y: x+'|'+y, c,'')[1:]

r= reduce(lambda x,y: re.escape(x)+'|'+re.escape(y), c,'')[1:]
 

Paul Rudin

> r= reduce(lambda x,y: x+'|'+y, c,'')[1:]

It occurs to me that this isn't what you want if c contains special
regexp characters, so really it should be:

r= reduce(lambda x,y: x+'|'+y, map(re.escape,c),'')[1:]
> So putting it all together as an alternative version of your
> fuction:

> !!warning - untested code!!
> import re
> def stringReplace(s,c):
> r= reduce(lambda x,y: x+'|'+y, c,'')[1:]

r= reduce(lambda x,y: x+'|'+y, map(re.escape,c),'')[1:]
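
Since the code above is flagged as untested, here is a small runnable sketch of the same regexp idea, assuming Python 3 (where `reduce` is no longer a builtin). It escapes each character individually, as in the `map(re.escape, c)` version, and joins the alternatives with `'|'.join` instead of `reduce`. The function name `remove_with_regex` is made up for illustration.

```python
import re


def remove_with_regex(s, chars):
    """Delete every character of chars from s using one compiled pattern."""
    if not chars:
        return s  # nothing to remove
    # Escape each character on its own, then join the alternatives with '|'.
    pattern = '|'.join(re.escape(c) for c in chars)
    return re.compile(pattern).sub('', s)


print(remove_with_regex('a.b|c*d', '.|*'))
print(remove_with_regex('You are ben@orange?enter&your&code', '@&'))
```

Escaping per character matters: escaping the accumulated string on every step would also escape the `|` separators and any backslashes added earlier.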
 

Bengt Richter

Bengt said:
> ====< removechars.py >========================================================
> def removeChars(s, remove=''):
>     return s.translate(
> <snip>

> It looks to me like serious overkill. If I would put in a method like
> that somewhere in my code, my colleagues would never talk to me again ;)

What looks like overkill? The 256-character string literal? It is easy to explain,
and if you want utility functions like this, you can put them in a module and
your colleagues may never see beyond the import statement and the calls, unless
they are interested. It should have a doc-string though, which could advise using
help(str.translate) for further info ;-)

Note that the code for removeChars is not much to execute (29 bytecode bytes?), since the constant
is pre-defined (and the 9 bytes for SET_LINENOs could be optimized out):
0 SET_LINENO 1

3 SET_LINENO 2
6 LOAD_FAST 0 (s)
9 LOAD_ATTR 1 (translate)
12 LOAD_CONST 1 ('\x00\x01\x02\x03\x04\x05\x06\x07\x08\t\n\x0b\x0c\r\x0e\
x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f !"#$%&\'()*+,-./0123456789:;
<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~\x7f\x80\x81\x82\x83\x84\x85
\x86\x87\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f\x90\x91\x92\x93\x94\x95\x96\x97\x98\x99\x9a\x9b\x9c\x9d
\x9e\x9f\xa0\xa1\xa2\xa3\xa4\xa5\xa6\xa7\xa8\xa9\xaa\xab\xac\xad\xae\xaf\xb0\xb1\xb2\xb3\xb4\xb5
\xb6\xb7\xb8\xb9\xba\xbb\xbc\xbd\xbe\xbf\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7\xc8\xc9\xca\xcb\xcc\xcd
\xce\xcf\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd7\xd8\xd9\xda\xdb\xdc\xdd\xde\xdf\xe0\xe1\xe2\xe3\xe4\xe5
\xe6\xe7\xe8\xe9\xea\xeb\xec\xed\xee\xef\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf7\xf8\xf9\xfa\xfb\xfc\xfd
\xfe\xff')

15 SET_LINENO 19
18 LOAD_FAST 1 (remove)
21 CALL_FUNCTION 2
24 RETURN_VALUE
25 LOAD_CONST 0 (None)
28 RETURN_VALUE

(I believe the last two codes are never executed, just vestigial code-generation by-product
for the case of non-explicit returns at the end).

If you want to get rid of the extras, you can do python -OO and use lambda to get rid of
the vestigial return code:
... '\x00\x01\x02\x03\x04\x05\x06\x07\x08\t\n\x0b\x0c\r\x0e\x0f'
<snip>
... '\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf7\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff'
... , remove)
Gets you:
0 LOAD_FAST 0 (s)
3 LOAD_ATTR 1 (translate)
6 LOAD_CONST 1 ('\x00\x01\x02\x03\x04\x05\x06\x07\x08\t\n\x0b\x0c\r\x0e\
x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f !"#$%&\'()*+,-./0123456789:;
<snip>
\xfe\xff')
9 LOAD_FAST 1 (remove)
12 CALL_FUNCTION 2
15 RETURN_VALUE

What do you mean by overkill? ;-)

Regards,
Bengt Richter
 
