strip() using strings instead of chars

Christoph Zwerschke · Jul 11, 2008

In Python programs, you will quite frequently find code like the
following for removing a certain prefix from a string:

if url.startswith('http://'):
    url = url[7:]

Similarly for stripping suffixes:

if filename.endswith('.html'):
    filename = filename[:-5]

My problem with this is that counting the characters of the prefix or
suffix is cumbersome and error-prone. If you want to change it from
'http://' to 'https://', you must not forget to change the 7 to an 8.
If you write len('http://') instead of the 7, you can see that this is
actually a DRY problem.
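
For reference, here is a minimal sketch of the len()-based idiom for both ends, wrapped in two hypothetical helpers (strip_prefix and strip_suffix are illustrative names, not existing functions). It also shows one trap of the suffix version: s[:-len(suffix)] returns '' when the suffix is empty, because s[:-0] is s[:0].

```python
def strip_prefix(s, prefix):
    """Remove prefix from s once, if present (hypothetical helper)."""
    if s.startswith(prefix):
        return s[len(prefix):]
    return s


def strip_suffix(s, suffix):
    """Remove suffix from s once, if present (hypothetical helper)."""
    # Guard against suffix == '': s[:-0] would be s[:0] == '', not s.
    if suffix and s.endswith(suffix):
        return s[:-len(suffix)]
    return s


print(strip_prefix('https://example.com', 'https://'))  # example.com
print(strip_suffix('index.html', '.html'))               # index
print(strip_suffix('index.html', ''))                    # index.html
```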

Things get even worse if you have several prefixes to consider:

if url.startswith('http://'):
    url = url[7:]
elif url.startswith('https://'):
    url = url[8:]

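You can't make use of url.startswith(('http://', 'https://')) here,
because it only tells you that one of the prefixes matched, not which
one, so you still don't know how many characters to cut off.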
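
One way around this with current Python is to find the matching prefix first and then slice by its length. This is a sketch with a hypothetical helper name, not a standard function:

```python
def strip_any_prefix(s, prefixes):
    """Remove the first matching prefix in `prefixes` from s (hypothetical).

    str.startswith() accepts a tuple but only returns True/False, so we
    look up *which* prefix matched to know how much to slice off.
    """
    match = next((p for p in prefixes if s.startswith(p)), None)
    if match is None:
        return s
    return s[len(match):]


print(strip_any_prefix('https://titi.com', ('http://', 'https://')))  # titi.com
print(strip_any_prefix('tutu.com', ('http://', 'https://')))          # tutu.com
```

Since the first match wins, a candidate that is itself a prefix of another one (say 'a' and 'ab') should be listed after the longer one.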

Here is another concrete example taken from the standard lib:

if chars.startswith(BOM_UTF8):
    chars = chars[3:].decode("utf-8")

This avoids hardcoding BOM_UTF8 itself, but its length is still
hardcoded, and the programmer had to know it or look it up when writing
this line.
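
A sketch of the same check without the magic 3, written for Python 3 (where the data is bytes and the constant is codecs.BOM_UTF8); the sample payload is made up. The 'utf-8-sig' codec, which drops a leading BOM while decoding, is shown as an alternative for this particular case:

```python
import codecs

raw = codecs.BOM_UTF8 + 'héllo'.encode('utf-8')

# Slice by the length of the BOM constant instead of a hardcoded 3.
body = raw
if body.startswith(codecs.BOM_UTF8):
    body = body[len(codecs.BOM_UTF8):]
text = body.decode('utf-8')

# Alternatively, 'utf-8-sig' removes a leading BOM while decoding
# (input without a BOM decodes just as it would with 'utf-8').
text_sig = raw.decode('utf-8-sig')

print(text, text_sig)  # héllo héllo
```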

So my suggestion is to add another string method, say "stripstr", that
behaves like "strip" but strips *strings* instead of *characters*
(similarly for lstrip and rstrip). Then in the case above, you could
simply write url = url.lstripstr('http://') or
url = url.lstripstr(('http://', 'https://')).

The new function would actually subsume the old strip function: you
would have strip('aeiou') == stripstr(set('aeiou')).
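
Since methods can't be added to the built-in str type, here is a rough sketch of the proposed semantics as plain functions. The names lstripstr, rstripstr and stripstr come from the proposal and are hypothetical, not an existing API. Assumptions: the argument may be one string or an iterable of strings, removal repeats as with strip(), empty strings are ignored (they would match forever), and longer candidates are tried first so the result does not depend on set ordering. The last line checks the equivalence with strip() claimed above:

```python
def _candidates(strs):
    """Normalize the argument: one string or an iterable of strings."""
    if isinstance(strs, str):
        strs = (strs,)
    # Drop empty strings and try longer candidates first.
    return sorted((c for c in strs if c), key=len, reverse=True)


def lstripstr(s, strs):
    """Repeatedly remove any of `strs` from the start of s (hypothetical)."""
    cands = _candidates(strs)
    while True:
        for c in cands:
            if s.startswith(c):
                s = s[len(c):]
                break
        else:  # no candidate matched: done
            return s


def rstripstr(s, strs):
    """Repeatedly remove any of `strs` from the end of s (hypothetical)."""
    cands = _candidates(strs)
    while True:
        for c in cands:
            if s.endswith(c):
                s = s[:-len(c)]
                break
        else:  # no candidate matched: done
            return s


def stripstr(s, strs):
    """Strip `strs` from both ends of s, like str.strip() does for chars."""
    return rstripstr(lstripstr(s, strs), strs)


print(lstripstr('https://titi.com', ('http://', 'https://')))  # titi.com
print(rstripstr('archive.tar.gz', '.gz'))                      # archive.tar
print('queue'.strip('aeiou') == stripstr('queue', set('aeiou')))  # True
```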

Instead of a new function, we could also add another parameter to strip
(lstrip, rstrip) for passing strings or changing the behavior. Or we
could create functions with the same signature as startswith and
endswith which, instead of only checking whether the string starts or
ends with the substring, remove it (startswith and endswith have
additional "start" and "end" index parameters that may be useful).

Or did I overlook anything and there is already a good idiom for this?

Btw, in most other languages "strip" is called "trim" and behaves like
Python's strip, i.e. it treats the parameter as a set of chars. There is
one notable exception: in MySQL, trim behaves like the stripstr proposed
above (unlike in SQLite, PostgreSQL and Oracle).
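
To make the difference concrete in Python terms (the sample string is made up, and the while loop only emulates string semantics; it is not an existing API):

```python
s = 'barxxyz'

# Python's rstrip treats 'xyz' as a *set* of characters: every trailing
# x, y or z is removed, in any order.
print(s.rstrip('xyz'))  # bar

# String semantics (as in the proposed stripstr) remove the whole
# substring 'xyz', repeatedly, and nothing else.
suffix = 'xyz'
t = s
while t.endswith(suffix):
    t = t[:-len(suffix)]
print(t)  # barx
```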

-- Christoph

Bruno Desthuilliers · Jul 11, 2008

Christoph Zwerschke wrote:

In Python programs, you will quite frequently find code like the
following for removing a certain prefix from a string:

if url.startswith('http://'):
    url = url[7:]

DRY/SPOT violation. Should be written as:

prefix = 'http://'
if url.startswith(prefix):
    url = url[len(prefix):]

(snip)

My problem with this is that it's cumbersome and error prone to count
the number of chars of the prefix or suffix.

Cf. above.

If you want to change it
from 'http://' to 'https://', you must not forget to change the 7 to 8.
If you write len('http://') instead of the 7, you see this is actually
a DRY problem.

Cf. above.

Things get even worse if you have several prefixes to consider:

if url.startswith('http://'):
    url = url[7:]
elif url.startswith('https://'):
    url = url[8:]

You can't make use of url.startswith(('http://', 'https://')) here.

for prefix in ('http://', 'https://'):
    if url.startswith(prefix):
        url = url[len(prefix):]
        break

For more complex use cases, you may want to consider regexps,
specifically re.sub:

>>> import re
>>> pat = re.compile(r"(^https?://|\.txt$)")
>>> urls = ['http://toto.com', 'https://titi.com', 'tutu.com', 'file://tata.txt']
>>> [pat.sub('', u) for u in urls]
['toto.com', 'titi.com', 'tutu.com', 'file://tata']
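
If the prefixes come from a list rather than a hand-written pattern, a regex can be built from them. This is a sketch with a hypothetical helper name; re.escape() keeps characters such as '.' or '?' in a prefix from acting as regex syntax, and longer prefixes are placed first because Python's alternation takes the first alternative that matches. Unlike the pattern above, it only handles prefixes:

```python
import re


def make_prefix_stripper(prefixes):
    """Return a function removing one leading match of any prefix (hypothetical)."""
    alternatives = '|'.join(
        re.escape(p) for p in sorted(prefixes, key=len, reverse=True))
    pattern = re.compile('^(?:%s)' % alternatives)
    return lambda s: pattern.sub('', s, count=1)


strip_scheme = make_prefix_stripper(['http://', 'https://', 'file://'])
print([strip_scheme(u) for u in ['http://toto.com', 'https://titi.com',
                                 'tutu.com', 'file://tata.txt']])
# ['toto.com', 'titi.com', 'tutu.com', 'tata.txt']
```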

Not to dismiss your suggestion, but I thought you might like to know how
to solve your problem with what's currently available !-)

Christoph Zwerschke · Jul 11, 2008

Bruno said:
DRY/SPOT violation. Should be written as:

prefix = 'http://'
if url.startswith(prefix):
    url = url[len(prefix):]

That was exactly my point. This formulation is a bit better, but it
still violates DRY, because you need to type "prefix" twice. It is
exactly this idiom that I see so often and that I wanted to simplify.
Your suggestions work, but I somehow feel such a simple task should have
a simpler formulation in Python, i.e. something like

url = url.lstripstr(('http://', 'https://'))

instead of

for prefix in ('http://', 'https://'):
    if url.startswith(prefix):
        url = url[len(prefix):]
        break

-- Christoph

Marc 'BlackJack' Rintsch · Jul 11, 2008

Christoph Zwerschke said:
Bruno said:

DRY/SPOT violation. Should be written as:

prefix = 'http://'
if url.startswith(prefix):
    url = url[len(prefix):]

That was exactly my point. This formulation is a bit better, but it
still violates DRY, because you need to type "prefix" two times. It is
exactly this idiom that I see so often and that I wanted to simplify.
Your suggestions work, but I somehow feel such a simple task should have
a simpler formulation in Python, i.e. something like

url = url.lstripstr(('http://', 'https://'))

I would prefer a name like `remove_prefix()` instead of a variant with
`strip` and abbreviations in it.
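
For what it's worth, Python 3.9 later added methods under almost this name: str.removeprefix() and str.removesuffix() (PEP 616). A short illustration follows; note that they take a single string only and remove it at most once, so several candidates still need a loop (remove_any_prefix is a hypothetical helper):

```python
# Requires Python 3.9+, where PEP 616 added str.removeprefix() and
# str.removesuffix().
print('https://titi.com'.removeprefix('https://'))  # titi.com
print('index.html'.removesuffix('.html'))           # index

# Both remove their argument at most once, unlike strip().
print('aaab'.removeprefix('a'))                     # aab


def remove_any_prefix(s, prefixes):
    """Remove the first of `prefixes` that s starts with (hypothetical helper).

    removeprefix() accepts a single string, not a tuple, so several
    candidates still need a loop.
    """
    for prefix in prefixes:
        stripped = s.removeprefix(prefix)
        if stripped != s:  # this prefix matched (and was non-empty)
            return stripped
    return s


print(remove_any_prefix('http://toto.com', ('http://', 'https://')))  # toto.com
```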

Ciao,
Marc 'BlackJack' Rintsch

Christoph Zwerschke · Jul 12, 2008

Duncan said:
if url.startswith('http://'):
    url = url[7:]

If I came across this code I'd want to know why they weren't using
urlparse.urlsplit()...

Right, such code can be a smell, since for URLs, file names, config
options etc. there are specialized functions available. But I'm not sure
the need to remove string prefixes/suffixes in general is really so rare
that we shouldn't bother offering a simpler solution.
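
For the URL case specifically, Duncan's suggestion looks like this in Python 3, where urlparse.urlsplit now lives in urllib.parse (the URL is a made-up example). Instead of slicing off 'http://' or 'https://', you read the components you need:

```python
from urllib.parse import urlsplit  # Python 3 home of urlparse.urlsplit

parts = urlsplit('https://titi.com/docs/index.html?lang=en')
print(parts.scheme)  # https
print(parts.netloc)  # titi.com
print(parts.path)    # /docs/index.html
print(parts.query)   # lang=en
```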

-- Christoph
