A curious bit of code...

forman.simon · Feb 13, 2014

I ran across this and I thought there must be a better way of doing it, but then after further consideration I wasn't so sure.

if key[:1] + key[-1:] == '<>': ...

Some possibilities that occurred to me:

if key.startswith('<') and key.endswith('>'): ...

and:

if (key[:1], key[-1:]) == ('<', '>'): ...

I haven't run these through a profiler yet, but it seems like the original might be the fastest after all?
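
Before timing anything, it may help to confirm that the three spellings really agree on awkward inputs such as the empty string and one-character keys. The sketch below is only an illustration; the helper names and the sample keys are made up for this check and are not part of the original code.

```python
def by_concat(key):
    # The original form: slices never raise, even on an empty string.
    return key[:1] + key[-1:] == '<>'

def by_methods(key):
    return key.startswith('<') and key.endswith('>')

def by_tuple(key):
    return (key[:1], key[-1:]) == ('<', '>')

# Hypothetical sample keys covering the short and empty cases.
SAMPLES = ['<alpha>', '<alpha', 'alpha>', 'alpha', '<>', '<', '>', '']

def all_agree(samples=SAMPLES):
    # True when every spelling gives the same answer for every sample.
    return all(by_concat(k) == by_methods(k) == by_tuple(k) for k in samples)

for k in SAMPLES:
    print(repr(k), by_concat(k), by_methods(k), by_tuple(k))
```

On these samples all three return the same answers, so for plain strings the choice comes down to readability and speed.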

Ethan Furman · Feb 13, 2014

Unless that line of code is a bottleneck, don't worry about speed, go for readability. In which case I'd go with the
second option, then the first, and definitely avoid the third.

Roy Smith · Feb 13, 2014

if re.match(r'^<.*>$', key):

sheesh.

(if you care how fast it is, pre-compile the pattern)

Alain Ketterlin · Feb 13, 2014

I would do: if key[0] == '<' and key[-1] == '>' ...

-- Alain.

Mark Lawrence · Feb 13, 2014

All I can say is that if you're worried about the speed of a single line
of code like the above then you've got problems. Having said that, I
suspect that using an index to extract a single character has to be
faster than using a slice, but I haven't run these through a profiler yet.

Neil Cerutti · Feb 13, 2014

I think the following would occur to someone first:

if key[0] == '<' and key[-1] == '>':
    ...

It is wrong to avoid the obvious. Needlessly ornate or clever
code will only irritate the person who has to read it later; most
likely yourself.

Ethan Furman · Feb 13, 2014

All I can say is that if you're worried about the speed of a single line of code like the above then you've got
problems. Having said that, I suspect that using an index to extract a single character has to be faster than using a
slice, but I haven't run these through a profiler yet.

The problem with using indices in the code sample is that if the string is 0 or 1 characters long you'll get an
exception instead of a False.

Ethan Furman · Feb 13, 2014

I think the following would occur to someone first:

if key[0] == '<' and key[-1] == '>':
    ...

It is wrong to avoid the obvious. Needlessly ornate or clever
code will only irritate the person who has to read it later; most
likely yourself.

Not when the obvious is wrong:

--> key = ''
--> if key[0] == '<' and key[-1] == '>':
...     print "good key!"
... else:
...     print "bad key"
...
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: string index out of range

Ethan Furman · Feb 13, 2014

The problem with using indices in the code sample is that if the
string is 0 or 1 characters long you'll get an exception instead
of a False.

Oops, make that zero characters.
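
A minimal sketch of that correction, using two hypothetical helpers: with a one-character key both indices point at the same character, so nothing is raised; only the empty string triggers the IndexError.

```python
def indexed(key):
    # Raises IndexError when key is empty, because key[0] does not exist.
    return key[0] == '<' and key[-1] == '>'

def sliced(key):
    # Slices of an empty string are just '', so this never raises.
    return key[:1] == '<' and key[-1:] == '>'

for key in ['<', '']:
    try:
        print(repr(key), indexed(key), sliced(key))
    except IndexError as exc:
        print(repr(key), 'IndexError:', exc, '| sliced ->', sliced(key))
```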

Neil Cerutti · Feb 13, 2014

The problem with using indices in the code sample is that if
the string is 0 or 1 characters long you'll get an exception
instead of a False.

There will be an exception only if it is zero-length. But good
point! That's a pretty sneaky way to avoid checking for a
zero-length string. Is it a popular idiom?

Roy Smith · Feb 13, 2014

Ethan Furman said:
The problem with using indices in the code sample is that if the string is 0
or 1 characters long you'll get an
exception instead of a False.

My re.match() solution handles those edge cases just fine.
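
As a sketch of how the pattern behaves on those edge cases, the snippet below pre-compiles it and tries a few keys. The names `ANGLE` and `is_bracketed` are made up for illustration. Two differences from the startswith/endswith test are worth keeping in mind: without `re.DOTALL` the `.` does not match a newline, and `$` also matches just before a trailing newline.

```python
import re

# Pre-compiled version of the pattern from the post above.
ANGLE = re.compile(r'^<.*>$')

def is_bracketed(key):
    return ANGLE.match(key) is not None

for key in ['<alpha>', '<>', '<', '', '<a\nb>', '<alpha>\n']:
    print(repr(key), is_bracketed(key))
```

The empty and one-character keys are rejected without an exception. The last two keys are where the regex and the string methods disagree, which only matters if keys can contain newlines.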

Mark Lawrence · Feb 13, 2014

There will be an exception only if it is zero-length. But good
point! That's a pretty sneaky way to avoid checking for a
zero-length string. Is it a popular idiom?

I hope not.

Peter Otten · Feb 13, 2014

$ python -m timeit -s 's = "<alpha>"' 's[:1]+s[-1:] == "<>"'
1000000 loops, best of 3: 0.37 usec per loop

$ python -m timeit -s 's = "<alpha>"' 's[:1] == "<" and s[-1:] == ">"'
1000000 loops, best of 3: 0.329 usec per loop

$ python -m timeit -s 's = "<alpha>"' 's.startswith("<") and s.endswith(">")'
1000000 loops, best of 3: 0.713 usec per loop

The first is too clever for my taste.

The second is fast and easy to understand. It might attract "improvements"
replacing the slice with an index, but I trust you will catch that with your
unit tests.

Personally, I'm willing to spend the few extra milliseconds and use the
foolproof third.
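
The same comparison can be scripted with the `timeit` module instead of the command line, which makes it easier to rerun on another machine or interpreter. This is only a sketch: the dictionary keys, the `best_times` helper and the loop counts are arbitrary choices, and the numbers it prints will differ from the ones above.

```python
import timeit

SETUP = 's = "<alpha>"'

# The three statements timed above, keyed by made-up labels.
CANDIDATES = {
    'concat': 's[:1] + s[-1:] == "<>"',
    'slices': 's[:1] == "<" and s[-1:] == ">"',
    'methods': 's.startswith("<") and s.endswith(">")',
}

def best_times(number=100000, repeat=3):
    """Return the best per-loop time in microseconds for each candidate."""
    results = {}
    for name, stmt in CANDIDATES.items():
        runs = timeit.repeat(stmt, setup=SETUP, number=number, repeat=repeat)
        results[name] = min(runs) / number * 1e6
    return results

if __name__ == '__main__':
    for name, usec in best_times().items():
        print('%-8s %.3f usec per loop' % (name, usec))
```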

Neil Cerutti · Feb 13, 2014

Peter Otten said:
The first is too clever for my taste.

The second is fast and easy to understand. It might attract
"improvements" replacing the slice with an index, but I trust
you will catch that with your unit tests.

It's easy to forget exactly why startswith and endswith even exist.

Marko Rauhamaa · Feb 13, 2014

Peter Otten said:
Personally, I'm willing to spend the few extra milliseconds and use
the foolproof third.

Speaking of foolproof, what is this "key?" Is it an XML start tag,
maybe? Then, how does your test fare with, say,

<start comparison=">
">

which is equivalent to

<start comparison=">">

Marko

Ethan Furman · Feb 13, 2014

For completeness:

# the slowest method from Peter
$ python -m timeit -s 's = "<alpha>"' 's.startswith("<") and s.endswith(">")'
1000000 loops, best of 3: 0.309 usec per loop

# the re method from Roy
$ python -m timeit -s "import re;pattern=re.compile(r'^<.*>$');s = '<alpha>'" "pattern.match(s)"
1000000 loops, best of 3: 0.466 usec per loop

Zachary Ware · Feb 13, 2014

In a fit of curiosity, I did some timings:

'and'ed indexing:

C:\tmp>py -m timeit -s "key = '<test>'" "key[0] == '<' and key[-1] == '>'"
1000000 loops, best of 3: 0.35 usec per loop

C:\tmp>py -m timeit -s "key = '<test'" "key[0] == '<' and key[-1] == '>'"
1000000 loops, best of 3: 0.398 usec per loop

C:\tmp>py -m timeit -s "key = 'test>'" "key[0] == '<' and key[-1] == '>'"
1000000 loops, best of 3: 0.188 usec per loop

C:\tmp>py -m timeit -s "key = 'test'" "key[0] == '<' and key[-1] == '>'"
10000000 loops, best of 3: 0.211 usec per loop

C:\tmp>py -m timeit -s "key = ''" "key[0] == '<' and key[-1] == '>'"
Traceback (most recent call last):
  File "P:\Python34\lib\timeit.py", line 292, in main
    x = t.timeit(number)
  File "P:\Python34\lib\timeit.py", line 178, in timeit
    timing = self.inner(it, self.timer)
  File "<timeit-src>", line 6, in inner
    key[0] == '<' and key[-1] == '>'
IndexError: string index out of range

Slice concatenation:

C:\tmp>py -m timeit -s "key = '<test>'" "key[:1] + key[-1:] == '<>'"
1000000 loops, best of 3: 0.649 usec per loop

C:\tmp>py -m timeit -s "key = '<test'" "key[:1] + key[-1:] == '<>'"
1000000 loops, best of 3: 0.7 usec per loop

C:\tmp>py -m timeit -s "key = 'test>'" "key[:1] + key[-1:] == '<>'"
1000000 loops, best of 3: 0.663 usec per loop

C:\tmp>py -m timeit -s "key = 'test'" "key[:1] + key[-1:] == '<>'"
1000000 loops, best of 3: 0.665 usec per loop

C:\tmp>py -m timeit -s "key = ''" "key[:1] + key[-1:] == '<>'"
1000000 loops, best of 3: 0.456 usec per loop

String methods:

C:\tmp>py -m timeit -s "key = '<test>'" "key.startswith('<') and key.endswith('>')"
1000000 loops, best of 3: 1.03 usec per loop

C:\tmp>py -m timeit -s "key = '<test'" "key.startswith('<') and key.endswith('>')"
1000000 loops, best of 3: 1.02 usec per loop

C:\tmp>py -m timeit -s "key = 'test>'" "key.startswith('<') and key.endswith('>')"
1000000 loops, best of 3: 0.504 usec per loop

C:\tmp>py -m timeit -s "key = 'test'" "key.startswith('<') and key.endswith('>')"
1000000 loops, best of 3: 0.502 usec per loop

C:\tmp>py -m timeit -s "key = ''" "key.startswith('<') and key.endswith('>')"
1000000 loops, best of 3: 0.49 usec per loop

Tuple comparison:

C:\tmp>py -m timeit -s "key = '<test>'" "(key[:1], key[-1:]) == ('<', '>')"
1000000 loops, best of 3: 0.629 usec per loop

C:\tmp>py -m timeit -s "key = '<test'" "(key[:1], key[-1:]) == ('<', '>')"
1000000 loops, best of 3: 0.689 usec per loop

C:\tmp>py -m timeit -s "key = 'test>'" "(key[:1], key[-1:]) == ('<', '>')"
1000000 loops, best of 3: 0.676 usec per loop

C:\tmp>py -m timeit -s "key = 'test'" "(key[:1], key[-1:]) == ('<', '>')"
1000000 loops, best of 3: 0.675 usec per loop

C:\tmp>py -m timeit -s "key = ''" "(key[:1], key[-1:]) == ('<', '>')"
1000000 loops, best of 3: 0.608 usec per loop

re.match():

C:\tmp>py -m timeit -s "import re;key = '<test>'" "re.match(r'^<.*>$', key)"
100000 loops, best of 3: 3.39 usec per loop

C:\tmp>py -m timeit -s "import re;key = '<test'" "re.match(r'^<.*>$', key)"
100000 loops, best of 3: 3.27 usec per loop

C:\tmp>py -m timeit -s "import re;key = 'test>'" "re.match(r'^<.*>$', key)"
100000 loops, best of 3: 2.94 usec per loop

C:\tmp>py -m timeit -s "import re;key = 'test'" "re.match(r'^<.*>$', key)"
100000 loops, best of 3: 2.97 usec per loop

C:\tmp>py -m timeit -s "import re;key = ''" "re.match(r'^<.*>$', key)"
100000 loops, best of 3: 2.97 usec per loop

Pre-compiled re:

C:\tmp>py -m timeit -s "import re;r = re.compile(r'^<.*>$');key = '<test>'" "r.match(key)"
1000000 loops, best of 3: 0.932 usec per loop

C:\tmp>py -m timeit -s "import re;r = re.compile(r'^<.*>$');key = '<test'" "r.match(key)"
1000000 loops, best of 3: 0.79 usec per loop

C:\tmp>py -m timeit -s "import re;r = re.compile(r'^<.*>$');key = 'test>'" "r.match(key)"
1000000 loops, best of 3: 0.718 usec per loop

C:\tmp>py -m timeit -s "import re;r = re.compile(r'^<.*>$');key = 'test'" "r.match(key)"
1000000 loops, best of 3: 0.755 usec per loop

C:\tmp>py -m timeit -s "import re;r = re.compile(r'^<.*>$');key = ''" "r.match(key)"
1000000 loops, best of 3: 0.731 usec per loop

Pre-compiled re with pre-fetched method:

C:\tmp>py -m timeit -s "import re;m = re.compile(r'^<.*>$').match;key = '<test>'" "m(key)"
1000000 loops, best of 3: 0.777 usec per loop

C:\tmp>py -m timeit -s "import re;m = re.compile(r'^<.*>$').match;key = '<test'" "m(key)"
1000000 loops, best of 3: 0.65 usec per loop

C:\tmp>py -m timeit -s "import re;m = re.compile(r'^<.*>$').match;key = 'test>'" "m(key)"
1000000 loops, best of 3: 0.652 usec per loop

C:\tmp>py -m timeit -s "import re;m = re.compile(r'^<.*>$').match;key = 'test'" "m(key)"
1000000 loops, best of 3: 0.576 usec per loop

C:\tmp>py -m timeit -s "import re;m = re.compile(r'^<.*>$').match;key = ''" "m(key)"
1000000 loops, best of 3: 0.58 usec per loop

And the winner is:

C:\tmp>py -m timeit -s "key = '<test>'" "key and key[0] == '<' and key[-1] == '>'"
1000000 loops, best of 3: 0.388 usec per loop

C:\tmp>py -m timeit -s "key = '<test'" "key and key[0] == '<' and key[-1] == '>'"
1000000 loops, best of 3: 0.413 usec per loop

C:\tmp>py -m timeit -s "key = 'test>'" "key and key[0] == '<' and key[-1] == '>'"
1000000 loops, best of 3: 0.219 usec per loop

C:\tmp>py -m timeit -s "key = 'test'" "key and key[0] == '<' and key[-1] == '>'"
1000000 loops, best of 3: 0.215 usec per loop

C:\tmp>py -m timeit -s "key = ''" "key and key[0] == '<' and key[-1] == '>'"
10000000 loops, best of 3: 0.0481 usec per loop

So, the moral of the story? Use short-circuit logic wherever you can,
don't use re for simple stuff (because while it may be very fast, it's
dominated by attribute lookup and function call overhead), and unless
you expect to be doing this test many many millions of times in a very
short space of time, go for readability over performance.
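
If the short-circuit form is wanted outside an `if` statement, one possible wrapping is sketched below. The function name is hypothetical. The `bool()` call is there because `'' and ...` evaluates to the empty string itself rather than to False, which is harmless in an `if` but surprising as a return value.

```python
def is_bracketed(key):
    # `key and ...` short-circuits on the empty string, so the indexing
    # is never attempted when there is nothing to index.
    return bool(key and key[0] == '<' and key[-1] == '>')

for key in ['<test>', '<test', 'test>', 'test', '']:
    print(repr(key), is_bracketed(key))
```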

Chris Angelico · Feb 13, 2014

I hope not.

The use of slicing rather than indexing to avoid problems when the
string's too short? I don't know about popular, but I've certainly
used it a good bit. For the specific case of string comparisons you
can use startswith/endswith, but slicing works with other types as
well.

Also worth noting:

Python 2.7.4 (default, Apr 6 2013, 19:54:46) [MSC v.1500 32 bit (Intel)] on win32
Type "copyright", "credits" or "license()" for more information.
>>> s1,s2=b"asdf",u"asdf"
>>> s1[:1],s2[:1]
('a', u'a')
>>> s1[0],s2[0]
('a', u'a')

Python 3.4.0b2 (v3.4.0b2:ba32913eb13e, Jan 5 2014, 16:23:43) [MSC v.1600 32 bit (Intel)] on win32
Type "copyright", "credits" or "license()" for more information.
>>> s1,s2=b"asdf",u"asdf"
>>> s1[:1],s2[:1]
(b'a', 'a')
>>> s1[0],s2[0]
(97, 'a')

When you slice, you get back the same type as you started with. (Also
true of lists, tuples, and probably everything else that can be
sliced.) When you index, you might not; strings are a special case
(since Python lacks a "character" type), and if your code has to run
on Py2 and Py3, byte strings stop being that special case in Py3. So
if you're working with a byte string, it might be worth slicing rather
than indexing. (Though you can still use startswith/endswith, if they
suit your purpose.)

ChrisA
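
Applied to the angle-bracket test, the point about byte strings might look like the sketch below. It assumes Python 3, and the helper name and sample data are made up for illustration. Indexing a bytes object yields an int, so a comparison against `b'<'` can never succeed, while the sliced form works for both `str` and `bytes`.

```python
data = b"<alpha>"

# Python 3: indexing bytes yields an int, slicing yields bytes.
first_by_index = data[0]
first_by_slice = data[:1]

def is_bracketed(key):
    # Slices keep the type of the original, so the same test works for
    # str and bytes once the delimiters are of the matching type.
    lt, gt = ('<', '>') if isinstance(key, str) else (b'<', b'>')
    return key[:1] == lt and key[-1:] == gt

print(first_by_index, first_by_slice)
print(is_bracketed(data), is_bracketed("<alpha>"), is_bracketed(b""))
```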

Tim Chase · Feb 13, 2014

I ran across this and I thought there must be a better way of doing
it, but then after further consideration I wasn't so sure.

Some possibilities that occurred to me:

if key.startswith('<') and key.endswith('>'): ...

This is my favorite because it doesn't break on the empty string like
some of your alternatives. Your k[0] and k[-1] assume there's at
least one character in the string, otherwise an IndexError is raised.

-tkc

Emile van Sebille · Feb 13, 2014

In a fit of curiosity, I did some timings:

Snip of lots of TMTOWTDT/TIMTOWTDI/whatever... timed examples

But I didn't see this one:

s[::len(s)-1]

Emile
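
For the record, a small sketch of how that stride trick behaves on short strings (the `ends` helper is a made-up name). It picks out the first and last characters when the string has two or more, but the step becomes zero for a one-character string, which raises ValueError, and it becomes -1 for the empty string.

```python
def ends(s):
    # Stride from the first character straight to the last one.
    return s[::len(s) - 1]

for s in ['<test>', '<>', '', '<']:
    try:
        print(repr(s), '->', repr(ends(s)))
    except ValueError as exc:
        print(repr(s), '-> ValueError:', exc)
```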
