A curious bit of code...

forman.simon

I ran across this and I thought there must be a better way of doing it, but then after further consideration I wasn't so sure.

if key[:1] + key[-1:] == '<>': ...


Some possibilities that occurred to me:

if key.startswith('<') and key.endswith('>'): ...

and:

if (key[:1], key[-1:]) == ('<', '>'): ...


I haven't run these through a profiler yet, but it seems like the original might be the fastest after all?
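
For reference, here is a minimal sketch that wraps the three spellings above in functions and checks that they agree on a few sample keys. The function names and the sample keys are made up for illustration; they are not from the code where the original line was found.

```python
def concat_slices(key):
    # The original: join the first and last one-character slices.
    return key[:1] + key[-1:] == '<>'

def string_methods(key):
    return key.startswith('<') and key.endswith('>')

def tuple_compare(key):
    return (key[:1], key[-1:]) == ('<', '>')

SAMPLES = ['<alpha>', '<alpha', 'alpha>', 'alpha', '<', '>', '<>', '']

for sample in SAMPLES:
    results = {f(sample) for f in (concat_slices, string_methods, tuple_compare)}
    # All three spellings should give the same answer for every sample.
    assert len(results) == 1, sample
    print(repr(sample), results.pop())
```

None of the three raises on an empty or one-character key, which matters for some of the alternatives suggested later in the thread.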
 
Ethan Furman

Unless that line of code is a bottleneck, don't worry about speed; go for readability. In that case I'd go with the
second option, then the first, and definitely avoid the third.
 
Roy Smith

if re.match(r'^<.*>$', key):

sheesh.

(if you care how fast it is, pre-compile the pattern)
 
Alain Ketterlin

I would do: if key[0] == '<' and key[-1] == '>' ...

-- Alain.
 
Mark Lawrence

All I can say is that if you're worried about the speed of a single line
of code like the above then you've got problems. Having said that, I
suspect that using an index to extract a single character has to be
faster than using a slice, but I haven't run these through a profiler yet :)
 
Neil Cerutti

I think the following would occur to someone first:

if key[0] == '<' and key[-1] == '>':
...

It is wrong to avoid the obvious. Needlessly ornate or clever
code will only irritate the person who has to read it later; most
likely yourself.
 
Ethan Furman

Mark Lawrence said:
All I can say is that if you're worried about the speed of a single line of code like the above then you've got
problems. Having said that, I suspect that using an index to extract a single character has to be faster than using a
slice, but I haven't run these through a profiler yet :)

The problem with using indices in the code sample is that if the string is 0 or 1 characters long you'll get an
exception instead of a False.
 
Ethan Furman

Neil Cerutti said:
I think the following would occur to someone first:

if key[0] == '<' and key[-1] == '>':
...

It is wrong to avoid the obvious. Needlessly ornate or clever
code will only irritate the person who has to read it later; most
likely yourself.

Not when the obvious is wrong:

--> key = ''
--> if key[0] == '<' and key[-1] == '>':
....     print "good key!"
.... else:
....     print "bad key"
....
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: string index out of range
 
Ethan Furman

Ethan Furman said:
The problem with using indices in the code sample is that if the
string is 0 or 1 characters long you'll get an exception instead
of a False.

Oops, make that zero characters. ;)
 
Neil Cerutti

Ethan Furman said:
The problem with using indices in the code sample is that if
the string is 0 or 1 characters long you'll get an exception
instead of a False.

There will be an exception only if it is zero-length. But good
point! That's a pretty sneaky way to avoid checking for a
zero-length string. Is it a popular idiom?
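
To make the difference concrete, here is a small sketch (the helper names are invented for this example). Indexing an empty string raises IndexError, while an out-of-range slice just comes back empty, so the sliced comparison quietly evaluates to False.

```python
def by_index(key):
    # Raises IndexError when key is empty.
    return key[0] == '<' and key[-1] == '>'

def by_slice(key):
    # Slices never raise for out-of-range bounds; they return ''.
    return key[:1] == '<' and key[-1:] == '>'

for key in ['<x>', '<', '']:
    try:
        indexed = by_index(key)
    except IndexError as exc:
        indexed = 'IndexError: %s' % exc
    print(repr(key), indexed, by_slice(key))
```

A one-character key is fine either way, since key[0] and key[-1] are then the same character.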
 
Roy Smith

Ethan Furman said:
The problem with using indices in the code sample is that if the string is 0
or 1 characters long you'll get an
exception instead of a False.

My re.match() solution handles those edge cases just fine.
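
A quick sketch of those edge cases, with one caveat worth labeling. The pattern is not an exact drop-in for the startswith/endswith test once newlines are involved, because `.` does not match a newline by default and `$` also matches just before a trailing newline. The sample keys and function names below are invented for illustration.

```python
import re

PATTERN = re.compile(r'^<.*>$')

def regex_test(key):
    return PATTERN.match(key) is not None

def method_test(key):
    return key.startswith('<') and key.endswith('>')

# Empty and one-character keys: both tests return False, no exception.
for key in ['', '<', '>']:
    assert regex_test(key) is False and method_test(key) is False

# Keys containing newlines are where the two tests disagree.
for key in ['<a\nb>', '<a>\n']:
    print(repr(key), 'regex:', regex_test(key), 'methods:', method_test(key))
```

If keys can contain newlines, re.DOTALL and `\Z` would be the usual knobs to look at, but whether that matters depends on what a "key" is here.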
 
Mark Lawrence

Neil Cerutti said:
There will be an exception only if it is zero-length. But good
point! That's a pretty sneaky way to avoid checking for a
zero-length string. Is it a popular idiom?

I hope not.
 
Peter Otten

$ python -m timeit -s 's = "<alpha>"' 's[:1]+s[-1:] == "<>"'
1000000 loops, best of 3: 0.37 usec per loop

$ python -m timeit -s 's = "<alpha>"' 's[:1] == "<" and s[-1:] == ">"'
1000000 loops, best of 3: 0.329 usec per loop

$ python -m timeit -s 's = "<alpha>"' 's.startswith("<") and s.endswith(">")'
1000000 loops, best of 3: 0.713 usec per loop

The first is too clever for my taste.

The second is fast and easy to understand. It might attract "improvements"
replacing the slice with an index, but I trust you will catch that with your
unit tests ;)

Personally, I'm willing to spend the few extra milliseconds and use the
foolproof third.
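
The same comparison can be run from a script instead of the shell. This is only a sketch: the labels, the repeat count, and the loop count are arbitrary choices, and the absolute numbers will differ by machine and Python version.

```python
import timeit

SETUP = 's = "<alpha>"'
STATEMENTS = {
    'concatenated slices': 's[:1] + s[-1:] == "<>"',
    'two slices': 's[:1] == "<" and s[-1:] == ">"',
    'string methods': 's.startswith("<") and s.endswith(">")',
}

def best_usec(stmt, number=100000, repeat=3):
    # Best of `repeat` runs, converted to microseconds per loop.
    timings = timeit.repeat(stmt, setup=SETUP, number=number, repeat=repeat)
    return min(timings) / number * 1e6

for label, stmt in STATEMENTS.items():
    print('%-20s %.3f usec per loop' % (label, best_usec(stmt)))
```

Taking the minimum of several runs mirrors the "best of 3" figure that the command-line tool reports above.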
 
Neil Cerutti

Peter Otten said:
The first is too clever for my taste.

The second is fast and easy to understand. It might attract
"improvements" replacing the slice with an index, but I trust
you will catch that with your unit tests ;)

It's easy to forget exactly why startswith and endswith even exist.
 
Marko Rauhamaa

Peter Otten said:
Personally, I'm willing to spend the few extra milliseconds and use
the foolproof third.

Speaking of foolproof, what is this "key?" Is it an XML start tag,
maybe? Then, how does your test fare with, say,

<start comparison=">
">

which is equivalent to

<start comparison="&gt;">


Marko
 
Ethan Furman

For completeness:

# the slowest method from Peter
$ python -m timeit -s 's = "<alpha>"' 's.startswith("<") and s.endswith(">")'
1000000 loops, best of 3: 0.309 usec per loop

# the re method from Roy
$ python -m timeit -s "import re;pattern=re.compile(r'^<.*>$');s = '<alpha>'" "pattern.match(s)"
1000000 loops, best of 3: 0.466 usec per loop
 
Zachary Ware

In a fit of curiosity, I did some timings:

'and'ed indexing:

C:\tmp>py -m timeit -s "key = '<test>'" "key[0] == '<' and key[-1] == '>'"
1000000 loops, best of 3: 0.35 usec per loop

C:\tmp>py -m timeit -s "key = '<test'" "key[0] == '<' and key[-1] == '>'"
1000000 loops, best of 3: 0.398 usec per loop

C:\tmp>py -m timeit -s "key = 'test>'" "key[0] == '<' and key[-1] == '>'"
1000000 loops, best of 3: 0.188 usec per loop

C:\tmp>py -m timeit -s "key = 'test'" "key[0] == '<' and key[-1] == '>'"
10000000 loops, best of 3: 0.211 usec per loop

C:\tmp>py -m timeit -s "key = ''" "key[0] == '<' and key[-1] == '>'"
Traceback (most recent call last):
  File "P:\Python34\lib\timeit.py", line 292, in main
    x = t.timeit(number)
  File "P:\Python34\lib\timeit.py", line 178, in timeit
    timing = self.inner(it, self.timer)
  File "<timeit-src>", line 6, in inner
    key[0] == '<' and key[-1] == '>'
IndexError: string index out of range


Slice concatenation:

C:\tmp>py -m timeit -s "key = '<test>'" "key[:1] + key[-1:] == '<>'"
1000000 loops, best of 3: 0.649 usec per loop

C:\tmp>py -m timeit -s "key = '<test'" "key[:1] + key[-1:] == '<>'"
1000000 loops, best of 3: 0.7 usec per loop

C:\tmp>py -m timeit -s "key = 'test>'" "key[:1] + key[-1:] == '<>'"
1000000 loops, best of 3: 0.663 usec per loop

C:\tmp>py -m timeit -s "key = 'test'" "key[:1] + key[-1:] == '<>'"
1000000 loops, best of 3: 0.665 usec per loop

C:\tmp>py -m timeit -s "key = ''" "key[:1] + key[-1:] == '<>'"
1000000 loops, best of 3: 0.456 usec per loop


String methods:

C:\tmp>py -m timeit -s "key = '<test>'" "key.startswith('<') and key.endswith('>')"
1000000 loops, best of 3: 1.03 usec per loop

C:\tmp>py -m timeit -s "key = '<test'" "key.startswith('<') and key.endswith('>')"
1000000 loops, best of 3: 1.02 usec per loop

C:\tmp>py -m timeit -s "key = 'test>'" "key.startswith('<') and key.endswith('>')"
1000000 loops, best of 3: 0.504 usec per loop

C:\tmp>py -m timeit -s "key = 'test'" "key.startswith('<') and key.endswith('>')"
1000000 loops, best of 3: 0.502 usec per loop

C:\tmp>py -m timeit -s "key = ''" "key.startswith('<') and key.endswith('>')"
1000000 loops, best of 3: 0.49 usec per loop


Tuple comparison:

C:\tmp>py -m timeit -s "key = '<test>'" "(key[:1], key[-1:]) == ('<', '>')"
1000000 loops, best of 3: 0.629 usec per loop

C:\tmp>py -m timeit -s "key = '<test'" "(key[:1], key[-1:]) == ('<', '>')"
1000000 loops, best of 3: 0.689 usec per loop

C:\tmp>py -m timeit -s "key = 'test>'" "(key[:1], key[-1:]) == ('<', '>')"
1000000 loops, best of 3: 0.676 usec per loop

C:\tmp>py -m timeit -s "key = 'test'" "(key[:1], key[-1:]) == ('<', '>')"
1000000 loops, best of 3: 0.675 usec per loop

C:\tmp>py -m timeit -s "key = ''" "(key[:1], key[-1:]) == ('<', '>')"
1000000 loops, best of 3: 0.608 usec per loop


re.match():

C:\tmp>py -m timeit -s "import re;key = '<test>'" "re.match(r'^<.*>$', key)"
100000 loops, best of 3: 3.39 usec per loop

C:\tmp>py -m timeit -s "import re;key = '<test'" "re.match(r'^<.*>$', key)"
100000 loops, best of 3: 3.27 usec per loop

C:\tmp>py -m timeit -s "import re;key = 'test>'" "re.match(r'^<.*>$', key)"
100000 loops, best of 3: 2.94 usec per loop

C:\tmp>py -m timeit -s "import re;key = 'test'" "re.match(r'^<.*>$', key)"
100000 loops, best of 3: 2.97 usec per loop

C:\tmp>py -m timeit -s "import re;key = ''" "re.match(r'^<.*>$', key)"
100000 loops, best of 3: 2.97 usec per loop


Pre-compiled re:

C:\tmp>py -m timeit -s "import re;r = re.compile(r'^<.*>$');key = '<test>'" "r.match(key)"
1000000 loops, best of 3: 0.932 usec per loop

C:\tmp>py -m timeit -s "import re;r = re.compile(r'^<.*>$');key = '<test'" "r.match(key)"
1000000 loops, best of 3: 0.79 usec per loop

C:\tmp>py -m timeit -s "import re;r = re.compile(r'^<.*>$');key = 'test>'" "r.match(key)"
1000000 loops, best of 3: 0.718 usec per loop

C:\tmp>py -m timeit -s "import re;r = re.compile(r'^<.*>$');key = 'test'" "r.match(key)"
1000000 loops, best of 3: 0.755 usec per loop

C:\tmp>py -m timeit -s "import re;r = re.compile(r'^<.*>$');key = ''" "r.match(key)"
1000000 loops, best of 3: 0.731 usec per loop


Pre-compiled re with pre-fetched method:

C:\tmp>py -m timeit -s "import re;m = re.compile(r'^<.*>$').match;key = '<test>'" "m(key)"
1000000 loops, best of 3: 0.777 usec per loop

C:\tmp>py -m timeit -s "import re;m = re.compile(r'^<.*>$').match;key = '<test'" "m(key)"
1000000 loops, best of 3: 0.65 usec per loop

C:\tmp>py -m timeit -s "import re;m = re.compile(r'^<.*>$').match;key = 'test>'" "m(key)"
1000000 loops, best of 3: 0.652 usec per loop

C:\tmp>py -m timeit -s "import re;m = re.compile(r'^<.*>$').match;key = 'test'" "m(key)"
1000000 loops, best of 3: 0.576 usec per loop

C:\tmp>py -m timeit -s "import re;m = re.compile(r'^<.*>$').match;key = ''" "m(key)"
1000000 loops, best of 3: 0.58 usec per loop


And the winner is:

C:\tmp>py -m timeit -s "key = '<test>'" "key and key[0] == '<' and key[-1] == '>'"
1000000 loops, best of 3: 0.388 usec per loop

C:\tmp>py -m timeit -s "key = '<test'" "key and key[0] == '<' and key[-1] == '>'"
1000000 loops, best of 3: 0.413 usec per loop

C:\tmp>py -m timeit -s "key = 'test>'" "key and key[0] == '<' and key[-1] == '>'"
1000000 loops, best of 3: 0.219 usec per loop

C:\tmp>py -m timeit -s "key = 'test'" "key and key[0] == '<' and key[-1] == '>'"
1000000 loops, best of 3: 0.215 usec per loop

C:\tmp>py -m timeit -s "key = ''" "key and key[0] == '<' and key[-1] == '>'"
10000000 loops, best of 3: 0.0481 usec per loop


So, the moral of the story? Use short-circuit logic wherever you can,
don't use re for simple stuff (because while it may be very fast, it's
dominated by attribute lookup and function call overhead), and unless
you expect to be doing this test many many millions of times in a very
short space of time, go for readability over performance.
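
One detail of the guarded version, sketched here with a made-up function name: `key and ...` evaluates to the empty string itself (falsy, but not the object False) when key is empty. That is fine inside an `if`, but wrapping it in bool() is one way to get a real boolean if the result is stored or returned.

```python
def is_bracketed(key):
    # `key and ...` short-circuits on an empty key, so key[0] is never
    # evaluated for '' and no IndexError is raised.
    return bool(key and key[0] == '<' and key[-1] == '>')

# The raw expression yields the empty string, not False, for an empty key.
empty = ''
print(repr(empty and empty[0] == '<'))

for key in ['<test>', '<test', 'test>', 'test', '']:
    print(repr(key), is_bracketed(key))
```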
 
Chris Angelico

Mark Lawrence said:
I hope not.

The use of slicing rather than indexing to avoid problems when the
string's too short? I don't know about popular, but I've certainly
used it a good bit. For the specific case of string comparisons you
can use startswith/endswith, but slicing works with other types as
well.

Also worth noting:

Python 2.7.4 (default, Apr 6 2013, 19:54:46) [MSC v.1500 32 bit (Intel)] on win32
Type "copyright", "credits" or "license()" for more information.
>>> s1,s2=b"asdf",u"asdf"
>>> s1[:1],s2[:1]
('a', u'a')
>>> s1[0],s2[0]
('a', u'a')

Python 3.4.0b2 (v3.4.0b2:ba32913eb13e, Jan 5 2014, 16:23:43) [MSC v.1600 32 bit (Intel)] on win32
Type "copyright", "credits" or "license()" for more information.
>>> s1,s2=b"asdf",u"asdf"
>>> s1[:1],s2[:1]
(b'a', 'a')
>>> s1[0],s2[0]
(97, 'a')

When you slice, you get back the same type as you started with. (Also
true of lists, tuples, and probably everything else that can be
sliced.) When you index, you might not; strings are a special case
(since Python lacks a "character" type), and if your code has to run
on Py2 and Py3, byte strings stop being that special case in Py3. So
if you're working with a byte string, it might be worth slicing rather
than indexing. (Though you can still use startswith/endswith, if they
suit your purpose.)

ChrisA
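
As a sketch of what that means for this thread's test on Python 3 (the variable name is just for illustration): with a byte string the indexed comparison silently becomes False, because `key[0]` is an int, while the sliced comparison and startswith/endswith keep working.

```python
key = b'<test>'

# Python 3: indexing bytes yields an int (60 is the code for '<').
print(key[0], key[0] == b'<')

# Slicing keeps the bytes type, so the comparison behaves as expected.
print(key[:1], key[:1] == b'<' and key[-1:] == b'>')

# The bytes methods work too, given bytes arguments.
print(key.startswith(b'<') and key.endswith(b'>'))
```

The indexed form does not raise here; it just never matches, which is arguably worse than an exception.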
 
Tim Chase

I ran across this and I thought there must be a better way of doing
it, but then after further consideration I wasn't so sure.

Some possibilities that occurred to me:

if key.startswith('<') and key.endswith('>'): ...

This is my favorite because it doesn't break on the empty string like
some of your alternatives. Your k[0] and k[-1] assume there's at
least one character in the string, otherwise an IndexError is raised.

-tkc
 
Emile van Sebille

Zachary Ware said:
In a fit of curiosity, I did some timings:

Snip of lots of TMTOWTDT/TIMTOWTDI/whatever... timed examples :)

But I didn't see this one:

s[::len(s)-1]

Emile
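
For anyone puzzling over that one: with a step of len(s) - 1 the slice picks up only the first and last characters, at least for strings of two or more characters. Here is a hedged sketch of how it behaves on shorter input (the wrapper function is invented for this example).

```python
def ends(s):
    # A step of len(s) - 1 visits index 0 and index len(s) - 1 only.
    return s[::len(s) - 1]

print(ends('<alpha>'))   # first and last characters joined
print(repr(ends('')))    # step is -1: reversing '' still gives ''

try:
    ends('<')            # step is 0, which slicing rejects
except ValueError as exc:
    print('ValueError:', exc)
```

So it avoids the IndexError on an empty string but trades it for a ValueError on a one-character string.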
 
