Two questions about efficiency

Steve M · May 26, 2004

1. Near the beginning of the document "Unifying types and classes in
Python 2.2" by GvR, which can be found at
http://www.python.org/2.2.2/descrintro.html, the author writes the
following about subclassing a built in type:
---
Here's an example of a simple dict subclass, which provides a "default
value" that is returned when a missing key is requested:

class defaultdict(dict):

def __init__(self, default=None):
dict.__init__(self)
self.default = default

def __getitem__(self, key):
try:
return dict.__getitem__(self, key)
except KeyError:
return self.default

<snip>

The __getitem__() method could also be written as follows, using the
new "key in dict" test introduced in Python 2.2:

def __getitem__(self, key):
if key in self:
return dict.__getitem__(self, key)
else:
return self.default

I believe that this version is less efficient, because it does the key
lookup twice. The exception would be when we expect that the requested
key is almost never in the dictionary: then setting up the try/except
statement is more expensive than the failing "key in self" test.
---

I am interested in his claim that the second version is less efficient
unless the requested key is "almost never" in the dictionary. I would
have thought that the overhead to do a key lookup is quite a bit less
than the overhead required to create an exception object, so that the
first version would be on average more efficient only if the requested
key is in the dictionary, say, more than half the time.

Does the "key in dict" test internally raise an exception on failure,
thus incurring at least the same overhead as the first version? Or is
the overhead to raise an exception much less than I have naively
supposed?

I have a similar idiom buried in a nested loop, and I'd like to select
the approach that is likely to be most efficient. My best guess is
that the key lookup will fail half the time.

2. Is there any reason to prefer one of the following over the other?
Im interested in both a consensus on style and readability, and
considerations of efficiency at runtime.

for x in L:
if f(x):
pass

for x in L:
if f(x):
continue

Fredrik Lundh · May 26, 2004

Steve M said:
2. Is there any reason to prefer one of the following over the other?
Im interested in both a consensus on style and readability, and
considerations of efficiency at runtime.

for x in L:
if f(x):
pass

for x in L:
if f(x):
continue

did you really mean to write that?

as written, both constructs are pointless, and the "if f(x)"-statement can
be replaced with just "f(x)" in both cases. and if you add more code inside
the if statement, or after it, the constructs do different things.

did you perhaps mean:

for x in L:
if f(x):
do stuff

for x in L:
if not f(x):
continue
do stuff

if so, my rule of thumb is to use alternative 1 if "stuff" is short, and alter-
native 2 if "stuff" is long, or if you have multiple tests before you get to
the real stuff.

ymmv, as usual.

and yes, if "stuff" is trivial, and builds a new list, consider using a list
comprehension:

L2 = [stuff for x in L if f(x)]

</F>

Slawomir Nowaczyk · May 27, 2004

On 26 May 2004 13:39:30 -0700
(e-mail address removed) (Steve M) wrote:

#> I am interested in his claim that the second version is less
#> efficient unless the requested key is "almost never" in the
#> dictionary. I would have thought that the overhead to do a key
#> lookup is quite a bit less than the overhead required to create an
#> exception object, so that the first version would be on average
#> more efficient only if the requested key is in the dictionary, say,
#> more than half the time.

#> Does the "key in dict" test internally raise an exception on
#> failure, thus incurring at least the same overhead as the first
#> version? Or is the overhead to raise an exception much less than I
#> have naively supposed?

#> I have a similar idiom buried in a nested loop, and I'd like to
#> select the approach that is likely to be most efficient. My best
#> guess is that the key lookup will fail half the time.

I suggest:
... # put version 1 here
...... # put version 2 here
...
Do not forget to share the results

--
Best wishes,
Slawomir Nowaczyk
( (e-mail address removed) )

The Main Library at Indiana University sinks over an inch every
year because when it was built, engineers failed to take into
account the weight of all the books that would occupy the building.

Paul Miller · May 27, 2004

1. Near the beginning of the document "Unifying types and classes in
Python 2.2" by GvR, which can be found at
http://www.python.org/2.2.2/descrintro.html, the author writes the
following about subclassing a built in type:
---
Here's an example of a simple dict subclass, which provides a "default
value" that is returned when a missing key is requested:

class defaultdict(dict):

def __init__(self, default=None):
dict.__init__(self)
self.default = default

def __getitem__(self, key):
try:
return dict.__getitem__(self, key)
except KeyError:
return self.default

<snip>

The __getitem__() method could also be written as follows, using the
new "key in dict" test introduced in Python 2.2:

def __getitem__(self, key):
if key in self:
return dict.__getitem__(self, key)
else:
return self.default

I believe that this version is less efficient, because it does the key
lookup twice. The exception would be when we expect that the requested
key is almost never in the dictionary: then setting up the try/except
statement is more expensive than the failing "key in self" test.
---
[...]
I have a similar idiom buried in a nested loop, and I'd like to select
the approach that is likely to be most efficient. My best guess is
that the key lookup will fail half the time.

Since Guido wrote Python, I would tend to believe him when he says the
first method is more efficient than the second.

However, even if
you nested a similar construct several levels deep inside a loop, I'd
be willing to bet that if that particular loop were a bottleneck in
your program, that optimizing that particular section of code would
probably not make the difference between "too slow" and "fast enough."

Since both options are equally readable, if I thought about it, I
would select option 2 while coding, by default. I wouldn't make a big
deal out of changing code that uses option 1, though.

Peter Otten · May 27, 2004

Steve M wrote:

[two implementations of dictionary with default value]

I am interested in his claim that the second version is less efficient
unless the requested key is "almost never" in the dictionary. I would
have thought that the overhead to do a key lookup is quite a bit less
than the overhead required to create an exception object, so that the
first version would be on average more efficient only if the requested
key is in the dictionary, say, more than half the time.

Does the "key in dict" test internally raise an exception on failure,
thus incurring at least the same overhead as the first version? Or is
the overhead to raise an exception much less than I have naively
supposed?

Thinking/believing is definitely the wrong approach here. Python ships with
the nice little script timeit.py that let's you verify your assumptions.

<dicttime.py>
class DictBase(dict):
def __init__(self, default, *args, **kw):
dict.__init__(self, *args, **kw)
self.default = default

class DictTry(DictBase):
def __getitem__(self, key):
try:
return dict.__getitem__(self, key)
except KeyError:
return self.default

class DictIn(DictBase):
def __getitem__(self, key):
if key in self:
return dict.__getitem__(self, key)
else:
return self.default

class DictGet(DictBase):
def __getitem__(self, key):
return self.get(key, self.default)

items = "hit alpha beta gamma delta epsilon zeta eta theta iota
kappa".split()
items = zip(items, map(str.upper, items))
dictTry = DictTry(None, items)
dictIn = DictIn(None, items)
dictGet = DictGet(None, items)
</dicttime.py>

$ timeit.py -s"from dicttime import dictTry as d" "d['hit']"
100000 loops, best of 3: 2.22 usec per loop
$ timeit.py -s"from dicttime import dictTry as d" "d['miss']"
100000 loops, best of 3: 11.2 usec per loop
$ timeit.py -s"from dicttime import dictIn as d" "d['hit']"
100000 loops, best of 3: 2.39 usec per loop
$ timeit.py -s"from dicttime import dictIn as d" "d['miss']"
1000000 loops, best of 3: 1.49 usec per loop
$ timeit.py -s"from dicttime import dictGet as d" "d['hit']"
100000 loops, best of 3: 2.11 usec per loop
$ timeit.py -s"from dicttime import dictGet as d" "d['miss']"
100000 loops, best of 3: 2.03 usec per loop

Two results stand out: misses are cheap for DictIn and expensive for
DictTry.

Peter

Steve M · May 28, 2004

Peter Otten said:
Two results stand out: misses are cheap for DictIn and expensive for
DictTry.

That's very interesting, thanks so much Peter and the others.

So, contrary to your (and, incidentally, Wittgenstein's) admonition to
look not think, why, would you speculate, is this the case? The
overhead from creating an exception object?

Peter Otten · May 28, 2004

Steve said:
That's very interesting, thanks so much Peter and the others.

So, contrary to your (and, incidentally, Wittgenstein's) admonition to
look not think, why, would you speculate, is this the case? The
overhead from creating an exception object?

Again, why would I speculate if I can measure?

try ... except:

$ timeit.py -s"key='a'" "try: raise KeyError('KeyError: %r' % key)" "except
KeyError: pass"
100000 loops, best of 3: 5.22 usec per loop

share of exception object creation:

$ timeit.py -s"key='a'" "KeyError('KeyError: %r' % key)"
100000 loops, best of 3: 2.96 usec per loop

So the exception mechanism accounts for more than 5 of 9 extra usec. You can
perhaps find out where the rest stems from if you dig deeper, but I'm too
lazy right now.
Which reminds me, timing with this simple tool is not only fun - I've found
my intuition proved wrong often enough, so it saves me from optimizing
portions of the code that are already fast.

Peter

1st Sketch at at ReadOnly dict	0	Jan 20, 2014
Asymmetry in globals __getitem__/__setitem__	6	Jun 12, 2014
Confused about __prepare__	1	Apr 7, 2011
Using non-dict namespaces in functions	3	Mar 17, 2012
which "dictionary with attribute-style access"?	8	Oct 12, 2009
How Python works: What do you know about support for negative indices?	13	Sep 10, 2010
Bunch 2.0 - a dict with a default	1	Nov 19, 2010
Issues with writing pytest	0	Sep 9, 2022

Two questions about efficiency

Steve M

Fredrik Lundh

Slawomir Nowaczyk

Paul Miller

Peter Otten

Steve M

Peter Otten

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads