Two questions about efficiency

S

Steve M

1. Near the beginning of the document "Unifying types and classes in
Python 2.2" by GvR, which can be found at
http://www.python.org/2.2.2/descrintro.html, the author writes the
following about subclassing a built in type:
---
Here's an example of a simple dict subclass, which provides a "default
value" that is returned when a missing key is requested:

class defaultdict(dict):

def __init__(self, default=None):
dict.__init__(self)
self.default = default

def __getitem__(self, key):
try:
return dict.__getitem__(self, key)
except KeyError:
return self.default

<snip>

The __getitem__() method could also be written as follows, using the
new "key in dict" test introduced in Python 2.2:

def __getitem__(self, key):
if key in self:
return dict.__getitem__(self, key)
else:
return self.default

I believe that this version is less efficient, because it does the key
lookup twice. The exception would be when we expect that the requested
key is almost never in the dictionary: then setting up the try/except
statement is more expensive than the failing "key in self" test.
---

I am interested in his claim that the second version is less efficient
unless the requested key is "almost never" in the dictionary. I would
have thought that the overhead to do a key lookup is quite a bit less
than the overhead required to create an exception object, so that the
first version would be on average more efficient only if the requested
key is in the dictionary, say, more than half the time.

Does the "key in dict" test internally raise an exception on failure,
thus incurring at least the same overhead as the first version? Or is
the overhead to raise an exception much less than I have naively
supposed?

I have a similar idiom buried in a nested loop, and I'd like to select
the approach that is likely to be most efficient. My best guess is
that the key lookup will fail half the time.

2. Is there any reason to prefer one of the following over the other?
Im interested in both a consensus on style and readability, and
considerations of efficiency at runtime.

for x in L:
if f(x):
pass

for x in L:
if f(x):
continue
 
F

Fredrik Lundh

Steve M said:
2. Is there any reason to prefer one of the following over the other?
Im interested in both a consensus on style and readability, and
considerations of efficiency at runtime.

for x in L:
if f(x):
pass

for x in L:
if f(x):
continue

did you really mean to write that?

as written, both constructs are pointless, and the "if f(x)"-statement can
be replaced with just "f(x)" in both cases. and if you add more code inside
the if statement, or after it, the constructs do different things.

did you perhaps mean:

for x in L:
if f(x):
do stuff

for x in L:
if not f(x):
continue
do stuff

if so, my rule of thumb is to use alternative 1 if "stuff" is short, and alter-
native 2 if "stuff" is long, or if you have multiple tests before you get to
the real stuff.

ymmv, as usual.

and yes, if "stuff" is trivial, and builds a new list, consider using a list
comprehension:

L2 = [stuff for x in L if f(x)]

</F>
 
S

Slawomir Nowaczyk

On 26 May 2004 13:39:30 -0700
(e-mail address removed) (Steve M) wrote:

#> I am interested in his claim that the second version is less
#> efficient unless the requested key is "almost never" in the
#> dictionary. I would have thought that the overhead to do a key
#> lookup is quite a bit less than the overhead required to create an
#> exception object, so that the first version would be on average
#> more efficient only if the requested key is in the dictionary, say,
#> more than half the time.

#> Does the "key in dict" test internally raise an exception on
#> failure, thus incurring at least the same overhead as the first
#> version? Or is the overhead to raise an exception much less than I
#> have naively supposed?

#> I have a similar idiom buried in a nested loop, and I'd like to
#> select the approach that is likely to be most efficient. My best
#> guess is that the key lookup will fail half the time.

I suggest:
... # put version 1 here
...... # put version 2 here
...
Do not forget to share the results :D

--
Best wishes,
Slawomir Nowaczyk
( (e-mail address removed) )

The Main Library at Indiana University sinks over an inch every
year because when it was built, engineers failed to take into
account the weight of all the books that would occupy the building.
 
P

Paul Miller

1. Near the beginning of the document "Unifying types and classes in
Python 2.2" by GvR, which can be found at
http://www.python.org/2.2.2/descrintro.html, the author writes the
following about subclassing a built in type:
---
Here's an example of a simple dict subclass, which provides a "default
value" that is returned when a missing key is requested:

class defaultdict(dict):

def __init__(self, default=None):
dict.__init__(self)
self.default = default

def __getitem__(self, key):
try:
return dict.__getitem__(self, key)
except KeyError:
return self.default

<snip>

The __getitem__() method could also be written as follows, using the
new "key in dict" test introduced in Python 2.2:

def __getitem__(self, key):
if key in self:
return dict.__getitem__(self, key)
else:
return self.default

I believe that this version is less efficient, because it does the key
lookup twice. The exception would be when we expect that the requested
key is almost never in the dictionary: then setting up the try/except
statement is more expensive than the failing "key in self" test.
---
[...]
I have a similar idiom buried in a nested loop, and I'd like to select
the approach that is likely to be most efficient. My best guess is
that the key lookup will fail half the time.

Since Guido wrote Python, I would tend to believe him when he says the
first method is more efficient than the second. ;) However, even if
you nested a similar construct several levels deep inside a loop, I'd
be willing to bet that if that particular loop were a bottleneck in
your program, that optimizing that particular section of code would
probably not make the difference between "too slow" and "fast enough."

Since both options are equally readable, if I thought about it, I
would select option 2 while coding, by default. I wouldn't make a big
deal out of changing code that uses option 1, though.
 
P

Peter Otten

Steve M wrote:

[two implementations of dictionary with default value]
I am interested in his claim that the second version is less efficient
unless the requested key is "almost never" in the dictionary. I would
have thought that the overhead to do a key lookup is quite a bit less
than the overhead required to create an exception object, so that the
first version would be on average more efficient only if the requested
key is in the dictionary, say, more than half the time.

Does the "key in dict" test internally raise an exception on failure,
thus incurring at least the same overhead as the first version? Or is
the overhead to raise an exception much less than I have naively
supposed?

Thinking/believing is definitely the wrong approach here. Python ships with
the nice little script timeit.py that let's you verify your assumptions.

<dicttime.py>
class DictBase(dict):
def __init__(self, default, *args, **kw):
dict.__init__(self, *args, **kw)
self.default = default

class DictTry(DictBase):
def __getitem__(self, key):
try:
return dict.__getitem__(self, key)
except KeyError:
return self.default

class DictIn(DictBase):
def __getitem__(self, key):
if key in self:
return dict.__getitem__(self, key)
else:
return self.default

class DictGet(DictBase):
def __getitem__(self, key):
return self.get(key, self.default)

items = "hit alpha beta gamma delta epsilon zeta eta theta iota
kappa".split()
items = zip(items, map(str.upper, items))
dictTry = DictTry(None, items)
dictIn = DictIn(None, items)
dictGet = DictGet(None, items)
</dicttime.py>


$ timeit.py -s"from dicttime import dictTry as d" "d['hit']"
100000 loops, best of 3: 2.22 usec per loop
$ timeit.py -s"from dicttime import dictTry as d" "d['miss']"
100000 loops, best of 3: 11.2 usec per loop
$ timeit.py -s"from dicttime import dictIn as d" "d['hit']"
100000 loops, best of 3: 2.39 usec per loop
$ timeit.py -s"from dicttime import dictIn as d" "d['miss']"
1000000 loops, best of 3: 1.49 usec per loop
$ timeit.py -s"from dicttime import dictGet as d" "d['hit']"
100000 loops, best of 3: 2.11 usec per loop
$ timeit.py -s"from dicttime import dictGet as d" "d['miss']"
100000 loops, best of 3: 2.03 usec per loop

Two results stand out: misses are cheap for DictIn and expensive for
DictTry.

Peter
 
S

Steve M

Peter Otten said:
Two results stand out: misses are cheap for DictIn and expensive for
DictTry.

That's very interesting, thanks so much Peter and the others.

So, contrary to your (and, incidentally, Wittgenstein's) admonition to
look not think, why, would you speculate, is this the case? The
overhead from creating an exception object?
 
P

Peter Otten

Steve said:
That's very interesting, thanks so much Peter and the others.

So, contrary to your (and, incidentally, Wittgenstein's) admonition to
look not think, why, would you speculate, is this the case? The
overhead from creating an exception object?

Again, why would I speculate if I can measure?

try ... except:

$ timeit.py -s"key='a'" "try: raise KeyError('KeyError: %r' % key)" "except
KeyError: pass"
100000 loops, best of 3: 5.22 usec per loop

share of exception object creation:

$ timeit.py -s"key='a'" "KeyError('KeyError: %r' % key)"
100000 loops, best of 3: 2.96 usec per loop

So the exception mechanism accounts for more than 5 of 9 extra usec. You can
perhaps find out where the rest stems from if you dig deeper, but I'm too
lazy right now.
Which reminds me, timing with this simple tool is not only fun - I've found
my intuition proved wrong often enough, so it saves me from optimizing
portions of the code that are already fast.

Peter
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,769
Messages
2,569,580
Members
45,055
Latest member
SlimSparkKetoACVReview

Latest Threads

Top