Two questions about efficiency

Discussion in 'Python' started by Steve M, May 26, 2004.

  1. Steve M

    Steve M Guest

    1. Near the beginning of the document "Unifying types and classes in
    Python 2.2" by GvR, which can be found at
    http://www.python.org/2.2.2/descrintro.html, the author writes the
    following about subclassing a built in type:
    ---
    Here's an example of a simple dict subclass, which provides a "default
    value" that is returned when a missing key is requested:

    class defaultdict(dict):

    def __init__(self, default=None):
    dict.__init__(self)
    self.default = default

    def __getitem__(self, key):
    try:
    return dict.__getitem__(self, key)
    except KeyError:
    return self.default

    <snip>

    The __getitem__() method could also be written as follows, using the
    new "key in dict" test introduced in Python 2.2:

    def __getitem__(self, key):
    if key in self:
    return dict.__getitem__(self, key)
    else:
    return self.default

    I believe that this version is less efficient, because it does the key
    lookup twice. The exception would be when we expect that the requested
    key is almost never in the dictionary: then setting up the try/except
    statement is more expensive than the failing "key in self" test.
    ---

    I am interested in his claim that the second version is less efficient
    unless the requested key is "almost never" in the dictionary. I would
    have thought that the overhead to do a key lookup is quite a bit less
    than the overhead required to create an exception object, so that the
    first version would be on average more efficient only if the requested
    key is in the dictionary, say, more than half the time.

    Does the "key in dict" test internally raise an exception on failure,
    thus incurring at least the same overhead as the first version? Or is
    the overhead to raise an exception much less than I have naively
    supposed?

    I have a similar idiom buried in a nested loop, and I'd like to select
    the approach that is likely to be most efficient. My best guess is
    that the key lookup will fail half the time.

    2. Is there any reason to prefer one of the following over the other?
    Im interested in both a consensus on style and readability, and
    considerations of efficiency at runtime.

    for x in L:
    if f(x):
    pass

    for x in L:
    if f(x):
    continue
    Steve M, May 26, 2004
    #1
    1. Advertising

  2. "Steve M" wrote:

    > 2. Is there any reason to prefer one of the following over the other?
    > Im interested in both a consensus on style and readability, and
    > considerations of efficiency at runtime.
    >
    > for x in L:
    > if f(x):
    > pass
    >
    > for x in L:
    > if f(x):
    > continue


    did you really mean to write that?

    as written, both constructs are pointless, and the "if f(x)"-statement can
    be replaced with just "f(x)" in both cases. and if you add more code inside
    the if statement, or after it, the constructs do different things.

    did you perhaps mean:

    for x in L:
    if f(x):
    do stuff

    for x in L:
    if not f(x):
    continue
    do stuff

    if so, my rule of thumb is to use alternative 1 if "stuff" is short, and alter-
    native 2 if "stuff" is long, or if you have multiple tests before you get to
    the real stuff.

    ymmv, as usual.

    and yes, if "stuff" is trivial, and builds a new list, consider using a list
    comprehension:

    L2 = [stuff for x in L if f(x)]

    </F>
    Fredrik Lundh, May 26, 2004
    #2
    1. Advertising

  3. On 26 May 2004 13:39:30 -0700
    (Steve M) wrote:

    #> I am interested in his claim that the second version is less
    #> efficient unless the requested key is "almost never" in the
    #> dictionary. I would have thought that the overhead to do a key
    #> lookup is quite a bit less than the overhead required to create an
    #> exception object, so that the first version would be on average
    #> more efficient only if the requested key is in the dictionary, say,
    #> more than half the time.

    #> Does the "key in dict" test internally raise an exception on
    #> failure, thus incurring at least the same overhead as the first
    #> version? Or is the overhead to raise an exception much less than I
    #> have naively supposed?

    #> I have a similar idiom buried in a nested loop, and I'd like to
    #> select the approach that is likely to be most efficient. My best
    #> guess is that the key lookup will fail half the time.

    I suggest:

    >>> import profile
    >>> p = profile.profiler()
    >>> def ver1():

    ... # put version 1 here
    ...
    >>> def ver2():

    ... # put version 2 here
    ...
    >>> p.run("ver1()").print_stats()
    >>> p.run("ver2()").print_stats()


    Do not forget to share the results :D

    --
    Best wishes,
    Slawomir Nowaczyk
    ( )

    The Main Library at Indiana University sinks over an inch every
    year because when it was built, engineers failed to take into
    account the weight of all the books that would occupy the building.
    Slawomir Nowaczyk, May 27, 2004
    #3
  4. Steve M

    Paul Miller Guest

    (Steve M) wrote in message news:<>...
    > 1. Near the beginning of the document "Unifying types and classes in
    > Python 2.2" by GvR, which can be found at
    > http://www.python.org/2.2.2/descrintro.html, the author writes the
    > following about subclassing a built in type:
    > ---
    > Here's an example of a simple dict subclass, which provides a "default
    > value" that is returned when a missing key is requested:
    >
    > class defaultdict(dict):
    >
    > def __init__(self, default=None):
    > dict.__init__(self)
    > self.default = default
    >
    > def __getitem__(self, key):
    > try:
    > return dict.__getitem__(self, key)
    > except KeyError:
    > return self.default
    >
    > <snip>
    >
    > The __getitem__() method could also be written as follows, using the
    > new "key in dict" test introduced in Python 2.2:
    >
    > def __getitem__(self, key):
    > if key in self:
    > return dict.__getitem__(self, key)
    > else:
    > return self.default
    >
    > I believe that this version is less efficient, because it does the key
    > lookup twice. The exception would be when we expect that the requested
    > key is almost never in the dictionary: then setting up the try/except
    > statement is more expensive than the failing "key in self" test.
    > ---
    >[...]
    > I have a similar idiom buried in a nested loop, and I'd like to select
    > the approach that is likely to be most efficient. My best guess is
    > that the key lookup will fail half the time.


    Since Guido wrote Python, I would tend to believe him when he says the
    first method is more efficient than the second. ;) However, even if
    you nested a similar construct several levels deep inside a loop, I'd
    be willing to bet that if that particular loop were a bottleneck in
    your program, that optimizing that particular section of code would
    probably not make the difference between "too slow" and "fast enough."

    Since both options are equally readable, if I thought about it, I
    would select option 2 while coding, by default. I wouldn't make a big
    deal out of changing code that uses option 1, though.
    Paul Miller, May 27, 2004
    #4
  5. Steve M

    Peter Otten Guest

    Steve M wrote:

    [two implementations of dictionary with default value]

    > I am interested in his claim that the second version is less efficient
    > unless the requested key is "almost never" in the dictionary. I would
    > have thought that the overhead to do a key lookup is quite a bit less
    > than the overhead required to create an exception object, so that the
    > first version would be on average more efficient only if the requested
    > key is in the dictionary, say, more than half the time.
    >
    > Does the "key in dict" test internally raise an exception on failure,
    > thus incurring at least the same overhead as the first version? Or is
    > the overhead to raise an exception much less than I have naively
    > supposed?


    Thinking/believing is definitely the wrong approach here. Python ships with
    the nice little script timeit.py that let's you verify your assumptions.

    <dicttime.py>
    class DictBase(dict):
    def __init__(self, default, *args, **kw):
    dict.__init__(self, *args, **kw)
    self.default = default

    class DictTry(DictBase):
    def __getitem__(self, key):
    try:
    return dict.__getitem__(self, key)
    except KeyError:
    return self.default

    class DictIn(DictBase):
    def __getitem__(self, key):
    if key in self:
    return dict.__getitem__(self, key)
    else:
    return self.default

    class DictGet(DictBase):
    def __getitem__(self, key):
    return self.get(key, self.default)

    items = "hit alpha beta gamma delta epsilon zeta eta theta iota
    kappa".split()
    items = zip(items, map(str.upper, items))
    dictTry = DictTry(None, items)
    dictIn = DictIn(None, items)
    dictGet = DictGet(None, items)
    </dicttime.py>


    $ timeit.py -s"from dicttime import dictTry as d" "d['hit']"
    100000 loops, best of 3: 2.22 usec per loop
    $ timeit.py -s"from dicttime import dictTry as d" "d['miss']"
    100000 loops, best of 3: 11.2 usec per loop
    $ timeit.py -s"from dicttime import dictIn as d" "d['hit']"
    100000 loops, best of 3: 2.39 usec per loop
    $ timeit.py -s"from dicttime import dictIn as d" "d['miss']"
    1000000 loops, best of 3: 1.49 usec per loop
    $ timeit.py -s"from dicttime import dictGet as d" "d['hit']"
    100000 loops, best of 3: 2.11 usec per loop
    $ timeit.py -s"from dicttime import dictGet as d" "d['miss']"
    100000 loops, best of 3: 2.03 usec per loop

    Two results stand out: misses are cheap for DictIn and expensive for
    DictTry.

    Peter
    Peter Otten, May 27, 2004
    #5
  6. Steve M

    Steve M Guest

    Peter Otten <> wrote
    <snip>
    > Two results stand out: misses are cheap for DictIn and expensive for
    > DictTry.


    That's very interesting, thanks so much Peter and the others.

    So, contrary to your (and, incidentally, Wittgenstein's) admonition to
    look not think, why, would you speculate, is this the case? The
    overhead from creating an exception object?
    Steve M, May 28, 2004
    #6
  7. Steve M

    Peter Otten Guest

    Steve M wrote:

    > Peter Otten <> wrote
    > <snip>
    >> Two results stand out: misses are cheap for DictIn and expensive for
    >> DictTry.

    >
    > That's very interesting, thanks so much Peter and the others.
    >
    > So, contrary to your (and, incidentally, Wittgenstein's) admonition to
    > look not think, why, would you speculate, is this the case? The
    > overhead from creating an exception object?


    Again, why would I speculate if I can measure?

    try ... except:

    $ timeit.py -s"key='a'" "try: raise KeyError('KeyError: %r' % key)" "except
    KeyError: pass"
    100000 loops, best of 3: 5.22 usec per loop

    share of exception object creation:

    $ timeit.py -s"key='a'" "KeyError('KeyError: %r' % key)"
    100000 loops, best of 3: 2.96 usec per loop

    So the exception mechanism accounts for more than 5 of 9 extra usec. You can
    perhaps find out where the rest stems from if you dig deeper, but I'm too
    lazy right now.
    Which reminds me, timing with this simple tool is not only fun - I've found
    my intuition proved wrong often enough, so it saves me from optimizing
    portions of the code that are already fast.

    Peter
    Peter Otten, May 28, 2004
    #7
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Bryan Krone

    perl efficiency -- fastest grepping?

    Bryan Krone, Nov 5, 2004, in forum: Perl
    Replies:
    1
    Views:
    1,470
    Jim Gibson
    Nov 8, 2004
  2. Trevor Hartman

    dataset efficiency question

    Trevor Hartman, Jul 3, 2003, in forum: ASP .Net
    Replies:
    0
    Views:
    399
    Trevor Hartman
    Jul 3, 2003
  3. Joseph D. DeJohn

    Custom Paging Efficiency

    Joseph D. DeJohn, Aug 6, 2003, in forum: ASP .Net
    Replies:
    1
    Views:
    355
    S. Justin Gengo
    Aug 6, 2003
  4. MC D
    Replies:
    4
    Views:
    459
    Big D
    Nov 18, 2003
  5. GenxLogic
    Replies:
    3
    Views:
    1,266
    andrewmcdonagh
    Dec 6, 2006
Loading...

Share This Page