dict.has_key(x) versus 'x in dict'

Discussion in 'Python' started by Paul Melis, Dec 6, 2006.

  1. Paul Melis

    Paul Melis Guest

    Hello,

    I've always been using the has_key() method to test if a dictionary
    contains a certain key. Recently I tried the same using 'in', e.g.

    d = { ... }
    if k in d:
    ...

    and found that it seems to perform a lot better when lots of key-tests
    are to be performed. I also noticed that has_key() is scheduled to be
    removed from future (C)Python versions.

    Does the 'in' way of testing have an optimized implementation compared
    to has_key()? Is that the reason has_key() is being phased out?

    Thanks,
    Paul
    Paul Melis, Dec 6, 2006
    #1
    1. Advertising

  2. Paul Melis wrote:

    > I've always been using the has_key() method to test if a dictionary
    > contains a certain key. Recently I tried the same using 'in', e.g.
    >
    > d = { ... }
    > if k in d:
    > ...
    >
    > and found that it seems to perform a lot better when lots of key-tests
    > are to be performed. I also noticed that has_key() is scheduled to be
    > removed from future (C)Python versions.
    >
    > Does the 'in' way of testing have an optimized implementation compared
    > to has_key()?


    no, but full method calls are a lot slower than the under-the-hood C-level
    call used by the "in" operator.

    this is why e.g.

    string[:len(prefix)] == prefix

    is often a lot faster than

    string.startswith(prefix)

    and why

    if character in string:
    string = string.replace(character, replacement)

    is faster than

    string = string.replace(character, replacement)

    if the character isn't there most of the time, despite the fact that the "replace"
    method doesn't actually do something if the character isn't found.

    </F>
    Fredrik Lundh, Dec 6, 2006
    #2
    1. Advertising

  3. Paul Melis wrote:

    > I've always been using the has_key() method to test if a
    > dictionary contains a certain key. Recently I tried the same using
    > 'in', e.g.
    >
    > d = { ... }
    > if k in d:
    > ...


    Wouldn't be "if k in d.keys()" be the exact replacement?

    Regards,


    Björn

    --
    BOFH excuse #17:

    fat electrons in the lines
    Bjoern Schliessmann, Dec 6, 2006
    #3
  4. Bjoern Schliessmann wrote:

    > Wouldn't be "if k in d.keys()" be the exact replacement?


    no. that would convert an O(1) operation to an O(n) operation, which
    would be rather silly.

    </F>
    Fredrik Lundh, Dec 6, 2006
    #4
  5. Paul Melis

    Peter Otten Guest

    Bjoern Schliessmann wrote:

    > Paul Melis wrote:
    >
    >> I've always been using the has_key() method to test if a
    >> dictionary contains a certain key. Recently I tried the same using
    >> 'in', e.g.
    >>
    >> d = { ... }
    >> if k in d:
    >> ...

    >
    > Wouldn't be "if k in d.keys()" be the exact replacement?


    No, 'k in d' is equivalent to 'd.has_key(k)', only with less (constant)
    overhead for the function call. 'k in d.keys()' on the other hand creates a
    list of keys which is then searched linearly -- about the worst thing you
    can do about both speed and memory footprint. Some numbers:

    $ python2.5 -m timeit -s"N = 1; d = dict.fromkeys(range(N))" "N in d.keys()"
    1000000 loops, best of 3: 0.693 usec per loop
    $ python2.5 -m timeit -s"N = 1000; d = dict.fromkeys(range(N))" "N in
    d.keys()"
    10000 loops, best of 3: 77.2 usec per loop # ouch!

    $ python2.5 -m timeit -s"N = 1; d = dict.fromkeys(range(N))" "N in d"
    10000000 loops, best of 3: 0.192 usec per loop
    ~ $ python2.5 -m timeit -s"N = 1000; d = dict.fromkeys(range(N))" "N in d"
    10000000 loops, best of 3: 0.192 usec per loop

    $ python2.5 -m timeit -s"N = 1; d = dict.fromkeys(range(N))" "d.has_key(N)"
    1000000 loops, best of 3: 0.376 usec per loop
    $ python2.5 -m timeit -s"N = 1000; d = dict.fromkeys(range(N))"
    "d.has_key(N)"
    1000000 loops, best of 3: 0.385 usec per loop

    Peter
    Peter Otten, Dec 6, 2006
    #5
  6. Fredrik Lundh wrote:
    [...]
    > this is why e.g.
    >
    > string[:len(prefix)] == prefix
    >
    > is often a lot faster than
    >
    > string.startswith(prefix)


    This is interesting. In which cases does the former form perform better?
    [I won't stop using str.startswith anyway :) ]

    Regards.
    --
    Roberto Bonvallet
    Roberto Bonvallet, Dec 6, 2006
    #6
  7. Paul Melis

    Andy Dingley Guest

    Paul Melis wrote:
    > I've always been using the has_key() method to test if a dictionary
    > contains a certain key. Recently I tried the same using 'in', e.g.


    I've found using the set type to be the quickest way to do many of
    these tasks. That leads me to another problem: how to cast / convert
    sets as something else afterwards


    What's the best (i.e. Pythonic) way to do this?
    I need to generate a set (lots of intersections involved), but then I
    need to display it sorted

    lstBugsChanged = [ bugId for bugId in setBugsChanged ]
    lstBugsChanged.sort()
    Andy Dingley, Dec 6, 2006
    #7
  8. Andy Dingley wrote:
    > I need to generate a set (lots of intersections involved), but then I
    > need to display it sorted
    >
    > lstBugsChanged = [ bugId for bugId in setBugsChanged ]
    > lstBugsChanged.sort()


    In Python > 2.4:

    sorted(setBugsChanged)

    --
    Roberto Bonvallet
    Roberto Bonvallet, Dec 6, 2006
    #8
  9. Andy Dingley wrote:

    > I need to generate a set (lots of intersections involved), but then I
    > need to display it sorted
    >
    > lstBugsChanged = [ bugId for bugId in setBugsChanged ]
    > lstBugsChanged.sort()


    lstBugsChanged = sorted(setBugsChanged)

    </F>
    Fredrik Lundh, Dec 6, 2006
    #9
  10. I wrote:
    > In Python > 2.4:


    lastline.replace(">", ">=")

    --
    Roberto Bonvallet
    Roberto Bonvallet, Dec 6, 2006
    #10
  11. Peter Otten wrote:

    > No, 'k in d' is equivalent to 'd.has_key(k)', only with less
    > (constant) overhead for the function call.


    Ah, thx. Thought the "x in d" syntax might search in d.values() too.

    Regards,


    Björn

    --
    BOFH excuse #12:

    dry joints on cable plug
    Bjoern Schliessmann, Dec 6, 2006
    #11
  12. Paul Melis

    Paul Melis Guest

    Bjoern Schliessmann wrote:
    > Peter Otten wrote:
    >
    >> No, 'k in d' is equivalent to 'd.has_key(k)', only with less
    >> (constant) overhead for the function call.

    >
    > Ah, thx. Thought the "x in d" syntax might search in d.values() too.


    I don't think it does

    Python 2.4.3 (#1, Nov 19 2006, 13:16:36)
    [GCC 3.4.5 (Gentoo 3.4.5-r1, ssp-3.4.5-1.0, pie-8.7.9)] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> d={1:'a',2:'b'}
    >>> 1 in d

    True
    >>> 'a' in d

    False


    Paul
    Paul Melis, Dec 6, 2006
    #12
  13. Roberto Bonvallet wrote:

    >> this is why e.g.
    >>
    >> string[:len(prefix)] == prefix
    >>
    >> is often a lot faster than
    >>
    >> string.startswith(prefix)

    >
    > This is interesting. In which cases does the former form perform better?


    no time to doublecheck right now, but iirc, last time we benchmarked
    this, slicing was faster when len(prefix) < 300 characters or so.

    </F>
    Fredrik Lundh, Dec 6, 2006
    #13
  14. Paul Melis

    Andy Dingley Guest

    Roberto Bonvallet wrote:

    > > lstBugsChanged = [ bugId for bugId in setBugsChanged ]


    > In Python > 2.4:


    Hmmm. Thanks. Another reason to twist the admin's arm and get them to
    upgrade the last 2.3.4 boxen


    > sorted(setBugsChanged)


    Out of interest, whats the Pythonic way to simply cast (sic) the set to
    a list, assuming I don't need it sorted? The list comprehension?
    Andy Dingley, Dec 6, 2006
    #14
  15. Paul Melis

    Peter Otten Guest

    Andy Dingley wrote:

    >> sorted(setBugsChanged)


    > Out of interest, whats the Pythonic way to simply cast (sic) the set to
    > a list, assuming I don't need it sorted? The list comprehension?


    list(setBugsChanged)

    Peter
    Peter Otten, Dec 6, 2006
    #15
  16. Andy Dingley wrote:
    > Out of interest, whats the Pythonic way to simply cast (sic) the set to
    > a list, assuming I don't need it sorted? The list comprehension?


    mySet = set(myList)
    myList = list(mySet)

    --
    Roberto Bonvallet
    Roberto Bonvallet, Dec 6, 2006
    #16
  17. Paul Melis

    Paul McGuire Guest

    "Peter Otten" <> wrote in message
    news:el6sfu$ord$02$-online.com...
    > Andy Dingley wrote:
    >
    >>> sorted(setBugsChanged)

    >
    >> Out of interest, whats the Pythonic way to simply cast (sic) the set to
    >> a list, assuming I don't need it sorted? The list comprehension?

    >
    > list(setBugsChanged)
    >
    > Peter


    Note that this is not really a "cast" in the C sense of the word.
    list(setBugsChanged) constructs and returns a new list containing the
    elements of setBugsChanges, and leaves setBugsChanged unchanged.

    -- Paul
    Paul McGuire, Dec 6, 2006
    #17
  18. Paul Melis wrote:

    > I don't think it does


    Thanks for trying, I was too lazy ;)

    Regards,


    Björn

    --
    BOFH excuse #52:

    Smell from unhygienic janitorial staff wrecked the tape heads
    Bjoern Schliessmann, Dec 6, 2006
    #18
  19. Paul Melis

    Duncan Booth Guest

    Paul Melis <> wrote:

    >> Ah, thx. Thought the "x in d" syntax might search in d.values() too.

    >
    > I don't think it does
    >
    > Python 2.4.3 (#1, Nov 19 2006, 13:16:36)
    > [GCC 3.4.5 (Gentoo 3.4.5-r1, ssp-3.4.5-1.0, pie-8.7.9)] on linux2
    > Type "help", "copyright", "credits" or "license" for more information.
    >>>> d={1:'a',2:'b'}
    >>>> 1 in d

    > True
    >>>> 'a' in d

    > False
    >

    It is easy enough to to check if you remember that 'in' maps to the
    __contains__ method:

    >>> help({}.__contains__)

    Help on built-in function __contains__:

    __contains__(...)
    D.__contains__(k) -> True if D has a key k, else False
    Duncan Booth, Dec 6, 2006
    #19
  20. Paul Melis

    Guest

    Peter> Bjoern Schliessmann wrote:
    >> Wouldn't be "if k in d.keys()" be the exact replacement?


    Peter> No, 'k in d' is equivalent to 'd.has_key(k)', only with less
    Peter> (constant) overhead for the function call. 'k in d.keys()' on the
    Peter> other hand creates a list of keys which is then searched linearly
    Peter> -- about the worst thing you can do about both speed and memory
    Peter> footprint.

    I will admit that way back when (maybe 8 yrs ago) I actually did this in a
    piece of frequently executed code that's been stable for a looong time. I
    have no idea why I might have written it that way. Brain fart I suppose. I
    only noticed my mistake a couple months ago during a trip into the
    containing function for some other stuff. Man, was I embarrassed...

    Skip
    , Dec 7, 2006
    #20
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Skip Montanaro
    Replies:
    1
    Views:
    319
    Alex Martelli
    Oct 20, 2003
  2. Replies:
    11
    Views:
    690
  3. Åukasz Ligowski

    "xxx.has_key(a)" vs "a in xxx"

    Åukasz Ligowski, Oct 30, 2008, in forum: Python
    Replies:
    0
    Views:
    277
    Åukasz Ligowski
    Oct 30, 2008
  4. Replies:
    4
    Views:
    295
    Chris Rebert
    Apr 4, 2010
  5. Paul Butcher
    Replies:
    12
    Views:
    683
    Gary Wright
    Nov 28, 2007
Loading...

Share This Page