dict.has_key(x) versus 'x in dict'

P

Paul Melis

Hello,

I've always been using the has_key() method to test if a dictionary
contains a certain key. Recently I tried the same using 'in', e.g.

d = { ... }
if k in d:
...

and found that it seems to perform a lot better when lots of key-tests
are to be performed. I also noticed that has_key() is scheduled to be
removed from future (C)Python versions.

Does the 'in' way of testing have an optimized implementation compared
to has_key()? Is that the reason has_key() is being phased out?

Thanks,
Paul
 
F

Fredrik Lundh

Paul said:
I've always been using the has_key() method to test if a dictionary
contains a certain key. Recently I tried the same using 'in', e.g.

d = { ... }
if k in d:
...

and found that it seems to perform a lot better when lots of key-tests
are to be performed. I also noticed that has_key() is scheduled to be
removed from future (C)Python versions.

Does the 'in' way of testing have an optimized implementation compared
to has_key()?

no, but full method calls are a lot slower than the under-the-hood C-level
call used by the "in" operator.

this is why e.g.

string[:len(prefix)] == prefix

is often a lot faster than

string.startswith(prefix)

and why

if character in string:
string = string.replace(character, replacement)

is faster than

string = string.replace(character, replacement)

if the character isn't there most of the time, despite the fact that the "replace"
method doesn't actually do something if the character isn't found.

</F>
 
B

Bjoern Schliessmann

Paul said:
I've always been using the has_key() method to test if a
dictionary contains a certain key. Recently I tried the same using
'in', e.g.

d = { ... }
if k in d:
...

Wouldn't be "if k in d.keys()" be the exact replacement?

Regards,


Björn
 
F

Fredrik Lundh

Bjoern said:
Wouldn't be "if k in d.keys()" be the exact replacement?

no. that would convert an O(1) operation to an O(n) operation, which
would be rather silly.

</F>
 
P

Peter Otten

Bjoern said:
Wouldn't be "if k in d.keys()" be the exact replacement?

No, 'k in d' is equivalent to 'd.has_key(k)', only with less (constant)
overhead for the function call. 'k in d.keys()' on the other hand creates a
list of keys which is then searched linearly -- about the worst thing you
can do about both speed and memory footprint. Some numbers:

$ python2.5 -m timeit -s"N = 1; d = dict.fromkeys(range(N))" "N in d.keys()"
1000000 loops, best of 3: 0.693 usec per loop
$ python2.5 -m timeit -s"N = 1000; d = dict.fromkeys(range(N))" "N in
d.keys()"
10000 loops, best of 3: 77.2 usec per loop # ouch!

$ python2.5 -m timeit -s"N = 1; d = dict.fromkeys(range(N))" "N in d"
10000000 loops, best of 3: 0.192 usec per loop
~ $ python2.5 -m timeit -s"N = 1000; d = dict.fromkeys(range(N))" "N in d"
10000000 loops, best of 3: 0.192 usec per loop

$ python2.5 -m timeit -s"N = 1; d = dict.fromkeys(range(N))" "d.has_key(N)"
1000000 loops, best of 3: 0.376 usec per loop
$ python2.5 -m timeit -s"N = 1000; d = dict.fromkeys(range(N))"
"d.has_key(N)"
1000000 loops, best of 3: 0.385 usec per loop

Peter
 
R

Roberto Bonvallet

Fredrik Lundh wrote:
[...]
this is why e.g.

string[:len(prefix)] == prefix

is often a lot faster than

string.startswith(prefix)

This is interesting. In which cases does the former form perform better?
[I won't stop using str.startswith anyway :) ]

Regards.
 
A

Andy Dingley

Paul said:
I've always been using the has_key() method to test if a dictionary
contains a certain key. Recently I tried the same using 'in', e.g.

I've found using the set type to be the quickest way to do many of
these tasks. That leads me to another problem: how to cast / convert
sets as something else afterwards


What's the best (i.e. Pythonic) way to do this?
I need to generate a set (lots of intersections involved), but then I
need to display it sorted

lstBugsChanged = [ bugId for bugId in setBugsChanged ]
lstBugsChanged.sort()
 
R

Roberto Bonvallet

Andy said:
I need to generate a set (lots of intersections involved), but then I
need to display it sorted

lstBugsChanged = [ bugId for bugId in setBugsChanged ]
lstBugsChanged.sort()

In Python > 2.4:

sorted(setBugsChanged)
 
F

Fredrik Lundh

Andy said:
I need to generate a set (lots of intersections involved), but then I
need to display it sorted

lstBugsChanged = [ bugId for bugId in setBugsChanged ]
lstBugsChanged.sort()

lstBugsChanged = sorted(setBugsChanged)

</F>
 
B

Bjoern Schliessmann

Peter said:
No, 'k in d' is equivalent to 'd.has_key(k)', only with less
(constant) overhead for the function call.

Ah, thx. Thought the "x in d" syntax might search in d.values() too.

Regards,


Björn
 
P

Paul Melis

Bjoern said:
Ah, thx. Thought the "x in d" syntax might search in d.values() too.

I don't think it does

Python 2.4.3 (#1, Nov 19 2006, 13:16:36)
[GCC 3.4.5 (Gentoo 3.4.5-r1, ssp-3.4.5-1.0, pie-8.7.9)] on linux2
Type "help", "copyright", "credits" or "license" for more information.False


Paul
 
F

Fredrik Lundh

Roberto said:
this is why e.g.

string[:len(prefix)] == prefix

is often a lot faster than

string.startswith(prefix)

This is interesting. In which cases does the former form perform better?

no time to doublecheck right now, but iirc, last time we benchmarked
this, slicing was faster when len(prefix) < 300 characters or so.

</F>
 
P

Peter Otten

Out of interest, whats the Pythonic way to simply cast (sic) the set to
a list, assuming I don't need it sorted? The list comprehension?

list(setBugsChanged)

Peter
 
R

Roberto Bonvallet

Andy said:
Out of interest, whats the Pythonic way to simply cast (sic) the set to
a list, assuming I don't need it sorted? The list comprehension?

mySet = set(myList)
myList = list(mySet)
 
P

Paul McGuire

Peter Otten said:
list(setBugsChanged)

Peter

Note that this is not really a "cast" in the C sense of the word.
list(setBugsChanged) constructs and returns a new list containing the
elements of setBugsChanges, and leaves setBugsChanged unchanged.

-- Paul
 
D

Duncan Booth

Paul Melis said:
Ah, thx. Thought the "x in d" syntax might search in d.values() too.

I don't think it does

Python 2.4.3 (#1, Nov 19 2006, 13:16:36)
[GCC 3.4.5 (Gentoo 3.4.5-r1, ssp-3.4.5-1.0, pie-8.7.9)] on linux2
Type "help", "copyright", "credits" or "license" for more information.False
It is easy enough to to check if you remember that 'in' maps to the
__contains__ method:
Help on built-in function __contains__:

__contains__(...)
D.__contains__(k) -> True if D has a key k, else False
 
S

skip

Peter> No, 'k in d' is equivalent to 'd.has_key(k)', only with less
Peter> (constant) overhead for the function call. 'k in d.keys()' on the
Peter> other hand creates a list of keys which is then searched linearly
Peter> -- about the worst thing you can do about both speed and memory
Peter> footprint.

I will admit that way back when (maybe 8 yrs ago) I actually did this in a
piece of frequently executed code that's been stable for a looong time. I
have no idea why I might have written it that way. Brain fart I suppose. I
only noticed my mistake a couple months ago during a trip into the
containing function for some other stuff. Man, was I embarrassed...

Skip
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,734
Messages
2,569,441
Members
44,832
Latest member
GlennSmall

Latest Threads

Top