Fast Dictionary Access

T

Thomas Lehmann

Hi!

In C++, programming STL you will use the insert method which always
provides a position and a flag which indicates whether the position
results from a new insertion or an exisiting element. Idea is to have
one search only.

<code>
if data.has_key(key):
value = data[key]
</code>

But this does mean (does it?) that the dictionary is searched two
times! If so, can somebody show me how to do this in one step?
 
C

Chris Rebert

Hi!

In C++, programming STL you will use the insert method which always
provides a position and a flag which indicates whether the position
results from a new insertion or an exisiting element. Idea is to have
one search only.

<code>
if  data.has_key(key):
  value = data[key]
</code>

But this does mean (does it?) that the dictionary is searched two
times! If so, can somebody show me how to do this in one step?

Use exception handling:

try:
value = data[key]
except KeyError:
print "No such key"
else:
print "The value for that key is", value

Cheers,
Chris
 
P

Paul Rubin

Thomas Lehmann said:
<code>
if data.has_key(key):
value = data[key]
</code>

But this does mean (does it?) that the dictionary is searched two
times! If so, can somebody show me how to do this in one step?

value = data.get(key, None)

sets value to None if the key is not in the dictionary. You can use
some other sentinel if None might actually be a value in the
dictionary. One way to get a unique sentinel is:

sentinel = object()

You can also use exception handling (this is quite efficient in Python)
as Chris Rebert mentioned.
 
P

Paul Rubin

Duncan Booth said:
The suggested alternative:

value = data.get(key, None)

also has two dictionary lookups:...

dg = data.get
....
(inside loop):
value = dg(key,None)
 
R

Rachel P

[Thomas Lehmann]
In C++, programming STL you will use the insert method which always
provides a position and a flag which indicates whether the position
results from a new insertion or an exisiting element. Idea is to have
one search only.

<code>
if  data.has_key(key):
   value = data[key]
</code>

But this does mean (does it?) that the dictionary is searched two
times! If so, can somebody show me how to do this in one step?

Several thoughts for you:

* Python isn't C++

* Dict lookups in Python are ubiquitous

* Trying to avoid them is often an exercise in futility

* Because of cache effects, double lookups are *very* cheap

* If this particular fragment is critical, consider using get():

data_get = data.get
...
# inner-loop code
value = data_get(key) # assigns None for missing values

* Another alternative is a try-block:

try: # setup is cheap
value = data[key]
except KeyError: # matching is expensive
...

* Or you can use collections.defaultdict() or a dict subclass that
defines __missing__().

* In general though, think of dicts as one of Python's most optimized
structures, one that should be embraced rather than avoided.

* Tim Peter's take on the subject from year's ago (paraphrased):
"Anything written using Python dictionaries is a gazillion times
faster than C".



Raymond
 
C

Carl Banks

Thomas Lehmann said:
In C++, programming STL you will use the insert method which always
provides a position and a flag which indicates whether the position
results from a new insertion or an exisiting element. Idea is to have
one search only.
<code>
if  data.has_key(key):
   value = data[key]
</code>
But this does mean (does it?) that the dictionary is searched two
times! If so, can somebody show me how to do this in one step?

That code actually does 3 dictionary lookups and creates a temporary
object: the first lookup is to find the 'has_key' method, then it has to
bind the method to data which involves creating a 'built-in method' object
and then it calls it. Only then do you get the two lookups you expected.

Replacing your code with:

   if key in data: value = data[key]

reduces the overhead to two dictionary lookups, no temporary objects or
function calls.

The suggested alternative:

   value = data.get(key, None)

also has two dictionary lookups: one to find the 'get' method and one to
find the key, but as in the first case it also has the overhead of binding
the method and then calling it.

In other words the get method is often the clearest (and therefore best)
way to write the code, but if performance matters do a check using the 'in'
operator (or if you know the key lookup will fail only very rarely consider
using try..except).

It's not that simple. Attribute lookups are fast because the keys are
interned strings. However, the key lookup in the data object might be
an object where hash and compare operations are much slower.

So it's not valid in general to equate the two lookups. Unless you
know that your dict keys are going to be really fast like interned
strings it makes sense to minimize dict lookups.


Carl Banks
 
S

Steven D'Aprano

So it's not valid in general to equate the two lookups. Unless you know
that your dict keys are going to be really fast like interned strings it
makes sense to minimize dict lookups.

Or to stop making assumptions about what's fast and what's not fast, and
actually profile the code and see where the hold-ups are.


"Dear Pythonistas, I have a program that runs for six hours performing
some wickedly complicated calculations on data extracted from a dict. The
dictionary lookups take five and a half seconds, how can I speed them up?"


*wink*
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,744
Messages
2,569,482
Members
44,901
Latest member
Noble71S45

Latest Threads

Top