Preferred Python idiom for handling non-existing dictionary keys and why?

Quentin Crain

Hello again All!

(First, I would like to mention I did try to google
for the answer here!)

Say I am populating a dictionary with a list and
appending. I have written it thusly:

    d={}
    for (k,v) in somedata():
        try:
            d[k].append(v)
        except KeyError:
            d[k]=[v]

I could have written:

    d={}
    for (k,v) in somedata():
        if (k in d):
            d[k].append(v)
        else:
            d[k]=[v]
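To try the two loops above directly, here is a runnable sketch of both. `somedata()` is a hypothetical stand-in that returns a few (key, value) pairs, because the poster's real data source is not shown.

```python
# Hypothetical stand-in for the poster's somedata(): returns (key, value) pairs.
def somedata():
    return [("a", 1), ("b", 2), ("a", 3)]

def group_try():
    # EAFP style: assume the key exists and fall back on KeyError.
    d = {}
    for (k, v) in somedata():
        try:
            d[k].append(v)
        except KeyError:
            d[k] = [v]
    return d

def group_in():
    # LBYL style: test membership before touching the key.
    d = {}
    for (k, v) in somedata():
        if k in d:
            d[k].append(v)
        else:
            d[k] = [v]
    return d

print(group_try())  # {'a': [1, 3], 'b': [2]}
print(group_in())   # {'a': [1, 3], 'b': [2]}
```

The first style is usually called EAFP ("easier to ask forgiveness than permission") and the second LBYL ("look before you leap"); both build the same dict.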


Which is perferred and why? Which is "faster"?

Thanks!!

Quentin


John J. Lee

Quentin Crain said:
(First, I would like to mention I did try to google
for the answer here!)

Say I am populating a dictionary with a list and
appending. I have written it thusly:
[...]

You want the dict.setdefault method.
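As a minimal sketch of what `dict.setdefault(key, default)` does: if `key` is missing, it stores `default` under it; in either case it returns the value now stored under `key`, so the returned list can be appended to directly. The sample data here is hypothetical.

```python
d = {}

# Key missing: setdefault stores the default and returns it.
lst = d.setdefault("x", [])
lst.append(1)
print(d)  # {'x': [1]}

# Key present: setdefault ignores the default and returns the stored value.
lst = d.setdefault("x", [])
lst.append(2)
print(d)  # {'x': [1, 2]}

# Applied to the grouping loop from the question.
def somedata():  # hypothetical stand-in yielding (key, value) pairs
    return [("a", 1), ("b", 2), ("a", 3)]

groups = {}
for (k, v) in somedata():
    groups.setdefault(k, []).append(v)
print(groups)  # {'a': [1, 3], 'b': [2]}
```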


John
 
Skip Montanaro

John> You want the dict.setdefault method.

d.setdefault() never made any sense to me (IOW, to use it I always had to
look it up). The semantics of what it does just never stick in my brain.
Consequently, even though it's less efficient I generally write such loops
like this:

    d = {}
    for (key, val) in some_items:
        lst = d.get(key) or []
        lst.append(val)
        d[key] = lst

Note that the first statement of the loop is correct (though perhaps not
obvious at first glance), since once initialized, d[key] never tests as
False. FYI, timeit tells the performance tale:

% timeit.py -s 'd={}' 'x = d.setdefault("x", [])'
1000000 loops, best of 3: 1.82 usec per loop
% timeit.py -s 'd={}' 'x = d.get("x") or [] ; d["x"] = x'
100000 loops, best of 3: 2.34 usec per loop

But my way isn't bad enough for me to change. ;-)
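A runnable version of Skip's loop, with a hypothetical `some_items` list standing in for his data. The comments spell out why `d.get(key) or []` is safe here:

```python
some_items = [("a", 1), ("b", 2), ("a", 3)]  # hypothetical sample data

d = {}
for (key, val) in some_items:
    # d.get(key) returns None for a new key. Once stored, the list is
    # non-empty and therefore truthy, so "or []" only kicks in the first time.
    lst = d.get(key) or []
    lst.append(val)
    d[key] = lst  # redundant after the first time, but harmless
print(d)  # {'a': [1, 3], 'b': [2]}
```

The `or []` trick relies on every stored value being truthy; that holds here because a list is only stored after something has been appended to it.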

Skip
 
Terry Reedy

Quentin Crain said:
Hello again All!

(First, I would like to mention I did try to google
for the answer here!)

The question you asked is a frequent one but hard to search for, and the
special-case answer (to a question you did not ask) is even harder to find.
Say I am populating a dictionary with a list and
appending. I have written it thusly:

    d={}
    for (k,v) in somedata():
        try:
            d[k].append(v)
        except KeyError:
            d[k]=[v]

I could have written:

    d={}
    for (k,v) in somedata():
        if (k in d):
            d[k].append(v)
        else:
            d[k]=[v]


Which is preferred and why?

Neither. For me, both are superseded by

    d={}
    for (k,v) in somedata():
        d[k] = d.get(k, []).append(v)

Read Library Reference 2.2.7 Mapping Types to learn current dict
methods.
Which is "faster"?

The tradeoff is a small extra overhead on every loop (the condition) versus
an 'occasional' big overhead (catching the exception). The choice depends on
how often the exception is raised. As I remember, one data-based rule of
thumb from years ago is to use the conditional if the key is missing more
than 10% of the time. You could try the new timeit module on all three versions.
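A sketch of that experiment using the `timeit` module's `timeit()` function, which accepts a callable and a repetition count. The workload is hypothetical: 1000 pairs spread over 100 keys, so exactly 100 of the 1000 lookups miss (10%, right at the rule of thumb above). Because the one-liner above stores `None` (list.append returns None), `setdefault` is timed as the third version instead. Results depend on machine, Python version and miss rate, so no numbers are shown.

```python
import timeit

# Hypothetical workload: 1000 (key, value) pairs over 100 distinct keys.
DATA = [(i % 100, i) for i in range(1000)]

def with_try(data):
    d = {}
    for k, v in data:
        try:
            d[k].append(v)
        except KeyError:
            d[k] = [v]
    return d

def with_in(data):
    d = {}
    for k, v in data:
        if k in d:
            d[k].append(v)
        else:
            d[k] = [v]
    return d

def with_setdefault(data):
    d = {}
    for k, v in data:
        d.setdefault(k, []).append(v)
    return d

for func in (with_try, with_in, with_setdefault):
    # timeit.timeit runs the callable `number` times and returns total seconds.
    secs = timeit.timeit(lambda: func(DATA), number=200)
    print(f"{func.__name__:16} {secs:.4f} s")
```

Changing the `% 100` to a larger modulus raises the miss rate, which is the knob the rule of thumb is about.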

Terry J. Reedy
 
Gerrit Holl

Skip said:
John> You want the dict.setdefault method.

d.setdefault() never made any sense to me (IOW, to use it I always had to
look it up).

I have had the wrong idea about setdefault for a very long time. To me, it
sounds like: "set this value as the default value of the dict, so that after
this call every non-existing key results in this value".
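Gerrit's reading is a common one, so here is a small sketch of the difference: `setdefault` acts on a single key only, while the dict-wide behaviour he describes is roughly what `collections.defaultdict` provides (a later addition to the standard library, in Python 2.5).

```python
from collections import defaultdict

d = {}
d.setdefault("a", [])   # affects key "a" only
print("b" in d)         # False: no dict-wide default was installed
try:
    d["b"]
except KeyError:
    print("d['b'] still raises KeyError")

# "A default for every missing key" is what defaultdict does.
dd = defaultdict(list)
dd["b"].append(1)       # missing key: list() is called to create the value
print(dict(dd))         # {'b': [1]}
```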

Gerrit.
 
Dave Benjamin

Quentin Crain said:
Which is preferred and why?

Neither. For me, both are superseded by

    d={}
    for (k,v) in somedata():
        d[k] = d.get(k, []).append(v)

Not quite... append returns None, so you'll need to write that as two
separate statements, i.e.:

d[k] = d.get(k, [])
d[k].append(v)

Or, just:

d.setdefault(k, []).append(v)
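A small sketch showing the bug Dave points out and both of his fixes; the key and value are arbitrary examples:

```python
k, v = "a", 1

# Buggy one-liner: list.append returns None, so None is what gets stored.
buggy = {}
buggy[k] = buggy.get(k, []).append(v)
print(buggy)  # {'a': None}

# Two-statement fix: store the list first, then append to it.
fixed = {}
fixed[k] = fixed.get(k, [])
fixed[k].append(v)
print(fixed)  # {'a': [1]}

# setdefault version: one call that both inserts and returns the list.
sd = {}
sd.setdefault(k, []).append(v)
print(sd)  # {'a': [1]}
```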
 
Peter Otten

Terry said:
Neither. For me, both are superseded by

    d={}
    for (k,v) in somedata():
        d[k] = d.get(k, []).append(v)

This would have to look up the key twice, so it has no advantage over

    if k in d:
        d[k].append(v)
    else:
        d[k] = [v]

Anyway, list.append() always returns None, so it does not work.
I think you mean

    d = {}
    for (k, v) in somedata():
        d.setdefault(k, []).append(v)

There is a small overhead for throwing away a new list object if the key is
already in the dictionary, but I don't really care.
(Or is the compiler smart enough to reuse the empty list?)
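On the closing question: Python evaluates call arguments before making the call, and each evaluation of `[]` builds a new list, so `setdefault` receives a fresh default on every iteration even when it ends up discarded. The sketch below makes this visible with a hypothetical counting factory, `new_list()`, in place of the literal `[]`:

```python
calls = 0

def new_list():
    # Hypothetical factory that counts how often a default is built.
    global calls
    calls += 1
    return []

d = {}
for k, v in [("a", 1), ("a", 2), ("a", 3)]:
    # new_list() runs before setdefault is called, on every iteration.
    d.setdefault(k, new_list()).append(v)

print(d)      # {'a': [1, 2, 3]}
print(calls)  # 3: a default was built each time, though only the first was kept
```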

Peter
 
Alex Martelli

Skip Montanaro wrote:
...
% timeit.py -s 'd={}' 'x = d.setdefault("x", [])'
1000000 loops, best of 3: 1.82 usec per loop
% timeit.py -s 'd={}' 'x = d.get("x") or [] ; d["x"] = x'
100000 loops, best of 3: 2.34 usec per loop

But my way isn't bad enough for me to change. ;-)

Actually, you can still do a bit better w/o using setdefault:

[alex@lancelot pop]$ timeit.py -s'd={}' 'x=d.setdefault("x",[])'
1000000 loops, best of 3: 0.925 usec per loop
[alex@lancelot pop]$ timeit.py -s'd={}' 'x=d.get("x") or []; d["x"]=x'
1000000 loops, best of 3: 1.21 usec per loop
[alex@lancelot pop]$ timeit.py -s'd={}' 'x=d.get("x",[]); d["x"]=x'
1000000 loops, best of 3: 1.13 usec per loop

as d.get takes a second optional argument, you can still save the 'or'.
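Alex's `d.get(key, [])` variant, written out as a full grouping loop with hypothetical `some_items` data:

```python
some_items = [("a", 1), ("b", 2), ("a", 3)]  # hypothetical sample data

d = {}
for key, val in some_items:
    # get's optional second argument replaces the "or []" trick.
    lst = d.get(key, [])
    lst.append(val)
    d[key] = lst
print(d)  # {'a': [1, 3], 'b': [2]}
```

Compared with `d.get(key) or []`, passing the default to `get` also does not depend on the stored value being truthy.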


Alex
 
Terry Reedy

Terry Reedy said:
Read Library Reference 2.2.7 Mapping Types to learn current dict
methods.

Seeing the other responses, I see I need to do the same and read about
setdefault() ;-)
 
