unintuitive dict timings

B

Bob van der Poel

There was a thread a few days ago about populating a dict/list. I've got
a bit of bottleneck in my program so grabbed the ideas and did a timing
test....

----------

import time
import random

# Test of various methods to add an item to a list in a dict.

def ifelse(key, val):
if d.has_key(key):
d[key].append(val)
else:
d[key]=[val]

def get1(key,val):
l=d.get(key) or []
l.append(val)
d[key]=l

def get2(key,val):
l=d.get(key, [])
l.append(val)
d[key]=l

for p in (ifelse, get1, get2):
tm=time.clock()
d={}
kc=1000
print "Function: %s Keycount %s" % (p, kc)
for t in range(100000):
key=random.randrange(kc)
val=random.randrange(999)
p(key,val)

print "Time", time.clock()-tm

------------

Before running, my thoughts were that get2() would have been the fastest
(due to the fact there were less statements) and ifelse() would have
been the slowest. Surprisingly (to me) ifelse() is the fastest() and
get2() is the slowest.

Which begs the question...is there a faster way? BTW, interesting (again
to me) that the most readable is also the fastest.
 
L

Ludovico Magnocavallo

Bob said:
# Test of various methods to add an item to a list in a dict.

def ifelse(key, val):
if d.has_key(key):
d[key].append(val)
else:
d[key]=[val]

You could also add:

def ifelse2(key, val):
if key in d:
d[key].append(val)
else:
d[key]=[val]

which on my machine is faster than either get1 or get2, and slightly
slower than ifelse.

Ludo
 
A

Alex Martelli

Bob van der Poel wrote:
...
Before running, my thoughts were that get2() would have been the fastest
(due to the fact there were less statements) and ifelse() would have
been the slowest. Surprisingly (to me) ifelse() is the fastest() and
get2() is the slowest.

Not necessarily -- pulling out the generation of keys and values to ensure
every function is tested on the same ones:

kc=1000
keys_and_values = [
(random.randrange(kc), random.randrange(999))
for t in xrange(100000)
]
for p in (get2, get1, ifelse):
tm=time.clock()
d={}
print "Function: %s Keycount %s" % (p, kc)
for key, val in keys_and_values:
p(key,val)

print "Time", time.clock()-tm

the numbers are a bit too close to call given the precision of measurement:

[alex@lancelot pop]$ python ti.py
Function: <function get2 at 0x402dced4> Keycount 1000
Time 0.43
Function: <function get1 at 0x402dce9c> Keycount 1000
Time 0.44
Function: <function ifelse at 0x402dce64> Keycount 1000
Time 0.41

Which begs the question...is there a faster way? BTW, interesting (again
to me) that the most readable is also the fastest.

def sede(key, val):
d.setdefault(key, []).append(val)

appears to be very marginally fastest, but again it's a close call:

[alex@lancelot pop]$ python ti.py
Function: <function sede at 0x402dcf44> Keycount 1000
Time 0.37
Function: <function get2 at 0x402dcf0c> Keycount 1000
Time 0.45
Function: <function get1 at 0x402dced4> Keycount 1000
Time 0.44
Function: <function ifelse at 0x402dce9c> Keycount 1000
Time 0.4

I think you're probably best-advised to stay with what you deem most
readable: a perhaps-10%-win can hardly justify losing readability. As
for me, I find setdefault quite readable despite the ugly name. I sure
wish we DID have a way to tell a method to be lazy wrt one of its
arguments, tho;-).


Alex
 
C

Charl P. Botha

been the slowest. Surprisingly (to me) ifelse() is the fastest() and
get2() is the slowest.

Which begs the question...is there a faster way? BTW, interesting (again

<pedantic hat on>
Please don't use "begging the question" when you actually mean "raising the
question". See here for the *correct* meaning and use of "begging the
question": http://skepdic.com/begging.html
<pedantic hat off>
 
B

Bob van der Poel

Alex said:
def sede(key, val):
d.setdefault(key, []).append(val)

appears to be very marginally fastest, but again it's a close call:

Ahhh, an example of setdefault(). I looked at this in the docs but it
really didn't make much sense.... I'll give this a try just to see what
difference it makes.

Thanks.
 
D

Dave Benjamin

Alex said:
def sede(key, val):
d.setdefault(key, []).append(val)

appears to be very marginally fastest, but again it's a close call:

Ahhh, an example of setdefault(). I looked at this in the docs but it
really didn't make much sense.... I'll give this a try just to see what
difference it makes.

Another solution would be to subclass dict to make a dictionary that
automatically creates lists as needed. For example:

class dict_of_lists(dict):
def __getitem__(self, key):
try:
return dict.__getitem__(self, key)
except KeyError:
self[key] = []
return self[key]

dol = dict_of_lists()
dol['a'].append(1)
dol['b'].append(2)
dol['c'].append(3)
dol['c'].append(4)
dol['c'].append(5)
print dol

Output: {'a': [1], 'c': [3, 4, 5], 'b': [2]}
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,769
Messages
2,569,580
Members
45,054
Latest member
TrimKetoBoost

Latest Threads

Top