Patch : doct.merge

N

Nicolas Lehuen

Hi,

I've posted this patch on Source forge :

http://sourceforge.net/tracker/index.php?func=detail&aid=1391204&group_id=5470&atid=305470

If you want to update a dictionary with another one, you can simply use
update :

a = dict(a=1,c=3)
b = dict(a=0,b=2)
a.update(b)
assert a == dict(a=0,b=2,c=3)

However, sometimes you want to merge the second dict into the first,
all while keeping the values that are already defined in the first.
This is useful if you want to insert default values in the dictionary
without overriding what is already defined.

Currently this can be done in a few different ways, but all are awkward
and/or inefficient :

a = dict(a=1,c=3)
b = dict(a=0,b=2)

Method 1:
for k in b:
if k not in a:
a[k] = b[k]

Method 2:
temp = dict(b)
temp.update(a)
a = temp

This patch adds a merge() method to the dict object, with the same
signature and usage as the update() method. Under the hood, it simply
uses PyDict_Merge() with the override parameter set to 0 instead of 1.
There's nothing new, therefore : the C API already provides this
functionality (though it is not used in the dictobject.c scope), so why
not expose it ? The result is :

a = dict(a=1,c=3)
b = dict(a=0,b=2)
a.merge(b)
assert a == dict(a=1,b=2,c=3)

Does this seem a good idea to you guys ?

Regards,
Nicolas
 
N

Nicolas Lehuen

Here's method 3 :

# Python 2.3 (no generator expression)
a.update([(k,v) for k,v in b.iteritems() if k not in a])

# Python 2.4 (with generator expression)
a.update((k,v) for k,v in b.iteritems() if k not in a)

It's a bit cleaner but still less efficient than using what's already
in the PyDict_Merge C API. It's even less efficient than method 1 and 2
! Here is the benchmark I used :

import timeit

init = '''a = dict((i,i) for i in xrange(1000) if i%2==0); b =
dict((i,i+1) for i in xrange(1000))'''

t = timeit.Timer('''for k in b:\n\tif k not in a:\n\t\ta[k] =
b[k]''',init)
print 'Method 1 : %.3f'%t.timeit(10000)

t = timeit.Timer('''temp = dict(b); temp.update(a); a = temp''',init)
print 'Method 2 : %.3f'%t.timeit(10000)

t = timeit.Timer('''a.update((k,v) for k,v in b.iteritems() if k not in
a)''',init)
print 'Method 3 : %.3f'%t.timeit(10000)

t = timeit.Timer('''a.merge(b)''',init)
print 'Using dict.merge() : %.3f'%t.timeit(10000)

Here are the results :

Method 1 : 5.315
Method 2 : 3.855
Method 3 : 7.815
Using dict.merge() : 1.425

So using generator expressions is a bad idea, and using the new
dict.merge() method gives an appreciable performance boost (~ x 3.73
here).

Regards,
Nicolas
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,769
Messages
2,569,578
Members
45,052
Latest member
LucyCarper

Latest Threads

Top