What's the cleanest way to compare 2 dictionaries?

John Henry · Aug 9, 2006

Hi list,

I am sure there are many ways of doing the comparison, but I'd like to see
what you would do if you have 2 dictionaries (containing lots of
data - like 20000 keys, with each key holding a dozen or so records)
and you want to build a list of the differences between them.

I'd like to end up with 3 lists: what's in A and not in B, what's in B
and not in A, and of course, what's in both A and B.

What do you think is the cleanest way to do it? (I am sure you will
come up with ways that astonish me :=) )

Thanks,

Paddy · Aug 9, 2006

I make it 4 bins:
a_exclusive_keys
b_exclusive_keys
common_keys_equal_values
common_keys_diff_values

Something like:

a = {1: 1, 2: 2, 3: 3, 4: 4}
b = {2: 2, 3: -3, 5: 5}
keya = set(a.keys())
keyb = set(b.keys())
a_xclusive = keya - keyb
b_xclusive = keyb - keya
_common = keya & keyb
common_eq = set(k for k in _common if a[k] == b[k])
common_neq = _common - common_eq

If you know simple set arithmetic, it should read OK.

- Paddy.
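
A minimal sketch of the four bins wrapped in a function, written for present-day Python 3 rather than the 2.3/2.4 discussed in this thread. The name diff_keys is made up for illustration; the logic is the same set arithmetic as above, using the fact that Python 3 dict key views support - and & directly.

```python
def diff_keys(a, b):
    """Split the keys of dicts a and b into the four bins described above."""
    a_only = a.keys() - b.keys()   # keys only in a
    b_only = b.keys() - a.keys()   # keys only in b
    common = a.keys() & b.keys()   # keys present in both
    common_eq = {k for k in common if a[k] == b[k]}   # same key, equal values
    common_neq = common - common_eq                   # same key, different values
    return a_only, b_only, common_eq, common_neq


a = {1: 1, 2: 2, 3: 3, 4: 4}
b = {2: 2, 3: -3, 5: 5}
a_only, b_only, common_eq, common_neq = diff_keys(a, b)
print(sorted(a_only), sorted(b_only), sorted(common_eq), sorted(common_neq))
# prints: [1, 4] [5] [2] [3]
```

Each bin comes back as a set, so wrap it in list() or sorted() if a list is what you need.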

John Henry · Aug 9, 2006

Thanks, that's very clean. Gives me a good reason to move up to Python
2.4.

John Henry · Aug 9, 2006

Oh, wait, works in 2.3 too.

Just have to:

from sets import Set as set

John Machin · Aug 9, 2006

Paddy has already pointed out a necessary addition to your requirement
definition: common keys with different values.

Here's another possible addition: you say that "each key contains a
dozen or so of records". I presume that you mean like this:

a = {1: ['rec1a', 'rec1b'], 42: ['rec42a', 'rec42b']}  # "dozen" -> 2 to save typing

Now what happens if the other dictionary contains:

b = {1: ['rec1a', 'rec1b'], 42: ['rec42b', 'rec42a']}

Key 42 would be marked as different by Paddy's classification, but the
values are the same, just not in the same order. How do you want to
treat that? avalue == bvalue? sorted(avalue) == sorted(bvalue)? Oh, and
are you sure the buckets don't contain duplicates? Maybe you need
set(avalue) == set(bvalue). What about 'rec1a' vs 'Rec1a' vs 'REC1A'?

All comparisons are equal, but some comparisons are more equal than
others

Cheers,
John
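
To make the point concrete, here is a small sketch (Python 3 syntax, using the sample values from this post plus a made-up duplicate case) showing that the candidate definitions of "equal" really do give different answers. The helper name fold is hypothetical.

```python
avalue = ['rec42a', 'rec42b']
bvalue = ['rec42b', 'rec42a']

ordered_equal = avalue == bvalue                  # order matters
sorted_equal = sorted(avalue) == sorted(bvalue)   # order ignored, duplicates still count
set_equal = set(avalue) == set(bvalue)            # order and duplicates both ignored

# Duplicates are what separate the sorted() and set() definitions.
dup_sorted_equal = sorted(['x', 'x', 'y']) == sorted(['x', 'y'])
dup_set_equal = set(['x', 'x', 'y']) == set(['x', 'y'])


def fold(records):
    """Hypothetical helper: ignore both case and order."""
    return sorted(r.lower() for r in records)


case_equal = fold(['rec1a']) == fold(['REC1A'])

print(ordered_equal, sorted_equal, set_equal)   # prints: False True True
print(dup_sorted_equal, dup_set_equal)          # prints: False True
print(case_equal)                               # prints: True
```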

John Henry · Aug 10, 2006

John,

Yes, there are several scenarios.

a) Comparing keys only.

That's been answered (although I haven't gotten it to work under 2.3
yet)

b) Comparing records.

Now it gets more fun - as you pointed out. I was assuming that there
is no shortcut here. If the key exists in both sets, and if I wish to
know if the records are the same, I would have to do a record-by-record
comparison. However, since there are only a handful of records per
key, this wouldn't be so bad. Maybe I just overload the compare
operator or something.

Paddy · Aug 10, 2006

Hi Johns,
The following is my attempt to give more/deeper comparison info.
Assume you have your data parsed and presented as two dicts a and b
each having as values a dict representing a record.
Further assume you have a function that can compute if two record level
dicts are the same and another function that can compute if two values
in a record level dict are the same.

With a slight modification of my earlier prog we get:

def komparator(a, b, check_equal):
    keya = set(a.keys())
    keyb = set(b.keys())
    a_xclusive = keya - keyb
    b_xclusive = keyb - keya
    _common = keya & keyb
    common_eq = set(k for k in _common if check_equal(a[k], b[k]))
    common_neq = _common - common_eq
    return (a_xclusive, b_xclusive, common_eq, common_neq)

a_xclusive, b_xclusive, common_eq, common_neq = komparator(
    a, b, record_dict__equality_checker)

common_neq = [(key, komparator(a[key], b[key], value__equality_checker))
              for key in common_neq]

Now we get extra info on intra record differences with little extra
code.

Look out though, you could get swamped with data

- Paddy.
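
The two equality checkers are left undefined in the post above. The sketch below fills them in with one possible choice, purely as an assumption: values compare equal ignoring case and surrounding whitespace, and records compare equal when they have the same fields and every field value matches. The names value_equal and record_equal, the field names, and the sample data are all made up for illustration; it is written for Python 3.

```python
def komparator(a, b, check_equal):
    """Paddy's four-bin comparison with a pluggable equality test."""
    keya = set(a)
    keyb = set(b)
    a_xclusive = keya - keyb
    b_xclusive = keyb - keya
    _common = keya & keyb
    common_eq = set(k for k in _common if check_equal(a[k], b[k]))
    common_neq = _common - common_eq
    return (a_xclusive, b_xclusive, common_eq, common_neq)


def value_equal(x, y):
    # Assumed rule: ignore case and surrounding whitespace.
    return x.strip().lower() == y.strip().lower()


def record_equal(rec_a, rec_b):
    # Assumed rule: same field names, and every field value matches.
    return (rec_a.keys() == rec_b.keys()
            and all(value_equal(rec_a[f], rec_b[f]) for f in rec_a))


a = {1: {'name': 'Ann', 'city': 'Oslo'},
     2: {'name': 'Bob', 'city': 'Rome'}}
b = {1: {'name': 'ann ', 'city': 'OSLO'},
     2: {'name': 'Bob', 'city': 'Paris'},
     3: {'name': 'Cy', 'city': 'Lima'}}

a_x, b_x, common_eq, common_neq = komparator(a, b, record_equal)
# Drill into the records that differ, field by field.
details = [(key, komparator(a[key], b[key], value_equal))
           for key in common_neq]
print(details)
# prints: [(2, (set(), set(), {'name'}, {'city'}))]
```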

John Machin · Aug 10, 2006

John Henry said:
John,

Yes, there are several scenarios.

a) Comparing keys only.

That's been answered (although I haven't gotten it to work under 2.3
yet)

(1) What's the problem with getting it to work under 2.3?
(2) Why not upgrade?

John Henry said:
b) Comparing records.

You haven't got that far yet. The next problem is actually comparing
two *collections* of records, and you need to decide whether for
equality purposes the collections should be treated as an unordered
list, an ordered list, a set, or something else. Then you need to
consider how equality of records is to be defined e.g. case sensitive
or not.

John Henry said:
Now it gets more fun - as you pointed out. I was assuming that there
is no shortcut here. If the key exists in both sets, and if I wish to
know if the records are the same, I would have to do a record-by-record
comparison. However, since there are only a handful of records per
key, this wouldn't be so bad. Maybe I just overload the compare
operator or something.

IMHO, "something" would be better than "overload the compare operator".
In any case, you need to DEFINE what you mean by equality of a
collection of records, *then* implement it.

"Only a handful": naturally, 0 and 1 are special, but otherwise the
number of records in the bag shouldn't really be a factor in your
implementation.

HTH,
John
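
As a sketch of "define it, then implement it", here is one possible definition written down as a function: two collections are equal if they hold the same records the same number of times, in any order (a bag, or multiset). The function name bags_equal and its key parameter are made up for illustration, it assumes the records are hashable, and it uses collections.Counter, which is only available in Python 2.7 and 3.x, not in the versions discussed in this thread.

```python
from collections import Counter


def bags_equal(records_a, records_b, key=lambda r: r):
    """True if both collections hold the same records the same number of times.

    key maps each record to the form used for comparison, e.g. str.lower
    for a case-insensitive match. Records (after key) must be hashable.
    """
    return (Counter(key(r) for r in records_a)
            == Counter(key(r) for r in records_b))


print(bags_equal(['rec42a', 'rec42b'], ['rec42b', 'rec42a']))  # prints: True
print(bags_equal(['x', 'x', 'y'], ['x', 'y']))                 # prints: False
print(bags_equal(['rec1a'], ['REC1A'], key=str.lower))         # prints: True
print(bags_equal([], []))                                      # prints: True
```

The number of records per key never appears in the code, which is the point made above: the definition of equality drives the implementation, not the size of the bag.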

John Henry · Aug 11, 2006

John Machin said:
(1) What's the problem with getting it to work under 2.3?
(2) Why not upgrade?

Let me comment on this part first, I am still chewing other parts of
your message.

When I do it under 2.3, I get:

common_eq = set(k for k in _common if a[k] == b[k])
^
SyntaxError: invalid syntax

Don't know why that is.

I can't upgrade yet. Some part of my code doesn't compile under 2.4
and I haven't had a chance to investigate further.

Marc 'BlackJack' Rintsch · Aug 11, 2006

John Henry said:
When I do it under 2.3, I get:

common_eq = set(k for k in _common if a[k] == b[k])
^
SyntaxError: invalid syntax

Don't know why that is.

There are no generator expressions in 2.3. Turn it into a list
comprehension::

common_eq = set([k for k in _common if a[k] == b[k]])

Ciao,
Marc 'BlackJack' Rintsch
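
For comparison, a short sketch of the three spellings side by side, using the sample dictionaries from earlier in the thread. The generator expression needs Python 2.4 or later, the list comprehension is the 2.3-compatible form suggested above, and the set comprehension (not mentioned in the thread) needs Python 2.7 or 3.x. All three build the same set.

```python
a = {1: 1, 2: 2, 3: 3, 4: 4}
b = {2: 2, 3: -3, 5: 5}
_common = set(a) & set(b)

# Generator expression: a SyntaxError on Python 2.3.
from_genexp = set(k for k in _common if a[k] == b[k])

# List comprehension: builds a temporary list first, works on 2.3 as well.
from_listcomp = set([k for k in _common if a[k] == b[k]])

# Set comprehension: Python 2.7 and 3.x only.
from_setcomp = {k for k in _common if a[k] == b[k]}

print(from_genexp == from_listcomp == from_setcomp)  # prints: True
```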

John Henry · Aug 11, 2006

Thank you. That works.

Paddy · Aug 11, 2006

I have gone the whole hog and got something that's runnable:

========dict_diff.py=============================

from pprint import pprint as pp

a = {1: {'1': '1'}, 2: {'2': '2'}, 3: dict("AA BB CC".split()), 4: {'4': '4'}}
b = {2: {'2': '2'}, 3: dict("BB CD EE".split()), 5: {'5': '5'}}

def record_comparator(a, b, check_equal):
    # Split the keys of a and b into four bins using set arithmetic.
    keya = set(a.keys())
    keyb = set(b.keys())
    a_xclusive = keya - keyb
    b_xclusive = keyb - keya
    _common = keya & keyb
    common_eq = set(k for k in _common if check_equal(a[k], b[k]))
    common_neq = _common - common_eq
    return {"A excl keys": a_xclusive, "B excl keys": b_xclusive,
            "Common & eq": common_eq, "Common keys neq values": common_neq}

comp_result = record_comparator(a, b, dict.__eq__)

# Further data on common keys, neq values
common_neq = comp_result["Common keys neq values"]
common_neq = [(key, record_comparator(a[key], b[key], str.__eq__))
              for key in common_neq]
comp_result["Common keys neq values"] = common_neq

print "\na =",; pp(a)
print "\nb =",; pp(b)
print "\ncomp_result = " ; pp(comp_result)

==========================================

When run it gives:

a ={1: {'1': '1'},
 2: {'2': '2'},
 3: {'A': 'A', 'C': 'C', 'B': 'B'},
 4: {'4': '4'}}

b ={2: {'2': '2'}, 3: {'C': 'D', 'B': 'B', 'E': 'E'}, 5: {'5': '5'}}

comp_result =
{'A excl keys': set([1, 4]),
 'B excl keys': set([5]),
 'Common & eq': set([2]),
 'Common keys neq values': [(3,
                             {'A excl keys': set(['A']),
                              'B excl keys': set(['E']),
                              'Common & eq': set(['B']),
                              'Common keys neq values': set(['C'])})]}

- Paddy.
