Antoon said:
class Tree:
def __lt__(self, term):
return set(self.iteritems()) < set(term.iteritems())
def __eq__(self, term):
return set(self.iteritems()) == set(term.iteritems())
Would this be a correct definition of the desired behaviour?
No.
In [1]: {1:2} < {3:4}
Out[1]: True
In [2]: set({1:2}.iteritems()) < set({3:4}.iteritems())
Out[2]: False
The function dict_compare in dictobject.c .
Well there's a really helpful answer. I'm intrigued, Robert - since you
know the real answer to this question, why did you choose to tell the
Antoon that he was wrong, not tell him in what way he was wrong, certainly
not tell him how to be right, but just tell him to read the source, rather
than simply telling him what you knew? Still, at least you told him which
file to look in. And if he knows python but not C, or gets lost in the
byzantine workings of the interpreter, well, that's his own fault, i
guess.
So, Antoon, firstly, your implementation of __eq__ is, i believe, correct.
Your implementation of __lt__ is, sadly, not. While sets take "<" to mean
"is a proper subset of", for dicts, it's a more conventional comparison
operation, which constitutes a total ordering over all dicts (so you can
sort with it, for example). However, since dicts don't really have a
natural total ordering, it is ever so slightly arbitrary.
The rules for ordering on dicts are, AFAICT:
- If one dict has fewer elements than the other, it's the lesser
- If not, find the smallest key for which the two dicts have different
values (counting 'not present' as a value)
-- If there is no such key, the dicts are equal
-- If the key is present in one dict but not the other, the dict in which
it is present is the lesser
-- Otherwise, the dict in which the value is lesser is itself the lesser
In code:
def dict_cmp(a, b):
diff = cmp(len(a), len(b))
if (diff != 0):
return diff
for key in sorted(set(a.keys() + b.keys())):
if (key not in a):
return 1
if (key not in b):
return -1
diff = cmp(a[key], b[key])
if (diff != 0):
return diff
return 0
I assume your tree has its items sorted by key value; that means there's
an efficient implementation of this using lockstep iteration over the two
trees being compared.
Another way of looking at it is in terms of list comparisons: comparing
two dicts is the same as comparing the sorted list of keys in each dict,
breaking ties by looking at the list of values, in order of their keys.
There's a quirk, in that a shorter dict is always less than a longer dict,
regardless of the elements.
In code:
def dict_cmp_alternative(a, b):
diff = cmp(len(a), len(b))
if (diff != 0):
return diff
ka = sorted(a.keys())
kb = sorted(b.keys())
diff = cmp(ka, kb)
if (diff != 0):
return diff
va = [a[k] for k in ka]
vb = [b[k] for k in kb]
return cmp(va, vb)
Hope this helps.
tom
PS In case it's of any use to you, here's the code i used to test these:
import random
def rnd(n):
return random.randint(0, n)
def randomdict(maxlen=20, range=100):
return dict((rnd(range), rnd(range)) for x in xrange(rnd(maxlen)))
def test(cmp2, n=1000):
for i in xrange(n):
a = randomdict()
b = randomdict()
if ((cmp(a, b)) != (cmp2(a, b))):
raise AssertionError, (a, b)