dictionary interface

A

Antoon Pardon

I'm writing a Tree class, which should behave a lot like a dictionary.

In order to test this, I took the unittest from the source distribution
for dictionaries and used it to test against my Tree class.

Things are working out rather well, but I stumbled on a problem.

this unittest tries to test for '==' and '<' operators. However I
couldn't find anything in the documentation that defined how
dictionaries should behave with respect to these operators.

For the moment the best I can come up with is something like
the following:

class Tree:

def __lt__(self, term):
return set(self.iteritems()) < set(term.iteritems())

def __eq__(self, term):
return set(self.iteritems()) == set(term.iteritems())

Would this be a correct definition of the desired behaviour?

Anyone a reference?
 
R

Robert Kern

Antoon said:
I'm writing a Tree class, which should behave a lot like a dictionary.

In order to test this, I took the unittest from the source distribution
for dictionaries and used it to test against my Tree class.

Things are working out rather well, but I stumbled on a problem.

this unittest tries to test for '==' and '<' operators. However I
couldn't find anything in the documentation that defined how
dictionaries should behave with respect to these operators.

For the moment the best I can come up with is something like
the following:

class Tree:

def __lt__(self, term):
return set(self.iteritems()) < set(term.iteritems())

def __eq__(self, term):
return set(self.iteritems()) == set(term.iteritems())

Would this be a correct definition of the desired behaviour?

No.

In [1]: {1:2} < {3:4}
Out[1]: True

In [2]: set({1:2}.iteritems()) < set({3:4}.iteritems())
Out[2]: False
Anyone a reference?

The function dict_compare in dictobject.c .

--
Robert Kern
(e-mail address removed)

"In the fields of hell where the grass grows high
Are the graves of dreams allowed to die."
-- Richard Harter
 
T

Tom Anderson

Antoon said:
class Tree:

def __lt__(self, term):
return set(self.iteritems()) < set(term.iteritems())

def __eq__(self, term):
return set(self.iteritems()) == set(term.iteritems())

Would this be a correct definition of the desired behaviour?

No.

In [1]: {1:2} < {3:4}
Out[1]: True

In [2]: set({1:2}.iteritems()) < set({3:4}.iteritems())
Out[2]: False
Anyone a reference?

The function dict_compare in dictobject.c .

Well there's a really helpful answer. I'm intrigued, Robert - since you
know the real answer to this question, why did you choose to tell the
Antoon that he was wrong, not tell him in what way he was wrong, certainly
not tell him how to be right, but just tell him to read the source, rather
than simply telling him what you knew? Still, at least you told him which
file to look in. And if he knows python but not C, or gets lost in the
byzantine workings of the interpreter, well, that's his own fault, i
guess.

So, Antoon, firstly, your implementation of __eq__ is, i believe, correct.

Your implementation of __lt__ is, sadly, not. While sets take "<" to mean
"is a proper subset of", for dicts, it's a more conventional comparison
operation, which constitutes a total ordering over all dicts (so you can
sort with it, for example). However, since dicts don't really have a
natural total ordering, it is ever so slightly arbitrary.

The rules for ordering on dicts are, AFAICT:

- If one dict has fewer elements than the other, it's the lesser
- If not, find the smallest key for which the two dicts have different
values (counting 'not present' as a value)
-- If there is no such key, the dicts are equal
-- If the key is present in one dict but not the other, the dict in which
it is present is the lesser
-- Otherwise, the dict in which the value is lesser is itself the lesser

In code:

def dict_cmp(a, b):
diff = cmp(len(a), len(b))
if (diff != 0):
return diff
for key in sorted(set(a.keys() + b.keys())):
if (key not in a):
return 1
if (key not in b):
return -1
diff = cmp(a[key], b[key])
if (diff != 0):
return diff
return 0

I assume your tree has its items sorted by key value; that means there's
an efficient implementation of this using lockstep iteration over the two
trees being compared.

Another way of looking at it is in terms of list comparisons: comparing
two dicts is the same as comparing the sorted list of keys in each dict,
breaking ties by looking at the list of values, in order of their keys.
There's a quirk, in that a shorter dict is always less than a longer dict,
regardless of the elements.

In code:

def dict_cmp_alternative(a, b):
diff = cmp(len(a), len(b))
if (diff != 0):
return diff
ka = sorted(a.keys())
kb = sorted(b.keys())
diff = cmp(ka, kb)
if (diff != 0):
return diff
va = [a[k] for k in ka]
vb = [b[k] for k in kb]
return cmp(va, vb)

Hope this helps.

tom

PS In case it's of any use to you, here's the code i used to test these:

import random

def rnd(n):
return random.randint(0, n)

def randomdict(maxlen=20, range=100):
return dict((rnd(range), rnd(range)) for x in xrange(rnd(maxlen)))

def test(cmp2, n=1000):
for i in xrange(n):
a = randomdict()
b = randomdict()
if ((cmp(a, b)) != (cmp2(a, b))):
raise AssertionError, (a, b)
 
R

Robert Kern

Tom said:
Antoon said:
class Tree:

def __lt__(self, term):
return set(self.iteritems()) < set(term.iteritems())

def __eq__(self, term):
return set(self.iteritems()) == set(term.iteritems())

Would this be a correct definition of the desired behaviour?

No.

In [1]: {1:2} < {3:4}
Out[1]: True

In [2]: set({1:2}.iteritems()) < set({3:4}.iteritems())
Out[2]: False
Anyone a reference?

The function dict_compare in dictobject.c .

Well there's a really helpful answer.

Well, *I* thought it was. Maybe not "really" helpful, but certainly a
healthy start.
I'm intrigued, Robert - since you
know the real answer to this question, why did you choose to tell the
Antoon that he was wrong, not tell him in what way he was wrong, certainly
not tell him how to be right, but just tell him to read the source, rather
than simply telling him what you knew?

Because I *didn't* know. I *still* don't know. I just know that the
implementation of __lt__ was wrong as I demonstrated by applying the
given algorithm to real dictionaries. I *do* know where to find that
information: the source. So I told him absolutely everything that I knew
on the subject. I couldn't tell him anything more except by trudging
through the details of the source myself, but I'm not particularly
interested in learning those details myself, so I didn't bother.

What do you want? Personalized Python tutorials delivered by candygram?
A detailed comparison of the various partial ordering schemes that could
have been used? My first born son?
Still, at least you told him which
file to look in.

Yes, I figured it was the polite, helpful thing to do. Apparently, I
shouldn't have bothered.
And if he knows python but not C, or gets lost in the
byzantine workings of the interpreter, well, that's his own fault, i
guess.

It's certainly not mine.

--
Robert Kern
(e-mail address removed)

"In the fields of hell where the grass grows high
Are the graves of dreams allowed to die."
-- Richard Harter
 
A

Antoon Pardon

Op 2005-10-05 said:
Antoon said:
class Tree:

def __lt__(self, term):
return set(self.iteritems()) < set(term.iteritems())

def __eq__(self, term):
return set(self.iteritems()) == set(term.iteritems())

Would this be a correct definition of the desired behaviour?

No.

In [1]: {1:2} < {3:4}
Out[1]: True

In [2]: set({1:2}.iteritems()) < set({3:4}.iteritems())
Out[2]: False
Anyone a reference?

The function dict_compare in dictobject.c .

Well there's a really helpful answer. I'm intrigued, Robert - since you
know the real answer to this question, why did you choose to tell the
Antoon that he was wrong, not tell him in what way he was wrong, certainly
not tell him how to be right, but just tell him to read the source, rather
than simply telling him what you knew? Still, at least you told him which
file to look in. And if he knows python but not C, or gets lost in the
byzantine workings of the interpreter, well, that's his own fault, i
guess.

So, Antoon, firstly, your implementation of __eq__ is, i believe, correct.

Your implementation of __lt__ is, sadly, not. While sets take "<" to mean
"is a proper subset of", for dicts, it's a more conventional comparison
operation, which constitutes a total ordering over all dicts (so you can
sort with it, for example). However, since dicts don't really have a
natural total ordering, it is ever so slightly arbitrary.

The rules for ordering on dicts are, AFAICT:

- If one dict has fewer elements than the other, it's the lesser
- If not, find the smallest key for which the two dicts have different
values (counting 'not present' as a value)
-- If there is no such key, the dicts are equal
-- If the key is present in one dict but not the other, the dict in which
it is present is the lesser
-- Otherwise, the dict in which the value is lesser is itself the lesser

In code:

def dict_cmp(a, b):
diff = cmp(len(a), len(b))
if (diff != 0):
return diff
for key in sorted(set(a.keys() + b.keys())):
if (key not in a):
return 1
if (key not in b):
return -1
diff = cmp(a[key], b[key])
if (diff != 0):
return diff
return 0

Thanks for the explanation, but you somehow give me too much.

I have been searching some more and finally stumbled on this:

http://docs.python.org/ref/comparisons.html

Mappings (dictionaries) compare equal if and only if their sorted
(key, value) lists compare equal. Outcomes other than equality are
resolved consistently, but are not otherwise defined.

This seems to imply that the specific method to sort the dictionaries
is unimported (as long as it is a total ordering). So I can use whatever
method I want as long as it is achieves this.

But that is contradicted by the unittest. If you have a unittest for
comparing dictionaries, that means comparing dictionaries has a
testable characteristic and thus is further defined.

So I don't need a full implementation of dictionary comparison,
I need to know in how far such a comparison is defined and
what I can choose.
 
P

Paul Rubin

Antoon Pardon said:
But that is contradicted by the unittest. If you have a unittest for
comparing dictionaries, that means comparing dictionaries has a
testable characteristic and thus is further defined.

No, I don't think so. The unittest makes sure that a particular
implementation works as intended. That doesn't mean that every part
of the of how that particular implementation works is required by the
language definition. It can have some non-required (but
non-forbidden) characteristics and those could still get tested.
 
S

Steve Holden

Antoon said:
Op 2005-10-05 said:
Antoon Pardon wrote:


class Tree:

def __lt__(self, term):
return set(self.iteritems()) < set(term.iteritems())

def __eq__(self, term):
return set(self.iteritems()) == set(term.iteritems())

Would this be a correct definition of the desired behaviour?

No.

In [1]: {1:2} < {3:4}
Out[1]: True

In [2]: set({1:2}.iteritems()) < set({3:4}.iteritems())
Out[2]: False


Anyone a reference?

The function dict_compare in dictobject.c .

Well there's a really helpful answer. I'm intrigued, Robert - since you
know the real answer to this question, why did you choose to tell the
Antoon that he was wrong, not tell him in what way he was wrong, certainly
not tell him how to be right, but just tell him to read the source, rather
than simply telling him what you knew? Still, at least you told him which
file to look in. And if he knows python but not C, or gets lost in the
byzantine workings of the interpreter, well, that's his own fault, i
guess.

So, Antoon, firstly, your implementation of __eq__ is, i believe, correct.

Your implementation of __lt__ is, sadly, not. While sets take "<" to mean
"is a proper subset of", for dicts, it's a more conventional comparison
operation, which constitutes a total ordering over all dicts (so you can
sort with it, for example). However, since dicts don't really have a
natural total ordering, it is ever so slightly arbitrary.

The rules for ordering on dicts are, AFAICT:

- If one dict has fewer elements than the other, it's the lesser
- If not, find the smallest key for which the two dicts have different
values (counting 'not present' as a value)
-- If there is no such key, the dicts are equal
-- If the key is present in one dict but not the other, the dict in which
it is present is the lesser
-- Otherwise, the dict in which the value is lesser is itself the lesser

In code:

def dict_cmp(a, b):
diff = cmp(len(a), len(b))
if (diff != 0):
return diff
for key in sorted(set(a.keys() + b.keys())):
if (key not in a):
return 1
if (key not in b):
return -1
diff = cmp(a[key], b[key])
if (diff != 0):
return diff
return 0


Thanks for the explanation, but you somehow give me too much.

I have been searching some more and finally stumbled on this:

http://docs.python.org/ref/comparisons.html

Mappings (dictionaries) compare equal if and only if their sorted
(key, value) lists compare equal. Outcomes other than equality are
resolved consistently, but are not otherwise defined.

This seems to imply that the specific method to sort the dictionaries
is unimported (as long as it is a total ordering). So I can use whatever
method I want as long as it is achieves this.

But that is contradicted by the unittest. If you have a unittest for
comparing dictionaries, that means comparing dictionaries has a
testable characteristic and thus is further defined.

So I don't need a full implementation of dictionary comparison,
I need to know in how far such a comparison is defined and
what I can choose.
The dict unit tests are probably trying to ensure that the dictionary
ordering doesn't change from version to version, which is probably a
good idea in case someone (foolishly?) deciess to rely on it.

I can't help wondering, though, under what conditions it actually makes
sense to compare two dictionaries for anything other than equality.

regards
Steve
 
A

Antoon Pardon

Op 2005-10-05 said:
No, I don't think so. The unittest makes sure that a particular
implementation works as intended. That doesn't mean that every part
of the of how that particular implementation works is required by the
language definition.

As far as I understand, unittest test for functionality clients should
be able to rely on. They shouldn't be used to test a specific
implementation feature.

The idea is that if you change the implementation, you can quickly
test the functionality is unharmed. But you can't do that if
also specific implementation details are tested for.
It can have some non-required (but
non-forbidden) characteristics and those could still get tested.

That doesn't seem to make sense. If it is not required it shouldn't
be tested for, at least not in a unittest, because otherwise a new
implementation that doesn't have the non-required characteristics
will be rejected.

My tree class is almost finished, but one unittest still fails,
is this a failing of my class (as a replacement for a dictionary)
or is this a non-required characteristic of dictionaries?
 
P

Paul Rubin

Steve Holden said:
I can't help wondering, though, under what conditions it actually
makes sense to compare two dictionaries for anything other than
equality.

You might want to sort a bunch of dictionaries to bring the equal ones
together.
 
P

Paul Rubin

Antoon Pardon said:
My tree class is almost finished, but one unittest still fails,
is this a failing of my class (as a replacement for a dictionary)
or is this a non-required characteristic of dictionaries?

If it were me, I'd treat the language reference manual as
authoritative. YMMV.
 
A

Antoon Pardon

Op 2005-10-05 said:
The dict unit tests are probably trying to ensure that the dictionary
ordering doesn't change from version to version, which is probably a
good idea in case someone (foolishly?) deciess to rely on it.

I doubt that. Just to check I tried the following:

class Tree:

def __lt__(self, term):
return len(self) < len(term)

And the test passed.
I can't help wondering, though, under what conditions it actually makes
sense to compare two dictionaries for anything other than equality.

Yes that is part of the problem, because I can't think of such a
condition it is hard to think of what extra constraints could be
usefull here.

Anyway, I have searched the source of the test for all testing
with regards to < and after some browsing back and fore it seems
it all boils down to the following two tests.

self.assert_(not {} < {})
self.assert_(not {1: 2} < {1L: 2L})
 
S

Steve Holden

Antoon said:
Op 2005-10-05, Steve Holden schreef <[email protected]>: [...]

Anyway, I have searched the source of the test for all testing
with regards to < and after some browsing back and fore it seems
it all boils down to the following two tests.

self.assert_(not {} < {})
self.assert_(not {1: 2} < {1L: 2L})

So there isn't much to do, then! That's good. Seems you can pretty much
choose your own ordering.

It would seem sensible to test a third case, namely

self.assert_(not {1L: 2L} < {1: 2})

regards
Steve
 
T

Tom Anderson

Well, *I* thought it was.

And indeed it was. I'm sorry i was so rude - i must have been in a bad
mood. My apologies.
What do you want? Personalized Python tutorials delivered by candygram?

YES DAMMIT! WITH BIG KISS FROM GUIDO!

tom
 
B

Bengt Richter

Op 2005-10-05 said:
Antoon Pardon wrote:

class Tree:

def __lt__(self, term):
return set(self.iteritems()) < set(term.iteritems())

def __eq__(self, term):
return set(self.iteritems()) == set(term.iteritems())

Would this be a correct definition of the desired behaviour?

No.

In [1]: {1:2} < {3:4}
Out[1]: True

In [2]: set({1:2}.iteritems()) < set({3:4}.iteritems())
Out[2]: False

Anyone a reference?

The function dict_compare in dictobject.c .

Well there's a really helpful answer. I'm intrigued, Robert - since you
know the real answer to this question, why did you choose to tell the
Antoon that he was wrong, not tell him in what way he was wrong, certainly
not tell him how to be right, but just tell him to read the source, rather
than simply telling him what you knew? Still, at least you told him which
file to look in. And if he knows python but not C, or gets lost in the
byzantine workings of the interpreter, well, that's his own fault, i
guess.

So, Antoon, firstly, your implementation of __eq__ is, i believe, correct.

Your implementation of __lt__ is, sadly, not. While sets take "<" to mean
"is a proper subset of", for dicts, it's a more conventional comparison
operation, which constitutes a total ordering over all dicts (so you can
sort with it, for example). However, since dicts don't really have a
natural total ordering, it is ever so slightly arbitrary.

The rules for ordering on dicts are, AFAICT:

- If one dict has fewer elements than the other, it's the lesser
- If not, find the smallest key for which the two dicts have different
values (counting 'not present' as a value)
-- If there is no such key, the dicts are equal
-- If the key is present in one dict but not the other, the dict in which
it is present is the lesser
-- Otherwise, the dict in which the value is lesser is itself the lesser

In code:

def dict_cmp(a, b):
diff = cmp(len(a), len(b))
if (diff != 0):
return diff
for key in sorted(set(a.keys() + b.keys())):
if (key not in a):
return 1
if (key not in b):
return -1
diff = cmp(a[key], b[key])
if (diff != 0):
return diff
return 0

Thanks for the explanation, but you somehow give me too much.

I have been searching some more and finally stumbled on this:

http://docs.python.org/ref/comparisons.html

Mappings (dictionaries) compare equal if and only if their sorted
(key, value) lists compare equal. Outcomes other than equality are
resolved consistently, but are not otherwise defined.
"other outcomes" may not in general mean orderings are defined,
even when == and != are well defined. E.g., below
This seems to imply that the specific method to sort the dictionaries
is unimported (as long as it is a total ordering). So I can use whatever
method I want as long as it is achieves this.

But that is contradicted by the unittest. If you have a unittest for
comparing dictionaries, that means comparing dictionaries has a
testable characteristic and thus is further defined.

So I don't need a full implementation of dictionary comparison,
I need to know in how far such a comparison is defined and
what I can choose.
A couple of data points that may be of interest:
Traceback (most recent call last):
File "<stdin>", line 1, in ?
Traceback (most recent call last):
File "<stdin>", line 1, in ?
True

Regards,
Bengt Richter
 
A

Antoon Pardon

Op 2005-10-05 said:
Antoon said:
Op 2005-10-05, Steve Holden schreef <[email protected]>: [...]

Anyway, I have searched the source of the test for all testing
with regards to < and after some browsing back and fore it seems
it all boils down to the following two tests.

self.assert_(not {} < {})
self.assert_(not {1: 2} < {1L: 2L})

So there isn't much to do, then! That's good. Seems you can pretty much
choose your own ordering.

Yes, I must have misunderstood something because I thought there also
was the following test:

self.assert_({} < {1: 2})

Which is what prompted this thread from me.
It would seem sensible to test a third case, namely

self.assert_(not {1L: 2L} < {1: 2})

I also though about adding the following test.

dlst = range(20)
for i in xrange(20):
dlst = some_ramdom_dict()
sort(dlst)
for i in xrange(19):
for j in xrange(i+1,20):
self.assert_(dlst < dlst[j])


This would test for the consistency of the order, so that if a < b
and b < c that we also have a < c.

What do you think?
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,777
Messages
2,569,604
Members
45,218
Latest member
JolieDenha

Latest Threads

Top