dictionary keys, __hash__, __cmp__

  • Thread starter =?ISO-8859-1?Q?Jan-Erik_Meyer-L=FCtgens?=
  • Start date
?

=?ISO-8859-1?Q?Jan-Erik_Meyer-L=FCtgens?=

In the Python Language Reference, I found the following
statements about using objects as dictionary keys:

1. "__hash__() should return a 32-bit integer."

2. "The only required property is that objects which
compare equal have the same hash value."

3. "If a class does not define a __cmp__() method it
should not define a __hash__() operation either."


Can I asume that:

-- it is guaranteed that id(obj) returns a unique 32-bit integer

-- keys are interchangeable (equivalent),
if the following is valid:

hash(key1) == hash(key2) and key1 == key2

-- I can ignore the 2nd statement, if I am aware of
the fact that: if objects are equal it dosn't mean that
they are the same key.

-- I can savely ignore the 3rd statement, because python
falls back to cmp(id(obj1), id(obj2)), if __cmp__()
is not defined.
 
M

Miika Keskinen

In the Python Language Reference, I found the following statements about
using objects as dictionary keys:

1. "__hash__() should return a 32-bit integer."

2. "The only required property is that objects which
compare equal have the same hash value."

3. "If a class does not define a __cmp__() method it
should not define a __hash__() operation either."


Can I asume that:

-- it is guaranteed that id(obj) returns a unique 32-bit integer

Yes it is - id returns current address of object in memory.

**SNIP**
Help on built-in function id:

id(...)
id(object) -> integer

Return the identity of an object. This is guaranteed to be unique
among simultaneously existing objects. (Hint: it's the object's
memory address.)

**SNIP**
-- keys are interchangeable (equivalent),
if the following is valid:

hash(key1) == hash(key2) and key1 == key2

Yes. note that key1 == key2 implies hash(key1) == hash(key2), but if
hash(key1) == hash(key2) there is no guarantee that key1 == key2. If you
do define __hash__() build-in function hash would use it and thus you need
to be sure about quality of your hash-method. However if you do not define
__hash__ hash() will just fall back into id() that is guaranteed to be
unique.

-- I can ignore the 2nd statement, if I am aware of
the fact that: if objects are equal it dosn't mean that they are
the same key.

So you're introducing scenario where different objects are considered
equal in means of __cmp__ while having different hash. I think that's not
normal.

Since if you do not define __cmp__ / __hash__ python will use id and thus
second rule is valid.

If you define cmp but not hash you will most likely get TypeError. If you
define both you should follow second rule as if I'm right most of the
internal data structures will depend on second rule.

-- I can savely ignore the 3rd statement, because python
falls back to cmp(id(obj1), id(obj2)), if __cmp__() is not defined.

Yes. id(obj1) != id(obj2), so obj1 != obj2. Only requirement left is that
__hash__() returns 32 bit integer. Personally i would emphasis word SHOULD
NOT in third rule. I'm sure there is situations where it's perfectly
normal to use id-value's and custom hashes. Anyways you can redefine
__cmp__() simply to (and thus avoiding to break against third rule):

def __cmp__(self, other):
return id(self).__cmp__(id(other))
 
?

=?ISO-8859-1?Q?Jan-Erik_Meyer-L=FCtgens?=

Miika said:
Yes. note that key1 == key2 implies hash(key1) == hash(key2)

.... only if I define __cmp__() and __hash__() appropriate
(and not breaking the 2nd rule)

So you're introducing scenario where different objects are considered
equal in means of __cmp__ while having different hash. I think that's not
normal.
That's the point. I know that this in not normal. But is it possible?
Yes. id(obj1) != id(obj2), so obj1 != obj2. Only requirement left is that
__hash__() returns 32 bit integer. Personally i would emphasis word SHOULD
NOT in third rule. I'm sure there is situations where it's perfectly
normal to use id-value's and custom hashes. Anyways you can redefine
__cmp__() simply to (and thus avoiding to break against third rule):

def __cmp__(self, other):
return id(self).__cmp__(id(other))

But I want __cmp__() for another comparison. As an real world example
I have a file object which should be stored in a dictionary. __cmp__()
compare the contents of two files. Thus I must define the __hash__()
method. I use id(obj) as the hash function.

So I've breaking the rule: Objects which compare equal have the same hash.

I use the file objects as dictionary keys and it seems to work.
My assumption is that keys are interchangeable (equivalent), if
hash(key1) == hash(key2) and key1 == key2. In my example keys are
equivalent, when they are identical.
id(file_object1) == id(file_object2) and file1 have the same contents
as file2.

Is this assumption valid? Or is there some sophistication in the
python implemention, that break things sometimes? Resulting in
deletion of important files :)
 
M

Michael Hudson

Jan-Erik Meyer-Lütgens said:
In the Python Language Reference, I found the following
statements about using objects as dictionary keys:

1. "__hash__() should return a 32-bit integer."

2. "The only required property is that objects which
compare equal have the same hash value."

3. "If a class does not define a __cmp__() method it
should not define a __hash__() operation either."


Can I asume that:

-- it is guaranteed that id(obj) returns a unique 32-bit integer

No. For instance it doesn't on a 64-bit platform (and on a ridiculous
64 bit platform *cough*Win64*cough* it might even return a long).
-- keys are interchangeable (equivalent),
if the following is valid:

hash(key1) == hash(key2) and key1 == key2

Not quite sure what you mean by equivalent, but if that is true
a key inserted under key1 can be retrieved with key2.
-- I can ignore the 2nd statement, if I am aware of
the fact that: if objects are equal it dosn't mean that
they are the same key.

Um, I don't understand you, but I think the answer is "no".

If you ever have a situation where hash(a) != hash(b) but a == b then
you are very much breaking the rules.
-- I can savely ignore the 3rd statement, because python
falls back to cmp(id(obj1), id(obj2)), if __cmp__()
is not defined.

Don't understand.

Cheers,
mwh
 
M

Michael Hudson

Michael Hudson said:
Um, I don't understand you, but I think the answer is "no".

If you ever have a situation where hash(a) != hash(b) but a == b then
you are very much breaking the rules.

Having thought about it a bit more, I think the way you're doing
things *is* actually safe given a platform where sizeof(int) ==
sizeof(PyObject*) and the current implementation, but sheesh, I
wouldn't want to rely on it.

Cheers,
mwh
 
M

Michael Hudson

Jan-Erik Meyer-Lütgens said:
Ok, let me ask the following question: What is the reason
for that rule?

Well, the idea is that dictionary lookup is done by equality. If you
don't define __cmp__ then equality is in fact identity (well, that's
very nearly true...) so the default implementation of __hash__ (based
on the pointer address) is as good as it can get (you have hash
equality iff you have object equality).

I think.

Cheers,
mwh
 
?

=?ISO-8859-1?Q?Jan-Erik_Meyer-L=FCtgens?=

Michael said:
Well, the idea is that dictionary lookup is done by equality. If you
don't define __cmp__ then equality is in fact identity (well, that's
very nearly true...) so the default implementation of __hash__ (based

....and the full truth is? :)
on the pointer address) is as good as it can get (you have hash
equality iff you have object equality).

I think.

So, this rule is a hint, only. It could break performance,
not functionality, if I define my own hash function, right?


To make things totally clear, I repeat my question:

The only thing to be aware of when working with keys (besides
of object immutability) seems the following:

Keys are equivalent (in the sense of: a key inserted under
key1 can be retrieved with key2), if this is valid:

hash(key1) == hash(key2) and key1 == key2

Is this guaranteed? In future implementations?
 
M

Michael Hudson

Jan-Erik Meyer-Lütgens said:
...and the full truth is? :)

Oh, something to do with __coerce__ and instances of old-style classes
(well, I didn't mention __eq__ either, but I assume you know about
that).
So, this rule is a hint, only. It could break performance,
not functionality, if I define my own hash function, right?


To make things totally clear, I repeat my question:

The only thing to be aware of when working with keys (besides
of object immutability) seems the following:

Keys are equivalent (in the sense of: a key inserted under
key1 can be retrieved with key2), if this is valid:

hash(key1) == hash(key2) and key1 == key2

Is this guaranteed?
Yes.

In future implementations?

One can't predict the entire future of course -- hey, maybe dicts will
be splay trees or something one day -- but *I* would be happy
depending on it.

Cheers,
mwh
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,755
Messages
2,569,537
Members
45,020
Latest member
GenesisGai

Latest Threads

Top