hashability

J

James Stroud

Hello All,

I wrote the function to test hashability of arbitrary objects. My reason
is that the built-in python (2.5) hashing is too permissive for some
uses. A symptom of this permissiveness comes from the ability to
successfully hash() arbitrary objects:

py> class C(object): pass
...
py> {C():4}[C()]
------------------------------------------------------------
Traceback (most recent call last):
File "<ipython console>", line 1, in <module>
<type 'exceptions.KeyError'>: <__main__.C object at 0xe21610>

The basis for the exception is that the two instances do not have the
same hash() although conceptually they might seem equal to the
unitiated. Were I to re-design python, I'd throw an exception in this
case because of the ill-defined behavior one might expect if a C()
serves as a key for a dict.

To prevent users of one of my libraries from falling into this and
similar traps (which have potentially problematic consequences), I came
up with this test for hashability:

def hashable(k):
try:
hash(k)
except TypeError:
good = False
else:
good = (hasattr(k, '__hash__') and
(hasattr(k, '__eq__') or hasattr(k, '__cmp__')))
return good

It works as I would like for most of the cases I can invent:

py> all(map(hashable, [1,1.0,"",(1,2,3)]))
True
py> any(map(hashable, [None, [1,2], {}, C(), __import__('sys')]))
False

Can anyone think of boundary cases I might be missing with this approach?


James
 
C

Carl Banks

Hello All,

I wrote the function to test hashability of arbitrary objects. My reason
is that the built-in python (2.5) hashing is too permissive for some
uses. A symptom of this permissiveness comes from the ability to
successfully hash() arbitrary objects:

   py> class C(object): pass
   ...
   py> {C():4}[C()]
   ------------------------------------------------------------
   Traceback (most recent call last):
     File "<ipython console>", line 1, in <module>
   <type 'exceptions.KeyError'>: <__main__.C object at 0xe21610>

The basis for the exception is that the two instances do not have the
same hash() although conceptually they might seem equal to the
unitiated. Were I to re-design python, I'd throw an exception in this
case because of the ill-defined behavior one might expect if a C()
serves as a key for a dict.

That's arguably the right thing to do.

Personally I've found that being able to use class instances as
hashable objects to be terribly useful (these objects are hashed and
compared by identity, of course), so I don't mind it. But I can
definitely see how this straddles the line between "practicality" and
"face of ambiguity". And if Python didn't do it by default, it would
be little trouble to add the appropriate __eq__ and __hash__ methods.

To prevent users of one of my libraries from falling into this and
similar traps (which have potentially problematic consequences),

Even so, I would consider whether some of your users might, like me,
also find this terribly useful, and if so (probably a few will unless
they are all novices), allow them to disable or disregard this check.
I came
up with this test for hashability:

def hashable(k):
   try:
     hash(k)
   except TypeError:
     good = False
   else:
     good = (hasattr(k, '__hash__') and
             (hasattr(k, '__eq__') or hasattr(k, '__cmp__')))
   return good

I wouldn't call the function "hashable". Class instances like C() are
hashable whether you approve or not. Something like
"deliberately_hashable" would be a better name.

It works as I would like for most of the cases I can invent:

   py> all(map(hashable, [1,1.0,"",(1,2,3)]))
   True
   py> any(map(hashable, [None, [1,2], {}, C(), __import__('sys')]))
   False

Can anyone think of boundary cases I might be missing with this approach?

It is possible to redefine == operator by defining __ne__ instead of
__eq__, at least on Python 2.5, so you should keep that in mind.


Carl Banks
 
A

Asun Friere

I wrote the function to test hashability of arbitrary objects. My reason
is that the built-in python (2.5) hashing is too permissive for some
uses. A symptom of this permissiveness comes from the ability to
successfully hash() arbitrary objects:

Arbitrary, or anonymous objects and some uses or some users? I'm
can't see why anyone would expect different instance of a class to be
equivalent keys.
The basis for the exception is that the two instances do not have the
same hash() although conceptually they might seem equal to the
unitiated.

Perhaps the best solution would be for the unitiated to correct their
misaprehensions? If you don't understand that you are instantiating a
number of anonymous instances of a class you are missing something
very fundamental.
Were I to re-design python, I'd throw an exception in this
case because of the ill-defined behavior one might expect if a C()
serves as a key for a dict.

Then you couldn't to this:

d = {C():1, C():2, C():3}
a,b,c = d.keys()
d[c]

Anonymous instances are a GoodThing(tm) and they can usually be de-
anonymised if need be.
 
J

James Stroud

Carl said:
Even so, I would consider whether some of your users might, like me,
also find this terribly useful, and if so (probably a few will unless
they are all novices), allow them to disable or disregard this check.

I realize I left out my use. The intent of the function is to flag
objects that will make useful keys for a persistent dictionary. The
{C():4}[C()] example demonstrates why I want to avoid certain types of
keys--I don't want users to do things like {C():4, C():4}, which python
happily allows but falls apart at the level of persistence.

However, I also want to avoid isinstance() and/or type checking in order
to allow maximal flexibility so that users don't have to work too hard
to make their keys.
I wouldn't call the function "hashable". Class instances like C() are
hashable whether you approve or not. Something like
"deliberately_hashable" would be a better name.

I chose "keyable".
It is possible to redefine == operator by defining __ne__ instead of
__eq__, at least on Python 2.5, so you should keep that in mind.

Thanks for this hint. I've already put it in.

James
 
A

Asun Friere

I realize I left out my use. The intent of the function is to flag
objects that will make useful keys for a persistent dictionary. The
{C():4}[C()] example demonstrates why I want to avoid certain types of
keys--I don't want users to do things like {C():4, C():4}, which python
happily allows but falls apart at the level of persistence.

What am I missing here? How, in terms of persistence, is {C():4, C():
4} conceptually different from {'spam':4, 'ham':4}?
 
C

Chris Rebert

2009/8/11 Asun Friere said:
I realize I left out my use. The intent of the function is to flag
objects that will make useful keys for a persistent dictionary. The
{C():4}[C()] example demonstrates why I want to avoid certain types of
keys--I don't want users to do things like {C():4, C():4}, which python
happily allows but falls apart at the level of persistence.

What am I missing here?  How, in terms of persistence, is {C():4, C():
4} conceptually different from {'spam':4, 'ham':4}?

Thought Experiment:
Consider, for each dict, the case of pickling it twice to 2 separate files.
When loaded from both files into the same program, the spam-ham dicts
will work as expected between each other.
The dicts with C()s will not. For example, assuming the unpickled
dicts are `cs1` and `cs2`, cs1[cs2.keys()[0]] will raise KeyError.

Cheers,
Chris
 
A

Asun Friere

You should be more imaginative.

I'm by no means discounting that there might be some actual problem
you're trying to solve here, but I honestly can't see it.

There really is no need to get personal about this, so rather than
asking for a level of imagination from me, (which I apparently lack),
please just explain to me how {one_instance_of_a_hashable_class : val,
another_instance_of_a_hashable_class :val} is conceptually different
{one_instance_of_class_str: val, another_instance_of_class_str: val},
in terms of persistence.

If I am missing something here, I would actually like to know. If on
the other hand, I'm not, then rather at taking umbrage, you might want
instead to save yourself the effort of solving a non-existent problem?
Can you come give a class to my users?

No.

However, I think it's fairly central to the notion of a class that it
is a template for creating different instances which themselves have a
unique identity. And that subsequent calls to a class' constructor
ought to create unique instances of that class (subject perhaps to
implementation tricks such as interning). If it is not obvious that {C
():4}[C()] invovles subsequent calls to C's constructor, then that
very example is itself didactic.
 
C

Chris Rebert

2009/8/11 Asun Friere said:
I realize I left out my use. The intent of the function is to flag
objects that will make useful keys for a persistent dictionary. The
{C():4}[C()] example demonstrates why I want to avoid certain types of
keys--I don't want users to do things like {C():4, C():4}, which python
happily allows but falls apart at the level of persistence.

What am I missing here?  How, in terms of persistence, is {C():4, C():
4} conceptually different from {'spam':4, 'ham':4}?

Consider the case of pickling the data twice to 2 separate files.
When loaded from both files into the same program, the spam-ham dicts
will work as expected.
The C()s will not. For cs1[cs2.keys()[0]] will raise KeyError.

Cheers,
Chris
 
C

Chris Rebert

Apologies for the possible repeated post. Gmail failed to mark the
draft as sent for some reason.

- Chris
 
A

Asun Friere

Thought Experiment:
Consider, for each dict, the case of pickling it twice to 2 separate files.
When loaded from both files into the same program, the spam-ham dicts
will work as expected between each other.
The dicts with C()s will not. For example, assuming the unpickled
dicts are `cs1` and `cs2`, cs1[cs2.keys()[0]] will raise KeyError.

Aha!

cheers.
 
J

James Stroud

Asun said:
I'm by no means discounting that there might be some actual problem
you're trying to solve here, but I honestly can't see it.

There really is no need to get personal about this, so rather than
asking for a level of imagination from me, (which I apparently lack),
please just explain to me how {one_instance_of_a_hashable_class : val,
another_instance_of_a_hashable_class :val} is conceptually different
{one_instance_of_class_str: val, another_instance_of_class_str: val},
in terms of persistence.

Sorry for being a twit. This list used to be quite nice but some people
on this list got pretty abrasive. I couldn't tell you weren't one of
these abrasive people from your post. I stopped giving the benefit of
the doubt many moons ago and promptly enter attack mode at the slightest
hint of abrasiveness. My attitude probably exacerbates the problem--but
it sure does make me feel better.


Anyway, I think the problem has been described better than I'm able, but
once an object goes to the file system, one can not take its hash value
for granted. It is not possible to rely on the ability to recreate any
arbitrary object de-novo and use the recreation as a key in proxy for
the original object. I'm after this "je ne sais pas que l'appeler"
quality of objects to use as keys (or to generate keys) for persistent
dictionaries. Carl Banks suggested that this quality should not be
called "hashable", so I'm calling it "keyable".
 
A

Asun Friere

Sorry for being a twit.

Don't be ridiculous. You haven't been a twit, I have!

I've just had a complete "blonde" moment here (with apologies to any
real blondes out there. My only excuse is that I've been up to 02:30
for the last three nights running (or maybe it's the ageing process, a
cerebrovascular accident or something).

I just checked a class I recently wrote specifically for the purposes
of hashing a dict (in case I had made this error IRL). Wouldn't you
know it, it's subclassed to tuple, and defines both __eq__ and
__cmp__. Luckily when I write production code the guy who knows what
he's doing takes over. And this in an app which compares keys from
different pickled files (representing DB holdings)?! Of all things.

I can't believe how unfathomably stupid I've been. I'm extremely
embarassed. I think I'll just go out and shoot myself now. Or get
some sleep.
 
D

Dennis Lee Bieber

...
py> {C():4}[C()]
------------------------------------------------------------
Traceback (most recent call last):
File "<ipython console>", line 1, in <module>
<type 'exceptions.KeyError'>: <__main__.C object at 0xe21610>

The basis for the exception is that the two instances do not have the
same hash() although conceptually they might seem equal to the
unitiated. Were I to re-design python, I'd throw an exception in this
case because of the ill-defined behavior one might expect if a C()
serves as a key for a dict.
"A" C()... How would you expect

c1 = C()
c2 = C()

{c1:4}[c2]

to behave? That IS the equivalent of your statement -- two instances are
two distinctly different entities...
To prevent users of one of my libraries from falling into this and
similar traps (which have potentially problematic consequences), I came
up with this test for hashability:

def hashable(k):
try:
hash(k)
except TypeError:
good = False
else:
good = (hasattr(k, '__hash__') and
(hasattr(k, '__eq__') or hasattr(k, '__cmp__')))
return good

It works as I would like for most of the cases I can invent:

py> all(map(hashable, [1,1.0,"",(1,2,3)]))
True
py> any(map(hashable, [None, [1,2], {}, C(), __import__('sys')]))
False

Can anyone think of boundary cases I might be missing with this approach?


James
 
A

Asun Friere

c1 = C()
c2 = C()

{c1:4}[c2]

to behave? That IS the equivalent of your statement -- two instances are
two distinctly different entities...

Thankyou, that is EXACTLY the mistake I made that sent me off into
lunacy.

At the risk of further embarassment, the answer is that one would
expect it to behave analogously to:

c1 = int(x)
c2 = int(x)

{c1:4}[c2]

Or is my brain still on vacation?
 
J

James Stroud

Dennis said:
...
py> {C():4}[C()]
------------------------------------------------------------
Traceback (most recent call last):
File "<ipython console>", line 1, in <module>
<type 'exceptions.KeyError'>: <__main__.C object at 0xe21610>

The basis for the exception is that the two instances do not have the
same hash() although conceptually they might seem equal to the
unitiated. Were I to re-design python, I'd throw an exception in this
case because of the ill-defined behavior one might expect if a C()
serves as a key for a dict.
"A" C()... How would you expect

c1 = C()
c2 = C()

{c1:4}[c2]

to behave?


This seems like a subjective question. I think I demonstrated how it
*does* behave, which is purely objective--and I know I can't do anything
about that. But the subjective answer to the subjective question is that
I would like an exception to be raised when the dictionary is
constructed. I know an exception doesn't get thrown in the current
incarnation of the python language. That is the objective reality of the
situation which affronts my subjective notions of how reality should be.

That IS the equivalent of your statement -- two instances are
two distinctly different entities...

Tell that to two different machines on two different days. Then bet the
life of yourself and your nearest and dearest family on that fact and
see whether you really want to take a hash value for granted. If a
property of the python language fails the "bet the lives of your nearest
and dearest on a consistent result" test, I call it "ill defined" and,
subjectively speaking, I prefer exceptions to be thrown--And, by damned,
I'll throw them myself if I have to.

"If it saves one life, it's worth it all."

James
 
S

Steven D'Aprano

Hello All,

I wrote the function to test hashability of arbitrary objects. My reason
is that the built-in python (2.5) hashing is too permissive for some
uses. A symptom of this permissiveness comes from the ability to
successfully hash() arbitrary objects:

No it doesn't.

Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: list objects are unhashable

Python successfully hashes *non*-arbitrary objects, including those that
inherit from object.


py> class C(object): pass
...
py> {C():4}[C()]

Why would you expect that to succeed? The object you are using as the key
is a different instance than the object you are looking up. Since your
instances don't even compare equal, why would you expect them to hash
equal?
False



The basis for the exception is that the two instances do not have the
same hash() although conceptually they might seem equal to the
unitiated.

Yes, well, the ignorant and uninitiated frequently make mistakes. I feel
your pain.


Were I to re-design python, I'd throw an exception in this
case because of the ill-defined behavior one might expect if a C()
serves as a key for a dict.

It's not ill-defined, it's a perfectly reasonable design you just don't
happen to like. It's quite useful for some.

To prevent users of one of my libraries from falling into this and
similar traps (which have potentially problematic consequences), I came
up with this test for hashability:

That's not a test for hashability. That's a test for hashability and
equality testing together.

def hashable(k):
try:
hash(k)
except TypeError:
good = False
else:
good = (hasattr(k, '__hash__') and
(hasattr(k, '__eq__') or hasattr(k, '__cmp__')))
return good

The test for __hash__ is redundant, given that hash() has already
succeeded.

It will wrongly reject classes that deliberately don't define equality,
for whatever reason, and that *expect* the behaviour you are treating as
an error.


It works as I would like for most of the cases I can invent:

py> all(map(hashable, [1,1.0,"",(1,2,3)])) True
py> any(map(hashable, [None, [1,2], {}, C(), __import__('sys')]))
False

Well there you go -- why on earth would you prohibit None as a dictionary
key??? That's a serious failure.
 
S

Steven D'Aprano

Tell that to two different machines on two different days. Then bet the
life of yourself and your nearest and dearest family on that fact and
see whether you really want to take a hash value for granted.

As far as I know, Python doesn't make any guarantees about hashes being
repeatable on different machines, different versions, or even different
runs of the interpreter.

If a
property of the python language fails the "bet the lives of your nearest
and dearest on a consistent result" test, I call it "ill defined" and,
subjectively speaking, I prefer exceptions to be thrown--And, by damned,
I'll throw them myself if I have to.

"If it saves one life, it's worth it all."

Depends on the life, and the cost. Would you pay a million dollars from
your own pocket to save the life of a 119 year old with advanced lung
cancer, a bad heart, and a raging infection of Ebola, from choking on a
fish bone?

What if the effort of saving that one life kills two lives? Opportunity
costs are costs too.
 
J

James Stroud

Steven said:
Well there you go -- why on earth would you prohibit None as a dictionary
key??? That's a serious failure.


roentgen 1% python
Python 2.5 (r25:51908, Sep 20 2006, 17:36:21)
[GCC 3.4.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
py> hash(None)
135543872


mbi136-176 98% /usr/bin/python
Python 2.5.1 (r251:54863, Feb 6 2009, 19:02:12)
[GCC 4.0.1 (Apple Inc. build 5465)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
py> hash(None)
2030240



That's why. Read the whole thread. You are one of the abrasive ones.
 
C

Chris Rebert

Steven said:
Well there you go -- why on earth would you prohibit None as a dictionary
key??? That's a serious failure.


roentgen 1% python
Python 2.5 (r25:51908, Sep 20 2006, 17:36:21) [GCC 3.4.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
py> hash(None)
135543872


mbi136-176 98% /usr/bin/python
Python 2.5.1 (r251:54863, Feb  6 2009, 19:02:12) [GCC 4.0.1 (Apple Inc.
build 5465)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
py> hash(None)
2030240

Actually, None is a special-case as a built-in singleton value --
there's only ever *exactly one* instance of it in a given interpreter
session. And I'm reasonably sure dict pickles don't store the hash
code of items (the dict gets recreated from scratch), so there's no
problem.

Cheers,
Chris
 
S

Steven D'Aprano

Steven said:
Well there you go -- why on earth would you prohibit None as a
dictionary key??? That's a serious failure.


roentgen 1% python
Python 2.5 (r25:51908, Sep 20 2006, 17:36:21) [GCC 3.4.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
py> hash(None)
135543872


mbi136-176 98% /usr/bin/python
Python 2.5.1 (r251:54863, Feb 6 2009, 19:02:12) [GCC 4.0.1 (Apple Inc.
build 5465)] on darwin Type "help", "copyright", "credits" or "license"
for more information. py> hash(None)
2030240



That's why. Read the whole thread. You are one of the abrasive ones.

I've read the whole thread. Pay close attention:

[steve@ando ~]$ python
Python 2.4.3 (#1, Mar 14 2007, 18:51:08)
[GCC 4.1.1 20070105 (Red Hat 4.1.1-52)] on linux2
Type "help", "copyright", "credits" or "license" for more information.[steve@ando ~]$ ssh sylar
steve@sylar's password:
Last login: Wed Aug 12 21:44:47 2009
[steve@sylar ~]$ python2.6
Python 2.6.1 (r261:67515, Dec 24 2008, 00:33:13)
[GCC 4.1.2 20070502 (Red Hat 4.1.2-12)] on linux2
Type "help", "copyright", "credits" or "license" for more information.{None: 42}


I have successfully pickled a dict using None as a key on one machine
using Python 2.4, and unpickled it on a completely different machine
running Python 2.6.

Still think that pickling None is a problem?
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,769
Messages
2,569,578
Members
45,052
Latest member
LucyCarper

Latest Threads

Top