Hash functions are only defined to operate on bytestrings. Since
Python is a high-level language and doesn't permit you to view the
internal binary representation of objects, you first have to convert
the object to a bytestring, a process called "serialization".
The `pickle` and `json` serialization modules are included in the
standard library. These modules can convert objects to bytestrings and
back again.
Once you've done the bytestring conversion, pass the bytestring to a
hash function, for example one from the `hashlib` module.
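
As a rough sketch of the two steps above (serialize, then hash), the
following assumes Python 3 and uses a made-up `record` list purely for
illustration. SHA-256 and pickle protocol 2 are arbitrary choices here,
not something the approach requires; the protocol is pinned only
because different pickle protocols produce different bytes.

```python
import hashlib
import json
import pickle

# Hypothetical sample object, used only for illustration.
record = ["sabres", 42, 3.5]

# pickle.dumps() returns a bytestring directly. Pinning the protocol
# keeps the output format from changing with the Python version's default.
pickled = pickle.dumps(record, protocol=2)
pickle_digest = hashlib.sha256(pickled).hexdigest()

# json.dumps() returns text on Python 3, so it has to be encoded
# to bytes before it can be hashed.
encoded = json.dumps(record).encode("utf-8")
json_digest = hashlib.sha256(encoded).hexdigest()

print(pickle_digest)
print(json_digest)
```

The two digests differ from each other because the two serializations
differ; what matters is that you always hash the same serialization.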
Be careful when serializing dictionaries and sets though, because
their ordering is not part of their value: two dictionaries containing
the same items and which compare equal may be ordered differently
internally, and so have different serializations and different hashes.
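
One possible way around the ordering problem, sketched under the
assumption that your data is JSON-compatible, is to force a canonical
order before hashing. `stable_digest` is a hypothetical helper name,
and the sample dictionaries are invented for the example.

```python
import hashlib
import json

def stable_digest(obj):
    """Hash a JSON-compatible object with dictionary keys in sorted order."""
    # sort_keys=True removes the dependence on dictionary ordering;
    # fixed separators keep the whitespace identical every time.
    text = json.dumps(obj, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

# Two equal dictionaries built in a different order.
a = {"team": "sabres", "city": "buffalo"}
b = {"city": "buffalo", "team": "sabres"}

print(a == b)                                # True
print(stable_digest(a) == stable_digest(b))  # True

# json cannot serialize a set directly, so sort it into a list first.
tags = {"nhl", "east"}
tags_digest = stable_digest(sorted(tags))
```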
Cheers,
Chris
--http://blog.rebertia.com
I'd think that using the hash of the pickled representation of an
object might be problematic, no? The pickle protocol handles object
graphs in a way that allows it to preserve repeated references to
the same object. Consider the following (contrived) example:

    import pickle
    from hashlib import md5

    class Value(object):
        def __init__(self, v):
            self._v = v

    class P1(object):
        def __init__(self, name):
            self.name = Value(name)
            # other_name refers to the same Value object as name
            self.other_name = self.name

    class P2(object):
        def __init__(self, name):
            self.name = Value(name)
            # other_name is a second, separate Value object
            self.other_name = Value(name)

    h1 = md5(pickle.dumps(P1('sabres'))).hexdigest()
    h2 = md5(pickle.dumps(P2('sabres'))).hexdigest()
    print(h1 == h2)

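
Since P1 and P2 are different classes, their pickles also differ in the
recorded class name, so here is a smaller sketch that isolates the
reference effect using nothing but plain lists. The variable names are
made up for the example, and protocol 2 is an arbitrary choice.

```python
import pickle

inner = ['sabres']
shared = [inner, inner]              # both slots hold the same list object
separate = [['sabres'], ['sabres']]  # two equal but distinct lists

shared_bytes = pickle.dumps(shared, 2)
separate_bytes = pickle.dumps(separate, 2)

print(shared == separate)              # True: equal by value
print(shared_bytes == separate_bytes)  # False: pickle records the sharing

# The sharing survives a round trip, which is why it has to be encoded.
restored = pickle.loads(shared_bytes)
print(restored[0] is restored[1])      # True
```
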
Just something to be aware of. Depending on what you're trying to
accomplish, it may make sense to simply define a method which
generates a byte string representation of your object's state, and
to hash that value instead.
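
A minimal sketch of that idea, assuming Python 3: the `Player` class
and the method names `state_bytes` and `state_hash` are hypothetical,
and JSON plus SHA-256 are just one reasonable choice for producing and
hashing the bytes.

```python
import hashlib
import json

class Player(object):
    def __init__(self, name, number):
        self.name = name
        self.number = number

    def state_bytes(self):
        # List the fields explicitly and in a fixed order, so the bytes
        # depend only on the values and not on how objects are shared.
        return json.dumps([self.name, self.number]).encode("utf-8")

    def state_hash(self):
        return hashlib.sha256(self.state_bytes()).hexdigest()

p1 = Player("sabres", 1)
p2 = Player("sabres", 1)
print(p1.state_hash() == p2.state_hash())  # True: same state, same hash
```

The point of writing the method by hand is that you decide exactly
which parts of the state count toward the hash.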
Thanks,
-Jeff
mcjeff.blogspot.com