Doing set operation on non-hashable objects

5lvqbwl02 · Dec 24, 2008

Hi,

I'm writing an application which is structured roughly as follows:

"db" is a dict, where the values are also dicts.
A function searches through db and returns a list of values, each of
which is a dict as described above.
I need to perform set operations on these lists (intersection and
union)
However the objects themselves are not hashable, and therefore can't
be in a set, because they are dicts.
I'm not sure how large each set will be, but the point is I'm trying
to avoid anything looking like an O(n^2) algorithm, so I can't just
use naive double-looping to check for intersection/union on a pair of
lists.

The only way I can think of to do this right is to hash the dicts by
freezing them, turning them all into tuples, which can then be
hashed. But this is a problem because more dicts might be nested
inside. At any rate, I'd need to thaw them out when I'm done and turn
them back into dicts, which seems complicated and annoying, and all
this leads me to believe there is a better way.

What I really need is a set of pointers, so at the end of the
operation I have the actual objects pointed to. Can I somehow use the
object IDs as set elements, then recreate a list with the actual
objects they point to?

How do you get the object back from an ID in python?

thanks
Michael

Gabriel Genellina · Dec 24, 2008

En Wed said:
I'm writing an application which is structured roughly as follows:

"db" is a dict, where the values are also dicts.
A function searches through db and returns a list of values, each of
which is a dict as described above.
I need to perform set operations on these lists (intersection and
union)
However the objects themselves are not hashable, and therefore can't
be in a set, because they are dicts.

See this thread from last week:
http://groups.google.com/group/comp.lang.python/t/d6818ff713bd4421

What I really need is a set of pointers, so at the end of the
operation I have the actual objects pointed to. Can I somehow use the
object IDs as set elements, then recreate a list with the actual
objects they point to?

If you *only* care about object identity, you might use a dictionary that
only compares by identity to anyone else:

class nc_dict(dict):
"A hashable dictionary that isn't equal to anyone else"

def __eq__(self, other):
return cmp(id(self),id(other))==0

def __hash__(self):
return hash(id(self))

d1 = nc_dict(a=1, b=2, c=3)
d2 = nc_dict(b=2, c=0, d=4)
d3 = nc_dict(a=1, c=3, e=5)
dd1 = nc_dict(x=d1, y=d2)
dd2 = nc_dict(x=d1, y=d3)
dd3 = nc_dict(y=d2, z=d3, w=d1)
l1 = [dd1, dd2]
l2 = [dd2, dd3]
s1 = set(l1)
s2 = set(l2)
print s1-s2
print s2-s1
print s1&s2

# instances of nc_dict with the same contents are NOT equal
d4 = nc_dict(a=1, b=2, c=3)
print d1, d4, d1==d4 # output: False

# but we can use this function to compare contents
def cmp_contents(d1, d2):
try: return cmp(sorted(d1.items()), sorted(d2.items()))
except Exception: return 1 # assume they're unequal

print cmp_contents(d1,d4)==0 # output: True

How do you get the object back from an ID in python?

You don't

Carl Banks · Dec 24, 2008

Hi,

I'm writing an application which is structured roughly as follows:

"db" is a dict, where the values are also dicts.
A function searches through db and returns a list of values, each of
which is a dict as described above.
I need to perform set operations on these lists (intersection and
union)
However the objects themselves are not hashable, and therefore can't
be in a set, because they are dicts.
I'm not sure how large each set will be, but the point is I'm trying
to avoid anything looking like an O(n^2) algorithm, so I can't just
use naive double-looping to check for intersection/union on a pair of
lists.

The only way I can think of to do this right is to hash the dicts by
freezing them, turning them all into tuples, which can then be
hashed. But this is a problem because more dicts might be nested
inside. At any rate, I'd need to thaw them out when I'm done and turn
them back into dicts, which seems complicated and annoying, and all
this leads me to believe there is a better way.

What I really need is a set of pointers, so at the end of the
operation I have the actual objects pointed to. Can I somehow use the
object IDs as set elements, then recreate a list with the actual
objects they point to?

How do you get the object back from an ID in python?

Quick and dirty way is to reference the dicts with a lightweight
hashable object that uses the dict's identity. For instance:

class Identity(object):
def __init__(self,obj):
self.obj = obj
def __hash__(self):
return hash(id(self.obj))
def __eq__(self,other):
return self.obj is other.obj

Then, instead of "s.add(d)" use "s.add(Identity(d))".

Instead of "d = s.pop()" use "d = s.pop().obj".

Instead of "d in s" use "Identity(d) in s".

And so on.

To do it a bit better, you could create an IdentitySet class that
subclasses set and wraps the methods so that they automatically apply
wrap and unwrap the arguments on their way in and out (I'd bet there's
already a cookbook recipe to do that).

Carl Banks

5lvqbwl02 · Dec 26, 2008

Quick and dirty way is to reference the dicts with a lightweight
hashable object that uses the dict's identity. For instance:

class Identity(object):
def __init__(self,obj):
self.obj = obj
def __hash__(self):
return hash(id(self.obj))
def __eq__(self,other):
return self.obj is other.obj

Then, instead of "s.add(d)" use "s.add(Identity(d))".

Instead of "d = s.pop()" use "d = s.pop().obj".

Instead of "d in s" use "Identity(d) in s".

And so on.

To do it a bit better, you could create an IdentitySet class that
subclasses set and wraps the methods so that they automatically apply
wrap and unwrap the arguments on their way in and out (I'd bet there's
already a cookbook recipe to do that).

Carl Banks

Thank you, I like this idea a lot. It's close to what I've been
thinking but a bit cleaner.

Michael

5lvqbwl02 · Dec 28, 2008

... "db" is a dict, where the values are also dicts.

Well, if the lists are ordered, you can do intersection and union
in O(n) time. If they are orderable, you can sort each first, giving
O(n log n). Note you can do lst.sort(key=id) which will sort on ids.

The lists are not ordered, since the elements of the list could be
arbitrarily complex things consisting of resistors, capacitors, sub-
circuits, etc. I'm trying do a little EDA program, taking the best
parts from the different EDA/cad tools I've used. I finally came up
with an idea using dictionaries in a certain way, so I'd like to be
able to do set operators on arbitrary things that may themselves not
be hashable.

You don't; you remember the mapping in a dictionary and look it up.

Exactly. A mapping between the ID and the thing itself.

If there were a way to go from id to object, the whole idea of garbage
collection and reference counts would fly out the window, leading to
nasty crashes (or you might get to an object that is the re-used id of
an older object).

Yup, this makes perfect sense. It was rattling around in my head for
a bit afer I wrote the original post, then this makes sense.

5lvqbwl02 · Dec 28, 2008

En Wed said:
En Wed said:

I'm writing an application which is structured roughly as follows:

Click to expand...

"db" is a dict, where the values are also dicts.
A function searches through db and returns a list of values, each of
which is a dict as described above.
I need to perform set operations on these lists (intersection and
union)
However the objects themselves are not hashable, and therefore can't
be in a set, because they are dicts.

Click to expand...

If you *only* care about object identity, you might use a dictionary that
only compares by identity to anyone else:

class nc_dict(dict):
"A hashable dictionary that isn't equal to anyone else"

def __eq__(self, other):
return cmp(id(self),id(other))==0

def __hash__(self):
return hash(id(self))

d1 = nc_dict(a=1, b=2, c=3)
d2 = nc_dict(b=2, c=0, d=4)
d3 = nc_dict(a=1, c=3, e=5)
dd1 = nc_dict(x=d1, y=d2)
dd2 = nc_dict(x=d1, y=d3)
dd3 = nc_dict(y=d2, z=d3, w=d1)
l1 = [dd1, dd2]
l2 = [dd2, dd3]
s1 = set(l1)
s2 = set(l2)
print s1-s2
print s2-s1
print s1&s2

# instances of nc_dict with the same contents are NOT equal
d4 = nc_dict(a=1, b=2, c=3)
print d1, d4, d1==d4 # output: False

# but we can use this function to compare contents
def cmp_contents(d1, d2):
try: return cmp(sorted(d1.items()), sorted(d2.items()))
except Exception: return 1 # assume they're unequal

print cmp_contents(d1,d4)==0 # output: True

Good idea. I'll try this out. I don't think it's likely that I'll
have a case where the dicts have identical values, but different
identities. And if they do I probably don't care too much.

Thanks

confusing doc: mutable and hashable	1	Apr 28, 2012
Non hashable object (without __dict__)	0	Apr 20, 2011
Non-Programmer Needs Help With Simple Program	3	Dec 13, 2024
Doing a HTTP DELETE operation with urllib2?	5	Dec 30, 2011
Productivity Power Tools 2017/2019 extension stops copy operation in Visual Studio 2017.	1	Feb 13, 2022
generating unique set of dicts from a list of dicts	0	Jan 10, 2012
Javascript set language function issue	2	Nov 24, 2024
Hashable object with self references OR how to create a tuple thatrefers to itself	0	Jun 15, 2012

Doing set operation on non-hashable objects

5lvqbwl02

Gabriel Genellina

Carl Banks

5lvqbwl02

5lvqbwl02

5lvqbwl02

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads