Doing set operation on non-hashable objects

Discussion in 'Python' started by 5lvqbwl02@sneakemail.com, Dec 24, 2008.

  1. Guest

    Hi,

    I'm writing an application which is structured roughly as follows:

    "db" is a dict, where the values are also dicts.
    A function searches through db and returns a list of values, each of
    which is a dict as described above.
    I need to perform set operations on these lists (intersection and
    union)
    However the objects themselves are not hashable, and therefore can't
    be in a set, because they are dicts.
    I'm not sure how large each set will be, but the point is I'm trying
    to avoid anything looking like an O(n^2) algorithm, so I can't just
    use naive double-looping to check for intersection/union on a pair of
    lists.

    The only way I can think of to do this right is to hash the dicts by
    freezing them, turning them all into tuples, which can then be
    hashed. But this is a problem because more dicts might be nested
    inside. At any rate, I'd need to thaw them out when I'm done and turn
    them back into dicts, which seems complicated and annoying, and all
    this leads me to believe there is a better way.

    What I really need is a set of pointers, so at the end of the
    operation I have the actual objects pointed to. Can I somehow use the
    object IDs as set elements, then recreate a list with the actual
    objects they point to?

    How do you get the object back from an ID in python?

    thanks
    Michael
     
    , Dec 24, 2008
    #1
    1. Advertising

  2. En Wed, 24 Dec 2008 17:16:59 -0200, <> escribió:

    > I'm writing an application which is structured roughly as follows:
    >
    > "db" is a dict, where the values are also dicts.
    > A function searches through db and returns a list of values, each of
    > which is a dict as described above.
    > I need to perform set operations on these lists (intersection and
    > union)
    > However the objects themselves are not hashable, and therefore can't
    > be in a set, because they are dicts.


    See this thread from last week:
    http://groups.google.com/group/comp.lang.python/t/d6818ff713bd4421

    > What I really need is a set of pointers, so at the end of the
    > operation I have the actual objects pointed to. Can I somehow use the
    > object IDs as set elements, then recreate a list with the actual
    > objects they point to?


    If you *only* care about object identity, you might use a dictionary that
    only compares by identity to anyone else:

    class nc_dict(dict):
    "A hashable dictionary that isn't equal to anyone else"

    def __eq__(self, other):
    return cmp(id(self),id(other))==0

    def __hash__(self):
    return hash(id(self))

    d1 = nc_dict(a=1, b=2, c=3)
    d2 = nc_dict(b=2, c=0, d=4)
    d3 = nc_dict(a=1, c=3, e=5)
    dd1 = nc_dict(x=d1, y=d2)
    dd2 = nc_dict(x=d1, y=d3)
    dd3 = nc_dict(y=d2, z=d3, w=d1)
    l1 = [dd1, dd2]
    l2 = [dd2, dd3]
    s1 = set(l1)
    s2 = set(l2)
    print s1-s2
    print s2-s1
    print s1&s2

    # instances of nc_dict with the same contents are NOT equal
    d4 = nc_dict(a=1, b=2, c=3)
    print d1, d4, d1==d4 # output: False

    # but we can use this function to compare contents
    def cmp_contents(d1, d2):
    try: return cmp(sorted(d1.items()), sorted(d2.items()))
    except Exception: return 1 # assume they're unequal

    print cmp_contents(d1,d4)==0 # output: True

    > How do you get the object back from an ID in python?


    You don't :)

    --
    Gabriel Genellina
     
    Gabriel Genellina, Dec 24, 2008
    #2
    1. Advertising

  3. Carl Banks Guest

    On Dec 24, 1:16 pm, wrote:
    > Hi,
    >
    > I'm writing an application which is structured roughly as follows:
    >
    > "db" is a dict, where the values are also dicts.
    > A function searches through db and returns a list of values, each of
    > which is a dict as described above.
    > I need to perform set operations on these lists (intersection and
    > union)
    > However the objects themselves are not hashable, and therefore can't
    > be in a set, because they are dicts.
    > I'm not sure how large each set will be, but the point is I'm trying
    > to avoid anything looking like an O(n^2) algorithm, so I can't just
    > use naive double-looping to check for intersection/union on a pair of
    > lists.
    >
    > The only way I can think of to do this right is to hash the dicts by
    > freezing them, turning them all into tuples, which can then be
    > hashed.  But this is a problem because more dicts might be nested
    > inside.  At any rate, I'd need to thaw them out when I'm done and turn
    > them back into dicts, which seems complicated and annoying, and all
    > this leads me to believe there is a better way.
    >
    > What I really need is a set of pointers, so at the end of the
    > operation I have the actual objects pointed to.  Can I somehow use the
    > object IDs as set elements, then recreate a list with the actual
    > objects they point to?
    >
    > How do you get the object back from an ID in python?


    Quick and dirty way is to reference the dicts with a lightweight
    hashable object that uses the dict's identity. For instance:


    class Identity(object):
    def __init__(self,obj):
    self.obj = obj
    def __hash__(self):
    return hash(id(self.obj))
    def __eq__(self,other):
    return self.obj is other.obj


    Then, instead of "s.add(d)" use "s.add(Identity(d))".

    Instead of "d = s.pop()" use "d = s.pop().obj".

    Instead of "d in s" use "Identity(d) in s".

    And so on.


    To do it a bit better, you could create an IdentitySet class that
    subclasses set and wraps the methods so that they automatically apply
    wrap and unwrap the arguments on their way in and out (I'd bet there's
    already a cookbook recipe to do that).


    Carl Banks
     
    Carl Banks, Dec 24, 2008
    #3
  4. Guest

    On Dec 24, 12:52 pm, Carl Banks <> wrote:
    > On Dec 24, 1:16 pm, wrote:
    >
    >
    >
    >
    >
    > > Hi,

    >
    > > I'm writing an application which is structured roughly as follows:

    >
    > > "db" is a dict, where the values are also dicts.
    > > A function searches through db and returns a list of values, each of
    > > which is a dict as described above.
    > > I need to perform set operations on these lists (intersection and
    > > union)
    > > However the objects themselves are not hashable, and therefore can't
    > > be in a set, because they are dicts.
    > > I'm not sure how large each set will be, but the point is I'm trying
    > > to avoid anything looking like an O(n^2) algorithm, so I can't just
    > > use naive double-looping to check for intersection/union on a pair of
    > > lists.

    >
    > > The only way I can think of to do this right is to hash the dicts by
    > > freezing them, turning them all into tuples, which can then be
    > > hashed.  But this is a problem because more dicts might be nested
    > > inside.  At any rate, I'd need to thaw them out when I'm done and turn
    > > them back into dicts, which seems complicated and annoying, and all
    > > this leads me to believe there is a better way.

    >
    > > What I really need is a set of pointers, so at the end of the
    > > operation I have the actual objects pointed to.  Can I somehow use the
    > > object IDs as set elements, then recreate a list with the actual
    > > objects they point to?

    >
    > > How do you get the object back from an ID in python?

    >
    > Quick and dirty way is to reference the dicts with a lightweight
    > hashable object that uses the dict's identity.  For instance:
    >
    > class Identity(object):
    >     def __init__(self,obj):
    >         self.obj = obj
    >     def __hash__(self):
    >         return hash(id(self.obj))
    >     def __eq__(self,other):
    >         return self.obj is other.obj
    >
    > Then, instead of "s.add(d)" use "s.add(Identity(d))".
    >
    > Instead of "d = s.pop()" use "d = s.pop().obj".
    >
    > Instead of "d in s" use "Identity(d) in s".
    >
    > And so on.
    >
    > To do it a bit better, you could create an IdentitySet class that
    > subclasses set and wraps the methods so that they automatically apply
    > wrap and unwrap the arguments on their way in and out (I'd bet there's
    > already a cookbook recipe to do that).
    >
    > Carl Banks


    Thank you, I like this idea a lot. It's close to what I've been
    thinking but a bit cleaner.

    Michael
     
    , Dec 27, 2008
    #4
  5. Guest

    > > ... "db" is a dict, where the values are also dicts.
    > > A function searches through db and returns a list of values, each of
    > > which is a dict as described above.
    > > I need to perform set operations on these lists (intersection and
    > > union)
    > > However the objects themselves are not hashable, and therefore can't
    > > be in a set, because they are dicts.
    > > I'm not sure how large each set will be, but the point is I'm trying
    > > to avoid anything looking like an O(n^2) algorithm, so I can't just
    > > use naive double-looping to check for intersection/union on a pair of
    > > lists.

    >
    > Well, if the lists are ordered, you can do intersection and union
    > in O(n) time.  If they are orderable, you can sort each first, giving
    > O(n log n).  Note you can do lst.sort(key=id) which will sort on ids.


    The lists are not ordered, since the elements of the list could be
    arbitrarily complex things consisting of resistors, capacitors, sub-
    circuits, etc. I'm trying do a little EDA program, taking the best
    parts from the different EDA/cad tools I've used. I finally came up
    with an idea using dictionaries in a certain way, so I'd like to be
    able to do set operators on arbitrary things that may themselves not
    be hashable.

    > > How do you get the object back from an ID in python?

    >
    > You don't; you remember the mapping in a dictionary and look it up.


    Exactly. A mapping between the ID and the thing itself.

    > If there were a way to go from id to object, the whole idea of garbage
    > collection and reference counts would fly out the window, leading to
    > nasty crashes (or you might get to an object that is the re-used id of
    > an older object).


    Yup, this makes perfect sense. It was rattling around in my head for
    a bit afer I wrote the original post, then this makes sense.

    >
     
    , Dec 28, 2008
    #5
  6. Guest

    On Dec 24, 12:21 pm, "Gabriel Genellina" <>
    wrote:
    > En Wed, 24 Dec 2008 17:16:59 -0200, <> escribió:
    >
    > > I'm writing an application which is structured roughly as follows:

    >
    > > "db" is a dict, where the values are also dicts.
    > > A function searches through db and returns a list of values, each of
    > > which is a dict as described above.
    > > I need to perform set operations on these lists (intersection and
    > > union)
    > > However the objects themselves are not hashable, and therefore can't
    > > be in a set, because they are dicts.

    >
    >
    > If you *only* care about object identity, you might use a dictionary that  
    > only compares by identity to anyone else:
    >
    > class nc_dict(dict):
    >    "A hashable dictionary that isn't equal to anyone else"
    >
    >    def __eq__(self, other):
    >      return cmp(id(self),id(other))==0
    >
    >    def __hash__(self):
    >      return hash(id(self))
    >
    > d1 = nc_dict(a=1, b=2, c=3)
    > d2 = nc_dict(b=2, c=0, d=4)
    > d3 = nc_dict(a=1, c=3, e=5)
    > dd1 = nc_dict(x=d1, y=d2)
    > dd2 = nc_dict(x=d1, y=d3)
    > dd3 = nc_dict(y=d2, z=d3, w=d1)
    > l1 = [dd1, dd2]
    > l2 = [dd2, dd3]
    > s1 = set(l1)
    > s2 = set(l2)
    > print s1-s2
    > print s2-s1
    > print s1&s2
    >
    > # instances of nc_dict with the same contents are NOT equal
    > d4 = nc_dict(a=1, b=2, c=3)
    > print d1, d4, d1==d4  # output: False
    >
    > # but we can use this function to compare contents
    > def cmp_contents(d1, d2):
    >      try: return cmp(sorted(d1.items()), sorted(d2.items()))
    >      except Exception: return 1 # assume they're unequal
    >
    > print cmp_contents(d1,d4)==0 # output: True


    Good idea. I'll try this out. I don't think it's likely that I'll
    have a case where the dicts have identical values, but different
    identities. And if they do I probably don't care too much.

    Thanks
     
    , Dec 28, 2008
    #6
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Erik Max Francis
    Replies:
    10
    Views:
    541
  2. Bengt Richter
    Replies:
    2
    Views:
    300
    Jp Calderone
    Jul 21, 2003
  3. Gerrit Holl

    xrange not hashable - why not?

    Gerrit Holl, Jan 25, 2004, in forum: Python
    Replies:
    6
    Views:
    332
    Peter Otten
    Jan 25, 2004
  4. Andy

    List objects are un-hashable

    Andy, Apr 27, 2007, in forum: Python
    Replies:
    7
    Views:
    277
  5. Arthur Mc Coy

    Non hashable object (without __dict__)

    Arthur Mc Coy, Apr 20, 2011, in forum: Python
    Replies:
    0
    Views:
    245
    Arthur Mc Coy
    Apr 20, 2011
Loading...

Share This Page