set using alternative hash function?


A

Austin Bingham

If I understand things correctly, the set class uses hash()
universally to calculate hash values for its elements. Is there a
standard way to have set use a different function? Say I've got a
collection of objects with names. I'd like to create a set of these
objects where the hashing is done on these names. Using the __hash__
function seems inelegant because it means I have to settle on one type
of hashing for these objects all of the time, i.e. I can't create a
set of them based on a different uniqueness criteria later. I'd like
to create a set instance that uses, say, 'hash(x.name)' rather than
'hash(x)'.

Is this possible? Am I just thinking about this problem the wrong way?
Admittedly, I'm coming at this from a C++/STL perspective, so perhaps
I'm just missing the obvious. Thanks for any help on this.

Austin Bingham
 
Ad

Advertisements

D

Diez B. Roggisch

Austin said:
If I understand things correctly, the set class uses hash()
universally to calculate hash values for its elements. Is there a
standard way to have set use a different function? Say I've got a
collection of objects with names. I'd like to create a set of these
objects where the hashing is done on these names. Using the __hash__
function seems inelegant because it means I have to settle on one type
of hashing for these objects all of the time, i.e. I can't create a
set of them based on a different uniqueness criteria later. I'd like
to create a set instance that uses, say, 'hash(x.name)' rather than
'hash(x)'.

Is this possible? Am I just thinking about this problem the wrong way?
Admittedly, I'm coming at this from a C++/STL perspective, so perhaps
I'm just missing the obvious. Thanks for any help on this.


This is a principal problem in OO - behavior shall not only change based on
the object, but also the context it's used in.

I think the decorator-pattern is the best thing here (in lieu of python
having a hash-argument to sets). Please don't forget to re-define equality
also!

class OtherCriteriaDecorator(object):


def __init__(self, delegate):
self._delegate = delegate

def __hash__(self):
return hash_based_on_delegate


def __eq__(self, other):
return equality_based_on_delegate



HTH,

Diez
 
A

Austin Bingham

I guess we see things differently. I think it's quite natural to want
a set of unique objects where "unique" is defined as an operation on
some subset/conflation/etc. of the attributes of the elements. That's
all that the regular set class is, except that it always uses the
hash() function to calculate uniqueness. In any event, the notions of
set and uniqueness I'm using are well established in other languages,
so I don't see why it couldn't be made to work in python.

As far as using a dict, that doesn't really solve my problem. It could
be part of a solution, I guess, but I would still need functionality
extrinsic to the dict. What I want is to make sure that no values in
my set have the same name, and dict won't guarantee that for me. A set
that calculated uniqueness based on its elements' names, on the other
hand, would.

Austin
 
M

Mick Krippendorf

Austin said:
I guess we see things differently. I think it's quite natural to want
a set of unique objects where "unique" is defined as an operation on
some subset/conflation/etc. of the attributes of the elements.

What you seem to imply is that the hash function imposes some kind of
uniqueness constraint on the set which uses it. That's just not the
case, the uniqueness constraint is always the (in-)equality of objects,
and for this you can override __eq__. But then you must also ensure that
x.__eq__(y) --> y.__eq__(x) & x.__hash() == y.__hash__().

Diez' solution is the pythonic way, and BTW, mathematically and
pythonically sound, wheras, if the hash function would really impose
uniqueness:

s1 = set(lambda x: hash(x))
s2 = set(lambda x: hash(x.name))

t1 = Buddy(name="jim")
t2 = Buddy(name="joe")
t3 = Buddy(name="joe")

s1.add(t1)
s1.add(t2)
s1.add(t3)

s2.add(t1)
s2.add(t2)
s2.add(t3)

would probaly yield quite strange results:

s3 = s2 | s1 # union. Is s3 == (t1,t2,t3) or is it (t1,t3)?
s4 = s2 - s1 # difference. is s4 == () or do we get an Exception?

In C++/STL the compiler would nag because s1 and s2 are of different
types since the hash function is part of the set's type (AFAIR). With
dynamic typing this isn't possible so you'd have to wait until runtime.

I'd go with Diez' advice and use a dict instead. Or, you could do
something Borg-ish (lookup Borg Pattern) restrained on names, where the
constructor of a class ensures that x.name == y.name --> x == y.
As far as using a dict, that doesn't really solve my problem. It could
be part of a solution, I guess, but I would still need functionality
extrinsic to the dict.

Python isn't strong on encapsulation, so "extrinsic" functionality is
quite common. Nothing to loose sleep over though, because, as Guido
says, "we're all consenting adults here" :)

Mick.
 
A

Anthony Tolle

[snip] I'd like to create a set of these
objects where the hashing is done on these names. [snip]

Why not use a dict? The key would be the object name. Pretty much
the same behavior as a set (via the key), and you can still easily
iterate over the objects.
 
A

Austin Bingham

Austin Bingham schrieb:
What you seem to imply is that the hash function imposes some kind of
uniqueness constraint on the set which uses it. That's just not the
case, the uniqueness constraint is always the (in-)equality of objects,
and for this you can override __eq__. But then you must also ensure that
x.__eq__(y) --> y.__eq__(x) & x.__hash() == y.__hash__().

Yes, as was pointed out earlier, I was reading too much into the hash
function's role. However, given well behaved hash and equality (which
would be a great name for a band), I don't see why those functions
need to be defined on the object itself, per se. Right now that's the
case because hash() only knows how to call obj.__hash__ (etc. for
__eq__).

I guess a good analog for what I'm thinking about is list.sort(). It's
more than happy to take a comparison operator, and in spirit that's
exactly what I'm looking for with sets.
Diez' solution is the pythonic way, and BTW, mathematically and
pythonically sound, wheras, if the hash function would really impose
uniqueness: . . .

Yes, my gray c++ roots are showing here; you're right that my brain
was secretly expecting the "compiler" to take care of things. Your
points about set operations are the strongest in this discussion, and
I haven't got a great answer.
Python isn't strong on encapsulation, so "extrinsic" functionality is
quite common.

I guess that's part of what's frustrating for me on this issue. Python
is generally extremely flexible and open, but in this case I feel like
I'm being shut out. Oh well...in the end it looks like there are ways
to get what I need, if not what I want. Thanks for the input.

Austin
 
Ad

Advertisements

A

Austin Bingham

Why not use a dict?  The key would be the object name.  Pretty much
the same behavior as a set (via the key), and you can still easily
iterate over the objects.

To reiterate, dict only gets me part of what I want. Whereas a set
with uniqueness defined over 'obj.name' would guarantee no name
collisions, dict only sorta helps me keep things straight; it doesn't
actually enforce that my values have unique names.

Austin
 
A

Anthony Tolle

On Thu, Oct 15, 2009 at 4:06 PM, Anthony Tolle > To reiterate, dict only gets me part of what I want. Whereas a set
with uniqueness defined over 'obj.name' would guarantee no name
collisions, dict only sorta helps me keep things straight; it doesn't
actually enforce that my values have unique names.

I don't understand how a set would help you enforce that your values
ave unique names. If you have two objects, both named "foo", and
added them to the set, no errors would occur.

Please provide an example of how a set provides functionality that a
dict (which enforces unique keys) doesn't.
 
G

Gabriel Genellina

En Thu, 15 Oct 2009 11:42:20 -0300, Austin Bingham
To reiterate, dict only gets me part of what I want. Whereas a set
with uniqueness defined over 'obj.name' would guarantee no name
collisions, dict only sorta helps me keep things straight; it doesn't
actually enforce that my values have unique names.

I think you didn't understand correctly Anthony Tolle's suggestion:

py> class Foo:
.... def __init__(self, name): self.name = name
....
py> objs = [Foo('Joe'), Foo('Jim'), Foo('Tom'), Foo('Jim')]
py> objs
[<__main__.Foo instance at 0x00BB8238>,
<__main__.Foo instance at 0x00BB83C8>,
<__main__.Foo instance at 0x00BB7B20>,
<__main__.Foo instance at 0x00BB7DC8>]
py> d = dict((o.name, o) for o in objs)
py> d.keys()
['Jim', 'Joe', 'Tom']
py> d.values()
[<__main__.Foo instance at 0x00BB7DC8>,
<__main__.Foo instance at 0x00BB8238>,
<__main__.Foo instance at 0x00BB7B20>]

keys in a dict are unique; values form the set you're looking for. If it's
more complex that just a name, define a function:

def my_funny_key_extractor(item):
return some(operations[on](item))

d = dict((my_funny_key_extractor(o), o) for o in objs)

If you feel so inclined, you could implement the MutableSet ABC based on
this and have a full featured set implementation.
 
A

Austin Bingham

En Thu, 15 Oct 2009 11:42:20 -0300, Austin Bingham
<[email protected]> escribió:
I think you didn't understand correctly Anthony Tolle's suggestion:

py> class Foo:
...   def __init__(self, name): self.name = name
...
py> objs = [Foo('Joe'), Foo('Jim'), Foo('Tom'), Foo('Jim')]
py> objs

I understand Anthony perfectly. Yes, I can construct a dict as you
specify, where all of the keys map to values with name attributes
equal to the key. My point is that dict doesn't really help me enforce
that beyond simply letting me set it up; it doesn't care about the
values at all, just the keys. All that I'm asking for, and I think
it's been made pretty clear, is a set that let's me define a
uniqueness criteria function other than hash(). As has been thrashed
out already, this is not as straightforward as I might have liked.

To put it in code, I want this:

s = set(hash_func = lambda obj: hash(obj.name), eq_func = ...)
...
x.name = 'foo'
y.name = 'foo'
s.add(x)
s.add(y) # no-op because of uniqueness criteria
assert len(s) == 1

Austin
 
R

Rami Chowdhury

En Thu, 15 Oct 2009 11:42:20 -0300, Austin Bingham
<[email protected]> escribió:
I think you didn't understand correctly Anthony Tolle's suggestion:

py> class Foo:
...   def __init__(self, name): self.name = name
...
py> objs = [Foo('Joe'), Foo('Jim'), Foo('Tom'), Foo('Jim')]
py> objs

I understand Anthony perfectly. Yes, I can construct a dict as you
specify, where all of the keys map to values with name attributes
equal to the key. My point is that dict doesn't really help me enforce
that beyond simply letting me set it up; it doesn't care about the
values at all, just the keys.

Perhaps this is an overly naive solution, but could you not define a class
that implemented the set interface but used a dict for internal storage,
and use that? You'd still have uniqueness (by dict key, which your class
would define as object name) and as a bonus, retrievability by name, which
set wouldn't give you.
 
Ad

Advertisements

A

Anthony Tolle

To put it in code, I want this:

  s = set(hash_func = lambda obj: hash(obj.name), eq_func = ...)
  ...
  x.name = 'foo'
  y.name = 'foo'
  s.add(x)
  s.add(y) # no-op because of uniqueness criteria
  assert len(s) == 1

I wrote a quick subclass of set that does something similar, but uses
just one function for the object uniqueness:

class MySet(set):
def __init__(self, iterable = (), idfunc = lambda x: x):
self.idfunc = idfunc
self.ids = set()
for item in iterable:
self.add(item)

def add(self, item):
id = self.idfunc(item)
if id not in self.ids:
self.ids.add(id)
set.add(self, item)
.... def __init__(self, name):
.... self.name = name
....
x = Foo('foo')
y = Foo('foo')
s = MySet(idfunc = lambda x: x.name)
s.add(x)
s
MySet([ said:
s.add(y)
s
MySet([<__main__.Foo object at 0x00A89F90>])

Is this what you are looking for?
 
E

Ethan Furman

Austin said:
Yes, I can construct a dict as you
specify, where all of the keys map to values with name attributes
equal to the key. My point is that dict doesn't really help me enforce
that beyond simply letting me set it up; it doesn't care about the
values at all, just the keys. All that I'm asking for, and I think
it's been made pretty clear, is a set that let's me define a
uniqueness criteria function other than hash(). As has been thrashed
out already, this is not as straightforward as I might have liked.

To put it in code, I want this:

s = set(hash_func = lambda obj: hash(obj.name), eq_func = ...)
...
x.name = 'foo'
y.name = 'foo'
s.add(x)
s.add(y) # no-op because of uniqueness criteria
assert len(s) == 1

Austin


I'm feeling really dense about now... What am I missing?

class obj(object):
def __init__(self, id, value):
self.id = id
self.value = value
def __eq__(self, other):
if isinstance(other, obj):
return self.id == other.id
raise NotImplemented
def __hash__(self):
return hash(self.id)
def __repr__(self):
return "obj(%s, %s)" % (self.id, self.value)

Python 2.5.4 (r254:67916, Dec 23 2008, 15:10:54) [MSC v.1310 32 bit
(Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
--> from obj import obj
--> s1 = set()
--> d1 = dict()
--> o1 = obj(17, 'first')
--> o2 = obj(18, 'second')
--> o3 = obj(17, 'third')
--> o1, o2, o3
(obj(17, first), obj(18, second), obj(17, third))
--> s1.add(o1)
--> s1.add(o2)
--> s1.add(o3)
--> d1[o1.id] = o1
--> d1[o2.id] = o2
--> d1[o3.id] = o3
--> s1
set([obj(17, first), obj(18, second)])
--> d1
{17: obj(17, third), 18: obj(18, second)}
-->

Ah ha! No-op indeed! If a set already has an item, the new item is
completely ignored, whereas if a dict already has a key, the new key's
value replaces the current value already in the dict.

Learn something new everyday!

I'm still not sure I understand your concern about the values in a set,
though. Sets keep the first object of a given key, dicts keep the last
object of a given key; in both cases, all other objects with the same
key are lost.

So is that the behavior you're wanting, keeping the first object and
discarding all others? Or is there something else I'm still missing?

~Ethan~
 
A

Anthony Tolle

I'm still not sure I understand your concern about the values in a set,
though.  Sets keep the first object of a given key, dicts keep the last
object of a given key; in both cases, all other objects with the same
key are lost.

So is that the behavior you're wanting, keeping the first object and
discarding all others?  Or is there something else I'm still missing?

I think that without a practical example of what this would be used
for, we're all going to be a little lost on this one.

So far, we've not seen the original problem, only the author's
preferred method for solving it. My guess is there are other, more
pythonic ways to solve the original problem.
 
A

Austin Bingham

I wrote a quick subclass of set that does something similar, but uses
just one function for the object uniqueness:

class MySet(set):
   def __init__(self, iterable = (), idfunc = lambda x: x):
       self.idfunc = idfunc
       self.ids = set()
       for item in iterable:
           self.add(item)

   def add(self, item):
       id = self.idfunc(item)
       if id not in self.ids:
           self.ids.add(id)
           set.add(self, item)

Yes, what you've got there provides the interface of what I want. And
no doubt we could concoct countless ways to implement some version of
this. However, if set itself accepted an alternate hash function, then
I could do it with no extra overhead. Consider your implementation: it
requires an extra set (extra space) and an extra lookup on many
operations (extra time.) My original hope was to not have to do that.

Austin
 
A

Austin Bingham

Austin Bingham wrote:
I'm feeling really dense about now... What am I missing?

What you're missing is the entire discussion up to this point. I was
looking for a way to use an alternative uniqueness criteria in a set
instance without needing to modify my class.
So is that the behavior you're wanting, keeping the first object and
discarding all others?  Or is there something else I'm still missing?

Yes and yes. I want "normal" set behavior, but I want the set to use
user-provided hash and equality tests, i.e. ones that don't
necessarily call __hash__ and __eq__ on the candidate elements.

Austin
 
Ad

Advertisements

A

Austin Bingham

I think that without a practical example of what this would be used
for, we're all going to be a little lost on this one.

So far, we've not seen the original problem, only the author's
preferred method for solving it.  My guess is there are other, more
pythonic ways to solve the original problem.

The original problem was just a question statement: can I use
alternative uniqueness functions on a set? Indeed, I proposed an idea,
which that was sets could be constructed with user-defined hash and
equality functions, the strengths and weaknesses of which have been
gone over in some detail. The short answer is that what I was looking
for (admittedly, a desire motivated by experiences in other languages)
is not really feasible, at least not without a great deal more work.

The other proposed solutions all require linearly extra storage,
linearly extra time, both, or simply don't solve the problem. And in
any event, they miss the point of the original post which was not "How
can I get a particular behavior?" (which is fairly trivial, both in my
own estimation and as evidenced by the abundance of proposals) but
"Can I get a particular behavior in a particular way?" (the answer to
which, again, seems to be no.)

Austin
 
D

Dave Angel

Austin said:
What you're missing is the entire discussion up to this point. I was
looking for a way to use an alternative uniqueness criteria in a set
instance without needing to modify my class.



Yes and yes. I want "normal" set behavior, but I want the set to use
user-provided hash and equality tests, i.e. ones that don't
necessarily call __hash__ and __eq__ on the candidate elements.

Austin
Your original post (10/15) never mentioned the __eq__() method, and
people went off on a tangent discussing hash functions. But from what
you've been saying more recently, you want a set-like collection whose
behavior isn't related to the standard comparison function.

The standard set is intimately tied to ==, in the sense that if two
objects meet the a==b relationship, then only one will be in the set.
And it doesn't matter which one, since they're equal.

You want a new collection that works differently. Great. It shouldn't
be that hard to write, but if you try to base it on existing set you'll
need a second wrapper object. If you try to base it on dict, you'll
need a second key object. You've said you don't want any extra objects
to be needed, so you'll need custom code.

A couple of dozen lines should do for the basic collection (add, clear,
remove). But if you also want difference, symmetric_difference,
intersection_update, issubset ... it could grow to several dozen. And
you'd have to make some interesting decisions, once you have two
pseudo-sets with possibly different comparison operators.

The only downside I can see is that apparently the built-in set is
written in C, and if you're implementing this in pure python, it could
be a lot slower. On the other hand, in many cases, the comparison
function-object might be the time-critical piece, in which case you coud
be close.

DaveA
 
E

Ethan Furman

Austin said:
What you're missing is the entire discussion up to this point. I was
looking for a way to use an alternative uniqueness criteria in a set
instance without needing to modify my class.

Really? Is that what you wanted? You know, I would never have guessed
that from

But hey, perhaps my reading comprehension was not at its best yesterday.

Hmmm, actually, I think I did get that. What I missed is why you care
which object stays in your precious little set, as long as you have one?
If it doesn't matter, use a dict and stop wasting our time. If it
does, tell us why and assuage our curiousity.
Yes and yes. I want "normal" set behavior, but I want the set to use
user-provided hash and equality tests, i.e. ones that don't
necessarily call __hash__ and __eq__ on the candidate elements.

Austin

As for what you want: No, it's not currently possible. If it's so big
a deal that the various methods presented don't meet with your approval,
break out the C and write your own. Then you could give that back to
the community instead of your snide remarks.

~Ethan~
 
Ad

Advertisements

A

Anthony Tolle

[snip]
As for what you want:  No, it's not currently possible.  If it's so big
a deal that the various methods presented don't meet with your approval,
break out the C and write your own.  Then you could give that back to
the community instead of your snide remarks.

~Ethan~

I didn't get the impression that Austin was being snide. Instead, I
get the impression that they are someone from a different programming
background (C++/STL) who has not had sufficient exposure to the Python
way of doing things. I believe Austin is genuinely curious as to why
Python may not implement features found in another programming
environment.

Coming from a C/C++ background myself, it took me a while to un-learn
certain idioms. I still find myself "thinking in C" on occasion, only
to find a more elegant way to accomplish the task.
 
Ad

Advertisements


Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top