Re: set using alternative hash function?

Discussion in 'Python' started by Austin Bingham, Oct 15, 2009.

  1. That's definitely a workable solution, but it still rubs me the wrong
    way. The uniqueness criteria of a set seems, to me, like a property of
    the set, whereas the python model forces it onto each set element.

    Another issue I have with the HashWrapper approach is its space
    requirements. Logically, what I'm asking to do is switch out a single
    function reference (i.e. to point at get_name() rather than hash()),
    but in practice I'm forced to instantiate a new object for each of my
    set members. On a large set, this could be disastrous.

    Don't get me wrong...your solution is a good one, but it's just not
    what I am looking for.

    Austin

    On Thu, Oct 15, 2009 at 1:36 PM, Chris Rebert <> wrote:
    > On Thu, Oct 15, 2009 at 4:24 AM, Austin Bingham
    > <> wrote:
    >> If I understand things correctly, the set class uses hash()
    >> universally to calculate hash values for its elements. Is there a
    >> standard way to have set use a different function? Say I've got a
    >> collection of objects with names. I'd like to create a set of these
    >> objects where the hashing is done on these names. Using the __hash__
    >> function seems inelegant because it means I have to settle on one type
    >> of hashing for these objects all of the time, i.e. I can't create a
    >> set of them based on a different uniqueness criteria later. I'd like
    >> to create a set instance that uses, say, 'hash(x.name)' rather than
    >> 'hash(x)'.
    >>
    >> Is this possible? Am I just thinking about this problem the wrong way?
    >> Admittedly, I'm coming at this from a C++/STL perspective, so perhaps
    >> I'm just missing the obvious. Thanks for any help on this.

    >
    > You could use wrapper objects that define an appropriate __hash__():
    >
    > #*completely untested*
    > class HashWrapper(object):
    >    def __init__(self, obj, criteria):
    >        self._wrapee = obj
    >        self._criteria = criteria
    >
    >    #override __hash__() / hash()
    >    def __hash__(self):
    >        return hash(self._criteria(self._wrapee))
    >
    >    #proxying code
    >    def __getattr__(self, name):
    >        return getattr(self._wrapee, name)
    >
    >    def __setattr__(self, name, val):
    >        setattr(self._wrapee, name, val)
    >
    > #example
    > def name_of(obj):
    >    return obj.name
    >
    > def name_and_serial_num(obj):
    >    return obj.name, obj.serial_number
    >
    > no_same_names = set(HashWrapper(obj, name_of) for obj in some_collection)
    > no_same_name_and_serial = set(HashWrapper(obj, name_and_serial_num)
    > for obj in some_collection)
    >
    > Cheers,
    > Chris
    > --
    > http://blog.rebertia.com
    >
    Austin Bingham, Oct 15, 2009
    #1
    1. Advertising

  2. Austin Bingham wrote:

    > That's definitely a workable solution, but it still rubs me the wrong
    > way. The uniqueness criteria of a set seems, to me, like a property of
    > the set, whereas the python model forces it onto each set element.


    This is a POV, but to to me, the set just deals with a very minimal
    protocol - hash-value & equality. Whatever you feed it, it has to cope with
    that. It strikes *me* as odd to ask for something else.

    The ideal solution would be to have several hash/eq-methods on your object,
    and somehow make the surroundings decide which to use. But there is no
    OO-pragma for that.

    > Another issue I have with the HashWrapper approach is its space
    > requirements. Logically, what I'm asking to do is switch out a single
    > function reference (i.e. to point at get_name() rather than hash()),
    > but in practice I'm forced to instantiate a new object for each of my
    > set members. On a large set, this could be disastrous.
    >
    > Don't get me wrong...your solution is a good one, but it's just not
    > what I am looking for.


    It is the only one you have, short of implementing set your own. And then
    the terms of implementation would be that you pass a hash- and eq-function
    & create wrapper-objects internally.

    Diez
    Diez B. Roggisch, Oct 15, 2009
    #2
    1. Advertising

  3. set using alternative hash function?

    On Thu, Oct 15, 2009 at 2:23 PM, Diez B. Roggisch <> wrote:
    > Austin Bingham wrote:
    > This is a POV, but to to me, the set just deals with a very minimal
    > protocol - hash-value & equality. Whatever you feed it, it has to cope with
    > that. It strikes *me* as odd to ask for something else.


    But I'm not really asking for anything that changes the paradigm of
    how things currently work. All I need is the ability to say something
    like this:

     s = set(hash_func = lambda x: hash(x.name))

    If set could be de-hardcoded from using hash(), nothing else about how
    it works would have to change. Or am I wrong on that? I see that you
    mention equality in the protocol, but I don't know why a set would
    need that if a) hash(x) == hash(y) --> x == y and b) hash function
    must return a 32 bit value (which I think is the case)...so maybe I
    misunderstand something.

    I wonder...does the requirement of using hash() have to do with the
    low-level implementation of set? That might explain the state of
    things.

    > The ideal solution would be to have several hash/eq-methods on your object,
    > and somehow make the surroundings decide which to use. But there is no
    > OO-pragma for that.


    But why force objects to know about all of the wacky contexts in which
    they might be used? Let objects be objects and hash-functions be
    hash-functions. If someone wants a set where uniqueness is defined by
    the number of occurrences of 'a' in an object's name, the object
    should never have to know that...it should just have a name.

    In any event, it sounds like set doesn't have any notion of switching
    out its hash function, and that more or less answers my question.

    Austin
    Austin Bingham, Oct 15, 2009
    #3
  4. Austin Bingham wrote:

    > On Thu, Oct 15, 2009 at 2:23 PM, Diez B. Roggisch <>
    > wrote:
    >> Austin Bingham wrote:
    >> This is a POV, but to to me, the set just deals with a very minimal
    >> protocol - hash-value & equality. Whatever you feed it, it has to cope
    >> with that. It strikes *me* as odd to ask for something else.

    >
    > But I'm not really asking for anything that changes the paradigm of
    > how things currently work. All I need is the ability to say something
    > like this:
    >
    > s = set(hash_func = lambda x: hash(x.name))
    >
    > If set could be de-hardcoded from using hash(), nothing else about how
    > it works would have to change. Or am I wrong on that? I see that you
    > mention equality in the protocol, but I don't know why a set would
    > need that if a) hash(x) == hash(y) --> x == y and b) hash function
    > must return a 32 bit value (which I think is the case)...so maybe I
    > misunderstand something.


    You do. Hashes can collide, and then you need equality. Sets are *based* on
    equality actually, the hash is just one optimization. Other optimizations
    build on order (usually, the STL requires you to define < on objects for
    that. Which is enough because a == b <=> a < b && b < a). But eventually,
    it all boils down to equality. You could implement a set without hash, by
    imposing linear behavior on insert and lookup - but none based purely on a
    hash.

    >
    > I wonder...does the requirement of using hash() have to do with the
    > low-level implementation of set? That might explain the state of
    > things.
    >
    >> The ideal solution would be to have several hash/eq-methods on your
    >> object, and somehow make the surroundings decide which to use. But there
    >> is no OO-pragma for that.

    >
    > But why force objects to know about all of the wacky contexts in which
    > they might be used? Let objects be objects and hash-functions be
    > hash-functions. If someone wants a set where uniqueness is defined by
    > the number of occurrences of 'a' in an object's name, the object
    > should never have to know that...it should just have a name.


    Because these functions need intimate knowledge of the objects to actually
    work. Sure, in python it's easy to define them outside, and just reach into
    the object as you wish. But aside this implementation detail, a hash (and
    equality of course) always are based upon the object in question. So I
    think it's natural to have it defined right there (which __hash__ and
    __eq__ show that this seems to be accepted in general).

    And if there were something that would decide on context which of several
    implementations to use, you'd have less to worry. As things are, there
    isn't such thing (I don't even have the slightest idea what could work),
    you are as well off with defining two functions.


    Diez
    Diez B. Roggisch, Oct 15, 2009
    #4
  5. On Thu, Oct 15, 2009 at 3:02 PM, Diez B. Roggisch <> wrote:
    > Austin Bingham wrote:
    > You do. Hashes can collide, and then you need equality. Sets are *based* on
    > equality actually, the hash is just one optimization. ...


    Right, thanks for clearing that up. Not reading closely enough ==
    public shaming ;)

    > Because these functions need intimate knowledge of the objects to actually
    > work. Sure, in python it's easy to define them outside, and just reach into
    > the object as you wish. But aside this implementation detail, a hash (and
    > equality of course) always are based upon the object in question. So I
    > think it's natural to have it defined right there (which __hash__ and
    > __eq__ show that this seems to be accepted in general).
    >
    > And if there were something that would decide on context which of several
    > implementations to use, you'd have less to worry. As things are, there
    > isn't such thing (I don't even have the slightest idea what could work),
    > you are as well off with defining two functions.


    But this "context decider" you're talking about sounds exactly like
    what I'm talking about. If I had some class with __hash1__ and
    __hash2__ (and associated equality operators), you're saying it would
    be nice to be able to select which to use based on the context (e.g.
    what type of set I'm using.) It might look like this:

    s = set(hash_func = lambda x: x.__hash2__, eq_func = lambda x, y:
    x.__eq2__(y))

    And if *that* works for you, do you still have a problem with:

    s = set(hash_func = lambda x: hash(x.name), eq_func = lambda x,y:
    x.name == y.name)

    ?

    If you don't like those, what would work for you? As far as I can
    tell, your "context decider" and my "alternative hash functions" are
    the same thing, and we just need to agree on a syntax.
    Austin Bingham, Oct 15, 2009
    #5
  6. >> And if there were something that would decide on context which of several
    >> implementations to use, you'd have less to worry. As things are, there
    >> isn't such thing (I don't even have the slightest idea what could work),
    >> you are as well off with defining two functions.

    >
    > But this "context decider" you're talking about sounds exactly like
    > what I'm talking about. If I had some class with __hash1__ and
    > __hash2__ (and associated equality operators), you're saying it would
    > be nice to be able to select which to use based on the context (e.g.
    > what type of set I'm using.) It might look like this:
    >
    > s = set(hash_func = lambda x: x.__hash2__, eq_func = lambda x, y:
    > x.__eq2__(y))
    >
    > And if *that* works for you, do you still have a problem with:
    >
    > s = set(hash_func = lambda x: hash(x.name), eq_func = lambda x,y:
    > x.name == y.name)
    >
    > ?
    >
    > If you don't like those, what would work for you? As far as I can
    > tell, your "context decider" and my "alternative hash functions" are
    > the same thing, and we just need to agree on a syntax.


    The context-decider isn't the same thing because it isn't designed yet :)
    And most probably won't ever be. It's just the abstract idea that
    hashing/equality change for one object depending on the circumstances they
    are used in, and that one wishes that the decision what to use is as simple
    & transparent as possible.

    Your approach is certainly viable, but I guess the current
    set-implementation is optimized on working with __hash__ and __eq__ on
    objects because for these exist slots in the python object structure in C.
    So while you can implement your idea, you will end up passing wrappers as
    Chris & me proposed into the current set implementation.

    However, it might be worth thinking of proposing this change to the set-type
    in general. But then for orthogonality, dicts should have it as well I
    guess. Which opens a whole new can of worms.

    I'm undecided on this. I think the cure is straight forward & simple to
    implement, and once you have the level to understand & need different
    hashing/equality, you should be comfortable writing this simple wrapper
    yourself. So for me personally, it's not a needed addition to the language.
    YMMV, and thus - go for it if you like.

    Diez
    Diez B. Roggisch, Oct 15, 2009
    #6
  7. On Thu, Oct 15, 2009 at 3:43 PM, Diez B. Roggisch <> wrote:
    > The context-decider isn't the same thing because it isn't designed yet :)
    > And most probably won't ever be. It's just the abstract idea that
    > hashing/equality change for one object depending on the circumstances they
    > are used in, and that one wishes that the decision what to use is as simple
    > & transparent as possible.


    Fair enough :)

    > Your approach is certainly viable, but I guess the current
    > set-implementation is optimized on working with __hash__ and __eq__ on
    > objects because for these exist slots in the python object structure in C.
    > So while you can implement your idea, you will end up passing wrappers as
    > Chris & me proposed into the current set implementation.


    Yes, I figured that the guts of set relied on particulars to which we
    are not privy at the python level. If the syntax let me describe sets
    the way I've been laying out here, I could probably tolerate the
    underlying implementation relying on wrappers.

    > However, it might be worth thinking of proposing this change to the set-type
    > in general. But then for orthogonality, dicts should have it as well I
    > guess. Which opens a whole new can of worms.


    dicts would certainly have to be looked at as well, but I don't think
    the can would have that many worms in it if we solve the set issue to
    everyone's satisfaction.

    In any event, thanks for helping me work through this issue.

    Austin
    Austin Bingham, Oct 15, 2009
    #7
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Austin Bingham

    set using alternative hash function?

    Austin Bingham, Oct 15, 2009, in forum: Python
    Replies:
    20
    Views:
    697
    Carl Banks
    Oct 17, 2009
  2. Chris Rebert

    Re: set using alternative hash function?

    Chris Rebert, Oct 15, 2009, in forum: Python
    Replies:
    2
    Views:
    302
    Chris Rebert
    Oct 15, 2009
  3. rp
    Replies:
    1
    Views:
    512
    red floyd
    Nov 10, 2011
  4. Srijayanth Sridhar
    Replies:
    19
    Views:
    608
    David A. Black
    Jul 2, 2008
  5. Patrick Put
    Replies:
    9
    Views:
    136
    Robert Klemme
    Feb 3, 2009
Loading...

Share This Page