hashCode

L

Lew

Patricia said:
Although that is the way round it is documented, they are both native
methods, and either could be a wrapper for the other, or they could both
be wrappers for a common native function.

Point taken. I should have said, "Which simply behaves indistinguishably from
the 'Object#hashCode()' method for its argument."

It being a literal wrapper is not important, only that it produces the same result
as 'Object#hashCode()' per its documentation.

Which latter in its Javadocs tells us that it's "typically implemented by converting
the internal address of the object into an integer".

All of which supports markspace's notion that the hash code is intended as a
sort-of address for programming purposes.

N.b., the promise of 'hashCode()' explicitly disallows using the result of
'System.identityHashCode()' as a guaranteed-unique object identifier.
 
D

Daniel Pitts

And then you get into situations like this:

if( object instanceof Hasher )
hash = ((Hasher) o).hashCode();
else
hash = System.identityHashCode( object );

And I think we all know to use polymorphism instead here. This is kind
of what I'm saying. The usefulness of a hash code was consider
intrinsic to any object, and they wanted to avoid the code above.
Therefore, Object::hashCode() becomes the design spec.
The polymorphism shouldn't have been on the object itself, Think about
how TreeSet needs either Comparable objects, or a Comparator.

interface Hasher<Type> {
int hash(Type t);
boolean isEqual(Type l, Type r);
}

So, that would change HashMap to take a Hasher<? super K> instance in
its constructor. A default Hasher<Object> could be implemented to use
System.identityHashCode and == for the common use-case.

This allows you to define different hashes and equality for the same set
of object classes. It also forces you to implement hash and isEqual
together.

It is, however, too late to "fix" the language in this way.
 
M

markspace

interface Hasher<Type> {
int hash(Type t);

Not really seeing how this is a good idea. How would you implement this?

So, that would change HashMap to take a Hasher<? super K> instance in
its constructor.

This is the problem; Map (and HashMap) were desired to be spec'd as
taking Object, not a subclass.
A default Hasher<Object> could be implemented to use
System.identityHashCode and == for the common use-case.


Again not seeing how you'd actually use that to put an object in a Map.
 
D

Daniel Pitts

Not really seeing how this is a good idea. How would you implement this?



This is the problem; Map (and HashMap) were desired to be spec'd as
taking Object, not a subclass.
Actually, they are Generic, so they are not spec'd to take Object, but
to take a specific subtype defined at compile time. At least, now that
they have the addition of Generics. Pre-generics, they still had
Comparators which had the same behavior that I'm describing, but instead
of defining buckets, they define an ordering. See below.
Again not seeing how you'd actually use that to put an object in a Map.

Example usage:
++++
// MyKeyHasher implements Hasher<MyKey>
Map<MyKey, MyValue> map=new HashMap<MyKey,MyValue>(new MyKeyHasher());

map.put(myFirstKey, myFirstValue);
map.put(mySecondKey, mySecondValue);
++++

This is directly analogous to how TreeMap and Comparator currently work
together:

++++
Map<MyKey, MyValue> map =
new TreeMap<MyKey, MyValue(new MyKeyComparator());

map.put(myFirstKey, myFirstValue);
map.put(mySecondKey, mySecondValue);
++++



Now, it would be my opinion that the standard library should provide a
IdentityHasher implementation as such:
++++
public final class IdentityHasher<T> implements Hasher<T> {
public int hash(T object) {
return System.identityHashCode(object);
}
public boolean isEqual(T l, T r) {
return l == r;
}
}
++++

There could of course be a singleton instance of this, similar to how
they have a singleton for Collections.emptySet();

Does that make more sense?
 
E

Eric Sosman

Actually, they are Generic, so they are not spec'd to take Object, but
to take a specific subtype defined at compile time. At least, now that
they have the addition of Generics. Pre-generics, they still had
Comparators which had the same behavior that I'm describing, but instead
of defining buckets, they define an ordering. See below.

Actually, "take" is insufficiently specific. A Map<K,V>
has a put() method with K,V parameters, and a putAll() method
with a Map<? extends K, ? extends V> parameter. To that extent,
Map<K,V> "takes" K.

But Map<K,V> *also* has get() and remove() and containsKey()
methods with Object parameters, not K parameters. (It also has a
containsValue() method taking Object, not V.) So insofar as
these methods are concerned said:
Example usage:
++++
// MyKeyHasher implements Hasher<MyKey>
Map<MyKey, MyValue> map=new HashMap<MyKey,MyValue>(new MyKeyHasher());

map.put(myFirstKey, myFirstValue);
map.put(mySecondKey, mySecondValue);
++++

This could sort of work, but not very well. As I wrote some
<hunt, hunt, hunt, ah!> seventeen days ago in this thread:

I don't think a HashCalculator interface along
the lines of Comparable would save the day.

The difficulty is that an external Hasher would have no access to
private fields of MyKey. That may seem a small drawback, since it
is rare to have a contributor to an object's "value" that is not
at the very least accessible through a getter. The Hasher might
need to make method calls where today's built-in hashCode() just
makes field references, but -- hey, how bad could that be?

IMHO, it could be pretty bad. Take java.lang.String, for
example: As things stand today hashCode() inspects the "value"
of the String, and everything it uses would be accessible to a
Hasher<String>. But hashCode() then caches the computed value
in a private field within String to avoid recomputing it on every
subsequent call! Could Hasher<String> do the same? How?

Okay, so the implementor of String perceives the problem and
decides that String itself will provide a default Hasher<String>
implementation. (This might be a static nested class, but it'd
probably be more efficient to have String implement Hasher<String>
directly, so every String is its own Hasher.) And the implementor
of BigInteger does the same, and so does the implementor of URL,
and of File, and of -- Hey, wait a minute! We're right back where
we began, except with more overhead and more verbiage!
 
M

markspace

Map<MyKey, MyValue> map=new HashMap<MyKey,MyValue>(new MyKeyHasher());
Does that make more sense?


I see. You're changing the specification arbitrarily to fit your needs.
That wasn't what I had in mind. Sure, if you just assume you can do
what you like you can re-write library code, but the idea was to hash
Object, without having to supply a "MyKeyHasher".
 
D

Daniel Pitts

I see. You're changing the specification arbitrarily to fit your needs.
That wasn't what I had in mind. Sure, if you just assume you can do
what you like you can re-write library code, but the idea was to hash
Object, without having to supply a "MyKeyHasher".

The point is that hashing "Object" isn't entirely sensible in most
situations. Granted, there are some situation where Identity is what
you really want to key off of, but they are far fewer than a "value"
comparison. On top of that, I've often needed two different ways to
"compare" object equality, depending on the context of that comparison.
The Hasher concept would decouple the comparison from the user of the
hash, and potentially from the hashable object.

Now, just like the Comparable interface exists, a Hashable interface
could also exist, for classes which provide a "default" comparison.
 
D

Daniel Pitts

Actually, "take" is insufficiently specific. A Map<K,V>
has a put() method with K,V parameters, and a putAll() method
with a Map<? extends K, ? extends V> parameter. To that extent,
Map<K,V> "takes" K.

But Map<K,V> *also* has get() and remove() and containsKey()
methods with Object parameters, not K parameters. (It also has a
containsValue() method taking Object, not V.) So insofar as
these methods are concerned, Map<K,V> "takes" Object.
Those are very good points. I had forgotten about the fact that get()
takes Object. The solution I come up with off the top of my head is to
update get/remove/etc... to all take the appropriate type (K or V). Yes,
this reduces some "functionality" of the class, but I doubt 99% of
programmers would care more than 1% of the time. I have used this
"feature", but only because proper typing wasn't done on an existing
legacy project.

Keep in mind, I'm talking about what Map *could* have been, not what it
could become.
This could sort of work, but not very well. As I wrote some
<hunt, hunt, hunt, ah!> seventeen days ago in this thread:

I don't think a HashCalculator interface along
the lines of Comparable would save the day.

The difficulty is that an external Hasher would have no access to
private fields of MyKey. That may seem a small drawback, since it
is rare to have a contributor to an object's "value" that is not
at the very least accessible through a getter. The Hasher might
need to make method calls where today's built-in hashCode() just
makes field references, but -- hey, how bad could that be?

IMHO, it could be pretty bad. Take java.lang.String, for
example: As things stand today hashCode() inspects the "value"
of the String, and everything it uses would be accessible to a
Hasher<String>. But hashCode() then caches the computed value
in a private field within String to avoid recomputing it on every
subsequent call! Could Hasher<String> do the same? How?

Nothing would prevent String from having a hashCode method which behaves
exactly as it does today. The StringHasher class would simply delegate
to the String.hashCode() method, which allows String to cache as expected.
Okay, so the implementor of String perceives the problem and
decides that String itself will provide a default Hasher<String>
implementation. (This might be a static nested class, but it'd
probably be more efficient to have String implement Hasher<String>
directly, so every String is its own Hasher.) And the implementor
of BigInteger does the same, and so does the implementor of URL,
and of File, and of -- Hey, wait a minute! We're right back where
we began, except with more overhead and more verbiage!
Oh no, dreaded letters in my source code. I must reduce it down to as
little as possible, even if it means having one class be responsible for
15 things.

Java is verbose, but that shouldn't determine what is good design and
what isn't. Also, as I've suggested elsethread, having a Hashable
interface (similar to Comparable) would allow Objects to define a
sensible default comparison/hashing algorithm *specific to that class*.

A class with subclasses has no sensible algorithm (unless it takes into
account the actual type before comparison). This is the use-case where
Hasher makes the most sense. The user of the objects of the class can
specify what they care about in the equality of two objects, even if
those objects are of different specific types.
 
M

markspace

The point is that hashing "Object" isn't entirely sensible in most
situations.

Why not? I do it all the time. Put an object in, get an object out.
Works great.

Seriously, I think you're not being intellectually honest about the
value of having a built-in hashCode() for every object.
 
D

Daniel Pitts

Why not? I do it all the time. Put an object in, get an object out.
Works great.
Do you put an "Object" in, or an "object" in, here is a big difference.
Also, how often do you put an object in expecting it to be hashed based
on value vs based on identity? My point was that most of the time the
expectation is that the hash is based on value, not identity.

Seriously, I think you're not being intellectually honest about the
value of having a built-in hashCode() for every object.
And you're not being intellectually honest about the cost of having a
built-in hashCode() for every object.

I'm not saying there isn't value, but that there would be just as much
value in using alternative approaches, including using an external
Hasher. And those approaches provide additional flexibility that isn't
available in the current library.
 
E

Eric Sosman

Do you put an "Object" in, or an "object" in, here is a big difference.
Also, how often do you put an object in expecting it to be hashed based
on value vs based on identity? My point was that most of the time the
expectation is that the hash is based on value, not identity.

Perhaps you're thinking too much about HashMap, and maybe
not enough about HashSet.
 
G

Gene Wirchenko

On 8/29/12 11:49 AM, Eric Sosman wrote:
[snip]
Okay, so the implementor of String perceives the problem and
decides that String itself will provide a default Hasher<String>
implementation. (This might be a static nested class, but it'd
probably be more efficient to have String implement Hasher<String>
directly, so every String is its own Hasher.) And the implementor
of BigInteger does the same, and so does the implementor of URL,
and of File, and of -- Hey, wait a minute! We're right back where
we began, except with more overhead and more verbiage!
Oh no, dreaded letters in my source code. I must reduce it down to as
little as possible, even if it means having one class be responsible for
15 things.

"There are three things a man must do
before his life is done;
Write two lines in APL,
And make the buggers run."
-- Stan Kelly-Bootle, "The Devil's DP Dictionary"
as quoted in http://news.ycombinator.com/item?id=1041500

[snip]

Sincerely,

Gene Wirchenko
 
J

Jan Burse

Eric said:
Perhaps you're thinking too much about HashMap, and maybe
not enough about HashSet.

HashSet is built on to of HashMap.
It is dummy object in, dummy object out.

The dummy object is declared as follows:

// Dummy value to associate with an Object in the backing Map
private static final Object PRESENT = new Object();

I guess the above implementation is a result of code
"reuse", not in the copy/paste sense, but in the OO
sense. There is no effort to reduce the memory footprint
for HashSet.

Bye
 
J

Jim Janney

markspace said:
Why not? I do it all the time. Put an object in, get an object
out. Works great.

I've worked on code that failed in mysterious ways because someone used
a HashMap assuming object identity, and then, a few years later, someone
else defined equals on one of the subclasses. If you need identity
semantics, use java.util.IdentityHashMap: that's what it's for.

In practice, when I use HashMap the keys are almost always Strings, or
else classes designed for use as keys.
 
D

Daniel Pitts

Perhaps you're thinking too much about HashMap, and maybe
not enough about HashSet.
How so?

HashSet is again often used for value deduplication, not as often
identity deduplication. My point remains that for most use-cases, having
hashCode() and equals() on Object isn't necessary and adds clutter.
 
D

Daniel Pitts

I've worked on code that failed in mysterious ways because someone used
a HashMap assuming object identity, and then, a few years later, someone
else defined equals on one of the subclasses. If you need identity
semantics, use java.util.IdentityHashMap: that's what it's for.

In practice, when I use HashMap the keys are almost always Strings, or
else classes designed for use as keys.
This is exactly my point, thanks. If they are going to be keys in a
HashMap, or values in a HashSet, they need to be designed appropriately.
Providing a "default" implementation that may be overridden in
subclasses unexpectedly is asking for bugs.
 
A

Andreas Leitgeb

Daniel Pitts said:
My point remains that for most use-cases, having
hashCode() and equals() on Object isn't necessary
and adds clutter.

I don't agree to this particular point, but I agree on
that a (non-generic) Hasher *interface* and a variant of
HashMap accepting such a Hasher and using that on the
keys instead of the keys' own methods, could be useful.

Hasher's hashCode taking Object could throw ClassCastException
for unsupported objects, which the HashMap could specifically
catch to shortcut the search for such an element in the map.
(Each implementation of Hasher would *tell* its own supported
objects, so that wouldn't be a problem with erasure.)
e.g.: a StringHeadHasher that defined equality on the first n
chars of a String would know what is a String or not, even if
the HashMap<String,...> itself doesn't, for erasure-reasons.

null would be a legal key, *iff* the Hasher supports it.

Such a separate Hasher could also be implemented across
subclass-boundaries (actually it would be automatically,
unless it is done for final classes, or does getClass()
inspection on the objects). Cross-Implementation equalities
are principially already known from List and Set (by their
contracts).

If such had existed from start, then at least there wouldn't
have been a need for a special IdentityHashMap. I think to
remember coming across other usecases in the past. (Of course,
there was always a workaround - of varying ugli- or roundabout-
ness.)

Anyway, nothing of that sort is likely to happen, so it's just
for the sake of discussion and learning new ideas in the course.
 
A

Arne Vajhøj


It will be code that accept all types of objects and some of them
will be distinct for each instance other will implement some type
of value equality.

That code smells.
I do it all the time. Put an object in, get an object out.
Works great.

The discussion is about keys not values.

Arne
 
E

Eric Sosman


?

package deal;
public class ToBeHashed {
private int noGettersForMeThanks;
...
}

package express;
Hasher<ToBeHashed> hasher = new Hasher<ToBeHashed>() {
public int hashCode(ToBeHashed obj) {
// How do you obtain the value of
// obj.noGettersForMeThanks
// if that would be useful?
}
...
}

As an example of why a hasher might want access to a strictly-private
field, I offered String: How could a Hasher<String> (outside String
itself) use String's private `hash' element? (And before you say
"Give String a getHash() method," ponder what hashCode() does.)
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,755
Messages
2,569,534
Members
45,007
Latest member
obedient dusk

Latest Threads

Top