hashCode

P

Patricia Shanahan

To: Joshua Cranmer
From: Patricia Shanahan <[email protected]>

Better in the sense that you would never HAVE to override hashCode.

Now, there are cases where you HAVE to override it, or your code is
very broken.

Returning a constant hash code is correct in the same sense that
answering "yes" to the question "Can you tell me the correct way to do
this?" would be--syntactically and semantically correct, but completely
contrary to the actual intent of the question.

The point of the hash code is to provide a cheap way to quickly
distinguish inputs (in the sense that Pr(a.hashCode() == b.hashCode()
and !a.equals(b)) should be as small as possible [1]). A constant-value
hash completely negates the purpose of the hash code, and this renders
the hashCode again completely unusable for anything that actually wants
to use it.

In the default case, a.hashCode() == b.hashCode() only when a == b (this
definitely holds true with 32-bit machines and I'm pretty sure it still
holds true with 64-bit machines, but I'd have to reverify the JVM source
code to be certain). It is thus correct so long as identity equals is
correct. It is also potentially correct in a limited set of cases where
a.equals(b) and a != b. In all of these cases, it would not only be
correct but also extremely useful, having pretty strong guarantees about
the distribution of hash values.

[1] Actually, for good performance, hash codes should go one step
further and make slightly stronger guarantees about independence with
respect to the size of the hash table. But I digress.

I think there are two reasonably usable ways of handling this issue. One is the
current arrangement, in which every class has a hashCode that is expected to be
usable for selecting a hash table bucket.

Keeping hashCode as an Object method but making it useless for bucket selection
unless overridden would not be a good alternative.

A more reasonable alternative would be to have hashCode as the only member of a
HashKey interface that would be implemented by every class whose objects are
intended to be suitable for use as has keys. Those objects that have a hashCode
would still have to have a usable one, but some classes would not implement
HashKey and not have a hashCode at all.

Patricia

--- BBBS/Li6 v4.10 Dada-1
* Origin: Prism bbs (1:261/38)
--- Synchronet 3.16a-Win32 NewsLink 1.98
Time Warp of the Future BBS - telnet://time.synchro.net:24
 
J

Jan Burse

To: Joshua Cranmer
From: Jan Burse <[email protected]>

Joshua said:
In the default case, a.hashCode() == b.hashCode() only when a == b (this
definitely holds true with 32-bit machines and I'm pretty sure it still
holds true with 64-bit machines, but I'd have to reverify the JVM source
code to be certain).

Not at all. Even not for 32-bit machines. Not only because the hashCode()
usually uses less than 32 bits. But also for other reasons, namely the internal
algorithm how the hash is produced (although the below bug doesn't reveal much
internals):

http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6321873

But in the bottom of the above bug you can find code that searches for a clash.
It can take quite a while, but it is not excluded. This is what a sample run
produced according to the bug:

62028120: java.lang.Object@9cab16 - java.lang.Object@9cab16

So the 62028120-th new Object produced the same hashcode.

So

--- BBBS/Li6 v4.10 Dada-1
* Origin: Prism bbs (1:261/38)
--- Synchronet 3.16a-Win32 NewsLink 1.98
Time Warp of the Future BBS - telnet://time.synchro.net:24
 
E

Eric Sosman

To: Patricia Shanahan
From: Eric Sosman <[email protected]>

[...]
I think there are two reasonably usable ways of handling this issue. One
is the current arrangement, in which every class has a hashCode that is
expected to be usable for selecting a hash table bucket.

Keeping hashCode as an Object method but making it useless for bucket
selection unless overridden would not be a good alternative.

A more reasonable alternative would be to have hashCode as the only
member of a HashKey interface that would be implemented by every class
whose objects are intended to be suitable for use as has keys. Those
objects that have a hashCode would still have to have a usable one, but
some classes would not implement HashKey and not have a hashCode at all.

Ugh. So if J. Random Programmer is too lazy or unimaginative to
write hashCode(), that means I can't use his class as a HashMap key, or even
put instances in a HashSet? Ugh, again.

(And, no: I don't think a HashCalculator interface along the lines
of Comparable would save the day.)

--
Eric Sosman
(e-mail address removed)

--- BBBS/Li6 v4.10 Dada-1
* Origin: Prism bbs (1:261/38)
--- Synchronet 3.16a-Win32 NewsLink 1.98
Time Warp of the Future BBS - telnet://time.synchro.net:24
 
P

Patricia Shanahan

To: Eric Sosman
From: Patricia Shanahan <[email protected]>

[...]
I think there are two reasonably usable ways of handling this issue. One
is the current arrangement, in which every class has a hashCode that is
expected to be usable for selecting a hash table bucket.

Keeping hashCode as an Object method but making it useless for bucket
selection unless overridden would not be a good alternative.

A more reasonable alternative would be to have hashCode as the only
member of a HashKey interface that would be implemented by every class
whose objects are intended to be suitable for use as has keys. Those
objects that have a hashCode would still have to have a usable one, but
some classes would not implement HashKey and not have a hashCode at all.

Ugh. So if J. Random Programmer is too lazy or unimaginative to
write hashCode(), that means I can't use his class as a HashMap key,
or even put instances in a HashSet? Ugh, again.

(And, no: I don't think a HashCalculator interface along the lines
of Comparable would save the day.)

I'm not saying that it would be better than the current situation, just better
than having hashCode implementations that appear to be there, but in practice
must not be used for hash bucket selection.

Patricia

--- BBBS/Li6 v4.10 Dada-1
* Origin: Prism bbs (1:261/38)
--- Synchronet 3.16a-Win32 NewsLink 1.98
Time Warp of the Future BBS - telnet://time.synchro.net:24
 
L

Lew

To: Jan Burse
From: Lew <[email protected]>

Jan said:
There is no reference to Comparable and the compareTo. If I am not
True.

using HashMap or HashSet in my application, and still override
equals, I do not need to implement hashCode(). I could for example
True.

use TreeMap or TreeSet and implement the Comparable interface. There
is a reference to equals() back from Comparable though.

This requires a detailed knowledge of the implementation of the particular
'Map' or 'Set'. If its searches do not employ hash codes, then you do not need
to implement 'hashCode()' for value-equal types.

In the general case one prefers to underspecify the mechanics, that is, allow a
client of the type to deploy either 'equals()'-based or hash-based algorithms,
by ensuring the methods are consistent with each other per Joshua Bloch and
other notables.

Even for cases where one predicts the use will not require one or another of
the consistency practices, it is harmless to enforce them.

There are four methods a type might use to represent equality or identity and
deviations therefrom: 'equals()' of course, and 'hashCode()', 'compareTo()',
and 'toString()'. There may be external 'Comparator's on that type.

All these methods on or of a type, where they exist, should be consistent,
absent an overwhelming and sound reason not to be.

The default case from 'Object' is that 'equals()' implements ==, 'hashCode()'
returns something sort of like an address of the instance which is nearly
always unique within a given JVM run, 'toString()' returns a decorated string
version of the hash code, and 'compareTo()' doesn't exist.

To express value equality in a type, one must override 'equals()'. The rest are
optional in Arne's strictest and most correct use of that notion. It is also
harmless to keep the rest consistent and nearly always (as in you might
encounter one instance per career otherwise) useful.

One generally chooses 'TreeMap' for its sorting capability, not its reliance on
'equals()' vs. 'hashCode()' to achieve that. (Assuming it has such a reliance.)
Unless there's a compelling argument to rely on 'TreeMap''s undocumented
non-use of hash codes, why do so?

Yes, I am aware that the 'TreeMap' documentation makes it clear that
'hashCode()' shouldn't be involved. Without promising it isn't. But a
'Comparator' might use hash codes under the hood in a naive attempt to
shortcut a comparison, not knowing that the base type assumes no one would do
such a thing. Or some later client might need only equality and not order, and
throw the base type into a 'HashMap' with surprising results.

Should you design anything that violates the consistency rule, then please do
both Javadoc and code-comment it properly.

--
Lew
Honi soit qui mal y pense.
http://upload.wikimedia.org/wikipedia/commons/c/cf/Friz.jpg

--- BBBS/Li6 v4.10 Dada-1
* Origin: Prism bbs (1:261/38)
--- Synchronet 3.16a-Win32 NewsLink 1.98
Time Warp of the Future BBS - telnet://time.synchro.net:24
 
A

Arne Vajhøj

To: Robert Klemme
From: Arne Vajhoj <[email protected]>

Well, I would go as far as to say that it will perform poorly in all
cases where hashCode() is actually needed

More or less.
- and that makes it broken.

This thread started about whether it is correct. The term correct has a very
specific meaning in programming and that always return 1 code is correct.

Now you talk about broken. If broken means not good, then you are right. If
broken means not correct, then you are wrong. I suspect that broken is not very
well defined.
The whole idea of hashing is based on the fact that the hash code
_somehow_ represents the item to be hashed. If all items have the same
constant hash code there is no point in hashing at all. So while it
does work, it does not work as intended.

It disable the entire hashing functionality and a HashMap get characteristics
of ArrayList.

But the code should actually work.

Arne

--- BBBS/Li6 v4.10 Dada-1
* Origin: Prism bbs (1:261/38)
--- Synchronet 3.16a-Win32 NewsLink 1.98
Time Warp of the Future BBS - telnet://time.synchro.net:24
 
A

Arne Vajhøj

To: Patricia Shanahan
From: =?UTF-8?B?QXJuZSBWYWpow7hq?= <[email protected]>

Keeping hashCode as an Object method but making it useless for bucket
selection unless overridden would not be a good alternative.

A more reasonable alternative would be to have hashCode as the only
member of a HashKey interface that would be implemented by every class
whose objects are intended to be suitable for use as has keys. Those
objects that have a hashCode would still have to have a usable one, but
some classes would not implement HashKey and not have a hashCode at all.

That would avoid using the hash for classes where it does not make much sense.

I like it.

Given that there is a gazillion lines of code out there that stores Object,
then it can not be changed.

Arne

--- BBBS/Li6 v4.10 Dada-1
* Origin: Prism bbs (1:261/38)
--- Synchronet 3.16a-Win32 NewsLink 1.98
Time Warp of the Future BBS - telnet://time.synchro.net:24
 
A

Arne Vajhøj

To: Lew
From: =?UTF-8?B?QXJuZSBWYWpow7hq?= <[email protected]>

Even for cases where one predicts the use will not require one or
another of the consistency practices, it is harmless to enforce them.

There are four methods a type might use to represent equality or
identity and deviations therefrom: 'equals()' of course, and
'hashCode()', 'compareTo()', and 'toString()'. There may be external
'Comparator's on that type.

All these methods on or of a type, where they exist, should be
consistent, absent an overwhelming and sound reason not to be.
Should you design anything that violates the consistency rule, then
please do both Javadoc and code-comment it properly.

I agree.

(well toString is not high on my priority list for what needs to behave certain
ways, but ...)

To make it easier to get them consistent, then I think it would be nice with
syntactic sugar.

Like:

public class Data {
@ValueId
private int iv;
@ValueId
private String sv;
// the rest of the class
}

which would cause the compiler to emit equals and hashCode methods that are
test of the @ValueId fields are equal and creates a "decent" hash.

public class Data implements Comparable<Data> {
@ValueId
private int iv;
@ValueId
private String sv;
// the rest of the class
}

could cause the compiler to output a compareTo as well.

It is invisible code, but we already got that with the default constructor.

And it will help make those methods consistent.

Of course explicit override should still be possible.

Arne

--- BBBS/Li6 v4.10 Dada-1
* Origin: Prism bbs (1:261/38)
--- Synchronet 3.16a-Win32 NewsLink 1.98
Time Warp of the Future BBS - telnet://time.synchro.net:24
 
R

Robert Klemme

To: Arne Vajhøj
From: Robert Klemme <[email protected]>

More or less.


This thread started about whether it is correct. The term correct
has a very specific meaning in programming and that always return 1
code is correct.

Now you talk about broken.

Actually it wasn't me who brought up the term. :)
If broken means not good, then you are right.
If broken means not correct, then you are wrong. I suspect
that broken is not very well defined.

Right. :)
It disable the entire hashing functionality and a HashMap get
characteristics of ArrayList.

An ArrayList allows multiple occurrences of the same instance - and does not
store key value pairs. That won't be the case with HashMap as equals() (if
properly implemented) will prevent that (albeit slowly, or more correct: slower
than with a proper implementation of hashCode()). Also, a HashMap will
degenerate more to a LinkedList via the chaining of a bucket's entries.
But the code should actually work.

Yes, sort of - depending on whether O(1) is a requirement or not. Still, the
idea to implement hashCode() in Object in a way to return a constant to avoid
necessity of overriding it is crazy - especially since you then cannot
efficiently use Object as hash key - which you can today.

Kind regards

robert


--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

--- BBBS/Li6 v4.10 Dada-1
* Origin: Prism bbs (1:261/38)
--- Synchronet 3.16a-Win32 NewsLink 1.98
Time Warp of the Future BBS - telnet://time.synchro.net:24
 
A

Arne Vajhøj

To: Robert Klemme
From: Arne Vajhoj <[email protected]>

An ArrayList allows multiple occurrences of the same instance - and does
not store key value pairs. That won't be the case with HashMap as
equals() (if properly implemented) will prevent that (albeit slowly, or
more correct: slower than with a proper implementation of hashCode()).
Also, a HashMap will degenerate more to a LinkedList via the chaining of
a bucket's entries.

I guess my statement was a bit misleading.

.... get O(1) characteristics for getting data similar to various List
implementation.

Arne

--- BBBS/Li6 v4.10 Dada-1
* Origin: Prism bbs (1:261/38)
--- Synchronet 3.16a-Win32 NewsLink 1.98
Time Warp of the Future BBS - telnet://time.synchro.net:24
 
A

Arne Vajhøj

To: Robert Klemme
From: Arne Vajhoj <[email protected]>

Yes, sort of - depending on whether O(1) is a requirement or not.

That is a rare functional requirement.

(common non-functional)
Still,
the idea to implement hashCode() in Object in a way to return a constant
to avoid necessity of overriding it is crazy - especially since you then
cannot efficiently use Object as hash key - which you can today.

Actually the more I think about the question the more logical it sounds!

We are looking at two alternatives: A) default functions properly but good
performance requires an override B) default gives good performance but may need
an override to function properly

#A actually sounds more the Java way to me.

Arne

--- BBBS/Li6 v4.10 Dada-1
* Origin: Prism bbs (1:261/38)
--- Synchronet 3.16a-Win32 NewsLink 1.98
Time Warp of the Future BBS - telnet://time.synchro.net:24
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Similar Threads

multiple inheritance 5
multiple inheritance 14
Bluetooth programming 0
multiple inheritance 6
multiple inheritance 5
Bluetooth programming 0
Bluetooth programming 0
Swing 7

Members online

No members online now.

Forum statistics

Threads
473,764
Messages
2,569,567
Members
45,041
Latest member
RomeoFarnh

Latest Threads

Top