hashCode

M

Mike Winter

]
In my essay I recommend XOR which is an inherentely faster operation
than multiply.

Hasn't that been wrong since about the invention of the 80386 processor
family ?

Not that far back: the Pentium required 9-11 cycles to complete a MUL
instruction compared to 1-3 for XOR (and the like), depending on operand
locations and widths.
Pretty sure by now MUL and XOR both take one cycle and that's it.

More-or-less, but the former is still slower for wider operands.
However, your point is well-taken: it needn't be as much a concern in
most cases.
 
L

Lew

If the code per specs is guaranteed to work then it is correct.

Good (or just decent) performance is not necessary for code to
be correct.

At least not in the traditional programming terminology.

In plain English maybe.

I see your point, but that is not to say that the specs exclude performance
considerations.

In the case of 'hashCode()', the Javadocs do say, "This method is supported
for the benefit of hash tables such as those provided by HashMap."
<http://docs.oracle.com/javase/7/docs/api/java/lang/Object.html#hashCode()>

The key question here is how you define "benefit". I argue that a hash code
that is constant does not benefit, say, a 'HashMap' because one of our desired
uses is constant-order retrieval.

"This implementation provides constant-time performance for the basic
operations (get and put), assuming the hash function disperses the elements
properly among the buckets."
<http://docs.oracle.com/javase/7/docs/api/java/util/HashMap.html>

Each specification refers to the other. Ergo they are meant to be considered
together. Taken together, the documentation clearly specifies that "correct"
or "proper" includes performance considerations. Therefore, by what you say,
the simple "return 1;" is not correct.

It certainly would not be correct for the 'Object' implementation.
"As much as is reasonably practical, the hashCode method defined by class
Object does return distinct integers for distinct objects." [op. cit.]

As you say, Arne, "correct" means it follows the spec. The OP's suggested
implementation violates the spec on two fronts.
 
L

Lew

Eric said:
Okay: Then returning a constant 1 (or 42 or 0 or whatever)
would in fact satisfy the letter of the law regarding hashCode():

Not if you consider all aspects of what the Javadocs promise.

See my post upthread.
Whenever x.equals(y) is true, x.hashCode() == y.hashCode(). In
your example this would be trivially true because x,y,z,... all
have the same hashCode() value, whether they're equal or not --
You have lived up to the letter of the law.

No, because the law requires that the method support 'HashMap', which in turn
calls for "properly" hashed objects.
Of course, such a hashCode() would make all those hash-based
containers pretty much useless: They would work in the sense that
they would get the Right Answer, but they'd be abominably slow,
Indeed.

with expected performance of O(N) instead of O(1). See
<http://www.cs.rice.edu/~scrosby/hash/CrosbyWallach_UsenixSec2003/>
for a survey of some denial-of-service attacks that work by driving
hash tables from O(1) to O(N), resulting in catastrophic failure
of the attacked system.

In other words, the letter of the law on hashCode() is a bare
minimum that guarantees correct functioning, but it is not enough
to guarantee usability. Why isn't the law more specific? Because

Actually, if you consider all that the Javadocs tell you, this "letter of the
law" to which you refer is like saying the sequence "ABC" constitutes all of
"the ABCs".
nobody knows how to write "hashCode() must be correct *and* usable"
in terms that would cover all the classes all the Java programmers
have dreamed up and will dream up. Your hashCode() meets the bare
minimum requirement, but is not "usable." The actual hashCode()
provided by Object also meets the bare minimum requirement, and *is*
usable as it stands, until (and unless; you don't HAVE to) you
choose to implement other equals() semantics, and a hashCode() to
match them.

As Arne states, "correct" means "fulfills the specification". The
specification for Java API methods is the standard Javadocs, which do impose
performance considerations on 'hashCode()'.

One understands that the spec isn't always fully enforceable by the compiler.
[1] It is correct that the compiler will allow 'return 1;'. It is not correct
that that fulfills the specification.

[1] Doesn't one?
 
L

Lew

Jan said:
Maybe it would make sense to spell out what the contract
for hashCode() is. Well the contract is simply, the
following invariant should hold:

/* invariant that should hold */
if a.equals(b) then a.hashCode()==b.hashCode()

True, but if you read the specification for 'hashCode()' fully, that is not
the entire contract, only the compiler-enforceable part of it.

The entire specification requires that as much as feasible, the 'Object'
implementation distinguish distinct instances, and that the method generally
support 'HashMap', which promises O(1) 'get()' and 'put()' with a "proper"
(i.e., compliant) 'hashCode()'.
 
L

Lew

Lew said:
True, but if you read the specification for 'hashCode()' fully, that is not
the entire contract, only the compiler-enforceable part of it.

Oooops!

I made a mistake.

Not even that is compiler-enforceable.
 
A

Arne Vajhøj

I see your point, but that is not to say that the specs exclude
performance considerations.

In the case of 'hashCode()', the Javadocs do say, "This method is
supported for the benefit of hash tables such as those provided by
HashMap."
<http://docs.oracle.com/javase/7/docs/api/java/lang/Object.html#hashCode()>

The key question here is how you define "benefit". I argue that a hash
code that is constant does not benefit, say, a 'HashMap' because one of
our desired uses is constant-order retrieval.

Object having the method defined to support effective hashing
does not imply that it has to it just means that the potential
is there.
"This implementation provides constant-time performance for the basic
operations (get and put), assuming the hash function disperses the
elements properly among the buckets."

Yes. And here it makes an assumption. Not that hashCode is implemented
correct, but that it is implemented in a certain way.
Each specification refers to the other. Ergo they are meant to be
considered together. Taken together, the documentation clearly specifies
that "correct" or "proper" includes performance considerations.
Therefore, by what you say, the simple "return 1;" is not correct.
As you say, Arne, "correct" means it follows the spec. The OP's
suggested implementation violates the spec on two fronts.

No it does not.

It follows exactly the explicit stated contract in the Java doc:

<quote>
The general contract of hashCode is:

Whenever it is invoked on the same object more than once during an
execution of a Java application, the hashCode method must consistently
return the same integer, provided no information used in equals
comparisons on the object is modified. This integer need not remain
consistent from one execution of an application to another execution of
the same application.
If two objects are equal according to the equals(Object) method,
then calling the hashCode method on each of the two objects must produce
the same integer result.
It is not required that if two objects are unequal according to the
equals(java.lang.Object) method, then calling the hashCode method on
each of the two objects must produce distinct integer results. However,
the programmer should be aware that producing distinct integer results
for unequal objects may improve the performance of hashtables.
</quote>

The ability to support something does not make it part of the contract.

This is a classic test question in basic Java SE. And that returning
a constant is correct but not smart should be in most Java SE
text books.

Arne
 
A

Arne Vajhøj

Eric said:
Okay: Then returning a constant 1 (or 42 or 0 or whatever)
would in fact satisfy the letter of the law regarding hashCode():

Not if you consider all aspects of what the Javadocs promise.

See my post upthread.
Whenever x.equals(y) is true, x.hashCode() == y.hashCode(). In
your example this would be trivially true because x,y,z,... all
have the same hashCode() value, whether they're equal or not --
You have lived up to the letter of the law.

No, because the law requires that the method support 'HashMap', which in
turn calls for "properly" hashed objects.
Of course, such a hashCode() would make all those hash-based
containers pretty much useless: They would work in the sense that
they would get the Right Answer, but they'd be abominably slow,
Indeed.

with expected performance of O(N) instead of O(1). See
<http://www.cs.rice.edu/~scrosby/hash/CrosbyWallach_UsenixSec2003/>
for a survey of some denial-of-service attacks that work by driving
hash tables from O(1) to O(N), resulting in catastrophic failure
of the attacked system.

In other words, the letter of the law on hashCode() is a bare
minimum that guarantees correct functioning, but it is not enough
to guarantee usability. Why isn't the law more specific? Because

Actually, if you consider all that the Javadocs tell you, this "letter
of the law" to which you refer is like saying the sequence "ABC"
constitutes all of "the ABCs".
nobody knows how to write "hashCode() must be correct *and* usable"
in terms that would cover all the classes all the Java programmers
have dreamed up and will dream up. Your hashCode() meets the bare
minimum requirement, but is not "usable." The actual hashCode()
provided by Object also meets the bare minimum requirement, and *is*
usable as it stands, until (and unless; you don't HAVE to) you
choose to implement other equals() semantics, and a hashCode() to
match them.

As Arne states, "correct" means "fulfills the specification". The
specification for Java API methods is the standard Javadocs, which do
impose performance considerations on 'hashCode()'.

One understands that the spec isn't always fully enforceable by the
compiler. [1] It is correct that the compiler will allow 'return 1;'. It
is not correct that that fulfills the specification.

It fulfills the spec.

It does not fulfill you bizarre interpretation of "support".

Arne
 
A

Arne Vajhøj

True, but if you read the specification for 'hashCode()' fully, that is
not the entire contract, only the compiler-enforceable part of it.

The entire specification requires that as much as feasible, the 'Object'
implementation distinguish distinct instances, and that the method
generally support 'HashMap', which promises O(1) 'get()' and 'put()'
with a "proper" (i.e., compliant) 'hashCode()'.

Two wrong statements.

It says that the method is defined to support HashMap

And HashMap does not guarantee O(1) with a correct
hashCode - it guarantee that for one that return
good distributed values.

Arne
 
A

Arne Vajhøj

This is a classic test question in basic Java SE. And that returning
a constant is correct but not smart should be in most Java SE
text books.

Effective Java / Joshua Bloch:

<quote>
// The worst possible legal hash function - never use!
public int hashCode() { return 42; }

It is legal because it ensures that equal objects have the
same hash code. It's atrocious because ...
</quote>

Java 2 SUN Certified Programmer & Developer / Kathy Sierra & Bert Bates:

<quote>
A hashCode() that returns the same value for all instances whether
they're equal or not is still a legal - even appropriate - hashCode()
method! For example,
public int hashCode() {
return 1492;
}
would not violate the contract
....
This hashCode() method is horrible inefficient, ...
....
Nontheless, this one-hash-fits-all method would be
considered appropriate and even correct because it
doesn't violate the contract. Once more, correct does
not necessarily mean good.
</quote>

Arne
 
A

Arne Vajhøj

In my essay I recommend XOR which is an inherentely faster operation
than multiply. I wonder which actually works out better.

Multiply.

XOR has several problems:
- many small values give small result
- same values in different fields give same result
- two identical values give result zero
+ all those I did not think of.
If you had a
large number of fields, the multiply effect could fall off the left
hand end. It is the algorithm used for String which could have very
long strings, so Sun must have thought of that.

The multiply effect does not fall off the left with a value like 31
(it would with 32).

Arne
 
E

Eric Sosman

Effective Java / Joshua Bloch:

<quote>
// The worst possible legal hash function - never use!
public int hashCode() { return 42; }

It is legal because it ensures that equal objects have the
same hash code. It's atrocious because ...
</quote>

Java 2 SUN Certified Programmer & Developer / Kathy Sierra & Bert Bates:

<quote>
A hashCode() that returns the same value for all instances whether
they're equal or not is still a legal - even appropriate - hashCode()
method! For example,
public int hashCode() {
return 1492;
}
would not violate the contract
...
This hashCode() method is horrible inefficient, ...
...
Nontheless, this one-hash-fits-all method would be
considered appropriate and even correct because it
doesn't violate the contract. Once more, correct does
not necessarily mean good.
</quote>

All this means is that people know how to describe a "correct"
hashCode(), but nobody knows how to describe a "usable" hashCode()
in terms that apply testably to all circumstances.

The O.P. asked whether it would "be potentially better" if
Object's hashCode() returned a constant. He did *not* ask whether
such an implementation would be correct; he only asked if it would
"be potentially better." Upon prompting he explained what he
meant by "better," and in light of that explanation the answer
to his original question is NO. Discussions about "Oh, but it's
CORRECT" are just red herrings; it's still not "better."
 
A

Arne Vajhøj

The O.P. asked whether it would "be potentially better" if
Object's hashCode() returned a constant. He did *not* ask whether
such an implementation would be correct; he only asked if it would
"be potentially better." Upon prompting he explained what he
meant by "better," and in light of that explanation the answer
to his original question is NO. Discussions about "Oh, but it's
CORRECT" are just red herrings; it's still not "better."


The original questions were:

#Is it always technically correct to override the hashCode function
#like so:
#
# @Override
# public int hashCode() {
# return 1;
# }

For which the answer is YES. Per documentation.

But with really poor performance in many relevant cases.

#Would it be potentially better if that was Object's implementation?

Which was clarified to:

#Better in the sense that you would never HAVE to override hashCode.

For which the answer is also YES. Per the previous.

But with the same performance note. And a big sigh because it
seems to want to broaden bad performance from a single class
to the entire programming style (multiple classes).

Arne
 
L

Lew

Arne said:
Two wrong statements.

It says that the method is defined to support HashMap

And HashMap does not guarantee O(1) with a correct
hashCode - it guarantee that for one that return
good distributed values.

Granted.
 
J

Jan Burse

Jan said:
Maybe it would make sense to spell out what the contract
for hashCode() is.

Those who are interested can read the original text.
It is found here:

http://docs.oracle.com/javase/7/docs/api/java/lang/Object.html#hashCode()

---------------------------------------------------------------

"If two objects are equal according to the equals(Object)
method, then calling the hashCode method on each of the
two objects must produce the same integer result."

That is:

/* invariant that should hold */
if a.equals(b) then a.hashCode()==b.hashCode()

----------------------------------------------------------------

"It is not required that if two objects are unequal according
to the equals(java.lang.Object) method, then calling the
hashCode method on each of the two objects must produce
distinct integer results. However, the programmer should be
aware that producing distinct integer results for unequal
objects may improve the performance of hash tables."

That is:

/* not implied and thus not required by the invariant */
if a.hashCode()==b.hashCode() then a.equals(b)

It is also quite unlikely that a hashCode() would satisfy
the later, although the closer it comes to the later, the
better it works for HashMap, etc..

----------------------------------------------------------------

There is no reference to Comparable and the compareTo. If I am not
using HashMap or HashSet in my application, and still override
equals, I do not need to implement hashCode(). I could for example
use TreeMap or TreeSet and implement the Comparable interface. There
is a reference to equals() back from Comparable though.

http://docs.oracle.com/javase/6/docs/api/java/lang/Comparable.html

Bye
 
R

Robert Klemme

It is not broken.

It will perform poorly in many cases.

Well, I would go as far as to say that it will perform poorly in all
cases where hashCode() is actually needed - and that makes it broken.
The whole idea of hashing is based on the fact that the hash code
_somehow_ represents the item to be hashed. If all items have the same
constant hash code there is no point in hashing at all. So while it
does work, it does not work as intended.

Kind regards

robert
 
J

Joshua Cranmer

Better in the sense that you would never HAVE to override hashCode.

Now, there are cases where you HAVE to override it, or your code is very broken.

Returning a constant hash code is correct in the same sense that
answering "yes" to the question "Can you tell me the correct way to do
this?" would be--syntactically and semantically correct, but completely
contrary to the actual intent of the question.

The point of the hash code is to provide a cheap way to quickly
distinguish inputs (in the sense that Pr(a.hashCode() == b.hashCode()
and !a.equals(b)) should be as small as possible [1]). A constant-value
hash completely negates the purpose of the hash code, and this renders
the hashCode again completely unusable for anything that actually wants
to use it.

In the default case, a.hashCode() == b.hashCode() only when a == b (this
definitely holds true with 32-bit machines and I'm pretty sure it still
holds true with 64-bit machines, but I'd have to reverify the JVM source
code to be certain). It is thus correct so long as identity equals is
correct. It is also potentially correct in a limited set of cases where
a.equals(b) and a != b. In all of these cases, it would not only be
correct but also extremely useful, having pretty strong guarantees about
the distribution of hash values.

[1] Actually, for good performance, hash codes should go one step
further and make slightly stronger guarantees about independence with
respect to the size of the hash table. But I digress.
 
J

Jan Burse

Joshua said:
In the default case, a.hashCode() == b.hashCode() only when a == b (this
definitely holds true with 32-bit machines and I'm pretty sure it still
holds true with 64-bit machines, but I'd have to reverify the JVM source
code to be certain).

Not at all. Even not for 32-bit machines. Not only because
the hashCode() usually uses less than 32 bits. But also for
other reasons, namely the internal algorithm how the
hash is produced (although the below bug doesn't reveal
much internals):

http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6321873

But in the bottom of the above bug you can find code that
searches for a clash. It can take quite a while, but
it is not excluded. This is what a sample run produced
according to the bug:

62028120: java.lang.Object@9cab16 - java.lang.Object@9cab16

So the 62028120-th new Object produced the same hashcode.

So
 
E

Eric Sosman

[...]
I think there are two reasonably usable ways of handling this issue. One
is the current arrangement, in which every class has a hashCode that is
expected to be usable for selecting a hash table bucket.

Keeping hashCode as an Object method but making it useless for bucket
selection unless overridden would not be a good alternative.

A more reasonable alternative would be to have hashCode as the only
member of a HashKey interface that would be implemented by every class
whose objects are intended to be suitable for use as has keys. Those
objects that have a hashCode would still have to have a usable one, but
some classes would not implement HashKey and not have a hashCode at all.

Ugh. So if J. Random Programmer is too lazy or unimaginative to
write hashCode(), that means I can't use his class as a HashMap key,
or even put instances in a HashSet? Ugh, again.

(And, no: I don't think a HashCalculator interface along the lines
of Comparable would save the day.)
 
L

Lew

Jan said:
There is no reference to Comparable and the compareTo. If I am not
True.

using HashMap or HashSet in my application, and still override
equals, I do not need to implement hashCode(). I could for example
True.

use TreeMap or TreeSet and implement the Comparable interface. There
is a reference to equals() back from Comparable though.

This requires a detailed knowledge of the implementation of the particular
'Map' or 'Set'. If its searches do not employ hash codes, then you do not need
to implement 'hashCode()' for value-equal types.

In the general case one prefers to underspecify the mechanics, that is, allow
a client of the type to deploy either 'equals()'-based or hash-based
algorithms, by ensuring the methods are consistent with each other per Joshua
Bloch and other notables.

Even for cases where one predicts the use will not require one or another of
the consistency practices, it is harmless to enforce them.

There are four methods a type might use to represent equality or identity and
deviations therefrom: 'equals()' of course, and 'hashCode()', 'compareTo()',
and 'toString()'. There may be external 'Comparator's on that type.

All these methods on or of a type, where they exist, should be consistent,
absent an overwhelming and sound reason not to be.

The default case from 'Object' is that 'equals()' implements ==, 'hashCode()'
returns something sort of like an address of the instance which is nearly
always unique within a given JVM run, 'toString()' returns a decorated string
version of the hash code, and 'compareTo()' doesn't exist.

To express value equality in a type, one must override 'equals()'. The rest
are optional in Arne's strictest and most correct use of that notion. It is
also harmless to keep the rest consistent and nearly always (as in you might
encounter one instance per career otherwise) useful.

One generally chooses 'TreeMap' for its sorting capability, not its reliance
on 'equals()' vs. 'hashCode()' to achieve that. (Assuming it has such a
reliance.) Unless there's a compelling argument to rely on 'TreeMap''s
undocumented non-use of hash codes, why do so?

Yes, I am aware that the 'TreeMap' documentation makes it clear that
'hashCode()' shouldn't be involved. Without promising it isn't. But a
'Comparator' might use hash codes under the hood in a naive attempt to
shortcut a comparison, not knowing that the base type assumes no one would do
such a thing. Or some later client might need only equality and not order, and
throw the base type into a 'HashMap' with surprising results.

Should you design anything that violates the consistency rule, then please do
both Javadoc and code-comment it properly.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,769
Messages
2,569,578
Members
45,052
Latest member
LucyCarper

Latest Threads

Top