hashCode

B

bob smith

Is it always technically correct to override the hashCode function like so:

@Override
public int hashCode() {
return 1;
}

Would it be potentially better if that was Object's implementation?
 
A

Arne Vajhøj

Is it always technically correct to override the hashCode function like so:

@Override
public int hashCode() {
return 1;
}

It meets the minimum contractual obligation for that method.

Would it be potentially better if that was Object's implementation?

It has a different behavior that may not be valid if you override equals.

Arne
 
E

Eric Sosman

Is it always technically correct to override the hashCode function like so:

@Override
public int hashCode() {
return 1;
}

Would it be potentially better if that was Object's implementation?

Define "better."
 
M

markspace

It meets the minimum contractual obligation for that method.



It has a different behavior that may not be valid if you override equals.


I think at this point we are doing Bob's homework for him.
 
A

Arne Vajhøj

I think at this point we are doing Bob's homework for him.

Could be.

But I think the question whether returning a constant from
hashCode is a valid Java question for understanding the
contract for that method.

And I am pretty sure that I have seen other similar
examples (just with 42 as constant).

Arne
 
L

Lew

Roedy said:
bob smith wrote, quoted or indirectly quoted someone
who said :


that's about the worst possible hashCode function.

Normally that's correct, but it's conceivable that one might do it for
some hackish reason. In most situations where one might do such
an override as this, one would do better not to override hashCode().
See http://mindprod.com/jgloss/hashcode.html for tips on how to write
good ones.

The default of assembling it via the mix-in of attribute hash codes
using the Knuth constants is usually good enough.

h(object) = Sum(i ∈ 0.. n-1) of ( attribute * pow(31, n - 1 - i) );

or

public static int calculateHash(Foo arg) {
int h = 0;

for ( each attribute of Foo that contributes to 'equals()' )
{
h = 31 * h + attribute.hashCode();
}
return h;
}

http://en.wikipedia.org/wiki/Hash_function
http://en.wikipedia.org/wiki/Java_hashCode()
http://en.wikipedia.org/wiki/Java_hashCode()#Java
 
B

bob smith

Define "better."



--

Eric Sosman

(e-mail address removed)

Better in the sense that you would never HAVE to override hashCode.

Now, there are cases where you HAVE to override it, or your code is very broken.
 
L

Lew

It complies with the contract for 'hashCode()'. Is that all it takes to be correct?

Would what be better if what were Object's implementation of what?
Better in the sense that you would never HAVE to override hashCode.

Now, there are cases where you HAVE to override it, or your code is very broken.

No.

No matter what 'Object''s 'hashCode()' implementation were, it would need to
be overridden when you want value equality instead of object identity for
'equals()'.

See Joshua Bloch's seminal work /Effective Java/, which has items that pertain to this.

Bottom line: 'hashCode()', 'equals()', and when present, 'compareTo()' must be consistent.

'toString()' should be consistent with those.

As long as 'hashCode()' fulfills the contract, your code will work - functionally. But a bad
'hashCode()' could and likely will noticeably affect performance. There is more to correctness
than mere functional conformance.
 
A

Arne Vajhøj

that's about the worst possible hashCode function.

Yes, but the posted asked "Is it always technically correct to ..."
not whether is was "best possible".

Arne
 
A

Arne Vajhøj

Better in the sense that you would never HAVE to override hashCode.

Now, there are cases where you HAVE to override it, or your code is very broken.

It is not broken.

It will perform poorly in many cases.

Arne
 
A

Arne Vajhøj

As long as 'hashCode()' fulfills the contract, your code will work - functionally. But a bad
'hashCode()' could and likely will noticeably affect performance. There is more to correctness
than mere functional conformance.

If the code per specs is guaranteed to work then it is correct.

Good (or just decent) performance is not necessary for code to
be correct.

At least not in the traditional programming terminology.

In plain English maybe.

Arne
 
R

Roedy Green

h =3D 31 * h + attribute.hashCode();
}
In my essay I recommend XOR which is an inherentely faster operation
than multiply. I wonder which actually works out better. If you had a
large number of fields, the multiply effect could fall off the left
hand end. It is the algorithm used for String which could have very
long strings, so Sun must have thought of that.
 
E

Eric Sosman

On 8/10/2012 6:22 PM, bob smith wrote:
[... many blank lines removed for legibility's sake ...]
Better in the sense that you would never HAVE to override hashCode.

Now, there are cases where you HAVE to override it, or your code is very broken.

I cannot think of a case where you HAVE to override hashCode(),
except as a consequence of other choices that you didn't HAVE to
make. You don't HAVE to invent classes where distinct instances
are considered equal, and even if you do you don't HAVE to put those
instances in HashMaps or HashSets or whatever.

But that's a bit specious: All it says is that you don't HAVE
to override hashCode() because you don't HAVE to use things that
call it. It's like "You don't HAVE to pay taxes, because you don't
HAVE to live outside prison." So, let's take it as a given that
you will often need to write classes that override equals() and
hashCode() -- I imagine you understand that they go together.

Okay: Then returning a constant 1 (or 42 or 0 or whatever)
would in fact satisfy the letter of the law regarding hashCode():
Whenever x.equals(y) is true, x.hashCode() == y.hashCode(). In
your example this would be trivially true because x,y,z,... all
have the same hashCode() value, whether they're equal or not --
You have lived up to the letter of the law.

Of course, such a hashCode() would make all those hash-based
containers pretty much useless: They would work in the sense that
they would get the Right Answer, but they'd be abominably slow,
with expected performance of O(N) instead of O(1). See
<http://www.cs.rice.edu/~scrosby/hash/CrosbyWallach_UsenixSec2003/>
for a survey of some denial-of-service attacks that work by driving
hash tables from O(1) to O(N), resulting in catastrophic failure
of the attacked system.

In other words, the letter of the law on hashCode() is a bare
minimum that guarantees correct functioning, but it is not enough
to guarantee usability. Why isn't the law more specific? Because
nobody knows how to write "hashCode() must be correct *and* usable"
in terms that would cover all the classes all the Java programmers
have dreamed up and will dream up. Your hashCode() meets the bare
minimum requirement, but is not "usable." The actual hashCode()
provided by Object also meets the bare minimum requirement, and *is*
usable as it stands, until (and unless; you don't HAVE to) you
choose to implement other equals() semantics, and a hashCode() to
match them.
 
E

Eric Sosman

Yes, but the posted asked "Is it always technically correct to ..."
not whether is was "best possible".

He also asked whether it would "be potentially better."
 
J

Jan Burse

bob said:
Is it always technically correct to override the hashCode function like so:

@Override
public int hashCode() {
return 1;
}

Would it be potentially better if that was Object's implementation?

Maybe it would make sense to spell out what the contract
for hashCode() is. Well the contract is simply, the
following invariant should hold:

/* invariant that should hold */
if a.equals(b) then a.hashCode()==b.hashCode()

It should be noted that this does not imply:

/* not implied and thus not required by the invariant */
if a.hashCode()==b.hashCode() then a.equals(b)

It is also quite unlikely that a hashCode() would satisfy
the later, although the closer it comes to the later, the
better it works for HashMap, etc..

The default objects implementation of hashCode() matches
the default objects impementation of equals(). The default
objcts implementation of equals() is ==. And the default
objects implementation of hashCode() is
System.identityHashCode().

The System identity hash code is stored in the object
and generated by the system. It does not change during
GC although the internal object address might change
during GC. It is only 32bit although internal object
addresses might by 64bit with a corresponding JVM.

Returning a constant, any constant c not only 1, would be
technically correct correct for the default implementation
of the class object. Since it trivially satisfies the
invariant:

if a.equals(b) then c==c

is trivially true, since c==c is true. But it is not
better. Since you would get very degenerated HashMaps,
etc..

You need to override the hashhCode() when there is danger
that the invariant is not anymore satisified. This is
not the case when equals() is not overridden. So overriding
hashCode() just for fun when equals() is not overriden,
usually doesn't make sense. It will probably only slow
down the hashCode() calculation. So the following:

hashCode() = sum attr_i * c^i

Is not necessary. But it would be a possible way to go
when equals() were overriden in the following way:

equals(other) = and_i attr_i.equals(other.attr_i)

The above happens when you turn your object into a container
of other objects irrespective of the own object identity.
But beware if the container contains itself somewhere. This
is why we find in the code for Hashtable the following
complication:

public synchronized int hashCode() {
/*
* This code detects the recursion caused by computing the hash code
* of a self-referential hash table and prevents the stack overflow
* that would otherwise result. This allows certain 1.1-era
* applets with self-referential hash tables to work. This code
* abuses the loadFactor field to do double-duty as a hashCode
* in progress flag, so as not to worsen the space performance.
* A negative load factor indicates that hash code computation is
* in progress.
*/

Interestingly it will return a constant for the object when
it detects a loop. Maybe one could do better... Dunno

Bye
 
J

Jan Burse

Jan said:
during GC. It is only 32bit although internal object
addresses might by 64bit with a corresponding JVM.

Typically even less bits, since the same space
is used for some object flags.
 
A

Arne Vajhøj

He also asked whether it would "be potentially better."

"better to use Object hashCode" which again should bring the
correctness question before the performance question.

Arne
 
J

Joerg Meier

In my essay I recommend XOR which is an inherentely faster operation
than multiply.

Hasn't that been wrong since about the invention of the 80386 processor
family ? Pretty sure by now MUL and XOR both take one cycle and that's it.


Liebe Gruesse,
Joerg
 
J

Jan Burse

Roedy said:
If you had a
large number of fields, the multiply effect could fall off the left
hand end.

Actually this does not happen, since you multiply with 31,
which is 1+2+4+8+16. So that:

a*31+b = a*16+a*8+a*4+a*2+a+b

So for a HashMap that uses an index = hash & (2^n - 1) (which is
the same as hash mod 2^n), the impact of a will be still seen,
even when it occurs at the very left hand side.

There is some Microsoft C# HashMap implementation which does
not use mod 2^n, but instead some primes. In case the
implementation choses 31 as the designated prime, all
information but for the first field will be lost. But since
mod 2^32 is also applied, this might not be completely true.

For 2^n I don't know exactly how the impact could be
described. I guess in a HashMap with index = hash mod 2^1 the
hash amounts to a parity bit, since the sum in a+b acts like
an xor on the first right hand bit. But 2^n with n>1 the
31 multiplication is a little more crude.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,743
Messages
2,569,478
Members
44,899
Latest member
RodneyMcAu

Latest Threads

Top