equals and hashCode

T

Tom Dyess

This is gonna be a tough question to put into words, but here goes:

I understand the purpose of overriding equals() in comparing the data of an
object verses the reference, but the requirement to override hashCode() is a
little less clear.

Consider this idiom from this reference:
http://www.geocities.com/technofundo/tech/java/equalhash.html

"Equal objects must produce the same hash code as long as they are equal,
however unequal objects need not produce distinct hash codes."

The simplest way to create an equals and hashCode implementation for an
object would be to make a distinct hash code based on the value of the data
in the object, then in equals(), simply compare the value of hash codes. Why
would I not want to make a distinct hashCode based on the data - i.e.
"however unequal objects need not produce distinct hash codes"

Thanks for any information. I'm researching this because we are using
Hibernate in our project and want to understand how it can apply.
 
S

Sebastian Scheid

Tom Dyess said:
This is gonna be a tough question to put into words, but here goes:

I understand the purpose of overriding equals() in comparing the data of
an object verses the reference, but the requirement to override hashCode()
is a little less clear.

Consider this idiom from this reference:
http://www.geocities.com/technofundo/tech/java/equalhash.html

"Equal objects must produce the same hash code as long as they are equal,
however unequal objects need not produce distinct hash codes."

The simplest way to create an equals and hashCode implementation for an
object would be to make a distinct hash code based on the value of the
data in the object, then in equals(), simply compare the value of hash
codes.

That may not be as simple as it seems to you. Consider two classes A and B
which do not have any semantical interrelation but have similar fields:

class A {
int num1; // significant for equals
String name; // NOT significant

public boolean equals(Object obj) {
if(this == obj)
return true;
if((obj == null) || (obj.getClass() != this.getClass()))
return false;

A a = (A) obj;
return num1 == a.num1;
}

public int hashCode() {
int hash = 7;
hash = 31 * hash + num1;
return hash;
}

}

class B {
int num2; // significant for equals
Date date; // NOT significant

public boolean equals(Object obj) {
if(this == obj)
return true;
if((obj == null) || (obj.getClass() != this.getClass()))
return false;

B b = (B) obj;
return num2 == b.num2;
}

public int hashCode() {
int hash = 7;
hash = 31 * hash + num2;
return hash;
}

}


If instances of A and B have the same value in their num1/num2 field, their
hashcode is the same but they are not equal. How do you want to make sure
the hashCodes are distinct in this case? And this is a pretty obvious case
so you could raise the plea that changing one of the hashCode calculations
could solve the problem. But some other hashCode method could return the
same value with a complete different computation.

And there is an even simpler example:

class C {
int num1;
int num2;

equals...

public int hashCode() {
int hash = 7;
hash = 31 * hash + num1;
hash = 31 * hash + num2;
return hash;
}
}

If num1 and num2 are significant fields, that you involve in the calculation
of hashCode you cannot distinguish between two instances c1 (num1 = 3, num2
= 4) and c2 (num1 = 4, num2 = 3) with the given hashCode method. There might
be a possibility, e.g.

public int hashCode() {
int hash = 7;
hash = 31 * hash + num1;
hash = 32 * hash + num2;
return hash;
}

but I am not sure if this implementation does really calculate distinct
codes. Proofing this could be difficult especially if many fields are
involved in the calculation.

Why would I not want to make a distinct hashCode based on the data - i.e.
"however unequal objects need not produce distinct hash codes"

You cannot guaranty distinct hashCodes over several classes. While it is
expensive to do that for your own classes it is impossible if you add
classes that are not under your control.

Regards
Sebastian
 
G

googmeister

Why would I not want to make a distinct hashCode
based on the data - i.e. "however unequal objects
need not produce distinct hash codes"

It might not always be possible since there are only
2^32 possible hashCode values. So you'd be stuck
if you tried to do this for String values, since there
are many many more than 2^32 possible strings
values.
 
L

Lasse Reichstein Nielsen

Sebastian Scheid said:
class C {
int num1;
int num2;

equals...

public int hashCode() {
int hash = 7;
hash = 31 * hash + num1;
hash = 31 * hash + num2;
return hash;
}
}

If num1 and num2 are significant fields, that you involve in the calculation
of hashCode you cannot distinguish between two instances c1 (num1 = 3, num2
= 4) and c2 (num1 = 4, num2 = 3) with the given hashCode method.

Pedantry: the two examples does give different hash values
(((7*31)+3)*31+4) and (((7*31)+4)*31+3) respectively. However the
point is valid for other values, e.g.,
c1(num1 = 3, num2 = 4)
c2(num1 = 2, num2 = 35)

You cannot guaranty distinct hashCodes over several classes.

You cannot guarantee distinct hashCodes for even one class, since you
can have more than 2^32 different instances of it (given sufficient
RAM :) and hashCode() only returns an int.

/L
 
B

Boudewijn Dijkstra

Sebastian Scheid said:
[...]

class C {
int num1;
int num2;

equals...

public int hashCode() {
int hash = 7;
hash = 31 * hash + num1;
hash = 31 * hash + num2;
return hash;
}
}

If num1 and num2 are significant fields, that you involve in the calculation
of hashCode you cannot distinguish between two instances c1 (num1 = 3, num2
= 4) and c2 (num1 = 4, num2 = 3) with the given hashCode method.

These two particular instances happen to produce different hash codes: 6824
and 6854, respectively. In fact, there seems to be no combination of c1 and
c2 with c1.num1==c2.num2&&c1.num2==c2.num1 where they produce the same hash
code. But they do for arbitrary combinations, e.g. c1(3,66) and c2(5,4) both
produce 6886.
There might be a possibility, e.g.

public int hashCode() {
int hash = 7;
hash = 31 * hash + num1;
hash = 32 * hash + num2;
return hash;
}

but I am not sure if this implementation does really calculate distinct
codes.

No, it doesn't. In fact, if the number of valid combinations between num1 and
num2 exceeds 2^31-1, then there is no way of guarantueeing a distinct hash
code between distinct values.
Proofing this could be difficult especially if many fields are involved in
the calculation.

Not difficult. Just a matter of feeding the right equation into your integer
equation solver.
 
L

Leon

Tom Dyess said:
This is gonna be a tough question to put into words, but here goes:

I understand the purpose of overriding equals() in comparing the data of an
object verses the reference, but the requirement to override hashCode() is a
little less clear.

Consider this idiom from this reference:
http://www.geocities.com/technofundo/tech/java/equalhash.html

"Equal objects must produce the same hash code as long as they are equal,
however unequal objects need not produce distinct hash codes."

The simplest way to create an equals and hashCode implementation for an object
would be to make a distinct hash code based on the value of the data in the
object, then in equals(), simply compare the value of hash codes. Why would I
not want to make a distinct hashCode based on the data - i.e. "however unequal
objects need not produce distinct hash codes"

I disagree. The simplest way is to give the -same- value for every instance.

public int hashCode() {
return 0;
}

Greetings, Leon.
 
W

Wibble

Tom said:
This is gonna be a tough question to put into words, but here goes:

I understand the purpose of overriding equals() in comparing the data of an
object verses the reference, but the requirement to override hashCode() is a
little less clear.

Consider this idiom from this reference:
http://www.geocities.com/technofundo/tech/java/equalhash.html

"Equal objects must produce the same hash code as long as they are equal,
however unequal objects need not produce distinct hash codes."

The simplest way to create an equals and hashCode implementation for an
object would be to make a distinct hash code based on the value of the data
in the object, then in equals(), simply compare the value of hash codes. Why
would I not want to make a distinct hashCode based on the data - i.e.
"however unequal objects need not produce distinct hash codes"

Thanks for any information. I'm researching this because we are using
Hibernate in our project and want to understand how it can apply.
Even when the semantic space is small enough to fit into 32 bits,
computing a perfect hash is often expensive. If its cheap for your
class, then by all means compute it and compare the hash and type for
equalilty. Domains that would meet this criteria would be byte[4]
arrays or java.lang.Integer.
 
T

Tom Dyess

Wibble said:
Tom said:
This is gonna be a tough question to put into words, but here goes:

I understand the purpose of overriding equals() in comparing the data of
an object verses the reference, but the requirement to override
hashCode() is a little less clear.

Consider this idiom from this reference:
http://www.geocities.com/technofundo/tech/java/equalhash.html

"Equal objects must produce the same hash code as long as they are equal,
however unequal objects need not produce distinct hash codes."

The simplest way to create an equals and hashCode implementation for an
object would be to make a distinct hash code based on the value of the
data in the object, then in equals(), simply compare the value of hash
codes. Why would I not want to make a distinct hashCode based on the
data - i.e. "however unequal objects need not produce distinct hash
codes"

Thanks for any information. I'm researching this because we are using
Hibernate in our project and want to understand how it can apply.
Even when the semantic space is small enough to fit into 32 bits,
computing a perfect hash is often expensive. If its cheap for your class,
then by all means compute it and compare the hash and type for equalilty.
Domains that would meet this criteria would be byte[4] arrays or
java.lang.Integer.

Thanks for all the input. I can see where equals and compareTo would come in
handy, but what is a common use for a hashCode? For hash collections? Is it
implemented because of the lack of a common global context in clustering
scenarios (ie Tomcat)?
 
H

Harald

Tom Dyess said:
Thanks for all the input. I can see where equals and compareTo would come in
handy, but what is a common use for a hashCode? For hash collections? Is it

Interesting. I seem to be unable write 10 lines of Java without using
a HashMap, which obviously requires that the keys have a hash code.-)

Harald.
 
L

Leon

Thanks for all the input. I can see where equals and compareTo would come in
handy, but what is a common use for a hashCode? For hash collections? Is it
implemented because of the lack of a common global context in clustering
scenarios (ie Tomcat)?

With off course an efficient and effective hashcode method, using a HashSet
would greatly improve the speed of handling of collections. Here's an example:

/**
* answer whether c1 includes all elements of c2
* that is: for each element of c2 there is an equal object in c1
*/
public boolean includesAll(Collection c1, Collection c2) {

HashSet hashSet = new HashSet(c1);
return hashSet.containsAll(c2);

}

Greetings, Leon.
 
T

Tom Dyess

Harald said:
Interesting. I seem to be unable write 10 lines of Java without using
a HashMap, which obviously requires that the keys have a hash code.-)

Harald.

All of my HashMap usage comes from persistant data from Oralce. I use the
primary key as the hash when performing a put() so I can reference it
quickly by the key. I also use it for key/value tables.

params.put(rs.getString("par_name"), rs.getString("par_value"));

Does the first arguement override the obejct's internal hashCode
implementation?
 
W

Wibble

Tom said:
All of my HashMap usage comes from persistant data from Oralce. I use the
primary key as the hash when performing a put() so I can reference it
quickly by the key. I also use it for key/value tables.

params.put(rs.getString("par_name"), rs.getString("par_value"));

Does the first arguement override the obejct's internal hashCode
implementation?
Think harder Tom, and read carefully.
 
P

Patricia Shanahan

Boudewijn Dijkstra wrote:
....
No, it doesn't. In fact, if the number of valid combinations between num1 and
num2 exceeds 2^31-1, then there is no way of guarantueeing a distinct hash
code between distinct values.
....

Shouldn't that be 2^32, the number of distinct values of an int? A hash
code does not have to be positive.

Patricia
 
L

Lasse Reichstein Nielsen

Boudewijn Dijkstra said:
In fact, there seems to be no combination of c1 and c2 with
c1.num1==c2.num2&&c1.num2==c2.num1 where they produce the same hash
code.

c1.num1 = c1.num2 = c2.num1 = c2.num2 = 0;

(or any other where they are equal, but that's really cheating :)

For it to happen, you need two numbers, a and b, such that
a * 31 + b = b * 31 + a (mod 2^32)
Ignoring overflow, it would require
a * 30 == b * 30,
i.e., a==b.

With overflow, we can also succeede with:
a = Integer.MIN_VALUE;
b = 0;

:)
/L
 
T

Tom Dyess

Wibble said:
Your still ignorant and I have a clue, so I guess you lose.

In my 20 years and 5 languages of programming, I've noticed that the
condescending ones are always making up for what's lacking. Answer me this,
smart guy, why go through the effort of building a never-perfect hash
algorithm to distinctly identify each object that is persistant when I could
use a guaranteed unique primary key from my database like a sequence?
 
L

Lasse Reichstein Nielsen

Tom Dyess said:
why go through the effort of building a never-perfect hash algorithm
to distinctly identify each object that is persistant when I could
use a guaranteed unique primary key from my database like a
sequence?

Because that's not necessarily a hash code. Remember the single
requirement of the hashcode method:
if a.equals(b) then a.hashCode()==b.hashCode().

If your objects has an equals method that can equate two objects with
different unique id's, then the id can't be used as a hash code.

If you just inherit equals() from Object, then there is no reason
to make an intelligent hashCode either. Just use Math.random the
first time hashCode is called, and store if for later calls.

/L
 
T

Tom Dyess

Lasse Reichstein Nielsen said:
Because that's not necessarily a hash code. Remember the single
requirement of the hashcode method:
if a.equals(b) then a.hashCode()==b.hashCode().

If your objects has an equals method that can equate two objects with
different unique id's, then the id can't be used as a hash code.

If you just inherit equals() from Object, then there is no reason
to make an intelligent hashCode either. Just use Math.random the
first time hashCode is called, and store if for later calls.

/L
--
Lasse Reichstein Nielsen - (e-mail address removed)
DHTML Death Colors:
<URL:http://www.infimum.dk/HTML/rasterTriangleDOM.html>
'Faith without judgement merely degrades the spirit divine.'

Lasse,

Thanks for the information. Sounds similar to a COM GUID except can have
logic involved. Pretty neat.

Tom
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,764
Messages
2,569,566
Members
45,041
Latest member
RomeoFarnh

Latest Threads

Top