HashMaps, hashcodes, equals, and Serialization

S

Siam

Hi all,

As part of my application, I have an ArticleManager class, that
maintains a HashMap of Article objects (which have an overloaded equals
method - but not an overloaded hashcode). When my application is
closed, the ArticleManager serializes itself and all the Articles.
Questions:

1) How compulsory is it to overload the hashcode method, having
overloaded the equals method. If I don't, will this mean my HashMap
won't function correctly - i.e. if "Article ar1" exists in the hashmap,
and I call get(ar2) on the hashmap, such that ar1.equals(ar2), would it
not return me ar1, since their hashcodes are difference? Does the
HashMap code ever test for equality itself when putting and getting
objects, or does it simply rely on the contracted relationship between
hashcode() and equals()? How would I go about writing an effective
custom hashcode function?

2) The contract for hashcode, as written in the API, states "This
integer [the hashcode] need not remain consistent from one execution of
an application to another execution of the same application." Will this
cause complications for when my articles and the hashmap are
serialized, and later deserialized on another execution of my
application? In that, surely if the hashcode is used in determining
where in the table a certain Article object is placed, then if the
hashcode for the Article (when deserialized) changes on another
execution of the application, would the hashmap still be able to find
the Article in its previous place? Or does
serialization/deserialization maintain the hashcode of the object?

Cheers :)

Siam
 
L

Lew

Siam said:
Hi all,

As part of my application, I have an ArticleManager class, that
maintains a HashMap of Article objects (which have an overloaded equals
method - but not an overloaded hashcode). When my application is
closed, the ArticleManager serializes itself and all the Articles.
Questions:

1) How compulsory is it to overload the hashcode method, having
overloaded the equals method.

It is very compulsory, except when you have a distinct reason to make them
inconsistent. Even then it's compulsory, you just might have a reason to
disregard the rule.

No "Hash.+" anything will work correctly if hashCode() and equals() are
inconsistent. That is why they have "Hash" in their names!
2) The contract for hashcode, as written in the API, states "This
integer [the hashcode] need not remain consistent from one execution of
an application to another execution of the same application." Will this
cause complications for when my articles and the hashmap are
serialized, and later deserialized on another execution of my
application?

Your serialization format should not in any way depend on the hash. If it
does, then you have not written your readObject)() and writeObject() methods
correctly.

Read Joshua Bloch's _Effective Java Programming_, which has chapters covering
both these questions in detail.

- Lew
 
D

Daniel Pitts

Siam said:
Hi all,

As part of my application, I have an ArticleManager class, that
maintains a HashMap of Article objects (which have an overloaded equals
method - but not an overloaded hashcode). When my application is
closed, the ArticleManager serializes itself and all the Articles.
Questions:

1) How compulsory is it to overload the hashcode method, having
overloaded the equals method. If I don't, will this mean my HashMap
won't function correctly - i.e. if "Article ar1" exists in the hashmap,
and I call get(ar2) on the hashmap, such that ar1.equals(ar2), would it
not return me ar1, since their hashcodes are difference? Does the
HashMap code ever test for equality itself when putting and getting
objects, or does it simply rely on the contracted relationship between
hashcode() and equals()? How would I go about writing an effective
custom hashcode function?

2) The contract for hashcode, as written in the API, states "This
integer [the hashcode] need not remain consistent from one execution of
an application to another execution of the same application." Will this
cause complications for when my articles and the hashmap are
serialized, and later deserialized on another execution of my
application? In that, surely if the hashcode is used in determining
where in the table a certain Article object is placed, then if the
hashcode for the Article (when deserialized) changes on another
execution of the application, would the hashmap still be able to find
the Article in its previous place? Or does
serialization/deserialization maintain the hashcode of the object?

Cheers :)

Siam

It is always best practice that you override equals and hashCode
together, or not at all.

Basically, its a contract. You, as the implementor of your class, are
required to make hashCode and equals work consistently with eachother.

Anything that you use to calculate the hashCode must be represented in
the equals comparison. Note that it doesn't need to be the other way
around, however.

Hope this helps.
Daniel.
 
M

Mike Schilling

?
2) The contract for hashcode, as written in the API, states "This
integer [the hashcode] need not remain consistent from one execution of
an application to another execution of the same application." Will this
cause complications for when my articles and the hashmap are
serialized, and later deserialized on another execution of my
application? In that, surely if the hashcode is used in determining
where in the table a certain Article object is placed, then if the
hashcode for the Article (when deserialized) changes on another
execution of the application, would the hashmap still be able to find
the Article in its previous place? Or does
serialization/deserialization maintain the hashcode of the object?

During deserialization, the keys are rehashed and the hash chains rebuilt
from scratch, so the new hash table is correct even though the hash values
have changed.
 
E

Eric Sosman

Siam said:
Hi all,

As part of my application, I have an ArticleManager class, that
maintains a HashMap of Article objects (which have an overloaded equals
method - but not an overloaded hashcode). When my application is
closed, the ArticleManager serializes itself and all the Articles.
Questions:

1) How compulsory is it to overload the hashcode method, having
overloaded the equals method.

Completely optional. If you had over*ridden* the equals method
that would otherwise have been inherited, that would have been
different: It would then have been mandatory to over*ride* hashCode
as well.

Overload = Same method name (or constructor) but different
argument list.

Override = Same method name and same argument list, but
different implementation. I'm going to assume this is what you
actually meant.
If I don't, will this mean my HashMap
won't function correctly - i.e. if "Article ar1" exists in the hashmap,
and I call get(ar2) on the hashmap, such that ar1.equals(ar2), would it
not return me ar1, since their hashcodes are difference?

There is no telling what might happen. With high probability
the HashMap will fail to find anything for ar2; there is a very
tiny probability that it might find whatever it would have found
for ar1. (This happens if, by some outrageous coincidence, ar1
and ar2 have the same hashCode -- and since the JVM actively tries
to prevent this from happening, chances are against you.)
Does the
HashMap code ever test for equality itself when putting and getting
objects, or does it simply rely on the contracted relationship between
hashcode() and equals()?

Strictly speaking, you're not supposed to know or care how the
internals of HashMap work; that's part of Java's raison d'etre. All
you're supposed to care about is that if you stick to your side of
the bargain, HashMap will fulfill its obligations. Part of your
bargain with HashMap is that any pair of items for which equals is
true must have identical hashCode values: If you don't ensure that
this is so, you have not kept your side of the bargain and HashMap
is not obliged to behave as desired.

In actuality, HashMap first uses the key's hashCode to locate
a "bucket" -- a linked list internal to the HashMap -- where the
key/value pair resides if it's in the map at all. Then it walks
through the list looking for an existing entry with a key that has
the same hashCode as the one you're looking for and for which
equals returns true. If it finds such a key, hooray! Otherwise,
the HashMap concludes that the key isn't in the map. And that's
why hashCode and equals must agree: if two equals objects have
different hashCodes, HashMap's search will come up empty.
How would I go about writing an effective
custom hashcode function?

Compute something (repeatably) from some subset of the things
equals checks. Here is a valid but not very good version:

public int hashCode() { return 42; }

.... where the "subset" is empty. Note that this fulfills the bargain
with HashMap: any two equal objects have equal hashCodes, because all
objects have the same hashCode. With this hashCode, HashMap will
work correctly -- slowly, but correctly.

A better hashCode tries to "spread out" the computed values to
decrease the chance that a pair of unequal objects will have the
same hashCode. Start by considering the tests that equals does:
the items equals checks are all candidates for inclusion in the
hashCode computation. So, for example,

class ArticleManager {
private int xpos;
private int ypos;
private String text;
// ...
public boolean equals(Object obj) {
if (! (obj instanceof ArticleManager))
return false;
ArticleManager that = (ArticleManager)obj;
return this.xpos == that.xpos
&& this.ypos == that.ypos
&& this.text.equals(that.text);
}
public int hashCode() {
return text.hashCode();
}
}

would work. Things will usually work a little better if you
incorporate more of the equals-relevant fields and "stir" them
unsystematically into the mix:

public int hashCode() {
int hash = 9882345; // arbitrary value
hash = hash * 509 + xpos; // primes are traditional
hash = hash * 503 + ypos;
hash = hash * 499 + text.hashCode();
return hash;
}
2) The contract for hashcode, as written in the API, states "This
integer [the hashcode] need not remain consistent from one execution of
an application to another execution of the same application." Will this
cause complications for when my articles and the hashmap are
serialized, and later deserialized on another execution of my
application? In that, surely if the hashcode is used in determining
where in the table a certain Article object is placed, then if the
hashcode for the Article (when deserialized) changes on another
execution of the application, would the hashmap still be able to find
the Article in its previous place? Or does
serialization/deserialization maintain the hashcode of the object?

If the hashCode computation is "expensive," you might arrange
to compute it just once and store the result in a private field of
the object, either when the object is constructed or the first time
hashCode is called. If you serialize and deserialize that field
there could be trouble, because the contributing hashCodes of some
of the referenced objects might be different and the deserialized
value might not match what you'd get if you computed it anew. But
if you compute the hashCode fresh each time, or if you arrange to
recompute it upon deserialization, all should be well.

As for the HashMap, the folks who wrote it were aware that the
hashCodes of the keys it holds might change on deserialization, so
they wrote HashMap's own deserialization code to allow for that
fact. HashMap essentially "reloads" itself on deserialization,
storing each key/value pair according to the "new" hashCode values.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,768
Messages
2,569,574
Members
45,049
Latest member
Allen00Reed

Latest Threads

Top