hash codes

rmacnak · Jul 26, 2006

What propose/benefit does Object.hashCode() serve? Why would I want to
use this?

VisionSet · Jul 26, 2006

What propose/benefit does Object.hashCode() serve? Why would I want to
use this?

It is used to perform fast hashing ie equality checks of objects.
the equals() method is often slow to evaluate, hashcode should provide a
quickly evaluated value such that 2 equal objects always produce the same
hashcode. If 2 equal hashcodes do not however indicate an object is equal,
but rather if this is the case the calling code should also then call
equals(). This reduces hugely the number of times in total equals() is
called on say a large collection of objects.

cp · Jul 26, 2006

The purpose of computing a hash code is that the hash should be quicker to
calculate and compare than computing full object equality. eg.
someObject.equals(otherObejct);

rmacnak · Jul 26, 2006

VisionSet said:
It is used to perform fast hashing ie equality checks of objects.
the equals() method is often slow to evaluate, hashcode should provide a
quickly evaluated value such that 2 equal objects always produce the same
hashcode. If 2 equal hashcodes do not however indicate an object is equal,
but rather if this is the case the calling code should also then call
equals(). This reduces hugely the number of times in total equals() is
called on say a large collection of objects.

Thanks alot, but are there any other purposes besides faster equals()?
I think I heard something about checksums and hashs.

Mark Space · Jul 26, 2006

Thanks alot, but are there any other purposes besides faster equals()?
I think I heard something about checksums and hashs.

Besides the obvious (using hashcodes in a hashmap to lookup/search
faster), there's Bloom Filters. Bloom Filters speed computation of a
true/false value when the computation is very expensive.

http://www.beachsoftware.com/bloom/white_paper.html

Simon · Jul 27, 2006

VisionSet said:
It is used to perform fast hashing ie equality checks of objects.
the equals() method is often slow to evaluate, hashcode should provide a
quickly evaluated value such that 2 equal objects always produce the same
hashcode.

Assume two objects are represented by n and m bits, respectively. Then,
obviously, comparison is in O(min(n,m)). Can you give an example where hash code
computation for both objects is in o(min(n,m))?

What you can do, of course, is cache the hash value. Then, hash value
computation often is only a lookup and then is in O(1). Also, if you want to
compare to a remote object, comparing the hash value first can save
communication cost. However, I don't see how you can save computation time.

If 2 equal hashcodes do not however indicate an object is equal,
but rather if this is the case the calling code should also then call
equals(). This reduces hugely the number of times in total equals() is

I agree. So actually, if hash values are cached, comparing hashValue()s should
be the first thing to do inside the equals() method (as opposed to letting the
user do this each time before he uses the equals). Do you know why
String.equals() doen't do this?

Cheers,
Simon

Chris Uppal · Jul 27, 2006

cp said:
The purpose of computing a hash code is that the hash should be quicker to
calculate and compare than computing full object equality. eg.
someObject.equals(otherObejct);

Not necessarily true, nor is it a requirement. For the hash to be
performance-effective, it only needs to be fast enough that the time saved by
computing it /once/ is greater than the time saved by not having to do /many/
equality comparisons.

-- chris

Patricia Shanahan · Jul 27, 2006

Simon said:
VisionSet schrieb: ....

I agree. So actually, if hash values are cached, comparing hashValue()s should
be the first thing to do inside the equals() method (as opposed to letting the
user do this each time before he uses the equals). Do you know why
String.equals() doen't do this?

The hash values are cached, and are pre-checked before calling equals,
in classes such as HashMap.

I'm not sure that calculating and caching the hash values in String
would be a gain. It would depend on the average number of comparisons
per String, and also the lengths of the matching prefixes.

HashMap needs the hash code anyway, so pre-checking is effectively free.

Patricia

Simon · Jul 27, 2006

Patricia said:
The hash values are cached, and are pre-checked before calling equals,
in classes such as HashMap.

Probably I expressed myself incorrectly. I should have said "*since* hash values
are cached..." I'm not asking why String doesn't cache them, but why String
doesn't use them in its equals() method.

I'm not sure that calculating and caching the hash values in String
would be a gain. It would depend on the average number of comparisons
per String, and also the lengths of the matching prefixes.

Let's forget about the prefix length for a second and assume that the strings
differ only in the last character compared. Then, comparison and hash code
computation need, I believe, almost the same computation time. So you gain
something as soon as a String is involved in more than only a few (I would guess
2) comparisons.

Now, assume the strings differ after a short prefix. Then, as you said, you can
easily lose time by computing hash values instead of starting the comparison and
terminating with false quickly. Now you can do several things:

1 - Use the hash value at least if it was computed by some other reason before
(this is free)

2 - Many Strings of length n are created in linear time anyway (e.g. by reading
them from a file or stream, creating them with a StringBuffer, user input, ...).
So you can afford computing the hash value on the fly or after it has been
created and still have linear time. The only way to create a string of length n
in sublinear time I can think of is by substring() etc. E.g. the constructors
String(char[]), String(char[],int,int), and String(String) could call
hashValue() almost for free since they need linear time anyway.

3 - In String.equals() you could compute the hash value as you go along. If the
Strings are equal, you have computed the hash code for free and you can cache
it. If you terminate earlier, you can throw away your intermediate result. You
could still improve this by completing the hash value computation if the
matching prefix has length at least n/2, say. Ok, that doesn't improve the
situation much for adversarial input, but for typical inputs it could be
competitive. And you can also randomize that, and...

What? You think this is getting clumsy...

Well. One could use at least the first option since this one is essentially free.

Cheers,
Simon

Patricia Shanahan · Jul 27, 2006

Simon said:
Probably I expressed myself incorrectly. I should have said "*since* hash values
are cached..." I'm not asking why String doesn't cache them, but why String
doesn't use them in its equals() method.

You seem to be ignoring the issue of which class caches hash codes when.

String does not cache hash codes. HashMap, for example, does cache the
hash code for each key in the map, and only calculates the probe key
hash code once during a get.

Let's forget about the prefix length for a second and assume that the strings
differ only in the last character compared. Then, comparison and hash code
computation need, I believe, almost the same computation time. So you gain
something as soon as a String is involved in more than only a few (I would guess
2) comparisons.

Now, assume the strings differ after a short prefix. Then, as you said, you can
easily lose time by computing hash values instead of starting the comparison and
terminating with false quickly. Now you can do several things:

1 - Use the hash value at least if it was computed by some other reason before
(this is free)

Checking the hash codes costs three comparisons. The condition to
abandon is that the hashes are different and neither of them is zero,
the value used for a hash that has not yet been calculated.

If either hash code has not been calculated, or the strings being
compared are equal, or differ but have equal hash codes, or differ
somewhere in the first few characters, it is a net loss.

Note that the pre-check behavior of HashMap tends to reduce the
probability of comparisons between strings whose hash codes have already
been calculated. While it does force hash code calculation, it never
calls equals for a pair of objects with different hash codes.

It is possible that it might be a gain, but not obvious. It would
require experiments across a wide range of benchmarks.

2 - Many Strings of length n are created in linear time anyway (e.g. by reading
them from a file or stream, creating them with a StringBuffer, user input, ...).
So you can afford computing the hash value on the fly or after it has been
created and still have linear time. The only way to create a string of length n
in sublinear time I can think of is by substring() etc. E.g. the constructors
String(char[]), String(char[],int,int), and String(String) could call
hashValue() almost for free since they need linear time anyway.

String(char[]) etc. use System.arraycopy. Although it is also linear
time, for any reasonably long string is it much faster than character by
character copying.

3 - In String.equals() you could compute the hash value as you go along. If the
Strings are equal, you have computed the hash code for free and you can cache
it. If you terminate earlier, you can throw away your intermediate result. You
could still improve this by completing the hash value computation if the
matching prefix has length at least n/2, say. Ok, that doesn't improve the
situation much for adversarial input, but for typical inputs it could be
competitive. And you can also randomize that, and...

What? You think this is getting clumsy...

Well. One could use at least the first option since this one is essentially free.

Nope. If it does any good at all, it costs three comparisons.

Patricia

Simon · Jul 27, 2006

Patricia said:
You seem to be ignoring the issue of which class caches hash codes when.

String does not cache hash codes.

From the source of String.java, JDK 1.6.0

/** Cache the hash code for the string */
private int hash; // Default to 0

[...]

public int hashCode() {
int h = hash;
if (h == 0) {
int off = offset;
char val[] = value;
int len = count;

for (int i = 0; i < len; i++) {
h = 31*h + val[off++];
}
hash = h;
}
return h;
}

It is possible that it might be a gain, but not obvious. It would
require experiments across a wide range of benchmarks.
ACK.

Nope. If it does any good at all, it costs three comparisons.

You wrote in an earlier post:

HashMap needs the hash code anyway, so pre-checking is effectively free.

I'm just saying: If String.equals() does have the hash code anyway (by chance),
pre-checking is effectively free. Three comparisons aren't all that much are
they? You need 2 comparisons in any iteration of the comparison loop.

Cheers,
Simon

Thomas Fritsch · Jul 27, 2006

Patricia said:
String does not cache hash codes.

I have looked into Sun's "String.java" source (Java1.4.2),
and as I see it, String does cache its hash code:

<quote>
[...]
/** Cache the hash code for the string */
private int hash = 0;
[...]
public int hashCode() {
int h = hash;
if (h == 0) {
int off = offset;
char val[] = value;
int len = count;

for (int i = 0; i < len; i++) {
h = 31*h + val[off++];
}
hash = h;
}
return h;
}
[...]

HashMap, for example, does cache the
hash code for each key in the map, and only calculates the probe key
hash code once during a get.

Agreed.

Patricia Shanahan · Jul 27, 2006

Simon said:
Patricia said:

You seem to be ignoring the issue of which class caches hash codes when.

String does not cache hash codes.

Click to expand...

From the source of String.java, JDK 1.6.0

/** Cache the hash code for the string */
private int hash; // Default to 0

[...]

Sorry about that, I did check and found it does cache when hashCode is
called, but forgot I had already typed that line.

You wrote in an earlier post:

I'm just saying: If String.equals() does have the hash code anyway (by chance),
pre-checking is effectively free. Three comparisons aren't all that much are
they? You need 2 comparisons in any iteration of the comparison loop.

There are two important differences between HashMap and String on this:

1. HashMap must calculate the hash code in order to find the right
bucket, so it never has to test whether the hash code has been
calculated. It only has to do one compare, not three, to know that two
strings have different hash codes. String may nor may not already have
one or both hash codes, so it has to test whether they exist.

2. HashMap is going to have to incur the cost of a method call, at an
absolute minimum, if it does call equals. String is already sitting in
its own equals method, all ready to start comparing characters.

A lot depends on both the probability of an equals call for two String
instances that have different hash codes and both have been calculated,
and the average number of iterations of the comparison loop, given the
strings are different.

Patricia

Patricia Shanahan · Jul 27, 2006

Simon wrote:
....

I'm just saying: If String.equals() does have the hash code anyway (by chance),
pre-checking is effectively free. Three comparisons aren't all that much are
they? You need 2 comparisons in any iteration of the comparison loop.

....

I should perhaps explain why I think comparisons can be a big deal. The
problem is not the comparison itself, which can be just as fast as an
integer add, but the conditional branch that follows.

One of the nastiest, most disruptive things you can do to a processor is
force it to empty its pipeline and begin again.

The processor has to decide what to load next in the pipeline
immediately after loading the branch, many cycles before it executes and
the processor finds out if it is taken or not taken. The processor has
to guess, and if it guesses wrong all the code in the pipeline following
the branch is junk, and must be thrown away. It will have to wait for
the correct instructions to work their way through before it can do any
more useful work.

To reduce the frequency, processor designs incorporate various attempts
to predict the branch direction for frequently executed branches. Branch
prediction depends on the same sort of issues as caching - essentially,
it is an attempt to predict behavior based on recent history.

However, I don't think the branches involved in testing for hash codes
unequal but both non-zero are going to be particularly predictable, so
there is a good chance that at least one of them will be mispredicted.

Again, this is something that can only really be evaluated by
experiment, but the stakes are far higher than adding a few integer
instructions.

Patricia

Lasse Reichstein Nielsen · Jul 27, 2006

[email protected] said:
What propose/benefit does Object.hashCode() serve? Why would I want to
use this?

On top of the other benefits mentioned, it is also important that
hashCode generates a numeric value from an object. This allows it to
be used in numeric calculations like picking a bucket in HashMap.

You could make a "hash value" from objects that allowed quick
comparisons and was always equal if the objects were equal, but that
was not a number, but that would not sufficient for implementing a
HashMap.

/L

IEEE Xtreme 14.0 coupon codes problem	0	Apr 23, 2022
I am writing a Age of Empires game but it is being played by codes but ı am stuck.	1	Jul 14, 2023
Pixel based analysis of landsat satellite image Using python codes	1	May 1, 2023
Hash search and Interpolation search ?	3	Dec 30, 2022
I made a blockchain and want to make a cryptocurrency, but my code doesn't verify hash of each block	2	Jun 2, 2024
Hello all	1	May 25, 2022
equals and hascode	20	Dec 18, 2010
How to learn C code?	2	Sep 19, 2022

hash codes

rmacnak

VisionSet

cp

rmacnak

Mark Space

Simon

Chris Uppal

Patricia Shanahan

Simon

Patricia Shanahan

Simon

Thomas Fritsch

Patricia Shanahan

Patricia Shanahan

Lasse Reichstein Nielsen

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads