hashCode for 4 ints?

J

Jeff

Am I misusing HashMap or just need to fix the .hashCode() override for the
key object?

The class I use as a key for a HashMap is unique based on a combination of 4
ints. Normally, about 5000 instances are created and persist over time.
The maximum possible created is probably max around 40,000.

Since .hashCode() returns an int, how do I create a unique identifier for
the 128 bits contained in the 4 ints? Or should I swap to another data
structure?

Thanks!

// Here's a simplification of my class that illustrates my problem.

public class IntHashExample {
int A, B, C, D;

IntHashExample( int A , int B , int C , int D ) {
this.A = A;
this.B = B;
this.C = C;
this.D = D;
}

public boolean equals( Object arg0 ) {

IntHashExample other = ( IntHashExample ) arg0;

if ( ( A == other.A ) && ( B == other.B ) && ( C == other.C ) && ( D
== other.D ) )
return true;
else
return false;
}

// how do I create a hashcode to lookup the unique combination of the 4
ints? The following obviously wouldn't work,
// but hopefully indicates my need
public int hashCode() {

long highest = ( long ) A << 96;

long high = ( long ) B << 64;

long low = ( long ) C << 32;

long lowest = ( long ) D;

int code = ( int ) ( highest | high | low | lowest );

return code;
}
}
 
P

Patricia Shanahan

Jeff wrote:
....
Since .hashCode() returns an int, how do I create a unique identifier for
the 128 bits contained in the 4 ints? Or should I swap to another data
structure?
....

Why do you need the hash code to be a unique identifier?

Patricia
 
T

Thomas Weidenfeller

Jeff said:
The class I use as a key for a HashMap is unique based on a combination of 4
ints. Normally, about 5000 instances are created and persist over time.
The maximum possible created is probably max around 40,000.

Since .hashCode() returns an int, how do I create a unique identifier for
the 128 bits contained in the 4 ints? Or should I swap to another data
structure?

You don't. Hashes don't have to be unique. It is much better if they
are, but they can't under all conditions. That's why you also need to
have equals(). The rule is that hashCode() must return the same hash for
objects which are equal (and of course of the object is the same), but
is allowed to return the same hash also for unequal objects.


An extremely inefficient hash, but still an allowed implementation of
hashCode() would be:

public int hashCode() { return 0; }

NO, DON'T DO IT - this is just an example.

So what is a good hash? Entire computer science books can be filled with
research about this. I would suggest you start with a simple one, and
measure if it is ok for your task. Or, you could try stuff like the
following:

Warning:

- This was a quick hack, and I didn't verify if I got the algorithm even
remotely right.

- I have also no clue if the algorithm is suitable for your numbers

- It burns a lot more CPU cycles than a simper hash (like just xoring
all integers), and might not be worth the trouble at all


static final int FNV1_INIT = 0x811c9dc5;
static final int FNV1_PRIME = 0x01000193;

protected int hashFnvInt(int v, int hash) {
for(int i = 0; i < 4; i++) {

// equivalent to hash *= FNV1_PRIME
// might generate a lot of VM instructions
// and not worth the saving of the
// multiplication
hash += (hash << 1)
+ (hash << 4)
+ (hash << 7)
+ (hash << 8)
+ (hash << 24);
hash ^= v & 0xFF;
v >>= 8;
}
return hash;
}

public int hashCode() {
return hashFnvInt(D,
hashFnvInt(C,
hashFnvInt(B,
hashFnvInt(A, FNV1_INIT))));
}


Googel for FNV (Fowler, Noll, Vo) for details.

/Thomas
 
H

huddler

Patricia said:
Jeff wrote:
...
...

Why do you need the hash code to be a unique identifier?

<Warning: Java Newbie>

If the hash is used as the key in a HashMap, then non-unique hashes
would create a side effect to put() where an existing Object in the
HashMap is replaced with the new Object. He might not want that to
happen.
 
P

Patricia Shanahan

huddler said:
<Warning: Java Newbie>

If the hash is used as the key in a HashMap, then non-unique hashes
would create a side effect to put() where an existing Object in the
HashMap is replaced with the new Object. He might not want that to
happen.

The "key" is the object itself, not its hash. Replacement happens if,
and only if, the two objects are equal.

The hash is used for performance purposes, to reduce the number of
objects that HashMap has to look at when testing for equality.

Here's an example program, using the unrecommented, very poor
performance, but functionally correct trivial hashCode method:

import java.util.HashMap;
import java.util.Map;

public class TrivialHash {
String name;

public TrivialHash(String name) {
this.name = name;
}

public String toString() {
return name;
}

public boolean equals(Object o) {
return (o instanceof TrivialHash && name.equals(((TrivialHash)
o).name));
}

public int hashCode() {
return 0;
}

public static void main(String[] args) {
Map map = new HashMap();
map.put(new TrivialHash("A"), "X");
map.put(new TrivialHash("B"), "Y");
System.out.println(new TrivialHash("A").hashCode());
System.out.println(new TrivialHash("B").hashCode());
System.out.println(map.get(new TrivialHash("A")));
System.out.println(map.get(new TrivialHash("B")));
}
}

This program prints:

0
0
X
Y

demonstrating that equal hash codes do not prevent two key values from
being in the HashMap at the same time.

Patricia
 
H

huddler

Well, as a Java newbie, I'm in a constant state of confusion. In the
1.4.2 doc for the HashMap put() method, it says:

<quote>
public Object put(Object key,
Object value)

Associates the specified value with the specified key in this map. If
the map previously contained a mapping for this key, the old value is
replaced.
</quote>

So could I trouble you to explain (or point to an explanation) of why
your example and the doc SEEM to contradict each other?

http://java.sun.com/j2se/1.4.2/docs....html#put(java.lang.Object, java.lang.Object)

I await humiliation.
 
P

Patricia Shanahan

huddler said:
Well, as a Java newbie, I'm in a constant state of confusion. In the
1.4.2 doc for the HashMap put() method, it says:

<quote>
public Object put(Object key,
Object value)

Associates the specified value with the specified key in this map. If
the map previously contained a mapping for this key, the old value is
replaced.
</quote>

So could I trouble you to explain (or point to an explanation) of why
your example and the doc SEEM to contradict each other?

http://java.sun.com/j2se/1.4.2/docs....html#put(java.lang.Object, java.lang.Object)

I await humiliation.

I aim for enlightenment, rather than humiliation. Of course, I may miss
sometimes.

I think you may be misinterpreting the word "key" in the HashMap
documentation. It is also possible that the OP shares your confusion.

Look at the argument list. The key is the object, not its hash code.

In my example program, a TrivialHash object is equal to other
TrivialHash objects with the same name, and nothing else. The first call
uses as key a TrivialHash object with name "A". The second call uses a
TrivialHash object with name "B", which is unequal. As far as HashMap is
concerned, when I do the second map.put call, the map does not contain a
mapping for the key, so it does not need to replace an old value.

HashMap, like most hash tables, assumes that equal keys have equal hash
codes, but that some unequal keys may share the same hash code. That is
all HashMap is allowed to assume, because of the hashCode definition in
the Object javadocs: "If two objects are equal according to the
equals(Object) method, then calling the hashCode method on each of the
two objects must produce the same integer result."

Essentially, a hash table splits the keys into subsets, called buckets,
using the hash code. If two keys are equal, they go on the same bucket,
so a lookup only has to examine the keys in one bucket. However, two
keys in the same bucket, even if they have exactly the same hash, may
not be equal. The only way HashMap can be sure two Java objects are
equal is to get a true result from o1.equals(o2).

The behavior you seem to be expecting, using the hash code as key and
ignoring the object's equals method, does match a real but specialized
technique. A "perfect hash" ensures unequal keys have unequal hash
codes. It shifts some complexity from the lookup process to the
construction of the hash. It may be used, for example, for short lists
of keywords that do not change. Do a search for "perfect hash" if you
want to learn more.

The Java hashCode function is NOT required to be a perfect hash, and
HashMap does not assume it is one.

Patricia
 
H

huddler

Cripes, I hate when I'm this stupid in public.

Thanks for the discussion, Patricia.
 
G

George Cherry

Patricia Shanahan said:
The "key" is the object itself, not its hash. Replacement happens if,
and only if, the two objects are equal.

The hash is used for performance purposes, to reduce the number of
objects that HashMap has to look at when testing for equality.

Here's an example program, using the unrecommented, very poor
performance, but functionally correct trivial hashCode method:

import java.util.HashMap;
import java.util.Map;

public class TrivialHash {
String name;

public TrivialHash(String name) {
this.name = name;
}

public String toString() {
return name;
}

public boolean equals(Object o) {
return (o instanceof TrivialHash && name.equals(((TrivialHash)
o).name));
}

public int hashCode() {
return 0;
}

public static void main(String[] args) {
Map map = new HashMap();
map.put(new TrivialHash("A"), "X");
map.put(new TrivialHash("B"), "Y");
System.out.println(new TrivialHash("A").hashCode());
System.out.println(new TrivialHash("B").hashCode());
System.out.println(map.get(new TrivialHash("A")));
System.out.println(map.get(new TrivialHash("B")));
}
}

This program prints:

0
0
X
Y

demonstrating that equal hash codes do not prevent two key values from
being in the HashMap at the same time.

Patricia

Excellent example, Patricia. In the spirit of your
example, here is another one:

/*
* This program illustrates that if equals() is appropriately
* defined, then even a mediocre hashCode() will not prevent a
* hash collection from working correctly (although a mediocre
* hashCode() will defeat the performance purpose of a hash
* collection.
*/

import java.util.HashSet;
import java.util.Set;

public class SillyHashCodeTester {
private String name;
private int id;

public SillyHashCodeTester(String name, int id) {
this.name = name;
this.id = id;
}

public String toString() {
return name + ", " + id;
}

public boolean equals(Object o) {
return (
o instanceof SillyHashCodeTester
&& this.name.equals( ((SillyHashCodeTester)o).name )
&& this.id == ((SillyHashCodeTester)o).id
);
}

//A: Define a silly hashCode():
public int hashCode() {
return 42;
}

public static void main(String[] args) {
Set<SillyHashCodeTester> hashSet;
hashSet = new HashSet<SillyHashCodeTester>();

System.out.println( "Set size: " + hashSet.size() );
System.out.println("Add an object to the set.");
hashSet.add( new SillyHashCodeTester("abc", 100) );
System.out.println( "Set size: " + hashSet.size() + "\n" );

System.out.println("Add a non-duplicate object to the set.");
hashSet.add( new SillyHashCodeTester("abc", 200) );
System.out.println( "Set size: " + hashSet.size() + "\n" );

System.out.println("Try to add a duplicate object to the set.");
hashSet.add( new SillyHashCodeTester("abc", 100) );
System.out.println( "Set size: " + hashSet.size() + "\n" );

System.out.println("Try to add a duplicate object to the set.");
hashSet.add( new SillyHashCodeTester("abc", 200) );
System.out.println( "Set size: " + hashSet.size() + "\n" );

System.out.println("Add a non-duplicate object to the set.");
hashSet.add( new SillyHashCodeTester("xyz", 100) );
System.out.println( "Set size: " + hashSet.size() + "\n" );
}
}

/*
This program should print:

Set size: 0
Add an object to the set.
Set size: 1

Add a non-duplicate object to the set.
Set size: 2

Try to add a duplicate object to the set.
Set size: 2

Try to add a duplicate object to the set.
Set size: 2

Add a non-duplicate object to the set.
Set size: 3
*/

George
 
G

George Cherry

Patricia Shanahan said:
I aim for enlightenment, rather than humiliation. Of course, I may miss
sometimes.

I think you may be misinterpreting the word "key" in the HashMap
documentation. It is also possible that the OP shares your confusion.

Look at the argument list. The key is the object, not its hash code.

In my example program, a TrivialHash object is equal to other
TrivialHash objects with the same name, and nothing else. The first call
uses as key a TrivialHash object with name "A". The second call uses a
TrivialHash object with name "B", which is unequal. As far as HashMap is
concerned, when I do the second map.put call, the map does not contain a
mapping for the key, so it does not need to replace an old value.

HashMap, like most hash tables, assumes that equal keys have equal hash
codes, but that some unequal keys may share the same hash code.

Yes, part of the contract of a hashCode() is

1. Equal objects have equal hash codes.
2. Unequal objects MAY have the same hash code.
(Of course, IDEALLY they don't have the same hash code.)
That is
all HashMap is allowed to assume, because of the hashCode definition in
the Object javadocs: "If two objects are equal according to the
equals(Object) method, then calling the hashCode method on each of the
two objects must produce the same integer result."

Essentially, a hash table splits the keys into subsets, called buckets,
using the hash code. If two keys are equal, they go on the same bucket,
so a lookup only has to examine the keys in one bucket.

The bucket is often the head of a linked list, I think.
However, two
keys in the same bucket, even if they have exactly the same hash, may
not be equal. The only way HashMap can be sure two Java objects are
equal is to get a true result from o1.equals(o2).

Yes, o1.equals(o2) implies o1.hashCode() == o2.hashCode()
BUT o1.hashCode() == o2.hashCode() does not imply o1.equals(o2)
The behavior you seem to be expecting, using the hash code as key and
ignoring the object's equals method, does match a real but specialized
technique. A "perfect hash"

In this world there is no perfection, no permanence, and no self. -- Buddha
ensures unequal keys have unequal hash
codes. It shifts some complexity from the lookup process to the
construction of the hash. It may be used, for example, for short lists
of keywords that do not change. Do a search for "perfect hash" if you
want to learn more.

The Java hashCode function is NOT required to be a perfect hash, and
HashMap does not assume it is one.

Patricia

Great follow-up to your excellent example.

George
 
G

George Cherry

huddler said:
Cripes, I hate when I'm this stupid in public.

Thanks for the discussion, Patricia.

I beg to differ with you, huddler.
You are NOT stupid--the stupid
persist in their ignorance.

George
 
T

Thomas Weidenfeller

Roedy said:

Just XORing the integers can give very bad hash-codes. In the not to
unlikely case where all the involved integers are rather small
positive(!) numbers the hash is not well spread at all over the entire
possible range.

Small positive integers have many of their higher bits set to zero.
XORing zeros gives zeros again. A hash calculated from small positive
integers vi XORing can never exceed

2^(floor(log2(max(i1, i2, ..., in))) + 1) - 1

But even for positive integers which are not so small any more you end
up with bad hashes. E.g. if your largest integer is 10000, you would
still never get a hash code which is larger than 2^14 -1 = 16283.

/Thomas
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,744
Messages
2,569,484
Members
44,903
Latest member
orderPeak8CBDGummies

Latest Threads

Top