hashCode() ?

H

hack_tick

I need to find out whether the hashcode() function actually returns
unique value wrt every different object and can it be uswed to identify
a single object in an array of more than 1-million objects?
 
S

Stefan Schulz

I need to find out whether the hashcode() function actually returns
unique value wrt every different object and can it be uswed to identify
a single object in an array of more than 1-million objects?

Most assuredly not. It only guarantees that objects which equals() returns
true for have the same hashcode. The following is perfectly legal:

class HashBreaker {
public int hashCode(){
return 0;
}
}
 
N

NullBock

Short answer: no. Long answer, maybe. ;>

It depends on how you write the object. For instance, consider the
class

class UniqueHash {
static int id = 0;
int hashCode = id++;

public int hashCode() {
return hashCode;
}
}

Objects instantiated from this class are guaranteed to have unique hash
codes, and so the following will be true for all instances of a and b:

UniqueHash a,b;
//...
a.equals(b) == (a.hashCode() == b.hashCode());

And indeed, you could replace the a.equals(b) with a == b for the same
results.

Typically, though, such behavior is undesirable and even dangerous.
hashCode and equals do different things. It sounds like you need not
an array but a HashMap, which isn't significantly slower than an array
anyhow.


Walter Gildersleeve
Freiburg, Germany

______________________________________________________
http://linkfrog.net
URL Shortening
Free and easy, small and green.
 
N

NullBock

Ok, I'm just being anal, but I had to fix my code for multi-threading:

class UniqueHash {
static int id = 0;
int hashCode = nextHash();

private static synchronized int nextHash() {
return id++;
}

public int hashCode() {
return hashCode;
}

}


Walter Gildersleeve
Freiburg, Germany

______________________________________________________
http://linkfrog.net
URL Shortening
Free and easy, small and green.
 
H

hack_tick

I am not able to make out whats the problem in your class, seems
threadsafe, any ideas/suggestions from others?
 
N

NullBock

the command:

hashCode = id++;

consists of two, non-atomic actions. First, the hashCode field is set
to id, then id is incremented is incremented. What happens if one
thread yields in the middle of this operation? hashCode will get a
stale value of id.

id == 1;

thread one
hashCode = id [hashCode == 1]
yields
thread two
hashCode = id [hashCode == 1]
id ++ [id == 2]
yields
thread one
id ++ [id == 3]

Then you would have two instances of UniqueHash with the hashCode 1.

Walter Gildersleeve
Freiburg, Germany

______________________________________________________
http://linkfrog.net
URL Shortening
Free and easy, small and green.
 
Z

zero

I need to find out whether the hashcode() function actually returns
unique value wrt every different object and can it be uswed to identify
a single object in an array of more than 1-million objects?

From the Object JavaDoc:

<quote>
It is not required that if two objects are unequal according to the equals
(java.lang.Object) method, then calling the hashCode method on each of the
two objects must produce distinct integer results.
</quote>

In other words, a hashCode is not unique. A good hashCode implementation
does try its best to be unique, but there is no way to be sure. Just think
about it, conceptually there can be an unlimited number of different
objects. A hashCode however is an integer, which has at most 2^32
different values.

The hashCode inherited from Object is a "best effort" to create a unique
integer. However, for best results you should override hashCode if you
intend to use it to identify objects.
 
C

Chris Uppal

NullBock said:
Ok, I'm just being anal, but I had to fix my code for multi-threading:

Well, if we're nitpicking...

Your nextHash() depends on 32-bit integer arithmetic not wrapping around.
That's a plausible assumption for short-lived applications, but may not be
valid in general. Not a problem with your code, specifically, it's a certain
consquence hashCode() returning an int.

-- chris
 
R

Roedy Green

I need to find out whether the hashcode() function actually returns
unique value wrt every different object and can it be uswed to identify
a single object in an array of more than 1-million objects?

In general it will not. That is not its function. however
Object.hashcode uses the address as its hash, so it is unique.j

You could check by dumping all the hashcodes to a file and using an
external sort then looking for dups.
 
C

Chris Uppal

hack_tick said:
I need to find out whether the hashcode() function actually returns
unique value wrt every different object and can it be uswed to identify
a single object in an array of more than 1-million objects?

It's not defined to be true, so it isn't.

OTOH, if you are asking whether, /in fact/, it might work, then it /may/ be the
case that for a given JVM implementation, and providing you are only using
objects with the default hashCode() (or are using the system identity hash
explicitly), that no two objects that exist at the same time have the same
hash. However that assumption depends on the implementation of the JVM and is
quite likely to be invalid.

Don't do it in production code. Don't even do it in throwaway code unless (for
your purposes) any errors are tolerable and/or detectable.

-- chris
 
H

hack_tick

execuse me for my vary naive knowledge of the Java-language, but isnt
the function nextHash() synchronized?
 
N

NullBock

Exactly--it's the version of the UniqueHash that *is* thread-safe. The
original, where I merely set hashCode = id++, was not thread safe.

Walter Gildersleeve
Freiburg, Germany

______________________________________________________
http://linkfrog.net
URL Shortening
Free and easy, small and green.
 
N

NullBock

Yah, I thought about that, too. But if you want to have unique hash
codes for objects, then you're going to be confronted with integer
boundary problems at some point. I guess I could've set id to
Integer.MIN_VALUE, which've given more than twice as many
possibilities, eh?

'Course, I doubt anyone's going to need more than 2 bil instances
anyhow.
 
I

Ian Pilcher

Roedy said:
In general it will not. That is not its function. however
Object.hashcode uses the address as its hash, so it is unique.j

Actually, it does eventually produce collisions (Sun 1.5.0).
 
M

Mike Schilling

NullBock said:
Yah, I thought about that, too. But if you want to have unique hash
codes for objects, then you're going to be confronted with integer
boundary problems at some point. I guess I could've set id to
Integer.MIN_VALUE, which've given more than twice as many
possibilities, eh?

Same number of possibilities, just in a different order. (Integer.MAX_VALUE
+ 1 = Integer.MIN_VALUE).
 
T

Thomas Hawtin

NullBock said:
Ok, I'm just being anal, but I had to fix my code for multi-threading:

class AlmostUniqueHash {
static int id = 0;
int hashCode = nextHash();

private static synchronized int nextHash() {
return id++;
}

From 1.5 you can getter performance performance using
java.util.concurrent.atomic.AtomicInteger. OTOH, you might consider me
more anal than yourself.

(Modern Sun) ThreadLocal uses a similar tactic to encourage perfect
hashing. Its particular hash table implementation works better if hash
entries not only avoid clashing, but also avoid being close to one another.

Tom Hawtin
 
T

Thomas Hawtin

Roedy said:
In general it will not. That is not its function. however
Object.hashcode uses the address as its hash, so it is unique.j

We did this two or three months ago. Object addresses change on modern
JVMs, so that cannot be used as the way hashCode are retained (although
it may well have something to do with generation...). It is unfeasible
for even a 32-bit JVM to keep track of all allocated hash codes.
Therefore you should expect duplicates. Sun's (SE) JVM does not disappoint.

This is not an unusual misunderstanding. I have filed a bug with respect
to the Object.hashCode documentation. There is simplistic a program in
the comments that may find clashes.

http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6321873

Tom Hawtin
 
R

Roedy Green

consists of two, non-atomic actions. First, the hashCode field is set
to id, then id is incremented is incremented. What happens if one
thread yields in the middle of this operation? hashCode will get a
stale value of id.

An hashcode is assigned in the constructor. Only one thread can
possibly see an object as it is being constructed.
 
R

Roedy Green

Actually, it does eventually produce collisions (Sun 1.5.0).

If you held onto all your objects you could get uniqueness.

But this is abuse of hashCode. If you want some unique identifier,
assign it yourself, perhaps using a long.
 
G

~Glynne

hack_tick said:
I need to find out whether the hashcode() function actually returns
unique value wrt every different object and can it be uswed to identify
a single object in an array of more than 1-million objects?

The following are the only deductions that can be made about objects
and their hashcodes:
1) IF ob1.equals(ob2) THEN hash1==hash2
2) IF hash1 != hash2 THEN !(ob1.equals(ob2))

In particular, note that the converse statements do not hold, i.e.
1') hash1==hash2 does not imply that ob1.equals(ob2)
2') !(ob1.equals(ob2)) does not imply that hash1 != hash2


In light of this, an idiom that I've found useful is to go ahead and
use the hashcode to search the array. This will narrow your million
objects down to the one that you're looking for....plus zero or more
objects whose hashcodes collide with it.

Finally, you can filter the handful of candidates identified by
hashcode matching, by using object comparisons to find your one true
match.

~Glynne
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,780
Messages
2,569,611
Members
45,276
Latest member
Sawatmakal

Latest Threads

Top