Distinct ID Number Per Object?

Daniel Dyer

Eric said:
Lew said:
Hal Vaughan wrote:
So is it only in extreme cases like this where hashcodes would be
duplicated?

Hash codes have even fewer values than Strings. That means there must
be proportionately more collisions. Have you read the Javadocs on the
hashCode() method? You should. Also read the Javadocs on Map,
HashMap and IdentityHashMap.

As Twisted pointed out, the "Identity", i.e., the internal "address"
of an object, is unique for the lifetime of that object. [...]
Can you find this guarantee in the Javadoc or other
authoritative place? Does this rule out 64-bit JVM's?

You mean the guarantee that an object's "address" is unique during its
lifetime? How else would the JVM find a particular instance? In other
words, how could it possibly not be?

If two objects had the same "address", then a reference using that
"address" would not reference a single object, which contradicts the
very definition of an object reference.
a reference to the newly created object is returned as the result [of]
the indicated constructor

There is no way for one "address" to point to two objects simultaneously.

This question has nothing to do with bit width, AFAICS. I'm not really
sure how there could even be a question here.

I think Eric's point is that the number of possible hash codes is smaller
than the number of objects that can be addressed in a 64-bit JVM.
Therefore System.identityHashCode(Object) cannot, in a 64-bit VM at least,
guarantee to return unique values for all objects on the heap.

Dan.
 
Lew

Daniel said:
Eric said:
Lew wrote:
Hal Vaughan wrote:
So is it only in extreme cases like this where hashcodes would be
duplicated?

Hash codes have even fewer values than Strings. That means there
must be proportionately more collisions. Have you read the Javadocs
on the hashCode() method? You should. Also read the Javadocs on
Map, HashMap and IdentityHashMap.

As Twisted pointed out, the "Identity", i.e., the internal "address"
of an object, is unique for the lifetime of that object. [...]
Can you find this guarantee in the Javadoc or other
authoritative place? Does this rule out 64-bit JVM's?

You mean the guarantee that an object's "address" is unique during its
lifetime? How else would the JVM find a particular instance? In
other words, how could it possibly not be?

If two objects had the same "address", then a reference using that
"address" would not reference a single object, which contradicts the
very definition of an object reference.
a reference to the newly created object is returned as the result
[of] the indicated constructor

There is no way for one "address" to point to two objects simultaneously.

This question has nothing to do with bit width, AFAICS. I'm not
really sure how there could even be a question here.

I think Eric's point is that the number of possible hash codes is
smaller than the number of objects that can be addressed in a 64-bit
JVM. Therefore System.identityHashCode(Object) cannot, in a 64-bit VM
at least, guarantee to return unique values for all objects on the heap.

The fact that he didn't mention that method in his question, but instead
referenced my comments about "address", meant it would've taken quite the leap
for me to understand that context.

Nothing in Eric's post makes mention of System.identityHashCode(Object). How
did you infer it?
 
Lew

The fact that he didn't mention that method in his question, but instead
referenced my comments about "address", meant it would've taken quite
the leap for me to understand that context.

Nothing in Eric's post makes mention of
System.identityHashCode(Object). How did you infer it?

Perhaps you meant that Eric meant to refute Twisted's assertion, not
referenced in his post but repeated here:
I suggest you use System.identityHashCode(Object) to get these
numbers. It should be a) fixed for an object's lifetime in one session
(it will change when the object is serialized and later deserialized);
b) globally unique (within the one JVM anyway) as the usual
implementation of the default hash code for Object is the memory
address of that object, which is necessarily globally unique in that
scope; and c) not subject to being overridden unlike calling
hashCode() on the object. This of course works if you need a globally

b) is wrong on two counts. There is no guarantee of uniqueness within the
JVM, an application or otherwise to Object.hashCode(), as its docs clearly
state. Also, it is not correct that the "usual implementation of the default
hash code for Object is the memory address of that object". a) is simply a
rehash of the Javadocs' comments. (Pun intended.)

Why he would quote unrelated comments to refute that point, only he can say,
if indeed that is what happened. I took his remarks at face value.
 
Stefan Ram

Hal Vaughan said:
Okay, I got one working. Thanks!

In the meantime, I realized that it would be
simpler to put the field

private static int value = 0;

directly into the class »Identifier« and eliminate
the class »globalCounter«.
 
Eric Sosman

Lew said:
Daniel said:
Eric Sosman wrote:
Lew wrote:
Hal Vaughan wrote:
So is it only in extreme cases like this where hashcodes would be
duplicated?

Hash codes have even fewer values than Strings. That means there
must be proportionately more collisions. Have you read the
Javadocs on the hashCode() method? You should. Also read the
Javadocs on Map, HashMap and IdentityHashMap.

As Twisted pointed out, the "Identity", i.e., the internal
"address" of an object, is unique for the lifetime of that object.
[...]
Can you find this guarantee in the Javadoc or other
authoritative place? Does this rule out 64-bit JVM's?

You mean the guarantee that an object's "address" is unique during
its lifetime? How else would the JVM find a particular instance? In
other words, how could it possibly not be?

If two objects had the same "address", then a reference using that
"address" would not reference a single object, which contradicts the
very definition of an object reference.
<http://java.sun.com/docs/books/jls/third_edition/html/execution.html#12.5>

a reference to the newly created object is returned as the result
[of] the indicated constructor

There is no way for one "address" to point to two objects
simultaneously.

This question has nothing to do with bit width, AFAICS. I'm not
really sure how there could even be a question here.

I think Eric's point is that the number of possible hash codes is
smaller than the number of objects that can be addressed in a 64-bit
JVM. Therefore System.identityHashCode(Object) cannot, in a 64-bit VM
at least, guarantee to return unique values for all objects on the heap.

The fact that he didn't mention that method in his question, but instead
referenced my comments about "address", meant it would've taken quite
the leap for me to understand that context.

Nothing in Eric's post makes mention of
System.identityHashCode(Object). How did you infer it?

The inferential error, if there was one, was mine: When
you wrote about the "Identity" of an Object, I assumed you
meant its System.identityHashCode() value. My assumption was
(I thought) strengthened when you described the "Identity" as
"the internal `address'" of the object, which matches the
highly suggestive (but not 100% prescriptive) Javadoc. Did
you have some other way to find "the internal `address'" of a
Java object? System.hashCode() is the closest thing I can
think of, but there are many things I haven't thought of ...

And yes, the point of my question was that a 64-bit JVM
can (given enough heap) create more distinct objects than
there are hashCode() or identityHashCode() values. I was
attempting what's known as "Socratic questioning;" Socrates
was evidently better at it than I am -- and he got poisoned
for it, so perhaps I ought to quit while I still have the
option ...
 
rossum

I have a case where I'll need distinct and printable names to use in a
reference table. I'd like to make it so each object, whether it's of the
same class as any other object or not, can produce a distinct number. It
looks like if I get the hashcode for any object, the JVM attempts to give
each object a unique hashcode, but it doesn't seem to guarantee it.

Is there any way to get a unique string or number for each object that is
created by a particular JVM?

Thanks!

Hal

One possibility that has not yet been mentioned in this discussion is
to use a cryptographic hash or Message Digest. SHA-256 produces a 256
bit hash, so collisions are much less likely than with a 32 bit
integer hash. The cryptographic hash function will run rather more
slowly than the integer hash function though - nothing comes for free.

Depending on the providers available in your Java implementation have
a look at:

MD5: 128 bit output

SHA-1: 160 bit output

SHA-256: 256 bit output

Both MD5 and SHA-1 have cryptographic weaknesses, but they are fine
for non-cryptographic purposes. MD5 is faster than either of the
SHAs.
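
A minimal sketch of computing such digests with java.security.MessageDigest, assuming each object can be rendered as text that describes its content. The class name DigestNames, the helper hexDigest and the string "component-42" are made up for illustration. Note that two objects with identical content would get the same digest, so this names content rather than object identity.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class DigestNames {
    // Returns the digest of the given text as a lowercase hex string.
    static String hexDigest(String algorithm, String text) {
        MessageDigest md;
        try {
            md = MessageDigest.getInstance(algorithm);
        } catch (NoSuchAlgorithmException e) {
            // The provider does not offer this algorithm.
            throw new IllegalStateException(e);
        }
        byte[] hash = md.digest(text.getBytes(StandardCharsets.UTF_8));
        StringBuilder sb = new StringBuilder(hash.length * 2);
        for (byte b : hash) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "component-42" is a hypothetical description of an object's content.
        for (String algorithm : new String[] {"MD5", "SHA-1", "SHA-256"}) {
            System.out.println(algorithm + ": " + hexDigest(algorithm, "component-42"));
        }
    }
}
```

The hex strings are printable, at 32, 40 and 64 characters respectively.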

rossum
 
Hal Vaughan

Stefan said:
In the meantime, I realized that it would be
simpler to put the field

private static int value = 0;

directly into the class »Identifier« and eliminate
the class »globalCounter«.

That's how I thought of it. I might have seen it before and forgotten it,
since it came so easily to me. I did this:

protected static int componentID = 0;

protected int myID = componentID++;

This was in the superclass for all the subclasses that need distinct IDs.
It covers more than my original idea, but that helps in the long run.
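
Put together, the two fields might sit in a superclass roughly like this. It is only a sketch: the class names BaseComponent, Gear and Lever and the getID() accessor are invented for illustration, and the final modifier on myID is an addition. It assumes all instances are created on a single thread, since componentID++ is not atomic.

```java
class BaseComponent {
    // Shared by every instance of BaseComponent and its subclasses.
    protected static int componentID = 0;

    // Assigned once per instance, in construction order.
    protected final int myID = componentID++;

    int getID() {
        return myID;
    }

    public static void main(String[] args) {
        BaseComponent a = new Gear();
        BaseComponent b = new Lever();
        BaseComponent c = new Gear();
        // Prints "0 1 2" when run in a fresh JVM.
        System.out.println(a.getID() + " " + b.getID() + " " + c.getID());
    }
}

// Hypothetical subclasses; they inherit the ID without any extra code.
class Gear extends BaseComponent {}

class Lever extends BaseComponent {}
```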

Hal
 
Mark Thornton

Twisted said:
I suggest you use System.identityHashCode(Object) to get these
numbers. It should be a) fixed for an object's lifetime in one session
(it will change when the object is serialized and later deserialized);
b) globally unique (within the one JVM anyway) as the usual
implementation of the default hash code for Object is the memory
address of that object, which is necessarily globally unique in that
scope; and c) not subject to being overridden unlike calling

While the object's address may be unique (amongst objects existing at a
given time), the value returned by System.identityHashCode is NOT
guaranteed to be unique. Indeed in some cases it couldn't be. The
hashCode is a 32 bit integer, but a 64 bit VM could have more than 2^32
objects, in which case some of those objects would have the same hash code.

Mark Thornton
 
Stefan Ram

Mark Thornton said:
While the object's address may be unique (amongst objects existing at a
given time), the value returned by System.identityHashCode is NOT
guaranteed to be unique. Indeed in some cases it couldn't be. The
hashCode is a 32 bit integer, but a 64 bit VM could have more than 2^32
objects, in which case some of those objects would have the same hash code.

I am somewhat disappointed that no one has yet
reported results from the program I posted in

<[email protected]>

public class Main
{ final static java.lang.String lineSeparator =
    java.lang.System.getProperty( "line.separator" );

  public static void main( final java.lang.String[] args )
  { final java.lang.Object object = new java.lang.Object();
    final int code = object.hashCode();
    java.lang.Object object1;
    int code1;
    /* create objects until one has the same hash code as the first */
    do
    { code1 =( object1 = new java.lang.Object() ).hashCode(); }
    while( code1 != code );
    java.lang.System.out.print
    (( object == object1 )+ lineSeparator +
      code + lineSeparator +
      code1 + lineSeparator ); }}
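
The loop above has no upper bound. A bounded variant could look like the sketch below. It is not the posted program: the class name CollisionSearch, the method findCollision and the limit of 1,000,000 are assumptions chosen for illustration. It keeps every created object reachable and compares each new identity hash code against all earlier ones, so any pair it reports consists of two objects that are alive at the same time.

```java
import java.util.HashMap;
import java.util.Map;

class CollisionSearch {
    // Creates up to maxObjects objects, keeping them all reachable, and returns
    // the first pair of distinct live objects that share an identity hash code.
    // Returns null if no collision shows up within the limit.
    static Object[] findCollision(int maxObjects) {
        Map<Integer, Object> seen = new HashMap<>();
        for (int i = 0; i < maxObjects; i++) {
            Object candidate = new Object();
            Object previous = seen.put(System.identityHashCode(candidate), candidate);
            if (previous != null) {
                return new Object[] { previous, candidate };
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // The limit is arbitrary; it only caps memory use and run time.
        Object[] pair = findCollision(1_000_000);
        if (pair == null) {
            System.out.println("No collision within the limit");
        } else {
            System.out.println("Same object? " + (pair[0] == pair[1]));
            System.out.println(System.identityHashCode(pair[0]));
            System.out.println(System.identityHashCode(pair[1]));
        }
    }
}
```

Whether a collision turns up within the limit depends on the JVM, so the outcome is not predictable in advance.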
 
Lew

Eric said:
The inferential error, if there was one, was mine: When
you wrote about the "Identity" of an Object, I assumed you
meant its System.identityHashCode() value. My assumption was

Oh, I see. No, I'd've said the method name in full if I meant it.

There is nothing in the System.identityHashCode() Javadocs to suggest that it
returns unique values even in a 32-bit JVM. I was referring to the rather
complex and unspecified actual JVM "address", to which I refer always in
quotes because it is not an address as thought of in many other machine
architectures.

(I thought) strengthened when you described the "Identity" as
"the internal `address'" of the object, which matches the

Yes, but not the System.identityHashCode().

highly suggestive (but not 100% prescriptive) Javadoc. Did

I don't think it's "highly" suggestive at all. They say they implement the
Object.hashCode() method as a conversion to int from the "internal address" of
an object. They clearly do not specify that conversion, nor what they mean by
the "internal address". A cursory examination of how such "internal
addresses" are implemented reveals, for example, that Sun's implementation is
a pointer to a table of a pair of pointers to locations in the class area and
the heap. Clearly this is "converted" to int via an algorithm that, as has
been documented and restated, reduces the value set from the domain ("internal
addresses") to the range (int). There is nothing in the Sun document from
which one can conclude that this conversion results in a unique value; /au
contraire/ the documents for hashCode(), and transitively
System.identityHashCode(), explicitly warn that the value cannot be guaranteed
to be unique. Again, 32-bit or 64-bit makes no difference.

you have some other way to find "the internal `address'" of a
Java object?

No, nor was this a topic in this thread. There was a misstatement upthread
that the "internal address" is somehow equivalent to the hashCode(), but that
is neither my nor Sun's fault.

I find it generally irrelevant to look for this "address". It is enough to
hold a reference in a variable.

System.hashCode() is the closest thing I can
think of, but there are many things I haven't thought of ...

I assume you mean Object.hashCode(), which System.identityHashCode() invokes.

There is nothing in the return value of either method that is structurally or
meaningfully similar to the address of the object. These methods return an
int; a JVM "address" is certainly not an int, nor conceptually representable
as a one-to-one mapping to the int range. Nor is the "address" required to
remain "constant" during the lifetime of the object, unlike the return value
of hashCode(). The two are in completely different semantic spaces.

And yes, the point of my question was that a 64-bit JVM
can (given enough heap) create more distinct objects than
there are hashCode() or identityHashCode() values. I was

So does a 32-bit JVM, potentially. Neither method guarantees a unique result,
within the JVM, within an application, or within a moment. That is why both
HashMap and IdentityHashMap have to resolve collisions. Even in a 32-bit
environment.

However, the object's "address", whatever that is, is perforce unique.
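
A small sketch of both points. The class name CollisionDemo and the helper distinctKeys are invented for illustration. The strings "Aa" and "BB" share a hash code, yet HashMap keeps them apart by falling back on equals(); IdentityHashMap compares keys with == and so treats two equal but distinct String objects as separate keys.

```java
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.Map;

class CollisionDemo {
    // Inserts both keys and reports how many distinct keys the map ended up with.
    static int distinctKeys(Map<String, Integer> map, String a, String b) {
        map.put(a, 1);
        map.put(b, 2);
        return map.size();
    }

    public static void main(String[] args) {
        // Different strings, same hashCode() (both are 2112).
        System.out.println("Aa".hashCode() == "BB".hashCode());       // true
        System.out.println(distinctKeys(new HashMap<>(), "Aa", "BB")); // 2

        String first = new String("key");
        String second = new String("key");
        System.out.println(distinctKeys(new HashMap<>(), first, second));         // 1
        System.out.println(distinctKeys(new IdentityHashMap<>(), first, second)); // 2
    }
}
```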
 
Hal Vaughan

Stefan said:
I am somewhat disappointed that no one has yet
reported results from the program I posted in

One point here: I'm self taught. There's a lot I know I don't know. I
don't see what limits the loop. Just how much will this do and will it
slow down a system or anything like that? I'm asking because I don't have
a spare system to test anything on now.

Hal
 
Twisted

[snip]

Uh-oh. Some people are apparently on the warpath again, and I've been
attacked and accused of stuff.

Disregard all disparaging comments directed towards me. None of them
are true. They are to be ignored.

Regarding identityHashCode() -- I have it on good authority that the
Sun JVM implementation, and the typical implementation, uses the RAM
address of the object's handle (which isn't moved by compacting gc).
This address is necessarily unique for objects of overlapping
lifetime. The 32 bit code derived from it is not guaranteed unique on
a 64-bit system, but the odds of a collision are still
minuscule unless the system is vastly larger than any current hardware
can cope with. (The running Java app would have to occupy a gig or more
with just the objects that need unique IDs before a collision is
remotely likely.)

Of course the slight risk might still be intolerable. As with the risk
that occurs when using a JVM that may not use the RAM address, it's
probably also quite small. The RAM address of the handle is a free
collisionless hash on 32-bit architectures, and still gives a very
good hash distribution on 64-bit ones, so it is difficult to imagine
why anyone would implement a JVM to do something more complicated that
probably gives a poorer distribution of hashes and more collisions,
unless it was radically different in its guts, say not even having a
handle with a pointer to the class and a pointer to the instance, and
then it's hard to see how they could make GC work...

Regardless, the OP has since revealed that they control a base class
of the classes that need the IDs, which makes it simple to solve their
problem with zero risk of collisions. The method was also mentioned in
my earlier post, though this fact seems to have gone unacknowledged:

public class Base {
    // should stay unique even on 64-bit architectures,
    // or with long running systems
    public final long id;
    private static long idGenerator;

    public Base () {
        synchronized (Base.class) {
            id = idGenerator;
            idGenerator++;
        }
    }
    // ...
}

If you don't construct Base instances in more than one thread at a
time, you can dump the synchronization. Otherwise it is needed to
prevent race conditions with accessing and incrementing idGenerator,
which could result in two objects getting the same id at the same
time, and the next id in sequence being skipped.
 
Stefan Ram

Hal Vaughan said:
One point here: I'm self taught. There's a lot I know I don't
know. I don't see what limits the loop. Just how much will
this do and will it slow down a system or anything like that?
I'm asking because I don't have a spare system to test anything
on now.

Usually, an operating system allows one to terminate a JVM at
any time chosen.
 
Twisted

The RAM address of the handle is a free
collisionless hash on 32-bit architectures, and still gives a very
good hash distribution on 64-bit ones, so it is difficult to imagine
why anyone would implement a JVM to do something more complicated that
probably gives a poorer distribution of hashes and more collisions.

Actually, it occurs to me on rereading that to get a better
distribution on 64-bit architectures, you:
a) use the handle's address on 32 bit architectures; and
b) use bits 35 to 4 on 64 bit architectures.

This is because on 64 bit architectures, you'd be implementing your
JVM with a 16-byte handle (two eight-byte pointers, one to the class
and one to the instance) and for speed aligning them on 16-byte
boundaries, and surely packing them densely in a particular part of
memory regardless. This means you can drop the low-order 4 bits from
the handle address as probably-all-zeros and surely-all-the-same; they
don't contribute any distinctness to hash values derived from the
address. After that, using the least significant remaining 32 bits
gives best results, as anything but a huge system has a lot of zeros
in the high order bits that also contribute no distinctiveness to the
hash values. On the other hand, zeros creeping into bit 35 or below
isn't a worry as it just means you have less than 2^32 objects -- and
if the handles are contiguous in memory, this actually means there
will not be any collisions at all.

(And remember that as far as the machine/CPU is concerned, addresses
are just integers of some size or another; distinctions between int
and pointer or reference are artifacts of the higher level language.)
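
The bit selection described above can be sketched as follows. This only illustrates the arithmetic and makes no claim about how any actual JVM computes hash codes. The class name HandleHash, the method hashFromAddress and the base address are made up, and handle addresses are modelled as plain long values aligned on 16-byte boundaries.

```java
class HandleHash {
    // Drops the low 4 bits (assumed identical because of 16-byte alignment)
    // and keeps the next 32 bits, i.e. bits 35..4 of the address.
    static int hashFromAddress(long handleAddress) {
        return (int) (handleAddress >>> 4);
    }

    public static void main(String[] args) {
        long base = 0x0000_7f00_0000_0000L; // made-up start of a handle table
        for (int i = 0; i < 4; i++) {
            long address = base + 16L * i;
            System.out.printf("%#x -> %d%n", address, hashFromAddress(address));
        }
    }
}
```

Contiguous handles get consecutive values under this scheme, while two handles whose addresses differ by a multiple of 2^36 bytes collide, which matches the limit of 2^32 handles mentioned above.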
 
Hal Vaughan

Stefan said:
Usually, an operating system allows one to terminate a JVM at
any time chosen.

Yes, but sometimes a program grabs so many resources that it can take a long
time to type in "kill -9 ----".

I'll try it when I don't have the IDE up and I'm trying to get this current
job finished.

Hal
 
Hal Vaughan

Twisted wrote:

....
public class Base {
    // should stay unique even on 64-bit architectures,
    // or with long running systems
    public final long id;
    private static long idGenerator;

    public Base () {
        synchronized (Base.class) {
            id = idGenerator;
            idGenerator++;
        }
    }
    // ...
}

There may be nothing to this, but, as I've said in this thread before, and
said on this group many times, being self taught, I know there are many
things I've missed. Is there any particular reason for you using this:

id = idGenerator;
idGenerator++;

Instead of this:

id = idGenerator++;

If you don't construct Base instances in more than one thread at a
time, you can dump the synchronization. Otherwise it is needed to
prevent race conditions with accessing and incrementing idGenerator,
which could result in two objects getting the same id at the same
time, and the next id in sequence being skipped.

While this is working with Swing, all the objects are created before the
first interactive window opens, so, no, race conditions are not an issue.

Hal
 
Daniel Dyer

[snip]

Uh-oh. Some people are apparently on the warpath again, and I've been
attacked and accused of stuff.

I'd hardly call it an attack. Lew's comments addressed his disagreement
with what you wrote, he said nothing about you yourself. This is a forum
for discussion, disagreements are to be expected.

Disregard all disparaging comments directed towards me. None of them
are true. They are to be ignored.

Right...

Regarding identityHashCode() -- I have it on good authority that the
Sun JVM implementation, and the typical implementation, uses the RAM
address of the object's handle (which isn't moved by compacting gc).

The Javadocs for Object.hashCode() say:

"As much as is reasonably practical, the hashCode method defined by class
Object does return distinct integers for distinct objects. (This is
typically implemented by converting the internal address of the object
into an integer, but this implementation technique is not required by the
Java(TM) programming language.)"

Which seems to be entirely consistent with what you are saying. So I'm
not sure where Lew is coming from when he says:
it is not correct that the "usual implementation of the default hash code
for Object is the memory address of that object"

Unless he merely means that the value returned is not necessarily the
address itself but is trivially derived from the address.

Dan.
 
Daniel Dyer

Twisted wrote:

...

There may be nothing to this, but, as I've said in this thread before, and
said on this group many times, being self taught, I know there are many
things I've missed. Is there any particular reason for you using this:

id = idGenerator;
idGenerator++;

Instead of this:

id = idGenerator++;

The first example is less confusing. The single-line variant is modifying
two variables. And if you don't think the second example has the
potential for confusion, you may be surprised that it is not semantically
equivalent to the first (it does something different).

You might want to write a little program to demonstrate the difference
between these two assignments.

id = idGenerator++;

id = ++idGenerator;


Anyway, for a simpler version of the same idea implemented in Twisted's
code, just use java.util.concurrent.atomic.AtomicLong. It deals with the
synchronisation and incrementing for you.
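
A sketch of what that might look like. The class name AtomicBase and the helper createConcurrently are invented for illustration; the helper exists only to exercise the constructor from several threads.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

class AtomicBase {
    private static final AtomicLong idGenerator = new AtomicLong();

    public final long id;

    public AtomicBase() {
        // getAndIncrement() reads the current value and increments it as one
        // atomic step, so no synchronized block is needed.
        id = idGenerator.getAndIncrement();
    }

    // Creates instances on several threads and collects the ids they received.
    static Set<Long> createConcurrently(int threads, int perThread) {
        Set<Long> ids = ConcurrentHashMap.newKeySet();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    ids.add(new AtomicBase().id);
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        // The set holds 4000 entries only if every id was distinct.
        System.out.println(createConcurrently(4, 1000).size());
    }
}
```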

Dan.
 
Daniel Dyer

The first example is less confusing. The single-line variant is
modifying two variables. And if you don't think the second example has
the potential for confusion, you may be surprised that it is not
semantically equivalent to the first (it does something different).

Sorry that's completely wrong.

It is equivalent but, as my post aptly demonstrates ;), it's not as
obvious.
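
A small program to check this, as a sketch; the class and method names are made up. It runs the two-statement form, the post-increment form and the pre-increment form from the same starting value.

```java
class IncrementDemo {
    static long idGenerator;

    // Two-statement form, as in the Base constructor.
    static long twoStatements() {
        long id = idGenerator;
        idGenerator++;
        return id;
    }

    // Post-increment: assigns the old value, then increments.
    static long postIncrement() {
        long id = idGenerator++;
        return id;
    }

    // Pre-increment: increments first, then assigns the new value.
    static long preIncrement() {
        long id = ++idGenerator;
        return id;
    }

    public static void main(String[] args) {
        idGenerator = 0;
        System.out.println(twoStatements()); // 0
        idGenerator = 0;
        System.out.println(postIncrement()); // 0
        idGenerator = 0;
        System.out.println(preIncrement());  // 1
    }
}
```

In all three cases idGenerator ends up at 1; only the value assigned to id differs.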

Dan.
 
Lew

Daniel said:
"As much as is reasonably practical, the hashCode method defined by
class Object does return distinct integers for distinct objects. (This
is typically implemented by converting the internal address of the
object into an integer, but this implementation technique is not
required by the Java(TM) programming language.)"

Which seems to be entirely consistent with what you are saying. So I'm
not sure where Lew is coming from when he says:

Because they do not say the algorithm for "converting the internal address".
Which part of the JVM address would they use? The handle, which comprises a
pointer to two other pointers in some Sun implementations? The pointer
values? The heap offset into which one of the pointers points? The pointer
to the class area? All these ingredients are necessary to make up a real
"address" in the JVM, but only an int appears in the hashCode() output.
Also, notice the words, "As much as is reasonably practical". They are
telling you right in the Javadocs that it is not a guarantee.

Once again, the int of a hashCode() and the "address" of an object have
different structures, different interpretations and different semantic spaces.

It isn't. One implementation is to derive the int from some part of the
"address" of the object by an unspecified algorithm. The derived int is not
the same thing as the source "address".
Unless he merely means that the value returned is not necessarily the
address itself but is trivially derived from the address.

Maybe trivially, maybe not, but certainly derived from, and not equivalent to.
It can't be. As others pointed out, the theoretical space of addresses is
larger than can be represented in 32 bits.

All of this goes to what others on this thread have also pointed out, that
identityHashCode() and the underlying Object.hashCode() cannot be relied upon
to achieve a guaranteed unique handle for an object.
 
