hashcode calculation for a Collection of objects

J

Jimmy

If 2 Collections contain 2 List of objects, and the type of this
object has hashCode() override (hashcode calculation using all fields
in this object), can these 2 different List instances compare by using
list1.hashCode() == list2.hashCode(); without iterate through each
list by comparing each item?

Thanks,
Jimmy
 
?

=?ISO-8859-1?Q?Arne_Vajh=F8j?=

Jimmy said:
If 2 Collections contain 2 List of objects, and the type of this
object has hashCode() override (hashcode calculation using all fields
in this object), can these 2 different List instances compare by using
list1.hashCode() == list2.hashCode(); without iterate through each
list by comparing each item?

Depends on what semantics you want.

The comparison is valid Java, but it returning true does not
guarantee identical elements.

Multiple values gets mapped to the same hash code, so
same value implies same hash code but same hash code
does not imply same value.

Arne
 
R

rossum

If 2 Collections contain 2 List of objects, and the type of this
object has hashCode() override (hashcode calculation using all fields
in this object), can these 2 different List instances compare by using
list1.hashCode() == list2.hashCode(); without iterate through each
list by comparing each item?

Thanks,
Jimmy
Assuming that hashCode() is implemented correctly then equal hash
codes is neccessary, but not sufficient for the two lists to be equal.

Basically you are going to have to check that two lists with equal
hash codes actually are equal.

If the hash codes are different then you can be sure that the two
lists are different.

rossum
 
D

Daniel Pitts

If 2 Collections contain 2 List of objects, and the type of this
object has hashCode() override (hashcode calculation using all fields
in this object), can these 2 different List instances compare by using
list1.hashCode() == list2.hashCode(); without iterate through each
list by comparing each item?

Thanks,
Jimmy

Assuming that the Collection implementations are the same, and List
implementations are the same, you can call c1.equals(c2) (that way you
don't have to iterate over them yourself)

If that is an expensive operation, then you can use "c1.hashCode() ==
c2.hashCode() && c1.equals(c2)"

Read <http://java.sun.com/j2se/1.4.2/docs/api/java/util/
Collection.html#equals(java.lang.Object)> for more information about
equals() and Collections.
 
R

Roedy Green

If 2 Collections contain 2 List of objects, and the type of this
object has hashCode() override (hashcode calculation using all fields
in this object), can these 2 different List instances compare by using
list1.hashCode() == list2.hashCode(); without iterate through each
list by comparing each item?

One way to compute a collective hashCode is to xor the individual
hashCodes. ^ is the xor operator. Comparing collective hashCodes
gives you are rough measure of equality. If the hashCodes don't
match, then the collections are definitely different. If the hashCodes
DO match then the collections are likely the same, and you can then
use another method to verify.

In comparing objects your equals method might first compare hashCodes
as a way of quickly discovering inequality based on input from all
relevant fields.
 
L

Lew

Daniel said:
Assuming that the Collection implementations are the same, and List
implementations are the same, you can call c1.equals(c2) (that way you
don't have to iterate over them yourself)

If that is an expensive operation, then you can use "c1.hashCode() ==
c2.hashCode() && c1.equals(c2)"

This will more than double the time to evaluate equal lists and increase the
time for all others, compared to just using equals():

The hash code of a list is defined to be the result of the following calculation:

int hashCode = 1;
Iterator<E> i = list.iterator();
while (i.hasNext()) {
E obj = i.next();
hashCode = 31*hashCode + (obj==null ? 0 : obj.hashCode());
}

If anything, because equals() doesn't need to perform the multiply-add
operations the simpler approach is probably faster always.

Extra points to those who know why List.hashCode() multiplies by the magic
number 31.
 
P

Patricia Shanahan

Lew said:
This will more than double the time to evaluate equal lists and increase
the time for all others, compared to just using equals():



If anything, because equals() doesn't need to perform the multiply-add
operations the simpler approach is probably faster always.

Also, equals can stop early. The AbstractList implementation stops
without examining any elements if the two references are equal or the
other is not a List. Given two distinct List references, it stops at the
first inequality or when it runs out of elements in the shorter List.

Patricia
 
E

Eric Sosman

Roedy Green wrote On 08/07/07 20:40,:
If 2 Collections contain 2 List of objects, and the type of this
object has hashCode() override (hashcode calculation using all fields
in this object), can these 2 different List instances compare by using
list1.hashCode() == list2.hashCode(); without iterate through each
list by comparing each item?


One way to compute a collective hashCode is to xor the individual
hashCodes. ^ is the xor operator. [...]

The Javadoc for List specifies how hashCode()
should be computed. Since List is an interface there
is no language-supported way to enforce what the Javadoc
requires (nor to detect buggy implementations of it),
but the specification is there nonetheless.

Similarly, Set specifies how hashCode() is to be
computed -- and calls for a different algorithm than
List uses. (List's hashCode() depends on the order of
the elements as well as on their hashCodes; Set has no
concept of order.)

So: Any two Lists (of any type) containing equal
elements in the same order will have equal hashCodes if
their hashCode() methods are implemented as called for.
Also, any two Sets (of any type) containing equal elements
should have equal hashCodes. But equality doesn't hold
across different interfaces: A List with elements x,y,z
most likely has a different hashCode than a Set of x,y,z.
 
O

Olle

So: Any two Lists (of any type) containing equal
elements in the same order will have equal hashCodes if
their hashCode() methods are implemented as called for.
Also, any two Sets (of any type) containing equal elements
should have equal hashCodes. But equality doesn't hold
across different interfaces: A List with elements x,y,z
most likely has a different hashCode than a Set of x,y,z.

Just to clairify:

A List can never equal a Set, even if they contain the same elements
they are NOT the same (since the Set lacks the order information).

Cut from Collection's JavaDoc:

The general contract for the Object.equals method states that equals
must be symmetric (in other words, a.equals(b) if and only if
b.equals(a)). The contracts for List.equals and Set.equals state that
lists are only equal to other lists, and sets to other sets. Thus, a
custom equals method for a collection class that implements neither
the List nor Set interface must return false when this collection is
compared to any list or set. (By the same logic, it is not possible to
write a class that correctly implements both the Set and List
interfaces.)

Read more on List and Set if this is not enough proof :)

/Olle
 
D

Daniel Pitts

Just to clairify:

A List can never equal a Set, even if they contain the same elements
they are NOT the same (since the Set lacks the order information).

Cut from Collection's JavaDoc:

The general contract for the Object.equals method states that equals
must be symmetric (in other words, a.equals(b) if and only if
b.equals(a)). The contracts for List.equals and Set.equals state that
lists are only equal to other lists, and sets to other sets. Thus, a
custom equals method for a collection class that implements neither
the List nor Set interface must return false when this collection is
compared to any list or set. (By the same logic, it is not possible to
write a class that correctly implements both the Set and List
interfaces.)

Read more on List and Set if this is not enough proof :)

/Olle

If you really want to know if the list and set contain exactly the
same items, you can use
if (list.size() == set.size() && set.containsAll(list)) {
}
Not I specifically choose set.containsAll(), rather than
list.containsAll(), since set implementations tend to be more
efficient at this operation.
 
P

Patricia Shanahan

Daniel Pitts wrote:
....
If you really want to know if the list and set contain exactly the
same items, you can use
if (list.size() == set.size() && set.containsAll(list)) {
}
Not I specifically choose set.containsAll(), rather than
list.containsAll(), since set implementations tend to be more
efficient at this operation.

Suppose the list contains "A", "A", and the set contains "A" and "B"?

Testing the other way round would be correct, because the set cannot
contain duplicates. Alternatively, create a new HashSet containing the
elements of the List and test it for equality to the set.

Patricia
 
C

Christian

Roedy said:
One way to compute a collective hashCode is to xor the individual
hashCodes. ^ is the xor operator. Comparing collective hashCodes
gives you are rough measure of equality. If the hashCodes don't
match, then the collections are definitely different. If the hashCodes
DO match then the collections are likely the same, and you can then
use another method to verify.

In comparing objects your equals method might first compare hashCodes
as a way of quickly discovering inequality based on input from all
relevant fields.

XOR seems to be not the best idea to do this..
multiplication is probably better.. as with xor you will miss Objects if
they are even times in the list..
especially if you have lists of Boolean or other Objects with only a few
different hashcodes this would be bad..
 
T

Thomas Hawtin

Christian said:
multiplication is probably better.. as with xor you will miss Objects if
they are even times in the list..

You have to be very careful with multiplication. What if a number of the
hash values were even?

Tom Hawtin
 
K

kaldrenon

Alternatively, create a new HashSet containing the
elements of the List and test it for equality to the set.

Couldn't this easily break due to sets containing no duplicate?
Consider List X == ["A","A","B"] and Set Y == ["A","B"]. If you
convert X to a set it will become ["A","B"], will it not?

Unless the conditions of the algorithm guarantee uniqueness in both
Collections, converting a List to a Set sounds dangerous and could
lead to error.
 
L

Lew

Thomas said:
You have to be very careful with multiplication. What if a number of the
hash values were even?
Extra points to those who know why List.hashCode() multiplies by the magic number 31.

(Knuth explained it. Hint: the hash code calculation uses 32-bit integer
arithmetic without exceptions for overflow.)
 
P

Patricia Shanahan

kaldrenon said:
Alternatively, create a new HashSet containing the
elements of the List and test it for equality to the set.

Couldn't this easily break due to sets containing no duplicate?
Consider List X == ["A","A","B"] and Set Y == ["A","B"]. If you
convert X to a set it will become ["A","B"], will it not?

Unless the conditions of the algorithm guarantee uniqueness in both
Collections, converting a List to a Set sounds dangerous and could
lead to error.

Remember the problem description this was intended to solve: "If you
really want to know if the list and set contain exactly the
same items..."

Your X and Y do contain exactly the same items, so the comparison should
return true.

Conversion of List to Set is an information-destroying operation, but in
this particular case, the information that gets destroyed is exactly the
superfluous information that prevents compareTo from getting the right
answer. Sometimes, it is good to destroy information.

I would agree with you if the problem were to determine if the list and
set contain the same items the same number of times each.

Patricia
 
K

kaldrenon

I would agree with you if the problem were to determine if the list and
set contain the same items the same number of times each.

This is what I had inferred the problem to be, hence my questioning of
your suggestion. But given a situation of only checking /which/ values
are in each collection, as opposed to how many of each, you are right
that moving a List into a Set is what will be needed.
 
R

Roedy Green

XOR seems to be not the best idea to do this..
multiplication is probably better.. as with xor you will miss Objects if
they are even times in the list..

the nice properties of XOR are:

1. it does not depend on order of computation.

2. it does not "waste" bits. If you change even one bit in one of the
components, the final value will change.

3. it is quick.

4. it does not tend to collapse the range of the digest into a
narrower band.

The nasty properties of XOR are:

1. it treats identical pairs of component as if they were not there.

2. it ignores order. A ^ B is the same a B ^ A. for that you want
some sort of checksum/digest such as Adlerian. See
http://mindprod.com/jgloss/digest.html
 
T

Twisted

On Aug 9, 10:36 pm, Roedy Green <[email protected]>
wrote:
[snip excellent discussion of pros and cons of xor in hashCode()]

One place xor would be acceptable would be where there is no chance of
duplication. A Set, say, except that the Set interface specifies, and
AbstractSet enforces, that the Set hashcode be the sum; nonetheless
xor would have worked too if Sun had chosen it. An object whose fields
are of different types can xor their hashCodes to get a hashCode for
itself safely, if none of these types are related in the type
hierarchy other than common descent from Object.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,769
Messages
2,569,582
Members
45,057
Latest member
KetoBeezACVGummies

Latest Threads

Top