fast hashCode for an array.

O

Oz Levanon

Hi, I'm implementing a hashCode method for a class that holds (among
other members) a byte array (byte []).

Currently, to get the hashCode of the byte[], I'm using:
int hashCode = new String(data).hashCode();

Since this class will be used many times in a Hashtable, I wanted to
know if that's an efficient way to get the hashCode, since the
creation of a new String must be time consuming.

The other methods I know of that return a number representing the
array (Adler32, MD5, CRC32) are even slower.

Is there a faster way to get the hashCode for an array of primitives?

TIA, Oz.
 
M

Michael Borgwardt

Oz said:
Hi, I'm implementing a hashCode method for a class that holds (among
other members) a byte array (byte []).

Currently, to get the hashCode of the byte[], I'm using:
int hashCode = new String(data).hashCode();

I'd avoid this, not because of speed issues, but because it's
dependent on the platform default encoding and abuses functionality.
Since this class will be used many times in a Hashtable, I wanted to
know if that's an efficient way to get the hashCode, since the
creation of a new String must be time consuming.

The other methods I know of that return a number representing the
array (Adler32, MD5, CRC32) are even slower.

Is there a faster way to get the hashCode for an array of primitives?

Are you really certain that speed of the hashCode implementation is really
so much of an issue? Have you run a profiler?
Premature optimzation is a Bad Thing.

I'd use Adler32 (which is intended to produce something akin to a hashCode
an is faster than the others) and if speed really is an issue, copy the
bytes into an int in chunks of 4 using shifts and the ^ (XOR) operator.
 
M

Murray

Oz Levanon said:
Hi, I'm implementing a hashCode method for a class that holds (among
other members) a byte array (byte []).

Currently, to get the hashCode of the byte[], I'm using:
int hashCode = new String(data).hashCode();

Since this class will be used many times in a Hashtable, I wanted to
know if that's an efficient way to get the hashCode, since the
creation of a new String must be time consuming.

The other methods I know of that return a number representing the
array (Adler32, MD5, CRC32) are even slower.

Is there a faster way to get the hashCode for an array of primitives?

TIA, Oz.

Hi Oz,

The creation of a String from a byte array is rather complicated if you look
at the source code. I didn't go all the way through the code, but I would
assume at some point iteration over the byte array would occur. Then
String.hashCode() also iterates over the resulting char array.

You could write a utility method that iterates only once. I've done
something like this in the past:

public static int hashCode(byte[] bytes) {
int hash = 9; // arbitrary seed value
int multiplier = 13; // arbitrary multiplier value
for (int i = 0; i < bytes.length; i++) {
hash = hash * multiplier + bytes;
}
return hash;
}

If the byte[] is large and performance is a major issue, you might also want
to consider not even using the byte[] in your hashCode calculation. It's a
common misconception that hashCode must provide a unique value for an
object; this is not the case. If there are other values/Objects in your
class whose hashes will provide a reasonable diversity of hashCode values,
then you don't really need to include the byte[] hash.

Out of interest, how are you implementing the equals() method?

If thread safety is a non-issue, HashMap is faster than Hashtable. Also, if
you find you're implementing equals and hashCode frequently, the Jakarta
Commons Lang package has the very useful EqualsCodeBuilder and
HashCodeBuilder. http://jakarta.apache.org/commons/lang/
 
C

Chris Smith

Murray said:
If the byte[] is large and performance is a major issue, you might also want
to consider not even using the byte[] in your hashCode calculation. It's a
common misconception that hashCode must provide a unique value for an
object; this is not the case. If there are other values/Objects in your
class whose hashes will provide a reasonable diversity of hashCode values,
then you don't really need to include the byte[] hash.

Which is 99.9999% true. It's worth making the actual true statement,
though, which is something like this:

The hashCode needs to provide the same value for objects that are
equal, but it need not always provide different values for different
objects.

Here's an example of a case where Murray's statement isn't true, but the
contract-based statement is:

public class ModArray
{
private int mod;
private byte[] data;

...

public void equals(Object obj)
{
if (!(obj instanceof ModArray)) return false;
ModArray other = (ModArray) obj;

if (data.length != other.data.length) return false;
for (int i = 0; i < data.length; i++)
{
int a = data % mod;
int b = other.data % other.mod;

if (a != b) return false;
}

return true;
}

public int hashCode()
{
...
}
}

The only way, in the code above, to implement hashCode correctly without
walking through all the elements of the byte[] would be to return a
constant value. You cannot base the hash code on the 'mod' field
without also taking into account the array contents.

This was, of course, an extreme example, but you should be careful with
partial-data hashcodes whenever equality depends on the relationship
between data, rather than just the pure data itself.

--
www.designacourse.com
The Easiest Way to Train Anyone... Anywhere.

Chris Smith - Lead Software Developer/Technical Trainer
MindIQ Corporation
 
C

Chris Smith

Chris said:
The only way, in the code above, to implement hashCode correctly without
walking through all the elements of the byte[] would be to return a
constant value. You cannot base the hash code on the 'mod' field
without also taking into account the array contents.

Oops. Amend by deleting the first "all". It's entirely legal to
examine only the first n elements (for any value of n), and thus keep
the hashcode calculation O(1) on the size of the array. Sorry for the
typo.

--
www.designacourse.com
The Easiest Way to Train Anyone... Anywhere.

Chris Smith - Lead Software Developer/Technical Trainer
MindIQ Corporation
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,763
Messages
2,569,562
Members
45,038
Latest member
OrderProperKetocapsules

Latest Threads

Top