hashCode() for Custom classes

A

Arne Vajhøj

Mark said:
They need to be random primarily on their lower order bits because most
hash functions will just use modulo arithmetic to clip the hash code to
the proper table size.

That should only be a problem if it is modulus of a power of two.
However they need to be random in all bits because you don't know the
table size ahead of time (other than it's, hopefully, less than the size
of an integer).

Since Java only supports arrays dimension/indexes of int, then
your hope will always come true in this case.

Arne
 
M

Mark Space

Logan said:
[1] Yes, it's true that the only thing that absolutely has to be
unique is the combination of source IP, source port, dest IP,
and dest port (and time), but in practice AFAIK TCP
implementations tend to just pick a new ephemeral port number

Nope. :)

Certain severs/NAT servers that follow a convention (I'd blame Microsoft
but this one really isn't their fault) that re-uses port numbers. In
other words, if you open a connection, then close it again, the standard
guarantees that the next port used will be the same as the last one.
You'll keep getting the same IP address and port number over and over.

That might or might not affect your hashing, depending on what you do
with your connection objects after the connection closes. (Get rid of
them really really quickly, I hope.)

Of course, with multiple connections in succession from the same
user-name and password, including those in the hash won't help, because
everything will still be the same, even though the connections are
different.


(BSD based servers do not use this convention, because it's a really
stupid security hole.)
 
M

Mark Space

Roedy said:
In JDK 1.0.x and 1.1.x the hashCode function for long Strings worked
by sampling every nth character. This pretty well guaranteed you would
have many Strings hashing to the same value, thus slowing down

I think they also included at most 16 samples from the string, spacing
the samples out at every nth character as you say.

Only using 16 samples gave even worse performance than just picking
every nth character because 16 characters gives very little to hash on,
really.
 
M

Mark Space

Roedy said:
XOR is going to much must faster than multiply. Multiply though

This hasn't been true since about 1990. Modern architectures make
bit-wise operations, integer math, and floating point take about the
same time, which is not much.

On Intel, read from cache, each one of these instructions will take the
same amount of time: one cycle.

ARM being a RISC architecture is similar but for different reasons.

Smaller CPUs excepted.
 
S

sasuke

Wow, thanks a lot for such overwhelming replies to this thread, I
never thought this amount of thinking went into devising a hashCOde()
function for custom classes. I really appreciate the time and effort
everyone has put into replying to this thread.

A few points I would like to put from my side.

I never thought of the possibility of XORing the values instead of
doing plain jane addition. I will keep in mind to always use XOR
instead of addition given the benefits of uniform distribution offered
by it though I am not sure how this can be proved mathematically.
(considering I am pretty bad at math)

As far as caching the hashCode computed, this can't be done since my
class is not immutable so I would end up tracking changes to my member
variables and re-calculating the hashCode() everytime a variable is
modified i.e. everytime a setter is called.

The reason I was considering the possibility of generating a hashCode
using the String representation of the individual elements
(member.toString()) was because I was pretty sure that given the good
implementation of the hashCOde() method of String class and being
pretty much sure that the concatentated string representation would be
almost always unique. I overlooked the performance and memory overhead
given that a lot many temporary string instances would be formed since
I was under the assumption that most of the strings created during
program execution are pooled and hence re-calculating the hashCode()
using pooled instances would be a breeze.

I also would keep in mind that only those fields which affect the
equals() be used in the computation of hashCode().

Thanks again for your help and time.

Thanks and regards,
~/Sasuke
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,776
Messages
2,569,603
Members
45,191
Latest member
BuyKetoBeez

Latest Threads

Top