junzhang1983 said:3ks
Peter said:I definitely would not want the hash codes to be 0 and 1.
Lew said:Please include a body to your posts.
Why not?
Because they only differ in one bit. It is often thought a good thing for
hashes of different objects to differ in as many bits as possible.
The low-order bits are particularly important, given that hashes are
often reduced to a smaller range by modding, but all the bits might be
used for something, so having them be different is a useful hedge.
That argument suggests that 0 and -1 might be the best values. I wonder
what Peter would think of that?
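The point about low-order bits can be illustrated with a minimal sketch (the bucket count and sample hashes are made up for illustration): a hash table with a power-of-two capacity reduces a hash to a bucket index by masking off all but the low-order bits, so only those bits decide where an entry lands.

```java
// Sketch: reducing a hash to a bucket index with a power-of-two table.
// Only the low-order bits of the hash survive the mask.
public class BucketReduction {
    public static void main(String[] args) {
        int buckets = 16; // hypothetical table size (power of two)
        // Hashes 0 and 1 differ only in the lowest bit:
        System.out.println(0 & (buckets - 1));  // bucket 0
        System.out.println(1 & (buckets - 1));  // bucket 1
        // A hash that matches another in its low four bits collides,
        // no matter how different the high bits are:
        System.out.println(17 & (buckets - 1)); // bucket 1, same as hash 1
    }
}
```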
Lew said:How often are Boolean hashes compared to Integer hashes?
Peter Duniho said:I'm not sure what you mean there. Ideally, just because a class has both
Integer and Boolean members, that wouldn't mean you'd actually be
_comparing_ Integer hashes to Boolean hashes.
By how many bits is it possible for a universe of two values to
differ?
1231 and 1237 differ in three bits. If differing in as many bits as
possible is good, why didn't they choose values that differed by more?
Patricia's argument that Boolean hashes need to differ from common
Integer hashes might apply, assuming that there's any benefit to
that.
How often are Boolean hashes compared to Integer hashes? How often are
hashcodes picked for a class so that it can be compared to a different
class?
I suspect that some early Java implementor decided that two arbitrary
prime numbers were just fine, and didn't happen to like 17.
The Integer hashCode() is the primitive int the Integer represents.
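The values under discussion can be checked directly; a minimal sketch:

```java
// Verify the hash values discussed in the thread.
public class HashFacts {
    public static void main(String[] args) {
        // Integer's hash code is simply the wrapped int value.
        System.out.println(Integer.valueOf(42).hashCode()); // 42
        // Boolean's hash codes are 1231 (true) and 1237 (false),
        // which differ in three bit positions:
        System.out.println(Boolean.TRUE.hashCode());         // 1231
        System.out.println(Boolean.FALSE.hashCode());        // 1237
        System.out.println(Integer.bitCount(1231 ^ 1237));   // 3
        // 0 and 1 differ in one bit; 0 and -1 differ in all 32:
        System.out.println(Integer.bitCount(0 ^ 1));         // 1
        System.out.println(Integer.bitCount(0 ^ -1));        // 32
    }
}
```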
public said:If the occupied positions are 14, 29, 33, 57, 193, 385, 772, 807, 901,
and 917, you get these hashes (reduced mod 16): 14, 13, 1, 9, 1, 1, 4, 7, 5, and 5.
Quite a few hash collisions.
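This collision count can be reproduced in a few lines (the mod-16 reduction is my reading of the example, matching the 16-bucket table implied later in the thread):

```java
// Reduce the example keys mod 16 and count how many land in an
// already-occupied bucket.
public class ModCollisions {
    public static void main(String[] args) {
        int[] keys = {14, 29, 33, 57, 193, 385, 772, 807, 901, 917};
        int[] bucketCounts = new int[16];
        for (int k : keys) {
            int bucket = k % 16; // Integer.hashCode(k) == k, reduced mod 16
            System.out.println(k + " -> bucket " + bucket);
            bucketCounts[bucket]++;
        }
        // Every key beyond the first in a bucket is a collision.
        int collisions = 0;
        for (int c : bucketCounts) collisions += Math.max(0, c - 1);
        System.out.println("collisions: " + collisions); // 3
    }
}
```

Three of the ten keys collide: 193 and 385 join 33 in bucket 1, and 917 joins 901 in bucket 5.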
public said:He did -- he said " 3ks". Not intelligible, but it does seem to
constitute a body.
Bletch!
That will have a pretty poor distribution of hash values, because the
integer values that arise in actual running code don't tend to be
completely randomly distributed. Integers richer in factors are more
common, as are smaller integers.
Multiplying the integer's value by a predetermined prime number near
(Integer.MAX_VALUE times 1.61803398) would have been a much better
choice of hash function, scattering the hash values pretty randomly
about the full range of integer values, varying a lot in each bit
position, whatever the input distribution.
[... more pseudo-numerological nonsense ...]
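What Patricia describes is multiplicative (Fibonacci) hashing. A minimal sketch, using the widely quoted constant 0x9E3779B9 (2^32 divided by the golden ratio) as the multiplier; this constant is my substitution, not necessarily the exact prime she had in mind:

```java
// Sketch of multiplicative (Fibonacci) hashing: multiplying by a
// golden-ratio-derived constant scatters nearby inputs across the
// full 32-bit range.
public class MultiplicativeHash {
    // floor(2^32 / golden ratio); an assumed stand-in for Patricia's
    // "prime near Integer.MAX_VALUE times 1.61803398".
    static final int MULTIPLIER = 0x9E3779B9;

    static int hash(int x) {
        return x * MULTIPLIER; // int multiplication wraps mod 2^32
    }

    public static void main(String[] args) {
        // Small consecutive keys now differ in many bit positions:
        for (int i = 0; i < 4; i++) {
            System.out.println(i + " -> " + Integer.toHexString(hash(i)));
        }
        System.out.println(Integer.bitCount(hash(1) ^ hash(2)));
    }
}
```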
Mark said:Choose 10 random numbers between 1 and 16. How many will be duplicates?
I don't see how this arbitrary list of numbers is somehow significant in
proving that the hashcodes from Integer are poorly distributed.
Eric said:Nonsense. If the input consists of 100 zeroes, 10 ones,
and one each of 2..9, the hashes will consist of 100 h(0)s,
10 h(1)s, and one each of h(2..9). Nothing has changed,
except possibly for the worse if h(0)==h(6), say.
[... more pseudo-numerological nonsense ...]
public said:Eric said:Nonsense. If the input consists of 100 zeroes, 10 ones,
and one each of 2..9, the hashes will consist of 100 h(0)s,
10 h(1)s, and one each of h(2..9). Nothing has changed,
except possibly for the worse if h(0)==h(6), say.
[... more pseudo-numerological nonsense ...]
Your insulting attitude in response to a perfectly civil post baffles me.
I'll copy and paste my reasoning from my reply to Mark Space, who had a
similar objection but was far more polite.
It's not just the distribution of the hash values over the space of
possible integer values that counts. It's the distribution of the hash
values over the actual hash buckets produced in an actual hash table,
which typically is much smaller. You can have few collisions in a large
space and still end up with many in a small space, especially if the
small space's size is a power of two and the low-order bits of the hash
are biased. Even integers are more common than odd, so in any hash table
with integer keys using the current integer hash code and with an even
number of buckets, it's likely the odd buckets will be underutilized and
the even ones will have an unnecessary collision or several.
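The bias described here is easy to demonstrate (the bucket count and the all-even key set are made up for illustration):

```java
// Sketch: with identity hashing, even keys in a power-of-two table
// can only ever occupy even bucket indices.
public class EvenBias {
    public static void main(String[] args) {
        int buckets = 8;
        boolean[] used = new boolean[buckets];
        int[] evenKeys = {2, 4, 10, 16, 22, 34}; // hypothetical even-heavy workload
        for (int k : evenKeys) {
            used[k & (buckets - 1)] = true; // Integer.hashCode(k) == k
        }
        for (int i = 0; i < buckets; i++) {
            System.out.println("bucket " + i + (used[i] ? ": used" : ": empty"));
        }
        // All odd-numbered buckets stay empty, halving the usable table.
    }
}
```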