String based hashCode

J

jlukar

It is very unlikely that you will find a hash code that is unique. As
a matter of fact, you may have misunderstood the use of hashes.

Yes I believe so. I thought it can be used in place of a key as it
implied in my mind a unique key.

This lets you know that if hashA != hashB, then A != B. if hashA ==
hashB, A might = B.
aha!



If your business field is exactly 5 uppercase letters, then that gives
you a possibility of 26^5 combinations, which fits within a 24 bit
number. You can get your hash value by treating the string of 5
uppercase letters as a base 26 number, A = 0, B = 1, ..., Z = 25.
The string AZWAA would be the number 0 * 26^0 + 25 * 26^1 + 22 * 26^2
+ 0 * 26^3 + 0 * 26^4
Notice that each term is a factor of a power of 26.

business field will be a 11 chars combined. The calculation another
poster did, makes this impossible for me.

Hope this helps.

very much so. thank you.
 
J

jlukar

One possible way is to use a database. Store the string with a
sequence-defined identifier, and use that identifier as your ID, guaranteed
unique. Or just use the string as the key.

can't do DB. It is a real-time high throughput system which means no
DB in between a transaction.


You haven't really explained why you need it to be unique, though. Java's
HashMap implementation (and every other implementation of hash I've seen)
handles collisions just fine, and you can too.

I want to be able to retrieve a HashMap value using a key. So I don't
understand your comment about HashMap handling collisions fine. (which
couple of other posters referred to as well).

e.g.

if I have "xyz123" and "abc345" both happen to get the same key
value of 124444 and I search the HashMap for key 124444, I will get
two objects returned. That would not be desirable.

Do you mean I should then do another check and see that I should pick
"xyz123" or "abc345" ??


thanks much.
 
J

jlukar

that was funny. I also now know what bigamy is which I had to lookup.

Thanks very much for this detail explanation. It is very clear why my
string hashCode can not possibly result in a unique key fitting into
an integer.

I have since settled for using the concatentated string object as a
secondary key, still utilizing my original domain business object
interface but just filling bogus int. for the records primary key.


thanks.
k.

[concerning non-uniqueness of hash codes]
I mean that the lengght of string is fixed an no more than 11 chars
overall. You see I know that LASTNAME can not be more than 5
characters. I also know that EMPLOYEE-CATEGORY can not be bigger than
5 characters. So the combination is LASTNAME+.+EMPLOYEE-CATEGORY
= 11 characters total.

Let's do some arithmetic.

There is exactly one String of length zero.

There are 65536 different Strings of length one.

There are 65536^2 = 4,294,967,296 Strings of length two
(using ^ to indicate exponentiation).

...

There are 65536^11 Strings of length eleven.

All told, there are (65536^12 - 1) / 65535 Strings of
length eleven or less. That's just a little under 1e53 (the
11-character Strings make up the vast majority).

Meanwhile, there are 2^32 = 4,294,967,296 distinct hashCode()
values, because that's how many different `int' values exist.

4,294,967,296 may seem like a lot of hashCode() values, but
next to 95,782,432,828,056,470,036,404,813,049,000,000,000,000,-
000,000,000,000+ it is as nothing. If you try to marry the
Strings and the hashCode()s, you will commit bigamy on a truly
grand scale.

And that is why your hope for a universal and unique hash
code for Strings -- even for short Strings -- is in vain.
 
P

Patricia Shanahan

On Mar 8, 8:12 pm, (e-mail address removed) (Mark Rafn) wrote: ....


I want to be able to retrieve a HashMap value using a key. So I don't
understand your comment about HashMap handling collisions fine. (which
couple of other posters referred to as well).

e.g.

if I have "xyz123" and "abc345" both happen to get the same key
value of 124444 and I search the HashMap for key 124444, I will get
two objects returned. That would not be desirable.

Do you mean I should then do another check and see that I should pick
"xyz123" or "abc345" ??

No, we mean that the HashMap keys should be the strings "xyz123" and
"abc345". It is possible, but unlikely, that they have the same hash
code. HashMap knows that, and deals with it.

If you do:

myMap.put("xyz123",emp1);
myMap.put("abc345",emp2);

where emp1 and emp2 are different Employee references, then
myMap.get("xyz123") will definitely return emp1, not emp2, regardless of
hashCode collisions.

Patricia
 
D

djthomp

I want to be able to retrieve a HashMap value using a key. So I don't
understand your comment about HashMap handling collisions fine. (which
couple of other posters referred to as well).

e.g.

if I have "xyz123" and "abc345" both happen to get the same key
value of 124444 and I search the HashMap for key 124444, I will get
two objects returned. That would not be desirable.

Do you mean I should then do another check and see that I should pick
"xyz123" or "abc345" ??

thanks much.

A HashMap is a reliable mapping between a set of keys and a collection
of values, and any object in java can be used as a key or stored as a
value.

If you do myMap.put("xyz123", objectA) and myMap.put("abc345",
objectB), then myMap.get("xyz123") will always return objectA and
myMap.get("abc345") will always return objectB (provided that you
don't overwrite either entry by using the same key with a different
object). It doesn't matter what the hashCode result is for either
string, HashMap knows how to keep your objects distinct even if their
keys have the same hashCode as long as the keys themselves are not
equal.

If all of the objects you are storing are represented uniquely by the
11 character string you have described, then you should be able to use
that 11 character string as the direct key into the HashMap without
worrying about the details of the implementation.
 
P

Patricia Shanahan

djthomp wrote:
....
A HashMap is a reliable mapping between a set of keys and a collection
of values, and any object in java can be used as a key or stored as a
value.

The only limitation on this is that the key object's classes MUST have
equals and hashCode implemented in accordance with the contract
described in the Object API documentation.

Of course, String does have correct equals and hashCode implementations.

Patricia
 
T

Tom Hawtin

Patricia said:
djthomp wrote:
...

The only limitation on this is that the key object's classes MUST have
equals and hashCode implemented in accordance with the contract
described in the Object API documentation.

Point of pedantry: And it mustn't do anything silly with recursion. For
instance, using a HashMap as a key for a HashMap it uses as a key (if
you follow). It's also possible to break deserialisation dependent upon
the order (and nesting) of the object.
Of course, String does have correct equals and hashCode implementations.

More pedantry: Way back in JLS1 String was defined to have an
unimplementable hashCode.

Tom Hawtin
 
M

Mark Rafn

One possible way is to use a database. Store the string with a
can't do DB. It is a real-time high throughput system which means no
DB in between a transaction.

Using a sequence in a DB doesn't mean you need to query it every time. You
can cache dozens or hundreds of them, use them as needed, and throw away the
ones you don't use.

If you need a truly unique number (say, for a primary key), this is really
your best bet.
I want to be able to retrieve a HashMap value using a key. So I don't
understand your comment about HashMap handling collisions fine. (which
couple of other posters referred to as well).

If you're using a java.util.HashMap, why were you worried about generating
integers, and why did you care if they were negative? It's keyed on
Object, so your string ID is fine.
if I have "xyz123" and "abc345" both happen to get the same key
value of 124444 and I search the HashMap for key 124444, I will get
two objects returned. That would not be desirable.

Right, so search on your ACTUAL key - either a Key object you define, or a
string concatenation with some separator that's not part of the strings. Let
the Map implementation deal with how the hashing works.
Do you mean I should then do another check and see that I should pick
"xyz123" or "abc345" ??

No. If "xyz123abc345" is your business key, use it as your Map key. There's
no reason to mess with picking an integer for a key.
 
C

Chris Uppal

Tom said:
More pedantry: Way back in JLS1 String was defined to have an
unimplementable hashCode.

While that is primarily of academic interest, I don't think that
mentioning it is /pedantry/, as such, is it ?

;-)

-- chris
 
B

Ben Kaufman

The javadoc says the formula to calcuate the hashCode for String is:

s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]

this can result in negative numbers which is not desirable for me.
From the formula this is not clear why as always positive number
should be generated.
Any idea folks ?


Second question:

I am counting on hashCode to help me with generating a unique Integer
key using two concatenated strings the combination of which is
gauranteed to be unique. .

e.g.
("LASTNAME"+"."+"EMPLOYEE-CATEGORY") will always be unique.

so

("LASTNAME"+"."+"EMPLOYEE-CATEGORY") .hashCode() is what I wanted to
use.

will above hashCode() always be unique given that the string
combination used to generate it is unique ?

help???
k.

Why do you believe that ("LASTNAME"+"."+"EMPLOYEE-CATEGORY") will always
unique?

Ben
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Similar Threads

hashCode 95
hashCode 50
hashcode 5
Hashcode and Equal 4
why 31 in hashcode for string 8
Eclipse generated hashCode() and Compiler Efficiency 4
hashCode and equals (again) 19
equals() and hashcode() not working... 6

Members online

Forum statistics

Threads
473,777
Messages
2,569,604
Members
45,234
Latest member
SkyeWeems

Latest Threads

Top