hash for String and StringBuffer

John Galt

I have noticed a rather strange thing when working with Strings and
StringBuffers:

$ cat s.java
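public class s {
    public static void main(String[] argv) {
        // Two distinct String objects holding the same characters.
        String s1 = new String(argv[0]);
        String s2 = new String(argv[0]);

        // Two distinct StringBuffer objects holding the same characters.
        StringBuffer sb1 = new StringBuffer(s1);
        StringBuffer sb2 = new StringBuffer(s1);

        // Print the hash code of each of the four objects.
        System.out.println(
            " s1 = " + s1.hashCode() +
            " s2 = " + s2.hashCode() +
            " sb1 = " + sb1.hashCode() +
            " sb2 = " + sb2.hashCode());
    }
}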

When I run it:

$ java s alpha
s1 = 92909918 s2 = 92909918 sb1 = 12386568 sb2 = 9360485
$ java s bravo
s1 = 93998218 s2 = 93998218 sb1 = 12386568 sb2 = 9360485
$ java s charlie
s1 = 739067762 s2 = 739067762 sb1 = 12386568 sb2 = 9360485
$ java s delta
s1 = 95468472 s2 = 95468472 sb1 = 12386568 sb2 = 9360485

My questions:

1. Two Strings always hash the same if they are constructed from the
same "source" string. True? (The hashCode() is what's used to put
stuff into a Hashtable, right?)
2. If many more Strings are created from the original two Strings,
they all will also hash the same. True?
3. Two StringBuffers, even if created from the same String (or
"source" string), aren't guaranteed to hash the same. True? (It looks
to me from my experience that they use the address of the StringBuffer
or something - "alpha" can't possibly hash the same as "charlie".)
What is going on here?

This problem bit me bad when I was putting StringBuffers into a
Hashtable. I am guessing I should use a String from now on. But I have
another question here.

Suppose that the following things happen in my program:
- I create a bunch of Strings, and put them into a Vector.
- I iterate over the Vector and put some of those Strings into a
Hashtable.
- I then set the Vector to null. Will my Hashtable be affected? (I
tend to think no, coz those Strings that _did_ go into the Hashtable
are still referenced by the Hashtable and hence won't be garbage
collected.)

TIA,
John Galt.
 
Andrew Hobbs

John Galt said:
1. Two Strings always hash the same if they are constructed from the
same "source" string. True? (The hashCode() is what's used to put
stuff into a Hashtable, right?)

If you look at the docs (for Sun), they give the formula for a String hash
code as:

"s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]
using int arithmetic, where s[i] is the ith character of the string, n is
the length of the string, and ^ indicates exponentiation. (The hash value of
the empty string is zero.)"

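As a rough check of the quoted formula, here is a minimal sketch that recomputes the hash by hand and compares it with String.hashCode(). The class name HashFormulaSketch and the method formulaHash are made up for this illustration; only the formula itself comes from the docs.

```java
// Illustrative sketch: HashFormulaSketch and formulaHash are hypothetical names.
class HashFormulaSketch {

    // Evaluates s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1] using Horner's
    // rule. int arithmetic wraps around on overflow, as the docs describe.
    static int formulaHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        String[] words = {"alpha", "bravo", "charlie", "delta", ""};
        for (String word : words) {
            // The two numbers on each line should be identical.
            System.out.println("\"" + word + "\": formula = " + formulaHash(word)
                    + ", hashCode() = " + word.hashCode());
        }
    }
}
```

For "alpha" both columns should show 92909918, the value in the output posted above.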
2. If many more Strings are created from the original two Strings,
they all will also hash the same. True?

The formula means that it doesn't matter how you construct a String or where
it comes from: if it has the same sequence of characters, it gives the same
hash code.

3. Two StringBuffers, even if created from the same String (or
"source" string), aren't guaranteed to hash the same. True? (It looks
to me from my experience that they use the address of the StringBuffer
or something - "alpha" can't possibly hash the same as "charlie".)
What is going on here?

The docs indicate that StringBuffer does not implement its own hashCode()
but uses the default from Object:

"As much as is reasonably practical, the hashCode method defined by class
Object does return distinct integers for distinct objects. (This is
typically implemented by converting the internal address of the object into
an integer, but this implementation technique is not required by the Java(TM)
programming language.)"

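One way to see this for yourself is the sketch below. It relies on System.identityHashCode, which returns what Object's default hashCode() would return for an object. The class name StringBufferHashSketch and the helper usesIdentityHash are invented for the example.

```java
// Illustrative sketch: class and helper names are hypothetical.
class StringBufferHashSketch {

    // True when the object's hashCode() is just the default from Object.
    static boolean usesIdentityHash(Object o) {
        return o.hashCode() == System.identityHashCode(o);
    }

    public static void main(String[] args) {
        StringBuffer sb1 = new StringBuffer("alpha");
        StringBuffer sb2 = new StringBuffer("alpha");

        // Same characters, but equals() is also inherited from Object,
        // so two separate buffers are never equal to each other.
        System.out.println("sb1.equals(sb2): " + sb1.equals(sb2));
        System.out.println("contents equal: "
                + sb1.toString().equals(sb2.toString()));

        // hashCode() does not look at the characters at all.
        System.out.println("identity hash: " + usesIdentityHash(sb1));
        int before = sb1.hashCode();
        sb1.append("bet");
        System.out.println("unchanged after append: " + (before == sb1.hashCode()));
    }
}
```

The actual numbers printed by hashCode() for a StringBuffer depend on the JVM and the run, so the sketch only compares them rather than printing them.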
This problem bit me bad when I was putting StringBuffers into a
Hashtable. I am guessing I should use a String from now on. But I have
another question here.

Suppose that the following things happen in my program:
- I create a bunch of Strings, and put them into a Vector.
- I iterate over the Vector and put some of those Strings into a
Hashtable.
- I then set the Vector to null. Will my Hashtable be affected?

No! Why would it?

(I tend to think no, coz those Strings that _did_ go into the Hashtable
are still referenced by the Hashtable and hence won't be garbage
collected.)

Yes.
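
A small sketch of that scenario, with made-up names (VectorToHashtableSketch, build) and an arbitrary rule for which Strings get copied over, since the original post does not say how they are chosen:

```java
import java.util.Hashtable;
import java.util.Vector;

// Illustrative sketch: names and the selection rule are assumptions.
class VectorToHashtableSketch {

    static Hashtable<String, Integer> build() {
        Vector<String> names = new Vector<>();
        names.add("alpha");
        names.add("bravo");
        names.add("charlie");

        Hashtable<String, Integer> table = new Hashtable<>();
        for (String name : names) {
            // Arbitrary filter for the demo: keep the five-letter names.
            if (name.length() == 5) {
                table.put(name, name.length());
            }
        }

        // Drops only this variable's reference to the Vector. The Strings
        // that were put into the Hashtable are still referenced by it.
        names = null;
        return table;
    }

    public static void main(String[] args) {
        Hashtable<String, Integer> table = build();
        System.out.println("size: " + table.size());
        System.out.println("has alpha: " + table.containsKey("alpha"));
        System.out.println("has charlie: " + table.containsKey("charlie"));
    }
}
```

Only "charlie", which never went into the Hashtable, becomes unreachable once the Vector is gone.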

Cheers

Andrew


--
********************************************************
Andrew Hobbs PhD

MetaSense Pty Ltd - www.metasense.com.au
Australia

61 8 9246 2026
metasens AntiSpam @iinet dot net dot au


*********************************************************
 
hiwa

The simplest answer may be: Object#hashCode() and Object#equals() are
overridden in String, but not in StringBuffer.

The JVM string pool performs an optimization: when two strings are the
same, it stores only one copy, and references to that copy are used
throughout the whole application. When a Vector is destroyed, the String
references in it are destroyed, but the string itself and references to it
elsewhere are not. Once no reference remains, the string is GCed.

A Java reference is a 32-bit value whose content is the address of an
object. It's a pointer sans API.
 
Michael Borgwardt

hiwa said:
The JVM string pool performs an optimization: when two strings are the
same, it stores only one copy, and references to that copy are used
throughout the whole application. When a Vector is destroyed, the String
references in it are destroyed, but the string itself and references to it
elsewhere are not. Once no reference remains, the string is GCed.

Note that this only applies to Strings that are either compile-time
constants (i.e. literals and combinations thereof) or returned by the
String.intern() method.
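
A short sketch of the distinction, comparing references with == on purpose. The class name InternSketch and the helper sameObject are invented for the example.

```java
// Illustrative sketch: class and helper names are hypothetical.
class InternSketch {

    // Reference comparison: true only if both arguments are the same object.
    static boolean sameObject(Object a, Object b) {
        return a == b;
    }

    public static void main(String[] args) {
        String literal = "alpha";
        String constant = "al" + "pha";        // compile-time constant expression
        String copy = new String(literal);     // explicitly creates a new object

        // Literals and constant expressions share one pooled object.
        System.out.println(sameObject(literal, constant));      // true
        // new String(...) is a separate object with equal contents.
        System.out.println(sameObject(literal, copy));          // false
        System.out.println(literal.equals(copy));               // true
        // intern() returns the pooled object for those contents.
        System.out.println(sameObject(literal, copy.intern())); // true
    }
}
```

This is also why the Strings built with new String(argv[0]) in the original program are two separate objects that still hash the same.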
A Java reference is a 32-bit value whose content is the address of an
object. It's a pointer sans API.

No. Neither the Java language specification nor the Java virtual machine
specification states the size of a reference or how it is implemented.
A particular JVM implementation may implement it the way you describe,
but it doesn't have to.
 
Doug Pardee

1. Two Strings always hash the same if they are constructed from the
same "source" string. True? (The hashCode() is what's used to put
stuff into a Hashtable, right?)
2. If many more Strings are created from the original two Strings,
they all will also hash the same. True?
3. Two StringBuffers, even if created from the same String (or
"source" string), aren't guaranteed to hash the same. True? (It looks
to me from my experience that they use the address of the StringBuffer
or something - "alpha" can't possibly hash the same as "charlie".)
What is going on here?

Because a StringBuffer is mutable, it should never be used as a key
value. The "value" of the string data could be changed after the
object was stored, but the container would be unaware that the key
value had changed. Consequently, the object becomes misfiled in the
container and probably cannot be located. If the container looks under
the old key value, it finds your object with the new key value, which
is not what the container was looking for. If the container looks
under the new key value, it doesn't find your object at all.
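
To illustrate the misfiling, here is a sketch that uses an ArrayList<String> as a stand-in for a mutable key. That substitution is an assumption made for the demo: StringBuffer itself hashes by identity, so it cannot show the effect, whereas a List hashes by content. The names MutableKeySketch and demo are made up, and the lookups behave as commented on a typical JDK HashMap.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch: a List stands in for a mutable, content-hashed key.
class MutableKeySketch {

    // Returns { found under old content, found under new content, still stored }.
    static boolean[] demo() {
        Map<List<String>, String> map = new HashMap<>();
        List<String> key = new ArrayList<>(Arrays.asList("alpha"));
        map.put(key, "payload");

        // Mutate the key while it is inside the map. The map is not told.
        key.set(0, "charlie");

        // Old content: the right bucket is searched, but the stored key no
        // longer equals ["alpha"], so the lookup fails.
        boolean foundOld = map.get(Arrays.asList("alpha")) != null;
        // New content: the hash differs from the one recorded at put() time,
        // so the entry is not matched.
        boolean foundNew = map.get(Arrays.asList("charlie")) != null;
        // The entry is still in there, just unreachable by key.
        boolean stillStored = map.containsValue("payload");

        return new boolean[] {foundOld, foundNew, stillStored};
    }

    public static void main(String[] args) {
        boolean[] r = demo();
        System.out.println("found under old content: " + r[0]); // false
        System.out.println("found under new content: " + r[1]); // false
        System.out.println("still stored: " + r[2]);            // true
    }
}
```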

Since StringBuffer should not be used as a key value, and that is
where hashCode is needed, there is no need for a proper hashCode
method in StringBuffer.

The correct thing to do is to use sb.toString() to return a String.
Because String is immutable, it can safely be used as a key value. It
is no coincidence that String has a proper implementation of the
hashCode method.
This problem bit me bad when I was putting StringBuffers into a
Hashtable. I am guessing I should use a String from now on.

Exactly.

And while you're at it, you might want to use HashMap instead of
Hashtable.
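
A minimal sketch of that advice: build text in StringBuffers if you like, but convert with toString() before using it as a key. The class name StringKeySketch and the method countWords are hypothetical, and counting occurrences is just a convenient way to show that equal contents land on the same entry.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch: class and method names are hypothetical.
class StringKeySketch {

    // Counts how often each text occurs, keyed by the immutable String
    // snapshot of each buffer rather than by the buffer itself.
    static Map<String, Integer> countWords(StringBuffer... buffers) {
        Map<String, Integer> counts = new HashMap<>();
        for (StringBuffer sb : buffers) {
            counts.merge(sb.toString(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = countWords(
                new StringBuffer("alpha"),
                new StringBuffer("alpha"),
                new StringBuffer("bravo"));
        System.out.println("alpha: " + counts.get("alpha")); // 2
        System.out.println("bravo: " + counts.get("bravo")); // 1
    }
}
```

Had the StringBuffers themselves been the keys, the two "alpha" buffers would have produced two separate entries.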
 
