hash for String and StringBuffer

Discussion in 'Java' started by John Galt, Feb 24, 2004.

  1. John Galt

    John Galt Guest

    I have noticed a rather strange thing when working with Strings and
    StringBuffers:

    $ cat s.java
    public class s {
        public static void main(String[] argv) {
            // Two distinct String objects with the same contents
            String s1 = new String(argv[0]);
            String s2 = new String(argv[0]);
            // Two distinct StringBuffer objects with the same contents
            StringBuffer sb1 = new StringBuffer(s1);
            StringBuffer sb2 = new StringBuffer(s1);

            System.out.println(
                " s1 = " + s1.hashCode() +
                " s2 = " + s2.hashCode() +
                " sb1 = " + sb1.hashCode() +
                " sb2 = " + sb2.hashCode());
        }
    }

    When I run it:

    $ java s alpha
    s1 = 92909918 s2 = 92909918 sb1 = 12386568 sb2 = 9360485
    $ java s bravo
    s1 = 93998218 s2 = 93998218 sb1 = 12386568 sb2 = 9360485
    $ java s charlie
    s1 = 739067762 s2 = 739067762 sb1 = 12386568 sb2 = 9360485
    $ java s delta
    s1 = 95468472 s2 = 95468472 sb1 = 12386568 sb2 = 9360485

    My questions:

    1. Two Strings always hash the same if they are constructed from the
    same "source" string. True? (The hashCode() is what's used to put
    stuff into a Hashtable, right?)
    2. If many more Strings are created from the original two Strings,
    they all will also hash the same. True?
    3. Two StringBuffers, even if created from the same String (or
    "source" string), aren't guaranteed to hash the same. True? (It looks
    to me from my experience that they use the address of the StringBuffer
    or something - "alpha" can't possibly hash the same as "charlie".)
    What is going on here?

    This problem bit me bad when I was putting StringBuffers into a
    Hashtable. I am guessing I should use a String from now on. But I have
    another question here.

    Suppose that the following things happen in my program:
    - I create a bunch of Strings, and put them into a Vector.
    - I iterate over the Vector and put some of those Strings into a
    Hashtable.
    - I then set the Vector to null. Will my Hashtable be affected? (I
    tend to think no, coz those Strings that _did_ go into the Hashtable
    are still referenced by the Hashtable and hence won't be garbage
    collected.)

    TIA,
    John Galt.
     
    John Galt, Feb 24, 2004
    #1

  2. John Galt

    Andrew Hobbs Guest

    "John Galt" <> wrote in message
    news:...
    > My questions:
    >
    > 1. Two Strings always hash the same if they are constructed from the
    > same "source" string. True? (The hashCode() is what's used to put
    > stuff into a Hashtable, right?)


    If you look at the docs (for Sun's JDK), they give the formula for a String
    hash code as

    "s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]
    using int arithmetic, where s[i] is the ith character of the string, n is
    the length of the string, and ^ indicates exponentiation. (The hash value
    of the empty string is zero.)"
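
    As a rough check of the documented formula, here is a small sketch that
    recomputes the hash by hand and compares it with String.hashCode(). The
    class and method names (ManualStringHash, manualHash) are made up for
    illustration; only the formula itself comes from the docs.

```java
// Sketch: recompute String.hashCode() by hand from the documented formula.
// ManualStringHash and manualHash are hypothetical names, not JDK APIs.
class ManualStringHash {

    // Horner's-rule form of s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1],
    // evaluated in int arithmetic (overflow wraps, as in the JDK).
    static int manualHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        String[] samples = {"alpha", "bravo", "charlie", "delta", ""};
        for (String s : samples) {
            System.out.println("\"" + s + "\": manual = " + manualHash(s)
                    + ", built-in = " + s.hashCode());
        }
    }
}
```

    For "alpha" both columns should read 92909918, the value in the original
    post's output.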

    > 2. If many more Strings are created from the original two Strings,
    > they all will also hash the same. True?


    The formula means that it doesn't matter how you construct a String or
    where it comes from: if it has the same sequence of characters, it gives
    the same hash code.
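
    To illustrate, a minimal sketch that builds the same character sequence
    by several routes and collects the hash codes. The class name and the
    particular construction routes are my own choices, picked only as
    examples.

```java
// Sketch: Strings with the same characters hash the same, however they
// were constructed. SameContentSameHash is a hypothetical name.
class SameContentSameHash {

    // Returns the hash codes of four Strings that all hold the same text.
    static int[] hashes(String source) {
        String copy = new String(source);                      // explicit copy
        String built = new StringBuffer(source).toString();    // via a StringBuffer
        String fromChars = new String(source.toCharArray());   // via a char array
        return new int[] {
            source.hashCode(), copy.hashCode(),
            built.hashCode(), fromChars.hashCode()
        };
    }

    public static void main(String[] args) {
        for (int h : hashes("charlie")) {
            System.out.println(h); // the same number four times
        }
    }
}
```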

    > 3. Two StringBuffers, even if created from the same String (or
    > "source" string), aren't guaranteed to hash the same. True? (It looks
    > to me from my experience that they use the address of the StringBuffer
    > or something - "alpha" can't possibly hash the same as "charlie".)
    > What is going on here?
    >


    The docs indicate that StringBuffer does not override hashCode(), so it
    uses the default implementation inherited from Object, which depends on
    the object's identity rather than on its contents:

    "As much as is reasonably practical, the hashCode method defined by class
    Object does return distinct integers for distinct objects. (This is
    typically implemented by converting the internal address of the object into
    an integer, but this implementation technique is not required by the
    Java(TM) programming language.)"
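
    A small sketch of the consequence, assuming a Hashtable keyed by
    StringBuffer as in the original post. The class and method names are
    invented for the example; System.identityHashCode is the standard way to
    ask for the hash Object.hashCode() would give.

```java
import java.util.Hashtable;

// Sketch: StringBuffer inherits identity-based hashCode() and equals()
// from Object. StringBufferIdentityHash is a hypothetical name.
class StringBufferIdentityHash {

    // True when the buffer's hashCode() is the identity hash from Object.
    static boolean usesIdentityHash(StringBuffer sb) {
        return sb.hashCode() == System.identityHashCode(sb);
    }

    // Stores a value under one StringBuffer, then looks it up with a
    // different StringBuffer holding the same characters.
    static String lookupWithEqualContent(String text) {
        Hashtable<StringBuffer, String> table = new Hashtable<>();
        table.put(new StringBuffer(text), "stored");
        return table.get(new StringBuffer(text));
    }

    public static void main(String[] args) {
        StringBuffer sb = new StringBuffer("alpha");
        System.out.println("identity hash used: " + usesIdentityHash(sb));
        // The lookup misses because equals() is also identity-based.
        System.out.println("lookup result: " + lookupWithEqualContent("alpha"));
    }
}
```

    The lookup returns null even though both buffers contain "alpha", which
    is the behaviour that bites when StringBuffers go into a Hashtable.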

    > This problem bit me bad when I was putting StringBuffers into a
    > Hashtable. I am guessing I should use a String from now on. But I have
    > another question here.
    >
    > Suppose that the following things happen in my program:
    > - I create a bunch of Strings, and put them into a Vector.
    > - I iterate over the Vector and put some those Strings into a
    > Hashtable.
    > - I then set the Vector to null. Will my Hashtable be affected? (I


    No! Why would it?

    > tend to think no, coz those Strings that _did_ go into the Hashtable
    > are still referenced by the Hashtable and hence won't be garbage
    > collected.)


    Yes.
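
    Here is a minimal sketch of that scenario. The names and the rule for
    which Strings go into the Hashtable (length 5) are made up for the
    example.

```java
import java.util.Hashtable;
import java.util.Vector;

// Sketch: dropping the Vector does not affect Strings that the Hashtable
// still references. VectorThenHashtable is a hypothetical name.
class VectorThenHashtable {

    static Hashtable<String, Integer> build() {
        Vector<String> names = new Vector<>();
        names.add("alpha");
        names.add("bravo");
        names.add("charlie");
        names.add("delta");

        // Put only some of the Strings into the Hashtable.
        Hashtable<String, Integer> table = new Hashtable<>();
        for (String name : names) {
            if (name.length() == 5) {
                table.put(name, name.length());
            }
        }

        // Drops this reference to the Vector only; the Hashtable keeps its
        // own references to the Strings it holds.
        names = null;
        return table;
    }

    public static void main(String[] args) {
        Hashtable<String, Integer> table = build();
        System.out.println("entries: " + table.size());
        System.out.println("has alpha: " + table.containsKey("alpha"));
        System.out.println("has charlie: " + table.containsKey("charlie"));
    }
}
```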

    Cheers

    Andrew


    --
    ********************************************************
    Andrew Hobbs PhD

    MetaSense Pty Ltd - www.metasense.com.au
    Australia

    61 8 9246 2026
    metasens AntiSpam @iinet dot net dot au


    *********************************************************

     
    Andrew Hobbs, Feb 25, 2004
    #2

  3. John Galt

    hiwa Guest

    The simplest answer may be: Object#hashCode() and Object#equals() are
    overridden in String, but not in StringBuffer.

    The JVM string pool does an optimization. For two identical strings, it
    stores only one, and references to it are used throughout the whole
    application. When a Vector is destroyed, the string references in it are
    destroyed, but the string itself and its references elsewhere are not. If
    no reference remains, the string is GCed.

    A Java reference is a 32-bit datum whose content is the address of an
    object. It's a pointer sans API.

     
    hiwa, Feb 25, 2004
    #3
  4. Michael Borgwardt Guest

    hiwa wrote:
    > The JVM string pool does an optimization. For two identical strings, it
    > stores only one, and references to it are used throughout the whole
    > application. When a Vector is destroyed, the string references in it are
    > destroyed, but the string itself and its references elsewhere are not. If
    > no reference remains, the string is GCed.


    Note that this only applies to Strings that are either compile-time
    constants (i.e. literals and combinations thereof) or returned by the
    String.intern() method.
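
    A short sketch of the difference, comparing references with == on
    purpose (normally you would use equals()). The class and method names
    are invented for the example.

```java
// Sketch: only compile-time constants and intern() results share the
// pooled instance. InternDemo is a hypothetical name.
class InternDemo {

    // A constant expression is folded at compile time and pooled.
    static boolean constantIsPooled() {
        return "alpha" == ("al" + "pha");
    }

    // new String(...) always creates a separate object.
    static boolean copyIsDistinct() {
        return new String("alpha") != "alpha";
    }

    // intern() returns the pooled instance for the same characters.
    static boolean internedCopyIsPooled() {
        return new String("alpha").intern() == "alpha";
    }

    public static void main(String[] args) {
        System.out.println("constant pooled:      " + constantIsPooled());
        System.out.println("copy is distinct:     " + copyIsDistinct());
        System.out.println("interned copy pooled: " + internedCopyIsPooled());
    }
}
```

    All three lines print true. The hash codes are equal in every case
    anyway, since they depend only on the characters.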

    > A Java reference is a 32-bit datum whose content is the address of an
    > object. It's a pointer sans API.


    No. Neither the Java language specification nor the Java virtual machine
    specification states the size of a reference or how it is implemented.
    A particular JVM implementation may implement it the way you describe,
    but it doesn't have to.
     
    Michael Borgwardt, Feb 25, 2004
    #4
  5. John Galt

    Doug Pardee Guest

    (John Galt) wrote:
    > 1. Two Strings always hash the same if they are constructed from the
    > same "source" string. True? (The hashCode() is what's used to put
    > stuff into a Hashtable, right?)
    > 2. If many more Strings are created from the original two Strings,
    > they all will also hash the same. True?
    > 3. Two StringBuffers, even if created from the same String (or
    > "source" string), aren't guaranteed to hash the same. True? (It looks
    > to me from my experience that they use the address of the StringBuffer
    > or something - "alpha" can't possibly hash the same as "charlie".)
    > What is going on here?


    Because a StringBuffer is mutable, it should never be used as a key
    value. The "value" of the string data could be changed after the
    object was stored, but the container would be unaware that the key
    value had changed. Consequently, the object becomes misfiled in the
    container and probably cannot be located. If the container looks under
    the old key value, it finds your object with the new key value, which
    is not what the container was looking for. If the container looks
    under the new key value, it doesn't find your object at all.

    Since StringBuffer should not be used as a key value, and that is
    where hashCode is needed, there is no need for a proper hashCode
    method in StringBuffer.
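
    To see the misfiling in action you need a mutable key whose hashCode()
    depends on its contents. StringBuffer's does not, so this sketch
    substitutes an ArrayList<String> as a stand-in key. That substitution
    and all the names are my assumptions, not something from the posts above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch: a mutable key gets "misfiled" once it changes after insertion.
// ArrayList stands in for any mutable key with a content-based hashCode().
class MutableKeyDemo {

    // Returns { foundBefore, foundUnderOldValue, foundUnderNewValue, stillStored }.
    static boolean[] run() {
        Map<List<String>, String> map = new HashMap<>();
        List<String> key = new ArrayList<>();
        key.add("alpha");
        map.put(key, "payload");
        boolean foundBefore = map.containsKey(key);

        key.add("bravo"); // mutate the key while it sits in the map

        // Look under the old value: the right bucket is found, but the
        // stored key no longer equals what we are looking for.
        List<String> oldValue = new ArrayList<>();
        oldValue.add("alpha");
        boolean foundUnderOld = map.containsKey(oldValue);

        // Look under the new value: the hash has changed, so the search
        // does not match the entry filed under the old hash.
        boolean foundUnderNew = map.containsKey(key);

        boolean stillStored = map.size() == 1; // the entry is still in there
        return new boolean[] { foundBefore, foundUnderOld, foundUnderNew, stillStored };
    }

    public static void main(String[] args) {
        boolean[] r = run();
        System.out.println("found before mutation: " + r[0]);
        System.out.println("found under old value: " + r[1]);
        System.out.println("found under new value: " + r[2]);
        System.out.println("entry still stored:    " + r[3]);
    }
}
```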

    The correct thing to do is to use sb.toString() to return a String.
    Because String is immutable, it can safely be used as a key value. It
    is no coincidence that String has a proper implementation of the
    hashCode method.
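
    A minimal sketch of that approach, with invented names: build the text
    in a StringBuffer, then use the String from toString() as the key.

```java
import java.util.Hashtable;

// Sketch: use sb.toString() as the key instead of the StringBuffer itself.
// StringKeyDemo is a hypothetical name.
class StringKeyDemo {

    static String storeAndLookup() {
        StringBuffer sb = new StringBuffer("al");
        sb.append("pha");

        Hashtable<String, String> table = new Hashtable<>();
        table.put(sb.toString(), "stored"); // immutable snapshot as the key

        sb.append("bet"); // later changes to the buffer do not touch the key

        // Any String with the same characters finds the entry.
        return table.get("alpha");
    }

    public static void main(String[] args) {
        System.out.println(storeAndLookup()); // prints "stored"
    }
}
```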

    > This problem bit me bad when I was putting StringBuffers into a
    > Hashtable. I am guessing I should use a String from now on.


    Exactly.

    And while you're at it, you might want to use HashMap instead of
    Hashtable.
     
    Doug Pardee, Feb 25, 2004
    #5
