efficient use of hashtable with string like keys

Discussion in 'Java' started by Harald Kirsch, Aug 15, 2003.

  1. My applications does a lot of text crunching, filtering
    information out of the text. For reasons of efficiency
    all the text handling is done in StringBuffer objects.

    When the application finds something interesting, it wants
    to store it in a Hashtable with the text to be used as a
    key still in the StringBuffer.

    In order to test whether the key is already known to the Hashtable
    I currently create a String object from the StringBuffer. But
    most of the time the key happens to be already known to the
    Hashtable and the String created is immediately thrown away again.

    How can I use a reusable object, e.g. a StringBuffer, as a key to
    the Hashtable such that hashing is performed according .equals()
    instead of according to object identity.

    I know that it will be a disaster if I change a stored object later,
    but maybe someone knows a clean way to get around the superflous
    creation of a String object.

    Thanks,
    Harald.
     
    Harald Kirsch, Aug 15, 2003
    #1
    1. Advertising

  2. Harald Kirsch

    VisionSet Guest

    "Harald Kirsch" <> wrote in message
    news:...
    >
    > In order to test whether the key is already known to the Hashtable
    > I currently create a String object from the StringBuffer. But
    > most of the time the key happens to be already known to the
    > Hashtable and the String created is immediately thrown away again.


    Well if most occasions the Hashtable already holds it then don't worry, you
    never made a new String to check it with. Strings are interned by default,
    that is only one copy of an identical string is made, even when you do new
    String("xyz"), if xyz already exists in the JVM, your reference just points
    to that.

    --
    Mike W
     
    VisionSet, Aug 15, 2003
    #2
    1. Advertising

  3. Harald Kirsch:

    [...]

    >I know that it will be a disaster if I change a stored object later,
    >but maybe someone knows a clean way to get around the superflous
    >creation of a String object.


    I don't know about a perfect solution, but if you do create the String
    and replace it with what intern() returns on that new String, you
    avoid having a String with the same content twice:

    String newString = ...;
    newString = newString.intern();

    But I'd do some measurements if you really have a bottleneck. There
    may be no need for optimization.

    Regards,
    Marco
    --
    Please reply in the newsgroup, not by email!
    Java programming tips: http://jiu.sourceforge.net/javatips.html
    Other Java pages: http://www.geocities.com/marcoschmidt.geo/java.html
     
    Marco Schmidt, Aug 15, 2003
    #3
  4. VisionSet wrote:
    > "Harald Kirsch" <> wrote in message
    > news:...
    >
    >>In order to test whether the key is already known to the Hashtable
    >>I currently create a String object from the StringBuffer. But
    >>most of the time the key happens to be already known to the
    >>Hashtable and the String created is immediately thrown away again.

    >
    >
    > Well if most occasions the Hashtable already holds it then don't worry, you
    > never made a new String to check it with. Strings are interned by default,
    > that is only one copy of an identical string is made, even when you do new
    > String("xyz"), if xyz already exists in the JVM, your reference just points
    > to that.


    I'm sorry, but that's correct only for compile-time computed Strings.
    Strings created at runtime may indeed be equal to but distinct from
    other Strings in the same VM. It is always possible to get a reference
    to a globally shared String representing the same sequence of characters
    (via the intern() method) but you still end up creating and then
    discarding a distinct String object. See the JLS discussion at

    http://java.sun.com/docs/books/jls/second_edition/html/lexical.doc.html#19369


    John Bollinger
     
    John C. Bollinger, Aug 15, 2003
    #4
  5. Harald Kirsch wrote:
    > My applications does a lot of text crunching, filtering
    > information out of the text. For reasons of efficiency
    > all the text handling is done in StringBuffer objects.
    >
    > When the application finds something interesting, it wants
    > to store it in a Hashtable with the text to be used as a
    > key still in the StringBuffer.
    >
    > In order to test whether the key is already known to the Hashtable
    > I currently create a String object from the StringBuffer. But
    > most of the time the key happens to be already known to the
    > Hashtable and the String created is immediately thrown away again.
    >
    > How can I use a reusable object, e.g. a StringBuffer, as a key to
    > the Hashtable such that hashing is performed according .equals()
    > instead of according to object identity.


    You have a misconception. Hashtables use Objects' (keys') hashCode
    method to perform hashing and their equals() methods to compare keys
    that are in the same hash bucket. The Map interface explains the
    applicable contract in some detail.

    It is possible that you are confused because StringBuffers inherit their
    equals() and hashCode() methods from Object, which implements equals()
    in terms of object identity.

    > I know that it will be a disaster if I change a stored object later,
    > but maybe someone knows a clean way to get around the superflous
    > creation of a String object.


    Changing an object that is in use as a Hashtable key in a way that
    affects its hashCode or equals() method results in unspecified behavior
    of the Hashtable. You don't want that.

    There is no solution that provides the behavior you want within the
    constraints you have specified (source key data from a StringBuffer, key
    object to implement equals differently than StringBuffer.equals() does,
    and new object creation impermissable). The only way to reduce the
    number of transient object instantiations is to rework your algorithm so
    that you get fewer duplicate hits in the first place.


    John Bollinger
     
    John C. Bollinger, Aug 15, 2003
    #5
  6. Harald Kirsch

    Adam Maass Guest

    "Harald Kirsch" <> wrote:
    > My applications does a lot of text crunching, filtering
    > information out of the text. For reasons of efficiency
    > all the text handling is done in StringBuffer objects.
    >
    > When the application finds something interesting, it wants
    > to store it in a Hashtable with the text to be used as a
    > key still in the StringBuffer.
    >
    > In order to test whether the key is already known to the Hashtable
    > I currently create a String object from the StringBuffer. But
    > most of the time the key happens to be already known to the
    > Hashtable and the String created is immediately thrown away again.
    >
    > How can I use a reusable object, e.g. a StringBuffer, as a key to
    > the Hashtable such that hashing is performed according .equals()
    > instead of according to object identity.
    >
    > I know that it will be a disaster if I change a stored object later,
    > but maybe someone knows a clean way to get around the superflous
    > creation of a String object.
    >


    First off, creating a String object from a StringBuffer is relatively
    inexpensive... the String and the StringBuffer initially share internal
    storage. Only when the StringBuffer is susequently modified does the
    StringBuffer class make a copy of the internal storage.


    Alternatively, you need to write your own class that has the mutability of
    StringBuffer but the equals and hashCode semantics of String. Not hard,
    really.


    -- Adam Maass
     
    Adam Maass, Aug 15, 2003
    #6
  7. Harald Kirsch

    Chris Uppal Guest

    Adam Maass wrote:

    > Alternatively, you need to write your own class that has the
    > mutability of StringBuffer but the equals and hashCode semantics of
    > String. Not hard, really.


    Or switch the HashTable to a java.util.TreeMap with a custom Comparator.

    -- chris
     
    Chris Uppal, Aug 16, 2003
    #7
  8. "John C. Bollinger" <> wrote:
    > Harald Kirsch wrote:
    > > My applications does a lot of text crunching, filtering
    > > information out of the text. For reasons of efficiency
    > > all the text handling is done in StringBuffer objects.
    > >
    > > When the application finds something interesting, it wants
    > > to store it in a Hashtable with the text to be used as a
    > > key still in the StringBuffer.
    > >
    > > In order to test whether the key is already known to the Hashtable
    > > I currently create a String object from the StringBuffer. But
    > > most of the time the key happens to be already known to the
    > > Hashtable and the String created is immediately thrown away again.
    > >
    > > How can I use a reusable object, e.g. a StringBuffer, as a key to
    > > the Hashtable such that hashing is performed according .equals()
    > > instead of according to object identity.

    >
    > You have a misconception. Hashtables use Objects' (keys') hashCode
    > method to perform hashing and their equals() methods to compare keys
    > that are in the same hash bucket. The Map interface explains the
    > applicable contract in some detail.
    >
    > It is possible that you are confused because StringBuffers inherit their
    > equals() and hashCode() methods from Object, which implements equals()
    > in terms of object identity.


    No, I was not confused, but my question was probably not clear
    clear enough about this point. That it cannot work with the available
    implementation of .hashcode() and .equals() is obvious. I hoped someone
    came up with an ingenious idea of how to borrow String.hashcode() and and
    String.equals() and apply it to StringBuffer without rewriting everything.

    > There is no solution that provides the behavior you want within the
    > constraints you have specified (source key data from a StringBuffer, key
    > object to implement equals differently than StringBuffer.equals() does,
    > and new object creation impermissable). The only way to reduce the
    > number of transient object instantiations is to rework your algorithm so
    > that you get fewer duplicate hits in the first place.


    Unfortunately the keys stem directly from real world data. The task is
    simple: store a key only once, even if seen many times in the
    input. I don't see how this could be reworked to not produce the
    to-be-checked keys from the data.

    Thanks anyway,
    Harald.
     
    Harald Kirsch, Aug 16, 2003
    #8
  9. "Adam Maass" <> wrote:
    > "Harald Kirsch" <> wrote:

    [snip
    > > How can I use a reusable object, e.g. a StringBuffer, as a key to
    > > the Hashtable such that hashing is performed according .equals()
    > > instead of according to object identity.
    > >
    > > I know that it will be a disaster if I change a stored object later,
    > > but maybe someone knows a clean way to get around the superflous
    > > creation of a String object.
    > >

    >
    > First off, creating a String object from a StringBuffer is relatively
    > inexpensive... the String and the StringBuffer initially share internal
    > storage. Only when the StringBuffer is susequently modified does the
    > StringBuffer class make a copy of the internal storage.


    The whole point of using a StringBuffer is to change it all the time.
    In fact in my application it is totally counterproductive that
    StringBuffer.toString() tries create a String object which shares the
    char[] because an immediate, inevitable, subsequent change of the StringBuffer
    then reallocates the StringBuffers char[]. This is bad for two reasons:

    1) After the StringBuffer, during several .append, grew large enough to
    hold all the typical pieces of text I deal with, the reallocation most
    probably makes it too short, triggering even more reallocations in subsequent
    ..appends.

    2) The String object created keeps the char[] which is most probably far
    to large.

    I guess this unhelpful implementation only serves the compiler well to
    implement things like "a"+"b"+345+"some other text" fairly efficient.

    Harald.
     
    Harald Kirsch, Aug 16, 2003
    #9
  10. Harald Kirsch

    Adam Maass Guest

    "Harald Kirsch" <> wrote:
    > "Adam Maass" <> wrote:
    > > "Harald Kirsch" <> wrote:

    > [snip
    > > > How can I use a reusable object, e.g. a StringBuffer, as a key to
    > > > the Hashtable such that hashing is performed according .equals()
    > > > instead of according to object identity.
    > > >
    > > > I know that it will be a disaster if I change a stored object later,
    > > > but maybe someone knows a clean way to get around the superflous
    > > > creation of a String object.
    > > >

    > >
    > > First off, creating a String object from a StringBuffer is relatively
    > > inexpensive... the String and the StringBuffer initially share internal
    > > storage. Only when the StringBuffer is susequently modified does the
    > > StringBuffer class make a copy of the internal storage.

    >
    > The whole point of using a StringBuffer is to change it all the time.
    > In fact in my application it is totally counterproductive that
    > StringBuffer.toString() tries create a String object which shares the
    > char[] because an immediate, inevitable, subsequent change of the

    StringBuffer
    > then reallocates the StringBuffers char[]. This is bad for two reasons:
    >
    > 1) After the StringBuffer, during several .append, grew large enough to
    > hold all the typical pieces of text I deal with, the reallocation most
    > probably makes it too short, triggering even more reallocations in

    subsequent
    > .appends.
    >
    > 2) The String object created keeps the char[] which is most probably far
    > to large.
    >
    > I guess this unhelpful implementation only serves the compiler well to
    > implement things like "a"+"b"+345+"some other text" fairly efficient.
    >


    In which case, you need a custom class with the mutability of StringBuffer
    but the .equals and .hashCode semantics of String. This is not hard to do
    since the source of these standard classes is available.

    -- Adam Maass
     
    Adam Maass, Aug 16, 2003
    #10
  11. Harald Kirsch

    Roedy Green Guest

    On 16 Aug 2003 05:29:03 -0700, (Harald Kirsch) wrote
    or quoted :

    >I hoped someone
    >came up with an ingenious idea of how to borrow String.hashcode() and and
    >String.equals() and apply it to StringBuffer without rewriting everything.


    A few thoughts:

    1. You can't change the lookup hashCode of an object in a HashMap on
    the fly. You have to remove it, change the fields the hashCode
    depends on and re-add it. HashMap presumes the hashCode won't change.

    2. You would likely want to cache it since it is expensive to compute.

    3. If the StringBuffer is changing all the time, you might want a
    faster but lower quality algorithm, or perhaps even one that just
    looks at the first few characters to speed computation. You don't
    need to recompute it if the first few chars don't change.

    --
    Canadian Mind Products, Roedy Green.
    Coaching, problem solving, economical contract programming.
    See http://mindprod.com/jgloss/jgloss.html for The Java Glossary.
     
    Roedy Green, Aug 16, 2003
    #11
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Alexander Mueller

    Hashtable/Two keys

    Alexander Mueller, Mar 19, 2005, in forum: Java
    Replies:
    29
    Views:
    7,455
    Chris Uppal
    Mar 22, 2005
  2. Replies:
    8
    Views:
    4,595
    Bjorn Abelli
    May 8, 2006
  3. Replies:
    8
    Views:
    386
    Twisted
    Jun 19, 2006
  4. Boris Punk

    Ordering of hashtable keys

    Boris Punk, Jul 17, 2010, in forum: Java
    Replies:
    6
    Views:
    464
  5. Tom Anderson
    Replies:
    1
    Views:
    356
    Eric Sosman
    Nov 28, 2010
Loading...

Share This Page