String.substring, under the hood

Discussion in 'Java' started by Roedy Green, Jan 14, 2006.

  1. Roedy Green

    Roedy Green Guest

    Here is how String.substring works:

    public String substring(int beginIndex, int endIndex) {
        if (beginIndex < 0) {
            throw new StringIndexOutOfBoundsException(beginIndex);
        }
        if (endIndex > count) {
            throw new StringIndexOutOfBoundsException(endIndex);
        }
        if (beginIndex > endIndex) {
            throw new StringIndexOutOfBoundsException(endIndex - beginIndex);
        }
        return ((beginIndex == 0) && (endIndex == count)) ? this :
            new String(offset + beginIndex, endIndex - beginIndex, value);
    }


    Note that it now always creates a new string (unless the substring is
    the string itself.) It used to create a view into the underlying
    string.

    So the efficiencies have changed. Substring no longer pins the
    underlying big string. On the other hand, you will create many string
    objects by using substring. So be careful with it. It is no longer
    free in terms of ram to have many substrings of your big string.
    --
    Canadian Mind Products, Roedy Green.
    http://mindprod.com Java custom programming, consulting and coaching.
     
    Roedy Green, Jan 14, 2006
    #1

  2. Roedy Green wrote:
    [...]
    > Note that it now always creates a new string (unless the substring is
    > the string itself.) It used to create a view into the underlying
    > string.
    >
    > So the efficiencies have changed. Substring no longer pins the
    > underlying big string. On the other hand, you will create many string
    > objects by using substring. So be careful with it. It is no longer
    > free in terms of ram to have many substrings of your big string.


    While I consider such things implementation details of
    java.lang.String, and therefore not really my concern, the new
    behaviour is more in line with the "reasonable expectations" of most
    programmers. If I create a new object, I expect its storage to be
    allocated somewhere. Also, if I drop a reference to a very long string
    but retain a tiny subsection, I expect to be able to drop all but the
    small subsection.
     
    Stefan Schulz, Jan 14, 2006
    #2

  3. Also, upon looking at the code again, it still creates a "view" which
    is backed by the same char array (same as before!)
     
    Stefan Schulz, Jan 14, 2006
    #3
  4. Roedy Green

    Chris Uppal Guest

    Roedy Green wrote:

    > Note that it now always creates a new string (unless the substring is
    > the string itself.) It used to create a view into the underlying
    > string.


    The substring created will share the underlying char[] array.

    To the best of my memory that has always been the behavior.
    Unfortunately, I don't have source from a JDK before 1.4.2 handy to
    check.

    -- chris
     
    Chris Uppal, Jan 14, 2006
    #4
  5. Roedy Green

    Chris Smith Guest

    Roedy Green wrote:
    > Note that it now always creates a new string (unless the substring is
    > the string itself.) It used to create a view into the underlying
    > string.


    This seems to come up every once in a while.

    > return ((beginIndex == 0) && (endIndex == count)) ? this :
    > new String(offset + beginIndex, endIndex - beginIndex,
    > value);


    This is a call to a private constructor inside the String class, which
    reuses the underlying char[]. It does not do the same thing as the
    public String(String) constructor, which copies the underlying data. So
    when people say that "new String" copies the underlying char[], you
    should only apply that statement to the String(String) overloaded
    constructor, and not to the String(int,int,char[]) private overload used
    there.
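    The distinction Chris draws can be modelled outside the JDK. The sketch below is a hypothetical class, SharedChars, that mimics the pre-Java-7 layout (value, offset, count) rather than the real java.lang.String; the class and method names are illustrative assumptions. Its package-private constructor takes the arguments in the (int, int, char[]) order and reuses the array, while the public copy constructor (simplified here to always copy) keeps only the visible characters.

```java
import java.util.Arrays;

// Hypothetical model of the pre-Java-7 String layout; NOT the real java.lang.String.
final class SharedChars {
    private final char[] value; // backing array, possibly shared with other instances
    private final int offset;   // index of the first visible character in value
    private final int count;    // number of visible characters

    // Public constructor from a real String: toCharArray() returns a fresh array.
    public SharedChars(String s) {
        this.value = s.toCharArray();
        this.offset = 0;
        this.count = value.length;
    }

    // Public copy constructor: copies only the visible range, so the result owns a right-sized array.
    public SharedChars(SharedChars original) {
        this.value = Arrays.copyOfRange(original.value, original.offset,
                original.offset + original.count);
        this.offset = 0;
        this.count = value.length;
    }

    // Package-private constructor in the (int, int, char[]) order: shares the array, copies nothing.
    SharedChars(int offset, int count, char[] value) {
        this.value = value;
        this.offset = offset;
        this.count = count;
    }

    // Same shape as the substring quoted above: constant time, shares value.
    public SharedChars substring(int beginIndex, int endIndex) {
        if (beginIndex < 0 || endIndex > count || beginIndex > endIndex) {
            throw new IndexOutOfBoundsException(beginIndex + ", " + endIndex);
        }
        return (beginIndex == 0 && endIndex == count) ? this
                : new SharedChars(offset + beginIndex, endIndex - beginIndex, value);
    }

    boolean sharesArrayWith(SharedChars other) {
        return value == other.value; // identity check on the backing array
    }

    int backingLength() {
        return value.length;
    }

    @Override
    public String toString() {
        return new String(value, offset, count);
    }
}
```

    For example, substring(7, 12) of a SharedChars built from "hello, world" gives "world" still backed by the original 12-element array; passing that result to the copy constructor gives "world" backed by its own 5-element array.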

    --
    www.designacourse.com
    The Easiest Way To Train Anyone... Anywhere.

    Chris Smith - Lead Software Developer/Technical Trainer
    MindIQ Corporation
     
    Chris Smith, Jan 14, 2006
    #5
  6. Stefan Schulz wrote:
    > Roedy Green wrote:
    > [...]
    >
    >>Note that it now always creates a new string (unless the substring is
    >>the string itself.) It used to create a view into the underlying
    >>string.


    As pointed out in other postings, it doesn't copy. String uses the
    rather confusing technique of rearranging arguments in order to give
    constructors different semantics. A (package) private constructor does
    not do the additional copy.

    >>So the efficiencies have changed. Substring no longer pins the
    >>underlying big string. On the other hand, you will create many string
    >>objects by using substring. So be careful with it. It is no longer
    >>free in terms of ram to have many substrings of your big string.


    Pin refers to stopping an object from being moved by the garbage
    collector. The new (sub)String (strongly) references the full character
    array of the original String.

    > While I consider such things implementation details of
    > java.lang.String, and therefore not really my concern, the new
    > behaviour is more in line with the "reasonable expectations" of most
    > programmers. If I create a new object, I expect its storage to be
    > allocated somewhere. Also, if I drop a reference to a very long string
    > but retain a tiny subsection, I expect to be able to drop all but the
    > small subsection.


    Performance is externally visible behaviour. It is quite normal for
    client code to take it into account.

    Tom Hawtin
    --
    Unemployed English Java programmer
    http://jroller.com/page/tackline/
     
    Thomas Hawtin, Jan 14, 2006
    #6
  7. > >>So the efficiencies have changed. Substring no longer pins the
    > >>underlying big string. On the other hand, you will create many string
    > >>objects by using substring. So be careful with it. It is no longer
    > >>free in terms of ram to have many substrings of your big string.

    >
    > Pin refers to stopping an object from being moved by the garbage
    > collector. The new (sub)String (strongly) references the full character
    > array of the original String.


    This is exactly what I said. I just wonder what the OP meant when he
    complained about a new String being created... with Strings being
    immutable, you need to create a new String object each time you modify
    one (for example, by taking a substring). The backing character array
    is not copied, though (which can lead to unexpectedly high memory costs
    for small strings).

    > > While I consider such things implementation details of
    > > java.lang.String, and therefore not really my concern, the new
    > > behaviour is more in line with the "reasonable expectations" of most
    > > programmers. If I create a new object, I expect its storage to be
    > > allocated somewhere. Also, if I drop a reference to a very long string
    > > but retain a tiny subsection, I expect to be able to drop all but the
    > > small subsection.

    >
    > Performance is externally visible behaviour. It is quite normal for
    > client code to take it into account.


    That is correct; however, the definition of the substring method does
    not offer any guarantees about performance. It might be constant time
    but possibly wasting space (the current method), or it might take time
    linear in the length of the substring, or anything else. The method
    definition does not tell you one way or another, so you should not rely
    on the behaviour. Maybe another JRE will do things completely the other
    way around. The actual time needed depends on the implementation and,
    without any specified behaviour, is not an external characteristic.
     
    Stefan Schulz, Jan 14, 2006
    #7
  8. On Sat, 14 Jan 2006 07:36:33 +0000, Roedy Green wrote:

    > Here is how String.substring works:


    ....snip Sun's implementation...
    > return ... new String(offset + beginIndex, endIndex - beginIndex, value);


    > Note that it now always creates a new string (unless the substring is
    > the string itself.) It used to create a view into the underlying
    > string.
    >
    > So the efficiencies have changed. Substring no longer pins the
    > underlying big string. On the other hand, you will create many string
    > objects by using substring. So be careful with it. It is no longer
    > free in terms of ram to have many substrings of your big string.


    Note which constructor this invokes: String (int, int, char[]). Sun's
    implementation of substring, at least as of 1.5.05 and as far back as I've
    been using Java, shares the char[] containing the String's characters
    on calls to substring. It still pins the underlying char array, and is
    still cheap both computationally and memory-wise if the originating
    string's lifespan is at least as long as those of the substrings.
     
    Owen Jacobson, Jan 14, 2006
    #8
  9. On Sat, 14 Jan 2006 22:32:15 +0000, Owen Jacobson wrote:

    ....snip...

    edit: ****, beaten.
     
    Owen Jacobson, Jan 14, 2006
    #9
  10. Roedy Green

    Roedy Green Guest

    On Sat, 14 Jan 2006 07:36:33 GMT, Roedy Green
    <> wrote, quoted or
    indirectly quoted someone who said :

    >Here is how String.substring works:


    here is my latest understanding:


    substring is clever. It does not make a deep copy of the substring the
    way most languages do. It just creates a pointer into the original
    immutable String, i.e. points to the value char[] of the base string,
    and tracks the starting offset where the substring starts and count of
    how long the substring is. This could be confusing if you were
    low-level debugging since you would see the whole String, not just the
    substring. There were reports of a bug in Microsoft's implementation
    of substring. The downside of this cleverness is a tiny substring of a
    giant base String could suppress garbage collection of that big String
    in memory even if the whole String were no longer needed. (actually
    its value char[] array is held in RAM; the String object itself could
    be collected.)
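    The memory cost Roedy describes is simple arithmetic: a slice exposes count characters but keeps the entire value array reachable. The hypothetical Slice class below (an illustration of the sharing scheme, not JDK code) reports both numbers side by side.

```java
// Hypothetical view type illustrating the sharing scheme above; NOT JDK code.
final class Slice {
    private final char[] value; // backing array, shared by every Slice cut from it
    private final int offset;
    private final int count;

    Slice(char[] value, int offset, int count) {
        this.value = value;
        this.offset = offset;
        this.count = count;
    }

    // Constant time: no characters are copied; the new Slice points into the same array.
    Slice sub(int begin, int end) {
        if (begin < 0 || end > count || begin > end) {
            throw new IndexOutOfBoundsException(begin + ", " + end);
        }
        return new Slice(value, offset + begin, end - begin);
    }

    int length() {
        return count;        // characters the caller can see
    }

    int retainedChars() {
        return value.length; // characters the garbage collector must keep alive
    }

    @Override
    public String toString() {
        return new String(value, offset, count);
    }
}
```

    Cutting sub(10, 15) from a Slice over a 1,000,000-element array yields length() == 5 but retainedChars() == 1000000. As Roedy notes, the original wrapper object can still be collected; only the shared array stays pinned in memory.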

    It is probably still a good idea to use indexOf( lookFor, offset )
    rather than creating a substring first and using indexOf( lookFor )
    on that.
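    A minimal sketch of the two search styles; the helper names direct and viaSubstring are hypothetical. The first uses the standard String.indexOf(String, int) and creates no intermediate object; the second allocates a substring and must translate the result back, keeping -1 for "not found".

```java
// Hypothetical helpers comparing the two search styles discussed above.
final class SearchFrom {
    // Searches text starting at fromIndex; no intermediate String is created.
    static int direct(String text, String lookFor, int fromIndex) {
        return text.indexOf(lookFor, fromIndex);
    }

    // Allocates a substring first, then shifts a hit back into text's coordinates.
    static int viaSubstring(String text, String lookFor, int fromIndex) {
        int i = text.substring(fromIndex).indexOf(lookFor);
        return i < 0 ? -1 : i + fromIndex;
    }
}
```

    The two agree only for 0 <= fromIndex <= text.length(): substring throws outside that range, whereas indexOf(String, int) does not.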

    If you know a tiny substring is holding a giant string in RAM, that
    would otherwise be garbage collected, you can break the bond by using
    littleString = new String( littleString ) which will create a new
    smaller backing char[] with no ties to the original String.
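    A minimal sketch of that idiom, assuming the large string comes from somewhere else (loadHugeText is a hypothetical stand-in). new String(String) always returns a distinct object with equal contents; on the JDKs discussed in this thread it also trims the backing array. Since Java 7 update 6, substring itself copies, so the extra new String is redundant there, though harmless.

```java
// Hypothetical helpers illustrating the "break the bond" idiom.
final class Detach {
    // Stand-in for a large string that would otherwise become garbage.
    static String loadHugeText() {
        StringBuilder sb = new StringBuilder("HEADER:");
        for (int i = 0; i < 100_000; i++) {
            sb.append('.');
        }
        return sb.toString();
    }

    // Returns s[begin, end) as a String that, on pre-7u6 JDKs, no longer shares
    // the source's char[]; on later JDKs substring already copies.
    static String detachedSubstring(String s, int begin, int end) {
        return new String(s.substring(begin, end));
    }
}
```

    Typical use: String header = Detach.detachedSubstring(Detach.loadHugeText(), 0, 6); after which the large string can be dropped without its characters being kept alive by header.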

    If you are a curious sort and study the code for String.substring in
    src.zip, this sharing logic might not be apparent. The key is a
    non-public String constructor that takes parameters in the reverse of
    the usual order String (int offset, int count, char value[]).
    --
    Canadian Mind Products, Roedy Green.
    http://mindprod.com Java custom programming, consulting and coaching.
     
    Roedy Green, Jan 14, 2006
    #10
  11. Roedy Green

    Alan Krueger Guest

    Thomas Hawtin wrote:
    > Performance is externally visible behaviour. It is quite normal for
    > client code to take it into account.


    It might be externally visible, but it may not be guaranteed by the
    creator of the class. Relying on internal implementation details
    violates encapsulation and may break if the internal implementation is
    changed.
     
    Alan Krueger, Jan 16, 2006
    #11
