Substring changes (JDK 1.7)

Discussion in 'Java' started by Jan Burse, Jan 10, 2013.

  1. Jan Burse

    Jan Burse Guest

    Dear All,

    > Recent versions of the JDK do not reuse the backing char[].
    > The reason is that the offset and length fields have been
    > removed from String to save memory.


    Did this affect some of your code?

    Bye
     
    Jan Burse, Jan 10, 2013
    #1

  2. markspace

    markspace Guest

    On 1/10/2013 5:38 AM, Jan Burse wrote:
    > Dear All,
    >
    > > Recent versions of the JDK do not reuse the backing char[].
    > > The reason is that the offset and length fields have been
    > > removed from String to save memory.

    >
    > Did this affect some of your code?
    >
    > Bye



    Wrong on both counts. Where did you read this nonsense?

    <http://hg.openjdk.java.net/jdk7/jdk7-gate/jdk/file/tip/src/share/classes/java/lang/String.java>
     
    markspace, Jan 10, 2013
    #2

  3. On 1/10/2013 10:15 AM, markspace wrote:
    > On 1/10/2013 5:38 AM, Jan Burse wrote:
    >> Dear All,
    >>
    >> > Recent versions of the JDK do not reuse the backing char[].
    >> > The reason is that the offset and length fields have been
    >> > removed from String to save memory.

    >>
    >> Did this affect some of your code?
    >>
    >> Bye

    >
    >
    > Wrong on both counts. Where did you read this nonsense?
    >
    > <http://hg.openjdk.java.net/jdk7/jdk7-gate/jdk/file/tip/src/share/classes/java/lang/String.java>


    <http://hg.openjdk.java.net/jdk8/jdk8-gate/jdk/rev/2c773daa825d>
    suggests differently...


    --
    Beware of bugs in the above code; I have only proved it correct, not
    tried it. -- Donald E. Knuth
     
    Joshua Cranmer, Jan 10, 2013
    #3
  4. markspace

    markspace Guest

    On 1/10/2013 8:48 AM, Joshua Cranmer wrote:

    >
    > <http://hg.openjdk.java.net/jdk8/jdk8-gate/jdk/rev/2c773daa825d>
    > suggests differently...



    That's 8, not 7. If you're going to ask about JDK 8, don't put "JDK
    1.7" in your subject title.
     
    markspace, Jan 10, 2013
    #4
  5. Lars Enderin

    Lars Enderin Guest

    2013-01-10 18:22, markspace wrote:
    > On 1/10/2013 8:48 AM, Joshua Cranmer wrote:
    >
    >>
    >> <http://hg.openjdk.java.net/jdk8/jdk8-gate/jdk/rev/2c773daa825d>
    >> suggests differently...

    >
    >
    > That's 8, not 7. If you're going to ask about JDK 8, don't put "JDK
    > 1.7" in your subject title.
    >
    >

    The only question was in the OP. Jan Burse set the title, not Joshua.

    --
    Lars Enderin
     
    Lars Enderin, Jan 10, 2013
    #5
  6. Jan Burse

    Jan Burse Guest

    Jan Burse wrote:
    > Dear All,
    >
    > > Recent versions of the JDK do not reuse the backing char[].
    > > The reason is that the offset and length fields have been
    > > removed from String to save memory.

    >
    > Did this affect some of your code?
    >
    > Bye


    It's from JDK 1.7 Update 10.

    Have a look:

    C:\Users\Jan Burse>java -version
    java version "1.7.0_10"
    Java(TM) SE Runtime Environment (build 1.7.0_10-b18)
    Java HotSpot(TM) 64-Bit Server VM (build 23.6-b04, mixed mode)

    rt.jar:

    public final class String
        implements java.io.Serializable, Comparable<String>, CharSequence {
        /** The value is used for character storage. */
        private final char value[];

        /** Cache the hash code for the string */
        private int hash; // Default to 0

        /** use serialVersionUID from JDK 1.0.2 for interoperability */
        private static final long serialVersionUID = -6849794470754667710L;

    -- and --

        public String substring(int beginIndex, int endIndex) {
            if (beginIndex < 0) {
                throw new StringIndexOutOfBoundsException(beginIndex);
            }
            if (endIndex > value.length) {
                throw new StringIndexOutOfBoundsException(endIndex);
            }
            int subLen = endIndex - beginIndex;
            if (subLen < 0) {
                throw new StringIndexOutOfBoundsException(subLen);
            }
            return ((beginIndex == 0) && (endIndex == value.length)) ? this
                    : new String(value, beginIndex, subLen);
        }

    -- and --

        public String(char value[], int offset, int count) {
            if (offset < 0) {
                throw new StringIndexOutOfBoundsException(offset);
            }
            if (count < 0) {
                throw new StringIndexOutOfBoundsException(count);
            }
            // Note: offset or count might be near -1>>>1.
            if (offset > value.length - count) {
                throw new StringIndexOutOfBoundsException(offset + count);
            }
            this.value = Arrays.copyOfRange(value, offset, offset+count);
        }
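
A toy model may help picture the difference between the two layouts. The sketch below is not JDK code: `SharingStr`, `CopyingStr` and `retainedChars()` are hypothetical names invented for illustration, and bounds checks are omitted. It only assumes what the excerpt above shows, namely that the newer `String` keeps just a `char[]` and copies the requested range on `substring`.

```java
import java.util.Arrays;

/** Toy model (not the JDK class) of the older layout: array plus offset and count. */
final class SharingStr {
    private final char[] value;
    private final int offset;
    private final int count;

    SharingStr(char[] value, int offset, int count) {
        this.value = value;
        this.offset = offset;
        this.count = count;
    }

    /** O(1): the result is a small shell that points into the same array. */
    SharingStr substring(int begin, int end) {
        return new SharingStr(value, offset + begin, end - begin);
    }

    /** Number of chars this object keeps reachable. */
    int retainedChars() {
        return value.length;
    }

    @Override
    public String toString() {
        return new String(value, offset, count);
    }
}

/** Toy model of the layout quoted above: only the array, so substring has to copy. */
final class CopyingStr {
    private final char[] value;

    CopyingStr(char[] value) {
        this.value = value;
    }

    /** O(end - begin): the result owns a fresh, exactly sized array. */
    CopyingStr substring(int begin, int end) {
        return new CopyingStr(Arrays.copyOfRange(value, begin, end));
    }

    /** Number of chars this object keeps reachable. */
    int retainedChars() {
        return value.length;
    }

    @Override
    public String toString() {
        return new String(value);
    }
}
```

With a 1000-char source, a 5-char `SharingStr` substring still retains all 1000 chars, while the `CopyingStr` substring retains 5 but pays for the copy.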
     
    Jan Burse, Jan 10, 2013
    #6
  7. Jan Burse

    Jan Burse Guest

    Hi,

    It was originally observed in a Scala newsgroup:

    why is String grouped() so slow?
    https://groups.google.com/forum/?fromgroups=#!topic/scala-user/D1qmblInfyg

    Bye

    Jan Burse wrote:
    > Jan Burse wrote:
    >> Dear All,
    >>
    >> > Recent versions of the JDK do not reuse the backing char[].
    >> > The reason is that the offset and length fields have been
    >> > removed from String to save memory.

    >>
    >> Did this affect some of your code?
    >>
    >> Bye

    >
    > Its from JDK 1.7 Update 10
    >
    > Look see:
    >
    > C:\Users\Jan Burse>java -version
    > java version "1.7.0_10"
    > Java(TM) SE Runtime Environment (build 1.7.0_10-b18)
    > Java HotSpot(TM) 64-Bit Server VM (build 23.6-b04, mixed mode)
    >
    > rt.jar:
     
    Jan Burse, Jan 10, 2013
    #7
  8. Roedy Green

    Roedy Green Guest

    On Thu, 10 Jan 2013 14:38:36 +0100, Jan Burse <>
    wrote, quoted or indirectly quoted someone who said :

    >
    >Did this affect some of your code?


    If this change happens, you would no longer need to use
    new String(String) to unencumber a substring.

    You no longer have to worry about a tiny substring keeping a
    megabyte-plus base string alive in memory.
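
For readers who have not met the idiom, here is a minimal sketch of what "unencumbering" looked like. The class and method names (`UnencumberDemo`, `detachedPrefix`) are made up for this example; the only library calls are `String.substring` and the `String(String)` constructor.

```java
/** Sketch of the "unencumber" idiom; names here are hypothetical. */
final class UnencumberDemo {
    /** Returns the first len chars of big as a String meant to have its own storage. */
    static String detachedPrefix(String big, int len) {
        // With a sharing substring(), the result could pin big's whole char[];
        // wrapping it in new String(...) was the usual way to force a trimmed copy.
        // Where substring() already copies, the wrapper is harmless but redundant.
        return new String(big.substring(0, len));
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 1_000_000; i++) {
            sb.append('x');
        }
        String big = "header" + sb;             // roughly a million chars
        String small = detachedPrefix(big, 6);  // only these 6 chars need to survive
        System.out.println(small);              // prints "header"
    }
}
```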
    --
    Roedy Green Canadian Mind Products http://mindprod.com
    Students who hire or con others to do their homework are as foolish
    as couch potatoes who hire others to go to the gym for them.
     
    Roedy Green, Jan 10, 2013
    #8
  9. On 10.01.2013 21:22, Roedy Green wrote:

    > If this change happens, you would no longer consider using new String(
    > String) to unencumber a substring.
    >
    > You no longer have to worry a about a tiny substring holding a meg+
    > sized base string around in memory.


    Instead you have to worry about tons of substrings drawn from the same
    input String occupying a lot more memory and slowing down GC.
    Trade-offs, trade-offs...

    Cheers

    robert

    --
    remember.guy do |as, often| as.you_can - without end
    http://blog.rubybestpractices.com/
     
    Robert Klemme, Jan 10, 2013
    #9
  10. Roedy Green

    Roedy Green Guest

    On Thu, 10 Jan 2013 22:58:24 +0100, Robert Klemme
    <> wrote, quoted or indirectly quoted
    someone who said :

    >Instead you have to worry about tons of substrings drawn from the same
    >input String to occupy a lot more memory and slowing down GC. Trade
    >offs, trade offs...


    I wonder if it could work like this.

    Perhaps GC could notice a giant string encumbered by a few small
    strings, and could do a new String for you and gc the base string.

    If you don't need the base string itself, I think most of the time you
    are best off to do the new string.

    For what I do, I am peeling off small strings from a big string which
    represents a file image. I keep the big string to the last minute.
    Encumbering works well for me.
    --
    Roedy Green Canadian Mind Products http://mindprod.com
    Students who hire or con others to do their homework are as foolish
    as couch potatoes who hire others to go to the gym for them.
     
    Roedy Green, Jan 10, 2013
    #10
  11. Stefan Ram

    Stefan Ram Guest

    Robert Klemme <> writes:
    >On 10.01.2013 21:22, Roedy Green wrote:
    >>You no longer have to worry a about a tiny substring holding a meg+
    >>sized base string around in memory.

    >Instead you have to worry about tons of substrings drawn from the same
    >input String to occupy a lot more memory and slowing down GC. Trade
    >offs, trade offs...


    But this is more natural; it fulfills the expectation of non-expert
    programmers. Expert programmers can implement a custom string class
    with the previous behaviour, or, possibly better, a custom
    implementation of CharSequence (if only more APIs used
    CharSequence instead of String!).
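
One possible shape for such a custom CharSequence is sketched below. `CharSlice` is a hypothetical name, not an existing library class, and the design (immutable view, one initial copy via `toCharArray()`) is just one reasonable assumption about what "the previous behaviour" would mean outside of `String`.

```java
/** Hypothetical read-only view over part of a char[]; slicing shares storage. */
final class CharSlice implements CharSequence {
    private final char[] chars;  // shared backing array, never modified here
    private final int offset;    // index of the first visible char
    private final int length;    // number of visible chars

    private CharSlice(char[] chars, int offset, int length) {
        this.chars = chars;
        this.offset = offset;
        this.length = length;
    }

    /** Copies the String once; all later slices share that single copy. */
    static CharSlice of(String s) {
        return new CharSlice(s.toCharArray(), 0, s.length());
    }

    @Override
    public int length() {
        return length;
    }

    @Override
    public char charAt(int index) {
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        return chars[offset + index];
    }

    /** O(1): creates a small shell object that points into the same array. */
    @Override
    public CharSlice subSequence(int start, int end) {
        if (start < 0 || end > length || start > end) {
            throw new IndexOutOfBoundsException(start + ".." + end);
        }
        return new CharSlice(chars, offset + start, end - start);
    }

    /** Materializes a real String (this copies) for APIs that only accept String. */
    @Override
    public String toString() {
        return new String(chars, offset, length);
    }
}
```

For example, `CharSlice.of("usr/local/bin").subSequence(4, 9).toString()` yields `"local"` without copying anything until `toString()` is called.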
     
    Stefan Ram, Jan 11, 2013
    #11
  12. On 11.01.2013 06:26, Stefan Ram wrote:
    > Robert Klemme <> writes:
    >> On 10.01.2013 21:22, Roedy Green wrote:
    >>> You no longer have to worry a about a tiny substring holding a meg+
    >>> sized base string around in memory.

    >> Instead you have to worry about tons of substrings drawn from the same
    >> input String to occupy a lot more memory and slowing down GC. Trade
    >> offs, trade offs...

    >
    > But this is more natural, it fulfills the expection of non-expert
    > programmers.


    But it would be a significant change. There is so much software written
    under the assumption of the old implementation. That change might
    actually break existing programs (break in the sense of less performance
    or new GC issues).

    Then again it might be that there are just not that many programs which
    make use of that knowledge. Who knows?

    > Expert programmers can implement a custom string class
    > with the previous behaviour,


    Well, shouldn't such a basic thing be part of the standard library?

    > or, - possibly better - a custom
    > implementation of CharSequence (if only more APIs would use
    > CharSequence instead of String!).


    I agree. But unfortunately public classes and APIs are set in stone.

    Kind regards

    robert

    --
    remember.guy do |as, often| as.you_can - without end
    http://blog.rubybestpractices.com/
     
    Robert Klemme, Jan 11, 2013
    #12
  13. Roedy Green

    Roedy Green Guest

    On Fri, 11 Jan 2013 07:29:16 +0100, Robert Klemme
    <> wrote, quoted or indirectly quoted
    someone who said :

    >Well, shouldn't such a basic thing be part of the standard library?


    String is final and many things take a String parm and nothing else.
    You can create something similar and use it like String.
    --
    Roedy Green Canadian Mind Products http://mindprod.com
    Students who hire or con others to do their homework are as foolish
    as couch potatoes who hire others to go to the gym for them.
     
    Roedy Green, Jan 11, 2013
    #13
  14. Jan Burse

    Jan Burse Guest

    Roedy Green wrote:
    > On Thu, 10 Jan 2013 14:38:36 +0100, Jan Burse <>
    > wrote, quoted or indirectly quoted someone who said :
    >
    >>
    >> Did this affect some of your code?

    >
    > If this change happens, you would no longer consider using new String(
    > String) to unencumber a substring.
    >
    > You no longer have to worry a about a tiny substring holding a meg+
    > sized base string around in memory.
    >


    I have to sift through my code and check, for
    every line that uses substring(), whether
    there is a better solution.

    For example, I caught myself doing things like:

    int k = path.lastIndexOf('/');
    while (k != -1) {
        String name = path.substring(k + 1);
        /* do something with name */
        path = path.substring(0, k);
        k = path.lastIndexOf('/');
    }

    I guess the compiler cannot eliminate the copying
    in the last substring(0,k), since String does not
    have a length field anymore.

    It would need to introduce an extra index variable
    in the code; this is also how I would rewrite the
    code by hand, using the two-argument variant of
    lastIndexOf.

    But I guess the JIT cannot do that automatically,
    or can it? Has anyone seen a tool that shows the
    JIT-compiled assembly?

    Bye

    P.S.: I also wonder how well java.io.File
    performs now.
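
The rewrite described in this post (an extra index variable plus the two-argument `lastIndexOf`) might look roughly like the sketch below. `PathSegments` and `namesFromEnd` are hypothetical names, and collecting the names into a list merely stands in for the "do something with name" step.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of the loop rewritten so the shrinking path is never copied. */
final class PathSegments {
    /** Collects the names the original loop would visit, last segment first. */
    static List<String> namesFromEnd(String path) {
        List<String> names = new ArrayList<>();
        int end = path.length();                    // exclusive end of the unprocessed prefix
        int k = path.lastIndexOf('/', end - 1);     // search backwards inside that prefix
        while (k != -1) {
            names.add(path.substring(k + 1, end));  // copies only this one name
            end = k;                                // "shortens" the path without copying it
            k = path.lastIndexOf('/', end - 1);
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(namesFromEnd("a/b/c"));  // prints [c, b]
    }
}
```

As in the original loop, the leading segment (here `a`) is what remains after the loop and is not visited as a name.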
     
    Jan Burse, Jan 11, 2013
    #14
  15. Jan Burse

    Jan Burse Guest

    Chris Uppal wrote:
    >> For example I trapped myself doing things like:
    >> >
    >> > int k = path.lastIndexOf('/');
    >> > while (k!=-1) {
    >> > String name = path.substring(k+1);
    >> > /* do something with name */
    >> > path = path.substring(0,k);
    >> > k = path.lastIndexOf('/');
    >> > }
    >> >

    > And what's wrong with that ? Seems a sensible approach to me.
    >
    > If you mean that it's suddenly/significantly/ slower, then I don't believe
    > you. (Though I freely admit that there will be a tiny few cases where it
    > /does/ matter -- in which cases I will be wrong.)
    >
    > -- chris
    >
    >


    With the sharing semantics, its complexity is O(n+m), where
    n is the length of the string and m is the number of
    slashes. The m term accounts for the creation of the
    shared String shells.

    Without the sharing semantics, when substring copies, its
    complexity is O(n^2), assuming m is not too small. In each
    of the m iterations you no longer just create a String shell;
    instead, the following statement

    path = path.substring(0,k);

    copies a fair amount of path. JDK 1.7 Update 10 no longer
    has the sharing semantics. So when m is not too small,
    it's probably a good idea to rewrite the code.

    Bye
     
    Jan Burse, Jan 11, 2013
    #15
  16. Jan Burse

    Jan Burse Guest

    Jan Burse wrote:
    > Without the sharing semantics, when substring copies, its
    > complexity is O(n^2), assuming m is not too small. In each
    > of the m interation you do not anymore create a String shell,
    > but instead in the following statement


    I guess a better estimate would be O(m^2 * n/m) = O(m * n).
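
To get a feel for the O(m * n) estimate, one can simply count the characters the loop would copy. The sketch below runs the loop from the earlier post and sums the lengths of every substring it creates, which is the amount a copying `substring()` has to move. `CopyCost`, `buildPath` and `charsCopied` are hypothetical helper names, and equal-length segments are an assumption made to keep the input simple.

```java
/** Counts the chars a copying substring() would move in the path-splitting loop. */
final class CopyCost {
    /** Builds a path of m segments, each equal to seg, joined by '/'. */
    static String buildPath(int m, String seg) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < m; i++) {
            if (i > 0) {
                sb.append('/');
            }
            sb.append(seg);
        }
        return sb.toString();
    }

    /** Runs the loop from the thread and sums the lengths of all substrings created. */
    static long charsCopied(String path) {
        long copied = 0;
        int k = path.lastIndexOf('/');
        while (k != -1) {
            String name = path.substring(k + 1);
            copied += name.length();      // cost of copying the name
            path = path.substring(0, k);
            copied += path.length();      // cost of copying the shortened path
            k = path.lastIndexOf('/');
        }
        return copied;
    }

    public static void main(String[] args) {
        // Doubling m also doubles n here, so the count should grow roughly fourfold.
        for (int m = 100; m <= 800; m *= 2) {
            String path = buildPath(m, "ab");
            System.out.println("m=" + m + " n=" + path.length()
                    + " copied=" + charsCopied(path));
        }
    }
}
```

For the tiny input `ab/ab/ab` the count is 11: the first pass copies 2 + 5 chars and the second 2 + 2.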
     
    Jan Burse, Jan 11, 2013
    #16
  17. On 11.01.2013 08:19, Roedy Green wrote:
    > On Fri, 11 Jan 2013 07:29:16 +0100, Robert Klemme
    > <> wrote, quoted or indirectly quoted
    > someone who said :
    >
    >> Well, shouldn't such a basic thing be part of the standard library?

    >
    > String is final and many things take a String parm and nothing else.
    > You can create something similar and use it like String.


    There is no reason in what you say that it should not be part of the std
    lib.

    Kind regards

    robert

    --
    remember.guy do |as, often| as.you_can - without end
    http://blog.rubybestpractices.com/
     
    Robert Klemme, Jan 12, 2013
    #17
  18. markspace

    markspace Guest

    On 1/12/2013 8:50 AM, Robert Klemme wrote:
    > On 11.01.2013 08:19, Roedy Green wrote:
    >> On Fri, 11 Jan 2013 07:29:16 +0100, Robert Klemme
    >> <> wrote, quoted or indirectly quoted
    >> someone who said :
    >>
    >>> Well, shouldn't such a basic thing be part of the standard library?

    >>
    >> String is final and many things take a String parm and nothing else.
    >> You can create something similar and use it like String.

    >
    > There is no reason in what you say that it should not be part of the std
    > lib.


    javax.swing.text.Segment preserves the semantics of a shared buffer.
    It's not a drop-in replacement for String (many of the methods differ or
    are absent). But Segment is extensible, so critical missing methods
    could be added.
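
A small sketch of how `Segment` can act as a shared-buffer view follows. It uses only the public `Segment(char[], int, int)` constructor, the public `array` field and `toString()`; the wrapper class `SegmentDemo` and its `view` helper are hypothetical. Note that `Segment` lives in `javax.swing.text`, so on modular JDKs it needs the `java.desktop` module.

```java
import javax.swing.text.Segment;

/** Sketch: Segment as a view over a shared char[]; helper names are made up. */
final class SegmentDemo {
    /** Returns a view of text[start, end) that shares the given array instead of copying it. */
    static Segment view(char[] text, int start, int end) {
        return new Segment(text, start, end - start);
    }

    public static void main(String[] args) {
        char[] file = "key=value".toCharArray();
        Segment key = view(file, 0, 3);
        Segment value = view(file, 4, 9);
        System.out.println(key + " -> " + value);      // prints "key -> value"
        System.out.println(key.array == value.array);  // prints "true": same buffer
    }
}
```

Unlike `String`, a `Segment` is mutable and exposes its array, so the sharing is only safe while nobody writes to the buffer.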

    I wonder if the best way to go would be to cheat and have String
    extended into a SharedString with the old implementation. This would
    violate the finality of String, but it's possible to synthesize these
    sorts of things if one has control of the JVM. Obviously, this needs to
    come from Oracle.
     
    markspace, Jan 12, 2013
    #18
  19. Jan Burse

    Jan Burse Guest

    markspace wrote:
    > javax.swing.text.Segment preserves the semantics of a shared buffer.
    > It's not a drop-in replacement for String (many of the methods differ or
    > are absent). But Segment is extensible, so critical missing methods
    > could be added.


    Not available on Android I guess, :-(
     
    Jan Burse, Jan 12, 2013
    #19
  20. markspace

    markspace Guest

    On 1/12/2013 10:14 AM, Jan Burse wrote:
    > markspace wrote:
    >> javax.swing.text.Segment preserves the semantics of a shared buffer.
    >> It's not a drop-in replacement for String (many of the methods differ or
    >> are absent). But Segment is extensible, so critical missing methods
    >> could be added.

    >
    > Not available on Android I guess, :-(



    Or you could just write your own from scratch. It's not hard. But
    again I kind of doubt anyone is doing enough heavy string processing on
    a small embedded device like Android where this kind of thing is going
    to affect actual performance.

    Didn't your original complaint come from the Scala group? What does
    Scala have to do with Android?
     
    markspace, Jan 12, 2013
    #20
