String.substring in JDK 1.7.0_6+

Discussion in 'Java' started by jlp, Feb 1, 2013.

  1. jlp

    jlp Guest

    The String class was modified in JDK 1.7.0_6.
    String.substring that was 0(1) before JDK 1.7.0_6, now becomes O(n)

    All is well explained at :
    http://java-performance.info/changes-to-string-java-1-7-0_06/

    I wrote a small test:
    https://gist.github.com/4692960

    java -Xms128M -Xmx128M teststring.Main 100000 1000000

    On my desktop:
    jdk 1.7.0_11 => 33 seconds / 252 KBytes Memory
    jdk 1.6.0_38 => 25 milliseconds / 782 KBytes Memory
    more than 1000 times faster ! ( Ok! for this stupid test ;-) )

    I don't think it is a good improvement ! Uses less memory, but you
    retrieve it, when the object is garbaged
    What do you think about this ?

    --
    Cordialement
    Jean-Louis Pasturel
    jlp, Feb 1, 2013
    #1
    1. Advertising

  2. jlp

    markspace Guest

    On 2/1/2013 10:33 AM, jlp wrote:
    > What do you think about this ?
    >


    I think micro-benchmarks don't work.
    markspace, Feb 1, 2013
    #2
    1. Advertising

  3. jlp

    jlp Guest

    Le 01/02/2013 19:38, markspace a écrit :
    > On 2/1/2013 10:33 AM, jlp wrote:
    >> What do you think about this ?
    >>

    >
    > I think micro-benchmarks don't work.
    >
    >


    What is wrong in this test ?

    --
    Cordialement
    Jean-Louis Pasturel
    jlp, Feb 1, 2013
    #3
  4. jlp

    markspace Guest

    On 2/1/2013 10:42 AM, jlp wrote:
    > Le 01/02/2013 19:38, markspace a écrit :
    >> On 2/1/2013 10:33 AM, jlp wrote:
    >>> What do you think about this ?
    >>>

    >>
    >> I think micro-benchmarks don't work.
    >>
    >>

    >
    > What is wrong in this test ?
    >



    It's a micro-benchmark.
    markspace, Feb 1, 2013
    #4
  5. jlp

    jlp Guest

    Le 01/02/2013 19:45, markspace a écrit :
    > On 2/1/2013 10:42 AM, jlp wrote:
    >> Le 01/02/2013 19:38, markspace a écrit :
    >>> On 2/1/2013 10:33 AM, jlp wrote:
    >>>> What do you think about this ?
    >>>>
    >>>
    >>> I think micro-benchmarks don't work.
    >>>
    >>>

    >>
    >> What is wrong in this test ?
    >>

    >
    >
    > It's a micro-benchmark.
    >

    ok ;-)

    But it seems problematic in "real world"
    https://jira.springsource.org/brows....system.issuetabpanels:changehistory-tabpanel
    http://grokbase.com/t/gg/scala-user/131993ttrq/why-is-string-grouped-so-slow

    --
    Cordialement
    Jean-Louis Pasturel
    jlp, Feb 1, 2013
    #5
  6. jlp

    markspace Guest

    On 2/1/2013 10:57 AM, jlp wrote:

    > But it seems problematic in "real world"
    > https://jira.springsource.org/brows....system.issuetabpanels:changehistory-tabpanel



    That's a better test, using production code. However, note that it's
    scoped to only one small portion of the code, only while loading a lot
    of small scripts.

    It's really common for code that was once "working" to suddenly develop
    undesirable characteristics as it's exposed to new input or new
    environments or new anything. It's just something that happens and part
    of the normal maintenance of code.

    What do I think of it? It's normal.
    markspace, Feb 1, 2013
    #6
  7. jlp

    Jan Burse Guest

    There seems to be more going on with Strings:
    http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6962931

    But I didn't find the original change request (CR)
    or request for enhancement concerning the copy
    semantics. There was some estimate on applications
    impact and allotment for redesigning applications.

    I didn't know about hash32 thing in the link below.
    Your micro benchmark doesn't test hashCode, does it?

    jlp schrieb:
    > The String class was modified in JDK 1.7.0_6.
    > String.substring that was 0(1) before JDK 1.7.0_6, now becomes O(n)
    >
    > All is well explained at :
    > http://java-performance.info/changes-to-string-java-1-7-0_06/
    >
    > I wrote a small test:
    > https://gist.github.com/4692960
    >
    > java -Xms128M -Xmx128M teststring.Main 100000 1000000
    >
    > On my desktop:
    > jdk 1.7.0_11 => 33 seconds / 252 KBytes Memory
    > jdk 1.6.0_38 => 25 milliseconds / 782 KBytes Memory
    > more than 1000 times faster ! ( Ok! for this stupid test ;-) )
    >
    > I don't think it is a good improvement ! Uses less memory, but you
    > retrieve it, when the object is garbaged
    > What do you think about this ?
    >
    Jan Burse, Feb 1, 2013
    #7
  8. In article <510c0a6a$0$8985$>,
    jlp <> wrote:

    > The String class was modified in JDK 1.7.0_6.
    > String.substring that was 0(1) before JDK 1.7.0_6, now becomes O(n)
    >
    > All is well explained at :
    > http://java-performance.info/changes-to-string-java-1-7-0_06/
    >
    > I wrote a small test:
    > https://gist.github.com/4692960
    >
    > java -Xms128M -Xmx128M teststring.Main 100000 1000000
    >
    > On my desktop:
    > jdk 1.7.0_11 => 33 seconds / 252 KBytes Memory
    > jdk 1.6.0_38 => 25 milliseconds / 782 KBytes Memory
    > more than 1000 times faster ! ( Ok! for this stupid test ;-) )
    >
    > I don't think it is a good improvement ! Uses less memory, but you
    > retrieve it, when the object is garbaged
    > What do you think about this ?


    It's an unbelievable change. Buffer sharing in Java 1-6 had the simple
    workaround of calling the String(String) constructor. There's no
    workaround for Strings getting much slower in Java 7+.

    What Oracle did in Java 7 would only make sense if CharSequence had
    buffer sharing and better support. I just checked Java 7, and
    CharSequence looks more useless than ever. String.subSequence()
    allocates a new char[] and the usual parsers (Integer, Long, Float)
    still only accept a String. Slow Strings it is.


    Side rant:
    Sun broke buffer sharing between StringBuffer and String back in Java 5.
    The reason was so that AbstractStringBuilder class could support the
    implementations for both StringBuffer and StringBuilder. Had they kept
    the implementations split, we could still have a very fast
    StringBuffer.toString(). As a final F-U, none of classes can be
    extended even through there's no buffer sharing that can be hacked.
    --
    I will not see posts from Google because I must filter them as spam
    Kevin McMurtrie, Feb 2, 2013
    #8
  9. jlp

    markspace Guest

    On 2/1/2013 8:58 PM, Kevin McMurtrie wrote:

    > and the usual parsers (Integer, Long, Float)
    >still only accept a String. Slow Strings it is.
    >


    This I agree is a bit of a bummer, it would be useful for the parsers to
    take CharSequence for flexibility. Integers aren't hard to parse but
    floats and doubles are non-trivial.

    Note however that Scanner accepts both Readable (a Reader) and
    ReadableByteChannel in its constructors.

    >
    > Side rant:
    > Sun broke buffer sharing between StringBuffer and String back in Java 5.


    Probably because Strings needed to be immutable and there's no way to do
    that when sharing a mutable buffer.

    > we could still have a very fast
    > StringBuffer.toString().


    Nope, see above.

    > As a final F-U, none of classes can be
    > extended even through there's no buffer sharing that can be hacked.
    >


    Probably because they don't want you doing stupid broken things, like
    trying to share buffers between immutable and mutable objects.

    I still agree that CharSequence could be made more useful though, that's
    a good idea. Hmmm.
    markspace, Feb 2, 2013
    #9
  10. In article <keid8j$2og$>,
    markspace <> wrote:

    > On 2/1/2013 8:58 PM, Kevin McMurtrie wrote:
    >
    > > and the usual parsers (Integer, Long, Float)
    > >still only accept a String. Slow Strings it is.
    > >

    >
    > This I agree is a bit of a bummer, it would be useful for the parsers to
    > take CharSequence for flexibility. Integers aren't hard to parse but
    > floats and doubles are non-trivial.
    >
    > Note however that Scanner accepts both Readable (a Reader) and
    > ReadableByteChannel in its constructors.
    >
    > >
    > > Side rant:
    > > Sun broke buffer sharing between StringBuffer and String back in Java 5.

    >
    > Probably because Strings needed to be immutable and there's no way to do
    > that when sharing a mutable buffer.


    The original StringBuffer class was synchronized, final, and protected
    its internal char[]. There was no way to trick it into altering the
    buffer after turning it into String. At worst it was the same speed as
    today's StringBuilder. For the common case of being a single-use
    object, it was much faster.


    > > we could still have a very fast
    > > StringBuffer.toString().

    >
    > Nope, see above.
    >
    > > As a final F-U, none of classes can be
    > > extended even through there's no buffer sharing that can be hacked.
    > >

    >
    > Probably because they don't want you doing stupid broken things, like
    > trying to share buffers between immutable and mutable objects.
    >
    > I still agree that CharSequence could be made more useful though, that's
    > a good idea. Hmmm.

    --
    I will not see posts from Google because I must filter them as spam
    Kevin McMurtrie, Feb 2, 2013
    #10
  11. jlp

    markspace Guest

    On 2/2/2013 8:43 AM, Kevin McMurtrie wrote:

    > The original StringBuffer class was synchronized, final, and protected
    > its internal char[]. There was no way to trick it into altering the
    > buffer after turning it into String.



    Other than calling append(), you mean? Maybe StringBuffer also cleared
    its buffer so it couldn't be reused (although I don't see that in the
    docs), however I'd bet that current implementations rely on JIT compiler
    to optimize away unneeded buffer copies.

    I'm thinking this does mobile a disservice, however, because JIT might
    be hard to do on small devices.
    markspace, Feb 2, 2013
    #11
  12. In article <kejngo$lp1$>,
    markspace <> wrote:

    > On 2/2/2013 8:43 AM, Kevin McMurtrie wrote:
    >
    > > The original StringBuffer class was synchronized, final, and protected
    > > its internal char[]. There was no way to trick it into altering the
    > > buffer after turning it into String.

    >
    >
    > Other than calling append(), you mean? Maybe StringBuffer also cleared
    > its buffer so it couldn't be reused (although I don't see that in the
    > docs), however I'd bet that current implementations rely on JIT compiler
    > to optimize away unneeded buffer copies.
    >
    > I'm thinking this does mobile a disservice, however, because JIT might
    > be hard to do on small devices.


    The original StringBuffer went to copy-on-write mode after calling
    toString(). You can go read the old code for yourself. There was no
    JVM trick involved.

    Some future JVMs do have JIT tricks to improve String performance. It's
    not clear how that would perform or what the side effects would be. One
    proposed trick was to make the String(String) constructor a no-op. That
    could have disastrous consequences. For example, the code below uses
    specific object references as signals. If the String constructor was a
    no-op, the signal references would be interned constants that are not
    unique.

    static final String eofMarker = new String("EOF");
    static final String flushMarker = new String("Flush");
    final ArrayBlockingQueue<String> queue= new
    ArrayBlockingQueue<String>(1000);

    void processQueue() throws InterruptedException
    {
    String str;
    while ((str= queue.take()) != eofMarker)
    {
    if (str == flushMarker)
    {
    //Flush

    }
    else
    {
    //Process string

    }
    }
    }
    --
    I will not see posts from Google because I must filter them as spam
    Kevin McMurtrie, Feb 2, 2013
    #12
  13. jlp

    markspace Guest

    On 2/2/2013 2:46 PM, Kevin McMurtrie wrote:

    > Some future JVMs do have JIT tricks to improve String performance. It's
    > not clear how that would perform or what the side effects would be. One


    The main think I'd like to see as a "trick" would be to spot when an
    array is not accessed after a copy, thus negating the need for a copy.

    String constructor:

    public String( char[] chars ) {
    this.buffer = Arrays.copyOf( chars, chars.length );
    }

    Usage:

    public String someMethod() {
    char[] myBuff = ... // a local variable
    return new String( myBuff );
    }

    Spotting that the copy isn't needed because myBuff is local and can't be
    accessed after the return is one obvious optimization.

    If this type of analysis is very hard, I can see the original
    implementation of StringBuilder would be advantageous. OTOH, it doesn't
    look hard, and I'd bet there's a lot of situations where checking a
    "copy-on-write bit" is a bigger performance hit.
    markspace, Feb 2, 2013
    #13
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Babar
    Replies:
    1
    Views:
    412
    Chris Smith
    May 20, 2004
  2. Thomas G. Marshall
    Replies:
    5
    Views:
    786
    Thomas G. Marshall
    Aug 6, 2004
  3. Ulf Meinhardt
    Replies:
    0
    Views:
    6,397
    Ulf Meinhardt
    Aug 10, 2006
  4. Replies:
    3
    Views:
    190
    Sherm Pendley
    Aug 3, 2005
  5. Jan Burse

    Substring changes (JDK 1.7)

    Jan Burse, Jan 10, 2013, in forum: Java
    Replies:
    20
    Views:
    552
    Jan Burse
    Jan 13, 2013
Loading...

Share This Page