String.substring in JDK 1.7.0_6+

jlp · Feb 1, 2013

The String class was modified in JDK 1.7.0_6.
String.substring that was 0(1) before JDK 1.7.0_6, now becomes O(n)

All is well explained at :
http://java-performance.info/changes-to-string-java-1-7-0_06/

I wrote a small test:
https://gist.github.com/4692960

java -Xms128M -Xmx128M teststring.Main 100000 1000000

On my desktop:
jdk 1.7.0_11 => 33 seconds / 252 KBytes Memory
jdk 1.6.0_38 => 25 milliseconds / 782 KBytes Memory
more than 1000 times faster ! ( Ok! for this stupid test ;-) )

I don't think it is a good improvement ! Uses less memory, but you
retrieve it, when the object is garbaged
What do you think about this ?

markspace · Feb 1, 2013

What do you think about this ?

I think micro-benchmarks don't work.

jlp · Feb 1, 2013

Le 01/02/2013 19:38, markspace a écrit :

I think micro-benchmarks don't work.

What is wrong in this test ?

markspace · Feb 1, 2013

Le 01/02/2013 19:38, markspace a écrit :

What is wrong in this test ?

It's a micro-benchmark.

jlp · Feb 1, 2013

Le 01/02/2013 19:45, markspace a écrit :

It's a micro-benchmark.

ok ;-)

But it seems problematic in "real world"
https://jira.springsource.org/brows....system.issuetabpanels:changehistory-tabpanel
http://grokbase.com/t/gg/scala-user/131993ttrq/why-is-string-grouped-so-slow

markspace · Feb 1, 2013

But it seems problematic in "real world"
https://jira.springsource.org/brows....system.issuetabpanels:changehistory-tabpanel

That's a better test, using production code. However, note that it's
scoped to only one small portion of the code, only while loading a lot
of small scripts.

It's really common for code that was once "working" to suddenly develop
undesirable characteristics as it's exposed to new input or new
environments or new anything. It's just something that happens and part
of the normal maintenance of code.

What do I think of it? It's normal.

Jan Burse · Feb 1, 2013

There seems to be more going on with Strings:
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6962931

But I didn't find the original change request (CR)
or request for enhancement concerning the copy
semantics. There was some estimate on applications
impact and allotment for redesigning applications.

I didn't know about hash32 thing in the link below.
Your micro benchmark doesn't test hashCode, does it?

Kevin McMurtrie · Feb 2, 2013

jlp said:
The String class was modified in JDK 1.7.0_6.
String.substring that was 0(1) before JDK 1.7.0_6, now becomes O(n)

All is well explained at :
http://java-performance.info/changes-to-string-java-1-7-0_06/

I wrote a small test:
https://gist.github.com/4692960

java -Xms128M -Xmx128M teststring.Main 100000 1000000

On my desktop:
jdk 1.7.0_11 => 33 seconds / 252 KBytes Memory
jdk 1.6.0_38 => 25 milliseconds / 782 KBytes Memory
more than 1000 times faster ! ( Ok! for this stupid test ;-) )

I don't think it is a good improvement ! Uses less memory, but you
retrieve it, when the object is garbaged
What do you think about this ?

It's an unbelievable change. Buffer sharing in Java 1-6 had the simple
workaround of calling the String(String) constructor. There's no
workaround for Strings getting much slower in Java 7+.

What Oracle did in Java 7 would only make sense if CharSequence had
buffer sharing and better support. I just checked Java 7, and
CharSequence looks more useless than ever. String.subSequence()
allocates a new char[] and the usual parsers (Integer, Long, Float)
still only accept a String. Slow Strings it is.

Side rant:
Sun broke buffer sharing between StringBuffer and String back in Java 5.
The reason was so that AbstractStringBuilder class could support the
implementations for both StringBuffer and StringBuilder. Had they kept
the implementations split, we could still have a very fast
StringBuffer.toString(). As a final F-U, none of classes can be
extended even through there's no buffer sharing that can be hacked.

markspace · Feb 2, 2013

and the usual parsers (Integer, Long, Float)
still only accept a String. Slow Strings it is.

This I agree is a bit of a bummer, it would be useful for the parsers to
take CharSequence for flexibility. Integers aren't hard to parse but
floats and doubles are non-trivial.

Note however that Scanner accepts both Readable (a Reader) and
ReadableByteChannel in its constructors.

Side rant:
Sun broke buffer sharing between StringBuffer and String back in Java 5.

Probably because Strings needed to be immutable and there's no way to do
that when sharing a mutable buffer.

we could still have a very fast
StringBuffer.toString().

Nope, see above.

As a final F-U, none of classes can be
extended even through there's no buffer sharing that can be hacked.

Probably because they don't want you doing stupid broken things, like
trying to share buffers between immutable and mutable objects.

I still agree that CharSequence could be made more useful though, that's
a good idea. Hmmm.

Kevin McMurtrie · Feb 2, 2013

markspace said:
This I agree is a bit of a bummer, it would be useful for the parsers to
take CharSequence for flexibility. Integers aren't hard to parse but
floats and doubles are non-trivial.

Note however that Scanner accepts both Readable (a Reader) and
ReadableByteChannel in its constructors.

Probably because Strings needed to be immutable and there's no way to do
that when sharing a mutable buffer.

The original StringBuffer class was synchronized, final, and protected
its internal char[]. There was no way to trick it into altering the
buffer after turning it into String. At worst it was the same speed as
today's StringBuilder. For the common case of being a single-use
object, it was much faster.

markspace · Feb 2, 2013

The original StringBuffer class was synchronized, final, and protected
its internal char[]. There was no way to trick it into altering the
buffer after turning it into String.

Other than calling append(), you mean? Maybe StringBuffer also cleared
its buffer so it couldn't be reused (although I don't see that in the
docs), however I'd bet that current implementations rely on JIT compiler
to optimize away unneeded buffer copies.

I'm thinking this does mobile a disservice, however, because JIT might
be hard to do on small devices.

Kevin McMurtrie · Feb 2, 2013

markspace said:
The original StringBuffer class was synchronized, final, and protected
its internal char[]. There was no way to trick it into altering the
buffer after turning it into String.

Click to expand...

Other than calling append(), you mean? Maybe StringBuffer also cleared
its buffer so it couldn't be reused (although I don't see that in the
docs), however I'd bet that current implementations rely on JIT compiler
to optimize away unneeded buffer copies.

I'm thinking this does mobile a disservice, however, because JIT might
be hard to do on small devices.

The original StringBuffer went to copy-on-write mode after calling
toString(). You can go read the old code for yourself. There was no
JVM trick involved.

Some future JVMs do have JIT tricks to improve String performance. It's
not clear how that would perform or what the side effects would be. One
proposed trick was to make the String(String) constructor a no-op. That
could have disastrous consequences. For example, the code below uses
specific object references as signals. If the String constructor was a
no-op, the signal references would be interned constants that are not
unique.

static final String eofMarker = new String("EOF");
static final String flushMarker = new String("Flush");
final ArrayBlockingQueue<String> queue= new
ArrayBlockingQueue<String>(1000);

void processQueue() throws InterruptedException
{
String str;
while ((str= queue.take()) != eofMarker)
{
if (str == flushMarker)
{
//Flush

}
else
{
//Process string

}
}
}

markspace · Feb 2, 2013

Some future JVMs do have JIT tricks to improve String performance. It's
not clear how that would perform or what the side effects would be. One

The main think I'd like to see as a "trick" would be to spot when an
array is not accessed after a copy, thus negating the need for a copy.

String constructor:

public String( char[] chars ) {
this.buffer = Arrays.copyOf( chars, chars.length );
}

Usage:

public String someMethod() {
char[] myBuff = ... // a local variable
return new String( myBuff );
}

Spotting that the copy isn't needed because myBuff is local and can't be
accessed after the return is one obvious optimization.

If this type of analysis is very hard, I can see the original
implementation of StringBuilder would be advantageous. OTOH, it doesn't
look hard, and I'd bet there's a lot of situations where checking a
"copy-on-write bit" is a bigger performance hit.

RedHat9 + Sun JDK 1.4.2_1 + NPTL = issues	0	Sep 30, 2003
Eclipse memory setting and other tricks	0	Mar 7, 2004
parallel gc versus serial gc	3	Dec 17, 2006
garbage collector	0	Dec 19, 2006
How to read sequentially from a random point in a large Xml File.(200 - 2000 MB)	1	Apr 3, 2008
comp.lang.java.gui FAQ	0	Sep 13, 2006
For performance, write it in C	130	Jul 26, 2006
comp.lang.java.gui FAQ	0	Mar 14, 2006

String.substring in JDK 1.7.0_6+

jlp

markspace

jlp

markspace

jlp

markspace

Jan Burse

Kevin McMurtrie

markspace

Kevin McMurtrie

markspace

Kevin McMurtrie

markspace

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads