StringBuilder

Jan Burse

bob said:
Is there any way to use StringBuilder but with + signs instead of
append?

The + operator in Java already uses StringBuilder internally.

There is only an issue when you accumulate a string value in a
loop. If you use StringBuilder explicitly there, it is faster,
because you save the repeated new StringBuilder() and toString() calls.

So this is faster, since it uses 1 new and 1 toString():

StringBuilder buf=new StringBuilder();
for (int i=0; i<100; i++) {
    buf.append(i);
    buf.append("*");
    buf.append(i);
    buf.append("=");
    buf.append((i*i));
    buf.append("\n");
}
System.out.println(buf.toString());

Whereas this code is slower:

String res="";
for (int i=0; i<100; i++) {
    res+=i+"*"+i+"="+(i*i)+"\n";
}
System.out.println(res);

It is translated to the following code by the compiler, and
thus uses 100 new and 100 toString():

String res="";
for (int i=0; i<100; i++) {
    StringBuilder _buf=new StringBuilder(res);
    _buf.append(i);
    _buf.append("*");
    _buf.append(i);
    _buf.append("=");
    _buf.append((i*i));
    _buf.append("\n");
    res=_buf.toString();
}
System.out.println(res);
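
As a rough check that the two styles are interchangeable in their result, the sketch below wraps both loops in helper methods and compares the output. The class and method names (ConcatComparison, withBuilder, withPlusEquals) are made up for this illustration. Note also that javac from Java 9 onwards may compile + differently from the expansion shown above, so treat that expansion as a description of the older compilers discussed in this thread.

```java
// Illustrative sketch; class and method names are hypothetical.
class ConcatComparison {

    // One StringBuilder for the whole loop, one toString() at the end.
    static String withBuilder(int n) {
        StringBuilder buf = new StringBuilder();
        for (int i = 0; i < n; i++) {
            buf.append(i).append("*").append(i).append("=").append(i * i).append("\n");
        }
        return buf.toString();
    }

    // String accumulation with += inside the loop.
    static String withPlusEquals(int n) {
        String res = "";
        for (int i = 0; i < n; i++) {
            res += i + "*" + i + "=" + (i * i) + "\n";
        }
        return res;
    }

    public static void main(String[] args) {
        // Both variants are expected to build exactly the same text.
        System.out.println(withBuilder(100).equals(withPlusEquals(100)));
        System.out.print(withBuilder(3));
    }
}
```

Only the amount of intermediate allocation differs between the two methods, not the text they produce.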

For more information see for example here:
http://caprazzi.net/posts/java-bytecode-string-concatenation-and-stringbuilder/

Best Regards
 
Roedy Green

Is there any way to use StringBuilder but with + signs instead of
append?

No. Java has no such syntactic sugar. However, the IntelliJ IDE will
convert from + notation to sb.append notation.
--
Roedy Green Canadian Mind Products
http://mindprod.com
The modern conservative is engaged in one of man's oldest exercises in moral philosophy; that is,
the search for a superior moral justification for selfishness.
~ John Kenneth Galbraith (born: 1908-10-15 died: 2006-04-29 at age: 97)
 
Jan Burse

Roedy said:
No. Java has no such syntactic sugar. However, the IntelliJ IDE will
convert from + notation to sb.append notation.

Maybe the IDE provides something similar. Basically, I have seen
the IDE mock about the following:

buf.append(x+y);

It then suggests to use:

buf.append(x);
buf.append(y);
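
A small sketch of that rewrite, assuming x and y are Strings (the class and method names are invented for the example). For numeric operands the two forms are not equivalent, because x+y would then be an addition rather than a concatenation.

```java
// Illustrative sketch; AppendSplit and its methods are hypothetical names.
class AppendSplit {

    // Builds a temporary String for x + y before appending it.
    static String combined(String x, String y) {
        StringBuilder buf = new StringBuilder();
        buf.append(x + y);
        return buf.toString();
    }

    // Appends both parts directly, without the temporary String.
    static String separate(String x, String y) {
        StringBuilder buf = new StringBuilder();
        buf.append(x);
        buf.append(y);
        return buf.toString();
    }

    public static void main(String[] args) {
        System.out.println(combined("Hello ", "World"));
        System.out.println(separate("Hello ", "World"));
    }
}
```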

And it also mocks about the following:

StringBuffer buf=new StringBuffer();

It then suggests to use:

StringBuilder buf=new StringBuilder();

But the fact that + is implemented internally via StringBuilder is a
property of javac with target >= 1.5, not of the IDE.

See for example:
http://caprazzi.net/posts/java-bytecode-string-concatenation-and-stringbuilder/

Bye
 
Arne Vajhøj

Jan said:
The + operator in Java already uses StringBuilder internally.

There is only an issue when you accumulate a string value in a
loop. If you use StringBuilder explicitly there, it is faster,
because you save the repeated new StringBuilder() and toString() calls.

So this is faster, since it uses 1 new and 1 toString():

StringBuilder buf=new StringBuilder();
for (int i=0; i<100; i++) {
    buf.append(i);
    buf.append("*");
    buf.append(i);
    buf.append("=");
    buf.append((i*i));
    buf.append("\n");
}
System.out.println(buf.toString());

Whereas this code is slower:

String res="";
for (int i=0; i<100; i++) {
    res+=i+"*"+i+"="+(i*i)+"\n";
}
System.out.println(res);

It is translated to the following code by the compiler, and
thus uses 100 new and 100 toString():

String res="";
for (int i=0; i<100; i++) {
    StringBuilder _buf=new StringBuilder(res);
    _buf.append(i);
    _buf.append("*");
    _buf.append(i);
    _buf.append("=");
    _buf.append((i*i));
    _buf.append("\n");
    res=_buf.toString();
}
System.out.println(res);

For more information see for example here:
http://caprazzi.net/posts/java-bytecode-string-concatenation-and-stringbuilder/

That has been known for 10-15 years.

It should be in any Java book above beginners level.

Arne
 
Roedy Green

So this is faster, since it uses 1 new and 1 toString():

Further, you can optimise the size of the buffer. In experiments I did
with a program that does a lot of String building, this improved
performance by 10%.
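
As an illustration of sizing the buffer up front with the plain JDK (this is not FastCat), here is a minimal sketch for the multiplication-table loop from earlier in the thread. The class name and the per-line bound are assumptions made for the example: for i below 100 the longest line is "99*99=9801\n", which is 11 characters.

```java
// Illustrative sketch; PresizedBuilder and MAX_LINE_LENGTH are hypothetical names.
class PresizedBuilder {

    // Upper bound per line, valid only for i < 100: "99*99=9801\n" has 11 characters.
    static final int MAX_LINE_LENGTH = 11;

    // Builds the table for 0 <= i < n (n at most 100) in a buffer sized up front.
    static StringBuilder build(int n) {
        StringBuilder buf = new StringBuilder(n * MAX_LINE_LENGTH);
        for (int i = 0; i < n; i++) {
            buf.append(i).append("*").append(i).append("=").append(i * i).append("\n");
        }
        return buf;
    }

    public static void main(String[] args) {
        StringBuilder buf = build(100);
        // If the estimate was large enough, the capacity never had to grow.
        System.out.println("length=" + buf.length() + " capacity=" + buf.capacity());
    }
}
```

An upper bound like this trades a little wasted space for never having to reallocate; an exact size would need a first pass over the data.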

I now use FastCat which gets the size of the buffer bang on every
time.

see http://mindprod.com/products1.html#FASTCAT
 
Roedy Green

And it also mocks about the following

I am not sure what you mean by "mock". Some IDEs will suggest code
changes if you ask for suggestions (inspecting). IntelliJ will do
that. In addition it will convert the code for you (refactoring).

 
Arne Vajhøj

Like other ancient performance-practices that have been obsoleted by
today's compilers?

No.

What we are talking about is the "+=" part, and that part is
still relevant for today's compilers.

Actually JB's example uses = but, to be equivalent to the first code,
it needs to be +=.

And besides what he mentions as reasons, there are also
the 100 new strings.

We have been told that using StringBuffer was the better alternative.
Now compilers have switched from using StringBuffer for the above
example to the unsynchronized StringBuilder. Those who have manually
used the StringBuffer have stopped the compiler from doing that for them
and must rely on the JIT's lock elision algorithm.

That is a little interesting quirk.

The outside loop string += part is more efficient with StringB*. And
I am pretty sure that the StringBuffer/StringBuilder difference
is negligible.

But the inside loop string + is probably a little bit faster
with StringBuilder than StringBuffer.

You can consider that insignificant compared to the first.

Or you could argue for using StringB* and append of string +.

So as long as this part of the code does not represent a critical
performance-bottleneck, I would recommend to use the simple, stupid
"slow" variant and hope for future compilers to detect and optimize that
pattern.

As long as it is not critical then readability should be the deciding
factor.

Arne
 
Roedy Green

Is simple pattern that could be detected by tomorrows JIT-compilers and
transformed into:

I jittered at the Jet people about such optimisations. They seemed to
feel that string concatenation was not an important enough operation
for people to worry about optimising. I disagreed, since so much of my
code is about fiddling with text. I don't have a handle on its
importance generally.
 
Stanimir Stamenkov

Mon, 05 Sep 2011 05:27:15 +0200, /Jan Burse/:
If you then explicitly use StringBuilder you are
faster, because you save the new StringBuilder() and toString().

So this is faster, since it uses 1 new and 1 toString():

The StringBuilder.toString() is really fast - that's the point, and
I don't think it is worth mentioning it.
 
Jan Burse

Stanimir said:
The StringBuilder.toString() is really fast - that's the point, and I
don't think it is worth mentioning it.

I am not sure whether I can agree directly.

The StringBuilder is a mutable object. The String is an
immutable object. Therefore the obvious fast implementation,
which would share the buffer between StringBuilder and
String, does not work, because the following code would
break the immutability of String:

StringBuilder buf=new StringBuilder();

buf.append("Hello World!");

String str=buf.toString();

buf.replace(6,11,"Java");

System.out.println("str="+str);

Through the side effect of buf.replace(), the value of the
string str would change. Therefore we find the following
slow implementation of toString() in the reference
implementation. Please note the comment:

public String toString() {
    // Create a copy, don't share the array
    return new String(value, 0, count);
}

http://kickjava.com/src/java/lang/StringBuilder.java.htm
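
The effect of that copy can be observed from the outside. The sketch below reruns the Hello World example from above in a hypothetical helper class: the String taken before the replace() keeps its old contents, while the builder moves on.

```java
// Illustrative sketch; ToStringCopy is a hypothetical name.
class ToStringCopy {

    // Returns { string taken before the replace, builder contents after it }.
    static String[] snapshotThenReplace() {
        StringBuilder buf = new StringBuilder();
        buf.append("Hello World!");
        String str = buf.toString();   // taken before the mutation
        buf.replace(6, 11, "Java");    // replaces "World" in the builder only
        return new String[] { str, buf.toString() };
    }

    public static void main(String[] args) {
        String[] result = snapshotThenReplace();
        System.out.println("str=" + result[0]);   // prints "str=Hello World!"
        System.out.println("buf=" + result[1]);   // prints "buf=Hello Java!"
    }
}
```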

And if we look at the constructor that is used, it really does
make a copy. There is a non-public constructor in String
that allows some sharing, and that is for example used to
implement substring. But here a constructor is used that
does not share:

public String(char value[], int offset, int count) {
    if (offset < 0) {
        throw new StringIndexOutOfBoundsException(offset);
    }
    if (count < 0) {
        throw new StringIndexOutOfBoundsException(count);
    }
    // Note: offset or count might be near -1>>>1.
    if (offset > value.length - count) {
        throw new StringIndexOutOfBoundsException(offset + count);
    }
    char[] v = new char[count];
    System.arraycopy(value, offset, v, 0, count);
    this.offset = 0;
    this.count = count;
    this.value = v;
}

http://kickjava.com/src/java/lang/String.java.htm

Possibly some program analysis would allow sharing.
But the copying also has a positive effect. When
the StringBuilder has, through manipulation, gained a much
greater capacity than necessary, the copying will
create a smaller char array, so that less space is used
as soon as the StringBuilder is reclaimed.

But maybe you are right that toString() is nevertheless
fast, since a) allocating objects is usually fast and
b) System.arraycopy can also be fast. Together
with the capacity-reducing effect, this could all lead
to a small overhead.

BTW: OpenJDK uses the same code. In Harmony we find
a shared flag in the AbstractStringBuilder, and a
heuristic that decides whether sharing is done or not.
The non-public String constructor is used for sharing:

public String toString() {
    if (count == 0) {
        return ""; //$NON-NLS-1$
    }
    // Optimize String sharing for more performance
    int wasted = value.length - count;
    if (wasted >= 256
            || (wasted >= INITIAL_CAPACITY && wasted >= (count >> 1))) {
        return new String(value, 0, count);
    }
    shared = true;
    return new String(0, count, value);
}

http://www.java2s.com/Open-Source/J...kage/java/lang/AbstractStringBuilder.java.htm

There is then a little overhead in the basic operations
of StringBuilder to check for sharing, and in case that
there is sharing, a copy is made.

final void replace0(int start, int end, String string) {
    [...]
    if (!shared) {
        // index == count case is no-op
        System.arraycopy(value, end, value, start + stringLength, count - end);
    } else {
        char[] newData = new char[value.length];
        System.arraycopy(value, 0, newData, 0, start);
        // index == count case is no-op
        System.arraycopy(value, end, newData, start + stringLength, count - end);
        value = newData;
        shared = false;
    }
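
The copy-on-write idea behind the shared flag can be sketched outside the JDK. The following is not Harmony's code: SharingBuffer and its Snapshot type are hypothetical stand-ins for the builder and for String, since the non-public String constructor cannot be called from user code.

```java
import java.util.Arrays;

// Illustrative copy-on-write buffer; all names here are hypothetical.
class SharingBuffer {

    // Stand-in for String: a read-only view of the first `length` chars.
    static final class Snapshot {
        private final char[] chars;
        private final int length;

        private Snapshot(char[] chars, int length) {
            this.chars = chars;
            this.length = length;
        }

        @Override
        public String toString() {
            return new String(chars, 0, length);
        }
    }

    private char[] value = new char[16];
    private int count;
    private boolean shared;   // true while a Snapshot may still point at value

    // Ensures value is large enough and not shared before any write.
    private void prepareWrite(int minCapacity) {
        if (minCapacity > value.length) {
            // Growing allocates a fresh array, so an old snapshot keeps the old one.
            int grown = value.length + (value.length >> 1) + 2;
            value = Arrays.copyOf(value, Math.max(minCapacity, grown));
            shared = false;
        } else if (shared) {
            // Copy once, on the first write after sharing.
            value = value.clone();
            shared = false;
        }
    }

    SharingBuffer append(String s) {
        prepareWrite(count + s.length());
        s.getChars(0, s.length(), value, count);
        count += s.length();
        return this;
    }

    void setCharAt(int index, char c) {
        if (index < 0 || index >= count) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        prepareWrite(count);
        value[index] = c;
    }

    // Cheap "toString": hands out the array itself and remembers that it did.
    Snapshot snapshot() {
        shared = true;
        return new Snapshot(value, count);
    }

    public static void main(String[] args) {
        SharingBuffer buf = new SharingBuffer().append("Hello");
        Snapshot before = buf.snapshot();
        buf.setCharAt(0, 'J');
        System.out.println(before + " / " + buf.snapshot());
    }
}
```

In this sketch the cost of the copy moves from snapshot() to the first write after it, so a builder that is discarded right after the snapshot never pays for a copy.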

Probably the gain in speed from the sharing compensates for
this little extra check needed everywhere. So probably
toString() is relatively fast here, assuming that sharing
happens often enough. When we look at the loop example,
we can positively influence sharing by giving
a good initial capacity, because then waste is small.

But giving an initial capacity for the whole loop is
probably non-trivial: how does the number of digits of
the squares develop? So assume our StringBuilder grows
according to its enlargeBuffer rule. In the case of
Harmony the capacity grows by a factor of 1.5 plus 2.

So immediately after an enlargement we have waste >= count/2;
more precisely, because of the added two, we have
waste = count/2 + 2. So no sharing will happen.
After we have then added n characters, we have
waste' = count/2 + 2 - n and count' = count + n.
We only have waste' < count'/2 when 2 - n < n/2,
that is, when n >= 2. So sharing will only happen again
for sure after at least 2 more characters have been added.
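
The arithmetic can be tried out with a small sketch that mirrors the condition from the quoted toString() and the growth rule as described above. The class and method names are invented, and the value 16 for INITIAL_CAPACITY is an assumption about Harmony, not something shown in the excerpt. The analysis above leaves the INITIAL_CAPACITY term aside, so for very small builders the predicate can differ from it.

```java
// Illustrative sketch; names are hypothetical, INITIAL_CAPACITY = 16 is assumed.
class SharingHeuristic {

    static final int INITIAL_CAPACITY = 16;

    // Same condition as in the quoted toString(): true means "copy", false means "share".
    static boolean copies(int count, int capacity) {
        int wasted = capacity - count;
        return wasted >= 256
                || (wasted >= INITIAL_CAPACITY && wasted >= (count >> 1));
    }

    // Growth rule as described in the post: factor 1.5 plus 2.
    static int enlarged(int capacity) {
        return (capacity >> 1) + capacity + 2;
    }

    public static void main(String[] args) {
        int count = 100;                   // builder was full at 100 chars
        int capacity = enlarged(count);    // capacity right after the enlargement
        for (int n = 0; n <= 3; n++) {
            System.out.println("n=" + n + " copies=" + copies(count + n, capacity));
        }
    }
}
```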

So the heuristic has a little glitch. But never mind.

Best Regards
 
Roedy Green

The StringBuilder.toString() is really fast - that's the point, and
I don't think it is worth mentioning it.

The place where StringBuilder wastes CPU cycles is when you have a
bad estimate and it has to create a new buffer and copy what it has done
so far into it. If the estimate is too low, you can get repeated such
doublings. If it is too high, you fill up RAM too quickly and force
premature garbage collection.

My solution was FastCat, with which it is quite easy to get a bang-on estimate.
see http://mindprod.com/products1.html#FASTCAT

StringBuilder composes its string in a char[]. Unfortunately it can't
simply plop that into a String object at the end. It has to allocate
yet another buffer, copy into it, and that becomes your string object.
The JVM is worried there might be encumbrances (pointers to) the
char[]. So it has to copy rather than reference. Perhaps a little
native code could bypass the final copy.
 
Jan Burse

Roedy said:
The JVM is worried there might be encumbrances (pointers to) the char[].

Well, strictly speaking, a pointer alone is not harmful and quite
impossible since the field is for example package local in the
AbstractStringBuilder class and the accessor is also package local.

The problem is invoking a method of an object that has access to
the char[] and will do some write into the array between offset
and offset+count of the string.

Bye
 
Roedy Green

The problem is invoking a method of an object that has access to
the char[] and will do some write into the array between offset
and offset+count of the string.

Exactly. If there exists a reference to the char[] after it is inside
the String, that is a security breach, since it could be used to
modify the String.
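
The same concern is why the public String(char[]) constructor copies its argument. A minimal sketch (the class and method names are made up) shows that the caller, who still holds a reference to the array, cannot use it to change the String afterwards.

```java
// Illustrative sketch; DefensiveCopy is a hypothetical name.
class DefensiveCopy {

    // The public constructor copies the array contents into the new String.
    static String fromChars(char[] chars) {
        return new String(chars);
    }

    public static void main(String[] args) {
        char[] chars = { 'J', 'a', 'v', 'a' };
        String s = fromChars(chars);
        chars[0] = 'L';                 // the caller still owns and mutates the array
        System.out.println(s);          // prints "Java"
        System.out.println(new String(chars));   // prints "Lava"
    }
}
```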
 
Jan Burse

Peter said:
In any case, if Java does _not_ implement it that way, I suspect that's

Well, I wouldn't say it's a matter of *Java*. It's a matter
of how the given JDK implements it. *Java* defines
the contract, but there are many implementations.

From my post you can read off the following findings:

Oracle: Always does a copy (I only checked rt.jar,
not some alt-rt.jar).
OpenJDK: Always does a copy.
Harmony: Most of the time does a sharing and
flags the builder (similar to how you
describe the .NET implementation)

But beware, the list is not complete; for example, I
didn't check the Apple classes.jar or IBM's rt.jar.
Also, the finding only holds for JRE 1.6 and can
change at any time if the provider of the JRE decides
so. It might also differ between 32-bit and 64-bit, etc.

different decision-making process rather than ignorance. In other
words, they have already considered whether it's a worthwhile
optimization and decided

There is not only a plurality of *they*, as can be seen
from above, but also a plurality of what *worthwhile*
means. An implementation with more of a reference
implementation character, like for example the Oracle
JRE, might take the simple route, since the focus is
then more on functional requirements than on
non-functional requirements.

Which strongly suggests that anyone worrying a priori
about the performance of StringBuilder before they have
demonstrated it's an actual bottleneck in their program
is wasting their time.

I guess it is more about performance tuning than removing
bottlenecks. And any performance gain is only seen in
programs that make heavy use of StringBuilder. Such
measurements have already been done over and over. See
for example (not exactly measuring toString()):

+=: 546 ms
StringBuilder, default initial capacity: 30 ms
StringBuilder, exact initial capacity: 10 ms

http://christian.bloggingon.net/arc...ile-bei-der-verwendung-von-stringbuilder.aspx
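
For anyone who wants to repeat such a measurement on their own JDK, a very rough sketch follows. It is not the benchmark from the linked page; the class name, the "item" token and the loop count are arbitrary choices made for this example. It has no warm-up and discards the results, so the numbers it prints are only indicative, and a harness such as JMH is the better tool for serious comparisons.

```java
// Illustrative timing sketch; names, token and loop count are arbitrary.
class ConcatTiming {

    static String plusEquals(int n) {
        String res = "";
        for (int i = 0; i < n; i++) {
            res += "item";
        }
        return res;
    }

    static String defaultBuilder(int n) {
        StringBuilder buf = new StringBuilder();
        for (int i = 0; i < n; i++) {
            buf.append("item");
        }
        return buf.toString();
    }

    static String sizedBuilder(int n) {
        // "item" has 4 characters, so n * 4 is the exact final length.
        StringBuilder buf = new StringBuilder(n * 4);
        for (int i = 0; i < n; i++) {
            buf.append("item");
        }
        return buf.toString();
    }

    // Wall-clock time of one run in milliseconds.
    static long millis(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        int n = 20_000;
        System.out.println("+=: " + millis(() -> plusEquals(n)) + " ms");
        System.out.println("StringBuilder, default initial capacity: "
                + millis(() -> defaultBuilder(n)) + " ms");
        System.out.println("StringBuilder, exact initial capacity: "
                + millis(() -> sizedBuilder(n)) + " ms");
    }
}
```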

But the detailed measurements will vary depending on JDK
and machine. Whether one JDK changes the measurement
fundamentally depends very much on the available
algorithms for the functional requirements and how
these algorithms behave non-functionally.

But I would say the consumers of JDKs are we the
programmers, so it is good to measure, inspect and
debate JDKs and not blindly trust any *they*.

Bye
 
