Optiimising StringBuilder Size revisited

R

Roedy Green

I wrote some months back that I got my main app, the HTMLMACRO running
10% faster just by optimising the size of new StringBuilder.

This set me wondering how it might be fine tuned further, and perhaps
turned into a commercial tool to fine tune it for any app.

Let's pretend that for every new StringBuilder invocation I collected
statistics of every string built and how long it was.

Knowing that, how big should you make the initialisation parameter?

Right now I pretty well always set it to the longest string. But in a
few cases I made it shorter.

Setting it to the longest string conserves CPU time in an infinitely
RAM rich environment. However outside the government, you probably
also want to conserve RAM by making letting a few of the StringBuildes
grow in return for most StringBuilders being smaller.

You need some measure of tradeoff of doublings vs wasted bytes in a
given invocation. I suppose you might find that number by experiment.

A mindless algorithm might work like this:

Simulate the doublings and wasted bytes of every for every possible
StringBuilder size between min and max length and pick the op for each
instance. This would not be done while the app was running, so even a
crude (o n^2) algorithm like that would likely be acceptable.
 
M

Mark Space

Roedy said:
You need some measure of tradeoff of doublings vs wasted bytes in a
given invocation. I suppose you might find that number by experiment.

One possibility that just occurs to me: pick half of the largest record
string size. That way, you're exactly one doubling away from the
largest possible value.

Considering most string are probably shorter than half of the max buffer
size, this should result in a minimum of doubling.

OTOH, if the average length of a string is larger than one half of the
maximum buffer size, you'll double the size of your buffer more than
half of the time. In that case, pick the largest size instead.


How are you collecting statistics? It seems to me to be pretty easy to
replace StringBuilder in your own code, but replacing every use of
StringBuilder in all the various libraries would be nigh impossible.
Did you hack the runtime library?


I'm trying to condition myself to pick a larger StringBuffer size.
Rather than just use the default constructor, I pick an initial size.
Usually I try to envision the output of the operation as one line (80
char default), two lines (160 char default) or "lots" (*) (256 or 512
char default, depending on my mood).


(*) Terry Pratchett reference unintended.
 
R

Roedy Green

How are you collecting statistics? It seems to me to be pretty easy to
replace StringBuilder in your own code, but replacing every use of
StringBuilder in all the various libraries would be nigh impossible.
Did you hack the runtime library?

For now I manually added some code just prior to the toString. For a
commercial product you would have to hack the StringBuilder class to
add the instrumentation. I think there is some tool intended for
debuggers to do that.
 
T

Tom Anderson

For now I manually added some code just prior to the toString. For a
commercial product you would have to hack the StringBuilder class to add
the instrumentation. I think there is some tool intended for debuggers
to do that.

It's called the -Xbootclasspath/p flag. You must make an instrumented
version of StringBuilder, then point to it with that flag - it will get
loaded in place of the standard one.

Although apparently:

"Note: Applications that use this option for the purpose of overriding a
class in rt.jar should not be deployed as doing so would contravene the
Java 2 Runtime Environment binary code license."

Huh.

tom
 
T

Tom Anderson

Knowing that, how big should you make the initialisation parameter?

Really, you want to compute it, or at least estimate it, at runtime based
on what you're going to put in the buffer. Obviously, you can't do that
with an automatic tool, at least not without serious cleverness.

However, one thing you might consider is having the tool report the sites
where the most waste occurs - not just the most waste per instance, but
the most waste total, ie waste per instance * number of buffers allocated
at that site during the run. The programmer could then go and hand-tune
the sizes there if they wanted. This would be after applying your
automatic tuning process, of course.
Setting it to the longest string conserves CPU time in an infinitely RAM
rich environment. However outside the government, you probably also
want to conserve RAM by making letting a few of the StringBuildes grow
in return for most StringBuilders being smaller.

Since arrays need to be zeroed when they're allocated, allocating an
overly large array does also waste cycles.

Indeed, since StringBuffers are generally fairly short-lived, the worry is
not really about using up RAM, it's about generating garbage and making
the collector run more often that it otherwise would. Again, a CPU cost.
You need some measure of tradeoff of doublings vs wasted bytes in a
given invocation.

This is the crux of it - you can fairly easily calculate the number of
unnecessary doublings and number of wasted bytes that any given initial
size will give you, but how do you compare those numbers to decide which
initial size is best? It's easy when, for some initial sizes a and b, a
gives both fewer or equal doublings and fewer wasted bytes than b, but
that will be fairly uncommon - usually, wasting fewer bytes means making
more doublings.

I suggest a fairly simple metric: total number of chars allocated. The
logic here is that the CPU cost of both allocation (zeroing) and
deallocation (making GC happen more frequently) is directly proportional
to this. Working out the total allocation for a given combination of
initial and final size is fairly straightforward:

int allocation(int initial, int final) {
if (final <= initial) return initial ;
else return initial + allocation((initial * 2), final) ;
}

if you record all the final sizes in an array called finals, you can
evaluate the lifecycle cost of a given initial size choice easily:

int lifecycleAllocation(int initial, int[] finals) {
int sum = 0 ;
for (int final: finals) sum += allocation(initial, finals) ;
return sum ;
}

Then, as you say, you just run over every possible initial value from the
size of the smallest string to the largest, and pick the one which gives
the lowst lifecycle allocation.

What's slightly counterintuitive about this approach is that there's no
concept of wasted space. But you don't need it - you just need to allocate
as few characters as possible.

If you wanted to be really anal, you could modify the allocation()
function to take account of the size of an array header - i don't know for
sure how big that is, but i think it's something like three or four
machine words, which is equivalent to six or eight chars.
A mindless algorithm might work like this:

Simulate the doublings and wasted bytes of every for every possible
StringBuilder size between min and max length and pick the op for each
instance.

"The op"?

tom
 
T

Tom Anderson

Mitigating that, the GC for a young generation with nearly all short-lived
objects is fast. The collector is optimized for the use case where nearly
everything dies before a minor collection.

By filling the young generation with dead objects, you incur more frequent GC
cycles that take very little time each. Just a copy of the few remaining
live objects in the nursery, and Bob's your uncle.

Discounting the effect of Hotspot on inlining the StringBuilder operations to
where perhaps they have no impact on heap at all.

All well and good.

But when Roedy actually tried it, he made his app run 10% faster.

tom
 
L

Lew

All well and good.

But when Roedy actually tried it, he made his app run 10% faster.

With the "-server" option to the JVM?

Not having seen the experiment or the data it used, nor what JVM
options pertained, I don't know how his results generalize to other
situations.
 
T

Tom Anderson

With the "-server" option to the JVM?

No idea. Roedy?
Not having seen the experiment or the data it used, nor what JVM options
pertained, I don't know how his results generalize to other situations.

True. What would be some good benchmarks for this sort of thing? Javac?
Maybe not. What sort of app builds a lot of strings?

tom
 
R

Roedy Green

Discounting the effect of Hotspot on inlining the StringBuilder operations to
where perhaps they have no impact on heap at all.

I have tried to persuade the Jet people to implement various
StringBuilder optimisations, but they felt they StringBuilder
operations did not form a sufficiently substantial part of the
workload to be worth the bother. I gave them another prod a week or
so ago handling the carrot of my 10% improvement.

I figured the following optimisations would also be possible.

1. calculating the precise size of the string buffer, and plopping the
various strings into place and converting that to a String, probably
not even executing any StringBuffer code. This would mean
procrastinating allocating the buffer until the pieces had been
computed.

2. at load time concatenating String constants that get plopped one
after the other into a Stringbuffer.

It might even be possible for an ordinary human to implement these as
peephole optimisations on class files.
 
R

Roedy Green

But when Roedy actually tried it, he made his app run 10% faster

Most apps would not have anywhere near such a dramatic improvement.
Pretty well all my app does is expand parameters in various clever
ways with StringBuilder into fluffy HTML. It reexpands my entire
website each time it runs, and the checks if the old and new differ,
and if they do, writes the new version to disk. See
http://mindprod.com/application/htmlmacros.html

However, any Servlet or JSP app, under the hood is probably quite
similar. The improvement would not be as dramatic since it is
overshadowed by other overhead including transmit time.

People tend to sniff at 10%, but when you have a server farm costing
many millions of dollars, 10% is quite a big deal.

Corporations are happy to fire lifetime employees to save even 1%.
 
R

Roedy Green

No idea. Roedy?

I got the improvement both with Jet and with -client. I did not think
to test -server. Sorry about that. You might do an simpler test with
just one StringBuilder, and see if -server does anything clever if you
use bad estimates for the size.
 
Z

zerg

Tom said:
Since arrays need to be zeroed when they're allocated, allocating an
overly large array does also waste cycles.

Indeed, since StringBuffers are generally fairly short-lived, the worry
is not really about using up RAM, it's about generating garbage and
making the collector run more often that it otherwise would. Again, a
CPU cost.

Not all CPU cycles are created equal, however. It may be worth it, in
some cases, to save ten cycles when the app is busy with work to do even
at the cost of twenty later, if those twenty can be deferred until the
app is idle (say, waiting for I/O events to occur).

This would, in Java, occur if you save cycles up front but by a method
that generates more garbage.

If the app is busy for long enough, though, the GC will not be able to
be deferred until it's idle.

And then we're getting into -server, GC-tuning, and other complicated
stuff involving jvm command-line option tweaking. :)
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,769
Messages
2,569,579
Members
45,053
Latest member
BrodieSola

Latest Threads

Top