Very Basic Performance Question

R

Rhino

Given a loop like this:

for (int ix=0; ix<myArray.length; ix++) {
System.out.println(ix);
}

or especially this:

for (int ix=0; ix<Math.sqrt(shoeSize)*hairlength/(IQ); ix++) {
System.out.println(ix);
}

I've always assumed that it made more sense calculate the
"loop controller" value in the second term of the for loop (sorry, I forget
the proper name), rather than make Java look it up
or calculate it, especially for large numbers of iterations of the loop. It
just made no sense to do the same calculation or lookup a thousand or a
million times; it seemed a lot more reasonable to do the calculation or
lookup ONCE, store the value in a variable, and then plug the variable into
the for loop, like this:

int loopController = Math.sqrt(shoeSize)*hairlength/(IQ);
for (int ix=0; ix<loopController; ix++) {
System.out.println(ix);
}

Is my reasoning sound or does Java have "tricks" (optimizations) that make
it more sensible to leave the lookup or calculation in the loop, thus
avoiding the need to create the variable that controls the loop?

I've been wondering about that for a while and am finally getting around to
asking ;-)

I realize that I could set up a benchmark to find out for sure but I'm
betting that the answer is already well-known so I'd rather ask here and
save myself the work ;-)
 
W

Wiseguy

Rhino said:
Given a loop like this:

for (int ix=0; ix<myArray.length; ix++) {
System.out.println(ix);
}

or especially this:

for (int ix=0; ix<Math.sqrt(shoeSize)*hairlength/(IQ); ix++) {
System.out.println(ix);

Your intuition is correct. It is not good to (assume) that the
compiler/interpretter will recognize the control expression as an
invariant and only evaluate it once. Some languages/compilers do, and
others do not.

Always assume the worst case and move the complicated expression outside of
the loop statement, storing its value in a temporary variable, unless there
is a good reason not to.

The other part of this is that (at least in C/C++) it is possible to modify
the control expression from within the loop. Like, what if the loop
itself modifies (shoesize) from your example above?
 
C

Chris Smith

Rhino said:
Given a loop like this:

for (int ix=0; ix<myArray.length; ix++) {
System.out.println(ix);
}

or especially this:

for (int ix=0; ix<Math.sqrt(shoeSize)*hairlength/(IQ); ix++) {
System.out.println(ix);
}

Although your message is oriented toward performance, I'll note that
there's another important difference between those two loops. In the
first case, the expression myArray.length" really is the most
descriptive way you can think about the resulting value. In the second
case (and with most other complex calculations), the quantity being
calculated can be expressed with a shorter name that captures the
meaning of the value.

The second example is one of those cases where readability and
performance concerns are actually in agreement with each other. That's
great, and saves you from trading off two beneficial values.
I've always assumed that it made more sense calculate the
"loop controller" value in the second term of the for loop (sorry, I forget
the proper name), rather than make Java look it up
or calculate it, especially for large numbers of iterations of the loop.

However, in the general case with simpler expressions (such as
array.length for example), I don't agree that it's a good idea to do
this. It's unlikely to make any substantial difference in your
application's performance since you probably do far more than access one
array length field in your loop body.

The readability advantages of placing very simple expressions inline is,
however, considerable. This is especially true for "myArray.length"
because the for loop is idiomatic and factoring out the termination
condition into a local causes the reader to slow down so as to figure
out what weird thing you're doing there.
Is my reasoning sound or does Java have "tricks" (optimizations) that make
it more sensible to leave the lookup or calculation in the loop, thus
avoiding the need to create the variable that controls the loop?

It is not guaranteed that loop-invariant expressions are optimized in
Java, and chances are it actually depends on the JIT's adaptive
profiling of the section of code in question (assuming you're referring
to J2SE or J2EE environments; J2ME has a whole different performance
picture). I wonder if you aren't worrying too soon, though.

--
www.designacourse.com
The Easiest Way To Train Anyone... Anywhere.

Chris Smith - Lead Software Developer/Technical Trainer
MindIQ Corporation
 
R

Rhino

Chris Smith said:
Although your message is oriented toward performance, I'll note that
there's another important difference between those two loops. In the
first case, the expression myArray.length" really is the most
descriptive way you can think about the resulting value. In the second
case (and with most other complex calculations), the quantity being
calculated can be expressed with a shorter name that captures the
meaning of the value.

The second example is one of those cases where readability and
performance concerns are actually in agreement with each other. That's
great, and saves you from trading off two beneficial values.
I share your views on readability; that's the *other* main reason that I was
replacing the calculation with the variable in the second example ;-) I'd
rather give the calculation a meaningful name (via the variable that
contains the result) than make someone look at the raw calculation and try
to figure out what the result actually means.
However, in the general case with simpler expressions (such as
array.length for example), I don't agree that it's a good idea to do
this. It's unlikely to make any substantial difference in your
application's performance since you probably do far more than access one
array length field in your loop body.
In hindsight, I really should have posed my questions somewhat differently.
The answers I received confirmed my suspicions that I would definitely get a
performance benefit from doing the calculation in the second example outside
of the loop ONCE and then referring to it within the loop via the variable
name and that it was not wise to count on Java to optimize that for me.

However, the other thing I was after - and didn't ask really clearly - was
whether there was any benefit to *performance* (I agree with your remarks
about readability) to put myArray.length into a variable outside the loop
and then refer to the variable within the 'for' statement. In other words,
which will perform better:

for (int ix=0; ix<myArray.length; ix++) {
//whatever
}

or

int arraySize = myArray.length;
for (int ix=0; ix<arraySize; ix++) {
//whatever
}

I really don't know if the cost of creating the arraySize variable and
referring to it each time the loop iterates is more, less or the same as
looking up the value of the length class variable in the array on each
iteration. I'm guessing the difference is neglible for a small number of
iterations but possibly substantial on a large number of iterations but I'm
not sure which would be better. I'm also not sure what would count as
"large" in this scenario? A hundred iterations? A billion?
The readability advantages of placing very simple expressions inline is,
however, considerable. This is especially true for "myArray.length"
because the for loop is idiomatic and factoring out the termination
condition into a local causes the reader to slow down so as to figure
out what weird thing you're doing there.
That's why I would only replace "myArray.length" with something equally (or
more) readable ;-)
It is not guaranteed that loop-invariant expressions are optimized in
Java, and chances are it actually depends on the JIT's adaptive
profiling of the section of code in question (assuming you're referring
to J2SE or J2EE environments; J2ME has a whole different performance
picture). I wonder if you aren't worrying too soon, though.
I'm really not losing sleep over this ;-) I've been wondering about it for
months but hadn't gotten around to finding out. I just wanted to verify that
I wasn't totally mistaken in my assumptions. If I could count on Java to
optimize the loop terminator, I would probably shut the whole issue of
replacing the calculation from the second example of my original post out of
my mind from here on forward; now that I know this isn't guaranteed, I will
continue to do things as I have been doing.

I'm less concerned about the issue of replacing myArray.length with a
pre-calculated variable but I'd still be curious to know the answer; it will
affect future code that I write.

Rhino
 
A

Alun Harford

Rhino said:
Given a loop like this:

for (int ix=0; ix<myArray.length; ix++) {
System.out.println(ix);
}

or especially this:

for (int ix=0; ix<Math.sqrt(shoeSize)*hairlength/(IQ); ix++) {
System.out.println(ix);
}

I've always assumed that it made more sense calculate the
"loop controller" value in the second term of the for loop (sorry, I forget
the proper name), rather than make Java look it up
or calculate it, especially for large numbers of iterations of the loop. It
just made no sense to do the same calculation or lookup a thousand or a
million times; it seemed a lot more reasonable to do the calculation or
lookup ONCE, store the value in a variable, and then plug the variable into
the for loop, like this:

int loopController = Math.sqrt(shoeSize)*hairlength/(IQ);
for (int ix=0; ix<loopController; ix++) {
System.out.println(ix);
}

Well the first case is just looking up the value of a variable. If you add
the line in the first case, any decent compiler will ignore it and look at
myArray.length every time.
In the second case, your compiler will almost certainly be able to work out
that Math.sqrt(shoeSize)*hairlength/(IQ) isn't going to change with each
call, and effectively add the line for you *BUT* if Math.sqrt were to, for
example, change a variable to count the number of times sqrt was called, the
compiler won't stand a chance.
Basically, do it the second way - it's easier to read and is never slower
than doing it the first way (or at least I can't think of a way it would
ever be).

Alun Harford
 
T

Thomas Fritsch

Rhino wrote:

[...]
However, the other thing I was after - and didn't ask really clearly - was
whether there was any benefit to *performance* (I agree with your remarks
about readability) to put myArray.length into a variable outside the loop
and then refer to the variable within the 'for' statement. In other words,
which will perform better:

for (int ix=0; ix<myArray.length; ix++) {
//whatever
}

or

int arraySize = myArray.length;
for (int ix=0; ix<arraySize; ix++) {
//whatever
}
I personally prefer
for (int n=myArray.length, ix=0; ix<n; ix++) {
//whatever
}
which gives another 5% enhancement in readability, and makes clear that
n is not used outside the for-loop.

[...]
 
P

Patricia Shanahan

Thomas said:
Rhino wrote:

[...]
However, the other thing I was after - and didn't ask really clearly -
was
whether there was any benefit to *performance* (I agree with your remarks
about readability) to put myArray.length into a variable outside the loop
and then refer to the variable within the 'for' statement. In other
words,
which will perform better:

for (int ix=0; ix<myArray.length; ix++) {
//whatever
}

or

int arraySize = myArray.length;
for (int ix=0; ix<arraySize; ix++) {
//whatever
}
I personally prefer
for (int n=myArray.length, ix=0; ix<n; ix++) {
//whatever
}
which gives another 5% enhancement in readability, and makes clear that
n is not used outside the for-loop.

[...]

Has anyone tested these ideas using compilers and JVM's that
are in use now? The most basic rule of performance
enhancement is measure, measure, measure.

If it is a worthwhile enhancement, it should be easy to
prove it by demonstrating a significant speed-up on a loop
that does something other than the loop overhead code.

Patricia
 
C

Chris Smith

Rhino said:
In hindsight, I really should have posed my questions somewhat differently.
The answers I received confirmed my suspicions that I would definitely get a
performance benefit from doing the calculation in the second example outside
of the loop ONCE and then referring to it within the loop via the variable
name and that it was not wise to count on Java to optimize that for me.

I'd stop short of "definitely", but it's true that it's not wise to put
complex calculations inside a for loop construct.
However, the other thing I was after - and didn't ask really clearly - was
whether there was any benefit to *performance* (I agree with your remarks
about readability) to put myArray.length into a variable outside the loop
and then refer to the variable within the 'for' statement.

I don't personally see why using myArray.length inline would possibly
provide any kind of performance benefit. The declaration of a local
variable should be considered free in performance-critical code. The
JIT compiler will translate your code to an intermediate form anyway,
and it will forget whether your original source declared a variable to
hold that value. Assuming that the compiler recognizes
"myArray.length" as loop-invariant, there should be no difference
between the two.

If the compiler does not recognize myArray.length as loop-invariant,
then the performance of this loop can be decreased by the inline form.
The lookup of an array length is a a very cheap operation, though.
Because the field is guaranteed to be immutable, there's no
synchronization needed. Assuming you actually do something in your
loop, the benefit for the loop is going to be at most a few percent.
Multiply that by the percentage of your program time that's spent in
that loop to find the total impact. You're probably spending all this
time worrying about making a hundredth of a percent improvement to your
application performance.
That's why I would only replace "myArray.length" with something equally (or
more) readable ;-)

You don't seem to have understood what I said. The idiomatic nature of
a for loop used to walk through an array makes it readable. If you
introduce your own variable names, then you immediately cause the
programmer to slow down and figure out why you didn't just use
myArray.length instead. If I were reading your code, I would then go
through and look for the possibility that you are replacing the array by
one of a different length and using the variable to keep the original
length... or other such tricks that I'd need to be aware of. That
happens regardless of the name you've chosen for the local variable,
simply because you've written code that's unfamiliar.

Not to mention that, to a competent Java programmer, there really is
nothing more readable than myArray.length.

--
www.designacourse.com
The Easiest Way To Train Anyone... Anywhere.

Chris Smith - Lead Software Developer/Technical Trainer
MindIQ Corporation
 
R

Rhino

Chris Smith said:
I'd stop short of "definitely", but it's true that it's not wise to put
complex calculations inside a for loop construct.


I don't personally see why using myArray.length inline would possibly
provide any kind of performance benefit. The declaration of a local
variable should be considered free in performance-critical code. The
JIT compiler will translate your code to an intermediate form anyway,
and it will forget whether your original source declared a variable to
hold that value. Assuming that the compiler recognizes
"myArray.length" as loop-invariant, there should be no difference
between the two.

If the compiler does not recognize myArray.length as loop-invariant,
then the performance of this loop can be decreased by the inline form.
The lookup of an array length is a a very cheap operation, though.
Because the field is guaranteed to be immutable, there's no
synchronization needed. Assuming you actually do something in your
loop, the benefit for the loop is going to be at most a few percent.
Multiply that by the percentage of your program time that's spent in
that loop to find the total impact. You're probably spending all this
time worrying about making a hundredth of a percent improvement to your
application performance.
Fair enough ;-)

I just thought I'd see if there was any significant performance benefit to
either approach. It sounds like you're pretty sure there isn't so I'll stop
worrying ;-)
You don't seem to have understood what I said. The idiomatic nature of
a for loop used to walk through an array makes it readable. If you
introduce your own variable names, then you immediately cause the
programmer to slow down and figure out why you didn't just use
myArray.length instead. If I were reading your code, I would then go
through and look for the possibility that you are replacing the array by
one of a different length and using the variable to keep the original
length... or other such tricks that I'd need to be aware of. That
happens regardless of the name you've chosen for the local variable,
simply because you've written code that's unfamiliar.
Ahh, I get your point now. It's sort of a psychological one: if the code
DOESN'T use myArray.length, have a very close look in case the code is doing
something rather unexpected.
Not to mention that, to a competent Java programmer, there really is
nothing more readable than myArray.length.
Fair enough.

Okay, thanks to all of you for your input. I guess we've flogged that dead
horse enough now ;-)

Rhino
 
D

Dimitri Maziuk

Rhino sez:
....
I just thought I'd see if there was any significant performance benefit to
either approach. It sounds like you're pretty sure there isn't so I'll stop
worrying ;-)

Array.length is fetching an int (class member) via a pointer. It
shouldn't be any slower than checking an int (local variable) on stack.

List.size() returns an int member of List. There is overhead of the
function call, but it shouldn't be significant and could be optimized
away anyway. Personally, I'd expect them to special-case list.size()
in the optimizer (if general algorithm doesn't optimize it away on its
own) because it's used in loop constructs so much.

That said, the only way to find out what's really going on is to read
the bytecode and to benchmark -- for each JVM, JIT, etc.

Dima
 
J

John C. Bollinger

Dimitri said:
Rhino sez:
...




Array.length is fetching an int (class member) via a pointer.

Well, no, technically it isn't. Arrays classes don't have fields named
"length". Array.length is neither more nor less than the Java syntax
for obtaining the length of an array. There is a specific bytecode for
the corresponding VM-level operation.
It
shouldn't be any slower than checking an int (local variable) on stack.

And that's simply not supportable, even if you were right about length
being a field. It is fully an implementation detail, and I would in
fact *expect* a slight difference in the speed of the two operations.
Moreover, the local variable case is easier pickings for the JIT's
optimizer.
List.size() returns an int member of List.

No. Sorry, it's just not true at that level of generality. The
concrete implementations provided in the platform library may do this,
but many other implementations do not necessarily do the same. I have
one or two special-purpose implementations of my own that specifically
do not do so.
There is overhead of the
function call, but it shouldn't be significant and could be optimized
away anyway. Personally, I'd expect them to special-case list.size()
in the optimizer (if general algorithm doesn't optimize it away on its
own) because it's used in loop constructs so much.

It's not so easy. Runtime type information is required in the general
case, even if the declared type of the List is, for example, LinkedList.
This is because the actual object on which size() is to be invoked may
be a subclass of LinkedList that has overridden the size() method.
That said, the only way to find out what's really going on is to read
the bytecode and to benchmark -- for each JVM, JIT, etc.

And of the two, rely most on the latter. Bytecode tells you what a
particular compiler's version of the class directs the VM to do.
Performance testing tells you how that translates into execution time.
 
D

Dimitri Maziuk

John C. Bollinger sez:
Dimitri Maziuk wrote: ....

Well, no, technically it isn't. Arrays classes don't have fields named
"length". Array.length is neither more nor less than the Java syntax
for obtaining the length of an array. There is a specific bytecode for
the corresponding VM-level operation.

Heh. I learned something today.
Not having the source for "native" stuff in src.zip is a bitch.
It's not so easy. Runtime type information is required in the general
case, even if the declared type of the List is, for example, LinkedList.
This is because the actual object on which size() is to be invoked may
be a subclass of LinkedList that has overridden the size() method.

True. I was thinking of standard ArrayList, of course, and ArrayList
is a very special case of List.
And of the two, rely most on the latter. Bytecode tells you what a
particular compiler's version of the class directs the VM to do.
Performance testing tells you how that translates into execution time.

Well, bytecode would've told me that Array.length is not fetching a
member -- had I bothered to look. :)

Dima
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
474,430
Messages
2,571,676
Members
48,796
Latest member
Greg L.

Latest Threads

Top