method calls faster than a loop?


tom fredriksen

Hi

I did a test comparing a task that makes lots of method calls against the same
operation done inline in a loop, just to get a sense of the speed cost of method
calls. What I found was a bit strange: one would think the loop would be
the fastest solution, but in my example it's actually the method call
that is faster. Does anybody have any idea why that is, or whether there is
something wrong with the code?

Are there any factors, other than the speed of the jump and the time it
takes to push parameters onto the stack, that are relevant to the test?

/tom


import java.util.*;

public class BranchTest
{
    private Random rand = new Random();

    public int add(int sum)
    {
        return sum + rand.nextInt(10);
    }

    public static void main(String[] args)
    {
        // Test 1: accumulate via a method call on every iteration.
        {
            BranchTest b = new BranchTest();

            int count = Integer.parseInt(args[0]);
            int total = 0;
            long startTime = System.currentTimeMillis();

            for(int c = 0; c < count; c++) {
                total = b.add(total);
            }
            long endTime = System.currentTimeMillis();

            System.out.println("Method call elapsed time: " + (endTime - startTime));
        }

        // Test 2: the same work done inline in the loop body.
        {
            Random rand = new Random();
            int count = Integer.parseInt(args[0]);
            int total = 0;
            long startTime = System.currentTimeMillis();

            for(int c = 0; c < count; c++) {
                total += rand.nextInt(10);
            }
            long endTime = System.currentTimeMillis();

            System.out.println("Inline loop elapsed time: " + (endTime - startTime));
        }
    }
}
 

Chris Uppal

tom said:
I did a test comparing a task that makes lots of method calls against the same
operation done inline in a loop, just to get a sense of the speed cost of method
calls. What I found was a bit strange.

There's no point at all in doing this kind of micro-benchmarking unless you
also take account of the behaviour of the JITer (and also the optimiser, which
can just remove whole chunks of code).

Say you want to compare code snippet A with snippet B. Pull both snippets out
into methods runA() and runB(). In each method include enough loops that the
time taken is significant in relation to the measuring "noise" (5 or 10
seconds should be enough). Then put /another/ loop around the calls to runA()
and runB(), which calls each alternately, timing both, and which loops at least
half a dozen times, printing out the time taken for each call.

You will almost certainly see that the first few calls are slower than the
later ones. You will also get an informal impression of how much jitter there
is in the results. If you want to be formal you can use statistical techniques to
analyse the results, but even without that, a number is basically meaningless
unless you have a sense of how much error there is likely to be in it.
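
A minimal sketch of that harness (the class and method names and the loop
counts are illustrative; returning and printing the totals keeps the optimiser
from removing the loops as dead code):

import java.util.Random;

public class CallVsLoopBench
{
    private static final int ROUNDS = 6;          // alternate runA/runB this many times
    private static final int INNER = 100000000;   // tune so each run takes a few seconds

    private Random rand = new Random();

    public int add(int sum)
    {
        return sum + rand.nextInt(10);
    }

    public int runA()   // snippet A: one method call per iteration
    {
        int total = 0;
        for(int c = 0; c < INNER; c++) {
            total = add(total);
        }
        return total;
    }

    public int runB()   // snippet B: the same work inline
    {
        int total = 0;
        for(int c = 0; c < INNER; c++) {
            total += rand.nextInt(10);
        }
        return total;
    }

    public static void main(String[] args)
    {
        CallVsLoopBench bench = new CallVsLoopBench();

        for(int round = 0; round < ROUNDS; round++) {   // early rounds show the JIT warming up
            long t0 = System.currentTimeMillis();
            int a = bench.runA();
            long t1 = System.currentTimeMillis();
            int b = bench.runB();
            long t2 = System.currentTimeMillis();

            System.out.println("round " + round
                + ": runA " + (t1 - t0) + " ms, runB " + (t2 - t1) + " ms"
                + " (totals " + a + ", " + b + ")");
        }
    }
}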

BTW, the use of explicit methods which you call (rather than just using a
doubly-nested loop) is /not/ optional if you want your results to be
indicative. It's also always worth trying with the -server option to get a
better view of how much difference the JITer is making, and of how JITable the
code is.

-- chris
 

tom fredriksen

Chris said:
There's no point at all in doing this kind of micro-benchmarking unless you
also take account of the behaviour of the JITer (and also the optimiser which
can just remove whole chunks of code).

How am I not doing that here?
If by that you mean that the second test should also be an instance
method, then OK, but will it matter that much? I have my doubts.
Say you want to compare code snippet A with snippet B. Pull both snippets out
into methods runA() and runB(). In each method include enough loops that the
time taken is significant in relation to the measuring "noise" (5 or 10
seconds should be enough).

That would be what I am saying in the above comment, then...

Then put /another/ loop around the calls to runA()
and runB(), which calls each alternately, timing both, and which loops at least
half a dozen times, printing out the time taken for each call.

You will almost certainly see that the first few calls are slower than the
later ones. You will also get an informal impression of how much jitter there
is the results. If you want to be formal you can use statistical techniques to
analyse the results, but even without that a number is basically meaningless
unless you have a sense of how much error there is likely to be in it.

I don't see how a doubly-nested loop is going to show anything; please
explain.

BTW, the use of explicit methods which you call (rather than just using a
doubly-nested loop) is /not/ optional if you want your results to be
indicative.

??? Same as above.

It's also always worth trying with the -server option to get a
better view of how much difference the JITer is making, and of how JITable the
code is.

That's fair enough, since there is quite a difference in their speed and
functionality, as Sun mostly focuses on server-side development.


/tom
 

Timo Stamm

tom said:
I did a test comparing a task that makes lots of method calls against the same
operation done inline in a loop, just to get a sense of the speed cost of method
calls.


What's the use?

If you find that method calls are slow, will you stop using methods?

It's silly to optimize without any need for optimization. It only hurts
readability and probably makes your code run slower in the next VM.


Timo
 

tom fredriksen

Timo said:
What's the use?

If you find that method calls are slow, will you stop using methods?

It's silly to optimize without any need for optimization. It only hurts
readability and probably makes your code run slower in the next VM.

:) It is, as I said, just to get a sense of the speed of it compared to,
for example, for loops. My main reason for trying this is a typical claim:

"C++ is slower than C because of the OO fashion of method calls all
over the place, even for tiny operations." (paraphrased)

Jumps take time, even on machines with branch prediction such as x86.
So I wanted to make a little test to see if there was any significant
difference, and the test showed the complete opposite of my assumption.
So this was a small test of the speed cost of one consequence of the OO
paradigm. Since I am interested in understanding programming languages
and their consequences, this is of some interest to me. But if you have
a better suggestion of how to test the claim, please share.

/tom
 

Timo Stamm

tom said:
:) It is, as I said, just to get a sense of the speed of it compared to,
for example, for loops. My main reason for trying this is a typical claim:

"C++ is slower than C because of the OO fashion of method calls all
over the place, even for tiny operations." (paraphrased)

I understand.

I think it should be very easy to "prove" the superiority of either
technology/language/whatever with properly adjusted benchmarks. The
industry does it all the time.

These microbenchmarks never tell you anything about real-world performance
(except when you really compare apples to oranges, say, Ruby and
assembler). They are only used to hype or slate something.

The best argument against claims like the above shouldn't be about
performance. It should rather be: the OO language allows me to create
more secure, easier-to-maintain, more reusable software. It is usually
much cheaper to stick more RAM into a machine, or even buy a faster one,
than to spend countless hours hand-optimizing code and maintaining it.

Jumps take time, even on machines with branch prediction such as x86.
So I wanted to make a little test to see if there was any significant
difference, and the test showed the complete opposite of my assumption.

There are so many variables. Without profound knowledge about compilers
in general, and about the JVM (including the JIT) in particular, you can
only guess.

Anyway, the concept of virtual machines is the way to go. Today,
compilers produce much more efficient code for SMP than a human could
write in assembler. The same will be true for statically versus just-in-time
compiled code (in some cases, it already is).

See: http://portal.acm.org/citation.cfm?id=345105.352548


Timo
 

tom fredriksen

Timo said:
The best argument against claims like the above shouldn't be about
performance. It should rather be: the OO language allows me to create
more secure, easier-to-maintain, more reusable software. It is usually
much cheaper to stick more RAM into a machine, or even buy a faster one,
than to spend countless hours hand-optimizing code and maintaining it.

In one respect you are entirely correct; in another you are wrong:
performance does matter. I am of the opinion that if you can make a
program use fewer resources and work faster, then your employer or client
will save money and time. And if you add that up for an entire company,
region, or country, you save a lot of money that could be spent
otherwise. (But that's just me talking, coming from 5 years in C exile,
but also frustrated by the meaningless waste of resources and the
corresponding attitude.) One thing that really annoys me is how resource-hungry
Java is, and I would like to do my bit to reduce the requirements.

But in any case, the issue here was more about comparing things for
background info.

/tom
 

Oliver Wong

tom fredriksen said:
Jumps take time, even on machines with branch prediction such as x86.
So I wanted to make a little test to see if there was any significant
difference, and the test showed the complete opposite of my assumption.

You seem to be implying that there is a "jump" used in method calls, but
there is no "jump" used in for-loops.

- Oliver
 

Timo Stamm

tom said:
In one respect you are entirely correct; in another you are wrong:
performance does matter.

I didn't say that. Of course performance matters.

You can easily find bottlenecks using a profiler. Sometimes it is
possible to make the program run 1000 times faster at such a point, with a
minor change in design that only takes a few hours.

To optimize the program down to the very last processor cycle probably takes
weeks.
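
(For example, with the HPROF agent that ships with Sun's JDK; a sampling run
like the following writes a ranking of the hottest methods on exit. The
program name and count here are just the ones from this thread:)

java -Xrunhprof:cpu=samples,depth=10 BranchTest 100000000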


"There is no doubt that the grail of efficiency leads to abuse.
Programmers waste enormous amounts of time thinking about, or worrying
about, the speed of noncritical parts of their programs, and these
attempts at efficiency actually have a strong negative impact when
debugging and maintenance are considered. We should forget about small
efficiencies, say about 97% of the time: premature optimization is the
root of all evil."

Donald E. Knuth, "Structured Programming with go to Statements,"
Computing Surveys, Vol. 6, December 1974, p. 268.


I am of the opinion that if you can make a
program use fewer resources and work faster, then your employer or client
will save money and time. And if you add that up for an entire company,
region, or country, you save a lot of money that could be spent
otherwise. (But that's just me talking, coming from 5 years in C exile,
but also frustrated by the meaningless waste of resources and the
corresponding attitude.) One thing that really annoys me is how resource-hungry
Java is, and I would like to do my bit to reduce the requirements.

But in any case, the issue here was more about comparing things for
background info.


"A good programmer will [...] be wise to look carefully at the critical
code; but only after that code has been identified. It is often a
mistake to make a priori judgments about what parts of a program are
really critical, since the universal experience of programmers who have
been using measurement tools has been that their intuitive guesses fail."

Same source as above.


Timo
 

tom fredriksen

Timo said:
tom said:
In one respect you are entirely correct, in another you are wrong;
performance does matter.

I didn't say that. Of course performance matters.

You can easily find bottlenecks using a profiler. Sometimes it is
possible to make the program run 1000 times faster at such a point, with a
minor change in design that only takes a few hours.

To optimize the program down to the very last processor cycle probably takes
weeks.


"There is no doubt that the grail of efficiency leads to abuse.
Programmers waste enormous amounts of time thinking about, or worrying
about, the speed of noncritical parts of their programs, and these
attempts at efficiency actually have a strong negative impact when
debugging and maintenance are considered. We should forget about small
efficiencies, say about 97% of the time: premature optimization is the
root of all evil."

"A good programmer will [...] be wise to look carefully at the critical
code; but only after that code has been identified. It is often a
mistake to make a priori judgments about what parts of a program are
really critical, since the universal experience of programmers who have
been using measurement tools has been that their intuitive guesses fail."

Donald E. Knuth, "Structured Programming with go to Statements,"
Computing Surveys, Vol. 6, December 1974, p. 268.

I am fully aware of the consequences of early optimisation, and that the
algorithm is one of the most important issues when programming. But if one
considers only that, then one can make the statement that all languages
are equally efficient. And that is certainly not true; that is why
operating systems are programmed in C. If all code can be automatically
shifted (through the compiler) to a different compiling paradigm which
improves on the machine efficiency, then all the better.

In any case, those statements were made in 1974, when the community used
languages such as C, Fortran, Algol 68, Cobol, etc. The first languages
used, other than machine code, were run in an interpreter only because
the hardware did not have support for floating-point operations, so it
had to be simulated. As soon as Fortran appeared, the interpreted-language
idea almost disappeared. So the statements must be somewhat modified to
be entirely true today, not forgetting that the languages used then did
not have all the language constructs we have today, or all the
processor instructions we have today.

/tom
 

Roedy Green

You seem to be implying that there is a "jump" used in method calls, but
there is no "jump" used in for-loops.

At the machine-code level, of course there is, unless the method call is
inlined. Even an IF contains a hidden jump.

A call is a push of the execution pointer onto the stack, followed by a jump.

A ret is a pop of the execution pointer, continuing at that point:
effectively an indirect jump.

A loop compares i with n and, if it is too big, jumps out past the end of the
loop. At the end of the loop is a jump back to the top.
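
The same structure is visible one level up, in the JVM bytecode. A simplified,
javap -c style sketch of the two forms from the original post (the layout is
roughly what javac emits; labels, offsets and variable slots are illustrative):

// for(int c = 0; c < count; c++) { total += rand.nextInt(10); }
top:  iload   c
      iload   count
      if_icmpge done        // conditional jump out of the loop
      ...loop body...
      iinc    c, 1
      goto    top           // unconditional jump back to the test
done: ...

// total = b.add(total);
      aload   b             // push the receiver
      iload   total         // push the argument
      invokevirtual add     // the call: a jump in, plus an implicit jump back on return
      istore  total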
 

tom fredriksen

Oliver said:
You seem to be implying that there is a "jump" used in method calls,
but there is no "jump" used in for-loops.

Yes and yes; what is your objection?

/tom
 

Oliver Wong

tom fredriksen said:
I am fully aware of the consequences of early optimisation, and that the
algorithm is one of the most important issues when programming. But if one
considers only that, then one can make the statement that all languages are
equally efficient.

Assuming that by "efficient" you mean runtime performance, then
technically all languages ARE equally efficient, in the sense that it isn't
meaningful to talk about the runtime performance of a programming language
at all. In other words, the runtime performance of Java is just as "not
meaningful" as the runtime performance of C.

If you're willing to discuss only asymptotic runtime performance, then
you can talk about the runtime performance of programs; but if you want
something more precise than that, you can't even talk about programs: you'd
have to talk about the sequence of operations performed by the CPU, and about
the surrounding architecture (e.g. how much cache there is, how fast the RAM
is compared to the cache, how much RAM there is, how fast the disk is compared
to RAM, etc.)

There isn't a one-to-one mapping between programs written in a specific
language and operations performed by the CPU. More specifically, there's
theoretically no reason why you couldn't write a compiler that takes C
source code and emits Java bytecode, or which takes Java source code and
emits native Win32/x86 executables.

Furthermore, if you take two different Java compilers, there's no
guarantee you'll get the same bytecode from the same Java source code. If
you take two different JVMs, there's no guarantee that you'll get the same
sequence of operations from the same bytecode.

In other words, it doesn't really make sense to say "Method calls are
faster/slower than loops", unless you add a heck of a lot of qualifications
to that statement (e.g. "on this particular compiler, with this particular
JVM, on this particular CPU, with this much RAM, on this OS, and these were
the background tasks that were running, and here's my swap disk
configuration, and here's how long I've let the JVM 'warm up' to perform
optimizations, and here are the flags I've passed in, etc.")

And that is certainly not true; that is why operating systems are
programmed in C.

I'm not sure this is "why" operating systems are programmed in C. I
figure the reason has more to do with the relatively small number of
Java-to-machine-code compilers compared to the relatively large number of
C-to-machine-code compilers.
In any case, those statements were made in 1974, when the community used
languages such as C, Fortran, Algol 68, Cobol, etc. The first languages
used, other than machine code, were run in an interpreter only because
the hardware did not have support for floating-point operations, so it had
to be simulated. As soon as Fortran appeared, the interpreted-language idea
almost disappeared. So the statements must be somewhat modified to be
entirely true today, not forgetting that the languages used then did not
have all the language constructs we have today, or all the processor
instructions we have today.

I consider the original statement to still be generally true (except that I
wouldn't have used the term "the root of all evil"; rather, I would have used
"bad"). What modifications do you propose need to be made to the statements?

- Oliver
 

Oliver Wong

Oliver Wong said:
As Roedy pointed out, loops use jump instructions as well.

I forgot to mention this second objection: the method call could be
inlined, so that the reverse of the above assertion is true. That is, "a
method call does NOT use a jump instruction (since the method was
inlined), but the for-loop DOES use a jump instruction (since the loop was
not unrolled)".

Note also that I said "*COULD* be true". As I mentioned in another
branch of this thread, it's generally not meaningful to speak about the
runtime performance of a program without specifying things like what
optimizations the compiler will or will not carry out.
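
One way to move from guessing to checking is to ask the VM what it actually
compiled and inlined. On Sun's HotSpot, something along these lines works
(PrintInlining is a diagnostic flag, so availability varies by JVM version):

java -XX:+PrintCompilation BranchTest 100000000
java -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining BranchTest 100000000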

- Oliver
 

tom fredriksen

Oliver said:
Assuming that by "efficient" you mean runtime performance, then
technically all languages ARE equally efficient, in the sense that it
isn't meaningful to talk about the runtime performance of a programming
language at all.

That's where you are wrong: they are then not equally efficient at the
machine-code level, or CPU level, or whatever you want to call it.
There isn't a one-to-one mapping between programs written in a
specific language and operations performed by the CPU. More
specifically, there's theoretically no reason why you couldn't write a
compiler that takes C source code and emits Java bytecode, or which
takes Java source code and emits native Win32/x86 executables.

Furthermore, if you take two different Java compilers, there's no
guarantee you'll get the same bytecode from the same Java source code.
If you take two different JVMs, there's no guarantee that you'll get the
same sequence of operations from the same bytecode.

In other words, it doesn't really make sense to say "Method calls are
faster/slower than loops", unless you add a heck of a lot of
qualifications to that statement (e.g. "on this particular compiler,
with this particular JVM, on this particular CPU, with this much RAM, on
this OS, and these were the background tasks that were running, and
here's my swap disk configuration, and here's how long I've let the JVM
'warm up' to perform optimizations, and here are the flags I've passed
in, etc.")

All three of the above statements are valid to a certain extent, but they
still do not disprove my statement.

Argh... it annoys me that a lot of people programming higher-level
abstraction languages, compared to C, don't realise that the language is
less runtime-efficient because of all the implications of the higher-level
abstractions, which are inherent in the language. Java and other
interpreted languages are good languages, and I love them as well, but
that does not mean they do not have their problems, or that they do not have
their areas of use and disuse.
I'm not sure this is "why" operating systems are programmed in C. I
figure the reason has more to do with the relatively small number of
Java-to-machine-code compilers compared to the relatively large number
of C-to-machine-code compilers.

Of course, if there were another language which could consistently show
better performance than C, with higher abstraction levels, so that the OS
could have more advanced features or fewer errors and security problems,
then that would be the reason. Unfortunately it is not so.
I consider the original statement to still be generally true (except
I wouldn't have used the term "the root of all evil"; rather I would
have used "bad"). What modifications do you propose need to be made to
the statements?

Generally I agree as well. But 30-40 more years of expertise in how to
create compilers and processors have made an impact on what makes a
program efficient. That's mostly the area where the modifications would be.
 

tom fredriksen

Oliver said:
I forgot to mention this second objection: the method call could be
inlined, so that the reverse of the above assertion is true. That is, "a
method call does NOT use a jump instruction (since the method was
inlined), but the for-loop DOES use a jump instruction (since the loop
was not unrolled)".

Note also that I said "*COULD* be true". As I mentioned in another
branch of this thread, it's generally not meaningful to speak about the
runtime performance of a program without specifying things like what
optimizations the compiler will or will not carry out.

Yes, but it is a local jump (less than 256 bytes away), so it does not affect
runtime efficiency beyond the clock cycles it takes to execute the
jump instruction, which is 1 or 2 cycles.

A method jump can cause the processor to have to fetch code from outside
the level 1 cache, which holds the closest 128 or 256 bytes of code. It
could even need to go outside the level 2 cache. If this starts to happen,
then a lot of time is spent setting up the new execution environment.

So any call that is outside the local jump range does take longer,
and if that happens, even for methods with tiny operations, it will
affect performance. But that was what I wanted to test: how much it
actually affects things.
/tom
 

Roedy Green

Yes, but it is a local jump (less than 256 bytes away), so it does not affect
runtime efficiency beyond the clock cycles it takes to execute the
jump instruction, which is 1 or 2 cycles.

It depends on your processor. Jumps on some, like the old 8086, throw
the lookahead out of kilter. The PowerPC absorbs them, so they are
almost free. But even at best they are going to waste some of your
lookahead cache.
 

Oliver Wong

tom fredriksen said:
Yes, but it is a local jump (less than 256 bytes away), so it does not affect
runtime efficiency beyond the clock cycles it takes to execute the
jump instruction, which is 1 or 2 cycles.

A method jump can cause the processor to have to fetch code from outside the
level 1 cache, which holds the closest 128 or 256 bytes of code. It could
even need to go outside the level 2 cache. If this starts to happen, then
a lot of time is spent setting up the new execution environment.

So any call that is outside the local jump range does take longer,
and if that happens, even for methods with tiny operations, it will affect
performance. But that was what I wanted to test: how much it actually
affects things.

I'm saying there might not be a method jump at all, due to method
inlining. Certainly the benchmark code you wrote seems simple enough that a
compiler would inline it:

public int add(int sum) {
    return sum + rand.nextInt(10);
}

Now, if you did something like create a new Throwable, so that the shape
of the stack mattered, you might give the compiler a harder time with the
inlining; but a sufficiently intelligent compiler might find a way
around that too.
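
A hypothetical variant along those lines (purely illustrative; HotSpot records
enough metadata to reconstruct inlined frames, so even this may not actually
prevent inlining):

public int add(int sum) {
    new Throwable().getStackTrace();   // makes the shape of the stack observable
    return sum + rand.nextInt(10);
}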

By trying to "force" the existence of a so-called "method jump", you're
trying to outsmart the compiler, which is a losing game in the long run,
because in 100 years, you'll probably be dead, but the body of knowledge
referred to as "compiler optimization techniques" will still exist, and
still be improved upon, and your benchmark results will change anyway.

- Oliver
 
