method calls faster than a loop?


Oliver Wong

tom fredriksen said:
That's where you are wrong; they are then not equally efficient at a
machine-code level, or CPU level, or whatever you want to call it.

When you start talking about machine code, you're no longer talking
about just the programming language, but also the compiler and Virtual
Machines (if applicable) as well. That was my main point.
All three of the above statements are valid to a certain extent, but they
still do not disprove my statement.

Argh.. It annoys me that a lot of people programming in higher-level
abstraction languages, compared to C, don't realise that the language is
less runtime-efficient because of all the implications of the higher-level
abstractions, which are inherent in the language. Java and other
interpreted languages are good languages, and I love them as well, but
that does not mean they do not have their problems, or that they do not
have their areas of use and disuse.

This is not necessarily so. A program written in Java, when compared to
a similar program written in C, may perform slower with one particular
compiler/JVM/OS/architecture/machine configuration, but may perform better
in a different one.

Consider, for example, the MMX, SSE and other "specialized" instruction
sets. If you were to code in a low level language, you might write code that
says "Okay, multiply the value in this register by the cosine of the value
in that register; next take the sine of this value, add it to that value in
that register;" and so on, and have 20 operations.

In a high level language, you might have something like
"polygon.rotate(27)".

When the compiler sees the lower level instructions, it can try to do
some sort of peephole optimization, but the compiler doesn't have a "big
picture" idea of what you're trying to accomplish.

With the high level code, the compiler could say "Oh, I see what he's
trying to do. He's trying to rotate a polygon about the origin. Well, using
SSE, I can do that in a single clock cycle." and so the compiler puts in the
appropriate instruction.

In other words, your general perception that Java is "slower" than C
probably has very little to do with the fact that Java is a higher-level
language than C. I claim that higher-level languages are easier to optimize,
because they provide the optimizer with more metadata. Of course, in an
ideal world, our optimizers would produce the best possible code, regardless
of whether extra "hints" were provided or not.

The reason you might have this perception is that you're probably
measuring a program being compiled to bytecode and run on a virtual machine
on top of a Win32 platform (for example) against a program compiled to
native Win32. So you're adding an extra layer of interpretation.

Again, I stress that this is not simply an issue of programming
language, but the entire stack of language, compiler, virtual machine, OS,
architecture, machine configuration.
Of course, if there were another language which could consistently show
better performance than C at higher abstraction levels, so that the OS
could have more advanced features or fewer errors and security problems,
then that would be the reason. Unfortunately it is not so.

I'm not sure I understand what you're saying here. Are you disputing my
above claim, or just making an unrelated side remark, or what?
Generally I agree as well. But 30-40 years more of expertise in how to
create compilers and processors have made an impact on what makes an
efficient program. That's mostly the area where the modifications would be.

Yes, compilers have improved over time. But that doesn't mean that
programmers are now better at guessing where the bottleneck in their program
lies. I think Knuth is still correct, even today, when he asserted "It is
often a mistake to make a priori judgments about what parts of a program are
really critical, since the universal experience of programmers who have been
using measurement tools has been that their intuitive guesses fail."

- Oliver
 

tom fredriksen

Oliver said:
When you start talking about machine code, you're no longer talking
about just the programming language, but also the compiler and Virtual
Machines (if applicable) as well. That was my main point.

Yes, you always have to take into consideration everything that makes up
a language to make it able to run on a computer, so languages are therefore
still not equally efficient. Any language has to be compiled/interpreted to
machine code to be able to run; that's where the runtime efficiency of
the language is measured.
This is not necessarily so. A program written in Java, when compared
to a similar program written in C, may perform slower with one
particular compiler/JVM/OS/architecture/machine configuration, but may
perform better in a different one.


See above statement.
Again, I stress that this is not simply an issue of programming
language, but the entire stack of language, compiler, virtual machine,
OS, architecture, machine configuration.

Except that the OS, architecture and machine configuration for one
single machine do not change between two language tests, as in mine, so
they are irrelevant. But between different hardware architectures it does
make a difference, that is true. That's not the case in my test, and I
don't have access to other architectures, so I can't make that estimate
right now; but if I did, I would of course try it there as well to see.
I'm not sure I understand what you're saying here. Are you disputing
my above claim, or just making an unrelated side remark, or what?

I am saying that is why OSes are still programmed in C.
Yes, compilers have improved over time.
But that doesn't mean that
programmers are now better at guessing where the bottleneck in their
program lies.

I have never said that, and I am still not disputing the essence of
Knuth's statement.
 

Timo Stamm

tom said:
In any case those statements were made in 1974, when the community used
languages such as C, Fortran, Algol 68, Cobol etc. The first languages
used, other than machine code, were run in an interpreter only because
the hardware did not have support for floating-point operations, so it
had to be simulated. As soon as Fortran appeared, the interpreted-language
idea almost disappeared.

AFAIK, Smalltalk was already compiled into byte-code and translated into
machine code at runtime.

So the statements must be somewhat modified to
be entirely true today

Certainly, but regarding all the languages that are compiled at runtime,
I think it might have become even truer:

The languages are specified, but the compilers differ. You get
different performance characteristics with different VMs, and you even
get different performance characteristics simply by starting Sun's VM in
server mode.


Timo
 

Roedy Green

There isn't a one-to-one mapping between programs written in a specific
language and operations performed by the CPU. More specifically, there's
theoretically no reason why you couldn't write a compiler that takes C
source code and emits Java bytecode, or which takes Java source code and
emits native Win32/x86 executables.

No bloody way. C programs can do all sorts of things no Java program,
or byte code program can, e.g. call native platform calls, poke ram,
poke i/o ports, diddle with regen buffers ...

From an optimising compiler's point of view, being able to trust your
program not to do anything funny really helps you optimise.
The degree of "cheating" permitted comes with the high-level language.
C is a high-cheat language; Java a low-cheat one. The more cheating, the
more tricky, platform-specific tricks you can pull off, but the more the
possibility of such cheating hamstrings the optimiser.

There is also the matter of trust. A language that completely trusts
the programmer to know what he is doing and inserts no safety nets,
such as FORTH, can generate faster code than one that guarantees to
detect and report certain classes of errors, such as Java.

Different languages have different impedance matching with the
hardware. Unfortunately, old-fashioned languages are a better fit for
old-fashioned computer architectures. That tends to keep both locked
in place, stifling innovation. Internally, we are still pretty much
thinking in Fortran.
 

Stefan Ram

Roedy Green said:
No (...) way. C programs can do all sorts of things no Java program,
or byte code program can, e.g. call native platform calls, poke ram,
poke i/o ports, diddle with regen buffers ...

A portable C program with behavior specified by
ISO/IEC 9899:1999 (E) cannot do any of these.

When one is doing such things, one is using extensions
to the semantics of ISO/IEC 9899:1999 (E) and possibly
platform specific libraries.

If one is willing to use extensions to the semantics
of the JLS and platform-specific libraries, then the
same actions can be done by a »Java« program.
 

Patricia Shanahan

tom said:
Hi

I did a test to compare a task making lots of method calls to the same
operation in a loop, just to get a sense of the speed cost of method
calls. What I found was a bit strange: one would think the loop would be
the fastest solution, but in my example it's actually the method call
that is faster. Does anybody have any ideas why that is? Or whether there
is something wrong with the code?

Are there any other factors, besides the speed of the jump and the time it
takes to allocate parameters on the stack, that are relevant to the test?

The cost of rand.nextInt(10) seems relevant.

There is the call to rand.nextInt(10), then a call to the private method
next(), which itself calls each of AtomicLong's get() and
compareAndSet().

[Although some of these calls happen inside loops, they are single
iteration loops in the normal case.]

Thus, assuming no inlining, you are comparing a loop containing 4 calls
to a loop containing 5 calls, probably not the most sensitive way of
measuring call overhead.

Patricia
 

Chris Uppal

tom said:
How am I not doing that here?

You are not doing /any/ of the things I recommended.

The JIT can (depending on implementation) have difficulty replacing code that
is actually executing. So /always/ put the code you want to measure into a
separate method which you call. That way the JIT will find it easier and/or
possible to install a new version of the method /after/ it has finished
executing for the Nth time (for some N). Thereafter you will be running JITed
code, which is normally what you want to measure.

The JIT takes time to realise that something should be optimised, and then
takes longer still actually to do the optimisation. So /always/ run the target
methods several times, printing out the time taken for each run.

If you don't do these things then it is almost certain that you are not
measuring anything meaningful.

-- chris
 

tom fredriksen

Chris said:
You are not doing /any/ of the things I recommended.

Is this what you mean? At least now I am getting the correct proportions.
Code at end of post.

testCall
Elapsed time: 101608
Elapsed time: 101079
Elapsed time: 101222
Elapsed time: 101418
Elapsed time: 101113
testLoop
Elapsed time: 97781
Elapsed time: 96484
Elapsed time: 95248
Elapsed time: 95238
Elapsed time: 95211
Mixed (call/loop/call/loop...)
Elapsed time: 100391
Elapsed time: 95168
Elapsed time: 102205
Elapsed time: 95159
Elapsed time: 100404
Elapsed time: 95208
Elapsed time: 101018
Elapsed time: 95244
Elapsed time: 101798
Elapsed time: 95341


To me the numbers show loops are faster. It has 10 billion
calls/iterations per test; a program running for, say, hours or days will
therefore be slower than an iterative solution because of the
accumulated numbers. So holding this up to the OO paradigm vs the
procedural paradigm (C vs Java/C++), one can assume that OO seems to be
slightly slower because of the number of method calls inherently involved
in a program's execution. But this depends on whether you can accept that
small penalty; you could of course just buy a computer with two
cores/CPUs instead, but that, to me, is not what the discussion is
about at the moment.

Of course there are many issues to consider as to how correct the
measurements are, and it should also be tested in C++ and in C as a
comparison, but the numbers are interesting background information to
have for the future.




import java.util.*;


public class BranchTest
{
    private Random rand = new Random();
    private int count = 1000000000;

    public int add(int sum)
    {
        return (sum + rand.nextInt(10));
    }

    public void testCall()
    {
        BranchTest b = new BranchTest();
        int total = 0;

        long startTime = System.currentTimeMillis();
        for (int c = 0; c < count; c++) {
            total = b.add(total);
        }
        long endTime = System.currentTimeMillis();

        System.out.println("Elapsed time: " + (endTime - startTime));
    }

    public void testLoop()
    {
        Random rand = new Random();
        int total = 0;

        long startTime = System.currentTimeMillis();
        for (int c = 0; c < count; c++) {
            total += rand.nextInt(10);
        }
        long endTime = System.currentTimeMillis();

        System.out.println("Elapsed time: " + (endTime - startTime));
    }

    public static void main(String args[])
    {
        BranchTest bt = new BranchTest();

        System.out.println("testCall");
        for (int c = 0; c < 5; c++)
            bt.testCall();

        System.out.println("testLoop");
        for (int c = 0; c < 5; c++)
            bt.testLoop();

        System.out.println("Mixed (call/loop/call/loop...)");
        for (int c = 0; c < 5; c++) {
            bt.testCall();
            bt.testLoop();
        }
    }
}
 

Oliver Wong

tom fredriksen said:
Yes, you always have to take into consideration everything that makes up
a language to make it able to run on a computer, so languages are therefore
still not equally efficient. Any language has to be compiled/interpreted to
machine code to be able to run; that's where the runtime efficiency of the
language is measured.

I disagree that a compiler/interpreter is "part of" the language. To me,
that's like saying "The English language specifies as part of its standard
that it must be written using pencil and paper, therefore anything you write
using a pen or computer, or speak verbally, isn't officially considered
English", for example.

[snipped parts for which there are no disagreements]
Except that the OS, architecture and machine configuration for one single
machine do not change between two language tests, as in mine, so they are
irrelevant. But between different hardware architectures it does make a
difference, that is true. That's not the case in my test, and I don't have
access to other architectures, so I can't make that estimate right now; but
if I did, I would of course try it there as well to see.

Ok, let me try taking a step back for a bit. From my perspective, the
summary of this whole thread is this:

Q: Method calls should be slower than for-loops. But my benchmarks show
otherwise. What's wrong?
A: There are two things wrong. (1) The belief that method calls should be
slower than for-loops. (2) The belief that your benchmark actually tests
this.
I am saying that is why OSes are still programmed in C.

You used the word "that" a lot, so I'm going to start giving them subscripts
for clarification. Here's your original claim (with subscripts inserted):

<quote>
I am fully aware of the consequences of early optimisations and that the
algorithms are one of the most important issues when programming. If one
only considers that_0 then one can make the statement that all languages
are equally efficient. And that_1 is certainly not true, that_2 is why
operating systems are programmed in C.
</quote>

When you say "And that_2 is why operating systems are programmed in C",
I thought the antecedent of "that_2" is "And that_1 is certainly not true".
The antecedent of "that_1" is "all languages are equally efficient".

So to paraphrase, it sounds like you were saying "The reason operating
systems are programmed in C is that it is not true that all languages are
equally efficient." There are a couple of problems with this claim. One is
that the phrasing "*THE* reason" implies that there's only one reason, and
the phrasing "operating systems are programmed in C" implies that all
operating systems are written in C. I hope you'll agree that both are untrue.

Even if we "weaken" the statement to "The main reason most operating
systems are programmed in C is that C is inherently faster than all other
languages", I still disagree with it. If it were all about performance, and
if certain languages were "inherently faster" than others, everyone would
just write the whole OS in assembly (assembly being the language oft
believed to be the most "inherently fast").

I claim that the main reason C is a popular language for writing
major components (e.g. the kernel) of an OS is the availability of tools
for C, and the fact that C is relatively well suited to the job. Compilers
for many languages, for example, require an additional runtime to be
present (e.g. Java, LISP, Forth, C++, and others), which simply isn't
available to you when you write a kernel.

Of course, you could always write your own compiler for, say, LISP, that
DOESN'T assume the presence of a runtime, but no one (AFAIK) has gotten
around to doing that yet. Someone HAS written a minimal JVM in assembler,
and used that to write an OS in Java, to show that this is possible in
practice, and not merely in theory. See http://en.wikipedia.org/wiki/JavaOS

[snipped parts for which there are no disagreements]

- Oliver
 

Roedy Green

The JIT takes time to realise that something should be optimised, and then
takes longer still actually to do the optimisation. So /always/ run the target
methods several times, printing out the time taken for each run.

You want to find out how long it takes to notice the optimisation is
needed, then do a run that discards the timings prior to that point. I
call these cold and warm benchmarks.

Both are interesting. Sometimes you will not run the program long
enough for the optimisation to kick in, and the slow speed is the one
you really want.

see http://mindprod.com/jgloss/benchmark.html
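The cold/warm split can be sketched as follows (the class name, workload, and warm-up count are illustrative, invented for this example, not taken from the page above): time the very first call separately, discard a batch of warm-up runs, then time a steady-state run.

```java
public class WarmBench {
    // A stand-in workload; any deterministic computation would do.
    static long work() {
        long sum = 0;
        for (int i = 0; i < 500_000; i++) {
            sum += i % 7;
        }
        return sum;
    }

    static long timeOneRun() {
        long start = System.nanoTime();
        work();
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        long cold = timeOneRun();        // "cold": first run, before the JIT kicks in
        for (int i = 0; i < 20; i++) {   // warm-up runs, timings discarded
            timeOneRun();
        }
        long warm = timeOneRun();        // "warm": steady-state, hopefully JITed
        System.out.println("cold: " + cold + " ns, warm: " + warm + " ns");
    }
}
```

As Roedy notes, both numbers matter: a short-lived program only ever sees the cold figure.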
 

Oliver Wong

Roedy Green said:
You want to find out how long it takes to notice the optimisation is
needed, then do a run that discards the timings prior to that point. I
call these cold and warm benchmarks.

Both are interesting. Sometimes you will not run the program long
enough for the optimisation to kick in, and the slow speed is the one
you really want.

see http://mindprod.com/jgloss/benchmark.html

Note though that a JIT optimizer might do several layers of
optimization. That is, if it notices a method being run 500 times, it might
try some "easy" optimizations first. When it counts 2500 invocations, it
might then try a "medium" optimization, and so on.

- Oliver
 

Chris Uppal

tom said:
Is this what you mean? At least now I am getting the correct proportions.
Code at end of post.

Pretty much, yes.

A couple of minor nitpicks: the code for the two cases isn't quite as similar
as it could be -- for instance in one case you create one object for each timed
test, and access the random stream from one of its instance fields on each
iteration, whereas in the other test you (unnecessarily) follow a different
pattern where no object is created and the reference to the random stream is
held in a local variable. I doubt if that makes much difference, but why
introduce obscuring effects into the equation at all ? (Especially when the
purpose of the exercise is to /discover/ what kinds of effects small
differences make -- so you shouldn't "just assume" that instance fields are
accessed at the same speed as values on the stack.)
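One way to remove that asymmetry, as a sketch (the names reuse tom's posted code; the constructor parameter is my addition so the iteration count isn't hard-wired): both tests create an object and reach the random stream through the same instance field, so only the call-vs-inline difference remains.

```java
import java.util.Random;

public class BranchTestAligned {
    private final Random rand = new Random();
    private final int count;

    BranchTestAligned(int count) {
        this.count = count;
    }

    public int add(int sum) {
        return sum + rand.nextInt(10);
    }

    // Both tests follow the same pattern: create an object, read the
    // random stream from its instance field on every iteration.
    public void testCall() {
        BranchTestAligned b = new BranchTestAligned(count);
        int total = 0;
        long start = System.currentTimeMillis();
        for (int c = 0; c < count; c++) {
            total = b.add(total);
        }
        System.out.println("Elapsed time: " + (System.currentTimeMillis() - start));
    }

    public void testLoop() {
        BranchTestAligned b = new BranchTestAligned(count);
        int total = 0;
        long start = System.currentTimeMillis();
        for (int c = 0; c < count; c++) {
            total += b.rand.nextInt(10);
        }
        System.out.println("Elapsed time: " + (System.currentTimeMillis() - start));
    }

    public static void main(String[] args) {
        BranchTestAligned bt = new BranchTestAligned(1000000000);
        for (int c = 0; c < 5; c++) {  // interleaved, as recommended above
            bt.testCall();
            bt.testLoop();
        }
    }
}
```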

Also I would have skipped straight to the "Mixed" loop (as you call it). I
can't see much point in running one set of tests, then the other, and then
interleaving them. BTW, in case it's not obvious, the reason for interleaving
the tests is just to minimise any skew caused by whatever external influences
might be affecting the test.

Elapsed time: 101798
Elapsed time: 95341 [...]
To me the numbers show loops are faster. It has 10 billion
calls/iterations per test

If I haven't misread the code, then I think there are 1 billion iterations per
test. Suggesting that the overhead per call, in this particular case, is
around 6 nanoseconds. Which doesn't seem too implausible to me, although I
would expect to get different results (both higher and lower) with different
tests.
Of course there are many issues to consider as to how correct the
measurements are, and it should also be tested in C++ and in C as a
comparison, but the numbers are interesting background information to
have for the future.

Agreed, and -- like you, apparently -- I think that every competent programmer
should have a gut sense of the cost breakdown of their code.

-- chris
 

tom fredriksen

Chris said:
Pretty much, yes.

A couple of minor nitpicks: the code for the two cases isn't quite as similar
as it could be -- for instance in one case you create one object for each timed
test, and access the random stream from one of its instance fields on each
iteration, whereas in the other test you (unnecessarily) follow a different
pattern where no object is created and the reference to the random stream is
held in a local variable. I doubt if that makes much difference, but why
introduce obscuring effects into the equation at all ? (Especially when the
purpose of the exercise is to /discover/ what kinds of effects small
differences make -- so you shouldn't "just assume" that instance fields are
accessed at the same speed as values on the stack.)

Also I would have skipped straight to the "Mixed" loop (as you call it). I
can't see much point in running one set of tests, then the other, and then
interleaving them. BTW, in case it's not obvious, the reason for interleaving
the tests is just to minimise any skew caused by whatever external influences
might be affecting the test.

That's how I understood what you were saying, but it does not affect it,
only makes the test take more time :(
Elapsed time: 101798
Elapsed time: 95341 [...]
To me the numbers show loops are faster. It has 10 billion
calls/iterations per test

If I haven't misread the code, then I think there are 1 billion iterations per
test.

Yes, I see there is a typo, so the code decides.

/tom
 

Roedy Green

so you shouldn't "just assume" that instance fields are
accessed at the same speed as values on the stack.)

Which in Java is far from true: locals are much faster than instance
fields.

If you ever use the IntelliJ code inspector, it jitters at you to convert
nearly all your fields to locals, and to reduce scope to private. Both of
these help the compiler generate faster code.
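The field-to-local rewrite such inspections suggest can be sketched like this (class and method names are invented for illustration): copying the field into a local once, instead of re-reading it on every iteration, lets the JVM serve accesses from the frame rather than through a field lookup.

```java
public class FieldVsLocal {
    private int[] data = new int[1000];

    // Reads the instance field on every loop iteration (a getfield
    // each time, unless the JIT hoists it out of the loop).
    int sumViaField() {
        int total = 0;
        for (int i = 0; i < data.length; i++) {
            total += data[i];
        }
        return total;
    }

    // Copies the field into a local first; locals are accessed
    // directly from the stack frame, with no per-iteration field read.
    int sumViaLocal() {
        int[] local = this.data;
        int total = 0;
        for (int i = 0; i < local.length; i++) {
            total += local[i];
        }
        return total;
    }

    public static void main(String[] args) {
        FieldVsLocal f = new FieldVsLocal();
        java.util.Arrays.fill(f.data, 3);
        // Both variants compute the same sum; only the access pattern differs.
        System.out.println(f.sumViaField() + " vs " + f.sumViaLocal());
    }
}
```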
 

Daniel Dyer

Which in Java is far from true: locals are much faster than instance
fields.

If you ever use IntelliJ code inspector, it jitters at you to convert
nearly all your fields to locals, and to reduce scope to private. Both
these help the compiler generate faster code.

I would guess that the intention behind that inspection is more to do with
encapsulation than performance.

Dan.
 

Roedy Green

I would guess that the intention behind that inspection is more to do with
encapsulation than performance.

It is a happy case where both goals are equally served.

I am moving more toward making things private UNTIL I need the broader
scope. It makes debugging and maintaining easier knowing there are no
outsiders to consider.

I used to make things package-private and protected, thinking about what
someone might want to use in some potential extension.
 

Oliver Wong

Roedy Green said:
I am moving more toward making things private UNTIL I need the broader
scope. It makes debugging and maintaining easier knowing there are no
outsiders to consider.

I used to make things package-private and protected, thinking about what
someone might want to use in some potential extension.

There was a discussion about this on an OO newsgroup, I think (might
have been comp.object, might have been comp.programming). It was pointed
out[*] that in Java, in a really "disastrous" situation, you can always
use reflection to gain access to a private member, so perhaps it's more
cautious to make things "too private", and then use reflection in
emergencies, than to make things "too public", and... well... be stuck.

The other nice thing is that using reflection is really inconvenient,
so it makes you think long and hard about whether there might be a better
way to accomplish what you're trying to do.
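The escape hatch looks something like this (Secret, value, and readPrivateValue are all names invented for this example): java.lang.reflect lets you read a private field once you call setAccessible(true) on it.

```java
import java.lang.reflect.Field;

// A stand-in class with a member we "forgot" to expose.
class Secret {
    private int value = 42;
}

public class ReflectionEscape {
    // Emergency-only access to Secret.value via reflection.
    static int readPrivateValue(Secret s) {
        try {
            Field f = Secret.class.getDeclaredField("value");
            f.setAccessible(true);  // deliberately bypass the private modifier
            return f.getInt(s);
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readPrivateValue(new Secret()));  // prints 42
    }
}
```

The verbosity of the try/catch and field lookup is, as noted above, part of the point: it discourages casual use.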

- Oliver

[*]: Okay, I was the one who pointed that out, so I'm basically just
reiterating my own opinion here.
 

Stefan Ram

Oliver Wong said:
use reflection to gain access to a private member, so perhaps it's more
cautious to make things "too" private, and then use reflection in
emergencies, than to make things "too public", and... well... be stuck.

This kind of access protection might be the holy grail of many
beginners' textbooks on object-oriented programming.

But I wonder whether anyone has ever seen a situation where a
serious bug or problem was caused by a forgotten »private«
keyword?

I sometimes make variables that are conceptually »private«
»public«, in order to emulate something like the C++ »friend«
functions in Java (in a very raw way: all methods are friends,
then).

One problem with this approach is that the privacy of these
variables is not documented, so they might be misinterpreted as
part of the public interface. This shows that one important
duty of »private« is not to protect access under every
circumstance, but to serve as documentation of the intent of
the author. Thus, »private« also guides JavaDoc not to
publish the member by default.
 

Chris Uppal

Stefan said:
This kind of access protection might be the holy grail of many
beginner's text books on object-oriented programming.

But I wonder whether anyone has ever seen a situation where a
serious bug or problem was caused by a forgotten »private«
keyword?

I doubt if it happens often. In Smalltalk methods are 'public' or 'private'
only by convention, and it doesn't cause much in the way of problems (given
adequate support for expressing the convention -- an IDE issue, not a language
issue). As a result I can be aggressive about /not/ making things public
unless I'm /sure/ I want to commit to maintaining them across future extension
and redesign. Contrariwise fields are always private (accessible /only/ to the
object, not to other instances of that class), and although I'm quite happy
with that, the Java experience makes me doubt whether that enforced
encapsulation is really buying me very much.

Encapsulation is, IMO, a reflection of the fact that we just /don't care/ about
an object's implementation. We are interested in how it behaves, not in what
other aspects it might leave hanging out which we would rather not look at.
"private" and "package private" are about policing that indifference (a rather
odd concept in itself ;-); they do not /provide/ encapsulation.

The bigger risk with making things public, comes with what I call "publishing"
a field or method. I.e. when you (implicitly) commit, to users of your code,
that <such-and-such> a member will continue to exist, and to mean what it means
now. "Published" corresponds, in Java terms, to "public" or "protected".
Premature publication restricts future growth. Unfortunately Java conflates
encapsulation and enforcement, so there is no category which means "you /can/
use this if you have to, but I'm making no promises that it'll still be there
in the next release" -- the nearest you can get is "public"+"deprecated"...

I sometimes make variables that are conceptually »private«
»public« in order to emulate something like the C++ »friend«
functions in Java (in a very raw way: All methods are friends,
then.)
One problem with this approach is that the privacy of these
variables is not documented, so they might be misinterpreted as
part of the public interface.

You could give 'em names like

privateDoFoo()
doFoo_DONT_USE_THIS()
$doFoo$()

This shows that one important
duty of »private« is not to protect access under every
circumstance but to serve as documentation of the intent of
the author.

Agreed -- as per above.

-- chris
 

Stefan Ram

Chris Uppal said:
You could give 'em names like
privateDoFoo()
doFoo_DONT_USE_THIS()
$doFoo$()

Right now, I am using names like:

zzField
zzMethod

So they get sorted to the end in the JavaDoc. This does not
look good, and the names still show up in the JavaDoc.
So possibly, I could write:

@de.dclj.ram.meta.accessibility.Private field

Then a copy of this would be made to be used as the source for
JavaDoc, and a program would alter this copy to read:

private field

so that JavaDoc will not show it. (It would be cleaner to
write a custom Doclet that is aware of
»@de.dclj.ram.meta.accessibility.Private«.)
 
