Speed of interfaces vs inheritance

Chris

In a particularly time-critical part of my app I need to call a method
on an external class in a tight loop. The class can be different in
different contexts. I've got two choices: create an interface and have
the various different classes implement it, or create an abstract class
and have the various different classes extend it.

I seem to recall, in years past, that the interface approach was slower.
Is this still the case?

(In this particular situation, either approach works fine, though I
personally like the interface approach better).
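A minimal sketch of the two alternatives, using made-up names (Worker,
AbstractWorker, Caller) purely for illustration:

// Option 1: an interface that each of the different classes implements.
interface Worker {
    void doSomething();
}

// Option 2: an abstract class that each of the different classes extends.
abstract class AbstractWorker {
    public abstract void doSomething();
}

// One of the "various different classes"; it could use either mechanism.
class ConcreteWorker extends AbstractWorker implements Worker {
    public void doSomething() { /* the time-critical work */ }
}

class Caller {
    // The tight loop in question; the declared type of 'w' decides whether
    // the call is compiled as invokeinterface or invokevirtual.
    static void run(Worker w, int iterations) {
        for (int i = 0; i < iterations; i++) {
            w.doSomething();
        }
    }
}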
 
Mike Schilling

Chris said:
In a particularly time-critical part of my app I need to call a method on
an external class in a tight loop. The class can be different in different
contexts. I've got two choices: create an interface and have the various
different classes implement it, or create an abstract class and have the
various different classes extend it.

I seem to recall, in years past, that the interface approach was slower.
Is this still the case?

(In this particular situation, either approach works fine, though I
personally like the interface approach better).

The only way to answer this question is to write a test.
 
Arne Vajhøj

Chris said:
In a particularly time-critical part of my app I need to call a method
on an external class in a tight loop. The class can be different in
different contexts. I've got two choices: create an interface and have
the various different classes implement it, or create an abstract class
and have the various different classes extend it.

I seem to recall, in years past, that the interface approach was slower.
Is this still the case?

I guess that would be implementation specific.

I would expect them to be relatively close in performance
due to the similarity of what has to be done.

And if the method is actually doing some work the call overhead should
be minimal compared to what is happening inside the method.

A quick test with some randomly created code on what happens
to be my default Java (Sun Java 1.5 Win32) shows the interface
actually being faster than the abstract base class.

Arne
 
Tom Hawtin

Mike said:
The only way to answer this question is to write a test.

Yeah, but write a good one...

FWIW, a couple years ago I tested dynamically loading a class through a
class loader and having a loop repeatedly calling an (interface) method
which threw an exception. Under the Sun server HotSpot, the loop was
eliminated. Client performance wasn't too bad either.

I believe interface methods can get slow where the same runtime class is
called through a number of different interfaces. A highly unlikely
situation, particularly if you were considering an abstract base class.

Tom Hawtin
 
Daniel Pitts

Chris said:
In a particularly time-critical part of my app I need to call a method
on an external class in a tight loop. The class can be different in
different contexts. I've got two choices: create an interface and have
the various different classes implement it, or create an abstract class
and have the various different classes extend it.

I seem to recall, in years past, that the interface approach was slower.
Is this still the case?

(In this particular situation, either approach works fine, though I
personally like the interface approach better).

The overhead of method calls of any sort is considered trivial in most
circumstances. I would suggest using the approach that gives you the
best OO design, and then changing it around if it's not fast enough.
Determining if it's not fast enough should be done by using a profiler,
not by guess-and-check.
 
andrewmcdonagh

Chris said:
In a particularly time-critical part of my app I need to call a method
on an external class in a tight loop. The class can be different in
different contexts. I've got two choices: create an interface and have
the various different classes implement it, or create an abstract class
and have the various different classes extend it.

I seem to recall, in years past, that the interface approach was slower.
Is this still the case?

(In this particular situation, either approach works fine, though I
personally like the interface approach better).


The other advice about testing instead of guessing is the most
important one to follow.

Aside from that, if your design choice of ABC or Interface causes such
a performance problem, then maybe there's another design approach.

Tell Don't Ask....


Instead of your code doing the loop and worrying about performance,
Tell the two other classes to do the loop for you.


so, instead of

class Client {

    public void run() {
        HttpServer h = new HttpServer();

        for (int i = 0; i < 65000; i++) {
            h.doSomething();
        }
    }
}

we have....

class Client {

    public void run() {
        HttpServer h = new HttpServer();

        h.doSomething(); // for loop logic moved into HttpServer
    }
}
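A sketch of the receiving side (hypothetical -- the original post doesn't
show it), with the loop now living inside HttpServer:

class HttpServer {

    public void doSomething() {
        // the loop is now HttpServer's concern, not the caller's
        for (int i = 0; i < 65000; i++) {
            doSomethingOnce();
        }
    }

    private void doSomethingOnce() {
        // the actual per-iteration work
    }
}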

Andrew
 
John Ersatznom

andrewmcdonagh said:
so, instead of

class Client {

    public void run() {
        HttpServer h = new HttpServer();

        for (int i = 0; i < 65000; i++) {
            h.doSomething();
        }
    }
}

we have....

class Client {

    public void run() {
        HttpServer h = new HttpServer();

        h.doSomething(); // for loop logic moved into HttpServer
    }
}

Shouldn't that be h.doSomething(65000)? Unless even the length of the
loop should be up to the HttpServer class and not policy set by the
caller... :)

Of course, it should really be h.doSomething(SOME_FUNKY_CONSTANT_NAME); ...

As for call resolution speed: given that, any way you slice it, it's a
singly-indirected virtual function call under the hood, it should be the
same speed as any other polymorphic call, and almost the same speed as a
C++ one with a decently-performing JIT-capable VM. If Java had
multiply-dynamic binding out of the box, then polymorphic methods would get
slower in proportion to the number of dynamically-bound-to arguments, but
Java doesn't (although it actually does *almost* have closures, if you
know where to look)...
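To illustrate the dispatch point: Java chooses among overloads statically,
from the declared types of the arguments, and only the receiver is bound
dynamically at run time. A small made-up example (the Shape/Renderer names
are purely illustrative):

interface Shape { }
class Circle implements Shape { }
class Square implements Shape { }

class Renderer {
    void draw(Shape s)  { System.out.println("generic shape"); }
    void draw(Circle c) { System.out.println("circle"); }
    void draw(Square q) { System.out.println("square"); }
}

public class SingleDispatchDemo {
    public static void main(String[] args) {
        Renderer r = new Renderer();
        Shape s = new Circle();
        // The overload is chosen at compile time from the static type (Shape),
        // so this prints "generic shape"; only the receiver r is dispatched
        // dynamically. Binding on the argument's runtime type as well would
        // need double dispatch (e.g. a visitor).
        r.draw(s);
    }
}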
 
Chris Uppal

Daniel said:
The overhead of method calls of any sort is considered trivial in most
circumstances.

Agreed, but not in /all/ circumstances. The OP has indicated that he suspects
that he has a candidate for one of the cases where it is not.

Daniel said:
I would suggest using the approach that gives you the
best OO design, and then changing it around if it's not fast enough.

Agreed.

Daniel said:
Determining if it's not fast enough should be done by using a profiler,
not by guess-and-check.

Not a profiler, unless you (or he) has access to a /far/ better profiler than I
have ever encountered -- the overhead of a method call is too low for normal
profilers even to try to measure it. You /can/ use a profiler to measure the
execution time of both forms of the calling method, but that's just an
awkward and tricky way to measure the overall execution time (of the method),
and it'd be much easier to measure it directly without the (near certain) risk
that the profiler is affecting the results.
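For instance, a rough sketch of what measuring it directly might look like
(the names and the warm-up pass are made up for illustration;
System.nanoTime() is available from 1.5):

interface Task {
    void step();
}

class CountingTask implements Task {
    long count;
    public void step() { ++count; }
}

public class DirectTiming {
    private static final int LOOPS = 50 * 1000 * 1000;

    public static void main(String[] args) {
        CountingTask task = new CountingTask();
        run(task);                       // warm-up pass so the JIT gets a chance to compile
        long start = System.nanoTime();
        run(task);
        long elapsed = System.nanoTime() - start;
        System.out.println(((double) elapsed / LOOPS) + " ns per call (loop included)");
        System.out.println(task.count);  // read the field so the loop isn't dead code
    }

    private static void run(Task t) {
        for (int i = 0; i < LOOPS; i++)
            t.step();
    }
}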

Profilers are very useful for identifying coarse-grained hot-spots, but are
not (in my experience) any use at all in identifying fine-grained issues.

-- chris
 
Chris Uppal

Chris said:
I seem to recall, in years past, that the interface approach was slower.
Is this still the case?

As others have said, it depends, and (if it matters that much to you) you'll
have to measure it.

I got interested enough to try a few experiments (these were all with jdk 1.6,
but 1.5 gives very similar results).

For a start, comparing a method like:
public void method() { ++datum; }
where "datum" is an instance field. Just to get a ballpark figure, calling
that via an interface using the client JVM takes about 12 nanoseconds if called
as an override of an abstract method, vs about 15 if the same code is the
implementation of an interface. That is /not/ the bottom-line, there are many
highly important effects to consider, but even from that approximate data we
can see that the difference between the two kinds of method invocation is
/very/ small. If you are going to invoke the "same" method(s) a billion or two
times then using abstract methods will save you a handful of seconds overall.
More importantly, the time is so small, that it's difficult (but not
impossible) to imagine a case where the body of the method -- even a very
trivial method, as above -- wouldn't dominate the execution time. At this very
fine-grained level, the number of parameters will also affect the invocation
overhead -- in this case the method has one parameter (this); in other cases it
might have more or fewer than that.

However, if you are /that/ bothered by performance, you are not going to be
using the client JVM in the first place. Switching to the server JVM (in this
specific test) reduces both times, to about 9 vs 12 nanoseconds -- so the
relative difference is about the same.

Another thing to note is the degree of actual polymorphism at the call site.
In my test code (given in full at the end of this post), the naively expected
polymorphism is 3 (there are three different -- albeit identical --
implementations of the abstract and interface methods). What's more, each of
these possibilities occurs equally often. Considerations of polymorphism
affect how the JITed code works. A typical implementation technique (and one
used by Sun's VMs) is to attempt to compile a virtual or interface method call
into a sequence of test-and-branch instructions (thus avoiding a slow vtable
indirection). So the number of possible candidate classes which have to be
tested for at each invocation will affect both the strategy for generating the
code, and its actual execution profile. So, what happens if a given call-site
is not, in fact, heavily polymorphic ? To try that out I commented out the
last two lines of the time() function (below), so that the calls were actually
monomorphic. With the client JVM that completely eliminated the difference
between the two kinds of method invocation (about 12 nanos in either case).
However, with the server JVM, once it had realised that there was no real
polymorphism at the call site (in either case) it was able to inline the method
calls -- but that further permitted it to optimise the loops, and in fact
allowed it to remove such a large chunk of the code (not all) that the test was
no longer useful.

But that kind of optimisation makes considering call sites difficult. In the
case of my test code there are two call sites (in timeInterfaceCall() and
timeAbstractCall()) which /may/ be polymorphic. But there is no guarantee that
the optimisation will leave those alone. If the optimiser had attempted to
"split" either of those methods (effectively pushing the test-and-branch
outside the timing loop) then it would have been able to inline both cases
completely, even in the polymorphic case (there would be three largely
identical copies of each of the two timing methods). I don't know whether
Sun's current generation of JVMs is capable of splitting, but the technique is
moderately well-known, and is implemented in the Animorphic SELF VM on which a
lot of the runtime optimisation in Hotspot is based.

Another point to consider (and it's a flaw in my test) is that conditional
branches directly affect execution time on processors with pipelining and
branch prediction. In my test the CPU's branch prediction would have been almost
perfect, but that might not be the case in a real application.

Anyway, I think the bottom line is more like this: the overhead is -- at worst --
very, very small (but measurable); it is also subject to very active (and
often successful) attempts by the JIT to optimise it out.

Sample code follows. Note that it runs in an infinite loop. It takes the
JITer a while to complete its optimisations -- on my machine the JIT generated
several versions of the timing loops as it put more and more work into
optimising them; it seemed to have settled down after about 20 seconds.

-- chris


======== code ========
interface Interface
{
    void interfaceCall();
}

abstract class Abstract
{
    abstract public void abstractCall();
}

class Concrete1
    extends Abstract
    implements Interface
{
    public long datum;
    public void interfaceCall() { ++datum; }
    public void abstractCall() { ++datum; }
}

class Concrete2
    extends Abstract
    implements Interface
{
    public long datum;
    public void interfaceCall() { ++datum; }
    public void abstractCall() { ++datum; }
}

class Concrete3
    extends Abstract
    implements Interface
{
    public long datum;
    public void interfaceCall() { ++datum; }
    public void abstractCall() { ++datum; }
}


public class Test
{
    private static final int LOOPS_PER = 50 * 1000 * 1000;

    // leave this running for a while (20 seconds minimum to allow the
    // JITer time to finish optimising)
    public static void
    main(String[] args)
    {
        Concrete1 c1 = new Concrete1();
        Concrete2 c2 = new Concrete2();
        Concrete3 c3 = new Concrete3();

        while (c1.datum + c2.datum + c3.datum >= 0)
            time(c1, c2, c3);
    }

    private static void
    time(Concrete1 c1, Concrete2 c2, Concrete3 c3)
    {
        // comment out some of these lines to reduce/eliminate
        // polymorphism at the call sites in timeAbstractCall()
        // and timeInterfaceCall()
        timeAbstractCall(c1); timeInterfaceCall(c1);
        timeAbstractCall(c2); timeInterfaceCall(c2);
        timeAbstractCall(c3); timeInterfaceCall(c3);
    }

    private static void
    timeAbstractCall(Abstract it)
    {
        long start = System.currentTimeMillis();
        for (int i = 0; i < LOOPS_PER; i++)
            it.abstractCall();
        long end = System.currentTimeMillis();
        report(end-start, " abstract", LOOPS_PER);
    }

    private static void
    timeInterfaceCall(Interface it)
    {
        long start = System.currentTimeMillis();
        for (int i = 0; i < LOOPS_PER; i++)
            it.interfaceCall();
        long end = System.currentTimeMillis();
        report(end-start, "interface", LOOPS_PER);
    }

    private static void
    report(long millis, String type, int calls)
    {
        double nanosPerCall = millis * 1.0e6 / calls;
        System.out.println(type + ": " + nanosPerCall + " nanosec/call");
    }
}
=====================
 
Tom Hawtin

Chris said:
very, very small (but measurable); it is also subject to very active (and
often successful) attempts by the JIT to optimise it out.

With the Sun server VM, your code shows an awful lot of 'optimiser'
activity. Running your test unaltered (on 1.6, Server VM, single core
Opteron 128), my *first* result is: abstract: 0.96 nanosec/call,
interface: 1.08 nanosec/call. From there it fluctuates wildly, but is
always slower.

The performance appears to be stumbling because although the
implementations are all called as often as one another, that is not true
if you look at a sample of the order of, say, 10,000 iterations. I'm
sure if you switched on the obscure VM options to see the compile times,
they would be high.

If you call all the implementations at once:
o The results are consistent.
o The server VM times practically disappear.
o The client VM times are the same for interface and abstract class.

Tom Hawtin
 
John Ersatznom

Chris said:
Profilers are very useful for identifying coarse-grained hot-spots, but are
not (in my experience) any use at all in identifying fine-grained issues.

What is useful for that? (I'd have thought running suspected hotspots by
themselves in a loop with a huge number of iterations, while profiling. :))
 
John Ersatznom

Tom said:
The performance appears to be stumbling because although the
implementations are all called as often as one another, that is not true
if you look at a sample of the order of, say, 10,000 iterations. I'm
sure if you switched on the obscure VM options to see the compile times,
they would be high.

If you call all the implementations at once:
o The results are consistent.
o The server VM times practically disappear.
o The client VM times are the same for interface and abstract class.

It sounds from this like the JIT compiler can actually DEgrade
performance for pathological patterns of polymorphic calls. In
particular, it looks like threaded code with different threads doing the
same thing (via the same class) with different implementations of
something else (via polymorphic object parameters to that class) might
choke. Seems you get rock-solid performance if all implementations that
occur very often occur at similar frequencies and the rest are rare,
e.g. all equally common or one overwhelmingly dominant one, but if
there's a broader frequency spectrum you might be SOL.
 
Tom Hawtin

John said:
It sounds from this like the JIT compiler can actually DEgrade
performance for pathological patterns of polymorphic calls.

That's a general property of optimisations. Whenever you optimise
something, expect performance in certain conditions to be pessimised.

John said:
In particular, it looks like threaded code with different threads doing the
same thing (via the same class) with different implementations of
something else (via polymorphic object parameters to that class) might
choke. Seems you get rock-solid performance if all implementations that
occur very often occur at similar frequencies and the rest are rare,
e.g. all equally common or one overwhelmingly dominant one, but if
there's a broader frequency spectrum you might be SOL.

It's okay for different implementations to be invoked with different
frequencies, just so long as you don't get batches into the tens of
thousands in one and then another.

Tom Hawtin
 
Chris Uppal

John said:
What is useful for that? (I'd have thought running suspected hotspots by
themselves in a loop with a huge number of iterations, while profiling.
:))

Assuming you don't want to go the analytic route (and who does ?), the only
option I know of is to run huge loops (with no profiler messing up the
execution pattern). Note that in any application where such issues matter,
there is a natural source of such loops -- the application itself.

Micro-"benchmarks" like the one I posted are useful for developing a feeling
for the numbers involved, and for identifying the kinds of situations where
fine-grained issues arise (and hence developing a decent intuition about /when/
it's worth doing real benchmarking); but they are no substitute for measuring
the target code itself.

-- chris
 
Chris Uppal

Tom said:
With the Sun server VM, your code shows an awful lot of 'optimiser'
activity. Running your test unaltered (on 1.6, Server VM, single core
Opteron 128), my *first* result is: abstract: 0.96 nanosec/call,
interface: 1.08 nanosec/call. From there it fluctuates wildly, but is
always slower.

That's very odd. I'm running on a 1.5 GHz Pentium "M" (which I would have
thought was similar enough that the JITing would work very similarly if not
identically), and see nothing like that pattern. At the start the times vary
wildly, but after 20 seconds or so they settle down to a neat (but not perfect)
pattern showing the numbers I mentioned. (The monomorphic case with -server is
different, as I mentioned before).

Do you know a better way to tell what the optimiser is doing than using the JVM
TI interfaces ? That is what I was using to monitor the JIT (in separate runs
from the ones where I quoted times, of course), and it showed up to four
versions of the timing loops being generated within the first ten seconds or so,
with no further reported activity thereafter (over several minutes).

The performance appears to be stumbling because although the
implementations are all called as often as one another, that is not true
if you look at a sample of the order of, say, 10,000 iterations.

Continually repeated re-JITing doesn't sound at all like what I was seeing.

If you call all the implementations at once:
o The results are consistent.
o The server VM times practically disappear.
o The client VM times are the same for interface and abstract class.

I know of no good way to "call all the implementations at once" and still be
able to measure anything using the relatively crude tools available to me. I
suppose I could reduce the number of loops to, say, 1000, and then collect and
average /lots/ of such micro-runs; is that what you tried ?
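One possible reading (a guess on my part, not necessarily what Tom meant) is
to exercise all three receivers inside a single timing loop; e.g. a method
like this dropped into the Test class above and invoked as
timeInterleaved(c1, c2, c3):

private static void
timeInterleaved(Interface a, Interface b, Interface c)
{
    long start = System.currentTimeMillis();
    for (int i = 0; i < LOOPS_PER; i++)
    {
        // every implementation is exercised on every iteration, so no
        // implementation gets a long uninterrupted batch to itself
        a.interfaceCall();
        b.interfaceCall();
        c.interfaceCall();
    }
    long end = System.currentTimeMillis();
    report(end - start, "interleaved", 3 * LOOPS_PER);
}

As written, each of the three call sites only ever sees one concrete class,
so the JIT is free to inline them; that might be why the server VM times
practically disappear in that arrangement.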

-- chris
 
Chris Uppal

John said:
It sounds from this like the JIT compiler can actually DEgrade
performance for pathological patterns of polymorphic calls.

That's always possible, since the JIT itself is not free. If it decides to do
some work, then it is gambling that the (unknown) future execution patterns
will be such that the extra work now turns out to be of nett value in the long
run -- but there's no guarantee that the gamble will pay off.

-- chris
 
