Speed of interfaces vs inheritance

Discussion in 'Java' started by Chris, Dec 30, 2006.

  1. Chris

    Chris Guest

    In a particularly time-critical part of my app I need to call a method
    on an external class in a tight loop. The class can be different in
    different contexts. I've got two choices: create an interface and have
    the various different classes implement it, or create an abstract class
    and have the various different classes extend it.

    I seem to recall, in years past, that the interface approach was slower.
    Is this still the case?

    (In this particular situation, either approach works fine, though I
    personally like the interface approach better).
     
    Chris, Dec 30, 2006
    #1

  2. The only way to answer this question is to write a test.
     
    Mike Schilling, Dec 30, 2006
    #2

  3. I guess that would be implementation specific.

    I would expect them to be relatively close in performance,
    due to the similarity of what has to be done.

    And if the method is actually doing some work the call overhead should
    be minimal compared to what is happening inside the method.

    A quick test with some randomly created code, on what happens
    to be my default Java (Sun Java 1.5, Win32), shows the interface
    actually faster than the abstract base class.

    Arne
     
    Arne Vajhøj, Dec 30, 2006
    #3
  4. Tom Hawtin

    Tom Hawtin Guest

    Yeah, but write a good one...

    FWIW, a couple years ago I tested dynamically loading a class through a
    class loader and having a loop repeatedly calling an (interface) method
    which threw an exception. Under the Sun server HotSpot, the loop was
    eliminated. Client performance wasn't too bad either.

    I believe interface methods can get slow where the same runtime class is
    called through a number of different interfaces. A highly unlikely
    situation, particularly if you were considering an abstract base class.
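    To sketch that unlikely slow case (hypothetical names; one runtime class
    dispatched through two different interface types at two different call
    sites):

    interface Reader { void poll(); }
    interface Writer { void poll(); }

    // One implementation satisfies both interfaces.
    class Endpoint implements Reader, Writer {
        public void poll() { /* work */ }
    }

    class Pump {
        void pump(Reader r, Writer w) {
            r.poll(); // interface dispatch via Reader
            w.poll(); // interface dispatch via Writer, same runtime class
        }
    }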

    Tom Hawtin
     
    Tom Hawtin, Dec 30, 2006
    #4
  5. Daniel Pitts

    Daniel Pitts Guest

    The overhead of method calls of any sort is considered trivial in most
    circumstances. I would suggest using the approach that gives you the
    best OO design, and then changing it around if it's not fast enough.
    Determining whether it's fast enough should be done by using a profiler,
    not by guess-and-check.
     
    Daniel Pitts, Dec 31, 2006
    #5

  6. The other advice about testing instead of guessing is the most
    important one to follow.

    Aside from that, if your design choice of ABC or Interface causes such
    a performance problem, then maybe there's another design approach.

    Tell Don't Ask....


    Instead of your code doing the loop and worrying about performance,
    Tell the two other classes to do the loop for you.


    so, instead of

    class Client {

        public void run() {
            HttpServer h = new HttpServer();

            for (int i = 0; i < 65000; i++) {
                h.doSomething();
            }
        }
    }

    we have....

    class Client {

        public void run() {
            HttpServer h = new HttpServer();

            h.doSomething(); // for loop logic moved into HttpServer
        }
    }

    Andrew
     
    andrewmcdonagh, Jan 1, 2007
    #6
  7. Shouldn't that be h.doSomething(65000)? Unless even the length of the
    loop should be up to the HttpServer class and not policy set by the
    caller... :)

    Of course, it should really be h.doSomething(SOME_FUNKY_CONSTANT_NAME); ...

    As for call resolution speed: given that, any way you slice it, it's a
    singly-indirected virtual function call under the hood, it should be the
    same speed as any other polymorphic call, and almost the same speed as a
    C++ one with a decently-performing JIT-capable VM. If Java had
    multiply-dynamic binding OOTB, then polymorphic methods would get slower
    in proportion to the number of dynamically-bound-to arguments, but Java
    doesn't (although it actually does *almost* have closures, if you
    know where to look)...
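    (For the record, multiple dispatch can be hand-rolled in Java with the
    classic visitor pattern; a sketch with invented names, where each
    dynamically-bound argument costs one extra virtual call:)

    interface Shape { void accept(Visitor v); }

    interface Visitor {
        void visit(Circle c);
        void visit(Square s);
    }

    class Circle implements Shape {
        // first dispatch selects the Shape, second selects the Visitor
        public void accept(Visitor v) { v.visit(this); }
    }

    class Square implements Shape {
        public void accept(Visitor v) { v.visit(this); }
    }

    class AreaPrinter implements Visitor {
        public void visit(Circle c) { System.out.println("circle"); }
        public void visit(Square s) { System.out.println("square"); }
    }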
     
    John Ersatznom, Jan 4, 2007
    #7
  8. Chris Uppal

    Chris Uppal Guest

    Agreed, but not in /all/ circumstances. The OP has indicated that he suspects
    that he has a candidate for one of the cases where it is not.

    Not a profiler, unless you (or he) have access to a /far/ better profiler
    than I have ever encountered -- the overhead of a method call is too low
    for normal profilers even to try to measure it. You /can/ use a profiler
    to measure the execution time of both forms of the calling method, but
    that's just an awkward and tricky way to measure the overall execution
    time (of the method), and it'd be much easier to measure it directly
    without the (near certain) risk that the profiler is affecting the
    results.

    Profilers are very useful for identifying coarse-grained hot-spots, but
    are not (in my experience) any use at all in identifying fine-grained
    issues.

    -- chris
     
    Chris Uppal, Jan 4, 2007
    #8
  9. Chris Uppal

    Chris Uppal Guest

    As others have said, it depends, and (if it matters that much to you) you'll
    have to measure it.

    I got interested enough to try a few experiments (these were all with jdk 1.6,
    but 1.5 gives very similar results).

    For a start, comparing a method like:
    public void method() { ++datum; }
    where "datum" is an instance field. Just to get a ballpark figure: using
    the client JVM, calling that method takes about 12 nanoseconds if called
    as an override of an abstract method, vs about 15 if the same code is the
    implementation of an interface method. That is /not/ the bottom line;
    there are many highly important effects to consider, but even from that
    approximate data we can see that the difference between the two kinds of
    method invocation is /very/ small. If you are going to invoke the "same"
    method(s) a billion or two times, then using abstract methods will save
    you a handful of seconds overall. More importantly, the time is so small
    that it's difficult (but not impossible) to imagine a case where the body
    of the method -- even a very trivial method, as above -- wouldn't
    dominate the execution time. At this very fine-grained level, the number
    of parameters will also affect the invocation overhead -- in this case
    the method has one parameter (this), in other cases it might have more
    or fewer.

    However, if you are /that/ bothered by performance, you are not going to be
    using the client JVM in the first place. Switching to the server JVM (in this
    specific test) reduces both times, to about 9 vs 12 nanoseconds -- so the
    relative difference is about the same.

    Another thing to note is the degree of actual polymorphism at the call site.
    In my test code (given in full at the end of this post), the naively expected
    polymorphism is 3 (there are three different -- albeit identical --
    implementations of the abstract and interface methods). What's more, each of
    these possibilities occurs equally often. Considerations of polymorphism
    affect how the JITed code works. A typical implementation technique (and one
    used by Sun's VMs) is to attempt to compile a virtual or interface method call
    into a sequence of test-and-branch instructions (thus avoiding a slow vtable
    indirection). So the number of possible candidate classes which have to be
    tested for at each invocation will affect both the strategy for generating the
    code, and its actual execution profile. So, what happens if a given call-site
    is not, in fact, heavily polymorphic? To try that out I commented out the
    last two lines of the time() function (below), so that the calls were actually
    monomorphic. With the client JVM that completely eliminated the difference
    between the two kinds of method invocation (about 12 nanos in either case).
    However, with the server JVM, once it had realised that there was no real
    polymorphism at the call site (in either case) it was able to inline the method
    calls -- but that further permitted it to optimise the loops, and in fact
    allowed it to remove such a large chunk of the code (not all) that the test was
    no longer useful.
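    (Purely as Java-flavoured pseudocode, not real JIT output: at a call
    site that has seen all three concrete classes, the test-and-branch
    version of "it.abstractCall()" behaves something like this:)

    static void dispatch(Abstract it) {
        Class<?> k = it.getClass();
        if (k == Concrete1.class) {
            ++((Concrete1) it).datum;   // inlined body
        } else if (k == Concrete2.class) {
            ++((Concrete2) it).datum;
        } else if (k == Concrete3.class) {
            ++((Concrete3) it).datum;
        } else {
            it.abstractCall();          // fall back to full dynamic dispatch
        }
    }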

    But that kind of optimisation makes considering call sites difficult. In the
    case of my test code there are two call sites (in timeInterfaceCall() and
    timeAbstractCall()) which /may/ be polymorphic. But there is no guarantee that
    the optimisation will leave those alone. If the optimiser had attempted to
    "split" either of those methods (effectively pushing the test-and-branch
    outside the timing loop) then it would have been able to inline both cases
    completely, even in the polymorphic case (there would be three largely
    identical copies of each of the two timing methods). I don't know whether
    Sun's current generation of JVMs is capable of splitting, but the technique is
    moderately well-known, and is implemented in the Animorphic SELF VM on which a
    lot of the runtime optimisation in Hotspot is based.

    Another point to consider (and it's a flaw in my test) is that
    conditional branches directly affect execution time on processors with
    pipelining and branch prediction. In my test the CPU's branch prediction
    would have been almost perfect, but that might not be the case in a real
    application.

    Anyway, I think the bottom line is more like: the overhead is -- at worst
    -- very, very small (but measurable); it is also subject to very active
    (and often successful) attempts by the JIT to optimise it out.

    Sample code follows. Note that it runs in an infinite loop. It takes the
    JITer a while to complete its optimisations -- on my machine the JIT
    generated several versions of the timing loops as it put more and more
    work into optimising them; it seemed to have settled down after about
    20 seconds.

    -- chris


    ======== code ========
    interface Interface
    {
        void interfaceCall();
    }

    abstract class Abstract
    {
        abstract public void abstractCall();
    }

    class Concrete1
        extends Abstract
        implements Interface
    {
        public long datum;
        public void interfaceCall() { ++datum; }
        public void abstractCall() { ++datum; }
    }

    class Concrete2
        extends Abstract
        implements Interface
    {
        public long datum;
        public void interfaceCall() { ++datum; }
        public void abstractCall() { ++datum; }
    }

    class Concrete3
        extends Abstract
        implements Interface
    {
        public long datum;
        public void interfaceCall() { ++datum; }
        public void abstractCall() { ++datum; }
    }


    public class Test
    {
        private static final int LOOPS_PER = 50 * 1000 * 1000;

        // leave this running for a while (20 seconds minimum to allow the
        // JITer time to finish optimising)
        public static void
        main(String[] args)
        {
            Concrete1 c1 = new Concrete1();
            Concrete2 c2 = new Concrete2();
            Concrete3 c3 = new Concrete3();

            while (c1.datum + c2.datum + c3.datum >= 0)
                time(c1, c2, c3);
        }

        private static void
        time(Concrete1 c1, Concrete2 c2, Concrete3 c3)
        {
            // comment out some of these lines to reduce/eliminate
            // polymorphism at the call sites in timeAbstractCall()
            // and timeInterfaceCall()
            timeAbstractCall(c1); timeInterfaceCall(c1);
            timeAbstractCall(c2); timeInterfaceCall(c2);
            timeAbstractCall(c3); timeInterfaceCall(c3);
        }

        private static void
        timeAbstractCall(Abstract it)
        {
            long start = System.currentTimeMillis();
            for (int i = 0; i < LOOPS_PER; i++)
                it.abstractCall();
            long end = System.currentTimeMillis();
            report(end - start, " abstract", LOOPS_PER);
        }

        private static void
        timeInterfaceCall(Interface it)
        {
            long start = System.currentTimeMillis();
            for (int i = 0; i < LOOPS_PER; i++)
                it.interfaceCall();
            long end = System.currentTimeMillis();
            report(end - start, "interface", LOOPS_PER);
        }

        private static void
        report(long millis, String type, int calls)
        {
            double nanosPerCall = millis * 1.0e6 / calls;
            System.out.println(type + ": " + nanosPerCall + " nanosec/call");
        }
    }
    =====================
     
    Chris Uppal, Jan 4, 2007
    #9
  10. Tom Hawtin

    Tom Hawtin Guest

    With the Sun server VM, your code shows an awful lot of 'optimiser'
    activity. Running your test unaltered (on 1.6, Server VM, single core
    Opteron 128), my *first* result is: abstract: 0.96 nanosec/call,
    interface: 1.08 nanosec/call. It fluctuates wildly from there, but the
    interface is always slower.

    The performance appears to be stumbling because, although the
    implementations are all called as often as one another overall, that is
    not true if you look at a sample on the order of, say, 10,000 iterations.
    I'm sure that if you switched on the obscure VM options to see the
    compile times, they would be high.

    If you call all the implementations at once (see the sketch below):
    o The results are consistent.
    o The server VM times practically disappear.
    o The client VM times are the same for interface and abstract class.
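    (One reading of "call all the implementations at once", sketched against
    Chris's test code: interleave the three receivers within a single timing
    loop, rather than timing each one in its own 50-million-call batch.)

    private static void
    timeInterleaved(Interface a, Interface b, Interface c)
    {
        long start = System.currentTimeMillis();
        for (int i = 0; i < LOOPS_PER; i++)
        {
            a.interfaceCall();
            b.interfaceCall();
            c.interfaceCall();
        }
        long end = System.currentTimeMillis();
        report(end - start, "interleaved", LOOPS_PER * 3);
    }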

    Tom Hawtin
     
    Tom Hawtin, Jan 4, 2007
    #10
  11. What is useful for that? (I'd have thought running suspected hotspots by
    themselves in a loop with a huge number of iterations, while profiling. :))
     
    John Ersatznom, Jan 5, 2007
    #11
  12. It sounds from this like the JIT compiler can actually DEgrade
    performance for pathological patterns of polymorphic calls. In
    particular, it looks like threaded code with different threads doing the
    same thing (via the same class) with different implementations of
    something else (via polymorphic object parameters to that class) might
    choke. Seems you get rock-solid performance if all implementations that
    occur very often occur at similar frequencies and the rest are rare,
    e.g. all equally common or one overwhelmingly dominant one, but if
    there's a broader frequency spectrum you might be SOL.
     
    John Ersatznom, Jan 6, 2007
    #12
  13. Tom Hawtin

    Tom Hawtin Guest

    That's a general property of optimisations. Whenever you optimise
    something, expect performance in certain conditions to be pessimised.
    It's okay for different implementations to be invoked at different
    frequencies, just so long as you don't get batches of tens of thousands
    of calls to one and then to another.

    Tom Hawtin
     
    Tom Hawtin, Jan 6, 2007
    #13
  14. Chris Uppal

    Chris Uppal Guest

    John Ersatznom wrote:

    [me:]
    Assuming you don't want to go the analytic route (and who does?), the only
    option I know of is to run huge loops (with no profiler messing up the
    execution pattern). Note that in any application where such issues matter,
    there is a natural source of such loops -- the application itself.

    Micro-"benchmarks" like the one I posted are useful for developing a feeling
    for the numbers involved, and for identifying the kinds of situations where
    fine-grained issues arise (and hence developing a decent intuition about /when/
    it's worth doing real benchmarking); but they are no substitute for measuring
    the target code itself.

    -- chris
     
    Chris Uppal, Jan 6, 2007
    #14
  15. Chris Uppal

    Chris Uppal Guest

    Tom Hawtin wrote:

    [me:]
    That's very odd. I'm running on a 1.5 GHz Pentium "M" (which I would have
    thought was similar enough that the JITing would work very similarly if not
    identically), and see nothing like that pattern. At the start the times vary
    wildly, but after 20 seconds or so they settle down to a neat (but not perfect)
    pattern showing the numbers I mentioned. (The monomorphic case with -server is
    different, as I mentioned before).

    Do you know a better way to tell what the optimiser is doing than using
    the JVM TI interfaces? That is what I was using to monitor the JIT (in
    separate runs from the ones where I quoted times, of course), and it
    showed up to four versions of the timing loops being generated over
    roughly the first ten seconds, with no further reported activity
    thereafter (over several minutes).

    Continually repeated re-JITing doesn't sound at all like what I was seeing.

    I know of no good way to "call all the implementations at once" and still
    be able to measure anything using the relatively crude tools available to
    me. I suppose I could reduce the number of loops to, say, 1000, and then
    collect and average /lots/ of such micro-runs; is that what you tried?
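    (Something like this, say -- a sketch against my earlier test code;
    System.currentTimeMillis() is of course very coarse at 1000 iterations
    per run, hence the averaging over many runs:)

    private static final int SHORT_LOOPS = 1000;
    private static final int RUNS = 100000;

    private static double
    averagedNanosPerCall(Interface it)
    {
        long totalMillis = 0;
        for (int r = 0; r < RUNS; r++)
        {
            long start = System.currentTimeMillis();
            for (int i = 0; i < SHORT_LOOPS; i++)
                it.interfaceCall();
            totalMillis += System.currentTimeMillis() - start;
        }
        return totalMillis * 1.0e6 / ((double) RUNS * SHORT_LOOPS);
    }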

    -- chris
     
    Chris Uppal, Jan 6, 2007
    #15
  16. Chris Uppal

    Chris Uppal Guest

    That's always possible, since the JIT itself is not free. If it decides
    to do some work, then it is gambling that the (unknown) future execution
    patterns will be such that the extra work now turns out to be of net
    value in the long run -- but there's no guarantee that the gamble will
    pay off.

    -- chris
     
    Chris Uppal, Jan 6, 2007
    #16
