Is Java 1.6 number crunching slower than 1.5?

Discussion in 'Java' started by Kevin McMurtrie, Aug 13, 2009.

  1. I'm tuning some graphics routines that do pure number crunching.
    There's no AWT, Swing, memory allocation, java.lang.Math, or system
    calls in the main loops. It's just running kernels over RGBA int arrays
    to apply an anti-aliased affine transformation. I've optimized the code
    over time, but one thing always remains the same: Java 1.6 benchmarks about
    4% slower than 1.5.

    Is there anything I should look out for in Java 1.6 HotSpot? Different
    register allocation? Slower array access (pointer math)? Strange
    runtime overhead in loops? Is it just an Apple thing that I shouldn't
    worry about?

    Version info:

    MacOS X 10.5.8

    Darwin desktop.pixelmemory.us 9.8.0 Darwin Kernel Version 9.8.0: Wed Jul
    15 16:55:01 PDT 2009; root:xnu-1228.15.4~1/RELEASE_I386 i386

    java version "1.5.0_20"
    Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_20-b02-308)
    Java HotSpot(TM) 64-Bit Server VM (build 1.5.0_19-137, mixed mode)

    java version "1.6.0_15"
    Java(TM) SE Runtime Environment (build 1.6.0_15-b02-215)
    Java HotSpot(TM) 64-Bit Server VM (build 14.1-b02-87, mixed mode)

    --
    I will not see your reply if you use Google.
    Kevin McMurtrie, Aug 13, 2009
    #1

  2. Roedy Green (Guest)

    On Thu, 13 Aug 2009 01:15:01 -0700, Kevin McMurtrie
    <> wrote, quoted or indirectly quoted someone who
    said :

    >Is there anything I should look out for in Java 1.6 HotSpot? Different
    >register allocation?


    I would guess it is simply that the 1.6 library code is fatter. Even with
    the same code generated for your app, 1.6 would run slower in the same
    amount of RAM.

    Even if you don't use methods of a used class, the code for them still
    gets loaded.

    Do an experiment: on a RAM-rich machine, crank up the various RAM
    allocations on the java.exe command line and see if the gap narrows.
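
    (A purely illustrative invocation - the flags are standard HotSpot
    memory options, but the sizes are assumptions rather than figures from
    this thread:

        java -d64 -Xms1G -Xmx2G -verbose:gc Benchmark

    -Xms/-Xmx fix the initial and maximum heap, and -verbose:gc shows
    whether collections are eating into the timed runs.)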
    --
    Roedy Green Canadian Mind Products
    http://mindprod.com

    "If you think it’s expensive to hire a professional to do the job, wait until you hire an amateur."
    ~ Red Adair (born: 1915-06-18 died: 2004-08-07 at age: 89)
    Roedy Green, Aug 13, 2009
    #2

  3. In article <>,
    Roedy Green <> wrote:

    > On Thu, 13 Aug 2009 01:15:01 -0700, Kevin McMurtrie
    > <> wrote, quoted or indirectly quoted someone who
    > said :
    >
    > >Is there anything I should look out for in Java 1.6 HotSpot? Different
    > >register allocation?

    >
    > I would guess all it is is that 1.6 library code is fatter. Even with
    > the same code generated for your app, 1.6 would run slower in the same
    > amount of RAM.
    >
    > Even if you don't use methods of a used class, the code for them still
    > gets loaded.
    >
    > Do an experiment. In a RAM rich machine crank up the various RAM
    > allocations on the java.exe command line as see if the gap narrows.


    I'm testing on a 5GB machine with a 3GB heap.

    --
    I will not see your reply if you use Google.
    Kevin McMurtrie, Aug 13, 2009
    #3
  4. Roedy Green (Guest)

    On Thu, 13 Aug 2009 08:44:25 -0700, Kevin McMurtrie
    <> wrote, quoted or indirectly quoted someone who
    said :

    >I'm testing on a 5GB machine with a 3GB heap.


    You can disassemble to see the generated class code, but I don't know
    what you might use (without spending money) that would let you peek at
    the generated machine code. Is there an esoteric option or annotation to
    get a look?
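
    (For the bytecode at least, javap from the JDK will disassemble a class,
    and -XX:+PrintCompilation at least shows which methods HotSpot decides
    to compile - a rough sketch, not a way to see the generated machine
    code itself:

        javap -c Benchmark
        java -XX:+PrintCompilation -d64 -mx2G Benchmark

    Both are standard in Sun JDKs of this era; the output format varies by
    build.)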

    Perhaps you could use nanotime to benchmark some code that does not do
    memory allocation. It seems unlikely number crunching code would not
    get faster with each release.

    You need to figure out some way to discount GC overhead and OS
    swapping overhead. I would expect that to be bigger in 1.6.
    --
    Roedy Green Canadian Mind Products
    http://mindprod.com

    "If you think it’s expensive to hire a professional to do the job, wait until you hire an amateur."
    ~ Red Adair (born: 1915-06-18 died: 2004-08-07 at age: 89)
    Roedy Green, Aug 13, 2009
    #4
  5. markspace (Guest)

    Kevin McMurtrie wrote:

    > Is there anything I should look out for in Java 1.6 HotSpot? Different
    > register allocation? Slower array access (pointer math)? Strange
    > runtime overhead in loops? Is it just an Apple thing that I shouldn't
    > worry about?
    >


    4% sounds like a very small number, and it might be hard to find what is
    causing that in the code. I'm hardly an expert, but I'll take a wild
    stab at it.

    The first question would be: what optimizations and/or parameters are you
    passing to each JVM? This is an area where the two JVMs might not be
    exactly equivalent, as the defaults tend to vary from build to build (as
    we were just discussing in another thread).
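
    (One way to compare - assuming the builds in question support the flag,
    which is a guess on my part - is to ask each JVM what it actually
    settled on and diff the output between 1.5 and 1.6:

        java -XX:+PrintCommandLineFlags -version
    )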
    markspace, Aug 13, 2009
    #5
  6. Arne Vajhøj (Guest)

    Kevin McMurtrie wrote:
    > I'm tuning some graphics routines that do pure number crunching.
    > There's no AWT, Swing, memory allocation, java.lang.Math, or system
    > calls in the main loops. It's just running kernels over RGBA int arrays
    > to apply an anti-aliased affine transformation. I've optimized the code
    > over time but one thing always remains the same: Java 1.6 benchmarks at
    > 4% slower than 1.5.
    >
    > Is there anything I should look out for in Java 1.6 HotSpot? Different
    > register allocation? Slower array access (pointer math)? Strange
    > runtime overhead in loops? Is it just an Apple thing that I shouldn't
    > worry about?


    1.6 is slightly different from 1.5 - some things may be faster, some
    things may be slower.

    I have my own little micro benchmark and it shows identical int and
    double performance but significantly improved String performance
    from 1.5 to 1.6.

    But different benchmarks will give different results.

    I obviously assume that you are using -server.

    But you could try experimenting with some of the more exotic -XX options.
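
    (For example - these flags existed in HotSpot builds of roughly this
    vintage, but whether they help this particular code is pure speculation:

        java -server -d64 -Xmx2G -XX:+AggressiveOpts -XX:CompileThreshold=1500 Benchmark

    AggressiveOpts turns on newer optimizations, and CompileThreshold
    changes how soon methods get compiled.)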

    Arne
    Arne Vajhøj, Aug 13, 2009
    #6
  7. Arne Vajhøj (Guest)

    Roedy Green wrote:
    > On Thu, 13 Aug 2009 08:44:25 -0700, Kevin McMurtrie
    > <> wrote, quoted or indirectly quoted someone who
    > said :
    >> I'm testing on a 5GB machine with a 3GB heap.

    >
    > You can disassemble to see the generated class code,


    Since the optimization is done in the JIT compiler, not in javac, there
    is not much point in that.

    > Perhaps you could use nanotime to benchmark some code


    Benchmarks that need nanotime have too much uncertainty on a typical
    multiuser OS.

    > It seems unlikely number crunching code would not
    > get faster with each release.


    It is not unlikely that some number crunching code would get slower.

    It is unlikely that the majority of number crunching code would
    get slower.

    > You need to figure out some way to discount GC overhead and OS
    > swapping overhead. I would expect that to be bigger in 1.6.


    Given that the OP stated:
    There's no ... , memory allocation, ...
    he does not need to.

    Arne
    Arne Vajhøj, Aug 13, 2009
    #7
  8. On Aug 13, 9:45 pm, Roedy Green <>
    wrote:
    >
    > You can disassemble to see the generated class code, but I don't know
    > what you might use (without spending) that would let you peek at the
    > generated machine code. Is there esoteric option or annotation to get
    > a look?


    There is such an option (-XX:+PrintAssembly), but unfortunately I think
    it is disabled in the production build of the VM. If you are interested
    enough, it is possible to write a very simple agent in JVMTI that can
    accomplish this as well.
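
    (For the record, the usual way to try it - assuming a VM where the flag
    is compiled in and the hsdis disassembler plugin is on the library
    path - is:

        java -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly Benchmark

    On the product builds discussed here it will most likely just report
    that the option is unavailable.)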

    Regards,
    Daniel
    Daniel Sjöblom, Aug 14, 2009
    #8
  9. I pulled out a few bits of code and patched it together so a test case
    does the same kind of math as the real deal. (Don't be a style freak -
    it's a demo fragment squished to fit in a Usenet posting.)

    Machine:
    MacOS X 10.5.8

    Darwin desktop.pixelmemory.us 9.8.0 Darwin Kernel Version 9.8.0: Wed Jul
    15 16:55:01 PDT 2009; root:xnu-1228.15.4~1/RELEASE_I386 i386

    ---------------
    Java 1.5

    Version:
    java version "1.5.0_20"
    Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_20-b02-308)
    Java HotSpot(TM) 64-Bit Server VM (build 1.5.0_19-137, mixed mode)

    Options: -d64 -mx2G

    Output:
    Millis: 5979.04
    Millis: 5984.168
    Millis: 5987.027
    Millis: 5979.992
    Millis: 5953.974

    ---------------
    Java 1.6

    Version:
    java version "1.6.0_15"
    Java(TM) SE Runtime Environment (build 1.6.0_15-b02-215)
    Java HotSpot(TM) 64-Bit Server VM (build 14.1-b02-87, mixed mode)

    Options: -d64 -mx2G

    Output:
    Millis: 6943.407
    Millis: 6937.324
    Millis: 6917.524
    Millis: 6931.662
    Millis: 6917.065

    ---------------

    public class Benchmark
    {
        // ARGB channel bit offsets and masks for packed int pixels.
        static final int s_offR= 16, s_offG= 8, s_offB= 0, s_offA= 24;
        static final int s_maskR= 0xff0000, s_maskG= 0xff00,
                         s_maskB= 0xff, s_maskA= 0xff000000;

        public static void main (final String args[])
        {
            final Benchmark b= new Benchmark ();
            b.rasterize();   // warm-up pass before timing

            for (int restest= 0; restest < 5; ++restest)
            {
                final long start= System.nanoTime();
                for (int i= 0; i < 1000; ++i)
                    b.rasterize();
                final long end= System.nanoTime();

                System.out.println("Millis: " + (end - start) / 1000000d);
            }
        }

        final int m_src[], m_dst[];
        final int m_srcYstride, m_dstYstride;
        float m_R, m_G, m_B, m_A;

        public Benchmark ()
        {
            m_src= new int [640 * 480];
            m_dst= new int [320 * 240];
            m_srcYstride= 640;
            m_dstYstride= 320;
        }

        // Apply a 4x4 kernel at every other source pixel and write one
        // destination pixel.
        void rasterize ()
        {
            final short kerns[][][]= new short[][][] {
                {{300, 134, -23, 121}, {234, 45, 12, -18},
                 {37, -86, 7, 0}, {4, 86, -13, 197}},
                {{300, 134, -23, 123}, {45, 234, 12, -20},
                 {37, -54, 7, 0}, {4, 54, -13, 197}}
            };
            final int sum= 1069;

            for (int srcY= 0, dstY= 0, ySrcScan= 0;
                 srcY < (480 - 4);
                 srcY+= 2, ++dstY, ySrcScan+= 2*m_srcYstride)
            {
                for (int srcX= 0, dstX= 0; srcX < (640 - 4); srcX+= 2, ++dstX)
                {
                    m_R= m_G= m_B= m_A= 0;
                    read4x4 (kerns[srcY & 1], m_src[ySrcScan + srcX], 0, 0);
                    writePixel (sum, dstX, dstY);
                }
            }
        }

        // Normalize the accumulated channels by alpha, clamp to 0..255 and
        // pack the result into one ARGB int.
        final void writePixel
            (final float kernelSum, final int dstX, final int dstY)
        {
            final int v;
            if (m_A > 0)
            {
                final int r= (int)(m_R / m_A);
                final int g= (int)(m_G / m_A);
                final int b= (int)(m_B / m_A);
                final int a= (int)(m_A / kernelSum);
                v= (((r < 0) ? 0 : ((r > 255) ? 255 : r) << s_offR) & s_maskR)
                 | (((g < 0) ? 0 : ((g > 255) ? 255 : g) << s_offG) & s_maskG)
                 | (((b < 0) ? 0 : ((b > 255) ? 255 : b) << s_offB) & s_maskB)
                 | (((a < 0) ? 0 : ((a > 255) ? 255 : a) << s_offA) & s_maskA);
            }
            else
                v= 0;

            m_dst[dstX + dstY * m_dstYstride]= v;
        }

        // Accumulate alpha-weighted R, G, B and A over a 4x4 neighbourhood;
        // manually unrolled, one block per kernel tap.
        final void read4x4
            (final short[][] array, final int scan, final int kx, final int ky)
        {
            int R = 0, G = 0, B = 0, A = 0;
            final short k0[]= array[ky];
            final short k1[]= array[ky + 1];
            final short k2[]= array[ky + 2];
            final short k3[]= array[ky + 3];

            {
                final short k = k0[kx + 0];
                final int value = m_src[scan];
                final int alphaMult = ((value & s_maskA) >>> s_offA) * k;
                R= alphaMult * ((value & s_maskR) >>> s_offR);
                G= alphaMult * ((value & s_maskG) >>> s_offG);
                B= alphaMult * ((value & s_maskB) >>> s_offB);
                A= alphaMult;
            }
            {
                final short k = k0[kx + 1];
                final int value = m_src[scan + 1];
                final int alphaMult = ((value & s_maskA) >>> s_offA) * k;
                R += alphaMult * ((value & s_maskR) >>> s_offR);
                G += alphaMult * ((value & s_maskG) >>> s_offG);
                B += alphaMult * ((value & s_maskB) >>> s_offB);
                A += alphaMult;
            }
            {
                final short k = k0[kx + 2];
                final int value = m_src[scan + 2];
                final int alphaMult = ((value & s_maskA) >>> s_offA) * k;
                R += alphaMult * ((value & s_maskR) >>> s_offR);
                G += alphaMult * ((value & s_maskG) >>> s_offG);
                B += alphaMult * ((value & s_maskB) >>> s_offB);
                A += alphaMult;
            }
            {
                final short k = k0[kx + 3];
                final int value = m_src[scan + 3];
                final int alphaMult = ((value & s_maskA) >>> s_offA) * k;
                R += alphaMult * ((value & s_maskR) >>> s_offR);
                G += alphaMult * ((value & s_maskG) >>> s_offG);
                B += alphaMult * ((value & s_maskB) >>> s_offB);
                A += alphaMult;
            }
            {
                final short k = k1[kx + 0];
                final int value = m_src[scan + m_srcYstride];
                final int alphaMult = ((value & s_maskA) >>> s_offA) * k;
                R += alphaMult * ((value & s_maskR) >>> s_offR);
                G += alphaMult * ((value & s_maskG) >>> s_offG);
                B += alphaMult * ((value & s_maskB) >>> s_offB);
                A += alphaMult;
            }
            {
                final short k = k1[kx + 1];
                final int value = m_src[scan + m_srcYstride + 1];
                final int alphaMult = ((value & s_maskA) >>> s_offA) * k;
                R += alphaMult * ((value & s_maskR) >>> s_offR);
                G += alphaMult * ((value & s_maskG) >>> s_offG);
                B += alphaMult * ((value & s_maskB) >>> s_offB);
                A += alphaMult;
            }
            {
                final short k = k1[kx + 2];
                final int value = m_src[scan + m_srcYstride + 2];
                final int alphaMult = ((value & s_maskA) >>> s_offA) * k;
                R += alphaMult * ((value & s_maskR) >>> s_offR);
                G += alphaMult * ((value & s_maskG) >>> s_offG);
                B += alphaMult * ((value & s_maskB) >>> s_offB);
                A += alphaMult;
            }
            {
                final short k = k1[kx + 3];
                final int value = m_src[scan + m_srcYstride + 3];
                final int alphaMult = ((value & s_maskA) >>> s_offA) * k;
                R += alphaMult * ((value & s_maskR) >>> s_offR);
                G += alphaMult * ((value & s_maskG) >>> s_offG);
                B += alphaMult * ((value & s_maskB) >>> s_offB);
                A += alphaMult;
            }
            {
                final short k = k2[kx + 0];
                final int value = m_src[scan + m_srcYstride + m_srcYstride];
                final int alphaMult = ((value & s_maskA) >>> s_offA) * k;
                R += alphaMult * ((value & s_maskR) >>> s_offR);
                G += alphaMult * ((value & s_maskG) >>> s_offG);
                B += alphaMult * ((value & s_maskB) >>> s_offB);
                A += alphaMult;
            }
            {
                final short k = k2[kx + 1];
                final int value = m_src[scan + m_srcYstride + m_srcYstride + 1];
                final int alphaMult = ((value & s_maskA) >>> s_offA) * k;
                R += alphaMult * ((value & s_maskR) >>> s_offR);
                G += alphaMult * ((value & s_maskG) >>> s_offG);
                B += alphaMult * ((value & s_maskB) >>> s_offB);
                A += alphaMult;
            }
            {
                final short k = k2[kx + 2];
                final int value = m_src[scan + m_srcYstride + m_srcYstride + 2];
                final int alphaMult = ((value & s_maskA) >>> s_offA) * k;
                R += alphaMult * ((value & s_maskR) >>> s_offR);
                G += alphaMult * ((value & s_maskG) >>> s_offG);
                B += alphaMult * ((value & s_maskB) >>> s_offB);
                A += alphaMult;
            }
            {
                final short k = k2[kx + 3];
                final int value = m_src[scan + m_srcYstride + m_srcYstride + 3];
                final int alphaMult = ((value & s_maskA) >>> s_offA) * k;
                R += alphaMult * ((value & s_maskR) >>> s_offR);
                G += alphaMult * ((value & s_maskG) >>> s_offG);
                B += alphaMult * ((value & s_maskB) >>> s_offB);
                A += alphaMult;
            }
            {
                final short k = k3[kx + 0];
                final int value = m_src[scan + 3 * m_srcYstride];
                final int alphaMult = ((value & s_maskA) >>> s_offA) * k;
                R += alphaMult * ((value & s_maskR) >>> s_offR);
                G += alphaMult * ((value & s_maskG) >>> s_offG);
                B += alphaMult * ((value & s_maskB) >>> s_offB);
                A += alphaMult;
            }
            {
                final short k = k3[kx + 1];
                final int value = m_src[scan + 3 * m_srcYstride + 1];
                final int alphaMult = ((value & s_maskA) >>> s_offA) * k;
                R += alphaMult * ((value & s_maskR) >>> s_offR);
                G += alphaMult * ((value & s_maskG) >>> s_offG);
                B += alphaMult * ((value & s_maskB) >>> s_offB);
                A += alphaMult;
            }
            {
                final short k = k3[kx + 2];
                final int value = m_src[scan + 3 * m_srcYstride + 2];
                final int alphaMult = ((value & s_maskA) >>> s_offA) * k;
                R += alphaMult * ((value & s_maskR) >>> s_offR);
                G += alphaMult * ((value & s_maskG) >>> s_offG);
                B += alphaMult * ((value & s_maskB) >>> s_offB);
                A += alphaMult;
            }
            {
                final short k = k3[kx + 3];
                final int value = m_src[scan + 3 * m_srcYstride + 3];
                final int alphaMult = ((value & s_maskA) >>> s_offA) * k;
                R += alphaMult * ((value & s_maskR) >>> s_offR);
                G += alphaMult * ((value & s_maskG) >>> s_offG);
                B += alphaMult * ((value & s_maskB) >>> s_offB);
                A += alphaMult;
            }

            m_R+= R;
            m_G+= G;
            m_B+= B;
            m_A+= A;
        }
    }

    --
    I will not see your reply if you use Google.
    Kevin McMurtrie, Aug 15, 2009
    #9
  10. Tom Anderson (Guest)

    On Fri, 14 Aug 2009, Kevin McMurtrie wrote:

    > I pulled out a few bits of code and patched it together so a test case
    > does the same kind of math as the real deal. (Don't be a style freak -
    > it's demo fragment squished to fit in a Usenet posting.)
    >
    > Machine:
    > MacOS X 10.5.8
    >
    > Darwin desktop.pixelmemory.us 9.8.0 Darwin Kernel Version 9.8.0: Wed Jul
    > 15 16:55:01 PDT 2009; root:xnu-1228.15.4~1/RELEASE_I386 i386
    >
    > ---------------
    > Java 1.5
    >
    > Version:
    > java version "1.5.0_20"
    > Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_20-b02-308)
    > Java HotSpot(TM) 64-Bit Server VM (build 1.5.0_19-137, mixed mode)
    >
    > Options: -d64 -mx2G
    >
    > Output:
    > Millis: 5979.04
    > Millis: 5984.168
    > Millis: 5987.027
    > Millis: 5979.992
    > Millis: 5953.974
    >
    > ---------------
    > Java 1.6
    >
    > Version:
    > java version "1.6.0_15"
    > Java(TM) SE Runtime Environment (build 1.6.0_15-b02-215)
    > Java HotSpot(TM) 64-Bit Server VM (build 14.1-b02-87, mixed mode)
    >
    > Options: -d64 -mx2G
    >
    > Output:
    > Millis: 6943.407
    > Millis: 6937.324
    > Millis: 6917.524
    > Millis: 6931.662
    > Millis: 6917.065


    Zoinks. I'd suggest filing a bug report with Sun - that is a substantial
    performance regression.

    tom

    --
    Argumentative and pedantic, oh, yes. Although it's properly called
    "correct" -- Huge
    Tom Anderson, Aug 15, 2009
    #10
  11. On 15.08.2009 13:18, Tom Anderson wrote:
    > On Fri, 14 Aug 2009, Kevin McMurtrie wrote:
    >
    >> I pulled out a few bits of code and patched it together so a test case
    >> does the same kind of math as the real deal. (Don't be a style freak -
    >> it's demo fragment squished to fit in a Usenet posting.)
    >>
    >> Machine:
    >> MacOS X 10.5.8
    >>
    >> Darwin desktop.pixelmemory.us 9.8.0 Darwin Kernel Version 9.8.0: Wed Jul
    >> 15 16:55:01 PDT 2009; root:xnu-1228.15.4~1/RELEASE_I386 i386
    >>
    >> ---------------
    >> Java 1.5
    >>
    >> Version:
    >> java version "1.5.0_20"
    >> Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_20-b02-308)
    >> Java HotSpot(TM) 64-Bit Server VM (build 1.5.0_19-137, mixed mode)
    >>
    >> Options: -d64 -mx2G
    >>
    >> Output:
    >> Millis: 5979.04
    >> Millis: 5984.168
    >> Millis: 5987.027
    >> Millis: 5979.992
    >> Millis: 5953.974
    >>
    >> ---------------
    >> Java 1.6
    >>
    >> Version:
    >> java version "1.6.0_15"
    >> Java(TM) SE Runtime Environment (build 1.6.0_15-b02-215)
    >> Java HotSpot(TM) 64-Bit Server VM (build 14.1-b02-87, mixed mode)
    >>
    >> Options: -d64 -mx2G
    >>
    >> Output:
    >> Millis: 6943.407
    >> Millis: 6937.324
    >> Millis: 6917.524
    >> Millis: 6931.662
    >> Millis: 6917.065

    >
    > Zoinks. I'd suggest filing a bug report with Sun - that is a substantial
    > performance regression.


    Maybe Sun optimized the JVM in other areas (e.g. IO bandwidth and
    throughput) which are more important for the average server application
    today. Maybe the optimization kicks in later. I am not convinced that
    what we have seen constitutes a bug.

    Kind regards

    robert

    --
    remember.guy do |as, often| as.you_can - without end
    http://blog.rubybestpractices.com/
    Robert Klemme, Aug 15, 2009
    #11
  12. Tom Anderson (Guest)

    On Sat, 15 Aug 2009, Robert Klemme wrote:

    > On 15.08.2009 13:18, Tom Anderson wrote:
    >> On Fri, 14 Aug 2009, Kevin McMurtrie wrote:
    >>
    >>> I pulled out a few bits of code and patched it together so a test case
    >>> does the same kind of math as the real deal. (Don't be a style freak -
    >>> it's demo fragment squished to fit in a Usenet posting.)
    >>>
    >>> Machine:
    >>> MacOS X 10.5.8
    >>>
    >>> Darwin desktop.pixelmemory.us 9.8.0 Darwin Kernel Version 9.8.0: Wed Jul
    >>> 15 16:55:01 PDT 2009; root:xnu-1228.15.4~1/RELEASE_I386 i386
    >>>
    >>> ---------------
    >>> Java 1.5
    >>>
    >>> Version:
    >>> java version "1.5.0_20"
    >>> Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_20-b02-308)
    >>> Java HotSpot(TM) 64-Bit Server VM (build 1.5.0_19-137, mixed mode)
    >>>
    >>> Options: -d64 -mx2G
    >>>
    >>> Output:
    >>> Millis: 5979.04
    >>> Millis: 5984.168
    >>> Millis: 5987.027
    >>> Millis: 5979.992
    >>> Millis: 5953.974
    >>>
    >>> ---------------
    >>> Java 1.6
    >>>
    >>> Version:
    >>> java version "1.6.0_15"
    >>> Java(TM) SE Runtime Environment (build 1.6.0_15-b02-215)
    >>> Java HotSpot(TM) 64-Bit Server VM (build 14.1-b02-87, mixed mode)
    >>>
    >>> Options: -d64 -mx2G
    >>>
    >>> Output:
    >>> Millis: 6943.407
    >>> Millis: 6937.324
    >>> Millis: 6917.524
    >>> Millis: 6931.662
    >>> Millis: 6917.065

    >>
    >> Zoinks. I'd suggest filing a bug report with Sun - that is a substantial
    >> performance regression.

    >
    > Maybe Sun optimized the JVM in other areas (e.g. IO bandwidth and throughput)
    > which are more important for the average server application today.


    Doubtless. But they've still slowed down integer array maths of the kind
    you're doing.

    > Maybe the optimization kicks in later.


    Perhaps - you could try that, right? Just change the top loop to a
    while(true), fire the test off and leave it running overnight.
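
    (A minimal sketch of that change, applied to the Benchmark class posted
    earlier in the thread - the outer timing loop simply loses its upper
    bound so the per-pass times can be watched for as long as the machine
    is left running:

        // In Benchmark.main(), replace the bounded timing loop with an
        // unbounded one; everything else stays the same.
        for (int restest= 0; ; ++restest)
        {
            final long start= System.nanoTime();
            for (int i= 0; i < 1000; ++i)
                b.rasterize();
            final long end= System.nanoTime();
            System.out.println("Pass " + restest + " millis: "
                + (end - start) / 1000000d);
        }

    If late-kicking optimization were the explanation, the per-pass times
    should eventually drop on 1.6.)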

    > I am not convinced that what we have seen constitutes a bug.


    It's not a bug, no, but it *is* a performance regression. How about
    telling Sun and letting them decide if it's a problem?

    That said, they must know about it - I doubt they make a release without
    doing fairly thorough benchmarking.

    Besides, if you report it, they may suggest a fix to make it faster under
    1.6 - a VM flag, or code patterns to avoid or something.

    tom

    --
    Argumentative and pedantic, oh, yes. Although it's properly called
    "correct" -- Huge
    Tom Anderson, Aug 15, 2009
    #12
  13. In article <>,
    Tom Anderson <> wrote:

    > On Fri, 14 Aug 2009, Kevin McMurtrie wrote:
    >
    > > I pulled out a few bits of code and patched it together so a test case
    > > does the same kind of math as the real deal. (Don't be a style freak -
    > > it's demo fragment squished to fit in a Usenet posting.)
    > >
    > > Machine:
    > > MacOS X 10.5.8
    > >
    > > Darwin desktop.pixelmemory.us 9.8.0 Darwin Kernel Version 9.8.0: Wed Jul
    > > 15 16:55:01 PDT 2009; root:xnu-1228.15.4~1/RELEASE_I386 i386
    > >
    > > ---------------
    > > Java 1.5
    > >
    > > Version:
    > > java version "1.5.0_20"
    > > Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_20-b02-308)
    > > Java HotSpot(TM) 64-Bit Server VM (build 1.5.0_19-137, mixed mode)
    > >
    > > Options: -d64 -mx2G
    > >
    > > Output:
    > > Millis: 5979.04
    > > Millis: 5984.168
    > > Millis: 5987.027
    > > Millis: 5979.992
    > > Millis: 5953.974
    > >
    > > ---------------
    > > Java 1.6
    > >
    > > Version:
    > > java version "1.6.0_15"
    > > Java(TM) SE Runtime Environment (build 1.6.0_15-b02-215)
    > > Java HotSpot(TM) 64-Bit Server VM (build 14.1-b02-87, mixed mode)
    > >
    > > Options: -d64 -mx2G
    > >
    > > Output:
    > > Millis: 6943.407
    > > Millis: 6937.324
    > > Millis: 6917.524
    > > Millis: 6931.662
    > > Millis: 6917.065

    >
    > Zoinks. I'd suggest filing a bug report with Sun - that is a substantial
    > performance regression.


    Tom: Sun won't care, but Apple would. Maybe. Unofficially. :)

    Kevin: How'd you get 1.5.0_20 & 1.6.0_15? I thought I was patched up!

    I get a little closer race:

    $ make clean run
    rm -f *.class
    javac Benchmark.java
    java -version
    java version "1.5.0_19"
    Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_19-b02-304)
    Java HotSpot(TM) Client VM (build 1.5.0_19-137, mixed mode, sharing)
    java -d64 -mx2G Benchmark
    Millis: 5970.23
    Millis: 5962.926
    Millis: 5961.268
    Millis: 5964.333
    Millis: 5964.136

    $ make clean run
    rm -f *.class
    javac Benchmark.java
    java -version
    java version "1.6.0_13"
    Java(TM) SE Runtime Environment (build 1.6.0_13-b03-211)
    Java HotSpot(TM) 64-Bit Server VM (build 11.3-b02-83, mixed mode)
    java -d64 -mx2G Benchmark
    Millis: 6308.654
    Millis: 6299.484
    Millis: 6298.147
    Millis: 6296.435
    Millis: 6300.94

    --
    John B. Matthews
    trashgod at gmail dot com
    <http://sites.google.com/site/drjohnbmatthews>
    John B. Matthews, Aug 15, 2009
    #13
  14. Lew (Guest)

    Kevin McMurtrie wrote:
    >>>> ---------------
    >>>> Java 1.5
    >>>>
    >>>> Version:
    >>>> java version "1.5.0_20"
    >>>> Java(TM) 2 Runtime Environment, Standard Edition (build
    >>>> 1.5.0_20-b02-308)
    >>>> Java HotSpot(TM) 64-Bit Server VM (build 1.5.0_19-137, mixed mode)
    >>>>
    >>>> Options: -d64 -mx2G
    >>>>
    >>>> Output:
    >>>> Millis: 5979.04
    >>>> Millis: 5984.168
    >>>> Millis: 5987.027
    >>>> Millis: 5979.992
    >>>> Millis: 5953.974
    >>>>
    >>>> ---------------
    >>>> Java 1.6
    >>>>
    >>>> Version:
    >>>> java version "1.6.0_15"
    >>>> Java(TM) SE Runtime Environment (build 1.6.0_15-b02-215)
    >>>> Java HotSpot(TM) 64-Bit Server VM (build 14.1-b02-87, mixed mode)
    >>>>
    >>>> Options: -d64 -mx2G
    >>>>
    >>>> Output:
    >>>> Millis: 6943.407
    >>>> Millis: 6937.324
    >>>> Millis: 6917.524
    >>>> Millis: 6931.662
    >>>> Millis: 6917.065


    My results on a 1GB RAM 64-bit Linux installation:

    $ /opt/java/jdk1.5.0_20/bin/java -version
    java version "1.5.0_20"
    Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_20-b02)
    Java HotSpot(TM) 64-Bit Server VM (build 1.5.0_20-b02, mixed mode)

    $ /opt/java/jdk1.6.0_16/bin/java -version
    java version "1.6.0_16"
    Java(TM) SE Runtime Environment (build 1.6.0_16-b01)
    Java HotSpot(TM) 64-Bit Server VM (build 14.2-b01, mixed mode)

    $ cd ~/projects/testit/src/
    ~/projects/testit/src

    $ /opt/java/jdk1.5.0_20/bin/javac -d ../build/classes/ testit/Benchmark.java

    $ /opt/java/jdk1.5.0_20/bin/java -cp ../build/classes/ testit.Benchmark
    Millis: 11994.786
    Millis: 11238.298
    Millis: 11274.685
    Millis: 11760.477
    Millis: 12342.302

    $ /opt/java/jdk1.6.0_16/bin/java -cp ../build/classes/ testit.Benchmark
    Millis: 12608.171015
    Millis: 11576.036953
    Millis: 11428.551717
    Millis: 12272.149676
    Millis: 12045.456421

    $ /opt/java/jdk1.6.0_16/bin/javac -d ../build/classes/ testit/Benchmark.java

    $ /opt/java/jdk1.6.0_16/bin/java -cp ../build/classes/ testit.Benchmark
    Millis: 14205.535646
    Millis: 11449.014148
    Millis: 11421.515997
    Millis: 12346.804192
    Millis: 11967.196742

    $

    As you can see, there is nowhere near the same difference - the timing
    ranges overlap.

    --
    Lew
    Lew, Aug 15, 2009
    #14
  15. On 15.08.2009 13:55, Tom Anderson wrote:
    > On Sat, 15 Aug 2009, Robert Klemme wrote:
    >
    >> On 15.08.2009 13:18, Tom Anderson wrote:
    >>> On Fri, 14 Aug 2009, Kevin McMurtrie wrote:
    >>>
    >>>> I pulled out a few bits of code and patched it together so a test case
    >>>> does the same kind of math as the real deal. (Don't be a style freak -
    >>>> it's demo fragment squished to fit in a Usenet posting.)
    >>>>
    >>>> Machine:
    >>>> MacOS X 10.5.8
    >>>>
    >>>> Darwin desktop.pixelmemory.us 9.8.0 Darwin Kernel Version 9.8.0: Wed
    >>>> Jul
    >>>> 15 16:55:01 PDT 2009; root:xnu-1228.15.4~1/RELEASE_I386 i386
    >>>>
    >>>> ---------------
    >>>> Java 1.5
    >>>>
    >>>> Version:
    >>>> java version "1.5.0_20"
    >>>> Java(TM) 2 Runtime Environment, Standard Edition (build
    >>>> 1.5.0_20-b02-308)
    >>>> Java HotSpot(TM) 64-Bit Server VM (build 1.5.0_19-137, mixed mode)
    >>>>
    >>>> Options: -d64 -mx2G
    >>>>
    >>>> Output:
    >>>> Millis: 5979.04
    >>>> Millis: 5984.168
    >>>> Millis: 5987.027
    >>>> Millis: 5979.992
    >>>> Millis: 5953.974
    >>>>
    >>>> ---------------
    >>>> Java 1.6
    >>>>
    >>>> Version:
    >>>> java version "1.6.0_15"
    >>>> Java(TM) SE Runtime Environment (build 1.6.0_15-b02-215)
    >>>> Java HotSpot(TM) 64-Bit Server VM (build 14.1-b02-87, mixed mode)
    >>>>
    >>>> Options: -d64 -mx2G
    >>>>
    >>>> Output:
    >>>> Millis: 6943.407
    >>>> Millis: 6937.324
    >>>> Millis: 6917.524
    >>>> Millis: 6931.662
    >>>> Millis: 6917.065
    >>>
    >>> Zoinks. I'd suggest filing a bug report with Sun - that is a
    >>> substantial performance regression.

    >>
    >> Maybe Sun optimized the JVM in other areas (e.g. IO bandwidth and
    >> throughput) which are more important for the average server
    >> application today.

    >
    > Doubtless. But they've still slowed down integer array maths of the kind
    > you're doing.


    No - I'm not the OP.

    >> Maybe the optimization kicks in later.

    >
    > Perhaps - you could try that, right? Just change the top loop to a
    > while(true), fire the test off and leave it running overnight.
    >
    >> I am not convinced that what we have seen constitutes a bug.

    >
    > It's not a bug, no, but it *is* a performance regression. How about
    > telling Sun and letting them decide if it's a problem?


    I am not even convinced yet that there *is* a performance regression
    (see for example Lew's results).

    > That said, they must know about it - i doubt they make a release without
    > doing fairly thorough benchmarking.


    Exactly.

    > Besides, if you report it, they may suggest a fix to make it faster
    > under 1.6 - a VM flag, or code patterns to avoid or something.


    Certainly I won't report it because I don't have a problem.

    Cheers

    robert

    --
    remember.guy do |as, often| as.you_can - without end
    http://blog.rubybestpractices.com/
    Robert Klemme, Aug 16, 2009
    #15
  16. Tom Anderson (Guest)

    On Sun, 16 Aug 2009, Robert Klemme wrote:

    > On 15.08.2009 13:55, Tom Anderson wrote:
    >> On Sat, 15 Aug 2009, Robert Klemme wrote:
    >>
    >>> On 15.08.2009 13:18, Tom Anderson wrote:
    >>>> On Fri, 14 Aug 2009, Kevin McMurtrie wrote:
    >>>>
    >>>>> I pulled out a few bits of code and patched it together so a test case
    >>>>> does the same kind of math as the real deal.
    >>>>
    >>>> Zoinks. I'd suggest filing a bug report with Sun - that is a substantial
    >>>> performance regression.
    >>>
    >>> Maybe Sun optimized the JVM in other areas (e.g. IO bandwidth and
    >>> throughput) which are more important for the average server application
    >>> today.

    >>
    >> Doubtless. But they've still slowed down integer array maths of the kind
    >> you're doing.

    >
    > No - I'm not the OP.


    Whoops! Apologies.

    tom

    --
    Safety not guaranteed. I have only done this once before.
    Tom Anderson, Aug 16, 2009
    #16
  17. Arne Vajhøj (Guest)

    Tom Anderson wrote:
    > On Fri, 14 Aug 2009, Kevin McMurtrie wrote:
    >> I pulled out a few bits of code and patched it together so a test case
    >> does the same kind of math as the real deal. (Don't be a style freak -
    >> it's demo fragment squished to fit in a Usenet posting.)
    >>
    >> Machine:
    >> MacOS X 10.5.8
    >>
    >> Darwin desktop.pixelmemory.us 9.8.0 Darwin Kernel Version 9.8.0: Wed Jul
    >> 15 16:55:01 PDT 2009; root:xnu-1228.15.4~1/RELEASE_I386 i386
    >>
    >> ---------------
    >> Java 1.5
    >>
    >> Version:
    >> java version "1.5.0_20"
    >> Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_20-b02-308)
    >> Java HotSpot(TM) 64-Bit Server VM (build 1.5.0_19-137, mixed mode)
    >>
    >> Options: -d64 -mx2G
    >>
    >> Output:
    >> Millis: 5979.04
    >> Millis: 5984.168
    >> Millis: 5987.027
    >> Millis: 5979.992
    >> Millis: 5953.974
    >>
    >> ---------------
    >> Java 1.6
    >>
    >> Version:
    >> java version "1.6.0_15"
    >> Java(TM) SE Runtime Environment (build 1.6.0_15-b02-215)
    >> Java HotSpot(TM) 64-Bit Server VM (build 14.1-b02-87, mixed mode)
    >>
    >> Options: -d64 -mx2G
    >>
    >> Output:
    >> Millis: 6943.407
    >> Millis: 6937.324
    >> Millis: 6917.524
    >> Millis: 6931.662
    >> Millis: 6917.065

    >
    > Zoinks. I'd suggest filing a bug report with Sun - that is a substantial
    > performance regression.


    There are two good reasons why that will not accomplish anything:
    * the Java on MacOS X is Apple's responsibility, not SUN's (the fact
      that Apple is buying Java technology from SUN as the basis for
      their Java does not mean that SUN has a responsibility for
      Apple's end users)
    * neither SUN nor Apple has claimed that there will not exist code
      where the newer version performs worse than the old version (I don't
      think any compiler vendor has claimed that - it happens frequently
      with C compilers)

    Arne
    Arne Vajhøj, Aug 16, 2009
    #17
  18. Lew wrote:
    > Kevin McMurtrie wrote:
    >>>>> ---------------
    >>>>> Java 1.5
    >>>>>
    >>>>> Version:
    >>>>> java version "1.5.0_20"
    >>>>> Java(TM) 2 Runtime Environment, Standard Edition (build
    >>>>> 1.5.0_20-b02-308)
    >>>>> Java HotSpot(TM) 64-Bit Server VM (build 1.5.0_19-137, mixed mode)
    >>>>>
    >>>>> Options: -d64 -mx2G
    >>>>>
    >>>>> Output:
    >>>>> Millis: 5979.04
    >>>>> Millis: 5984.168
    >>>>> Millis: 5987.027
    >>>>> Millis: 5979.992
    >>>>> Millis: 5953.974
    >>>>>
    >>>>> ---------------
    >>>>> Java 1.6
    >>>>>
    >>>>> Version:
    >>>>> java version "1.6.0_15"
    >>>>> Java(TM) SE Runtime Environment (build 1.6.0_15-b02-215)
    >>>>> Java HotSpot(TM) 64-Bit Server VM (build 14.1-b02-87, mixed mode)
    >>>>>
    >>>>> Options: -d64 -mx2G
    >>>>>
    >>>>> Output:
    >>>>> Millis: 6943.407
    >>>>> Millis: 6937.324
    >>>>> Millis: 6917.524
    >>>>> Millis: 6931.662
    >>>>> Millis: 6917.065

    >
    > My results on a 1GB RAM 64-bit Linux installation:
    >
    > $ /opt/java/jdk1.5.0_20/bin/java -version
    > java version "1.5.0_20"
    > Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_20-b02)
    > Java HotSpot(TM) 64-Bit Server VM (build 1.5.0_20-b02, mixed mode)
    >
    > $ /opt/java/jdk1.6.0_16/bin/java -version
    > java version "1.6.0_16"
    > Java(TM) SE Runtime Environment (build 1.6.0_16-b01)
    > Java HotSpot(TM) 64-Bit Server VM (build 14.2-b01, mixed mode)
    >
    > $ cd ~/projects/testit/src/
    > ~/projects/testit/src
    >
    > $ /opt/java/jdk1.5.0_20/bin/javac -d ../build/classes/
    > testit/Benchmark.java
    >
    > $ /opt/java/jdk1.5.0_20/bin/java -cp ../build/classes/ testit.Benchmark
    > Millis: 11994.786
    > Millis: 11238.298
    > Millis: 11274.685
    > Millis: 11760.477
    > Millis: 12342.302
    >
    > $ /opt/java/jdk1.6.0_16/bin/java -cp ../build/classes/ testit.Benchmark
    > Millis: 12608.171015
    > Millis: 11576.036953
    > Millis: 11428.551717
    > Millis: 12272.149676
    > Millis: 12045.456421
    >
    > $ /opt/java/jdk1.6.0_16/bin/javac -d ../build/classes/
    > testit/Benchmark.java
    >
    > $ /opt/java/jdk1.6.0_16/bin/java -cp ../build/classes/ testit.Benchmark
    > Millis: 14205.535646
    > Millis: 11449.014148
    > Millis: 11421.515997
    > Millis: 12346.804192
    > Millis: 11967.196742
    >
    > $
    >
    > As you can see, not nearly the extent of difference - the timing ranges
    > overlap.


    You are assuming that Apple is reusing the SUN JIT compiler unchanged?

    (MacOS X Java is from Apple; this Linux Java is from SUN.)

    Arne
    Arne Vajhøj, Aug 16, 2009
    #18
  19. Arne Vajhøj (Guest)

    Kevin McMurtrie wrote:
    > I pulled out a few bits of code and patched it together so a test case
    > does the same kind of math as the real deal. (Don't be a style freak -
    > it's demo fragment squished to fit in a Usenet posting.)
    >
    > Machine:
    > MacOS X 10.5.8
    >
    > Darwin desktop.pixelmemory.us 9.8.0 Darwin Kernel Version 9.8.0: Wed Jul
    > 15 16:55:01 PDT 2009; root:xnu-1228.15.4~1/RELEASE_I386 i386
    >
    > ---------------
    > Java 1.5
    >
    > Version:
    > java version "1.5.0_20"
    > Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_20-b02-308)
    > Java HotSpot(TM) 64-Bit Server VM (build 1.5.0_19-137, mixed mode)
    >
    > Options: -d64 -mx2G
    >
    > Output:
    > Millis: 5979.04
    > Millis: 5984.168
    > Millis: 5987.027
    > Millis: 5979.992
    > Millis: 5953.974
    >
    > ---------------
    > Java 1.6
    >
    > Version:
    > java version "1.6.0_15"
    > Java(TM) SE Runtime Environment (build 1.6.0_15-b02-215)
    > Java HotSpot(TM) 64-Bit Server VM (build 14.1-b02-87, mixed mode)
    >
    > Options: -d64 -mx2G
    >
    > Output:
    > Millis: 6943.407
    > Millis: 6937.324
    > Millis: 6917.524
    > Millis: 6931.662
    > Millis: 6917.065


    I did some experimentation on Windows.

    It seems as if the order from fastest to slowest, with no special
    parameters except -server, is:
    IBM 1.5
    SUN 1.7 beta
    SUN 1.5
    IBM 1.6
    SUN 1.6
    BEA 1.5
    BEA 1.6

    My assumption is still that different code (or different
    options) may result in a completely different result.

    Arne
    Arne Vajhøj, Aug 16, 2009
    #19
  20. Lew (Guest)

    Arne Vajhøj wrote:
    > You are assuming that the Apple is reusing the SUN JIT compiler
    > unchanged ?


    No.

    --
    Lew
    Lew, Aug 17, 2009
    #20
