taking advantage of SSE

Discussion in 'Java' started by pfalstad@gmail.com, Feb 20, 2005.

  1. Guest

    Hi, is there any documentation out there which will tell me how to best
    take advantage of SSE/SSE2? Is there any way to figure out if my java
    applet is using SSE instructions or not?

    I have a loop that looks like this:

    for (j = 0; j < max; j++) {
        for (i = 0; i < max; i++) {
            float previ = func[i-1][j];
            float nexti = func[i+1][j];
            float prevj = func[i][j-1];
            float nextj = func[i][j+1];
            float basis = (nexti+previ+nextj+prevj)*.25f;
            if (exceptional[i][j]) { ... handle rare/messy cases ... }
            float a = (func[i][j] - basis) * damp[i][j];
            float b = funci[i][j] * damp[i][j];
            func [i][j] = basis + a*const1 - b*const2;
            funci[i][j] = b*const1 + a*const2;
        }
    }

    This loop seems like it could take advantage of SSE, but I doubt the
    java compiler is smart enough to figure out how to do it without my
    help. I also have no way of knowing what the JIT is doing internally.
    Does anyone have any ideas on how I can best optimize this loop (aside
    from trial-and-error)?
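    The loop above is a fragment: max, const1, const2 and the arrays are
    declared elsewhere, the exceptional branch is elided, and starting i and
    j at 0 would read index -1. Below is a minimal compilable sketch, assuming
    square max x max arrays all indexed [i][j], border cells that are never
    written, and a hypothetical placeholder that simply skips exceptional
    cells. Note that some neighbour reads see values already overwritten
    earlier in the same sweep.

```java
public class WaveSweep {
    /**
     * One in-place sweep over the interior of square max x max arrays.
     * Assumption: border cells hold fixed boundary values and are never written.
     */
    public static void sweep(float[][] func, float[][] funci, float[][] damp,
                             boolean[][] exceptional, float const1, float const2) {
        int max = func.length;
        for (int j = 1; j < max - 1; j++) {
            for (int i = 1; i < max - 1; i++) {
                float previ = func[i - 1][j]; // may already hold this sweep's new value
                float nexti = func[i + 1][j];
                float prevj = func[i][j - 1]; // may already hold this sweep's new value
                float nextj = func[i][j + 1];
                float basis = (nexti + previ + nextj + prevj) * .25f;
                if (exceptional[i][j]) {
                    continue; // hypothetical placeholder for the elided rare/messy handling
                }
                float a = (func[i][j] - basis) * damp[i][j];
                float b = funci[i][j] * damp[i][j];
                func[i][j] = basis + a * const1 - b * const2;
                funci[i][j] = b * const1 + a * const2;
            }
        }
    }

    public static void main(String[] args) {
        int max = 3; // a single interior cell, (1, 1)
        float[][] func = new float[max][max];
        float[][] funci = new float[max][max];
        float[][] damp = new float[max][max];
        for (float[] row : damp) {
            java.util.Arrays.fill(row, 1f);
        }
        func[1][1] = 1f;
        sweep(func, funci, damp, new boolean[max][max], 0.5f, 0.25f);
        System.out.println(func[1][1] + " " + funci[1][1]); // prints "0.5 0.25"
    }
}
```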
     
    , Feb 20, 2005
    #1

  2. Roland Guest

    On 20-2-2005 19:16, wrote:

    > Hi, is there any documentation out there which will tell me how to best
    > take advantage of SSE/SSE2? Is there any way to figure out if my java
    > applet is using SSE instructions or not?
    >
    > [loop and rest of question snipped]
    >

    Unless the JRE has been compiled to take advantage of the SSE
    instructions, I doubt that your applet will benefit from them. AFAIK, the
    JREs that Sun offers are compiled for the lowest common denominator of
    instructions available on Intel/AMD processors, and therefore don't
    take advantage of SSE, MMX or other extended instruction sets.
    Of course, I could be wrong, and it may well be possible that the JIT
    compiler, or rather the bytecode-to-native-code translator, does generate
    SSE instructions (after it has detected that the processor supports them).

    Have you done some profiling on your code to see if the nested loop
    actually is a serious bottleneck?
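    Short of a full profiler, one rough check is to time the loop in
    isolation. A minimal sketch, assuming the work can be wrapped in a
    Runnable and a current JDK (lambdas need Java 8; System.nanoTime() exists
    since Java 5). The untimed warm-up calls give the JIT a chance to compile
    the code first; for serious measurements a harness such as JMH is more
    reliable.

```java
public class LoopTimer {
    /**
     * Calls work.run() `warmup` times without timing (so the JIT can compile it),
     * then returns the mean wall-clock time in nanoseconds over `runs` timed calls.
     */
    public static double averageNanos(Runnable work, int warmup, int runs) {
        if (runs <= 0) {
            throw new IllegalArgumentException("runs must be positive");
        }
        for (int k = 0; k < warmup; k++) {
            work.run();
        }
        long start = System.nanoTime();
        for (int k = 0; k < runs; k++) {
            work.run();
        }
        return (System.nanoTime() - start) / (double) runs;
    }

    public static void main(String[] args) {
        float[] data = new float[1 << 20];
        // Stand-in workload; replace the body with a call to the real loop.
        double ns = averageNanos(() -> {
            for (int i = 0; i < data.length; i++) {
                data[i] = data[i] * 0.5f + 1f;
            }
        }, 10, 50);
        System.out.printf("%.1f microseconds per pass%n", ns / 1000.0);
    }
}
```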
    --
    Regards,

    Roland de Ruiter
     
    Roland, Feb 20, 2005
    #2

  3. Guest

    , Feb 20, 2005
    #3
  4. Roland Guest

    On 20-2-2005 19:53, wrote:

    > here's where I saw that Java2 supports SSE/SSE2:
    >
    > http://java.sun.com/j2se/1.4.2/1.4.2_whitepaper.html#7
    >
    >
    >>Have you done some profiling on your code to see if the nested loop
    >>actually is a serious bottleneck?

    >
    >
    > no profiling, but I'm positive it's the bottleneck.. :)
    >

    Well, I stand corrected. [A 1.5 to 1.6 performance increase between Java
    1.4.1 and 1.4.2 in that benchmark is nice, but not that dramatic.]
    Then I guess your loop might benefit from the SSE instructions (if
    your applet runs on an SSE-supporting JRE and processor).
    --
    Regards,

    Roland de Ruiter
     
    Roland, Feb 20, 2005
    #4
  5. Patricia Shanahan Guest

    > Hi, is there any documentation out there which will tell me how to best
    > take advantage of SSE/SSE2? Is there any way to figure out if my java
    > applet is using SSE instructions or not?
    >
    > [loop and rest of question snipped]
    >


    I can't answer your specific question. However, this type of
    code is generally difficult to vectorize or parallelize
    because you have dependencies between consecutive iterations
    of the inner loop. The value of func[n-1][m] is generated
    near the end of the i=n-1,j=m iteration, and used in the
    first calculation in the i=n,j=m iteration.
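    The same kind of dependency can be seen in one dimension. In the
    in-place version below, step i reads x[i-1], which step i-1 has just
    overwritten, so the steps must run one after another; the second version
    reads only the unmodified old values, so its steps are independent and
    could in principle be computed several at a time. The averaging rule and
    sample values are illustrative only, not taken from the original loop.

```java
import java.util.Arrays;

public class DependencyDemo {
    /** In-place smoothing: x[i - 1] was already updated when x[i] is computed. */
    public static float[] inPlace(float[] a) {
        float[] x = a.clone();
        for (int i = 1; i < x.length - 1; i++) {
            x[i] = (x[i - 1] + x[i + 1]) * 0.5f; // loop-carried dependency on x[i - 1]
        }
        return x;
    }

    /** Same rule, but all reads come from the old values, so iterations are independent. */
    public static float[] fromOld(float[] a) {
        float[] x = a.clone();
        for (int i = 1; i < x.length - 1; i++) {
            x[i] = (a[i - 1] + a[i + 1]) * 0.5f;
        }
        return x;
    }

    public static void main(String[] args) {
        float[] a = {4f, 0f, 0f, 0f};
        System.out.println(Arrays.toString(inPlace(a))); // [4.0, 2.0, 1.0, 0.0]
        System.out.println(Arrays.toString(fromOld(a))); // [4.0, 2.0, 0.0, 0.0]
    }
}
```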

    Optimizers usually do better on loops that don't contain if
    statements. Is there any way you can avoid the exceptional
    issue? Pretend the rare/messy cases are normal and fix up
    later? It might be worth measuring without the exceptional
    test to see if you get any gain.
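    Treating the rare cases as normal and fixing them up later could look
    roughly like this: the inner loop has no if, and a second pass revisits
    only the cells in a precomputed list. The list exceptionalCells and the
    fixUp rule (here it just zeroes the cell) are hypothetical stand-ins for
    the elided "rare/messy cases". This only works if the fix-up may
    overwrite whatever the normal update produced; in an in-place sweep the
    neighbours of an exceptional cell will already have read its provisional
    value.

```java
public class BranchFreeSweep {
    /** Hypothetical fix-up for an exceptional cell; here it simply pins the cell to zero. */
    static void fixUp(float[][] func, float[][] funci, int i, int j) {
        func[i][j] = 0f;
        funci[i][j] = 0f;
    }

    /**
     * Main sweep without a branch in the inner loop, then a fix-up pass.
     * exceptionalCells holds {i, j} pairs, assumed to be computed once up front.
     */
    public static void sweep(float[][] func, float[][] funci, float[][] damp,
                             int[][] exceptionalCells, float const1, float const2) {
        int max = func.length; // assumption: square arrays, border cells never written
        for (int j = 1; j < max - 1; j++) {
            for (int i = 1; i < max - 1; i++) {
                float basis = (func[i + 1][j] + func[i - 1][j]
                        + func[i][j + 1] + func[i][j - 1]) * .25f;
                float a = (func[i][j] - basis) * damp[i][j];
                float b = funci[i][j] * damp[i][j];
                func[i][j] = basis + a * const1 - b * const2;
                funci[i][j] = b * const1 + a * const2;
            }
        }
        for (int[] cell : exceptionalCells) {
            fixUp(func, funci, cell[0], cell[1]);
        }
    }
}
```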

    If max is large, you might gain by tiling the loops, working
    on rectangular sections small enough to fit in cache.
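    A sketch of the tiled loop structure, applied to a variant that reads
    only the previous step's arrays (oldF, oldFi) and writes new ones: for
    such a sweep the visiting order does not change the result, whereas
    tiling the original in-place loop would change which neighbours have
    already been updated. TILE = 64 is an arbitrary placeholder to be tuned
    by measurement, and this shows the structure only, not a guaranteed
    speed-up.

```java
public class TiledSweep {
    static final int TILE = 64; // placeholder; the best value depends on the cache

    /** Updates interior cell (i, j): reads only the old arrays, writes only the new ones. */
    static void update(float[][] oldF, float[][] oldFi, float[][] newF, float[][] newFi,
                       float[][] damp, int i, int j, float const1, float const2) {
        float basis = (oldF[i + 1][j] + oldF[i - 1][j] + oldF[i][j + 1] + oldF[i][j - 1]) * .25f;
        float a = (oldF[i][j] - basis) * damp[i][j];
        float b = oldFi[i][j] * damp[i][j];
        newF[i][j] = basis + a * const1 - b * const2;
        newFi[i][j] = b * const1 + a * const2;
    }

    /** Visits the interior in TILE x TILE blocks so each block's data can stay in cache. */
    public static void tiledSweep(float[][] oldF, float[][] oldFi, float[][] newF,
                                  float[][] newFi, float[][] damp, float const1, float const2) {
        int max = oldF.length; // assumption: square arrays; border cells are never written
        for (int jj = 1; jj < max - 1; jj += TILE) {
            int jEnd = Math.min(jj + TILE, max - 1);
            for (int ii = 1; ii < max - 1; ii += TILE) {
                int iEnd = Math.min(ii + TILE, max - 1);
                for (int j = jj; j < jEnd; j++) {
                    for (int i = ii; i < iEnd; i++) {
                        update(oldF, oldFi, newF, newFi, damp, i, j, const1, const2);
                    }
                }
            }
        }
    }

    /** Untiled reference version with the same update rule. */
    public static void plainSweep(float[][] oldF, float[][] oldFi, float[][] newF,
                                  float[][] newFi, float[][] damp, float const1, float const2) {
        int max = oldF.length;
        for (int j = 1; j < max - 1; j++) {
            for (int i = 1; i < max - 1; i++) {
                update(oldF, oldFi, newF, newFi, damp, i, j, const1, const2);
            }
        }
    }
}
```

    Because every new cell depends only on old values, the tiled and untiled
    versions produce bit-identical arrays.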

    Patricia
     
    Patricia Shanahan, Feb 20, 2005
    #5
  6. Guest

    > I can't answer your specific question. However, this type of
    > code is generally difficult to vectorize or parallelize
    > because you have dependencies between consecutive iterations
    > of the inner loop. The value of func[n-1][m] is generated
    > near the end of the i=n-1,j=m iteration, and used in the
    > first calculation in the i=n,j=m iteration.


    That's not necessary, though; in fact, I really should be looking at
    the values from the previous iteration. If it would make the code
    faster, I could easily fix that.
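    Reading only the previous iteration's values is the usual
    double-buffering (Jacobi-style) pattern: keep two copies of each array,
    read from one, write to the other, then swap the references. A minimal
    sketch, assuming square arrays with fixed border cells; the class and
    method names are hypothetical. This changes the numerical scheme, not
    just its speed, so results will differ from the in-place loop, and the
    extra arrays add memory traffic.

```java
public class DoubleBufferedSweep {
    private float[][] cur, curI;   // previous step's values: only read during step()
    private float[][] next, nextI; // this step's values: only written during step()
    private final float[][] damp;

    public DoubleBufferedSweep(float[][] func, float[][] funci, float[][] damp) {
        this.cur = func;
        this.curI = funci;
        this.damp = damp;
        // Start the second buffers as copies so the never-written border cells match.
        this.next = new float[func.length][];
        this.nextI = new float[funci.length][];
        for (int i = 0; i < func.length; i++) {
            next[i] = func[i].clone();
            nextI[i] = funci[i].clone();
        }
    }

    public void step(float const1, float const2) {
        int max = cur.length; // assumption: square max x max arrays
        for (int j = 1; j < max - 1; j++) {
            for (int i = 1; i < max - 1; i++) {
                float basis = (cur[i + 1][j] + cur[i - 1][j] + cur[i][j + 1] + cur[i][j - 1]) * .25f;
                float a = (cur[i][j] - basis) * damp[i][j];
                float b = curI[i][j] * damp[i][j];
                next[i][j] = basis + a * const1 - b * const2;
                nextI[i][j] = b * const1 + a * const2;
            }
        }
        // Swap references instead of copying: the new values become the next step's input.
        float[][] t = cur; cur = next; next = t;
        t = curI; curI = nextI; nextI = t;
    }

    public float[][] func()  { return cur; }
    public float[][] funci() { return curI; }
}
```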

    > Optimizers usually do better on loops that don't contain if
    > statements. Is there any way you can avoid the exceptional
    > issue? Pretend the rare/messy cases are normal and fix up
    > later?


    sure..
     
    , Feb 21, 2005
    #6
  7. Patricia Shanahan Guest

    >>I can't answer your specific question. However, this type of
    >>code is generally difficult to vectorize or parallelize
    >>because you have dependencies between consecutive iterations
    >>of the inner loop. The value of func[n-1][m] is generated
    >>near the end of the i=n-1,j=m iteration, and used in the
    >>first calculation in the i=n,j=m iteration.

    >
    >
    > That's not necessary, though; in fact, I really should be looking at
    > the values from the previous iteration. If it would make the code
    > faster, I could easily fix that.


    Unfortunately, there does not seem to be any way to find out
    exactly what is really being executed, so you are probably
    stuck with some trial and error. If breaking the dependency
    enables SSE it should help, but if the JIT does not use SSE
    anyway, you may find it makes things worse by increasing
    cache misses.

    Patricia
     
    Patricia Shanahan, Feb 21, 2005
    #7
  8. Guest

    > If breaking the dependency
    > enables SSE it should help, but if the JIT does not use SSE
    > anyway, you may find it makes things worse by increasing
    > cache misses.


    Well, guess what: I broke the dependency, and that made it worse, probably
    by increasing cache misses. My loop may be too complicated for the
    compiler to figure out how to use SSE anyway; I could do some more
    trial and error at some point.

    Even better, though, I discovered that replacing the two-dimensional
    array by a one-dimensional array resulted in a ridiculous increase in
    speed; the loop runs in about 1/5 the time. Wow!
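    The thread doesn't say how the array was flattened. One possible layout
    stores cell (i, j) at index i + j*max, sketched below; with it the inner
    i loop walks consecutive elements of a single array. By contrast, a Java
    float[][] is an array of separate float[] objects, so with the posted
    loop order (j outer, i inner) each inner step moves to a different row
    object, which may be part of the difference, though that is a guess and
    worth measuring. The layout, names and interior-only bounds are
    assumptions, and the exceptional branch is left out.

```java
public class FlatSweep {
    /** Hypothetical layout: cell (i, j) of a max x max grid lives at index i + j * max. */
    public static int idx(int i, int j, int max) {
        return i + j * max;
    }

    /** The same in-place update as the 2D loop, on flat arrays; border cells are not written. */
    public static void sweep(float[] func, float[] funci, float[] damp, int max,
                             float const1, float const2) {
        for (int j = 1; j < max - 1; j++) {
            int base = j * max; // index of cell (0, j)
            for (int i = 1; i < max - 1; i++) {
                int k = base + i;            // (i, j)
                float previ = func[k - 1];   // (i - 1, j)
                float nexti = func[k + 1];   // (i + 1, j)
                float prevj = func[k - max]; // (i, j - 1)
                float nextj = func[k + max]; // (i, j + 1)
                float basis = (nexti + previ + nextj + prevj) * .25f;
                float a = (func[k] - basis) * damp[k];
                float b = funci[k] * damp[k];
                func[k] = basis + a * const1 - b * const2;
                funci[k] = b * const1 + a * const2;
            }
        }
    }
}
```

    Since the visiting order matches the 2D loop, the results should be
    identical; only the memory layout changes.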
     
    , Feb 28, 2005
    #8

Similar Threads
  1. Zachary Burns

    ASP.NET Config WITHOUT SSE 2005

    Zachary Burns, Nov 9, 2005, in forum: ASP .Net
    Replies:
    0
    Views:
    407
    Zachary Burns
    Nov 9, 2005
  2. Kevin
    Replies:
    10
    Views:
    612
    Roedy Green
    Feb 3, 2006
  3. Kobu
    Replies:
    5
    Views:
    367
    kingzog
    Feb 17, 2005
  4. Toby Considine (UNC)

    Taking Advantage of new Word Blog/Post options

    Toby Considine (UNC), Mar 29, 2007, in forum: ASP .Net Web Controls
    Replies:
    0
    Views:
    127
    Toby Considine (UNC)
    Mar 29, 2007
  5. Jim Cain
    Replies:
    1
    Views:
    209
    Yukihiro Matsumoto
    Jul 18, 2003