Hi, how can I optimize the following code?

Discussion in 'C++' started by kenneth, Dec 11, 2006.

  1. kenneth

    kenneth Guest

    I was trying to use multiple threads to optimize the following code, but
    met some problems. Can anyone help me?

    k[3][500] are initialized.

    int computePot() {
        int i, j;

        for( i=0; i<500; i++ ) {
            for( j=0; j<i-1; j++ ) {
                distx = pow( (r[0][j] - r[0][i]), 2 );
                disty = pow( (r[1][j] - r[1][i]), 2 );
                distz = pow( (r[2][j] - r[2][i]), 2 );
                dist = sqrt( distx + disty + distz );
                pot += 1.0 / dist;
            }
        }

    I doubt whether this code really can be optimized.
     
    kenneth, Dec 11, 2006
    #1

  2. kenneth

    Noah Roberts Guest

    kenneth wrote:
    > I was trying to use multiple threads to optimize the following code, but
    > met some problems. Can anyone help me?
    >
    > k[3][500] are initialized.
    >
    > int computePot() {
    > int i, j;
    >
    > for( i=0; i<500; i++ ) {
    > for( j=0; j<i-1; j++ ) {
    > distx = pow( (r[0][j] - r[0][i]), 2 );
    > disty = pow( (r[1][j] - r[1][i]), 2 );
    > distz = pow( (r[2][j] - r[2][i]), 2 );
    > dist = sqrt( distx + disty + distz );
    > pot += 1.0 / dist;
    > }
    > }
    >
    > I doubt whether this code really can be optimized.


    You can quite often optimize an O(N^2) operation by coming up with an
    algorithm that is not O(N^2). The compiler will hit your loops and
    optimize them quite a bit, but that doesn't resolve the problem of
    having an algorithm that is quadratic in N. I don't know what you are
    trying to do. You might describe the problem in comp.programming and
    see if they have some ideas.

    In other words, look to the algorithms you are using not ways to
    micro-optimize.
     
    Noah Roberts, Dec 11, 2006
    #2

  3. kenneth wrote:
    > int computePot() {
    > int i, j;
    >
    > for( i=0; i<500; i++ ) {
    > for( j=0; j<i-1; j++ ) {
    > distx = pow( (r[0][j] - r[0][i]), 2 );
    > disty = pow( (r[1][j] - r[1][i]), 2 );
    > distz = pow( (r[2][j] - r[2][i]), 2 );
    > dist = sqrt( distx + disty + distz );
    > pot += 1.0 / dist;
    > }
    > }
    >
    > I doubt whether this code really can be optimized.
    >


    I can't say how to optimize for multithreading, but
    don't use the pow function to compute squares.
    pow takes the exponent as a floating-point number and may
    be implemented via exponential and logarithm functions.

    Use something like
    ----
    inline double SquareDouble(double r)
    {
    return r * r;
    }
    ----
    or
    ----
    #define square(x) ((x)*(x))
    ----

    If you use GNU C consider using the const attribute:
    ----
    __attribute__((const)) inline double SquareDouble(double r)
    {
    return r * r;
    }
    ----

    --
    Tommi Höynälänmaa
    e-mail:
    homepage: http://www.iki.fi/tohoyn/
     
    Tommi Höynälänmaa, Dec 11, 2006
    #3
  4. kenneth wrote:
    > I was trying to use multiple threads to optimize the following code, but
    > met some problems. Can anyone help me?


    I'll try with a single thread. Multiple threads would require
    platform-specific solutions, so they are out of the scope of this NG.

    > k[3][500] are initialized.


    What's k?

    > int computePot() {
    > int i, j;
    >
    > for( i=0; i<500; i++ ) {
    > for( j=0; j<i-1; j++ ) {
    > distx = pow( (r[0][j] - r[0][i]), 2 );
    > disty = pow( (r[1][j] - r[1][i]), 2 );
    > distz = pow( (r[2][j] - r[2][i]), 2 );
    > dist = sqrt( distx + disty + distz );
    > pot += 1.0 / dist;
    > }
    > }
    >
    > I doubt whether this code really can be optimized.


    Sure it can. First of all forget that 'pow' thing.
    To get the square, just multiply the value by itself.
    This will definitely make it much faster.

    Another thing is that you don't need to index out
    the second vector inside the inner loop, you can as
    well do it outside, like in:

    for(i) {
    register double r0i=r[0][i]; // now evaluated only 500 times
    register double r1i=r[1][i];
    register double r2i=r[2][i];
    for(j) {
    distx = (r[0][j]-r0i)*(r[0][j]-r0i);
    // ...
    }
    }

    That should give your code a boost.

    HTH,
    - J.
     
    Jacek Dziedzic, Dec 11, 2006
    #4
  5. kenneth

    Chris Theis Guest

    "kenneth" <> wrote in message
    news:...
    > I was trying to use multiple threads to optimize the following code, but
    > met some problems. Can anyone help me?
    >
    > k[3][500] are initialized.
    >
    > int computePot() {
    > int i, j;
    >
    > for( i=0; i<500; i++ ) {
    > for( j=0; j<i-1; j++ ) {
    > distx = pow( (r[0][j] - r[0][i]), 2 );
    > disty = pow( (r[1][j] - r[1][i]), 2 );
    > distz = pow( (r[2][j] - r[2][i]), 2 );
    > dist = sqrt( distx + disty + distz );
    > pot += 1.0 / dist;
    > }
    > }
    >
    > I doubt whether this code really can be optimized.


    Hi,

    in principle I would let the compiler worry about that and not try to
    micro-optimize. However, you can move the assignments outside to register
    variables like Jacek already showed and get rid of the pow(). Additionally,
    it's cheap to check if some "manual" loop-unrolling via meta-templates might
    get you some speed.

    Cheers
    Chris
     
    Chris Theis, Dec 11, 2006
    #5
  6. Jacek Dziedzic wrote:

    > Another thing is that you don't need to index out
    > the second vector inside the inner loop, you can as
    > well do it outside, like in:
    >
    > for(i) {
    > register double r0i=r[0][i]; // now evaluated only 500 times
    > register double r1i=r[1][i];
    > register double r2i=r[2][i];
    > for(j) {
    > distx = (r[0][j]-r0i)*(r[0][j]-r0i);
    > // ...
    > }
    > }
    >
    > That should give your code a boost.


    Doubtful. Any compiler worth its salt would have hoisted the evaluations out
    of the loop by itself. Just as any compiler worth its salt will convert
    'distx = (r[0][j]-r0i)*(r[0][j]-r0i);' into: 'distx = (r[0][j]-r0i); distx *=
    distx;'

    But the fact is that code like what the OP showed /rarely/ benefits from
    micro-optimizations; that's not to say that trying to make such optimizations
    isn't necessary at times -- only that it is a much better idea to focus on
    the algorithm and try to make it simpler.

    -n
     
    Nikolaos D. Bougalis, Dec 11, 2006
    #6
  7. kenneth

    Ian Collins Guest

    kenneth wrote:
    > I was trying to use multiple threads to optimize the following code, but
    > met some problems. Can anyone help me?
    >
    > k[3][500] are initialized.
    >
    > int computePot() {
    > int i, j;
    >
    > for( i=0; i<500; i++ ) {
    > for( j=0; j<i-1; j++ ) {
    > distx = pow( (r[0][j] - r[0][i]), 2 );
    > disty = pow( (r[1][j] - r[1][i]), 2 );
    > distz = pow( (r[2][j] - r[2][i]), 2 );
    > dist = sqrt( distx + disty + distz );
    > pot += 1.0 / dist;
    > }
    > }
    >
    > I doubt whether this code really can be optimized.
    >

    If you have access to an MP system and your compiler supports OpenMP,
    you might be able to get some gains.

    --
    Ian Collins.
     
    Ian Collins, Dec 12, 2006
    #7
  8. Nikolaos D. Bougalis wrote:
    > Jacek Dziedzic wrote:
    >
    >> Another thing is that you don't need to index out
    >> the second vector inside the inner loop, you can as
    >> well do it outside, like in:
    >>
    >> for(i) {
    >> register double r0i=r[0][i]; // now evaluated only 500 times
    >> register double r1i=r[1][i];
    >> register double r2i=r[2][i];
    >> for(j) {
    >> distx = (r[0][j]-r0i)*(r[0][j]-r0i);
    >> // ...
    >> }
    >> }
    >>
    >> That should give your code a boost.

    >
    > Doubtful. Any compiler worth its salt would have hoisted the
    > evaluations out of the loop by itself. Just as any compiler worth its
    > salt will convert 'distx = (r[0][j]-r0i)*(r[0][j]-r0i);' into: 'distx =
    > (r[0][j]-r0i); distx *= distx;'


    I disagree. First of all, I doubt any compiler would turn
    pow(double,2) into double*double.

    Secondly, the conversion distx=something*something into
    distx=something, distx*=distx does not necessarily speed up
    anything, that would depend on the computer architecture,
    so I doubt "any compiler worth its salt will convert" it.

    Thirdly, even though a good compiler is supposed to
    move outside of the loop what does not depend on the loop
    iteration variable, my experience shows it is worth trying
    to help it. I have recently spent a few days, if not weeks,
    having to micro-optimize tight loops, working with a compiler
    that is rather good at optimizing, the Intel compiler for
    Itanium 2 (this platform _needs_ good optimization at
    compiler level). I think you would be surprised at how
    dumb the compiler sometimes was, or rather, how typical,
    simple, out-of-the-book optimizations made the code
    faster even at -O3.

    Therefore, if the OP asked for ways to optimize, I gave
    him two hints. I'm 99% sure they won't slow the code down
    and pretty sure they will speed it up. I'm too lazy though
    to try it out, anyway, it depends on the platform.

    > But the fact is that code like what the OP showed /rarely/ benefits
    > from micro-optimizations; that's not to say that trying to make such
    > optimizations isn't necessary at times -- only that it is a much better
    > idea to focus on the algorithm and try to make it simpler.


    Of course. But I assumed the OP ran out of options on this one.
    For example, this looked like a calculation of a harmonic potential.
    Typically for calculations of a potential one assumes a cutoff
    radius, beyond which the interactions are assumed to vanish.
    Here, such an algorithmic optimization would turn this O(N^2) into
    O(N). However, with the harmonic potential it is impossible,
    because it decays too slowly. That led me to the conclusion that
    the OP has already run out of options on algorithmic optimization
    and is left with micro-optimizations.

    howgh :),
    - J.
     
    Jacek Dziedzic, Dec 12, 2006
    #8
  9. Jacek Dziedzic wrote:
    > Nikolaos D. Bougalis wrote:
    >> Jacek Dziedzic wrote:
    >>
    >>> Another thing is that you don't need to index out
    >>> the second vector inside the inner loop, you can as
    >>> well do it outside, like in:
    >>>
    >>> for(i) {
    >>> register double r0i=r[0][i]; // now evaluated only 500 times
    >>> register double r1i=r[1][i];
    >>> register double r2i=r[2][i];
    >>> for(j) {
    >>> distx = (r[0][j]-r0i)*(r[0][j]-r0i);
    >>> // ...
    >>> }
    >>> }
    >>>
    >>> That should give your code a boost.

    >>
    >> Doubtful. Any compiler worth its salt would have hoisted the
    >> evaluations out of the loop by itself. Just as any compiler worth its
    >> salt will convert 'distx = (r[0][j]-r0i)*(r[0][j]-r0i);' into: 'distx
    >> = (r[0][j]-r0i); distx *= distx;'

    >
    > I disagree. First of all, I doubt any compiler would turn
    > pow(double,2) into double*double.


    I said the compiler would hoist the r[0][i], r[1][i] and r[2][i] out of the
    loop. I said nothing about converting the pow(d,2) into d*d, although I don't
    see why a compiler could not do that, if it had intrinsic support for pow()
    and understood the semantics of the pow() call.


    > Secondly, the conversion distx=something*something into
    > distx=something, distx*=distx does not necessarily speed up
    > anything, that would depend on the computer architecture,
    > so I doubt "any compiler worth its salt will convert" it.


    You are misrepresenting what I said. In your code "something" was a complex
    expression that required evaluation. Do you honestly believe that any decent
    compiler would calculate "a-b" twice to do (a-b)*(a-b)? Common subexpression
    elimination is one of the most basic optimizations a compiler can and does
    perform.

    Basic optimizations aside, you would really be surprised what compilers can
    do nowadays. I was quite stunned to see the latest CL.EXE, for example, do some
    crazy tricks in a math-intensive function, shaving off two divisions by using
    multiplications and additions instead and managing to keep both the U and V
    pipes going at full speed, significantly improving throughput. I couldn't even
    come close to matching it, and I'm not a newbie at the optimization game.


    > Thirdly, even though a good compiler is supposed to
    > move outside of the loop what does not depend on the loop
    > iteration variable, my experience shows it is worth trying
    > to help it. I have recently spent a few days, if not weeks,
    > having to micro-optimize tight loops, working with a compiler
    > that is rather good at optimizing, the Intel compiler for
    > Itanium 2 (this platform _needs_ good optimization at
    > compiler level). I think you would be surprised at how
    > dumb the compiler sometimes was, or rather, how typical,
    > simple, out-of-the-book optimizations made the code
    > faster even at -O3.


    I don't disagree that hand-optimization and a helping hand to the compiler
    are necessary even today. What I am saying is that the manual hoist operation
    you showed and most such trivial optimizations are unlikely to have any
    significant effect. The compiler will take care of simple things like that if
    you compile with optimizations enabled and you don't use volatile every other
    line.

    A very good example of optimizations that programmers can make but compilers
    cannot is easily demonstrated with this little function that sums a floating
    point array (extern float *huge_array):

    The naive approach is this:

    float array_sum()
    {
    float sum = 0;

    for(int i = 0; i < huge_array_size; i++)
    sum += huge_array[i];

    return sum;
    }

    The better approach takes advantage of abelian operations,
    something a compiler cannot do because of restrictions placed by
    the ANSI/ISO C and C++ standards:

    float array_sum()
    {
    float sum0 = 0, sum1 = 0, sum2 = 0, sum3 = 0;

    for(int i = 0; i < huge_array_size; i += 4)
    {
    sum0 += huge_array[i];
    sum1 += huge_array[i + 1];
    sum2 += huge_array[i + 2];
    sum3 += huge_array[i + 3];
    }

    return (sum0 + sum2) + (sum1 + sum3);
    }

    -n
     
    Nikolaos D. Bougalis, Dec 12, 2006
    #9
  10. Nikolaos D. Bougalis wrote:
    > Jacek Dziedzic wrote:
    >> Nikolaos D. Bougalis wrote:
    >>> Jacek Dziedzic wrote:
    >>>
    >>>> Another thing is that you don't need to index out
    >>>> the second vector inside the inner loop, you can as
    >>>> well do it outside, like in:
    >>>>
    >>>> for(i) {
    >>>> register double r0i=r[0][i]; // now evaluated only 500 times
    >>>> register double r1i=r[1][i];
    >>>> register double r2i=r[2][i];
    >>>> for(j) {
    >>>> distx = (r[0][j]-r0i)*(r[0][j]-r0i);
    >>>> // ...
    >>>> }
    >>>> }
    >>>>
    >>>> That should give your code a boost.
    >>>
    >>> Doubtful. Any compiler worth its salt would have hoisted the
    >>> evaluations out of the loop by itself. Just as any compiler worth
    >>> its salt will convert 'distx = (r[0][j]-r0i)*(r[0][j]-r0i);' into:
    >>> 'distx = (r[0][j]-r0i); distx *= distx;'

    >>
    >> I disagree. First of all, I doubt any compiler would turn
    >> pow(double,2) into double*double.

    >
    > I said the compiler would hoist the r[0][i], r[1][i] and r[2][i] out
    > of the loop. I said nothing about converting the pow(d,2) into d*d,


    Yes, but you said "doubtful" to my statement "that should give
    your code a boost". What I meant was: for this code to give no
    boost, the compiler would have to turn pow(double,2) into
    double*double. So by being doubtful about this code's potential
    boost, you imply the compiler performs that conversion.

    > although I don't see why a compiler could not do that, if it had
    > intrinsic support for pow() and understood the semantics of the pow()
    > call..


    OK, perhaps a smart compiler could do that. But honestly
    I wouldn't expect it.

    >> Secondly, the conversion distx=something*something into
    >> distx=something, distx*=distx does not necessarily speed up
    >> anything, that would depend on the computer architecture,
    >> so I doubt "any compiler worth its salt will convert" it.

    >
    > You are misrepresenting what I said. In your code "something" was a
    > complex expression that required evaluation. Do you honestly believe
    > that any decent compiler would calculate "a-b" twice to do (a-b)*(a-b)?


    No, of course I don't believe that. This was a misunderstanding.
    I meant that (a-b) would be stored into a register, there squared
    and sent to distx. I somehow thought you implied that a memory
    location would be squared, rather than a register.

    > Common subexpression elimination is one of the most basic optimizations
    > a compiler can and does perform.


    Yes, I don't claim a-b will be evaluated twice.

    > Basic optimizations aside, you would really be surprised what
    > compilers can do nowadays. I was quite stunned to see the latest CL.EXE
    > for example, do some crazy tricks in a math intensive function, shaving
    > off two divisions by using multiplications and additions instead and
    > managing to keep both the U and V pipes going full speed significantly
    > improving throughput. I couldn't even come close to matching it and I'm
    > not a newbie at the optimization game.


    Yes, I also usually trust the compiler to optimize better, especially
    since for the Itanium they are supposed to be good at it. Maybe I was
    lucky then that I managed to squeeze some extra ticks from the code
    I was working on by simple tricks like changing

    for(i) a[i]=something;
    into
    while(i++) *(a_ptr+i)=something;

    or the other way round.

    > I don't disagree that hand-optimization and a helping hand to the
    > compiler are necessary even today. What I am saying, is that the manual
    > hoist operation you showed and most such trivial optimizations are
    > unlikely to have any significant effect. The compiler will take care of
    > simple things like that if you compile with optimizations enabled and
    > you don't use volatile every other line.


    OK, then that was just a suggestion for the OP to try and see for
    himself. I don't think I can make the general statement "taking
    things out of the loop will make things faster", nor can you make
    the general statement "taking things out of the loop will not make
    things faster because the compiler has already done it" -- because
    this all depends on the compiler.

    We'll never know unless the OP tries.

    I could have as well said "make sure you have -O3" and then
    somebody could say "come on, it's on by default" -- just a
    suggestion to try out.

    >
    > A very good example of optimizations that programmers can make but
    > compilers cannot is easily demonstrated with this little function that
    > sums a floating point array (extern float *huge_array):
    >
    > The naive approach is this:
    >
    > float array_sum()
    > {
    > float sum = 0;
    >
    > for(int i = 0; i < huge_array_size; i++)
    > sum += huge_array[i];
    >
    > return sum;
    > }
    >
    > The better approach takes advantage of abelian operations,
    > something a compiler cannot do because of restrictions placed by
    > the ANSI/ISO C and C++ standards:
    >
    > float array_sum()
    > {
    > float sum0 = 0, sum1 = 0, sum2 = 0, sum3 = 0;
    >
    > for(int i = 0; i < huge_array_size; i += 4)
    > {
    > sum0 += huge_array[i];
    > sum1 += huge_array[i + 1];
    > sum2 += huge_array[i + 2];
    > sum3 += huge_array[i + 3];
    > }
    >
    > return (sum0 + sum2) + (sum1 + sum3);
    > }


    OK.

    - J.
     
    Jacek Dziedzic, Dec 12, 2006
    #10
  11. [ After re-reading this thread, I think I might have come off a bit too
    aggressive. It was not my intent, and I hope there are no hard feelings. ]

    A lot of times, simple tricks can help the compiler along and result in much
    better code. While slightly off-topic, I'd be interested in knowing what kind
    of gains you saw in Itanium builds after the micro-optimizations you made. You
    can contact me in private if you don't mind sharing.

    -n
     
    Nikolaos D. Bougalis, Dec 12, 2006
    #11
  12. kenneth wrote:
    > I was trying to use multiple threads to optimize the following code, but
    > met some problems. Can anyone help me?
    >
    > k[3][500] are initialized.
    >
    > int computePot() {
    > int i, j;
    >
    > for( i=0; i<500; i++ ) {
    > for( j=0; j<i-1; j++ ) {
    > distx = pow( (r[0][j] - r[0][i]), 2 );
    > disty = pow( (r[1][j] - r[1][i]), 2 );
    > distz = pow( (r[2][j] - r[2][i]), 2 );
    > dist = sqrt( distx + disty + distz );
    > pot += 1.0 / dist;
    > }
    > }
    >
    > I doubt whether this code really can be optimized.
    >


    I think you're too dum for the explanation.
     
    Uenal S. Mutlu, Dec 17, 2006
    #12
  13. kenneth

    Ian Collins Guest

    Uenal S. Mutlu wrote:
    > kenneth wrote:
    >
    >> I was trying to use multiple threads to optimize the following code, but
    >> met some problems. Can anyone help me?
    >>
    >> k[3][500] are initialized.
    >>
    >> int computePot() {
    >> int i, j;
    >>
    >> for( i=0; i<500; i++ ) {
    >> for( j=0; j<i-1; j++ ) {
    >> distx = pow( (r[0][j] - r[0][i]), 2 );
    >> disty = pow( (r[1][j] - r[1][i]), 2 );
    >> distz = pow( (r[2][j] - r[2][i]), 2 );
    >> dist = sqrt( distx + disty + distz );
    >> pot += 1.0 / dist;
    >> }
    >> }
    >>
    >> I doubt whether this code really can be optimized.
    >>

    > I think you're too dum for the explanation.


    A bit rich from one who can't spell.

    --
    Ian Collins.
     
    Ian Collins, Dec 17, 2006
    #13
  14. kenneth

    frame Guest

    Uenal S. Mutlu wrote:
    >
    > I think you're too dum for the explanation.


    Hey! don't break wind???...
    [NOTE: As long as you post this kind of stuff in this group, I shall
    follow up with this advice!]
     
    frame, Dec 17, 2006
    #14
  15. Ian Collins wrote:
    > A bit rich from one who can't spell.


    I arege taht his cmmonet was ianpprorpaite, but dsylxeia has nthonig to
    do wtih inetlilegnece.
     
    Vidar Hasfjord, Dec 17, 2006
    #15
  16. kenneth

    White Wolf Guest

    frame wrote:
    > Uenal S. Mutlu wrote:
    >>
    >> I think you're too dum for the explanation.

    >
    > Hey! don't break wind???...
    > [NOTE: As long as you post this kind of stuff in this group, I shall
    > follow up with this advice!]


    Don't feed the troll pls. I have already reported his destructive actions
    to his ISP and his NSP.

    --
    WW aka Attila
    :::
    Constant change is here to stay
     
    White Wolf, Dec 19, 2006
    #16
