speeding up piece of code

R

razor

Dear all,

I'm struggling with a C program (since I'm new to C...) therefore, I
count on you to help me a hand speeding up this piece, since it's the
bottleneck of the program.
I should give you some extra information
- the arrays m , ha, ad is 1e3 to 1e4 elements long, dependent on the
input. The code I show you below is part of a for loop, looping over
all elements i.
- none of the cases in the switch case construction is likely to occur
more often than the other ones.
- I used cosalfa and sinalfa variables, because I need these values
later in the program, and calculating cos and sin is expensive I
guess ...
- this program runs for a very large numbers of times, so a small
speedup is worth it...

Here's the code:

yp = sqrt(Ri*Ri-(xmo-xmi)*(xmo-xmi))+ymi;
switch((yi>yp) + 2*(yi<ymi)) {
case 1:{
alfa = 0;
cosalfa = 1;
sinalfa = 0;
c = 0;
x = xi;
y = yi;
G = 0;
r = ((x-xi)*(x-xi)+(y-yi)*(y-yi));
if (r<(ha*ha)){
H = ad;}
else {H = 0;}
break;}
case 2:{
alfa = 0;
cosalfa = 1;
sinalfa = 0;
c = 1;
x = xi;
y = yp;
ti = (vi*cosalfa+vfi*sinalfa);
G = -m*sgn(ti);
H = 0;
break;}
default:{
gamma = atan2((yi-ymi),(xmo-xmi));
x = xmi+Ri*cos(gamma);
y = ymi+Ri*sin(gamma);
alfa = atan2((x-xmo),(y-yi));
cosalfa = cos(alfa);
sinalfa = sin(alfa);
x = x+m;
c = 1;
ti = (vi*cosalfa+vfi*sinalfa);
G = -m*sgn(ti);
H = 0;
break;}
}


Thanks in advance,
Kizzie
 
J

James Kuyper

razor said:
Dear all,

I'm struggling with a C program (since I'm new to C...) therefore, I
count on you to help me a hand speeding up this piece, since it's the
bottleneck of the program.

Did you reach that conclusion by profiling your code, or is it
guesswork? As a general rule, profiling is the only reliable way to
identify bottlenecks, especially if you're new to C.

....
x = xi;
y = yi;
....

Given the above lines setting x and y, the following line:
r = ((x-xi)*(x-xi)+(y-yi)*(y-yi));

can be simplified to

r = 0;
....
cosalfa = 1;
sinalfa = 0;
....

Given the above settings for cosalfa and sinalfa, the following line:
ti = (vi*cosalfa+vfi*sinalfa);

can be simplified to

ti = vi;

A good compiler might have already noticed these two special cases and
performed the simplification for you, so making those changes might not
have any effect on the speed of your program. However, they are still
useful for making it easier to understand.

I saw no other obvious opportunities for speedup, though that's
primarily due to lack of context and explanation.
 
G

Gene

Dear all,

I'm struggling with a C program (since I'm new to C...) therefore, I
count on you to help me a hand speeding up this piece, since it's the
bottleneck of the program.
I should give you some extra information
- the arrays m , ha, ad is 1e3 to 1e4 elements long, dependent on the
input. The code I show you below is part of a for loop, looping over
all elements i.
- none of the cases in the switch case construction is likely to occur
more often than the other ones.
- I used cosalfa and sinalfa variables, because I need these values
later in the program, and calculating cos and sin is expensive I
guess ...
- this program runs for a very large numbers of times, so a small
speedup is worth it...

Here's the code:

yp = sqrt(Ri*Ri-(xmo-xmi)*(xmo-xmi))+ymi;
    switch((yi>yp) + 2*(yi<ymi)) {
        case 1:{
            alfa = 0;
            cosalfa = 1;
            sinalfa = 0;
            c = 0;
            x = xi;
            y = yi;
            G = 0;
            r = ((x-xi)*(x-xi)+(y-yi)*(y-yi));
            if (r<(ha*ha)){
                H = ad;}
            else {H = 0;}
            break;}
        case 2:{
            alfa = 0;
            cosalfa = 1;
            sinalfa = 0;
            c = 1;
            x = xi;
            y = yp;
            ti = (vi*cosalfa+vfi*sinalfa);
            G = -m*sgn(ti);
            H = 0;
            break;}
         default:{
            gamma = atan2((yi-ymi),(xmo-xmi));
            x = xmi+Ri*cos(gamma);
            y = ymi+Ri*sin(gamma);
            alfa = atan2((x-xmo),(y-yi));
            cosalfa = cos(alfa);
            sinalfa = sin(alfa);
            x = x+m;
            c = 1;
            ti = (vi*cosalfa+vfi*sinalfa);
            G = -m*sgn(ti);
            H = 0;
            break;}
    }

Thanks in advance,
Kizzie


It would help to see the complete loop and to know more about your
environment: hardware, compiler, optimization settings. But here are
a few things to look at:

* The trick with switching among cases based on boolean conditions
used in arithmetic will probably hide optimization opportunities from
a good compiler. Try simple "if" conditions.

* If this is really the whole body of the for loop, it looks like only
one of the cases will ever be used for all the loop iterations.
Therefore this should be coded as 3 separate loops, not one.

* If the computation can be separated into multiple threads, a multi-
core processor will provide a good speedup for this code.
 
D

David Resnick

Dear all,

I'm struggling with a C program (since I'm new to C...) therefore, I
count on you to help me a hand speeding up this piece, since it's the
bottleneck of the program.

Just checking, but did you ask your compiler to optimize at its
highest level (e.g. try gcc with -O2 or -O3)? Often that is the
easiest way to get a boost if you aren't already doing it.

Other than that, I'd suggest profiling your whole program (e.g. using
oprofile) and focusing on what it tells you.

-David
 
G

Gene

Dear all,
I'm struggling with a C program (since I'm new to C...) therefore, I
count on you to help me a hand speeding up this piece, since it's the
bottleneck of the program.
I should give you some extra information
- the arrays m , ha, ad is 1e3 to 1e4 elements long, dependent on the
input. The code I show you below is part of a for loop, looping over
all elements i.
- none of the cases in the switch case construction is likely to occur
more often than the other ones.
- I used cosalfa and sinalfa variables, because I need these values
later in the program, and calculating cos and sin is expensive I
guess ...
- this program runs for a very large numbers of times, so a small
speedup is worth it...
Here's the code:
yp = sqrt(Ri*Ri-(xmo-xmi)*(xmo-xmi))+ymi;
    switch((yi>yp) + 2*(yi<ymi)) {
        case 1:{
            alfa = 0;
            cosalfa = 1;
            sinalfa = 0;
            c = 0;
            x = xi;
            y = yi;
            G = 0;
            r = ((x-xi)*(x-xi)+(y-yi)*(y-yi));
            if (r<(ha*ha)){
                H = ad;}
            else {H = 0;}
            break;}
        case 2:{
            alfa = 0;
            cosalfa = 1;
            sinalfa = 0;
            c = 1;
            x = xi;
            y = yp;
            ti = (vi*cosalfa+vfi*sinalfa);
            G = -m*sgn(ti);
            H = 0;
            break;}
         default:{
            gamma = atan2((yi-ymi),(xmo-xmi));
            x = xmi+Ri*cos(gamma);
            y = ymi+Ri*sin(gamma);
            alfa = atan2((x-xmo),(y-yi));
            cosalfa = cos(alfa);
            sinalfa = sin(alfa);
            x = x+m;
            c = 1;
            ti = (vi*cosalfa+vfi*sinalfa);
            G = -m*sgn(ti);
            H = 0;
            break;}
    }

Thanks in advance,
Kizzie

It would help to see the complete loop and to know more about your
environment: hardware, compiler, optimization settings.  But here are
a few things to look at:

* The trick with switching among cases based on boolean conditions
used in arithmetic will probably hide optimization opportunities from
a good compiler.  Try simple "if" conditions.

* If this is really the whole body of the for loop, it looks like only
one of the cases will ever be used for all the loop iterations.
Therefore this should be coded as 3 separate loops, not one.

* If the computation can be separated into multiple threads, a multi-
core processor will provide a good speedup for this code.- Hide quoted text -


* Sorry I should have included the first time:

A = atan(X, Y)
C = cos(X)
S = sin(X)

is probably more expensive than

H = sqrt(X * X + Y * Y);
C = Y / H;
S = X / H;
 
J

James Dow Allen

count on you to help me a hand speeding up this piece, since it's the
bottleneck of the program.

Here's the code:

yp = sqrt(Ri*Ri-(xmo-xmi)*(xmo-xmi))+ymi;
    switch((yi>yp) + ...
[snip]

I think this is equivalent to
vvv = (Ri*Ri-(xmo-xmi)*(xmo-xmi));
switch ((yi-ymi)*(yi-ymi) > vvv ...

Squaring is almost always much faster than sqrt'ing.
I do see you use yp in *one* of the cases; even if you
can't change that you still save the sqrt() in other cases.
            gamma = atan2((yi-ymi),(xmo-xmi));
            x = xmi+Ri*cos(gamma);
            y = ymi+Ri*sin(gamma);

Can't you rewrite this to call one trig function instead of 3?
            alfa = atan2((x-xmo),(y-yi));
            cosalfa = cos(alfa);
            sinalfa = sin(alfa);

Same.

James
 
R

razor

Thank you all for the valuable comments! I'll try to implement the
remarks during the weekend.

cheers,
Kizzie
 
J

jameskuyper

Gene said:
....
* Sorry I should have included the first time:

A = atan(X, Y)

That's should be atan2().
C = cos(X)
S = sin(X)

is probably more expensive than

H = sqrt(X * X + Y * Y);
C = Y / H;
S = X / H;

Note, however, that is __STDC_IEC_559__ is defined, then ISO/IEC 559
(== IEEE 754) requirements apply. In that case, if X and Y are both
zeros, the division by H is problematic, while the behavior of atan2()
is well defined: it will return either +/-0 or +/- pi, depending upon
whether X and Y are +0 or -0.
If either X, or Y, or both is an infinity, the behaviour of atan2(X,
Y) is also well-defined, while the X*X + Y*Y expression can overflow
even for sufficiently large finite values of X and Y.

These are a very special and generically unlikely cases; but if they
can come up, the atan2() approach is better.
 
D

Doug Miller

Did you reach that conclusion by profiling your code, or is it
guesswork? As a general rule, profiling is the only reliable way to
identify bottlenecks, especially if you're new to C.

....
....

Given the above lines setting x and y, the following line:


can be simplified to

r = 0;

And hence the test immediately after that

if (r<(ha*ha)){

can be simplified to

if (ha) {
 
J

jameskuyper

Doug said:
razor wrote: ....
....

Given the above lines setting x and y, the following line:


can be simplified to

r = 0;

And hence the test immediately after that

if (r<(ha*ha)){

can be simplified to

if (ha) {


Not quite: if ha!=0.0 && (fabs(ha) < sqrt(DBL_MIN), the original
if() condition would (probably) be false, but your replacement would
be true.

Whether that's desirable or undesirable depends upon the purpose of
the test.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,744
Messages
2,569,484
Members
44,906
Latest member
SkinfixSkintag

Latest Threads

Top