# float algorithm is slow

Discussion in 'C Programming' started by Wenfei, Jul 5, 2005.

1. ### WenfeiGuest

float percentage;

for (j = 0; j < 10000000; j++)
{
percentage = sinf(frequency * j * 2 * 3.14159 / sampleFreq );
buffer[totalBytes] =ceilf(volume * percentage) + volume;
totalBytes++;
}

Because the float variable, the above loop take 2 seconds in c or c++
on Linux machine. Does anybody has a solution to reduce the time?

Thanks,

Wenfei

Wenfei, Jul 5, 2005

2. ### E. Robert TisdaleGuest

Wenfei wrote:

> float percentage;
>
> for (j = 0; j < 10000000; j++) {
> percentage = sinf(frequency * j * 2 * 3.14159 / sampleFreq );
> buffer[totalBytes] =ceilf(volume * percentage) + volume;
> totalBytes++;
> }
>
> Because the float variable, the above loop take 2 seconds in c or c++
> on Linux machine. Does anybody has a solution to reduce the time?

> cat main.c

#include <stdlib.h>
#include <math.h>

int main(int argc, char* argv[]) {

const
size_t n = 10000000;
float buffer[n];
size_t totalBytes = 0;
const
float_t frequency = 1.0;
const
float_t sampleFreq = 1.0;
const
float_t pi = 3.14159265358979323846;
const
float_t volume = 1.0;

for (size_t j = 0; j < n; ++j) {
float_t percentage = sinf(frequency*j*2*pi/sampleFreq);
buffer[totalBytes] =ceilf(volume*percentage) + volume;
totalBytes++;
}

return 0;
}

> gcc -Wall -std=c99 -pedantic -O2 -o main main.c -lm
> time ./main

3.694u 0.258s 0:03.92 100.5% 0+0k 0+0io 0pf+0w

E. Robert Tisdale, Jul 5, 2005

3. ### Eric SosmanGuest

Wenfei wrote:
> float percentage;
>
> for (j = 0; j < 10000000; j++)
> {
> percentage = sinf(frequency * j * 2 * 3.14159 / sampleFreq );
> buffer[totalBytes] =ceilf(volume * percentage) + volume;
> totalBytes++;
> }
>
> Because the float variable, the above loop take 2 seconds in c or c++
> on Linux machine. Does anybody has a solution to reduce the time?

Yes: Change the iteration count from 10000000 to 0, and
the code will almost certainly run faster.

In other words, micro-benchmarks of this sort are not
very informative. What are you really trying to do?

--
Eric Sosman
lid

Eric Sosman, Jul 5, 2005
4. ### Gordon BurdittGuest

>float percentage;
>
>for (j = 0; j < 10000000; j++)
>{
> percentage = sinf(frequency * j * 2 * 3.14159 / sampleFreq );
> buffer[totalBytes] =ceilf(volume * percentage) + volume;
> totalBytes++;
>}
>
>Because the float variable,

My guess is that if you GOT RID OF the float variable, it would

>for (j = 0; j < 10000000; j++)

{
buffer[totalBytes] =ceilf(volume * sinf(frequency * j * 2 * 3.14159 / sampleFreq )) + volume;
totalBytes++;
}

>the above loop take 2 seconds in c or c++
>on Linux machine. Does anybody has a solution to reduce the time?

You haven't demonstrated why taking two seconds is a problem yet.
Cut down the number of iterations? Get a faster machine?
Doing the calculation in double might make it faster (although on
Intel *86 it probably won't).

Gordon L. Burditt

Gordon Burditt, Jul 5, 2005
5. ### Anonymous 7843Guest

In article <>,
Wenfei <> wrote:
>
>
> float percentage;
>
> for (j = 0; j < 10000000; j++)
> {
> percentage = sinf(frequency * j * 2 * 3.14159 / sampleFreq );
> buffer[totalBytes] =ceilf(volume * percentage) + volume;
> totalBytes++;
> }
>
> Because the float variable, the above loop take 2 seconds in c or c++
> on Linux machine. Does anybody has a solution to reduce the time?

* Use a small table of sine values instead of sinf() function.
It appears that you're downsampling to the width of a char
anyway, so table of somewhat more than 256 sine values probably
won't make things much worse.

* Or just map out one complete waveform and just repeatedly copy that
throughout the rest of the buffer.

* Or Just make one copy of the waveform and index into that as
appropriate when you need the results.
--
7842++

Anonymous 7843, Jul 5, 2005
6. ### Robert GambleGuest

Wenfei wrote:
> float percentage;
>
> for (j = 0; j < 10000000; j++)
> {
> percentage = sinf(frequency * j * 2 * 3.14159 / sampleFreq );
> buffer[totalBytes] =ceilf(volume * percentage) + volume;
> totalBytes++;
> }
>
> Because the float variable, the above loop take 2 seconds in c or c++
> on Linux machine. Does anybody has a solution to reduce the time?

<OT>
It is unlikely that using double will be any faster.

On my system, 66% of the time is spent in the first statement inside
the loop, 27% on the second. The sinf and ceilf function calls are by
far the most intensive parts of these statements. If your system is
similiar, you might try replacing the sinf call with a lookup table of
precalculated values with a tradeoff of accuracy and memory usage. You
could also use a macro of inline function in place of ceilf.

You didn't specify how much faster you need it, what tradeoffs are
feasible, what you have tried already, or even what exactly you are
trying to accomplish. Perhaps the algorithm itself could be improved
but it is difficult to attempt to do so without knowing what the
specifications are.
</OT>

Robert Gamble

Robert Gamble, Jul 5, 2005
7. ### Erik de Castro LopoGuest

Wenfei wrote:
>
> float percentage;
>
> for (j = 0; j < 10000000; j++)
> {
> percentage = sinf(frequency * j * 2 * 3.14159 / sampleFreq );
> buffer[totalBytes] =ceilf(volume * percentage) + volume;
> totalBytes++;
> }
>
> Because the float variable, the above loop take 2 seconds in c or c++
> on Linux machine. Does anybody has a solution to reduce the time?

I assume your linux machine has an x86 based processor. If thats the
case then you problem is probably the ceilf() function. The implementation
of that function requires that the FPU's control word to be modified at
the start of ceilf() and restored afterwards. Every time the FPU control
word is modified, it causes a stall in the FPU pipeline.

You might try replacing ceilf() with the C99 function lrintf().
Unfortunately lrintf() is a round function, not a ceil function so you
might need to replace

ceilf (x)

with

lrintf (x + 0.5).

Also have a look at this paper:

http://www.mega-nerd.com/FPcast/

Erik
--
+-----------------------------------------------------------+
Erik de Castro Lopo (Yes it's valid)
+-----------------------------------------------------------+
"It has been discovered that C++ provides a remarkable facility
for concealing the trival details of a program -- such as where
its bugs are." -- David Keppel

Erik de Castro Lopo, Jul 5, 2005
8. ### Anonymous 7843Guest

In article <>,
Wenfei <> wrote:
>
> buffer[totalBytes] =ceilf(volume * percentage) + volume;

Sorry for posting twice to the same thread, but ceilf() may not be
strictly really necessary. If buffer is an array of some kind of
integer type then the assignment is going to do a conversion from float
to int anyway. Unless you have some mathematically rigorous standard
to uphold, you might get a "close enough" result by rounding up. E.g.

buffer[totalBytes] = (volume * percentage + 0.5) + volume;
--
7842++

Anonymous 7843, Jul 5, 2005
9. ### Christian BauGuest

In article <>,
"Wenfei" <> wrote:

> float percentage;
>
> for (j = 0; j < 10000000; j++)
> {
> percentage = sinf(frequency * j * 2 * 3.14159 / sampleFreq );
> buffer[totalBytes] =ceilf(volume * percentage) + volume;
> totalBytes++;
> }
>
> Because the float variable, the above loop take 2 seconds in c or c++
> on Linux machine. Does anybody has a solution to reduce the time?

anyway.

When j has the maximum value 1e7 - 1, what is the difference between j *
2 * 3.14159 and j * 2 * pi?

Once you've fixed this, just a hint (mathematics is fun): Consecutive
terms of any function of the form

f (n) = a * sin (x * n + b) + c * cos (x * n + d)

can be calculated using one single multiplication and one single

Christian Bau, Jul 5, 2005
10. ### Lawrence KirbyGuest

On Tue, 05 Jul 2005 22:57:23 +0100, Christian Bau wrote:

....

> Once you've fixed this, just a hint (mathematics is fun): Consecutive
> terms of any function of the form
>
> f (n) = a * sin (x * n + b) + c * cos (x * n + d)
>
> can be calculated using one single multiplication and one single

But you need to be careful about accumulation of errors.

Lawrence

Lawrence Kirby, Jul 6, 2005
11. ### Lawrence KirbyGuest

On Tue, 05 Jul 2005 21:15:21 +0000, Anonymous 7843 wrote:

> In article <>,
> Wenfei <> wrote:
>>
>>
>> float percentage;
>>
>> for (j = 0; j < 10000000; j++)
>> {
>> percentage = sinf(frequency * j * 2 * 3.14159 / sampleFreq );

You may gain a little by precalculating the loop invariant
frequency * 2 * 3.14159 / sampleFreq outside the loop.

>> buffer[totalBytes] =ceilf(volume * percentage) + volume;
>> totalBytes++;
>> }
>>
>> Because the float variable, the above loop take 2 seconds in c or c++
>> on Linux machine. Does anybody has a solution to reduce the time?

>
> * Use a small table of sine values instead of sinf() function.
> It appears that you're downsampling to the width of a char
> anyway, so table of somewhat more than 256 sine values probably
> won't make things much worse.

On of the problems is that the type of buffer is not specified

> * Or just map out one complete waveform and just repeatedly copy that
> throughout the rest of the buffer.
>
> * Or Just make one copy of the waveform and index into that as
> appropriate when you need the results.

These only work if the cycle length corresponds to an integral number of
positions in buffer. There's nothing to suggest that this is the case.

Lawrence

Lawrence Kirby, Jul 6, 2005
12. ### Anonymous 7843Guest

In article <>,
Lawrence Kirby <> wrote:
>
> On Tue, 05 Jul 2005 21:15:21 +0000, Anonymous 7843 wrote:
>
> > In article <>,
> > Wenfei <> wrote:
> >>
> >> percentage = sinf(frequency * j * 2 * 3.14159 / sampleFreq );

>
> > * Or Just make one copy of the waveform and index into that as
> > appropriate when you need the results.

>
> These only work if the cycle length corresponds to an integral number of
> positions in buffer. There's nothing to suggest that this is the case.

Given the choice to use float, round pi off at the 6th digit and the
(likely) stuffing of results into a char it didn't seem like fidelity
was foremost.

One hopes that this project was just the first exercise in a DSP course
and that our intrepid correspondent Wenfei soon outlearns us.
--
7842++

Anonymous 7843, Jul 6, 2005
13. ### WenfeiGuest

I am converting Notes format to PCM format to play sounds on the ceil
phone. It is Linux os and it is not FPU, so it is slow.

But finally, I used a sinf lookup table, and the sound can be played
instantly.

Thanks for all your guys idea.

Wenfei

Wenfei, Jul 8, 2005
14. ### Guest

Wenfei wrote:
> float percentage;
> for (j = 0; j < 10000000; j++) {
> percentage = sinf(frequency * j * 2 * 3.14159 / sampleFreq );
> buffer[totalBytes] =ceilf(volume * percentage) + volume;
> totalBytes++;
> }

sinf() appears to be a Microsoft VC++ extension that computes sin() but
uses only 32-bit floats. Interesting that the zealots in this
newsgroup didn't call you on that.

Ok, anyhow remembering our trigonometry:

sin(a + b) = sin(a)*cos(b) + sin(b)cos(a)
cos(a + b) = cos(a)*cos(b) - sin(b)sin(a)

Now we can strength reduce the argument to sinf:

arg += (freqency*2*3.14159/sampleFreq);

But that value is a constant:

arg += deltaAngle; /* deltaAngle is precomputed as:
freqency*2*3.14159/sampleFreq */

So we can simplify this to:

s1 = percentage*cos(deltaAngle) + c*sin(deltaAngle);
c = c*cos(deltaAngle) + percentage*sin(deltaAngle);
percentage = s1;

And of course we can replace the cos(deltaAngle) and sin(deltaAngle)
with some precomputed values, and we initialize c to 1, and percentage
to 0.

This removes all the trigonometric functions altogether.

The problem is that it will have "accumulating accuracy" problems.
These problems are not trivial, because you will lose the sin^2+cos^2=1
property, which will start scaling your results (either upward or
downward) globally, which could get bad.

A simple way to mitigate this is to split the loop into groups of, say,
100 or 1000 at a time, and reset the parameters with their true
mathematical value with the raw sin/cos functions:

percentage = sinf (frequency * j * 2 * 3.14159 / sampleFreq);
c = cosf (frequency * j * 2 * 3.14159 / sampleFreq);

> Because the float variable, the above loop take 2 seconds in c or c++
> on Linux machine. Does anybody has a solution to reduce the time?

This takes time because you are calling sinf(), not because its
floating point.

--
Paul Hsieh
http://www.pobox.com/~qed/
http://bstring.sf.net/

, Jul 10, 2005
15. ### CBFalconerGuest

wrote:
> Wenfei wrote:
>
>> float percentage;
>> for (j = 0; j < 10000000; j++) {
>> percentage = sinf(frequency * j * 2 * 3.14159 / sampleFreq );
>> buffer[totalBytes] =ceilf(volume * percentage) + volume;
>> totalBytes++;
>> }

>
> sinf() appears to be a Microsoft VC++ extension that computes
> sin() but uses only 32-bit floats. Interesting that the zealots
> in this newsgroup didn't call you on that.

The usage is proper. The foolish comment is not. From N869:

7.12.4.6 The sin functions

Synopsis

[#1]
#include <math.h>
double sin(double x);
float sinf(float x);
long double sinl(long double x);

Description

[#2] The sin functions compute the sine of x (measured in

Returns

[#3] The sin functions return the sine value.

--
"If you want to post a followup via groups.google.com, don't use
the broken "Reply" link at the bottom of the article. Click on
"show options" at the top of the article, then click on the

CBFalconer, Jul 10, 2005
16. ### NetocratGuest

On Sat, 09 Jul 2005 17:35:27 -0700, websnarf wrote:

> Wenfei wrote:
>> float percentage;
>> for (j = 0; j < 10000000; j++) {
>> percentage = sinf(frequency * j * 2 * 3.14159 / sampleFreq );
>> buffer[totalBytes] =ceilf(volume * percentage) + volume; totalBytes++;
>> }
>> }

> sinf() appears to be a Microsoft VC++ extension that computes sin() but
> uses only 32-bit floats. Interesting that the zealots in this newsgroup
> didn't call you on that.
>
> Ok, anyhow remembering our trigonometry:
>
> sin(a + b) = sin(a)*cos(b) + sin(b)cos(a) cos(a + b) = cos(a)*cos(b) -
> sin(b)sin(a)
>
> Now we can strength reduce the argument to sinf:
>
> arg += (freqency*2*3.14159/sampleFreq);
>
> But that value is a constant:
>
> arg += deltaAngle; /* deltaAngle is precomputed as:
> freqency*2*3.14159/sampleFreq */
>
> So we can simplify this to:
>
> s1 = percentage*cos(deltaAngle) + c*sin(deltaAngle);
> c = c*cos(deltaAngle) + percentage*sin(deltaAngle);

You meant to subtract there, not add, I believe.

> percentage = s1;
>
> And of course we can replace the cos(deltaAngle) and sin(deltaAngle) with
> some precomputed values, and we initialize c to 1, and percentage to 0.
>
> This removes all the trigonometric functions altogether.

Good suggestion. That would seem to be the major contributor to the
calculation time, although you'd have to compare the approaches to be
sure. It's theoretically possible that on a specially optimised machine
sin(j*deltaAngle) could be faster than your approach, but I very much
doubt that this is the case on the OP's cell phone...

> The problem is that it will have "accumulating accuracy" problems. These
> problems are not trivial, because you will lose the sin^2+cos^2=1
> property, which will start scaling your results (either upward or
> downward) globally, which could get bad.
>
> A simple way to mitigate this is to split the loop into groups of, say,
> 100 or 1000 at a time, and reset the parameters with their true
> mathematical value with the raw sin/cos functions:
>
> percentage = sinf (frequency * j * 2 * 3.14159 / sampleFreq); c
> = cosf (frequency * j * 2 * 3.14159 / sampleFreq);

Yes and you could tune the loop grouping number beforehand by looking at
how many cycles it takes before the values grow too inaccurate for

<snip>

Netocrat, Jul 10, 2005
17. ### Christian BauGuest

In article <>,
wrote:

> Wenfei wrote:
> > float percentage;
> > for (j = 0; j < 10000000; j++) {
> > percentage = sinf(frequency * j * 2 * 3.14159 / sampleFreq );
> > buffer[totalBytes] =ceilf(volume * percentage) + volume;
> > totalBytes++;
> > }

>
> sinf() appears to be a Microsoft VC++ extension that computes sin() but
> uses only 32-bit floats. Interesting that the zealots in this
> newsgroup didn't call you on that.

That's no surprise because it is part of C99.

Christian Bau, Jul 10, 2005
18. ### P.J. PlaugerGuest

"Christian Bau" <> wrote in message
news:...

> In article <>,
> wrote:
>
>> Wenfei wrote:
>> > float percentage;
>> > for (j = 0; j < 10000000; j++) {
>> > percentage = sinf(frequency * j * 2 * 3.14159 / sampleFreq );
>> > buffer[totalBytes] =ceilf(volume * percentage) + volume;
>> > totalBytes++;
>> > }

>>
>> sinf() appears to be a Microsoft VC++ extension that computes sin() but
>> uses only 32-bit floats. Interesting that the zealots in this
>> newsgroup didn't call you on that.

>
> That's no surprise because it is part of C99.

It's even defined, but optional, in C89.

P.J. Plauger
Dinkumware, Ltd.
http://www.dinkumware.com

P.J. Plauger, Jul 10, 2005
19. ### WenfeiGuest

I am converting Notes format to PCM format to play sounds on the ceil
phone. It is Linux os and it is not FPU, so it is slow.

But finally, I used a sinf lookup table, and the sound can be played
instantly.

Thanks for all your guys idea.

Wenfei

Wenfei wrote:
> float percentage;
>
> for (j = 0; j < 10000000; j++)
> {
> percentage = sinf(frequency * j * 2 * 3.14159 / sampleFreq );
> buffer[totalBytes] =ceilf(volume * percentage) + volume;
> totalBytes++;
> }
>
> Because the float variable, the above loop take 2 seconds in c or c++
> on Linux machine. Does anybody has a solution to reduce the time?
>
> Thanks,
>
> Wenfei

Wenfei, Jul 11, 2005
20. ### Tydr SchnubbisGuest

E. Robert Tisdale wrote:
> Wenfei wrote:
>
>> float percentage;
>>
>> for (j = 0; j < 10000000; j++) {
>> percentage = sinf(frequency * j * 2 * 3.14159 / sampleFreq );
>> buffer[totalBytes] =ceilf(volume * percentage) + volume;
>> totalBytes++;
>> }
>>
>> Because the float variable, the above loop take 2 seconds in c or c++
>> on Linux machine. Does anybody has a solution to reduce the time?

>
> > cat main.c

> #include <stdlib.h>
> #include <math.h>
>
> int main(int argc, char* argv[]) {
>
> const
> size_t n = 10000000;
> float buffer[n];
> size_t totalBytes = 0;
> const
> float_t frequency = 1.0;
> const
> float_t sampleFreq = 1.0;
> const
> float_t pi = 3.14159265358979323846;
> const
> float_t volume = 1.0;
>
> for (size_t j = 0; j < n; ++j) {
> float_t percentage = sinf(frequency*j*2*pi/sampleFreq);
> buffer[totalBytes] =ceilf(volume*percentage) + volume;
> totalBytes++;
> }
>
> return 0;
> }
>
> > gcc -Wall -std=c99 -pedantic -O2 -o main main.c -lm
> > time ./main

> 3.694u 0.258s 0:03.92 100.5% 0+0k 0+0io 0pf+0w

If you are posting a version of someone's example that looks different,
but really does the exact same thing in the exact same way, it's best if
you state that that's what you're doing. Don't make people read your
code and compare to figure out you didn't change the logic, nor optimize
it at all. But sure, you version is better style.

-Tydr

Tydr Schnubbis, Jul 12, 2005