problem returning floating point

Peter

I have written this small app to explain an issue I'm having with a larger
program.

In the following code I'm taking 10 ints from the keyboard. In the call to
average() these 10 ints are then added and divided by the number of ints to
return the average, a float.

Can someone see why, for example, if the total is 227 and it is divided by
10, I get a return of 22.000000 instead of the correct answer, 22.7?

Thanks in advance
Pete

#include <stdio.h>


float average(int, int[]); /* Prototype returning float */

int main()
{
    int num[10]; /* array to hold 10 ints */
    float avg;   /* float to hold the average */
    int i, size = 10;

    printf("Enter 10 integers: \n");

    /* Takes ten ints from the keyboard */

    for(i = 0; i < size; i++)
    {
        scanf("%d", &num[i]);
    }

    avg = average(size, num); /* Call to average() passing size and array */
    printf("%f", avg);
    return 0;
}

/* average() totals the 10 ints, calculates the average, returns float ans */

float average(int count, int numbers[])
{
    int total = 0;
    int i;
    float ans;

    for(i = 0; i < count; i++)
    {
        total += numbers[i];

        /*printf("Total = %d \n", total);*/
        ans = total/count;
    }
    return ans;
}
 
viza

Hi

Peter said:
float average(int count, int numbers[])
{
    int total = 0;
    int i;
    float ans;

    for(i = 0; i < count; i++)
    {
        total += numbers[i];
        ans = total/count;
    }
    return ans;
}


One modification will suffice to fix the above problem: change

int total=0;

to:

float total=0;


Not strictly wrong, but bad advice.

As long as the size of the integer used is big enough to hold the sum, it
is both faster and more accurate to do the addition in an integer.

A much better change to make is:

ans = (double)total / (double)count;

Also, it is massively stupid to calculate the division inside each loop,
although a half-decent compiler will realise this and optimise it away.

My version of this function would look like:

float average( int count, int *numbers )
{
    int total = 0;
    int *stop = numbers + count;

    while( numbers < stop )
        total += *numbers++;

    return (double)total / count;
}

I have removed the subscripting in favour of comparing pointers, to
improve speed, in case this is performance critical.

In the case where an int couldn't hold the maximum possible sum of the
ten inputs, I would use a long or long long. In the case where that was
still not enough, I would use two nested loops, the inner one adding up
using integers, and the outer one using doubles. Again, this would give
both increased accuracy and increased speed.
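A sketch of that two-loop idea, on the assumption that the block size is chosen so the inner integer subtotal cannot overflow (the function name and the block size here are illustrative, not from the thread):

```c
#include <stddef.h>

#define BLOCK 1024 /* illustrative; pick so BLOCK * max |input| fits in long */

double sum_blocked(size_t count, const int *numbers)
{
    double total = 0.0;          /* outer accumulator: one add per block */

    while (count > 0) {
        size_t n = count < BLOCK ? count : BLOCK;
        long subtotal = 0;       /* inner accumulator: exact integer adds */
        size_t i;

        for (i = 0; i < n; i++)
            subtotal += numbers[i];

        total += (double)subtotal;
        numbers += n;
        count -= n;
    }
    return total;
}
```

Each floating point addition then absorbs a whole block at once, so rounding error accrues per block rather than per element.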
 
Nick Keighley

I have removed the subscripting in favour of comparing pointers, to
improve speed, in case this is performance critical.

do you have access to a compiler where this makes any difference?
 
Richard Tobin

viza said:
In the case where an int couldn't hold the maximum possible sum of the
ten inputs, I would use a long or long long. In the case where that was
still not enough, I would use two nested loops, the inner one adding up
using integers, and the outer one using doubles. Again, this would give
both increased accuracy and increased speed.

Loss of precision in a case like this arises when a small number is
added to a large one. A reasonable approach therefore is to add up
each half of the numbers and then add these together, using the same
method recursively to add up the halves. Something like this
(untested):

double sum_ints(int count, int *numbers)
{
    if(count == 0)
        return 0.0;
    else if(count == 1)
        return (double)numbers[0];
    else
        return sum_ints(count/2, numbers) +
               sum_ints(count - count/2, numbers + count/2);
}

-- Richard
 
viza

Hi

Richard Tobin said:
Loss of precision in a case like this arises when a small number is
added to a large one.

Yes, I would tune my loops so that the inner integer loop got as close as
possible to the integer type's maximum before adding it to the double
being used in the outer loop. If there was need for still more precision
or averaging over more values, an intermediate third loop could be used.

Richard Tobin said:
A reasonable approach therefore is to add up each
half of the numbers and then add these together, using the same method
recursively to add up the halves.

That method effectively uses recursion to put a lot of temporary
intermediates onto the stack. Using intermediates to increase precision
is not a bad idea in some cases, but in such cases I would allocate them
explicitly and iterate over them in a loop. This would avoid the
overhead of the recursion and keep memory access closer to being in
order, helping the cache (and allowing for stream based processing too,
if that was wanted).

All of the methods discussed use floating point summing, which has the
property that the sum depends upon the order of the elements. A pure
integer method perhaps using bignums or pre-shifting would be required to
avoid this effect.

viza
 
viza

Hi

Nick Keighley said:
do you have access to a compiler where this makes any difference?

No, not noticeably, if at all. The use of one less variable makes the C
look tidier to me, quite aside from what the compiler outputs.

viza
 
Nick Keighley

Hi

viza said:
No, not noticeably, if at all. The use of one less variable makes the C
look tidier to me, quite aside from what the compiler outputs.

I don't see how you are saving a variable. You wrote
(I fiddled with layout a bit)

float average( int count, int *numbers )
{
    int total = 0;
    int *stop = numbers + count;

    while( numbers < stop )
        total += *numbers++;

    return (double)total / count;
}

I'd write

float average (int count, int *numbers)
{
    int total = 0;
    int i;

    for (i = 0; i < count; i++)
        total += numbers[i];

    return (double)total / count;
}

I don't find your version significantly easier to understand.
The termination condition seems a little obscure.
 
Richard Tobin

viza said:
That method effectively uses recursion to put a lot of temporary
intermediates onto the stack.

Not a lot. The depth is only log2(N).

viza said:
Using intermediates to increase precision
is not a bad idea in some cases, but in such cases I would allocate them
explicitly and iterate over them in a loop. This would avoid the
overhead of the recursion and keep memory access closer to being in
order

The recursive method accesses the array in order, if the calls are in
order (which you can force by separating the calls into two
statements).

viza said:
helping the cache (and allowing for stream based processing too,
if that was wanted).

The recursive method is trivially parallelisable, and I think
"restrict" would be sufficient to convey this to the compiler.

-- Richard
 
viza

Hi

We don't know this, so you're already on dodgy ground.

We don't know that float is either. As soon as you add two of anything
in C you make an assumption that the result will fit in the type.

Chapter and verse, please. (Math co-processors are commonplace
nowadays.)

I did test specifically this function when implementing several image
shrinking functions. Integer was faster on both x86 and x86-64, which
was all I had to hand. 64-bit integers were even faster then float or
double on one 32 bit cpu that I tested.
Introducing two spurious casts, and risking overflow on the total.

Your method fails first. The range in which every integer can be
represented exactly in a float is much smaller than the range of the
same-size int. Once you pass that limit, small input ints are lost
completely, and if many of the inputs are small (relative to the
eventual mean) the float method's output will be less precise than the
integer method's.

int data[] = { INT_MAX, INT_MAX - 1, INT_MAX - 2 };

or long long.

Might not be available.

Ok, my method simply cannot handle integers of the largest type available
on the system which are close to their maximum value. In many situations
it is possible to know a priori that this will not be the case, and
achieve a more accurate result more quickly by using my method.

By your method the OP would discard accuracy when that may not have been
necessary or even acceptable, and it would run slower on many systems.

Have a nice day.

viza
 
user923005

Hi

Peter said:
float average(int count, int numbers[])
{
  int total=0;
  int i;
  float ans;
  for(i=0; i<count; i++)
  {
    total += numbers[i];
    ans = total/count;
  }
  return ans;
}

One modification will suffice to fix the above problem: change
int total=0; to:
float total=0;

Not strictly wrong, but bad advice.

As long as the size of the integer used is big enough to hold the sum, it
is both faster and more accurate to do the addition in an integer.


Utter poppycock. As long as the floating point type has enough
precision to hold the totals, a floating point type is exactly as
precise as an integral type.
So for up to 6 digits with a 4-byte float, or 15 digits with an 8-byte
double, the floating point value will hold the exact answer just like
an integer would.

Also, floating point operations are not appreciably slower and many
modern systems have special hardware for floating point to accelerate
it.
See, for example:
http://www.cygnus-software.com/papers/x86andinfinity.html

One floating point addition per cycle is going to be hard to beat.
A much better change to make is:

     ans = (double)total / (double)count;

Also, it is massively stupid to calculate the division inside each loop,
although a half-decent compiler will realise this and optimise it away.

My version of this function would look like:

float average( int count, int *numbers ){
  int total= 0;
  int *stop= numbers + count;

  while( numbers < stop )
    total+= *numbers++;

  return (double)total / count;

}

I have removed the subscripting in favour of comparing pointers, to
improve speed, in case this is performance critical.

More nonsense.
In the case where an int couldn't hold the maximum possible sum of the
ten inputs, I would use a long or long long.  In the case where that was
still not enough, I would use two nested loops, the inner one adding up
using integers, and the outer one using doubles.  Again, this would give
both increased accuracy and increased speed.

All solutions are going to be O(n) and these little tweaky things you
are suggesting are for the most part utter nonsense.

However, your accumulation only and single division is the right way
to do it unless there is a need for a running sum (e.g. if you need
the current average after each insertion).
 
viza

Hi

user923005 said:
Also, floating point operations are not appreciably slower and many
modern systems have special hardware for floating point to accelerate
it.
See, for example:
http://www.cygnus-software.com/papers/x86andinfinity.html

One floating point addition per cycle is going to be hard to beat.

Ok, if you use SIMD it gets much faster, but most compilers can't (or
don't by default) use those instructions in a loop like this, so if you
are only writing standard C with a common compiler then integer
addition is still much faster.

I take your point that both ints and floats can store ints exactly over a
certain range; that range is larger for ints, though.

user923005 said:
More nonsense.

The compiler will probably do that for you, but it can't hurt.

viza
 
