problem returning floating point

Peter

I have written this small app to explain an issue I'm having with a larger
program.

In the following code I'm taking 10 ints from the keyboard. In the call to
average() these 10 ints are then added and divided by the number of ints to
return the average, a float.

Can someone see why, for example, if the total is 227 and it is divided by
10, I get a return of 22.000000 instead of the correct answer, 22.7?

Thanks in advance
Pete

#include <stdio.h>


float average(int, int[]); /* Prototype returning float */

int main()
{
    int num[10]; /* array to hold 10 ints */
    float avg;   /* float to hold the average */
    int i, size = 10;

    printf("Enter 10 integers: \n");

    /* Takes ten ints from the keyboard */

    for(i = 0; i < size; i++)
    {
        scanf("%d", &num[i]);
    }

    avg = average(size, num); /* Call to average() passing size and array */
    printf("%f", avg);
    return 0;
}

/* average() totals the 10 ints, calculates the average, returns float ans */

float average(int count, int numbers[])
{
    int total = 0;
    int i;
    float ans;

    for(i = 0; i < count; i++)
    {
        total += numbers[i];

        /*printf("Total = %d \n", total);*/
        ans = total/count;
    }
    return ans;
}
 
viza

Hi

Peter said:
float average(int count, int numbers[])
{
    int total = 0;
    int i;
    float ans;

    for(i = 0; i < count; i++)
    {
        total += numbers[i];
        ans = total/count;
    }
    return ans;
}


One modification will suffice to fix the above problem: change

int total=0;

to:

float total=0;


Not strictly wrong, but bad advice.

As long as the size of the integer used is big enough to hold the sum, it
is both faster and more accurate to do the addition in an integer.

A much better change to make is:

ans = (double)total / (double)count;

Also, it is massively stupid to calculate the division inside each loop,
although a half-decent compiler will realise this and optimise it away.

My version of this function would look like:

float average( int count, int *numbers )
{
    int total = 0;
    int *stop = numbers + count;

    while( numbers < stop )
        total += *numbers++;

    return (double)total / count;
}

I have removed the subscripting in favour of comparing pointers, to
improve speed, in case this is performance critical.

In the case where an int couldn't hold the maximum possible sum of the
ten inputs, I would use a long or long long. In the case where that was
still not enough, I would use two nested loops, the inner one adding up
using integers, and the outer one using doubles. Again, this would give
both increased accuracy and increased speed.
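A sketch of that two-loop idea, on the assumption that the block size is chosen so the inner integer subtotal cannot overflow (the function name and the block size here are illustrative, not from the thread):

```c
#include <stddef.h>

#define BLOCK 1024 /* illustrative; pick so BLOCK * max |input| fits in long */

double sum_blocked(size_t count, const int *numbers)
{
    double total = 0.0;          /* outer accumulator: one add per block */

    while (count > 0) {
        size_t n = count < BLOCK ? count : BLOCK;
        long subtotal = 0;       /* inner accumulator: exact integer adds */
        size_t i;

        for (i = 0; i < n; i++)
            subtotal += numbers[i];

        total += (double)subtotal;
        numbers += n;
        count -= n;
    }
    return total;
}
```

Each floating point addition then absorbs a whole block at once, so rounding error accrues per block rather than per element.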
 
Nick Keighley

I have removed the subscripting in favour of comparing pointers, to
improve speed, in case this is performance critical.

do you have access to a compiler where this makes any difference?
 
Richard Tobin

viza said:
In the case where an int couldn't hold the maximum possible sum of the
ten inputs, I would use a long or long long. In the case where that was
still not enough, I would use two nested loops, the inner one adding up
using integers, and the outer one using doubles. Again, this would give
both increased accuracy and increased speed.

Loss of precision in a case like this arises when a small number is
added to a large one. A reasonable approach therefore is to add up
each half of the numbers and then add these together, using the same
method recursively to add up the halves. Something like this
(untested):

double sum_ints(int count, int *numbers)
{
    if(count == 0)
        return 0.0;
    else if(count == 1)
        return (double)numbers[0];
    else
        return sum_ints(count/2, numbers) +
               sum_ints(count - count/2, numbers + count/2);
}

-- Richard
 
viza

Hi

Richard Tobin said:
Loss of precision in a case like this arises when a small number is
added to a large one.

Yes, I would tune my loops so that the inner integer loop got as close as
possible to the integer type's maximum before adding it to the double
being used in the outer loop. If there was need for still more precision
or averaging over more values, an intermediate third loop could be used.

Richard Tobin said:
A reasonable approach therefore is to add up each
half of the numbers and then add these together, using the same method
recursively to add up the halves.

That method effectively uses recursion to put a lot of temporary
intermediates onto the stack. Using intermediates to increase precision
is not a bad idea in some cases, but in such cases I would allocate them
explicitly and iterate over them in a loop. This would avoid the
overhead of the recursion and keep memory access closer to being in
order, helping the cache (and allowing for stream based processing too,
if that was wanted).

All of the methods discussed use floating point summing, which has the
property that the sum depends upon the order of the elements. A pure
integer method perhaps using bignums or pre-shifting would be required to
avoid this effect.

viza
 
viza

Hi

Nick Keighley said:
do you have access to a compiler where this makes any difference?

No, not noticeably, if at all. The use of one less variable makes the C
look tidier to me, quite aside from what the compiler outputs.

viza
 
Nick Keighley

Hi

viza said:
No, not noticeably, if at all. The use of one less variable makes the C
look tidier to me, quite aside from what the compiler outputs.

I don't see how you are saving a variable. You wrote
(I fiddled with layout a bit)

float average( int count, int *numbers )
{
    int total = 0;
    int *stop = numbers + count;

    while( numbers < stop )
        total += *numbers++;

    return (double)total / count;
}

I'd write

float average (int count, int *numbers)
{
    int total = 0;
    int i;

    for (i = 0; i < count; i++)
        total += numbers[i];

    return (double)total / count;
}

I don't find your version significantly easier to understand.
The termination condition seems a little obscure.
 
Richard Tobin

viza said:
That method effectively uses recursion to put a lot of temporary
intermediates onto the stack.

Not a lot. The depth is only log2(N).

viza said:
Using intermediates to increase precision
is not a bad idea in some cases, but in such cases I would allocate them
explicitly and iterate over them in a loop. This would avoid the
overhead of the recursion and keep memory access closer to being in
order

The recursive method accesses the array in order, if the calls are in
order (which you can force by separating the calls into two
statements).

viza said:
helping the cache (and allowing for stream based processing too,
if that was wanted).

The recursive method is trivially parallelisable, and I think
"restrict" would be sufficient to convey this to the compiler.

-- Richard
 
viza

Hi

We don't know this, so you're already on dodgy ground.

We don't know that float is either. As soon as you add two of anything
in C you make an assumption that the result will fit in the type.

Chapter and verse, please. (Math co-processors are commonplace
nowadays.)

I did test specifically this function when implementing several image
shrinking functions. Integer was faster on both x86 and x86-64, which
was all I had to hand. 64-bit integers were even faster then float or
double on one 32 bit cpu that I tested.
Introducing two spurious casts, and risking overflow on the total.

Your method fails first. The range in which every integer can be
represented exactly in a float is much smaller than the range of the
same-size int. Once you pass that limit, small input ints are lost
completely, and if many of the inputs are small (relative to the
eventual mean) the float method's output will be less precise than the
integer method's.

int data[] = { INT_MAX, INT_MAX - 1, INT_MAX - 2 };

or long long.

Might not be available.

Ok, my method simply cannot handle integers of the largest type available
on the system which are close to their maximum value. In many situations
it is possible to know a priori that this will not be the case, and
achieve a more accurate result more quickly by using my method.

By your method the OP would discard accuracy when that may not have been
necessary or even acceptable, and it would run slower on many systems.

Have a nice day.

viza
 
user923005

Hi

Peter said:
float average(int count, int numbers[])
{
  int total=0;
  int i;
  float ans;
  for(i=0; i<count; i++)
  {
    total += numbers[i];
    ans = total/count;
  }
  return ans;
}

One modification will suffice to fix the above problem: change
int total=0; to:
float total=0;

Not strictly wrong, but bad advice.

As long as the size of the integer used is big enough to hold the sum, it
is both faster and more accurate to do the addition in an integer.


Utter poppycock. As long as the floating point type has enough
precision to hold the totals, a floating point type is exactly as
precise as an integral type.
So for up to 6 digits with a 4-byte float, or 15 digits with an 8-byte
double, the floating point value will hold the exact answer just like
an integer would.

Also, floating point operations are not appreciably slower and many
modern systems have special hardware for floating point to accelerate
it.
See, for example:
http://www.cygnus-software.com/papers/x86andinfinity.html

One floating point addition per cycle is going to be hard to beat.
A much better change to make is:

     ans = (double)total / (double)count;

Also, it is massively stupid to calculate the division inside each loop,
although a half-decent compiler will realise this and optimise it away.

My version of this function would look like:

float average( int count, int *numbers ){
  int total= 0;
  int *stop= numbers + count;

  while( numbers < stop )
    total+= *numbers++;

  return (double)total / count;

}

I have removed the subscripting in favour of comparing pointers, to
improve speed, in case this is performance critical.

More nonsense.
In the case where an int couldn't hold the maximum possible sum of the
ten inputs, I would use a long or long long.  In the case where that was
still not enough, I would use two nested loops, the inner one adding up
using integers, and the outer one using doubles.  Again, this would give
both increased accuracy and increased speed.

All solutions are going to be O(n) and these little tweaky things you
are suggesting are for the most part utter nonsense.

However, your accumulation only and single division is the right way
to do it unless there is a need for a running sum (e.g. if you need
the current average after each insertion).
 
viza

Hi

user923005 said:
Also, floating point operations are not appreciably slower and many
modern systems have special hardware for floating point to accelerate
it.
See, for example:
http://www.cygnus-software.com/papers/x86andinfinity.html

One floating point addition per cycle is going to be hard to beat.

Ok, if you use SIMD it gets much faster, but most compilers can't (or
don't by default) use those instructions in a loop like this, so if you
are only writing standard C with a common compiler then integer
addition is still much faster.

I take your point that both ints and floats can store ints exactly over a
certain range; that range is larger for ints, though.

user923005 said:
More nonsense.

The compiler will probably do that for you, but it can't hurt.

viza
 
