# standard deviation

Discussion in 'C Programming' started by Bill Cunningham, Jun 5, 2011.

1. ### Bill CunninghamGuest

I have some code here and a snippet of unfinished, untested code which
is an attempt at a function called stddev. This is of course meant to
calculate a standard deviation. I am trying to build small helper functions
that can be built into tech analysis tools. Something I've been attempting
and thinking about for a long time. stddev's first parameter is passed the
return value of the function mean(). It may not need a second parameter but
this is what I have so far. stddev needs to do the following things.
1) find the difference of each price from the mean, whether negative or
positive.
2) square those numbers
3) sum those squares
4) calculate the square of the total from 3 above.

#include <stdio.h>
#include <stdlib.h>
#ifdef M
#include <math.h>
#endif

double mean(double *, int);
double stddev(double, double *);

mean.c

#include "tech.h"

double mean(double *avg, int num)
{
double sum, average;
int i;
sum = average = 0;
for (i = 0; i < num; ++i) {
sum = sum + avg[i];
average = sum / num;
}
return average;
}

stddev.c /*the attempt*/

#include "tech.h"

double stddev(double mean, double *prices)
{
double price = 0.0;
int i = 0;
for (; i < prices; ++i) {
if (prices > mean) {
price = prices - mean;
return prices;
} else if (prices < mean) {
price = mean - prices;
return prices;
}

I really have no way to code this but I don't want anyone to do my
homework. Can someone offer tips or citations as to what I might need to do
here?

Bill

Bill Cunningham, Jun 5, 2011

2. ### Lew PitcherGuest

On June 5, 2011 14:25, in comp.lang.c, d wrote:

> I have some code here and a snippet of unfinished, untested code which
> is an attempt at a function called stddev. This is of course meant to
> calculate a standard deviation.

[snip]
> double stddev(double mean, double *prices)
> {
> double price = 0.0;
> int i = 0;
> for (; i < prices; ++i) {
> if (prices > mean) {
> price = prices - mean;
> return prices;
> } else if (prices < mean) {
> price = mean - prices;
> return prices;
> }
>
> I really have no way to code this but I don't want anyone to do my
> homework. Can someone offer tips or citations as to what I might need to
> do here?

Sorry, Bill, but your code doesn't really reflect the accepted way that you
calculate standard deviation. I'm not mathematician enough to tell whether
you've written equivalent code or not, so I'll just assume that your code
isn't correct, and move on.

I suggest that you read the first few paragraphs of the Wikipedia article on
Standard Deviation, especially the start of the "Basic Examples" section
(http://en.wikipedia.org/wiki/Standard_deviation#Basic_examples).
There, you'll find an excellent algorithm for calculating standard deviation
that is easily transformable into C code.

Let me summarize their algorithm:
1) Compute the mean of the population
2) For each element of the population,
2a) compute the difference between the element and the mean
2b) square this value
2c) call this new value the "squared deviation"
3) Find the mean of the squared deviations (sum them, then divide by the
number of them); this mean is the "variance"
4) Compute the square root of the variance

This square root is the "standard deviation"

--
Lew Pitcher
Master Codewright & JOAT-in-training | Registered Linux User #112576
Me: http://pitcher.digitalfreehold.ca/ | Just Linux: http://justlinux.ca/
---------- Slackware - Because I know what I'm doing. ------

Lew Pitcher, Jun 5, 2011

3. ### Kleuskes & MoosGuest

On Jun 5, 8:25 pm, "Bill Cunningham" <> wrote:
>     I have some code here and a snippet of unfinished, untested code which
> is an attempt at a function called stddev. This is of course meant to
> calculate a standard deviation. I am trying to build small helper functions
> that can be built into tech analysis tools. Something I've been attempting
> and thinking about for a long time. stddev's first parameter is passed the
> return value of the function mean(). It may not need a second parameter but
> this is what I have so far. stddev needs to do the following things.
> 1) find the difference in prices from mean. Whether negative of positive
> numbers.
> 2) square those numbers
> 3) sum those squares
> 4) calculate the square of the total from 3 above.
>
>
> #include <stdio.h>
> #include <stdlib.h>
> #ifdef M
> #include <math.h>
> #endif
>
> double mean(double *, int);
> double stddev(double, double *);
>
> mean.c
>
> #include "tech.h"
>
> double mean(double *avg, int num)
> {
>     double sum, average;
>     int i;
>     sum = average = 0;
>     for (i = 0; i < num; ++i) {
>  sum = sum + avg[i];
>  average = sum / num;
>     }
>     return average;
>
> }
>
> stddev.c /*the attempt*/
>
> #include "tech.h"
>
> double stddev(double mean, double *prices)
> {
>     double price = 0.0;
>     int i = 0;
>     for (; i < prices; ++i) {
>  if (prices > mean) {
>      price = prices - mean;
>      return prices;
>  } else if (prices < mean) {
>      price = mean - prices;
>      return prices;
>  }
>
>     I really have no way to code this but I don't want anyone to do my
> homework. Can someone offer tips or citations as to what I might need to do
> here?
>
> Bill

Erwin Kreyszig has a pretty good rundown in 'Introductory Mathematical
Statistics: Principles and Methods', sections 3.2 and 3.3.
It used to be pretty standard when I was in college, so I guess it
should still be available in the library.

Kleuskes & Moos, Jun 5, 2011
4. ### Lew PitcherGuest

On June 5, 2011 14:57, in comp.lang.c, wrote:

> On June 5, 2011 14:25, in comp.lang.c, d wrote:
>
>> I have some code here and a snippet of unfinished, untested code
>> which
>> is an attempt at a function called stddev. This is of course meant to
>> calculate a standard deviation.

> [snip]
>> double stddev(double mean, double *prices)
>> {
>> double price = 0.0;
>> int i = 0;
>> for (; i < prices; ++i) {
>> if (prices > mean) {
>> price = prices - mean;
>> return prices;
>> } else if (prices < mean) {
>> price = mean - prices;
>> return prices;
>> }
>>

FWIW, from the algorithm and data given on the Wikipedia page, I coded this

#include <stdio.h>
#include <stdlib.h>
#include <math.h>

double StdDev(unsigned int samplesize, double population[])
{
    double sum, mean, spread;
    unsigned int index;

    if (samplesize == 0) return 0.0;    /* catch obvious error */

    /* compute mean of sample population */
    for (index = 0, sum = 0.0; index < samplesize; ++index)
        sum += population[index];
    mean = sum / samplesize;

    /* compute variances */
    for (index = 0, sum = 0.0; index < samplesize; ++index)
    {
        double delta;

        delta = population[index] - mean;
        sum += (delta * delta);
    }
    return sqrt(sum / samplesize);      /* standard deviation */
}

/*
** Population values taken from the Wikipedia example
*/
int main(void)
{
    double pop[] = {2,4,4,4,5,5,7,9};
    unsigned int popsize = (sizeof(pop) / sizeof(pop[0]));

    printf("The standard deviation = %f\n", StdDev(popsize, pop));

    return EXIT_SUCCESS;
}

When I compile and run this code
$ cc -lm -o stddev stddev.c
$ stddev
The standard deviation = 2.000000
$
I get the same Standard Deviation value as the Wikipedia article's example.

HTH
--
Lew Pitcher
Master Codewright & JOAT-in-training | Registered Linux User #112576
Me: http://pitcher.digitalfreehold.ca/ | Just Linux: http://justlinux.ca/
---------- Slackware - Because I know what I'm doing. ------

Lew Pitcher, Jun 5, 2011
5. ### Bill CunninghamGuest

Lew Pitcher wrote:

[snip]

> FWIW, from the algorithm and data given on the Wikipedia page, I
> coded this
>
> #include <stdio.h>
> #include <stdlib.h>
> #include <math.h>
>
> double StdDev(unsigned int samplesize, double population[])
> {
> unsigned int index;
>
> if (samplesize == 0) return 0.0; /* catch obvious error */
>
> /* compute mean of sample population */
> for (index = 0, sum = 0.0 ; index < samplesize; ++index)
> sum += population[index];
> mean = sum / samplesize;
>
> /* compute variances */
> for (index = 0, sum = 0.0 ; index < samplesize; ++index)
> {
> double delta;
>
> delta = population[index] - mean;
> sum += (delta * delta);

Is this really saying sum = sum + (delta * delta)?
And are the parentheses there for precedence?

> }
> return sqrt(sum/samplesize); /* standard deviation */
> }
>
> /*
> ** Population values taken from the Wikipedia example
> */
> int main(void)
> {
> double pop[] = {2,4,4,4,5,5,7,9};
> unsigned int popsize = (sizeof(pop) / sizeof(pop[0]));

Is the above code the standard thing to use if you have an array and
really don't want to count the number of elements? Using sizeof?

> printf("The standard deviation = %f\n",StdDev(popsize,pop));
>
> return EXIT_SUCCESS;
> }
>
> When I compile and run this code
> \$ cc -lm -o stddev stddev.c
> \$ stddev
> The standard deviation = 2.000000
> \$
> I get the same Standard Deviation value as the Wikipedia article's
> example
>
> HTH

Bill Cunningham, Jun 5, 2011
6. ### Lew PitcherGuest

On June 5, 2011 17:02, in comp.lang.c, d wrote:

> Lew Pitcher wrote:
>
> [snip]
>
>> FWIW, from the algorithm and data given on the Wikipedia page, I
>> coded this
>>
>> #include <stdio.h>
>> #include <stdlib.h>
>> #include <math.h>
>>
>> double StdDev(unsigned int samplesize, double population[])
>> {
>> unsigned int index;
>>
>> if (samplesize == 0) return 0.0; /* catch obvious error */
>>
>> /* compute mean of sample population */
>> for (index = 0, sum = 0.0 ; index < samplesize; ++index)
>> sum += population[index];
>> mean = sum / samplesize;
>>
>> /* compute variances */
>> for (index = 0, sum = 0.0 ; index < samplesize; ++index)
>> {
>> double delta;
>>
>> delta = population[index] - mean;
>> sum += (delta * delta);

>
> Is this really saying sum=sum+(delta*delta);

Yes

> And the parenthsis is for precedence?

Not really. The parentheses here are a visual cue to the programmer. They
are unnecessary for the logic; the expression would compute the same
without the parentheses.

>> }
>> return sqrt(sum/samplesize); /* standard deviation */
>> }
>>
>> /*
>> ** Population values taken from the Wikipedia example
>> */
>> int main(void)
>> {
>> double pop[] = {2,4,4,4,5,5,7,9};
>> unsigned int popsize = (sizeof(pop) / sizeof(pop[0]));

>
> Is the above code the standard thing to use if you have an array and
> really don't want to count the number of elements? Using sizeof?

(sizeof(array) / sizeof(array[0])) is a fairly standard way to determine the
number of elements in an array. You could call it an idiom.

>> printf("The standard deviation = %f\n",StdDev(popsize,pop));
>>
>> return EXIT_SUCCESS;
>> }
>>
>> When I compile and run this code
>> \$ cc -lm -o stddev stddev.c
>> \$ stddev
>> The standard deviation = 2.000000
>> \$
>> I get the same Standard Deviation value as the Wikipedia article's
>> example
>>
>> HTH

>
>

--
Lew Pitcher
Master Codewright & JOAT-in-training | Registered Linux User #112576
Me: http://pitcher.digitalfreehold.ca/ | Just Linux: http://justlinux.ca/
---------- Slackware - Because I know what I'm doing. ------

Lew Pitcher, Jun 5, 2011
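
A small sketch of the sizeof idiom discussed above, together with the one place it commonly goes wrong. The names ARRAY_LEN and sum_all are made up for this illustration. The idiom only works where the real array is in scope; once the array is passed to a function it is seen as a pointer, so the element count has to travel as a separate argument (which is why StdDev takes samplesize).

```c
#include <stddef.h>

/* Element count of a true array. Do NOT apply this to a pointer or to an
   array parameter: there sizeof yields the size of the pointer instead. */
#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* Inside this function, values is really a pointer, so the caller must
   supply the element count explicitly. */
double sum_all(const double values[], size_t count)
{
    double sum = 0.0;
    size_t i;

    for (i = 0; i < count; ++i)
        sum += values[i];   /* same as sum = sum + values[i] */
    return sum;
}
```

With double pop[] = {2,4,4,4,5,5,7,9}; in the caller, ARRAY_LEN(pop) is 8 and sum_all(pop, ARRAY_LEN(pop)) adds up all eight elements.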
7. ### FredGuest

On Jun 5, 12:22 pm, Lew Pitcher <> wrote:
> On June 5, 2011 14:57, in comp.lang.c, wrote:
>
>
>
>
>
> > On June 5, 2011 14:25, in comp.lang.c, wrote:

>
> >>     I have some code here and a snippet of unfinished, untested code
> >>     which
> >> is an attempt at a function called stddev. This is of course meant to
> >> calculate a standard deviation.

> > [snip]
> >> double stddev(double mean, double *prices)
> >> {
> >>     double price = 0.0;
> >>     int i = 0;
> >>     for (; i < prices; ++i) {
> >>  if (prices > mean) {
> >>      price = prices - mean;
> >>      return prices;
> >>  } else if (prices < mean) {
> >>      price = mean - prices;
> >>      return prices;
> >>  }

>
> FWIW, from the algorithm and data given on the Wikipedia page, I coded this
>
>   #include <stdio.h>
>   #include <stdlib.h>
>   #include <math.h>
>
>   double StdDev(unsigned int samplesize, double population[])
>   {
>     unsigned int index;
>
>     if (samplesize == 0) return 0.0;    /* catch obvious error */
>
>     /* compute mean of sample population */
>     for (index = 0, sum = 0.0 ; index < samplesize; ++index)
>       sum += population[index];
>     mean = sum / samplesize;
>
>     /* compute variances */
>     for (index = 0, sum = 0.0 ; index < samplesize; ++index)
>     {
>       double delta;
>
>       delta = population[index] - mean;
>       sum += (delta * delta);
>     }
>     return sqrt(sum/samplesize); /* standard deviation */
>   }
>
>   /*
>   ** Population values taken from the Wikipedia example
>   */
>   int main(void)
>   {
>     double pop[] = {2,4,4,4,5,5,7,9};
>     unsigned int popsize = (sizeof(pop) / sizeof(pop[0]));
>
>     printf("The standard deviation = %f\n",StdDev(popsize,pop));
>
>     return EXIT_SUCCESS;
>   }
>
> When I compile and run this code
>   \$ cc -lm -o stddev stddev.c
>   \$ stddev
>   The standard deviation = 2.000000
>   \$
> I get the same Standard Deviation value as the Wikipedia article's example
>

The above algorithm, while mathematically correct, is not good enough
for a computer. If your population is very large, or the individual
items in the population vary greatly in magnitude, you may run into
severe truncation and roundoff errors.

A more accurate way is to include the Leveque computational correction,
computing the variance as:

var = { sum[(x - mean)^2] - (1/n)*sum[(x - mean)] } / (n-1)
then stddev = sqrt(var)

Note that computing the mean is not really as simple as summing the
items and dividing by the number of items. What happens on a 32-bit
machine if the first item is of magnitude 10^18, followed by 10^20
items that are of magnitude 1? None of the latter items will affect
the sum, and the computed mean will be orders of magnitude in error.
--
Fred K

Fred, Jun 6, 2011
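
Fred's point about a huge first item swamping later small ones can be seen with a few lines of C. This is a sketch that assumes IEEE 754 double precision, where adjacent representable values near 10^18 are 128 apart; the function name add_repeatedly is made up for the illustration.

```c
/* Add count copies of small to big, one at a time, the way a naive
   summation loop would. */
double add_repeatedly(double big, double small, unsigned long count)
{
    /* volatile forces each partial sum to be rounded to double, so
       extra-precision registers cannot hide the effect */
    volatile double sum = big;
    unsigned long i;

    for (i = 0; i < count; ++i)
        sum += small;
    return sum;
}
```

Under that assumption, add_repeatedly(1e18, 1.0, 1000UL) comes back as exactly 1e18: every single addition of 1.0 is rounded away, however many of them there are. Adding the same thousand ones to 1000.0 gives 2000.0 as expected.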
8. ### FredGuest

On Jun 6, 7:39 am, Fred <> wrote:
> On Jun 5, 12:22 pm, Lew Pitcher <> wrote:
>
>
>
>
>
> > On June 5, 2011 14:57, in comp.lang.c, wrote:

>
> > > On June 5, 2011 14:25, in comp.lang.c, wrote:

>
> > >>     I have some code here and a snippet of unfinished, untested code
> > >>     which
> > >> is an attempt at a function called stddev. This is of course meant to
> > >> calculate a standard deviation.
> > > [snip]
> > >> double stddev(double mean, double *prices)
> > >> {
> > >>     double price = 0.0;
> > >>     int i = 0;
> > >>     for (; i < prices; ++i) {
> > >>  if (prices > mean) {
> > >>      price = prices - mean;
> > >>      return prices;
> > >>  } else if (prices < mean) {
> > >>      price = mean - prices;
> > >>      return prices;
> > >>  }

>
> > FWIW, from the algorithm and data given on the Wikipedia page, I coded this

>
> >   #include <stdio.h>
> >   #include <stdlib.h>
> >   #include <math.h>

>
> >   double StdDev(unsigned int samplesize, double population[])
> >   {
> >     double sum, mean, spread;
> >     unsigned int index;

>
> >     if (samplesize == 0) return 0.0;    /* catch obvious error */

>
> >     /* compute mean of sample population */
> >     for (index = 0, sum = 0.0 ; index < samplesize; ++index)
> >       sum += population[index];
> >     mean = sum / samplesize;

>
> >     /* compute variances */
> >     for (index = 0, sum = 0.0 ; index < samplesize; ++index)
> >     {
> >       double delta;

>
> >       delta = population[index] - mean;
> >       sum += (delta * delta);
> >     }
> >     return sqrt(sum/samplesize); /* standard deviation */
> >   }

>
> >   /*
> >   ** Population values taken from the Wikipedia example
> >   */
> >   int main(void)
> >   {
> >     double pop[] = {2,4,4,4,5,5,7,9};
> >     unsigned int popsize = (sizeof(pop) / sizeof(pop[0]));

>
> >     printf("The standard deviation = %f\n",StdDev(popsize,pop));

>
> >     return EXIT_SUCCESS;
> >   }

>
> > When I compile and run this code
> >   \$ cc -lm -o stddev stddev.c
> >   \$ stddev
> >   The standard deviation = 2.000000
> >   \$
> > I get the same Standard Deviation value as the Wikipedia article's example

>
> The above algorithm, while mathematically correct, is not good enough
> for a computer. If your population is very large, or the individual
> items in the population vary greatly in magnitude, you may run into
> severe truncation and roundoff errors.
>
> A more accurate way is to include the Leveque computational
> correction,
> computing the variance as:
>
> var = { sum[(x - mean)^2] - (1/n)*sum[(x - mean) } / (n-1)
> then stddev = sqrt(var)

Oops, missing a square. The variance with Leveque correction is

{ sum[(x - mean)^2] - (1/n)* sum[(x - mean)]^2 } / (n-1)

i.e., in the first term you sum the squares of x-mean,
and in the second term you square the sum of x-mean

See the Stanford Computer Science report by Chan, Golub, and Leveque

> Note that you computing the mean is not really as simple as summing
> the
> items and dividing by the number of items. What happens on a 32-bit
> machine if the first item is of magnitude 10^18, followed by 10^20
> items that are of magnitude 1? None of the latter items will
> magnitude in error.

-- Fred K

Fred, Jun 6, 2011
9. ### NobodyGuest

On Mon, 06 Jun 2011 07:39:47 -0700, Fred wrote:

> The above algorithm, while mathematically correct, is not good enough
> for a computer. If your population is very large, or the individual
> items in the population vary greatly in magnitude, you may run into
> severe truncation and roundoff errors.
>
> A more accurate way is to include the Leveque computational
> correction,

While that may be true, it's a minor detail given the amount of software
I've seen which uses the single-pass algorithm:

var := (sum(x^2) - sum(x)^2/n)/n

This can be rather inaccurate if the standard deviation is small compared
to the mean (i.e. the data has a relatively large constant offset).

Nobody, Jun 6, 2011
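
To make the single-pass problem concrete, here is a sketch that puts the formula above next to a plain two-pass computation. Both compute the population variance (divisor n); the function names and the test data mentioned below are made up for the illustration, and IEEE 754 double arithmetic is assumed.

```c
#include <stddef.h>

/* Single-pass population variance: (sum(x^2) - sum(x)^2/n) / n.
   Subtracts two huge, nearly equal numbers when the mean is large. */
double variance_single_pass(const double *x, size_t n)
{
    double sum = 0.0, sum_sq = 0.0;
    size_t i;

    if (n == 0)
        return 0.0;
    for (i = 0; i < n; ++i) {
        sum    += x[i];
        sum_sq += x[i] * x[i];
    }
    return (sum_sq - sum * sum / (double)n) / (double)n;
}

/* Two-pass population variance: mean first, then squared deviations. */
double variance_two_pass(const double *x, size_t n)
{
    double sum = 0.0, mean, sum_sq = 0.0;
    size_t i;

    if (n == 0)
        return 0.0;
    for (i = 0; i < n; ++i)
        sum += x[i];
    mean = sum / (double)n;
    for (i = 0; i < n; ++i) {
        double d = x[i] - mean;
        sum_sq += d * d;
    }
    return sum_sq / (double)n;
}
```

For the four values 1e12 + 4, 1e12 + 7, 1e12 + 13 and 1e12 + 16 the true population variance is 22.5. The two-pass version reproduces that, while the single-pass version loses all the significant digits in the subtraction and returns something unrelated to 22.5.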
10. ### Dann CorbitGuest

Dann Corbit, Jun 7, 2011
11. ### FredGuest

On Jun 6, 7:17 pm, Dann Corbit <> wrote:
> In article <46b643da-c836-45ce-9348-fc8758139b33
> {snip}
> The Welford method cited here is quite good numerically.  It is the
> method that I use:http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance

And note that it cites the article by LeVeque et al.
--
FredK

Fred, Jun 7, 2011
12. ### Bill CunninghamGuest

Dann Corbit wrote:
> In article <46b643da-c836-45ce-9348-fc8758139b33
> {snip}
> The Welford method cited here is quite good numerically. It is the
> method that I use:
> http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance

What kind of language is that in the article?

Bill

Bill Cunningham, Jun 7, 2011
13. ### Lew PitcherGuest

On June 7, 2011 12:00, in comp.lang.c, d wrote:

> Dann Corbit wrote:
>> In article <46b643da-c836-45ce-9348-fc8758139b33
>> {snip}
>> The Welford method cited here is quite good numerically. It is the
>> method that I use:
>> http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance

>
> What kind of language is that in the article?

It's not a formal computer language; it is pseudo-code. Pseudo-code is a
programming-language-like way to express algorithms.

--
Lew Pitcher
Master Codewright & JOAT-in-training | Registered Linux User #112576
Me: http://pitcher.digitalfreehold.ca/ | Just Linux: http://justlinux.ca/
---------- Slackware - Because I know what I'm doing. ------

Lew Pitcher, Jun 7, 2011
14. ### Keith ThompsonGuest

Lew Pitcher <> writes:
> On June 7, 2011 12:00, in comp.lang.c, d wrote:
>> Dann Corbit wrote:
>>> In article <46b643da-c836-45ce-9348-fc8758139b33
>>> {snip}
>>> The Welford method cited here is quite good numerically. It is the
>>> method that I use:
>>> http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance

>>
>> What kind of language is that in the article?

>
> It's not a formal computer language; it is pseudo-code. Pseudo-code is a
> programming-language like way to express algorithms.

No, it's not pseudo-code, it's Python. (It's odd that the article
never mentions that fact.)

Bill, I've suggested to you before that Python might be a better
language for you than C. I now repeat that suggestion.

--
Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
Nokia
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"

Keith Thompson, Jun 7, 2011
15. ### Bill CunninghamGuest

"Keith Thompson" <> wrote in message
news:...

> Bill, I've suggested to you before that Python might be a better
> language for you than C. I now repeat that suggestion.

Oh why? It doesn't look sensible to me at all. C++ or java might be more
understandable than python. Even perl.

Bill

Bill Cunningham, Jun 9, 2011
16. ### Keith ThompsonGuest

"Bill Cunningham" <> writes:
> "Keith Thompson" <> wrote in message
> news:...
>
>> Bill, I've suggested to you before that Python might be a better
>> language for you than C. I now repeat that suggestion.

>
> Oh why? It doesn't look sensible to me at all. C++ or java might be more
> understandable than python. Even perl.

Because the things that have been causing you grief in C all these years
are, to a large extent, things that you wouldn't have to worry about in
Python.

Seriously, does C "look sensible" to you?

--
Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
Nokia
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"

Keith Thompson, Jun 9, 2011
17. ### Bill CunninghamGuest

Keith Thompson wrote:

> Because the things that have been causing you grief in C all these
> years are, to large extent, things that you wouldn't have to worry
>
> Seriously, does C "look sensible" to you?

Not really serious stuff no. But I am learning functions. I can use
those real well. Syntactic things seem to be complicated in C. Quite a bit
in C doesn't look sensible to me actually.

Bill

Bill Cunningham, Jun 9, 2011
18. ### NobodyGuest

On Tue, 07 Jun 2011 09:24:49 -0700, Keith Thompson wrote:

> Bill, I've suggested to you before that Python might be a better language
> for you than C. I now repeat that suggestion.

Python has a few pitfalls of its own. Probably the most common one is
the fact that everything is passed by reference.

Nobody, Jun 9, 2011
19. ### Bill CunninghamGuest

Keith Thompson wrote:

> Seriously, does C "look sensible" to you?

I'm just a humble hobbyist. I do want to learn C or even C++ if I have
to go to higher level things. The tutorials that I've studied cover some of
the most basic things about C. Algorithms and ways to do things seem to be an
altogether different matter. They don't seem to teach that in tutorials.

Bill

Bill Cunningham, Jun 9, 2011
20. ### Michael PressGuest

Fred wrote:

> Oops, missing a square. The variance with Leveque correction is
>
>
> { sum[(x - mean)^2] - (1/n)* sum[(x - mean)]^2 } / (n-1)
>
> i.e., in the first term you sum the squares of x-mean,
> and in the second term you square the sum of x-mean

Looks hinky. Perhaps

{ sum[(x - mean)^2] - (1/n)* [sum(x - mean)]^2 } / (n-1)

> See the Stanford Computer Science report by Chan, Golub, and Leveque

--
Michael Press

Michael Press, Jun 10, 2011