standard deviation

  • Thread starter Bill Cunningham

Bill Cunningham

I have some code here and a snippet of unfinished, untested code which
is an attempt at a function called stddev. This is of course meant to
calculate a standard deviation. I am trying to build small helper functions
that can be built into tech analysis tools, something I've been attempting
and thinking about for a long time. stddev's first parameter is passed the
return value of the function mean(). It may not need a second parameter, but
this is what I have so far. stddev needs to do the following things:
1) find the difference of each price from the mean, whether negative or
positive.
2) square those numbers
3) sum those squares
4) calculate the square root of the total from 3 above.

header called "tech.h"

#ifndef TECH_H
#define TECH_H

#include <stdio.h>
#include <stdlib.h>
#include <math.h>    /* needed for sqrt() */

double mean(double *, int);
double stddev(double, double *);

#endif

mean.c

#include "tech.h"

double mean(double *avg, int num)
{
    double sum, average;
    int i;
    sum = average = 0;
    for (i = 0; i < num; ++i)
        sum = sum + avg[i];   /* add the i-th element, not the pointer */
    if (num > 0)
        average = sum / num;  /* divide once, after the loop */
    return average;
}

stddev.c /*the attempt*/

#include "tech.h"

double stddev(double mean, double *prices)
{
    double price = 0.0;
    int i = 0;
    for (; i < prices; ++i) {
        if (prices > mean) {
            price = prices - mean;
            return prices;
        } else if (prices < mean) {
            price = mean - prices;
            return prices;
        }

I really have no way to code this but I don't want anyone to do my
homework. Can someone offer tips or citations as to what I might need to do
here?

Bill
 

Lew Pitcher

[snip]

I really have no way to code this but I don't want anyone to do my
homework. Can someone offer tips or citations as to what I might need to
do here?


Sorry, Bill, but your code doesn't really reflect the accepted way to
calculate a standard deviation. I'm not enough of a mathematician to tell
whether you've written equivalent code or not, so I'll just assume that your
code isn't correct, and move on.

I suggest that you read the first few paragraphs of the Wikipedia article on
Standard Deviation, especially the start of the "Basic Examples" section
(http://en.wikipedia.org/wiki/Standard_deviation#Basic_examples).
There, you'll find an excellent algorithm for calculating standard deviation
that is easily transformable into C code.

Let me summarize their algorithm:
1) Compute the mean of the population
2) For each element of the population,
2a) compute the difference between the element and the mean
2b) square this value
2c) call this new value the "squared deviation"
3) Find the mean of the squared deviations (sum them, then divide by the
number of elements); this mean is the "variance"
4) Compute the square root of the variance

This square root is the "standard deviation".
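As a check on the steps above, here they are worked by hand on the example data from the Wikipedia article, {2, 4, 4, 4, 5, 5, 7, 9}, which also appears later in this thread (population form, dividing by n = 8):

```latex
\begin{aligned}
\mu &= \tfrac{1}{8}(2+4+4+4+5+5+7+9) = \tfrac{40}{8} = 5 \\
(x_i-\mu)^2 &= 9,\ 1,\ 1,\ 1,\ 0,\ 0,\ 4,\ 16 \\
\sigma^2 &= \tfrac{1}{8}(9+1+1+1+0+0+4+16) = \tfrac{32}{8} = 4 \\
\sigma &= \sqrt{4} = 2
\end{aligned}
```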
 

Kleuskes & Moos

[snip]
    I really have no way to code this but I don't want anyone to do my
homework. Can someone offer tips or citations as to what I might need to do
here?

Bill


Erwin Kreyszig has a pretty good rundown in 'Introduction to
mathematical statistics, principles and methods', sections 3.2 and 3.3.
It used to be pretty standard when I was in college, so I guess it
should still be available in the library.
 

Lew Pitcher

[snip]


FWIW, from the algorithm and data given on the Wikipedia page, I coded this:

#include <stdio.h>
#include <stdlib.h>
#include <math.h>

double StdDev(unsigned int samplesize, double population[])
{
  double sum, mean;
  unsigned int index;

  if (samplesize == 0) return 0.0;    /* catch obvious error */

  /* compute mean of sample population */
  for (index = 0, sum = 0.0 ; index < samplesize; ++index)
    sum += population[index];
  mean = sum / samplesize;

  /* sum the squared deviations from the mean */
  for (index = 0, sum = 0.0 ; index < samplesize; ++index)
  {
    double delta;

    delta = population[index] - mean;
    sum += (delta * delta);
  }
  return sqrt(sum/samplesize); /* population standard deviation */
}

/*
** Population values taken from the Wikipedia example
*/
int main(void)
{
  double pop[] = {2,4,4,4,5,5,7,9};
  unsigned int popsize = (sizeof(pop) / sizeof(pop[0]));

  printf("The standard deviation = %f\n",StdDev(popsize,pop));

  return EXIT_SUCCESS;
}

When I compile and run this code
  $ cc -o stddev stddev.c -lm
  $ ./stddev
  The standard deviation = 2.000000
  $
I get the same standard deviation value as the Wikipedia article's example.

HTH
 

Bill Cunningham

Lew Pitcher wrote:

[snip]
delta = population[index] - mean;
sum += (delta * delta);

Is this really saying sum=sum+(delta*delta);
And the parentheses are for precedence?
[snip]
double pop[] = {2,4,4,4,5,5,7,9};
unsigned int popsize = (sizeof(pop) / sizeof(pop[0]));

Is the above code the standard thing to use if you have an array and
really don't want to count the number of elements? Using sizeof?
 

Lew Pitcher

Lew Pitcher wrote:

[snip]
delta = population[index] - mean;
sum += (delta * delta);

Is this really saying sum=sum+(delta*delta);
Yes

And the parentheses are for precedence?

Not really. The parentheses here are a visual cue to the programmer. They
are unnecessary for the logic; the expression would compute the same result
without them.
[snip]
unsigned int popsize = (sizeof(pop) / sizeof(pop[0]));

Is the above code the standard thing to use if you have an array and
really don't want to count the number of elements? Using sizeof?

(sizeof(array) / sizeof(array[0])) is a fairly standard way to determine the
number of elements in an array. You could call it an idiom.
 

Fred

On June 5, 2011 14:25, in comp.lang.c, (e-mail address removed) wrote:
[snip]

FWIW, from the algorithm and data given on the Wikipedia page, I coded this

[snip]


The above algorithm, while mathematically correct, is not good enough
for a computer. If your population is very large, or the individual
items in the population vary greatly in magnitude, you may run into
severe truncation and roundoff errors.

A more accurate way is to include the Leveque computational correction,
computing the variance as:

var = { sum[(x - mean)^2] - (1/n)*sum[(x - mean)] } / (n-1)
then stddev = sqrt(var)

Note that computing the mean is not really as simple as summing the
items and dividing by the number of items. What happens on a 32-bit
machine if the first item is of magnitude 10^18, followed by 10^20
items that are of magnitude 1? None of the latter items will
contribute to your sum, and your answer will be a couple of orders of
magnitude in error.
 

Fred

On June 5, 2011 14:57, in comp.lang.c, (e-mail address removed) wrote:
On June 5, 2011 14:25, in comp.lang.c, (e-mail address removed) wrote:
[snip]

Oops, missing a square. The variance with Leveque correction is


{ sum[(x - mean)^2] - (1/n)* sum[(x - mean)]^2 } / (n-1)

i.e., in the first term you sum the squares of x-mean,
and in the second term you square the sum of x-mean

See the Stanford Computer Science report by Chan, Golub, and Leveque.


-- Fred K
 

Nobody

The above algorithm, while mathematically correct, is not good enough
for a computer. If your population is very large, or the individual
items in the population vary greatly in magnitude, you may run into
severe truncation and roundoff errors.

A more accurate way is to include the Leveque computational correction,

While that may be true, it's a minor detail given the amount of software
I've seen which uses the single-pass algorithm:

var := (sum(x^2) - sum(x)^2/n)/n

This can be rather inaccurate if the standard deviation is small compared
to the mean (i.e. the data has a relatively large constant offset).
 

Lew Pitcher

What kind of language is that in the article?

It's not a formal computer language; it is pseudo-code. Pseudo-code is a
programming-language-like way to express algorithms.
 

Keith Thompson

Lew Pitcher said:
It's not a formal computer language; it is pseudo-code. Pseudo-code is a
programming-language like way to express algorithms.

No, it's not pseudo-code, it's Python. (It's odd that the article
never mentions that fact.)

Bill, I've suggested to you before that Python might be a better
language for you than C. I now repeat that suggestion.
 

Bill Cunningham

Bill, I've suggested to you before that Python might be a better
language for you than C. I now repeat that suggestion.

Oh why? It doesn't look sensible to me at all. C++ or Java might be more
understandable than Python. Even Perl.

Bill
 

Keith Thompson

Bill Cunningham said:
Oh why? It doesn't look sensible to me at all. C++ or Java might be more
understandable than Python. Even Perl.

Because the things that have been causing you grief in C all these years
are, to a large extent, things that you wouldn't have to worry about in
Python.

Seriously, does C "look sensible" to you?
 

Bill Cunningham

Keith said:
Because the things that have been causing you grief in C all these
years are, to a large extent, things that you wouldn't have to worry
about in Python.

Seriously, does C "look sensible" to you?

Not really serious stuff no. But I am learning functions. I can use
those real well. Syntactic things seem to be complicated in C. Quite a bit
in C doesn't look sensible to me actually.

Bill
 

Nobody

Bill, I've suggested to you before that Python might be a better language
for you than C. I now repeat that suggestion.

Python has a few pitfalls of its own. Probably the most common one is
the fact that everything is passed by reference.
 

Bill Cunningham

Keith said:
Seriously, does C "look sensible" to you?

I'm just a humble hobbyist. I do want to learn C, or even C++ if I have
to go to higher-level things. The tutorials that I've studied cover some of the
most basic things about C. Algorithms and ways to do things seem to be an
altogether different matter. They don't seem to teach that in tutorials.

Bill
 

Michael Press

Fred said:
Oops, missing a square. The variance with Leveque correction is


{ sum[(x - mean)^2] - (1/n)* sum[(x - mean)]^2 } / (n-1)

i.e., in the first term you sum the squares of x-mean,
and in the second term you square the sum of x-mean


Looks hinky. Perhaps

{ sum[(x - mean)^2] - (1/n)* [sum(x - mean)]^2 } / (n-1)
 
