# Good binary PRNG

Discussion in 'C Programming' started by pinkfloydhomer@gmail.com, May 15, 2006.

1. ### Guest

Using rand() in an old version of gcc, and using Tausworthe's method, I
calculated the frequency of 0 or 1 in the first digit like this:

int hist[2] = {0,0};

for (i = 0; i < 100000; ++i)
{
    ++hist[prng() % 2];
}

For some reason, in both cases, 0 occurs more often than 1, no matter
what seed I use to initialize the prng with. Something like 50150 vs
49850 in the above code.

Does anybody know of a good "binary" PRNG, that is, one that in the
above case would make something closer to 50000 vs 50000?

/David

, May 15, 2006

2. ### Flash GordonGuest

wrote:
> Using rand() in an old version of gcc, and using Tausworthe's method, I
> calculated the frequency of 0 or 1 in the first digit like this:
>
> int hist[2] = {0,0};
>
> for (i = 0; i < 100000; ++i)
> {
> ++hist[prng() % 2];
> }
>
> For some reason, in both cases, 0 occurs more often than 1, no matter
> what seed I use to initialize the prng with. Something like 50150 vs
> 49850 in the above code.
>
> Does anybody know of a good "binary" PRNG, that is, one that in the
> above case would make something closer to 50000 vs 50000?

I suggest you start with the comp.lang.c FAQ which as question 13.15 has
"I need a random number generator". You can find the FAQ at
http://c-faq.com/
--
Flash Gordon, living in interesting times.
Web site - http://home.flash-gordon.me.uk/
comp.lang.c posting guidelines and intro:
http://clc-wiki.net/wiki/Intro_to_clc

Flash Gordon, May 15, 2006

3. ### Rod PembertonGuest

<> wrote in message
news:...
> Using rand() in an old version of gcc, and using Tausworthe's method, I
> calculated the frequency of 0 or 1 in the first digit like this:
>
> int hist[2] = {0,0};
>
> for (i = 0; i < 100000; ++i)
> {
> ++hist[prng() % 2];
> }
>
> For some reason, in both cases, 0 occurs more often than 1, no matter
> what seed I use to initialize the prng with. Something like 50150 vs
> 49850 in the above code.
>
> Does anybody know of a good "binary" PRNG, that is, one that in the
> above case would make something closer to 50000 vs 50000?
>

Many PRNGs (Mersenne Twister, ISO C, etc.) seem to hit a binary "sweet
spot" after calling rand() about 7 million times.

Try the following code with your compiler or PRNG:

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

unsigned long hist[2] = {0,0};

int main(void)
{
    unsigned long i, value;

    srand(NULL);

    for (i = 0; i < 7000000; ++i)
    {
        rand();
    }
    for (i = 0; i < 100000; ++i)
    {
        value = rand();
        ++hist[value % 2];
    }
    printf("%lu %lu\n", hist[0], hist[1]);

    return EXIT_SUCCESS;
}

Rod Pemberton

Rod Pemberton, May 18, 2006
4. ### Keith ThompsonGuest

"Rod Pemberton" <> writes:
[...]
> srand(NULL);

srand(time(NULL));

--
Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.

Keith Thompson, May 18, 2006
5. ### peteGuest

Keith Thompson wrote:
>
> "Rod Pemberton" <> writes:
> [...]
> > srand(NULL);

>
> srand(time(NULL));

A cast would suppress warnings that might be generated
if time_t is higher ranking than unsigned.

srand((unsigned)time(NULL));

--
pete

pete, May 18, 2006
6. ### Ben PfaffGuest

Keith Thompson <> writes:

> "Rod Pemberton" <> writes:
> [...]
>> srand(NULL);

>
> srand(time(NULL));

This used to come up fairly often, so I have a stock answer on
it:

By default, the C random number generator produces the same
sequence every time the program is run. In order to generate
different sequences, it has to be "seeded" using srand() with a
unique value. The function below to do this is carefully
designed. It uses time() to obtain the current time; the
alternative clock() is a poor choice because it measures CPU time
used, which is often more or less constant among runs. The
actual value of a time_t is not portable, so it computes a "hash"
of the bytes in it using a multiply-and-add technique. The
factor used for multiplication normally comes out as 257, a prime
and therefore a good candidate.

References: Knuth, _The Art of Computer Programming, Vol. 2:
Seminumerical Algorithms_, section 3.2.1; Aho, Sethi, and Ullman,
_Compilers: Principles, Techniques, and Tools_, section 7.6.

#include <limits.h>
#include <stdlib.h>
#include <time.h>

/* Choose and return an initial random seed based on the current time.
Based on code by Lawrence Kirby <>.
Usage: srand (time_seed ()); */
unsigned
time_seed (void)
{
  time_t timeval;          /* Current time. */
  unsigned char *ptr;      /* Type-punned pointer into timeval. */
  unsigned seed;           /* Generated seed. */
  size_t i;

  timeval = time (NULL);
  ptr = (unsigned char *) &timeval;

  seed = 0;
  for (i = 0; i < sizeof timeval; i++)
    seed = seed * (UCHAR_MAX + 2u) + ptr[i];

  return seed;
}

--
"Large amounts of money tend to quench any scruples I might be having."
-- Stephan Wilms

Ben Pfaff, May 18, 2006
7. ### Eric SosmanGuest

wrote:
> Using rand() in an old version of gcc, and using Tausworthe's method, I
> calculated the frequency of 0 or 1 in the first digit like this:
>
> int hist[2] = {0,0};
>
> for (i = 0; i < 100000; ++i)
> {
> ++hist[prng() % 2];
> }
>
> For some reason, in both cases, 0 occurs more often than 1, no matter
> what seed I use to initialize the prng with. Something like 50150 vs
> 49850 in the above code.
>
> Does anybody know of a good "binary" PRNG, that is, one that in the
> above case would make something closer to 50000 vs 50000?

Note that the 50150 you mention is less than one-sixth
of one percent greater than a perfect fifty-fifty split. How
much "closer" do you think you need to be? How much "closer"
do you think a random sequence *ought* to be?

Let's see: If the outcomes (1 or 0) are equiprobable and
you make 100000 independent trials, the probability of getting
exactly 50000 of each outcome is the binomial probability

100000!
--------------- * 2^(-100000)
50000! * 50000!

If I'm not mistaken (I might be; these numbers are BIG), this
comes to 0.0025231262141967+; you can only expect a "perfect"
split about once in four hundred trials of a hundred thousand
numbers each. How many times have you run your experiment?

Let's see, take two: The expected number of 1's is 50000.
The variance is 100000*0.5*0.5 = 25000, so the standard
deviation is a little more than 158. You report a discrepancy
of 150, which is less than one standard deviation away from
a "perfect" split. I ask again: How much closer do you think
a random sequence can be while still being "random?"

Flip a coin twice. If you get something other than heads once
and tails once, do you conclude that the coin is biased?

--
Eric Sosman
lid

Eric Sosman, May 18, 2006
8. ### Keith ThompsonGuest

Eric Sosman <> writes:
> wrote:
>> Using rand() in an old version of gcc, and using Tausworthe's method, I
>> calculated the frequency of 0 or 1 in the first digit like this:
>> int hist[2] = {0,0};
>> for (i = 0; i < 100000; ++i)
>> {
>> ++hist[prng() % 2];
>> }
>> For some reason, in both cases, 0 occurs more often than 1, no matter
>> what seed I use to initialize the prng with. Something like 50150 vs
>> 49850 in the above code.
>> Does anybody know of a good "binary" PRNG, that is, one that in the
>> above case would make something closer to 50000 vs 50000?

>
> Note that the 50150 you mention is less than one-sixth
> of one percent greater than a perfect fifty-fifty split. How
> much "closer" do you think you need to be? How much "closer"
> do you think a random sequence *ought* to be?

The difference very likely isn't significant, but it might be.

If a single run of 100000 trials differs from a 50/50 split by a
fraction of a percent, it's not just harmless, it's expected. But if
N runs of 100000 are *consistently* over 50%, there's probably a
systematic error. (I'm deliberately leaving the value of N vague.)

I suspect that the OP hasn't done enough runs to demonstrate an actual
bias, but I'm only guessing.

[...]

> Flip a coin twice. If you get something other than heads once
> and tails once, do you conclude that the coin is biased?

No, but if I consistently get more heads than tails, it probably is
biased.

--
Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.

Keith Thompson, May 18, 2006
9. ### peteGuest

Keith Thompson wrote:

> If a single run of 100000 trials differs from a 50/50 split by a
> fraction of a percent, it's not just harmless, it's expected. But if
> N runs of 100000 are *consistently* over 50%, there's probably a
> systematic error. (I'm deliberately leaving the value of N vague.)
>
> I suspect that the OP hasn't done enough runs to demonstrate an actual
> bias, but I'm only guessing.
>
> [...]
>
> > Flip a coin twice. If you get something other than heads once
> > and tails once, do you conclude that the coin is biased?

>
> No, but if I consistently get more heads than tails, it probably is
> biased.

If you consistently get heads and tails alternating,
that's not very good either.

1, 2300471611
0, 3253914372
1, 1858424313
0, 116628778
1, 2369750119
0, 3657615680
1, 2176380933
0, 894631110
1, 3876978771
0, 721092924

/* BEGIN new.c */

#include <stdio.h>

#define LU_RAND(S) ((S) * 69069 + 362437 & 0XFFFFFFFFLU)

int main(void)
{
    long unsigned lu_seed = 12345678LU;
    int x;

    for (x = 0; x != 10; ++x) {
        lu_seed = LU_RAND(lu_seed);
        printf("%lu, %lu\n", lu_seed % 2, lu_seed);
    }
    return 0;
}

/* END new.c */

--
pete

pete, May 18, 2006
10. ### CBFalconerGuest

Keith Thompson wrote:
> Eric Sosman <> writes:
>

.... snip ...
>
>> Flip a coin twice. If you get something other than heads once
>> and tails once, do you conclude that the coin is biased?

>
> No, but if I consistently get more heads than tails, it probably
> is biased.

Look up the chi-square test.

--
news:news.announce.newusers
http://www.geocities.com/nnqweb/
http://www.catb.org/~esr/faqs/smart-questions.html
http://www.caliburn.nl/topposting.html
http://www.netmeister.org/news/learn2quote.html

CBFalconer, May 18, 2006
11. ### Richard HeathfieldGuest

Keith Thompson said:

> Eric Sosman <> writes:
>
>> Flip a coin twice. If you get something other than heads once
>> and tails once, do you conclude that the coin is biased?

>
> No, but if I consistently get more heads than tails, it probably is
> biased.

Not so. The "law of averages" does not work by evening out differences, but
by swamping them.

Coin-flipping is isomorphic to a random walk in one dimension. Start at 0.
If you toss heads, go right one step. If you toss tails, go left one step.
The most probable number of sign changes in a walk is actually 0. The next
most probable is 1, then 2, and so on. So consistently getting more heads
than tails does not necessarily, or even probably, mean that the coin is
biased.

--
Richard Heathfield
"Usenet is a strange place" - dmr 29/7/1999
http://www.cpax.org.uk
email: rjh at above domain (but drop the www, obviously)

Richard Heathfield, May 18, 2006
12. ### Jordan AbelGuest

On 2006-05-18, pete <> wrote:
> Keith Thompson wrote:
>
>> If a single run of 100000 trials differs from a 50/50 split by a
>> fraction of a percent, it's not just harmless, it's expected. But if
>> N runs of 100000 are *consistently* over 50%, there's probably a
>> systematic error. (I'm deliberately leaving the value of N vague.)
>>
>> I suspect that the OP hasn't done enough runs to demonstrate an actual
>> bias, but I'm only guessing.
>>
>> [...]
>>
>> > Flip a coin twice. If you get something other than heads once
>> > and tails once, do you conclude that the coin is biased?

>>
>> No, but if I consistently get more heads than tails, it probably is
>> biased.

>
> If you consistently get heads and tails alternating,
> that's not very good either.
>
> 1, 2300471611
> 0, 3253914372
> 1, 1858424313
> 0, 116628778
> 1, 2369750119
> 0, 3657615680
> 1, 2176380933
> 0, 894631110
> 1, 3876978771
> 0, 721092924

Consistently heads and tails alternating would be
2863311530
2863311530
2863311530
2863311530
2863311530
2863311530
2863311530
2863311530

Yours is more like
HTTTHTTHTTTHHHHTTHHTHTTHTTHHHTHH HHTTTTTHHHHHTTHTHHTTHTHHTTTTTHTT
THHTHHHTHHTTTHTHTHTTHHTHHHHHHTTH TTTTTHHTHHHHTTHHHTTHHHTHTTHTHTHT

It's hardly the algorithm's fault that you're only taking every 32nd bit
of its output. It may be its fault that it's biased in such a pattern,
but the bias wouldn't be as noticeable if you didn't use it that way.

>
>
> /* BEGIN new.c */
>
> #include <stdio.h>
>
> #define LU_RAND(S) ((S) * 69069 + 362437 & 0XFFFFFFFFLU)
>
> int main(void)
> {
> long unsigned lu_seed = 12345678LU;
> int x;
>
> for (x = 0; x != 10; ++x) {
> lu_seed = LU_RAND(lu_seed);
> printf("%lu, %lu\n", lu_seed % 2, lu_seed);
> }
> return 0;
>}
>
> /* END new.c */
>

Jordan Abel, May 18, 2006
13. ### Jordan AbelGuest

On 2006-05-18, Richard Heathfield <> wrote:
> Keith Thompson said:
>
>> Eric Sosman <> writes:
>>
>>> Flip a coin twice. If you get something other than heads once
>>> and tails once, do you conclude that the coin is biased?

>>
>> No, but if I consistently get more heads than tails, it probably is
>> biased.

>
> Not so. The "law of averages" does not work by evening out differences, but
> by swamping them.
>
> Coin-flipping is isomorphic to a random walk in one dimension. Start at 0.
> If you toss heads, go right one step. If you toss tails, go left one step.
> The most probable number of sign changes in a walk is actually 0. The next
> most probable is 1, then 2, and so on. So consistently getting more heads
> than tails does not necessarily, or even probably, mean that the coin is
> biased.

But if you consistently get a particular _percentage_ more, it does.

[i.e. 501 in a trial run of 1000, 5032 in a trial run of 10000, 50290 in
a trial run of 100000, etc.] That _can_ be indicative of bias. So is an
average of 510 across a large number of independent runs of 1000. What
you're saying, I think, is that 510 in 1000, then a total of 5012 in
that and the next 9000, then a total of 50008 in that and the next
90000, doesn't indicate bias. And it doesn't. But that's not the only,
or even the most obvious, thing that "consistently getting more heads"
can mean.

[incidentally, you say the most probable number of sign changes is
actually 0 - that sounds a bit misleading - is 0 really more probable
than all other numbers of sign changes combined? What _is_ the
probability of there being 0 sign changes, in a random walk of arbitrary
length?]

Jordan Abel, May 18, 2006
14. ### Richard HeathfieldGuest

Jordan Abel said:

> On 2006-05-18, Richard Heathfield <> wrote:
>> So consistently getting
>> more heads than tails does not necessarily, or even probably, mean that
>> the coin is biased.

>
> But if you consistently get a particular _percentage_ more, it does.

Correct.

<snip>

> [incidentally, you say the most probable number of sign changes is
> actually 0 - that sounds a bit misleading - is 0 really more probable
> than all other numbers of sign changes combined?

I didn't mean to imply that, and I don't know whether or not it's true.

> What _is_ the
> probability of there being 0 sign changes, in a random walk of arbitrary
> length?]

Scary and hard to transcribe in text!

http://mathworld.wolfram.com/RandomWalk1-Dimensional.html may help the more
mathematical amongst us (i.e. not me) to get a clue. Failing that, I can
probably lay my hands on a less scary formula, given a few days to plough
through all my Ian Stewarts...

--
Richard Heathfield
"Usenet is a strange place" - dmr 29/7/1999
http://www.cpax.org.uk
email: rjh at above domain (but drop the www, obviously)

Richard Heathfield, May 18, 2006
15. ### Rod PembertonGuest

"pete" <> wrote in message
news:...
> Keith Thompson wrote:
> >
> > "Rod Pemberton" <> writes:
> > [...]
> > > srand(NULL);

> >
> > srand(time(NULL));

>
> A cast would suppress warnings that might be generated
> if time_t is higher ranking than unsigned.
>
> srand((unsigned)time(NULL));

True. srand(time(NULL)) was unintentionally dropped during an edit.

My original statements:

"Many PRNGs (Mersenne Twister, ISO C, etc.) seem to hit a binary "sweet
spot" after calling rand() about 7 million times."

won't apply when srand() is seeded correctly.

Rod Pemberton

Rod Pemberton, May 18, 2006
16. ### Keith ThompsonGuest

Richard Heathfield <> writes:
> Keith Thompson said:
>> Eric Sosman <> writes:
>>> Flip a coin twice. If you get something other than heads once
>>> and tails once, do you conclude that the coin is biased?

>>
>> No, but if I consistently get more heads than tails, it probably is
>> biased.

>
> Not so. The "law of averages" does not work by evening out differences, but
> by swamping them.
>
> Coin-flipping is isomorphic to a random walk in one dimension. Start at 0.
> If you toss heads, go right one step. If you toss tails, go left one step.
> The most probable number of sign changes in a walk is actually 0. The next
> most probable is 1, then 2, and so on. So consistently getting more heads
> than tails does not necessarily, or even probably, mean that the coin is
> biased.

{ Re-interpret "consistently get more heads than tails" so as to make my
statement correct. }

For example, if I do multiple runs of 100000 flips, and I consistently
get (even just slightly) more heads than tails on each run, the coin
is probably biased.

--
Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.

Keith Thompson, May 18, 2006
17. ### Richard HeathfieldGuest

Keith Thompson said:

> Richard Heathfield <> writes:
>> So consistently getting
>> more heads than tails does not necessarily, or even probably, mean that
>> the coin is biased.

>
> { Re-interpret "consistently get more heads than tails" so as to make my
> statement correct. }

Righty-ho, squire - re-interpreting now...

Oh gosh yes, you're right, aren't you? ;-)

--
Richard Heathfield
"Usenet is a strange place" - dmr 29/7/1999
http://www.cpax.org.uk
email: rjh at above domain (but drop the www, obviously)

Richard Heathfield, May 18, 2006
18. ### Guest

The problem is not that I don't get exactly 50/50 (which would be
improbable), but that the skew is _always_ towards the 0's, no matter
what seed I use. I need a PRNG that at least biases 0's and 1's
equally. That is, the number of runs where 0's win vs. the number of
runs where 1's win should be 50/50, given enough runs.

Let's say you had to make a _fair_ system that randomly chooses which
of two people should get a dollar on each run (the scenario might be
much more serious, making the fairness infinitely important). What
would you do? If your PRNG always slightly favors one, then it cannot
be called fair.

But on the other hand, it occurs to me, if we had a fair binary PRNG,
then we would have a perfect general PRNG since we could just string
the results of the binary PRNG together to make multibit words. And
since a perfect PRNG doesn't exist yet, neither does a solution to the
above problem, I guess.

/David

, May 18, 2006
19. ### Ben PfaffGuest

"" <> writes:

> The problem is not that I don't get exactly 50/50 (which would be
> improbable), but that the skew is _always_ towards the 0's, no matter
> what seed I use. I need a PRNG that at least biases 0's and 1's
> equally. That is, the number of runs where 0's win vs. the number of
> runs where 1's win should be 50/50, given enough runs.

There's a PRNG that should have equal bias on my webpage:
http://benpfaff.org/writings/clc/random.html
--
int main(void){char p[]="ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz.\
\n",*q="kl BIcNBFr.NKEzjwCIxNJC";int i=sizeof p/2;char *strchr();int putchar(\
);while(*q){i+=strchr(p,*q++)-p;if(i>=(int)sizeof p)i-=sizeof p-1;putchar(p[i]\
);}return 0;}

Ben Pfaff, May 18, 2006
20. ### Richard HeathfieldGuest

said:

> The problems is not that I don't get exactly 50/50 (which would be
> improbable), but that the skew is _always_ towards the 0's, no matter
> what seed I use. I need a PRNG that at least biases 0's and 1's
> equally. That is, the number of runs where 0's win vs the number of
> runs where 1's win should be 50/50 given enough runs

Oh, that's easy to fix. To toss an unfair coin fairly: let p be the
probability that you get heads, and q the probability that you get tails.

Probability of HH = p * p
Probability of HT = p * q
Probability of TH = q * p
Probability of TT = q * q

But q = 1 - p, so:

Probability of HH = p^2
Probability of HT = p * (1 - p) = p - p^2
Probability of TH = (1 - p) * p = p - p^2
Probability of TT = (1 - p) * (1 - p) = 1 - 2p + p^2

Note that the probability of HT and the probability of TH are equal, despite
p not necessarily equalling q. So if you get HT or TH, call the first of
them heads and the second tails. If you get anything else, discard the two
tosses, and toss again.

--
Richard Heathfield
"Usenet is a strange place" - dmr 29/7/1999
http://www.cpax.org.uk
email: rjh at above domain (but drop the www, obviously)

Richard Heathfield, May 18, 2006