Good binary PRNG

P

pinkfloydhomer

Using rand() in and old version og gcc, and using Tausworth's method, I
calculated the frequency of 0 or 1 in the first digit like this:

int hist[2] = {0,0};

for (i = 0; i < 100000; ++i)
{
++hist[prng() % 2];
}

For some reason, in both cases, 0 occurs more often than 1, no matter
what seed I use to initialize the prng with. Something like 50150 vs
49850 in the above code.

Does anybody know of a good "binary" PRNG, that is, one that in the
above case would make something closer to 50000 vs 50000?

/David
 
F

Flash Gordon

Using rand() in and old version og gcc, and using Tausworth's method, I
calculated the frequency of 0 or 1 in the first digit like this:

int hist[2] = {0,0};

for (i = 0; i < 100000; ++i)
{
++hist[prng() % 2];
}

For some reason, in both cases, 0 occurs more often than 1, no matter
what seed I use to initialize the prng with. Something like 50150 vs
49850 in the above code.

Does anybody know of a good "binary" PRNG, that is, one that in the
above case would make something closer to 50000 vs 50000?

I suggest you start with the comp.lang.c FAQ which as question 13.15 has
"I need a random number generator". You can find the FAQ at
http://c-faq.com/
 
R

Rod Pemberton

Using rand() in and old version og gcc, and using Tausworth's method, I
calculated the frequency of 0 or 1 in the first digit like this:

int hist[2] = {0,0};

for (i = 0; i < 100000; ++i)
{
++hist[prng() % 2];
}

For some reason, in both cases, 0 occurs more often than 1, no matter
what seed I use to initialize the prng with. Something like 50150 vs
49850 in the above code.

Does anybody know of a good "binary" PRNG, that is, one that in the
above case would make something closer to 50000 vs 50000?

Many PRNG's (Mersienne Twister, ISO C, etc.) seem to hit a binary "sweet
spot" after calling rand() about 7 million times.

Try the following code with your compiler or PRNG:

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

unsigned long hist[2] = {0,0};

int main(void)
{
unsigned long i, value;

srand(NULL);

for (i = 0; i < 7000000; ++i)
{
rand();
}
for (i = 0; i < 100000; ++i)
{
value=rand();
++hist[value%2];
}
printf("%ld %ld\n",hist[0],hist[1]);

return(EXIT_SUCCESS);
}


Rod Pemberton
 
B

Ben Pfaff

Keith Thompson said:
srand(time(NULL));

This used to come up fairly often, so I have a stock answer on
it:

By default, the C random number generator produces the same
sequence every time the program is run. In order to generate
different sequences, it has to be "seeded" using srand() with a
unique value. The function below to do this is carefully
designed. It uses time() to obtain the current time; the
alternative clock() is a poor choice because it measures CPU time
used, which is often more or less constant among runs. The
actual value of a time_t is not portable, so it computes a "hash"
of the bytes in it using a multiply-and-add technique. The
factor used for multiplication normally comes out as 257, a prime
and therefore a good candidate.

References: Knuth, _The Art of Computer Programming, Vol. 2:
Seminumerical Algorithms_, section 3.2.1; Aho, Sethi, and Ullman,
_Compilers: Principles, Techniques, and Tools_, section 7.6.

#include <limits.h>
#include <stdlib.h>
#include <time.h>

/* Choose and return an initial random seed based on the current time.
Based on code by Lawrence Kirby <[email protected]>.
Usage: srand (time_seed ()); */
unsigned
time_seed (void)
{
time_t timeval; /* Current time. */
unsigned char *ptr; /* Type punned pointed into timeval. */
unsigned seed; /* Generated seed. */
size_t i;

timeval = time (NULL);
ptr = (unsigned char *) &timeval;

seed = 0;
for (i = 0; i < sizeof timeval; i++)
seed = seed * (UCHAR_MAX + 2u) + ptr;

return seed;
}
 
E

Eric Sosman

Using rand() in and old version og gcc, and using Tausworth's method, I
calculated the frequency of 0 or 1 in the first digit like this:

int hist[2] = {0,0};

for (i = 0; i < 100000; ++i)
{
++hist[prng() % 2];
}

For some reason, in both cases, 0 occurs more often than 1, no matter
what seed I use to initialize the prng with. Something like 50150 vs
49850 in the above code.

Does anybody know of a good "binary" PRNG, that is, one that in the
above case would make something closer to 50000 vs 50000?

Note that the 50150 you mention is less than one-sixth
of one percent greater than a perfect fifty-fifty split. How
much "closer" do you think you need to be? How much "closer"
do you think a random sequence *ought* to be?

Let's see: If the outcomes (1 or 0) are equiprobable and
you make 100000 independent trials, the probability of getting
exactly 50000 of each outcome is the binomial probability

100000!
--------------- * 2^(-100000)
50000! * 50000!

If I'm not mistaken (I might be; these numbers are BIG), this
comes to 0.0025231262141967+; you can only expect a "perfect"
split about once in four hundred trials of a hundred thousand
numbers each. How many times have you run your experiment?

Let's see, take two: The expected number of 1's is 50000.
The variance is 100000*0.5*0.5 = 25000, so the standard
deviation is a little more than 158. You report a discrepancy
of 150, which is less than one standard deviation away from
a "perfect" split. I ask again: How much closer do you think
a random sequence can be while still being "random?"

Flip a coin twice. If get something other than heads once
and tails once, do you conclude that the coin is biased?
 
K

Keith Thompson

Eric Sosman said:
Using rand() in and old version og gcc, and using Tausworth's method, I
calculated the frequency of 0 or 1 in the first digit like this:
int hist[2] = {0,0};
for (i = 0; i < 100000; ++i)
{
++hist[prng() % 2];
}
For some reason, in both cases, 0 occurs more often than 1, no matter
what seed I use to initialize the prng with. Something like 50150 vs
49850 in the above code.
Does anybody know of a good "binary" PRNG, that is, one that in the
above case would make something closer to 50000 vs 50000?

Note that the 50150 you mention is less than one-sixth
of one percent greater than a perfect fifty-fifty split. How
much "closer" do you think you need to be? How much "closer"
do you think a random sequence *ought* to be?

The difference very likely isn't significant, but it might be.

If a single run of 100000 trials differs from a 50/50 split by a
fraction of a percent, it's not just harmless, it's expected. But if
N runs of 100000 are *consistently* over 50%, there's probably a
systematic error. (I'm deliberately leaving the value of N vague.)

I suspect that the OP hasn't done enough runs to demonstrate an actual
bias, but I'm only guessing.

[...]
Flip a coin twice. If get something other than heads once
and tails once, do you conclude that the coin is biased?

No, but if I consistently get more heads than tails, it probably is
biased.
 
P

pete

Keith Thompson wrote:
If a single run of 100000 trials differs from a 50/50 split by a
fraction of a percent, it's not just harmless, it's expected. But if
N runs of 100000 are *consistently* over 50%, there's probably a
systematic error. (I'm deliberately leaving the value of N vague.)

I suspect that the OP hasn't done enough runs to demonstrate an actual
bias, but I'm only guessing.

[...]
Flip a coin twice. If get something other than heads once
and tails once, do you conclude that the coin is biased?

No, but if I consistently get more heads than tails, it probably is
biased.

If you consistently get heads and tails alternating,
that's not very good either.

1, 2300471611
0, 3253914372
1, 1858424313
0, 116628778
1, 2369750119
0, 3657615680
1, 2176380933
0, 894631110
1, 3876978771
0, 721092924


/* BEGIN new.c */

#include <stdio.h>

#define LU_RAND(S) ((S) * 69069 + 362437 & 0XFFFFFFFFLU)

int main(void)
{
long unsigned lu_seed = 12345678LU;
int x;

for (x = 0; x != 10; ++x) {
lu_seed = LU_RAND(lu_seed);
printf("%lu, %lu\n", lu_seed % 2, lu_seed);
}
return 0;
}

/* END new.c */
 
R

Richard Heathfield

Keith Thompson said:
No, but if I consistently get more heads than tails, it probably is
biased.

Not so. The "law of averages" does not work by evening out differences, but
by swamping them.

Coin-flipping is isomorphic to a random walk in one dimension. Start at 0.
If you toss heads, go right one step. If you toss tails, go left one step.
The most probable number of sign changes in a walk is actually 0. The next
most probable is 1, then 2, and so on. So consistently getting more heads
than tails does not necessarily, or even probably, mean that the coin is
biased.
 
J

Jordan Abel

Keith said:
If a single run of 100000 trials differs from a 50/50 split by a
fraction of a percent, it's not just harmless, it's expected. But if
N runs of 100000 are *consistently* over 50%, there's probably a
systematic error. (I'm deliberately leaving the value of N vague.)

I suspect that the OP hasn't done enough runs to demonstrate an actual
bias, but I'm only guessing.

[...]
Flip a coin twice. If get something other than heads once
and tails once, do you conclude that the coin is biased?

No, but if I consistently get more heads than tails, it probably is
biased.

If you consistently get heads and tails alternating,
that's not very good either.

1, 2300471611
0, 3253914372
1, 1858424313
0, 116628778
1, 2369750119
0, 3657615680
1, 2176380933
0, 894631110
1, 3876978771
0, 721092924

Consistently heads and tails alternating would be
2863311530
2863311530
2863311530
2863311530
2863311530
2863311530
2863311530
2863311530

Yours is more like
HTTTHTTHTTTHHHHTTHHTHTTHTTHHHTHH HHTTTTTHHHHHTTHTHHTTHTHHTTTTTHTT
THHTHHHTHHTTTHTHTHTTHHTHHHHHHTTH TTTTTHHTHHHHTTHHHTTHHHTHTTHTHTHT

It's hardly the algorithm's fault you're only taking every 32nd result.
It may be its fault that it's biased in such a pattern, but it wouldn't
be as noticeable if you wouldn't use it that way.
 
J

Jordan Abel

Keith Thompson said:


Not so. The "law of averages" does not work by evening out differences, but
by swamping them.

Coin-flipping is isomorphic to a random walk in one dimension. Start at 0.
If you toss heads, go right one step. If you toss tails, go left one step.
The most probable number of sign changes in a walk is actually 0. The next
most probable is 1, then 2, and so on. So consistently getting more heads
than tails does not necessarily, or even probably, mean that the coin is
biased.

But if you consistently get a particular _percentage_ more, it does.

[i.e. 501 in a trial run of 1000, 5032 in a trial run of 10000, 50290 in
a trial run of 1000000, etc] that _can_ be indicative of bias. So is an
average of 510 across a large number of independent runs of 1000. What
you're saying, i think, is that 510 in 1000, then a total of 5012 in
that and the next 9000, then a total of 50008 in that and the next
90000, doesn't indicate bias - and it doesn't. But that's not the only,
or even the most obvious, thing that "consistently getting more heads"
can mean.

[incidentally, you say the most probable number of sign changes is
actually 0 - that sounds a bit misleading - is 0 really more probable
than all other numbers of sign changes combined? What _is_ the
probability of there being 0 sign changes, in a random walk of arbitrary
length?]
 
R

Richard Heathfield

Jordan Abel said:
But if you consistently get a particular _percentage_ more, it does.

Correct.

[incidentally, you say the most probable number of sign changes is
actually 0 - that sounds a bit misleading - is 0 really more probable
than all other numbers of sign changes combined?

I didn't mean to imply that, and I don't know whether or not it's true.
What _is_ the
probability of there being 0 sign changes, in a random walk of arbitrary
length?]

Scary and hard to transcribe in text! :)

http://mathworld.wolfram.com/RandomWalk1-Dimensional.html may help the more
mathematical amongst us (i.e. not me) to get a clue. Failing that, I can
probably lay my hands on a less scary formula, given a few days to plough
through all my Ian Stewarts...
 
R

Rod Pemberton

pete said:
A cast would supress warnings that might be generated
if time_t is higher ranking than unsigned.

srand((unsigned)time(NULL));


True. srand(time(NULL)) was unintentionally dropped during an edit.

My original statements:

"Many PRNG's (Mersienne Twister, ISO C, etc.) seem to hit a binary "sweet
spot" after calling rand() about 7 million times."

won't apply when srand() is seeded correctly.


Rod Pemberton
 
K

Keith Thompson

Richard Heathfield said:
Keith Thompson said:

Not so. The "law of averages" does not work by evening out differences, but
by swamping them.

Coin-flipping is isomorphic to a random walk in one dimension. Start at 0.
If you toss heads, go right one step. If you toss tails, go left one step.
The most probable number of sign changes in a walk is actually 0. The next
most probable is 1, then 2, and so on. So consistently getting more heads
than tails does not necessarily, or even probably, mean that the coin is
biased.

Then please interpret the phrase "consistently get more heads than
tails" so as to make my statement correct. :cool:}

For example, if I do multiple runs of 100000 flips, and I consistently
get (even just slightly) more heads than tails on each run, the coin
is probably biased.
 
R

Richard Heathfield

Keith Thompson said:
Then please interpret the phrase "consistently get more heads than
tails" so as to make my statement correct. :cool:}

Righty-ho, squire - re-interpreting now...

Oh gosh yes, you're right, aren't you? ;-)
 
P

pinkfloydhomer

The problems is not that I don't get exactly 50/50 (which would be
improbable), but that the skew is _always_ towards the 0's, no matter
what seed I use. I need a PRNG that at least biases 0's and 1's
equally. That is, the number of runs where 0's win vs the number of
runs where 1's win should be 50/50 given enough runs :)

Let's say you had to make a _fair_ system that randomly chooses which
of two people should get a dollar on each run (the scenario might be
much more serious, making the fairness infinitely important). What
would you do? If your PRNG always slightly favors one, then it cannot
be called fair.

But on the other hand, it occurs to me, if we had a fair binary PRNG,
then we would have a perfect general PRNG since we could just string
the results of the binary PRNG together to make multibit words. And
since a perfect PRNG doesn't exist yet, neither does a solution to the
above problem, I guess.

/David
 
B

Ben Pfaff

The problems is not that I don't get exactly 50/50 (which would be
improbable), but that the skew is _always_ towards the 0's, no matter
what seed I use. I need a PRNG that at least biases 0's and 1's
equally. That is, the number of runs where 0's win vs the number of
runs where 1's win should be 50/50 given enough runs :)

There's a PRNG that should have equal bias on my webpage:
http://benpfaff.org/writings/clc/random.html
 
R

Richard Heathfield

(e-mail address removed) said:
The problems is not that I don't get exactly 50/50 (which would be
improbable), but that the skew is _always_ towards the 0's, no matter
what seed I use. I need a PRNG that at least biases 0's and 1's
equally. That is, the number of runs where 0's win vs the number of
runs where 1's win should be 50/50 given enough runs :)

Oh, that's easy to fix. To toss an unfair coin fairly: let p be the
probability that you get heads, and q the probability that you get tails.

Probability of HH = p * p
Probability of HT = p * q
Probability of TH = q * p
Probability of TT = q * q

But q = 1 - p, so:

Probability of HH = p^2
Probability of HT = p * (1 - p) = p - p^2
Probability of TH = (1 - p) * p = p - p^2
Probability of TT = (1 - p) * (1 - p) = 1 - 2p + p^2

Note that the probability of HT and the probability of TH are equal, despite
p not necessarily equalling q. So if you get HT or TH, call the first of
them heads and the second tails. If you get anything else, discard the two
tosses, and toss again.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,744
Messages
2,569,484
Members
44,904
Latest member
HealthyVisionsCBDPrice

Latest Threads

Top