# RNGs: A double KISS

G

#### geo

The KISS (Keep-It Simple-Stupid) principle
appears to be one of the better ways to get
fast and reliable random numbers in a computer,
by combining several simple methods.

Most RNGs produce random integers---usually 32-bit---
that must be converted to floating point numbers
for applications. I offer here a posting
describing my latest version of a KISS RNG that
produces double-precision uniform random variables
directly, without need for the usual integer-to-float
conversion. It is an extension of a method I provided
for MATLAB, with details described in
G.Marsaglia and W.W.Tsang, The 64-bit universal RNG,
Statistics & Probability Letters, V66, Issue 2,2004.

Rather than combining simple lagged Fibonacci and Weyl
sequences, as in those versions, this latest version combines
a lag-1220 CSWB (Complimentary-Subtract-With-Borrow) sequence,
x[n]=x[n-1220-x[n-1190]-borrow \mod b=2^53, period >10^19462
with a lag-2 SWB (Subtract-With-Borrow) sequence,
z[n]=z[n-1]-z[n-2]-borrow mod b=2^53. period>10^30.
Details on how to maintain exact integer arithmetic
with only float operations are in the article above.

This double-precision RNG, called dUNI, has an immense period,
around 10^19492 versus a previous 10^61, requires just a few
lines of code, is very fast, some 70 million per second, and
---a key feature---behaves very well on tests of randomness.
The extensive tests of The Diehard Battery, designed for 32-bit
random integers, are applied to bits 1 to 32, then 2 to 33,
then 3 to 34,...,then finally bits 21 to 53 for each part of
the string of 53 bits in each double-precision float dUNI().

A C version is listed below, but as only tests of magnitude
and floating point addition or subtraction are required,
implementations in other languages could be made available.
I invite such from interested readers.

If you cut, paste, compile and run the following C code,
it should take around 14 seconds to generate 1000 million
dUNI's, followed by the output 0.6203646342357479.
------------------------------------------------------------

/*
dUNI(), a double-precision uniform RNG based
on the KISS (Keep-It-Simple-Stupid) principle,
combining, by subtraction mod 1, a CSWB
(Complimentary-Subtract-With-Borrow) integer sequence
x[n]=x[n-1220]-x[n-1190]-borrow mod b=2^53,
with a SWB (Subtract-With-Borrow) sequence
z[n]=z[n-2]-z[n-1]-borrow mod b=2^53.

Both implemented as numerators of double precision
rationals, t for CSWB, zy for SWB, each formed as
(integer mod 2^53)/2^53, leading to a returned KISS
value t-zw mod 1. All denominators floats of b=2^53.

The CSWB sequence is based on prime p=b^1220-b^1190+1,
for which the order of b mod p is (p-1)/72, so the CSWB
sequence has period >2^64653 or 10^19462.

The SWB-lag2 sequence z[n]=z[n-2]-z[n-1]-zc mod b=2^53,
is based on b^2-b-1 = 11*299419*24632443746239056514780519
with period 10*299418*(24632443746239056514780518/2) > 2^101.
cc is the CSWB and SWB borrow or carry value: cc=1.0/b;
*/

#include <stdio.h>
static double Q;
static int indx=1220;

double dUNI()
{ /* Takes 14 nanosecs, Intel Q6600,2.40GHz */
const double cc=1.0/9007199254740992.0;
static double c=0.0,zc=0.0, /*current CSWB and SWB `borrow`*/
zx=5212886298506819.0/9007199254740992.0, /*SWB seed1*/
zy=2020898595989513.0/9007199254740992.0; /*SWB seed2*/
int i,j; double t; /*t: first temp, then next CSWB value*/
/* First get zy as next lag-2 SWB */
t=zx-zy-zc; zx=zy; if(t<0){zy=t+1.0; zc=cc;} else{zy=t; zc=0.0;}

/*Then get t as the next lag-1220 CSWB value */
if(indx<1220) t=Q[indx++];
else /*refill Q[n] via Q[n-1220]-Q[n-1190]-c, */
{ for(i=0;i<1220;i++)
{j=(i<30) ? i+1190 : i-30;
t=Q[j]-Q+c; /*Get next CSWB element*/
if(t>0) {t=t-cc; c=cc;} else {t=t-cc+1.0; c=0.0;}
Q=t;
} /* end i loop */
indx=1; t=Q; /*set indx, exit 'else' with t=Q */
} /* end else segment; return t-zy mod 1 */
return( (t<zy)? 1.0+(t-zy) : t-zy ) ;
} /* end dUNI() */

int main() /*You must provide at least two 32-bit seeds*/
{ int i,j; double s,t; /*Choose 32 bits for x, 32 for y */
unsigned long x=123456789, y=362436069; /*default seeds*/
/* Next, seed each Q, one bit at a time, */
for(i=0;i<1220;i++) /*using 9th bit from Cong+Xorshift*/
{s=0.0; t=1.0;
for(j=0;j<52;j++)
{t=0.5*t; /* make t=.5/2^j */
x=69069*x+123; y^=(y<<13);y^=(y>>17);y^=(y<<5);
if(((x+y)>>23)&1) s=s+t; /*change bit of s, maybe */
} /* end j loop*/
Q=s;
} /*end i seed loop, Now generate 10^9 dUNI's: */
for(i=0;i<1000000000;i++) t=dUNI();
printf("%.16f\n",dUNI());
}

/*
Output, after 10^9 random uniforms:
0.6203646342357479
Should take about 14 seconds,
e.g. with gcc -O3 opimizing compiler
--------------------------------------------------------
*/

/*
Fully seeding the Q array requires 64660
seed bits, plus another 106 for the initial zx
and zy of the lag-2 SWB sequence.
As in the above listing, using just two 32-bit
seeds, x and y in main(), to initialize the Q
array, one bit at a time, by means of the
resulting Congruential+Xorshift sequence
may be enough for many applications.

Applications such as in Law or Gaming may
require enough seeds in the Q array to
guarantee that each one of a huge set of
possible outcomes can appear. For example,
choosing a jury venire of 80 from a
list of 2000 eligibles would require at least
ten 53-bit seeds; choosing 180 from 4000 would
require twenty 53-bit seeds.
To get certification, a casino machine that could
play forty simultaneous games of poker must be
able to produce forty successive straight-flushes,
with a resulting minimal seed set.

Users can choose their 32-bit x,y for the
above seeding process, or develop their own
for more exacting requirements when a mere
set of 64 seed bits may not be enough.

Properties:
1. Simple and very fast, some 70 million double-precision
uniform(0,1] random variables/second, in IEEE 754 standard
form: 1 sign bit, 11 exponent bits, 52 fraction bits
plus the implied 1 leading the fraction part, making a
total of 53 bits for each uniform floating-point variate.

2. Period some 10^19492 (vs. the 10^61 of an earlier version),
G. Marsaglia and W.W. Tsang, "The 64-bit universal RNG",(2004)
Statistics and Probability Letters}, v66, no. 2, 183-187,
or an earlier version provided for Matlab.)

3. Following the KISS, (Keep-It-Simple-Stupid) principle,
combines, via subtraction, a CSWB sequence
x[n]=x[n-1220]-x[n-1190]-borrow mod 2^53
based on the prime p=b^1220-b^1190+1, b=2^53,
with a SWB lag-2 sequence z[n]=w[n-2]-z[n-1] -c mod 2^53.
All modular arithmetic on numerators of rationals
with denominators 2.^53.

4. Easily implemented in various programming languages,
as only tests on magnitude and double-precision floating-point
subtraction or addition required.

5. Passes all Diehard tests,
http://i.cs.hku.hk/~diehard/
As Diehard is designed for 32-bit integer input, all of its
234 tests are applied to bits 1..32, then 2..33, then 3..34,..,
then 22..53 of the 53 bits in each dUNI().
(To form 32-bit integer j from bits i..(i+31):
t=dUNI()*(1<<(i-1)); i=t; j=(t-i)*4294967296.0; )
All twenty-two choices for the 32-bit segments performed
very well on the 234 p-values from Diehard tests, using
the default 32-bit seeds x and y. You might try your own,
with possibly more extensive seeding of the Q array.
*/

George Marsaglia

J

#### James Waldby

On Wed, 14 Apr 2010 03:41:05 -0700, geo (George Marsaglia) wrote:

[I snipped your comments and the code of dUNI double-precision
uniform random variates generator and simple test, except for
the following bits of main]
int main() /*You must provide at least two 32-bit seeds*/ ....
unsigned long x=123456789, y=362436069; /*default seeds*/ ....
for(...) ...
x=69069*x+123; y^=(y<<13);y^=(y>>17);y^=(y<<5); ....
Output, after 10^9 random uniforms:
0.6203646342357479
Should take about 14 seconds,
e.g. with gcc -O3 opimizing compiler
....

1. On my 2-GHz machine, runs take about 19 seconds. When you
give a run time, perhaps give a reference to machine speed also.

2. With code compiled as is, the result I get is 0.9250927935120845
on an Athlon 64 X2 5200+ cpu. If I substitute 'int' in place of
'long' in the x, y declaration quoted above, I get the result you
gave, 0.6203646342357479. Perhaps everyone in comp.lang.c knows
of clean ways to fix that declaration, if any exist. A brute force
fix would be to AND each operand and result in the x,y calcs in the
for loop with a 32-bit mask.

3. Compiling with -Wall turned on gives warning that control
reaches end of non-void function (main). So put a 'return 0'
or similar at the end of main.

D

#### Dann Corbit

For reference and comparison, here is a "down and dirty" translation
into a C class.

I guess that some C++ guru will take a few minutes to shine the apple a
bit so that it can be genuinely useful.

/*
From: geo <[email protected]>
Newsgroups: sci.math,comp.lang.c
Subject: RNGs: A double KISS
Date: Wed, 14 Apr 2010 03:41:05 -0700 (PDT)
Xref: eternal-september.org sci.math:94533 comp.lang.c:57850

The KISS (Keep-It Simple-Stupid) principle
appears to be one of the better ways to get
fast and reliable random numbers in a computer,
by combining several simple methods.

Most RNGs produce random integers---usually 32-bit---
that must be converted to floating point numbers
for applications. I offer here a posting
describing my latest version of a KISS RNG that
produces double-precision uniform random variables
directly, without need for the usual integer-to-float
conversion. It is an extension of a method I provided
for MATLAB, with details described in
G.Marsaglia and W.W.Tsang, The 64-bit universal RNG,
Statistics & Probability Letters, V66, Issue 2,2004.

Rather than combining simple lagged Fibonacci and Weyl
sequences, as in those versions, this latest version combines
a lag-1220 CSWB (Complimentary-Subtract-With-Borrow) sequence,
x[n]=x[n-1220-x[n-1190]-borrow \mod b=2^53, period >10^19462
with a lag-2 SWB (Subtract-With-Borrow) sequence,
z[n]=z[n-1]-z[n-2]-borrow mod b=2^53. period>10^30.
Details on how to maintain exact integer arithmetic
with only float operations are in the article above.

This double-precision RNG, called dUNI, has an immense period,
around 10^19492 versus a previous 10^61, requires just a few
lines of code, is very fast, some 70 million per second, and
---a key feature---behaves very well on tests of randomness.
The extensive tests of The Diehard Battery, designed for 32-bit
random integers, are applied to bits 1 to 32, then 2 to 33,
then 3 to 34,...,then finally bits 21 to 53 for each part of
the string of 53 bits in each double-precision float dUNI().

A C version is listed below, but as only tests of magnitude
and floating point addition or subtraction are required,
implementations in other languages could be made available.
I invite such from interested readers.

If you cut, paste, compile and run the following C code,
it should take around 14 seconds to generate 1000 million
dUNI's, followed by the output 0.6203646342357479.
------------------------------------------------------------
*/
/*
dUNI(), a double-precision uniform RNG based
on the KISS (Keep-It-Simple-Stupid) principle,
combining, by subtraction mod 1, a CSWB
(Complimentary-Subtract-With-Borrow) integer sequence
x[n]=x[n-1220]-x[n-1190]-borrow mod b=2^53,
with a SWB (Subtract-With-Borrow) sequence
z[n]=z[n-2]-z[n-1]-borrow mod b=2^53.

Both implemented as numerators of double precision
rationals, t for CSWB, zy for SWB, each formed as
(integer mod 2^53)/2^53, leading to a returned KISS
value t-zw mod 1. All denominators floats of b=2^53.

The CSWB sequence is based on prime p=b^1220-b^1190+1,
for which the order of b mod p is (p-1)/72, so the CSWB
sequence has period >2^64653 or 10^19462.

The SWB-lag2 sequence z[n]=z[n-2]-z[n-1]-zc mod b=2^53,
is based on b^2-b-1 = 11*299419*24632443746239056514780519
with period 10*299418*(24632443746239056514780518/2) > 2^101.
cc is the CSWB and SWB borrow or carry value: cc=1.0/b;
*/

#include <cstdio>
#include <cstdlib>
#include <cassert>

class kiss2 {

private:
double Q;
int indx;
double cc;
double c; /* current CSWB */
double zc; /* current SWB `borrow` */
double zx; /* SWB seed1 */
double zy; /* SWB seed2 */
size_t qlen;/* length of Q array */

public:
kiss2()
{
assert(sizeof (double) >= 8);
cc = 1.0 / 9007199254740992.0; // inverse of 2^53rd power
int i;
size_t qlen = indx = sizeof Q / sizeof Q;
for (i = 0; i < qlen; i++)
Q = 0;
double c = 0.0, zc = 0.0, /* current CSWB and SWB `borrow` */
zx = 5212886298506819.0 / 9007199254740992.0, /* SWB seed1 */
zy = 2020898595989513.0 / 9007199254740992.0; /* SWB seed2 */
int j;
double s, t; /* Choose 32 bits for x, 32 for y */
unsigned long x = 123456789, y = 362436069; /* default seeds * /
/* Next, seed each Q, one bit at a time, */
for (i = 0; i < qlen; i++)
{ /* using 9th bit from Cong+Xorshift */
s = 0.0;
t = 1.0;
for (j = 0; j < 52; j++)
{
t = 0.5 * t; /* make t=.5/2^j */
x = 69069 * x + 123;
y ^= (y << 13);
y ^= (y >> 17);
y ^= (y << 5);
if (((x + y) >> 23) & 1)
s = s + t; /* change bit of s, maybe */
} /* end j loop */
Q = s;
} /* end i seed loop, Now generate 10^9 dUNI's: */
}

double dUNI()
{ /* Takes 14 nanosecs, Intel Q6600,2.40GHz */
int i, j;
double t; /* t: first temp, then next CSWB value */
/* First get zy as next lag-2 SWB */
t = zx - zy - zc;
zx = zy;
if (t < 0)
{
zy = t + 1.0;
zc = cc;
}
else
{
zy = t;
zc = 0.0;
}

/* Then get t as the next lag-1220 CSWB value */
if (indx < 1220)
t = Q[indx++];
else
{ /* refill Q[n] via Q[n-1220]-Q[n-1190]-c, */
for (i = 0; i < 1220; i++)
{
j = (i < 30) ? i + 1190 : i - 30;
t = Q[j] - Q + c; /* Get next CSWB element */
if (t > 0)
{
t = t - cc;
c = cc;
}
else
{
t = t - cc + 1.0;
c = 0.0;
}
Q = t;
} /* end i loop */
indx = 1;
t = Q; /* set indx, exit 'else' with t=Q */
} /* end else segment; return t-zy mod 1 */
return ((t < zy) ? 1.0 + (t - zy) : t - zy);
} /* end dUNI() */
};

int main(void)
{ /* You must provide at least two 32-bit seeds */
kiss2 krng;
double t;
int i;
for (i = 0; i < 1000000000; i++)
t = krng.dUNI ();
printf ("%.16f\n", krng.dUNI ());
}

/*
Output, after 10^9 random uniforms:
0.6203646342357479
Should take about 14 seconds,
e.g. with gcc -O3 opimizing compiler
--------------------------------------------------------
*/

/*
Fully seeding the Q array requires 64660
seed bits, plus another 106 for the initial zx
and zy of the lag-2 SWB sequence.
As in the above listing, using just two 32-bit
seeds, x and y in main(), to initialize the Q
array, one bit at a time, by means of the
resulting Congruential+Xorshift sequence
may be enough for many applications.

Applications such as in Law or Gaming may
require enough seeds in the Q array to
guarantee that each one of a huge set of
possible outcomes can appear. For example,
choosing a jury venire of 80 from a
list of 2000 eligibles would require at least
ten 53-bit seeds; choosing 180 from 4000 would
require twenty 53-bit seeds.
To get certification, a casino machine that could
play forty simultaneous games of poker must be
able to produce forty successive straight-flushes,
with a resulting minimal seed set.

Users can choose their 32-bit x,y for the
above seeding process, or develop their own
for more exacting requirements when a mere
set of 64 seed bits may not be enough.

Properties:
1. Simple and very fast, some 70 million double-precision
uniform(0,1] random variables/second, in IEEE 754 standard
form: 1 sign bit, 11 exponent bits, 52 fraction bits
plus the implied 1 leading the fraction part, making a
total of 53 bits for each uniform floating-point variate.

2. Period some 10^19492 (vs. the 10^61 of an earlier version),
G. Marsaglia and W.W. Tsang, "The 64-bit universal RNG",(2004)
Statistics and Probability Letters}, v66, no. 2, 183-187,
or an earlier version provided for Matlab.)

3. Following the KISS, (Keep-It-Simple-Stupid) principle,
combines, via subtraction, a CSWB sequence
x[n]=x[n-1220]-x[n-1190]-borrow mod 2^53
based on the prime p=b^1220-b^1190+1, b=2^53,
with a SWB lag-2 sequence z[n]=w[n-2]-z[n-1] -c mod 2^53.
All modular arithmetic on numerators of rationals
with denominators 2.^53.

4. Easily implemented in various programming languages,
as only tests on magnitude and double-precision floating-point
subtraction or addition required.

5. Passes all Diehard tests,
http://i.cs.hku.hk/~diehard/
As Diehard is designed for 32-bit integer input, all of its
234 tests are applied to bits 1..32, then 2..33, then 3..34,..,
then 22..53 of the 53 bits in each dUNI().
(To form 32-bit integer j from bits i..(i+31):
t=dUNI()*(1<<(i-1)); i=t; j=(t-i)*4294967296.0; )
All twenty-two choices for the 32-bit segments performed
very well on the 234 p-values from Diehard tests, using
the default 32-bit seeds x and y. You might try your own,
with possibly more extensive seeding of the Q array.

-- George Marsaglia

*/

J

#### Jonathan Lee

Thanks for sharing.

Isn't this dangerous for y:
for (j = 0; j < 52; j++)
{
t = 0.5 * t; /* make t=.5/2^j  */
x = 69069 * x + 123;
y ^= (y << 13);
y ^= (y >> 17);
y ^= (y << 5);
if (((x + y) >> 23) & 1)
s = s + t; /* change bit of s, maybe */
}    /* end j loop */

Specifically (y >> 17) could mix in high bits if unsigned
long were 64 bit instead of (presumably) 32-bit. Just at
a quick glance, I would think the operations on y
would need to be masked to be portable.

--Jonathan

J

#### Jonathan Lee

For reference and comparison, here is a "down and dirty" translation
into a C class.

I guess that some C++ guru will take a few minutes to shine the apple a
bit so that it can be genuinely useful.

Another thought: you could probably remove that assertion by building
the double from LSb to MSb. You'd essentially be bitreversing the
number which probably won't affect the randomness (can't prove it
off hand). If double didn't have enough bits, the machine would simply
round the value as you went along. Not ideal under some RNG scenarios,
but the result should be the "same" as the actual number up to machine
precision.

--Jonathan

J

#### Jonathan Lee

Thanks for sharing.

Isn't this dangerous for y:

Just confirmed: if y isn't masked to 32-bits the output
is wrong.

Anyway, here's my take on the code:

// kiss2.hpp
-------------------------------------------------------------------
#include <cstddef>

class kiss2 {
double Q; // meh.. I guess this could be a
std::vector
const std::size_t qlen; // length of Q array
std::size_t indx;
double c; // current CSWB
double zc; // current SWB `borrow`
double zx; // SWB seed1
double zy; // SWB seed2
public:
kiss2(unsigned long = 123456789UL, unsigned long = 362436069UL);
double operator()(); // For getting a random number
};

// kiss2.cpp
-------------------------------------------------------------------
//#include "kiss2.hpp"
#include <cfloat>
using std::size_t;
/**
\todo? Replace constant initialization of indx and qlen w/ sizeof Q/
sizeof Q
\todo Magic numbers make me uneasy (re: zx, zy)
*/
kiss2::kiss2(
unsigned long x, // seed1 value
unsigned long y // seed2 value
): qlen(1220), indx(1220), c(0.0), zc(0.0),
zx(5212886298506819.0 / 9007199254740992.0),
zy(2020898595989513.0 / 9007199254740992.0)
{
#if (FLT_RADIX != 2 || DBL_MIN_EXP > -53)
#error "Machine double not supported"
#endif

// fill in Q "using 9th bit from Cong+Xorshift"
for (size_t i = 0; i < qlen; i++) {
double s = 0.0, t = 1.0;

// Build Q one bit at a time
for (int j = 0; j < 52; j++) {
t *= 0.5; // make t=.5/2^j
x = (69069 * x + 123);
y = (y ^ (y << 13)) & 0xFFFFFFFFUL;
y = (y ^ (y >> 17)) & 0xFFFFFFFFUL;
y = (y ^ (y << 5)) & 0xFFFFFFFFUL;

if (((x + y) >> 23) & 1UL) // conditionally set bit of s
s += t;
}
Q = s;
}
}

/**
\todo Separate refill function?
\todo I guess we sorta need a guarantee that qlen >= 30
*/
double kiss2: perator()() { // Takes 14 nanosecs, Intel Q6600,2.40GHz
static const double cc = 1.0 / 9007199254740992.0; // 2^-53

double t; // t: first temp, then next CSWB value
// First get zy as next lag-2 SWB
t = zx - zy - zc;
zx = zy;
if (t < 0) {
zy = t + 1.0, zc = cc;
} else zy = t, zc = 0.0;

// Then get t as the next lag-1220 CSWB value
if (indx >= qlen) { // refill Q[n] via Q[n-1220]-Q[n-1190]-c
for (size_t i = 0; i < qlen; i++) {
size_t j = (i < 30) ? i + (qlen - 30) : i - 30;
t = Q[j] - Q + c; // Get next CSWB element
if (t > 0) {
t = t - cc, c = cc;
} else t = t - cc + 1.0, c = 0.0;
Q = t;
}
indx = 0; // reset indx
}

// return Q[indx] - zy (mod 1)
t = Q[indx++] - zy;
return (t < 0.0 ? 1.0 + t : t);
}

// main.cpp
--------------------------------------------------------------------
#include <cstdio>
#include <cstdlib>
// #include "kiss2.hpp"
int main(void)
{ /* You must provide at least two 32-bit seeds */
kiss2 krng; // Use the default seeds
double t;
int i;
for (i = 0; i < 1000000000; i++)
t = krng();
printf ("%.16f\n", krng());
}

--Jonathan

S

#### steve

1. Simple and very fast, some 70 million double-precision
uniform(0,1] random variables/second, in IEEE 754 standard
form

This gives an issue with the Java conversion.  Java expects the return
from Random.nextDouble() to be in the range [0.0, 1.0), while this
code produces numbers in the range (0.0, 1.0].

I know I can make the switch by using:

return (1.0 - DoubleKISS());

Does anyone have any alternative suggestions that are faster and do
not break George's excellent RNG.

Not sure if it would be faster, but one could do

return(nextafter(DoubleKISS(), -1.));

Two items that George did not comment on are subnormal
numbers and how he determined the distribution is normal
over (0,1]. Does his generator create subnormals? If
I understand Figure D-1 in Goldberg, "What every computer
scientist should know about floating-point arithmetic,"
the distribution cannot be normal.

G

#### geo

1. Simple and very fast, some 70 million double-precision
uniform(0,1] random variables/second, in IEEE 754 standard
form
This gives an issue with the Java conversion.  Java expects the return
from Random.nextDouble() to be in the range [0.0, 1.0), while this
code produces numbers in the range (0.0, 1.0].
I know I can make the switch by using:
return (1.0 - DoubleKISS());
Does anyone have any alternative suggestions that are faster and do
not break George's excellent RNG.

Not sure if it would be faster, but one could do

return(nextafter(DoubleKISS(), -1.));

Two items that George did not comment on are subnormal
numbers and how he determined the distribution is normal
over (0,1].  Does his generator create subnormals?  If
I understand Figure D-1 in Goldberg, "What every computer
scientist should know about floating-point arithmetic,"
the distribution cannot be normal.- Hide quoted text -

- Show quoted text -

Users who insist on getting every possible float
in the unit interval could form dUNI()*.5^j;
with j taking values
0, 1, 2, 3,...
with probabilities
q, qp, qp^2, qp^3, ...
and p = 2^{-53) and q = 1-p..
Even at the rate of 70 million per second,
you wouild have to wait an average of over
four years before you had to multiply by .5,
8 years to multiply by .5^2, etc..

George Marsaglia

G

#### geo

Two items that George did not comment on are subnormal
numbers and how he determined the distribution is normal
over (0,1].  Does his generator create subnormals?  If
I understand Figure D-1 in Goldberg, "What every computer
scientist should know about floating-point arithmetic,"
the distribution cannot be normal.- Hide quoted text -
- Show quoted text -

Users who insist on getting every possible float
in the unit interval could form dUNI()*.5^j;
with j taking values
0, 1, 2, 3,...
with probabilities
q, qp, qp^2, qp^3, ...
and p = 2^{-53) and q = 1-p..
Even at the rate of  70 million per second,
you wouild have to wait an average of over
four years before you had to multiply by .5,
8 years to multiply by .5^2,  etc..

Reconsidering: the above won't help.

It is a sad, but not very sad, fact that with this
representation: floated rationals of the form k/2^53,
1/2 of the set of possible k's, say K0,
will provide a full complement of 53 bits,

1/4 of the set of possible k's, say K1,
will provide only 52 effective bits,

1/8 of the set of possible k's, say K2,
will provide only 51 effective bits,
and so on.

Ensuring that every float has its full complement
of 53 effective bits would seem to require that
a random extra bit be adjoined to those k's in K1,
two extra bits to those in K2, and so on.

But it doesn't seem worthwhile to add those extra
complexities. The average of 52 effective bits
should still be adequate for most simulations.

Remember that we were forced to go to double
precision because the average of 22 effective
bits fell short---but often not by very much---
in single-precision simulations.

Only floating a 64-bit integer J>2^53
seems to guarantee a full complement
of 53 effective bits in the IEEE representation
of J/2.^64, in one fell swoop.

George Marsaglia

A

#### Axel Vogt

geo wrote:
....
Applications such as in Law or Gaming may
require enough seeds in the Q array to
guarantee that each one of a huge set of
possible outcomes can appear. For example,
choosing a jury venire of 80 from a
list of 2000 eligibles would require at least
ten 53-bit seeds; choosing 180 from 4000 would
require twenty 53-bit seeds.
To get certification, a casino machine that could
play forty simultaneous games of poker must be
able to produce forty successive straight-flushes,
with a resulting minimal seed set.

Users can choose their 32-bit x,y for the
above seeding process, or develop their own
for more exacting requirements when a mere
set of 64 seed bits may not be enough.

Properties: ....
George Marsaglia

May be a somewhat bearishly question (after all the
detailed replies):

What is the 'recipe' to (repeated) seed?

S

#### steve

Two items that George did not comment on are subnormal
numbers and how he determined the distribution is normal
over (0,1].  Does his generator create subnormals?  If
I understand Figure D-1 in Goldberg, "What every computer
scientist should know about floating-point arithmetic,"
the distribution cannot be normal.- Hide quoted text -
- Show quoted text -
Users who insist on getting every possible float
in the unit interval could form dUNI()*.5^j;
with j taking values
0, 1, 2, 3,...
with probabilities
q, qp, qp^2, qp^3, ...
and p = 2^{-53) and q = 1-p..
Even at the rate of  70 million per second,
you wouild have to wait an average of over
four years before you had to multiply by .5,
8 years to multiply by .5^2,  etc..

Reconsidering: the above won't help.

It is a sad, but not very sad, fact that with this
representation:    floated rationals of the form k/2^53,
1/2 of the set of possible k's, say K0,
will provide a full complement of 53 bits,

1/4 of the set of possible k's, say K1,
will provide only 52 effective bits,

1/8 of the set of possible k's, say K2,
will provide only 51 effective bits,
and so on.

Ensuring that every float has its full complement
of 53 effective bits would seem to require that
a random extra bit be adjoined to those k's in K1,
two extra bits to those in K2, and so on.

But it doesn't seem worthwhile to add those extra
complexities.  The average of 52 effective bits
should still be adequate for most simulations.

Remember that we were forced to go to double
precision because the average of 22 effective
bits fell short---but often not by very much---
in single-precision simulations.

Only floating a 64-bit integer J>2^53
seems to guarantee a full complement
of 53 effective bits in the IEEE representation
of J/2.^64, in one fell swoop.

George,

Thanks for the helpful follow-ups. IIUC, your new
generator is sampling only a subset of all possible
floating point values in (0,1]. This subset is then
uniformly distributed in the interval.