KISS4691, a potentially top-ranked RNG.

orz · Aug 19, 2010

Hi George,

Click to expand...

the problem is not that the carry could match or exceed the multiplier
8193.
The problem is that the carry can be as large as 8192.

Click to expand...

In your code you write

Click to expand...

t=(x<<13)+c+x; c=(t<x)+(x>>19);

Click to expand...

To make it a bit more obvious what happens, I can change this to the
100% equivalent

Click to expand...

t = (x << 13) + c;
c = x >> 19;
t += x; if (t < x) ++c;

Click to expand...

The last line is the usual trick to add numbers and check for carry: t
will be less than x exactly if the addition t += x wrapped around and
produced a carry.

Click to expand...

The problem is that (x << 13) + c can also produce a carry, and that
goes unnoticed. If c = 8192, and the lowest 19 bits in x are all set,
then (x << 13) has the highest 19 bits set and 13 bits zero. Adding c
"overflows" and gives a result of 0, and adding x leaves x. The
comparison (t < x) is false and produces zero, even though (x<<13)+c+x
should have produced a carry.

Click to expand...

This is very, very rare. It doesn't actually happen in your original
source code with a total of 2 billion random numbers. If you produce
four billion random numbers, then it happens at i = 3596309491 and
again at i = 3931531774.

Click to expand...

I am more interested in 64 bit performance, so I just made c, t, and x
64 bit numbers and wrote

Click to expand...

uint64_t x = Q ;
uint64_t t = (x << 13) + x + c;
c = (t >> 32);

Click to expand...

That might help in Fortran as well, assuming 64 bit integers are
available, signed or unsigned doesn't matter.

Click to expand...

Thanks for pointing that out.
The correction (t<x) should be (t<=x),
to take care of that rare, but inevitable, event
when the last 19 bits of x are 1's
and the carry c is a-1=2^13, making (x<<13)+c = 0,
and thus t=(x<<13)+c+x = x,
(mod 2^32, of course, the assumed underlying
integer arithmetic of the processor).

The entire MWC() routine should have that
(t<x) to (t<=x) correction:

unsigned long MWC(void)
{ unsigned long t,x,i; static c=0,j=4691;
j=(j<4690)? j+1:0;
x=Q[j];
t=(x<<13)+c+x; c=(t<=x)+(x>>19);
return (Q[j]=t);

}

and for interested readers who may have pasted
the original, please change that (t<x) to (t<=x).
The test values from 10^9 calls will still be as indicated,
but strict adherence to the underlying mathematics requires
the change from (t<x) to (t<=x) to ensure the full period.

Keeping that (t<x) may be viewed as, rather than making a
circular transition through the 8193*2^133407-1 base-b digits
of the expansion of some j/p, we jump---after an average of
b=2^32 steps---to another point in the immense circle.

It could be that the period is increased rather than
decreased, but it remains a curiosity. Perhaps light could
be shed with primes p=ab^n-1 for which the actual distribution
of such jumps, averaging every b steps, could be examined.

Or perhaps it could remain a mystery
and be further fodder for those who tend to
equate lack of understanding to "true" randomness.

George Marsaglia

Hm... I see what you're saying...
old forumula: carry = (x>>19) + (t<x)
old state: x = 2**32-1, carry = 2**13
new state: t = 2**32-1, carry = 2**13-1 !!!

The carry does appear incorrect. And the change does appear to fix
that:

new forumula: carry = (x>>19) + (t<=x)
old state: x = 2**32-1, carry = 2**13
new state: t = 2**32-1, carry = 2**13

But... doesn't that mean:

new forumula: carry = (x>>19) + (t<=x)
old state: x = 0, carry = 0
new state: t = 0, carry = 1 !!!

Isn't that incorrect?

orz · Aug 19, 2010

On Aug 18, 6:08 pm, "christian.bau" <[email protected]>
wrote:

Hi George,
the problem is not that the carry could match or exceed the multiplier
8193.
The problem is that the carry can be as large as 8192.
In your code you write
t=(x<<13)+c+x; c=(t<x)+(x>>19);
To make it a bit more obvious what happens, I can change this to the
100% equivalent
t = (x << 13) + c;
c = x >> 19;
t += x; if (t < x) ++c;
The last line is the usual trick to add numbers and check for carry: t
will be less than x exactly if the addition t += x wrapped around and
produced a carry.
The problem is that (x << 13) + c can also produce a carry, and that
goes unnoticed. If c = 8192, and the lowest 19 bits in x are all set,
then (x << 13) has the highest 19 bits set and 13 bits zero. Adding c
"overflows" and gives a result of 0, and adding x leaves x. The
comparison (t < x) is false and produces zero, even though (x<<13)+c+x
should have produced a carry.
This is very, very rare. It doesn't actually happen in your original
source code with a total of 2 billion random numbers. If you produce
four billion random numbers, then it happens at i = 3596309491 and
again at i = 3931531774.
I am more interested in 64 bit performance, so I just made c, t, and x
64 bit numbers and wrote
uint64_t x = Q ;
uint64_t t = (x << 13) + x + c;
c = (t >> 32);
That might help in Fortran as well, assuming 64 bit integers are
available, signed or unsigned doesn't matter.

Click to expand...

Click to expand...

Thanks for pointing that out.
The correction (t<x) should be (t<=x),
to take care of that rare, but inevitable, event
when the last 19 bits of x are 1's
and the carry c is a-1=2^13, making (x<<13)+c = 0,
and thus t=(x<<13)+c+x = x,
(mod 2^32, of course, the assumed underlying
integer arithmetic of the processor).

Click to expand...

The entire MWC() routine should have that
(t<x) to (t<=x) correction:

Click to expand...

unsigned long MWC(void)
{ unsigned long t,x,i; static c=0,j=4691;
j=(j<4690)? j+1:0;
x=Q[j];
t=(x<<13)+c+x; c=(t<=x)+(x>>19);
return (Q[j]=t);

}

Click to expand...

and for interested readers who may have pasted
the original, please change that (t<x) to (t<=x).
The test values from 10^9 calls will still be as indicated,
but strict adherence to the underlying mathematics requires
the change from (t<x) to (t<=x) to ensure the full period.

Click to expand...

Keeping that (t<x) may be viewed as, rather than making a
circular transition through the 8193*2^133407-1 base-b digits
of the expansion of some j/p, we jump---after an average of
b=2^32 steps---to another point in the immense circle.

Click to expand...

It could be that the period is increased rather than
decreased, but it remains a curiosity. Perhaps light could
be shed with primes p=ab^n-1 for which the actual distribution
of such jumps, averaging every b steps, could be examined.

Click to expand...

Or perhaps it could remain a mystery
and be further fodder for those who tend to
equate lack of understanding to "true" randomness.

Click to expand...

George Marsaglia

Click to expand...

Hm... I see what you're saying...
old forumula: carry = (x>>19) + (t<x)
old state: x = 2**32-1, carry = 2**13
new state: t = 2**32-1, carry = 2**13-1 !!!

The carry does appear incorrect. And the change does appear to fix
that:

new forumula: carry = (x>>19) + (t<=x)
old state: x = 2**32-1, carry = 2**13
new state: t = 2**32-1, carry = 2**13

But... doesn't that mean:

new forumula: carry = (x>>19) + (t<=x)
old state: x = 0, carry = 0
new state: t = 0, carry = 1 !!!

Isn't that incorrect?

Late by several hours. I should double-check that the thread hasn't
gone on to the next page before posting.

The suggestion of using 64 bit integers seems to work... but if
compiled to 32 bit code on my machine, using 64 bit integers slows it
down by almost a factor of 2. The whole KISS, not just the MWC
measured in isolation.

I tried a few options, comparing performance between them:
(compiling to 32 bit x86 with MSVC, running on a Core 2 Duo 3 GHrz)

without the carry fix:
6.07 seconds for 10**9 calls

with 64 bit ints:
11.24 seconds for 10**9 calls
(I tried a few minor variations of this trying to get the compiler to
do better, but it refused)

with a special case if checking for the incorrectly handled 0
6.19 seconds for 10**9 calls

with some ADC instructions (inefficiently) inserted with __asm
8.68 seconds for 10**9 calls

So it looks to me like at least on 32 bit instruction sets the fastest
is adding an extra if. Any other ideas for how to handle that?

The versions of MWC with the carry fixes appears to pass and fail the
same statistical tests as the original version of MWC.

Gib Bogle · Aug 19, 2010

baf said:
Yes, that seems to be correct. Just a typo in the code.

gcc -O2 27.7 sec
gfortran -O2 31.9

I am sure both of these times could be improved with some better choices
in compile options.

It appears that Intel's C compiler has some pretty nifty bit-shifting
optimizations (lacking the Fortran, unfortunately).

sci.math · Aug 21, 2010

i forget to say it could be the wrong traslation or deep bugged
so not use it if you want something sure 100%
not i do some test on it for see the randomeness too

- Show quoted text -

My ss begins 4691

robin · Aug 28, 2010

| robin wrote:
| > | orz wrote:
| > |
| > | > If I made yet another mistake on that then I'm going to throw
| > | > something.
| > | >
| > | > It produced identical results for me when I tried it in C using this
| > | > code:
| > | >
| > | > typedef unsigned int Uint32;
| > | > Uint32 sign(Uint32 value) {return value>>31;}
| > | > Uint32 index, carry, data[4691];
| > | > Uint32 mwc4691() {
| > | > index = (index < 4690) ? index + 1 : 0;
| > | > Uint32 x = data[index];
| > | > Uint32 t = (x << 13) + carry + x;
| > | > if (sign((x<<13)+carry) == sign(x))
| > | > carry = sign(x) + (x>>19);
| > | > else carry = 1 - sign(t) + (x>>19);
| > | > data[index] = t;
| > | > return t;
| > | > }
| > | >
| > | > I think the only difference between that and the algorithm he
| > | > expressed in pseudocode is that I added parentheses on the first left-
| > | > shift (he was apparently assuming higher precedence on the leftshift
| > | > operator than C uses). Maybe that's why you're getting different
| > | > results?
| > |
| > | You do realize that there's no unsigned integer type in Fortran? I don't think
| > | there's much point in trying to predict what the Fortran code does without using
| > | a Fortran compiler.
| >
| > It's perfectly possible to predict how to program the operation
| > (unsigned arithmetic) in Fortran provided that we make the
| > assumption that the hardware provides twos complement arithmetic.
|
| It's possible to predict, but more reliable to test. I can use Fortran to
| reproduce the results generated by the C code, but not using the pseudo-code
| that George provided.

The C code that George provided was for unsigned.
He also gave some alterations in C for signed.

| Unfortunately he seems to have lost interest in this
| thread. Perhaps you'd like to suggest the Fortran code to implement the method
| that George provided in C (assuming twos complement).

I already provided an unsigned version in PL/I.
Trivial changes are required for Fortran.

robin · Aug 28, 2010

|
| > In article <[email protected]>,
| > (e-mail address removed) says...
|
| > > You do realize that there's no unsigned integer type in Fortran?
| ...
| > Maybe he is using Sun's Fortran 95 compiler.
|
| Though if my admittedly fallible memory serves me, the unsigned integer
| in Sun's f95 has restrictions that are likely to surprise many casual
| users. Being more specific might stretch my memory a bit farther than I
| trust, but I think I recall a lack of mixed-mode operations, for
| example. So one would have to tread pretty carefully in order to
| correctly transcribe pseudocode into working code. For example, I'm not
| at all sure that one can do such things as adding 1 to an unsigned
| integer; I think you might have to add 1U instead.

The entire algorithm can be implemented in unsigned aruithmetic.
It should not therefore present any problems.

robin · Aug 28, 2010

Michael Press · Dec 6, 2010

glen herrmannsfeldt said:
(snip)

The C precedence rules for shifts are widely believed to
be wrong, including by the developers of C, but it is considered
too late to change. Best to always use parentheses around them.

Shifts bind more tightly than assignment,
and that is not saying much. Shifts _should_
bind as tightly as power, and that is more
tightly than multiplication and division.
As it is, shift binds less tightly than
addition. _That_ bit me once. Speaking of
bits, the bit wise binary operators bind
_less_ tightly than assignment. Yes, before
you ask, that bit me too, and I had the
precedence chart tacked to the wall at
eye level.

Implementing the KISS4691 RNG	22	Sep 5, 2010
Ensuring the long period of the KISS4691 RNG.	5	Aug 22, 2010
turning Marsaglia's recent posts into a function	15	Sep 1, 2010
The CMWC4827 RNG, an improvement on MWC4691.	12	Oct 22, 2010
RNGs with periods exceeding 10^(40million).	29	Jan 16, 2011
An interesting new RNG.	11	Jul 1, 2010
SuperKISS for 32- and 64-bit RNGs in both C and Fortran.	11	Nov 27, 2009
RNGs: A Super KISS	8	Nov 3, 2009

KISS4691, a potentially top-ranked RNG.

orz

orz

Gib Bogle

sci.math

robin

robin

robin

Michael Press

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads