64-bit KISS RNGs

Discussion in 'C Programming' started by geo, Feb 28, 2009.

  1. geo

    geo Guest

    64-bit KISS RNGs

    Consistent with the Keep It Simple Stupid (KISS) principle,
    I have previously suggested 32-bit KISS Random Number
    Generators (RNGs) that seem to have been frequently adopted.

    Having had requests for 64-bit KISSes, and now that
    64-bit integers are becoming more available, I will
    describe here a 64-bit KISS RNG, with comments on
    implementation for various languages, speed, periods
    and performance after extensive tests of randomness.

    This 64-bit KISS RNG has three components, each nearly
    good enough to serve alone. The components are:
    Multiply-With-Carry (MWC), period (2^121+2^63-1)
    Xorshift (XSH), period 2^64-1
    Congruential (CNG), period 2^64

    Compact C and Fortran listings are given below. They
    can be cut, pasted, compiled and run to see if, after
    100 million calls, results agree with that provided
    by theory, assuming the default seeds.

    Users may want to put the content in other forms, and,
    for general use, provide means to set the 250 seed bits
    required in the variables x,y,z (64 bits) and c (58 bits)
    that have been given default values in the test versions.

    The C version uses #define macros to enumerate the few
    instructions that MWC, XSH and CNG require. The KISS
    macro adds MWC+XSH+CNG mod 2^64, so that KISS can be
    inserted at any place in a C program where a random 64-bit
    integer is required.
    Fortran's requirement that integers be signed makes the
    necessary code more complicated, hence a function KISS().

    C version; test by invoking macro KISS 100 million times
    #include <stdio.h>

    static unsigned long long

    #define MWC (t=(x<<58)+c, c=(x>>6), x+=t, c+=(x<t), x)
    #define XSH ( y^=(y<<13), y^=(y>>17), y^=(y<<43) )
    #define CNG ( z=6906969069LL*z+1234567 )
    #define KISS (MWC+XSH+CNG)

    int main(void)
    {int i;
    for(i=0;i<100000000;i++) t=KISS;
    (t==1666297717051644203ULL) ?
    printf("100 million uses of KISS OK"):

    Fortran version; test by calling KISS() 100 million times
    program testkiss
    implicit integer*8(a-z)
    do i=1,100000000; t=KISS(); end do
    if(t.eq.1666297717051644203_8) then
    print*,"100 million calls to KISS() OK"
    else; print*,"Fail"
    end if; end

    function KISS()
    implicit integer*8(a-z)
    data x,y,z,c /1234567890987654321_8, 362436362436362436_8,&
    1066149217761810_8, 123456123456123456_8/
    save x,y,z,c
    m(x,k)=ieor(x,ishft(x,k)) !statement function
    s(x)=ishft(x,-63) !statement function
    if(s(x).eq.s(t)) then; c=ishft(x,-6)+s(x)
    else; c=ishft(x,-6)+1-s(x+t); endif
    return; end

    Output from using the macro KISS or the function KISS() is
    MWC+XSH+CNG mod 2^64.

    CNG is easily implemented on machines with 64-bit integers,
    as arithmetic is automatically mod 2^64, whether integers
    are considered signed or unsigned. The CNG statement is
    When I established the lattice structure of congruential
    generators in the 60's, a search produced 69069 as an easy-
    to-remember multiplier with nearly cubic lattices in 2,3,4,5-
    space, so I tried concatenating, using 6906969069 as
    my first test multiplier. Remarkably---a seemingly one in many
    hundreds chance---it turned out to also have excellent lattice
    structure in 2,3,4,5-space, so that's the one chosen.
    (I doubt if lattice structure of CNG has much influence on the
    composite 64-bit KISS produced via MWC+XSH+CNG mod 2^64.)

    XSH, the Xorshift component, described in
    uses three invocations of an integer "xor"ed with a shifted
    version of itself.
    The XSH component used for this KISS is, in C notation:
    y^=(y<<13); y^=(y>>17); y^=(y<<43)
    with Fortran equivalents y=ieor(y,ishft(y,13)), etc., although
    this can be effected by a Fortran statement function:
    As with lattice structure, choice of the triple 13,-17,43 is
    probably of no particular importance; any one of the 275 full-
    period triples listed in the above article is likely to provide
    a satisfactory component XSH for the composite MWC+XSH+CNG.

    The choice of multiplier 'a' for the multiply-with-carry (MWC)
    component of KISS is not so easily made. In effect, a multiply-
    with-carry sequence has a current value x and current "carry" c,
    and from each given x,c a new x,c pair is constructed by forming
    t=a*x+c, then x=t mod b=2^64 and c=floor(t/b).
    This is easily implemented for 32-bit computers that permit
    forming a*t+c in 64 bits, from which the new x is the bottom and
    the new c the top 32-bits.

    When a,x and c are 64-bits, not many computers seem to have an easy
    way to form t=a*x+c in 128 bits, then extract the top and bottom
    64-bit segments. For that reason, special choices for 'a' are
    needed among those that satisfy the general requirement that
    p=a*b-1 is a prime for which b=2^64 has order (p-1)/2.

    My choice---and the only one of this form---is a=2^58+1. Then the
    top 64 bits of an imagined 128-bit t=a*x+c may be obtained as
    (using C notation) (x>>6)+ 1 or 0, depending
    on whether the 64-bit parts of (x<<58)+c+x cause an overflow.
    Since (x<<58)+c cannot itself cause overflow (c will always be <a),
    we get the carry as c=(x>>6) plus overflow from (x<<58)+x.

    This is easily done in C with unsigned integers, using a different
    kind of 't': t=(x<<58)+c; c=(x>>6); x=t+x; c=c+(x<t);
    For Fortran and others that permit only signed integers, more work
    is needed.
    Equivalent mod 2^64 versions of t=(x<<58)+c and c=(x>>6) are easy,
    and if s(x) represents (x>>63) in C or ishft(x,-66) in Fortran,
    then for signed integers, the new carry c comes from the rule
    if s(x) equals s(t) then c=(x>>6)+s(x) else c=(x>>6)+1-s(x+t)

    A C version of this KISS RNG takes 18 nanosecs for each
    64-bit random number on my desktop (Vista) PC, thus
    producing KISSes at a rate exceeding 55 million per second.
    Fortran or other integers-must-be-signed compilers might get
    "only" around 40 million per second.

    Setting seeds:
    Use of KISS or KISS() as a general 64-bit RNG requires specifying
    3*64+58=250 bits for seeds, 64 bits each for x,y,z and 58 for c,
    resulting in a composite sequence with period around 2^250.
    The actual period is
    (2^250+2^192+2^64-2^186-2^129)/6 ~= 2^(247.42) or 10^(74.48).
    We "lose" 1+1.58=2.58 bits from maximum possible period, one bit
    because b=2^64, a square, cannot be a primitive root of p=ab-1,
    so the best possible order for b is (p-1)/2.
    The periods of MWC and XSH have gcd 3=2^1.58, so another 1.58
    bits are "lost" from the best possible period we could expect
    from 250 seed bits.

    Some users may think 250 seed bits are an unreasonable requirement.
    A good seeding procedure might be to assume the default seed
    values then let the user choose none, one, two,..., or all
    of x,y,z, and c to be reseeded.

    Latest tests in The Diehard Battery, available at
    were applied extensively. Those tests that specifically required
    32-bit integers were applied to the leftmost 32 bits
    (e,g, KISS>>32;), then to the middle 32-bits ((KISS<<16)>>32;)
    then to the rightmost 32 bits, ( (KISS<<32)>>32).
    There were no extremes in the more than 700 p-values returned
    by the tests, nor indeed for similar tests applied to just two of the
    KISS components: MWC+XSH, then MWC+CNG, then XSH+CNG.

    The simplicity, speed, period around 2^250 and performance on
    tests of randomness---as well as ability to produce exactly
    the same 64-bit patterns, whether considered signed or unsigned
    integers---make this 64-bit KISS well worth considering for
    adoption or adaption to languages other than C or Fortran,
    as has been done for 32-bit KISSes.

    George Marsaglia
    geo, Feb 28, 2009
    1. Advertisements

  2. geo

    nmm1 Guest

    While I hesitate to follow up to George Marsaglia on random numbers,
    there are a few things here that need at least clarification.
    I am NOT referring to the generator as such, where I have no
    disagreement, but to some of the other remarks.

    Nobody should EVER use 32-bit generators for more than about ten
    million numbers in an analysis without careful analysis, because
    the discreteness will start to show through in at least some real

    No Xorshift generator is good enough to use on its own, because
    they have some evil properties. We knew about them before 1970
    (Knuth refers to it) and at least some of us knew the reason but
    could not prove it. The person who did was Martin Luescher of
    CERN, in the 1990s (if I recall).

    When using generators in parallel programs, it is as important to
    worry about their quasi-independence as their serial properties;
    the common practice of using different seeds to the same generator
    is NOT a good idea. Multiplicative congruential generators with
    coprime moduli are quasi-independent over the whole period, as are
    multiply-with-carry ones with the same modulus but different, 'safe
    prime' multipliers - unfortunately, complementary multiply-with-carry
    ones are not.

    Most of this is published, but perhaps not the last sentence.
    More work on the quasi-independence of generators is needed! In
    particular, my belief is that the best generators for parallel
    work are multiply-with-carry ones with the same modulus and
    different, 'safe prime' multipliers, but I have been out of this
    area for many years and my study of them has been cursory. A
    quick Web search didn't find anything useful, but there were an
    awful lot of irrelevant hits, so I may have missed something.

    The problem here is that full-period properties do not necessarily
    map to the properties of shorter sequences (that was the problem
    with Xorshift generators), and analysing the latter is mathematically

    Nick Maclaren.
    nmm1, Feb 28, 2009
    1. Advertisements

  3. geo

    galathaea Guest


    as chief architect for a leading gaming (gambling) manufacturer in
    and one of the few engineers with a mathematics background
    i am often working with our internal rng
    to provide better tools for our game catalog

    for years
    we had used your 96-bit kiss rng
    which served us quite well for it's nice distribution
    and fair period

    some of our newest gaming ideas
    have required rng's with much greater periods
    (to cover the possibility space of the game)
    and our latest platform has moved over to use mersenne twisters

    it has been an industry standard
    to just add more rng's (the same ones instantiated multiply!)
    and i had to fight to ensure my company didn't go that path
    to ensure unintended correlations were not introduced)

    this improvement of yours would have certainly been considered
    if available at the time
    but i do fear the period may still have been too small
    for some of our more extreme future needs
    (there are already products on the market from several manufacturers
    that allow a player to play 100 simultaneous poker or keno games)

    your timings are quite impressive

    i'm very happy i saw this post of yours

    thank you for all your contributions!

    galathaea: prankster, fablist, magician, liar
    galathaea, Mar 1, 2009
  4. RANLUX is slow, but at the highest "luxury level" all 24 bits of the
    mantissa are chaotic. So, one could just stick these together to create
    numbers containing more bits.
    Phillip Helbig---remove CLOTHES to reply, Mar 1, 2009
  5. geo

    nmm1 Guest

    That wasn't the issue she (I assume) was addressing - it was one that
    I did. Yes, that technique works, for both RANLUX and 32-bit KISS.
    I use my own double-precision generator, of course, which has some
    theoretical advantages over both and is marginally simpler than (and
    similar to) KISS.

    Galathaea's concern was about the period, and she is very right to
    be so concerned. While a long period does not guarantee pseudo-
    randomness, it is a prerequisite for it - in particular, the pseudo-
    random properties in N dimensions are often limited by the Nth root
    of the period. And, despite common belief, that is NOT solely true
    for multiplicative congruential generators.

    Nick Maclaren.
    nmm1, Mar 1, 2009
  6. The period of RANLUX is huge---10**164 or something. (In other words,
    many orders of magnitude larger than the number of distinct bit
    combinations. Many lesser generators have a period much LESS than the
    number of distinct bit combinations and some algorithms can have a
    period at most as long as the number of distinct bit combinations. In
    these cases, of course, a given number x is always followed by a given
    number y, which with RANLUX is not the case.)
    Phillip Helbig---remove CLOTHES to reply, Mar 1, 2009
  7. geo

    nmm1 Guest

    !!!!! ALL such generators have been known to be trash (and, yes, I
    mean trash) since the late 1960s! One of my papers proves (and I
    mean mathematically rigorously) that you should never use more than
    period^(2/3) numbers in a simulation - the rule for never using more
    than period^(1/2) for cryptographic purposes has been known since
    time immemorial.

    64-bit KISS has a period of about 2^249, according to that posting.

    I could explain the potential defects of RANLUX, but would have to
    rake quite a lot of my memories from where they are archived (on
    tape, perhaps?) And please note that the word is 'potential' - to
    know if they were actual needs forms of analysis that I believe are
    still beyond the state of the art (and well beyond my mathematical
    ability). However, I believe that it is almost certainly reliable
    (on mathematical grounds, incidentally).

    Similar remarks can be made about KISS and my own generator but,
    there, I am almost certain that the required analysis is WAY beyond
    the state of the art.

    [ The analysis I am referring to is the one that maps the full-period
    uniformity, which is a quasi-random property, to the pseudo-randomness
    of short sequences. The word 'hairy' springs to mind! ]

    Nick Maclaren.
    nmm1, Mar 1, 2009
  8. geo

    robin Guest

    George, this code is not portable Fortran.
    If you want to specify a kind, you need to use
    SELECTED_INT_KIND or equivalent action
    in order for it to be portable.
    robin, Mar 1, 2009
  9. geo

    robin Guest

    A PL/I version:

    KISS: procedure() returns (fixed binary (64) unsigned) options (reorder);
    declare (x initial (1234567890987654321),
    y initial ( 362436362436362436),
    z initial ( 1066149217761810),
    c initial ( 123456123456123456),
    t ) fixed binary (64) unsigned static;

    if isrl(x,63) = isrl(t,63) then
    c=isrl(x,6)+isrl(x, 63);
    t = ieor(y, isll(y, 13));
    t = ieor(t, isrl(t, 17));
    y = ieor(t, isll(t, 43));
    return (x+y+z);
    end KISS;
    robin, Mar 13, 2009
  10. geo

    user923005 Guest

    user923005, Mar 13, 2009
  11. geo

    nmm1 Guest

    That URL is completely mangled. You don't use (ugh) 'Rich Text' by
    any chance?

    In my view Pierre L'Ecuyer is the leading expert on this area active
    today[*]. I would do so were I still active in this area myself; it
    would unquestionably pass most of them, as I have tried. I have some
    other tests, one of which is harsh enough that it is one of very few
    that will fail most of Marsaglia's; my generator passed, but that is
    to be expected :)

    [*} George Marsaglia was the previous holder of that position, as
    most people will agree, preceded by Donald Knuth.

    Nick Maclaren.
    nmm1, Mar 13, 2009
  12. geo

    robin Guest

    You are only guessing.
    Until you have actually run all such tests, your post is just hype.
    robin, Mar 14, 2009
  13. geo

    nmm1 Guest

    Back under your bridge with you!

    Nick Maclaren.
    nmm1, Mar 14, 2009
  14. geo

    user923005 Guest

    user923005, Mar 14, 2009
  15. geo

    nmm1 Guest

    Absolutely. Analyse first. Design second. Code and debug third.
    Every good engineer knows that rule!

    You need to run a couple of simple tests to check that your code
    matches your analysis, and there isn't a major flaw in the analysis
    itself, but you don't need to do more than that. The main testing
    comes where you can't analyse the problem, or where you suspect the
    analysis may be unreliable.

    Nick Maclaren.
    nmm1, Mar 15, 2009
  16. geo

    Ron Shepard Guest

    Sometimes you divide your problem into different domains, large,
    medium, and small, or high and low, or combinations of different
    ranges for different variables, and you use different algorithms to
    solve the different domains. Part of the optimal solution then
    consists of switching to the best algorithm at the right time.
    Sorting is one of those problems, where with small data sets you
    might use one algorithm, and you switch to different algorithms as
    the data sets grow. Also, some sorting algorithms involve
    partitioning the original large data set into smaller subsets,
    sometimes doing this recursively, and you might change algorithms as
    you move up and down the different levels of recursion.

    The exact switchover points might be machine dependent, depending
    say on the relative efficiency of various floating point and integer
    instructions, or the relative speeds of different parts of the
    memory hierarchy, or the communication speed with some external
    device. Much of this must be done in a brute force way where you
    just search for the switchover points, and then store or interpolate
    those switchover points for later use. This is the way ATLAS
    optimizes linear algebra operations.

    $.02 -Ron Shepard
    Ron Shepard, Mar 15, 2009
  17. Recoded for MS VC (MSC) compilers, which don't understand
    the 'long long' datatype, and with minor corrections....

    #include <stdio.h>

    #if _WIN32
    typedef unsigned __int64 ullong_t;
    typedef unsigned long long ullong_t;

    static ullong_t x = 1234567890987654321/*ULL*/;
    static ullong_t c = 123456123456123456/*ULL*/;
    static ullong_t y = 362436362436362436/*ULL*/;
    static ullong_t z = 1066149217761810/*ULL*/;
    static ullong_t t;

    #define MWC (t = (x<<58)+c, c = (x>>6), x+=t, c+=(x<t), x)
    #define XSH (y ^= (y<<13), y ^= (y>>17), y ^= (y<<43))
    #define CNG (z = 6906969069/*LL*/ * z + 1234567)
    #define KISS (MWC + XSH + CNG)

    int main(void)
    int i;

    for (i = 0; i < 100000000; i++)
    t = KISS;

    if (t == 1666297717051644203/*ULL*/)
    printf("100 million uses of KISS OK");
    return 0;

    David R Tribble, Mar 15, 2009
  18. geo

    Axel Vogt Guest

    Does that make sense on a 32 bit Windows?
    Axel Vogt, Mar 15, 2009
  19. geo

    Richard Bos Guest

    Which, in practice, is all the time. I mean, have _you_ ever spoken to a
    customer who knew what he wanted, completely, correctly, and up front?
    I know I haven't.

    Richard Bos, Mar 16, 2009
  20. geo

    nmm1 Guest

    Oh, yes, but you had better deliver the solution within a couple of
    days - if not, the requirement will have started drifting ....

    Nick Maclaren.
    nmm1, Mar 16, 2009
    1. Advertisements

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments (here). After that, you can post your question and our members will help you out.