RFC: clc-compliant pseudo-random number generator

B

Ben Pfaff

One issue that comes up fairly often around here is the poor
quality of the pseudo-random number generators supplied with many
C implementations. As a result, we have to recommend things like
using the high-order bits returned by rand() instead of the
low-order bits, avoiding using rand() for anything that wants
decently random numbers, not using rand() if you want more than
approx. UINT_MAX total different sequences, and so on.

So I wrote some PRNG code that uses RC4, with a seed of up to 256
bytes. Here it is. I believe it is clc-compliant. Comments on
this and anything else are welcome.

----------------------------------------------------------------------
/*
* prng.c - Portable, ISO C90 and C99 compliant high-quality
* pseudo-random number generator based on the alleged RC4
* cipher. This PRNG should be suitable for most general-purpose
* uses. Not recommended for cryptographic or financial
* purposes. Not thread-safe.
*/

/*
* Copyright (c) 2004 Ben Pfaff <[email protected]>.
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or
* without modification, are permitted provided that the
* following conditions are met:
*
* 1. Redistributions of source code must retain the above
* copyright notice, this list of conditions and the following
* disclaimer.
*
* 2. Redistributions in binary form must reproduce the above
* copyright notice, this list of conditions and the following
* disclaimer in the documentation and/or other materials
* provided with the distribution.
*
* 3. The author's and contributors' names may not be used to
* endorse or promote products derived from this software without
* specific prior written permission.
*
* THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS
* IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND
* FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT
* SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT,
* INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
* DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
* SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS;
* OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
* LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
* (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF
* THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY
* OF SUCH DAMAGE.
*
*/

#include "prng.h"
#include <assert.h>
#include <float.h>
#include <limits.h>
#include <math.h>
#include <time.h>

/* RC4-based pseudo-random state. */
static unsigned char s[256];
static int s_i, s_j;

/* Nonzero if PRNG has been seeded. */
static int seeded;

/* Take advantage of inlining if possible. */
#if __STDC_VERSION__ >= 199901L
#define INLINE inline
#else
#define INLINE
#endif

/* Swap bytes. */
static INLINE void
swap_byte (unsigned char *a, unsigned char *b)
{
unsigned char t = *a;
*a = *b;
*b = t;
}

/* If KEY is non-null and SIZE is nonzero, seeds the
pseudo-random number generator based on the SIZE bytes in BUF.
At most the first 256 bytes of BUF are used. If KEY is a null
pointer and SIZE is zero, seeds the pseudo-random number
generator based on the current time.

If this function is not called by the user before any prng_get
function, it is called automatically to obtain a time-based
seed. */
void
prng_seed (const void *key_, size_t size)
{
const unsigned char *key = key_;
size_t key_idx;
int i, j;

assert ((key != NULL && size > 0) || (key == NULL && size == 0));

if (key == NULL)
{
static time_t t;

t = (t == 0) ? time (NULL) : t + 1;
key = (void *) &t;
size = sizeof t;
}

for (i = 0; i < 256; i++)
s = i;
for (key_idx = 0, i = j = 0; i < 256; i++)
{
j = (j + s + key[key_idx]) & 255;
swap_byte (s + i, s + j);
if (++key_idx >= size)
key_idx = 0;
}

s_i = s_j = 0;
seeded = 1;
}

/* Returns a pseudo-random integer in the range [0,255]. */
unsigned char
prng_get_octet (void)
{
if (!seeded)
prng_seed (NULL, 0);

s_i = (s_i + 1) & 255;
s_j = (s_j + s[s_i]) & 255;
swap_byte (s + s_i, s + s_j);

return s[(s[s_i] + s[s_j]) & 255];
}

/* Returns a pseudo-random integer in the range [0,UCHAR_MAX]. */
unsigned char
prng_get_byte (void)
{
if (UCHAR_MAX == 255)
{
/* Normal case for 8-bit bytes. */
return prng_get_octet ();
}
else
{
/* Handle oversize bytes. */
unsigned byte = prng_get_octet ();
unsigned done = 255;
while ((unsigned char) done != UCHAR_MAX)
{
byte = (byte << 8) | prng_get_octet ();
done = (done << 8) | 255;
}
return byte;
}
}

/* Fills BUF with SIZE pseudo-random bytes. */
void
prng_get_bytes (void *buf_, size_t size)
{
unsigned char *buf;

for (buf = buf_; size-- > 0; buf++)
*buf = prng_get_byte ();
}

/* Returns a pseudo-random unsigned long in the range [0,
ULONG_MAX]. */
unsigned long
prng_get_ulong (void)
{
unsigned long ulng = prng_get_octet ();
unsigned long done = 255;

while (done != ULONG_MAX)
{
ulng = (ulng << 8) | prng_get_octet ();
done = (done << 8) | 255;
}

return ulng;
}

/* Returns a pseudo-random long in the range [0, LONG_MAX]. */
long
prng_get_long (void)
{
for (;;)
{
unsigned ulng = prng_get_ulong ();
if (ulng <= LONG_MAX)
return ulng;
}
}

/* Returns a pseudo-random unsigned int in the range [0,
UINT_MAX]. */
unsigned
prng_get_uint (void)
{
unsigned uint = prng_get_octet ();
unsigned done = 255;

while (done != UINT_MAX)
{
uint = (uint << 8) | prng_get_octet ();
done = (done << 8) | 255;
}

return uint;
}

/* Returns a pseudo-random int in the range [0, INT_MAX]. */
int
prng_get_int (void)
{
for (;;)
{
unsigned uint = prng_get_uint ();
if (uint <= INT_MAX)
return uint;
}
}

/* Returns a pseudo-random floating-point number from the uniform
distribution with range [0,1). */
double
prng_get_double (void)
{
for (;;)
{
unsigned long ulng = prng_get_ulong ();
if (ulng != ULONG_MAX)
return (double) ulng / ULONG_MAX;
}
}

/* Returns a pseudo-random floating-point number from the
distribution with mean 0 and standard deviation 1. (Multiply
the result by the desired standard deviation, then add the
desired mean.) */
double
prng_get_double_normal (void)
{
/* Knuth, _The Art of Computer Programming_, Vol. 2, 3.4.1C,
Algorithm P. */
static double next_normal = DBL_MAX;
double this_normal;

if (next_normal != DBL_MAX)
{
this_normal = next_normal;
next_normal = DBL_MAX;
}
else
{
double v1, v2, s;

do
{
double u1 = prng_get_double ();
double u2 = prng_get_double ();
v1 = 2.0 * u1 - 1.0;
v2 = 2.0 * u2 - 1.0;
s = v1 * v1 + v2 * v2;
}
while (s >= 1);

this_normal = v1 * sqrt (-2. * log (s) / s);
next_normal = v2 * sqrt (-2. * log (s) / s);
}

return this_normal;
}
----------------------------------------------------------------------
#ifndef PRNG_H_INCLUDED
#define PRNG_H_INCLUDED

#include <stddef.h>

void prng_seed (const void *, size_t);
unsigned char prng_get_octet (void);
unsigned char prng_get_byte (void);
void prng_get_bytes (void *, size_t);
unsigned long prng_get_ulong (void);
long prng_get_long (void);
unsigned prng_get_uint (void);
int prng_get_int (void);
double prng_get_double (void);
double prng_get_double_normal (void);

#endif /* prng.h */
----------------------------------------------------------------------
 
J

Johan Lindh

Ben said:
One issue that comes up fairly often around here is the poor
quality of the pseudo-random number generators supplied with many
C implementations. As a result, we have to recommend things like
using the high-order bits returned by rand() instead of the
low-order bits, avoiding using rand() for anything that wants
decently random numbers, not using rand() if you want more than
approx. UINT_MAX total different sequences, and so on.

So I wrote some PRNG code that uses RC4, with a seed of up to 256
bytes. Here it is. I believe it is clc-compliant. Comments on
this and anything else are welcome.

Niiice.

Q: Are you going to publish this somewhere more accessible, like
sourceforge or freshmeat?

Q: What platforms have you actually tested this on?

Q: Is there a test application lurking somewhere that you can publish?

/J
 
S

Scott Fluhrer

Ben Pfaff said:
One issue that comes up fairly often around here is the poor
quality of the pseudo-random number generators supplied with many
C implementations. As a result, we have to recommend things like
using the high-order bits returned by rand() instead of the
low-order bits, avoiding using rand() for anything that wants
decently random numbers, not using rand() if you want more than
approx. UINT_MAX total different sequences, and so on.

So I wrote some PRNG code that uses RC4, with a seed of up to 256
bytes. Here it is. I believe it is clc-compliant. Comments on
this and anything else are welcome.

Two comments:
- Since you are using RC4, if you really want to be cryptographically
secure, you'll need to discard some part of the keystream at initialization
time. Estimates as to how much you need to discard varies somewhat -- I've
seen estimates of 256 - 2048 bytes.

- I believe that the defined behavior for rand() is that if it is called
without seeding, it acts as if the user did srand(1). You don't; instead,
you seed it based on time.
 
B

Ben Pfaff

Johan Lindh said:
Niiice.

Q: Are you going to publish this somewhere more accessible, like
sourceforge or freshmeat?

I will put it up on my webpage at benpfaff.org/writings/clc when
I am satisfied with it.
Q: What platforms have you actually tested this on?

GNU/Linux i386 only.
Q: Is there a test application lurking somewhere that you can publish?

Not yet--I have only tested it in a cursory fashion. I have not
yet decided how much testing it should receive before release, or
what sort of test routine it should be packaged with.
Suggestions are welcome.
 
B

Ben Pfaff

Scott Fluhrer said:
Two comments:
- Since you are using RC4, if you really want to be cryptographically
secure, you'll need to discard some part of the keystream at initialization
time. Estimates as to how much you need to discard varies somewhat -- I've
seen estimates of 256 - 2048 bytes.

I'm intentionally not trying to make it cryptographically secure.
The goal is to be a better rand() for simulations etc., not to be
useful for cryptography. My comment at the top was intended to
say that, but perhaps it should be more prominent.
- I believe that the defined behavior for rand() is that if it is called
without seeding, it acts as if the user did srand(1). You don't; instead,
you seed it based on time.

That's also intentional. Usually the repeatability of rand() is
a surprise to beginners. Experts who want repeatability know how
to get it. Again, maybe this should be more prominent.
 
E

E. Robert Tisdale

Ben said:
One issue that comes up fairly often around here is the poor
quality of the pseudo-random number generators supplied with many
C implementations. As a result, we have to recommend things like
using the high-order bits returned by rand() instead of the
low-order bits, avoiding using rand() for anything that wants
decently random numbers, not using rand() if you want more than
approx. UINT_MAX total different sequences, and so on.

So I wrote some PRNG code that uses RC4, with a seed of up to 256
bytes. Here it is. I believe it is clc-compliant.
Comments on this and anything else are welcome.

[snip]

Why is your pseudo random number generator "better" than rand()?

Is it faster than rand()?

Have you run any performance benchmarks against rand()?
 
B

Ben Pfaff

Kevin D. Quitt said:
Just curious: why not use the mersenne twister?

It's too complicated for my taste, and I find it difficult to
prove to myself that its code is portable. I have no reason to
believe that it is not an excellent PRNG.
 
B

Ben Pfaff

E. Robert Tisdale said:
Why is your pseudo random number generator "better" than rand()?

Is it faster than rand()?

Have you run any performance benchmarks against rand()?

I forgot to mention in my original article that I don't feel
obligated to answer questions from wackos.
 
K

Kevin D. Quitt

Without casting asparagus on your effort, a paper in my archive from an
unknown source:

Most RNGs work by keeping a certain number, say k, of the most recently
generated integers, then return the next integer as a function of those
k. The initial k integers, the seeds, are assumed to be randomly
chosen, usually 32-bits. The period of the RNG is related to the number
of choices for seeds, usually 2^(32k), so to get longer periods you need
to increase k.

Probably the most common type has k=1, and needs a single seed, with
each new integer a function of the previous one. An example is this
congruential RNG, a form of which was the system RNG in VAXs for many
years:

static unsigned long x = 123456789; /* a random initial x to be */
/* assigned by the calling program */
unsigned long cong( void )
{
x = 69069 * x + 362437;
return x;
}

Simple, k=1, RNGs can perform fairly well in tests of randomness such as
those in the new version of Diehard, csis.hku.hk/~diehard but experience
has shown that better performances come from RNGs with k's ranging from
4 or 5 to as much as 4097.

Here is an example with k=5, period about 2^160, one of the fastest long
period RNGs, returns more than 120 million random 32-bit integers/second
(1.8MHz CPU), seems to pass all tests:

static unsigned long
x = 123456789,
y = 362436069,
z = 521288629,
w = 88675123,
v = 886756453;
/* replace defaults with five random seed values in calling program */
unsigned long xorshift(void)
{
unsigned long t;
t = x ^ (x>>7);
x = y;
y = z;
z = w;
w = v;
v = (v ^ (v << 6)) ^ (t ^ (t << 13));
return (y + y + 1)*v;
}


Another example has k=257, period about 2^8222. Uses a static array
Q[256] and an initial carry 'c', the Q array filled with 256 random
32-bit integers in the calling program and an initial carry c<809430660
for the multiply-with-carry operation. It is very fast and seems to
pass all tests.

static unsigned long Q[ 256 ], c = 362436;
// choose random initial c<809430660
// and 256 random 32-bit
// integers for Q[]
unsigned long MWC256(void)
{
unsigned long long t,a= 809430660LL;
static unsigned char i= 255;
t = a * Q[ ++i ] + c;
c = t >> 32;
Q[ i ] = t
return t;
}


The Mersenne Twister (check Google) is an excellent RNG, with k=624. But
it requires an elaborate C program and is slower than many RNGs that do
as well in tests, have comparable or longer periods and require only a
few lines of code.

Here is a complimentary-multiply-with-carry RNG with k=4097 and a
near-record period, more than 10^33000 times as long as that of the
Twister. (2^131104 vs. 2^19937)

static unsigned long Q[ 4096 ], c = 362436;
// choose random initial
// c<809430660 and 4096
// random 32-bit integers for Q[]
unsigned long CMWC4096( void )
{
unsigned long long t, a = 18782LL;
static unsigned long i = 4095;
unsigned long x, r = 0xfffffffe;
i = (i + 1) & 4095;
t = a * Q[ i ] + c;
c = t >> 32;
x = t + c;
if ( x < c )
{
x++;
c++;
}
Q[ i ] = r - x;
return Q;
}

You will find several more CMWC RNGs and comments on choice of seeds in
the May 2003 Communications of the ACM.

George Marsaglia
 
D

Dave Vandervies

One issue that comes up fairly often around here is the poor
quality of the pseudo-random number generators supplied with many
C implementations. As a result, we have to recommend things like
using the high-order bits returned by rand() instead of the
low-order bits, avoiding using rand() for anything that wants
decently random numbers, not using rand() if you want more than
approx. UINT_MAX total different sequences, and so on.

So I wrote some PRNG code that uses RC4, with a seed of up to 256
bytes. Here it is. I believe it is clc-compliant. Comments on
this and anything else are welcome.

[Everything I don't have a comment on has been snipped]
/* Take advantage of inlining if possible. */
#if __STDC_VERSION__ >= 199901L
#define INLINE inline
#else
#define INLINE
#endif

It might be worth putting compiler-specific inlines into the #else here,
so that (f'rexample) GCC in C89 mode would still get GCC's inlining.
(This may affect CLC-compliance, though.)

/* Returns a pseudo-random long in the range [0, LONG_MAX]. */
long
prng_get_long (void)
/* Returns a pseudo-random int in the range [0, INT_MAX]. */
int
prng_get_int (void)

How difficult would it be to get pseudo-random numbers in the ranges
[INT_MIN, INT_MAX] and [LONG_MIN, LONG_MAX]?



dave
 
B

Ben Pfaff

So I wrote some PRNG code that uses RC4, with a seed of up to 256
bytes. Here it is. I believe it is clc-compliant. Comments on
this and anything else are welcome.
/* Returns a pseudo-random long in the range [0, LONG_MAX]. */
long
prng_get_long (void)
/* Returns a pseudo-random int in the range [0, INT_MAX]. */
int
prng_get_int (void)

How difficult would it be to get pseudo-random numbers in the ranges
[INT_MIN, INT_MAX] and [LONG_MIN, LONG_MAX]?

Not hard, it's just that in my experience that's not often
useful.
 
E

E. Robert Tisdale

Ben said:
I forgot to mention in my original article that I don't feel
obligated to answer questions from wackos.

Your PRNG is a dog isn't it Ben?

My guess is that it's so slow that there are no practical uses for it.
 
K

Keith Thompson

Ben Pfaff said:
That's also intentional. Usually the repeatability of rand() is
a surprise to beginners. Experts who want repeatability know how
to get it. Again, maybe this should be more prominent.

So it's meant to be a clc-compliant pseudo-random number generator,
not a clc-compliant implementation of rand().

Which is of course precisely what you said it was, but it occurs to me
that a decent rand() implementation would also be a nice thing to
have. If it actually caught on, we could stop giving people so much
advice about how to work around the shortcomings of typical rand()
implementations (just as soon as all existing implementations vanish,
of course).
 
S

Simon Biber

E. Robert Tisdale said:
Why is your pseudo random number generator "better" than rand()?

Is it faster than rand()?

Have you run any performance benchmarks against rand()?

I ran some benchmarks.

On Cygwin gcc, Ben's PRNG takes 9.8 nanoseconds, whereas
the rand() provided in Cygwin takes 17.6 nanoseconds.

On LCC-Win32, Ben's PRNG takes 13.2 nanoseconds, whereas the
rand() provided takes 3.0 nanoseconds.

On BCC 5.5, Ben's PRNG does not compile, whereas the rand()
provided takes 0.95 nanoseconds.

The errors compiling Ben's PRNG were:
C:\docs\prog\c>bcc32 -c prng.c
Borland C++ 5.5.1 for Win32 Copyright (c) 1993, 2000 Borland
prng.c:
Warning W8008 prng.c 135: Condition is always true in function prng_get_byte
Warning W8066 prng.c 145: Unreachable code in function prng_get_byte
Error E2063 prng.c 244: Illegal initialization in function prng_get_double_normal
*** 1 errors in Compile ***

The error was at the line:
static double next_normal = DBL_MAX;
 
B

Ben Pfaff

Simon Biber said:
On BCC 5.5, Ben's PRNG does not compile, whereas the rand()
provided takes 0.95 nanoseconds.

The errors compiling Ben's PRNG were:
C:\docs\prog\c>bcc32 -c prng.c
Borland C++ 5.5.1 for Win32 Copyright (c) 1993, 2000 Borland
prng.c:
Warning W8008 prng.c 135: Condition is always true in function prng_get_byte
Warning W8066 prng.c 145: Unreachable code in function prng_get_byte
Error E2063 prng.c 244: Illegal initialization in function prng_get_double_normal
*** 1 errors in Compile ***

The error was at the line:
static double next_normal = DBL_MAX;

Bizarre. Can this be anything other than a bug in Borland C++?
 
E

E. Robert Tisdale

Simon said:
I ran some benchmarks.

On Cygwin gcc, Ben's PRNG takes 9.8 nanoseconds, whereas
the rand() provided in Cygwin takes 17.6 nanoseconds.

On LCC-Win32, Ben's PRNG takes 13.2 nanoseconds, whereas the
rand() provided takes 3.0 nanoseconds.

On BCC 5.5, Ben's PRNG does not compile, whereas the rand()
provided takes 0.95 nanoseconds.

Thanks Simon.

"Better" pseudo random number generators are generally slower
because they compute a larger state.
A simple pseudo random number generator like rand() is better
when all you really want to do is "mix things up" a bit.

There are other criteria for "better" PRNGs such as repeatability,
portability and uniformity.
 
J

Joona I Palaste

I'm intentionally not trying to make it cryptographically secure.
The goal is to be a better rand() for simulations etc., not to be
useful for cryptography. My comment at the top was intended to
say that, but perhaps it should be more prominent.

Why are you not trying to make it cryptographically secure? To avoid
it being classified as a weapon by the US army? =)
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,764
Messages
2,569,567
Members
45,041
Latest member
RomeoFarnh

Latest Threads

Top