RFC: clc-compliant pseudo-random number generator

D

Dan Pop

In said:
/* Take advantage of inlining if possible. */
#if __STDC_VERSION__ >= 199901L

Any use of __STDC_VERSION__ invokes undefined behaviour in a C90
implementation. Such an implementation can use it as a magic
identifier during preprocessing (e.g. the preprocessor's nasal demon
generator ;-) or simply define it as a macro with a value greater
than 199901L.

If you want clc-compliance, drop it.

Dan
 
E

Eric Sosman

E. Robert Tisdale said:
Your PRNG is a dog isn't it Ben?

My guess is that it's so slow that there are no practical uses for it.

If speed is the criterion of goodness for pseudo-
random number generators, this one should win:

#define RAND() 0

You might also be interested in my super-fast versions
of the trigonometric functions:

#define SIN(x) ( (x), 0 )
#define COS(x) ( (x), 0 )
...

.... and in my blindingly fast memory management implementation:

#define MALLOC(bytes) ( (bytes), 0 )
#define CALLOC(bytes) ( (bytes), 0 )
#define REALLOC(ptr,bytes) ( (ptr), (bytes), 0 )
#define FREE(ptr) (void)(ptr)
 
J

Joona I Palaste

If speed is the criterion of goodness for pseudo-
random number generators, this one should win:
#define RAND() 0
You might also be interested in my super-fast versions
of the trigonometric functions:
#define SIN(x) ( (x), 0 )
#define COS(x) ( (x), 0 )
...
... and in my blindingly fast memory management implementation:
#define MALLOC(bytes) ( (bytes), 0 )
#define CALLOC(bytes) ( (bytes), 0 )

This implementation might be blindingly fast, but it does not follow
the specification. It should be:

#define CALLOC(number, bytes) ( (number), (bytes), 0 )
 
M

Michael Wojcik

Without casting asparagus on your effort, a paper in my archive from an
unknown source:

Most RNGs work by keeping a certain number, say k, of the most recently
generated integers, then return the next integer as a function of those
k. ...

That's an article posted by George Marsaglia to comp.lang.c on 13 May 2003.
(I kept a copy of it as well; it's a nice piece.)

On 3 April 2003 George posted another article on CMWC (Complementary
Multiply with Carry) PRNGs which was also quite interesting.
 
E

Eric Sosman

Joona said:
This implementation might be blindingly fast, but it does not follow
the specification. It should be:

#define CALLOC(number, bytes) ( (number), (bytes), 0 )

Thanks for the correction; I'll issue a patch immediately.
Watch for the CERT advisory.

(Can you tell that I don't use calloc() much?)
 
H

Hallvard B Furuseth

Dan said:
Any use of __STDC_VERSION__ invokes undefined behaviour in a C90
implementation. (...) If you want clc-compliance, drop it.

In particular since it's only used for swap_byte(), which is
only used a few places that just as well can use

#define SWAP_BYTE(x, y) \
{ \
unsigned char tmp__SWAP_BYTE = (x); \
(x) = (y); \
(y) = tmp__SWAP_BYTE; \
}

(Yes, I know you could put do...while(0) around it so it will 'expect'
to be followed by a semicolon, but I think the while(0) gives a warning
with some compilers. There is no need to do that in this code.)
 
K

Kevin Bracey

Any use of __STDC_VERSION__ invokes undefined behaviour in a C90
implementation. Such an implementation can use it as a magic
identifier during preprocessing (e.g. the preprocessor's nasal demon
generator ;-) or simply define it as a macro with a value greater
than 199901L.

Ok, thanks for the DeathStation programming advice.

On a more practical note, what WOULD a reasonable value for a compiler in C90
mode to set __STDC_VERSION__ to, if it wanted to set it? (One reason for
wanting to set it might be to avoid warnings about it being an undefined
identifier treated as 0 in #if).

Presumably 1990xxL would make sense, but what is "xx"? Which month was C90
published (or reached whichever stage was equivalent to 199409/199901)? I
suppose 199000L would be safe whatever.
 
B

Ben Pfaff

Joona I Palaste said:
Why are you not trying to make it cryptographically secure? To avoid
it being classified as a weapon by the US army? =)

Because if I intend it to be cryptographically secure, then a bug
is an important security hole. If I don't, a bug just means that
the output is less random. I prefer to leave cryptography to the
cryptographers.
 
B

Ben Pfaff

Any use of __STDC_VERSION__ invokes undefined behaviour in a C90
implementation. Such an implementation can use it as a magic
identifier during preprocessing (e.g. the preprocessor's nasal demon
generator ;-) or simply define it as a macro with a value greater
than 199901L.

Only an actively malicious compiler would do so. I'm not too
interested in those.
 
J

Joona I Palaste

Because if I intend it to be cryptographically secure, then a bug
is an important security hole. If I don't, a bug just means that
the output is less random. I prefer to leave cryptography to the
cryptographers.

So if it happens to be cryptographically secure anyway, good for it, but
you won't be losing sleep if it doesn't?
 
H

Hallvard B Furuseth

It might be an idea to provide a thread_safe version: Append '_r' to
the function names and add an extra parameter (a struct containing
the current static variables). The current function names would be
an interface to it, with their own static struct.

Ben Pfaff wrote:

About prng_seed:
/* If KEY is non-null and SIZE is nonzero, seeds the
pseudo-random number generator based on the SIZE bytes in BUF.
At most the first 256 bytes of BUF are used. If KEY is a null
pointer and SIZE is zero, seeds the pseudo-random number
generator based on the current time.

I think it would be both cleaner and simpler to base the choice of
whether or not to use time() on just one parameter:

Seeds the pseudo-random number generator.
If KEY is null, the seed is based on the current time.
Otherwise, SIZE must be nonzero and the seed is based on the SIZE
bytes in KEY. At most the first 256 bytes of KEY are used.
prng_get_byte (void)
{
if (UCHAR_MAX == 255)

`#if UCHAR_MAX == 255' avoids warnings about the if() test always being
true/false.
/* Handle oversize bytes. */
unsigned byte = prng_get_octet ();
unsigned done = 255;
while ((unsigned char) done != UCHAR_MAX)
{
byte = (byte << 8) | prng_get_octet ();
done = (done << 8) | 255;
}

You don't need to shift 'done'. done++ should be a bit faster:

unsigned done = 1;
do {
byte = (byte << 8) | prng_get_octet ();
} while (++done < (CHAR_BIT+7)/8);

The same applies to prng_get_ulong() and prng_get_uint().

#if CHAR_BIT % 8 != 0, you could avoid some calls to prng_get_octet() by
saving the unused bits and use them in the next prng_get_byte() call.
(If you do, remember to reset this value in prng_seed().)
prng_get_long (void)
{
for (;;)
{
unsigned ulng = prng_get_ulong ();
if (ulng <= LONG_MAX)
return ulng;

This should be enough:

return prng_get_ulong () & LONG_MAX;

Similar for prng_get_int().
/* Returns a pseudo-random floating-point number from the uniform
distribution with range [0,1). */
double
prng_get_double (void)
{
for (;;)
{
unsigned long ulng = prng_get_ulong ();
if (ulng != ULONG_MAX)
return (double) ulng / ULONG_MAX;

This can return 1, since floating point arithmetic is not exact.

double ret = (double) prng_get_ulong () / ULONG_MAX;
if (ret < 1)
return ret;
 
B

Ben Pfaff

All good feedback. Thanks Hallvard, I'll incorporate these
suggestions into the next version.
 
H

Hallvard B Furuseth

Scott said:
- Since you are using RC4, if you really want to be cryptographically
secure, you'll need to discard some part of the keystream at initialization
time. Estimates as to how much you need to discard varies somewhat -- I've
seen estimates of 256 - 2048 bytes.

Why is that? Are the first bytes less random than later ones?
If so, prng_seed() should discard the initial output even if it's
not meant to be cryptographically secure.
 
S

Scott Fluhrer

Hallvard B Furuseth said:
Why is that? Are the first bytes less random than later ones?
Well, no. It's just that by looking at the first bytes, you can get some
hints at what the seed (key) was. In addition, if you have enough initial
outputs from related seeds, well, you can figure out what the seeds are.
If so, prng_seed() should discard the initial output even if it's
not meant to be cryptographically secure.
If you're interested in "randomness" (that is, being able to pass natural
looking statistical tests) and not cryptographical security (that is, being
able to pass any test an attacker can think of), there's no reason to
discard the initial output.

In any case, there are known tests that are able to distinguish an RC4
keystream output from random given about a Gigabyte of output (even with
discarding the initial output), and so (by cryptographical standards) this
generator is somewhat limited in any case.
 
S

Scott Fluhrer

Scott Fluhrer said:
Well, no. It's just that by looking at the first bytes, you can get some
hints at what the seed (key) was. In addition, if you have enough initial
outputs from related seeds, well, you can figure out what the seeds are.

If you're interested in "randomness" (that is, being able to pass natural
looking statistical tests) and not cryptographical security (that is, being
able to pass any test an attacker can think of),
Sigh, I'm being sloppy here, so I better correct myself before someone else
does. It's not literally any test, but any test that can be computed within
a limited computational time (for example, 2**64 operations, for some
definition of 'operations').
 
C

CBFalconer

Eric said:
Thanks for the correction; I'll issue a patch immediately.
Watch for the CERT advisory.

Are you putting those in the public domain, so that Trollsdale can
use them freely?
 
P

Peter Nilsson

Eric Sosman said:
Thanks for the correction; I'll issue a patch immediately.
Watch for the CERT advisory.

(Can you tell that I don't use calloc() much?)

Whilst your repairing the module, note also that [m/c/re]alloc return
void *, not int...

#define MALLOC(bytes) (bytes, (void *) 0)

Add parentheses to taste... ;)
 
A

Allin Cottrell

Ben said:
It's too complicated for my taste, and I find it difficult to
prove to myself that its code is portable. I have no reason to
believe that it is not an excellent PRNG.

It seems to me that Ben has done a nice job here. His PRNG passes
George Marsaglia's "diehard" test with flying colors.

http://stat.fsu.edu/pub/diehard/

(Last time I checked, my local standard rand() did _not_ pass; the
Mersenne Twister does fine).

In case you're interested, Ben, I've put a little archive of the
diehard test suite as it applies to your code at

http://www.ecn.wfu.edu/~cottrell/pfaff_prng.tgz (60Kb)
 
P

Peter Nilsson

Hallvard B Furuseth said:
It might be an idea to provide a thread_safe version: Append '_r' to
the function names and add an extra parameter (a struct containing
the current static variables). The current function names would be
an interface to it, with their own static struct.

Ben Pfaff wrote:

About prng_seed:

I think it would be both cleaner and simpler to base the choice of
whether or not to use time() on just one parameter:

Seeds the pseudo-random number generator.
If KEY is null, the seed is based on the current time.
Otherwise, SIZE must be nonzero and the seed is based on the SIZE
bytes in KEY. At most the first 256 bytes of KEY are used.

Why not change the size_t SIZE parameter to unsigned and, if KEY is
null and SIZE is non zero, seed on SIZE...?

if (!key && size)
{
key = &size;
size = sizeof(size);
}

That way, there is a simple mechanism for seeding by direct integers.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,774
Messages
2,569,596
Members
45,128
Latest member
ElwoodPhil
Top