"hat" container class [C++]


AngleWyrm

I have created a new container class, called the hat.

It provides random selection of user objects, both with and without
replacement, with non-uniform probabilities. Uniform probabilities are a
degenerate case, and can also be stored.

Here's the web-page, with complete source, illustration, and example usage.
Take a look-see, and post your critique!

http://home.comcast.net/~anglewyrm/hat.html
 

Kai-Uwe Bux

AngleWyrm said:
I have created a new container class, called the hat.

It provides random selection of user objects, both with and without
replacement, with non-uniform probabilities. Uniform probabilities are a
degenerate case, and can also be stored.

Here's the web-page, with complete source, illustration, and example
usage. Take a look-see, and post your critique!

http://home.comcast.net/~anglewyrm/hat.html


I only had a very brief look at what is a pet subject of mine: the random
number generator:

template <class T>
int hat<T>::random(int range)
{
    // make sure srand has been called before using rand
    if( !rand_seeded )
    {
        srand( time(NULL) );
        rand();
        rand_seeded = true;
    }
    int random_number = rand() % range;

    return random_number;
}


I would leave the choice of a random number generator to the client, as is
done in std::random_shuffle. A second template parameter can be used
for that. You could also provide a constructor that takes an instance of an
RNG as an argument. You would have to provide a reasonable default, of
course. There are good reasons for not making this choice on the user's
behalf. For example:

a) There is a *huge* variety of random number generators out there. It is
impossible to predict for a library designer which of these will be good
for a given client problem.

b) It is a big advantage if the user can have independent instances of
random number generators.


Now for the one that you have chosen: relying on std::rand() from <cstdlib>
puts you at the mercy of the operating system. Your code assumes that the
low order bits of rand() are reasonably well distributed. Please note that
some random number generators (e.g. straight forward linear congruence
RNGs) have some troubles with the low order bytes. On some operating
systems those might still be in use. Thus it is good practice to use the
high order bytes instead like so (quoted from the man page):

In Numerical Recipes in C: The Art of Scientific Computing
(William H. Press, Brian P. Flannery, Saul A. Teukolsky,
William T. Vetterling; New York: Cambridge University
Press, 1992 (2nd ed., p. 277)), the following comments are
made:
"If you want to generate a random integer between 1
and 10, you should always do it by using high-order
bits, as in

j=1+(int) (10.0*rand()/(RAND_MAX+1.0));

and never by anything resembling

j=1+(rand() % 10);

(which uses lower-order bits)."

Now, on Linux, this advice is somewhat outdated, and it is to be hoped that
weak RNGs will eventually die out. However, I would recommend providing a
reasonable default random number generator of your choosing, one for which
the quality of the low-order bits is established.
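To make the quoted advice concrete, the hat's random() member could scale by RAND_MAX instead of taking the remainder. A minimal sketch of such a helper (random_high_bits is an invented name; the lazy seeding mirrors the original member):

```cpp
#include <cstdlib>
#include <ctime>

// Sketch of a range function that follows the Numerical Recipes
// advice: scale by RAND_MAX + 1.0 so the high-order bits of rand()
// decide the result, instead of rand() % range.
// random_high_bits is an invented name, not part of the hat class.
int random_high_bits( int range )
{
    static bool seeded = false;
    if ( !seeded ) {
        std::srand( static_cast<unsigned>( std::time( 0 ) ) );
        seeded = true;
    }
    // rand()/(RAND_MAX+1.0) lies in [0,1), so the product lies in [0,range)
    return static_cast<int>( range * ( std::rand() / ( RAND_MAX + 1.0 ) ) );
}
```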


BTW, nice job.


Best

Kai-Uwe
 

Kai-Uwe Bux

Kai-Uwe Bux said:
I only had a very brief look at what is a pet subject of mine: the random
number generator:

template <class T>
int hat<T>::random(int range)
{
    // make sure srand has been called before using rand
    if( !rand_seeded )
    {
        srand( time(NULL) );
        rand();
        rand_seeded = true;
    }
    int random_number = rand() % range;

    return random_number;
}


I would leave the choice of a random number generator to the client, as is
done in std::random_shuffle. A second template parameter can be used
for that. You could also provide a constructor that takes an instance of an
RNG as an argument. You would have to provide a reasonable default, of
course. There are good reasons for not making this choice on the user's
behalf. For example:

a) There is a *huge* variety of random number generators out there. It is
impossible to predict for a library designer which of these will be good
for a given client problem.

b) It is a big advantage if the user can have independent instances of
random number generators.


Now for the one that you have chosen: relying on std::rand() from
<cstdlib> puts you at the mercy of the operating system. Your code assumes
^^^^^^^^^^^^^^^^^
Oops: it is of course the vendor of your cstdlib. Still its a risk!
that the low order bits of rand() are reasonably well distributed. Please
note that some random number generators (e.g. straight forward linear
congruence RNGs) have some troubles with the low order bytes. On some
operating systems those might still be in use. Thus it is good practice to
use the high order bytes instead like so (quoted from the man page):

In Numerical Recipes in C: The Art of Scientific Computing
(William H. Press, Brian P. Flannery, Saul A. Teukolsky,
William T. Vetterling; New York: Cambridge University
Press, 1992 (2nd ed., p. 277)), the following comments are
made:
"If you want to generate a random integer between 1
and 10, you should always do it by using high-order
bits, as in

j=1+(int) (10.0*rand()/(RAND_MAX+1.0));

and never by anything resembling

j=1+(rand() % 10);

(which uses lower-order bits)."

Now, on Linux, this advice is somewhat outdated. And it is to be hoped
^^^^^^^^^^^^^

Same oops.
 

AngleWyrm

Kai-Uwe Bux said:
RNGs) have some troubles with the low order bytes. On some operating
systems those might still be in use. Thus it is good practice to use the
high order bytes instead like so (quoted from the man page):

In Numerical Recipes in C: The Art of Scientific Computing
(William H. Press, Brian P. Flannery, Saul A. Teukolsky,
William T. Vetterling; New York: Cambridge University
Press, 1992 (2nd ed., p. 277)), the following comments are
made:
"If you want to generate a random integer between 1
and 10, you should always do it by using high-order
bits, as in

j=1+(int) (10.0*rand()/(RAND_MAX+1.0));

and never by anything resembling

j=1+(rand() % 10);

(which uses lower-order bits)."

Now, on Linux, this advice is somewhat outdated. And it is to be hoped
that

Thanks for the input. I'm now working on a way to template-ize a
user-assigned function, which would have a default. Unfortunately,
random_shuffle uses a rather coarse method: it simply defines two
overloads, one with and one without the generator argument. This seems so
un-C++ to me that I can't bring myself to recode a duplicate in the header.
There's got to be a better way. Still researching that one.
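One way to get the default without random_shuffle's duplicated overloads is a default template argument plus a defaulted constructor parameter. This is only a sketch of the mechanism; hat_sketch and DefaultRNG are invented names, not the hat's real code:

```cpp
#include <cstdlib>

// One way to supply a default RNG without random_shuffle's two
// duplicated overloads: a default template argument plus a defaulted
// constructor parameter. hat_sketch and DefaultRNG are invented
// names for illustration only.
struct DefaultRNG {
    // mirrors the original hat's method (rand() % range)
    int operator()( int range ) const { return std::rand() % range; }
};

template < class T, class RNG = DefaultRNG >
class hat_sketch {
public:
    explicit hat_sketch( RNG r = RNG() ) : rng( r ) {}

    // the container would call this wherever it needs a random index
    int random( int range ) { return rng( range ); }

private:
    RNG rng;
};
```

A client then writes hat_sketch<int> plain; for the default, or hat_sketch<int, MyRNG> custom( my_rng ); to supply its own generator, with only one class definition in the header.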

As to the advice that was given about low-order bits... I've done some
research on this (in the form of executing code and observing results),
and as it turns out, std::rand()'s first return value is a sequential
number that steps by threes through [0, RAND_MAX]. My version, anyway.

What I got was that if srand( time(NULL) ) is used as a seed, then it will
take about nine hours to cycle through the range.
 

Kai-Uwe Bux

AngleWyrm said:
...
As to the advice that was given about low-order bits...I've done some
research on this (in the form of executing code, and observing results)
and as it turns out std::rand()'s first return value is a sequential
number that steps by threes through RAND_MAX. My version anyway.

What I got was that if srand( time(NULL) ) is used as a seed, then it will
take about nine hours to cycle through the range.

Hi,

experiments are good. They will, however, only tell you about your
particular platform. Other clients of your code might be less lucky.

Moreover, keep in mind that a long period is not the only thing that
matters in pseudo-random number generation. You will also want subsequent
calls to random() to appear independent. For instance, it would be bad if a
low return from random() strongly indicated that a high number would
result from the next call. There are various statistical tests that you
should perform before trusting an RNG. I very much recommend Volume 2 of
"The Art of Computer Programming" by D.E. Knuth. It has an in-depth account
of how to measure the quality of RNGs. You might also want to have a look
at the random number generators from the Boost libraries. They represent
pretty much the state of the art.

Another property that is sometimes requested is that the RNG not leak
information about its internal state. This is important if a random number
generator is used in a cryptographic scheme, for instance as a
countermeasure to timing attacks. You assume that the adversary knows the
algorithm of the RNG and can observe the returns from the calls. It should
nevertheless be computationally infeasible to deduce the internal state of
the RNG or successfully predict its next return.

Upshot: RNGs are pretty cool stuff, darn interesting, and easy to get
wrong. It is very important to provide a reasonable general purpose RNG as
the default but to leave the ultimate choice to client code.


Best

Kai-Uwe
 

Siemel Naran

Kai-Uwe Bux said:
Now for the one that you have chosen: relying on std::rand() from
puts you at the mercy of the operating system. Your code assumes that the
low order bits of rand() are reasonably well distributed. Please note that
some random number generators (e.g. straight forward linear congruence
RNGs) have some troubles with the low order bytes. On some operating
systems those might still be in use. Thus it is good practice to use the
high order bytes instead like so (quoted from the man page):

One of the tests says the green die rolls 1 half of the time. So to test,
roll 6000 times, and about 3000 should be 1. Is this test sufficient? (I
imagine there's more to it though.)

BTW, I wrote this class once. It might be useful. Don't know how good it
really is, but it seems to work.


// class crandom
// congruential random number generator
// idea from Wannier: Numerical Methods for Scientists and Engineers
// crandom<3> generates random numbers in [0,8)
template <unsigned LogRange>
class crandom
{
public:
    crandom() : number(1) { ++(*this); }

    unsigned lower() const { return 0; }
    unsigned upper() const { return (1<<LogRange)-1; }

    crandom& operator++()
    {
        number = (number*37) & ((4<<LogRange)-1);
        return *this;
    }

    const crandom operator++(int)
    {
        crandom tmp(*this);
        ++(*this);
        return tmp;
    }

    unsigned operator*() const { return number>>2; }

private:
    unsigned number;
};
 

Siemel Naran

I have created a new container class, called the hat.

It provides random selection of user objects, both with and without
replacement, with non-uniform probabilities. Uniform probabilities are a
degenerate case, and can also be stored.

Here's the web-page, with complete source, illustration, and example usage.
Take a look-see, and post your critique!

http://home.comcast.net/~anglewyrm/hat.html

The code looks good, well written, and well commented, and the web page is
good too.

Kai's suggestion of separating the random number generation from the logic
of the hat class is good too.

One of your tests says the green die rolls 1 half of the time. So create a
new array of 10 elements,

{ 1, 1, 1, 1, 1, 2, 3, 4, 5, 6 }

Then pick a random number in the range [0, 10) and that's your weighted
random number, with 1 showing up about 50% of the time.

In the actual implementation you don't need to create a new array of 10
elements. Instead you can just pick a number in the range [0, 10) where 10
is the sum of weights. The original array is { 1, 2, 3, 4, 5, 6 }. Numbers
[0,4] map to index 0, numbers [5,5] map to index 1, etc.
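The subtract-in-a-loop mapping just described can be sketched like this (weighted_index is an invented name; the weights { 5, 1, 1, 1, 1, 1 } encode the example above, with face 1 drawn half the time):

```cpp
#include <vector>

// Sketch of the mapping described above: pick r in [0, sum of
// weights) and walk the weight list, subtracting, until r falls
// inside one entry. With weights { 5, 1, 1, 1, 1, 1 }, r in [0,4]
// maps to index 0, r == 5 maps to index 1, and so on.
// weighted_index is an invented name, not part of the hat class.
int weighted_index( const std::vector<int>& weights, int r )
{
    for ( std::vector<int>::size_type i = 0; i < weights.size(); ++i ) {
        if ( r < weights[i] )
            return static_cast<int>( i );
        r -= weights[i];
    }
    return -1;  // r was >= the sum of all weights
}
```

For example, with those weights weighted_index returns 0 for r = 3 and 1 for r = 5.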

What benefit does the tree structure give you?
 

Siemel Naran

What I got was that if srand( time(NULL) ) is used as a seed, then it will
take about nine hours to cycle through the range.

What is the significance of this finding?
 

Siemel Naran

Kai-Uwe Bux said:
Upshot: RNGs are pretty cool stuff, darn interesting, and easy to get
wrong. It is very important to provide a reasonable general purpose RNG as
the default but to leave the ultimate choice to client code.

For random numbers, can one use something in nature, like the number of
neutrons hitting the earth's atmosphere, which in principle may not be
random, but is practically so for us?
 

Kai-Uwe Bux

Siemel said:
For random numbers, can one use something in nature, like the number of
neutrons hitting the earth's atmosphere, which in principle may not be
random, but is practically so for us?

Sure,

actually, I have a vague recollection that there are some hardware devices
out there that provide "true" random numbers. Some computers even have
physical RNGs built in. Usually the problems are that the raw random bits
are a little skewed (that is easy to correct in software) and that
subsequent calls are tricky: you have to wait a certain amount of time so
that the device accumulates enough entropy (from heat or some quantum
yadda yadda) to ensure that the random bits it generates are independent.

If you have such a device, your operating system needs to provide a driver
and a system call. Then you write a simple wrapper around it to integrate
it with C++. I do not have such a device, but I think it should be fun.

A less fancy way of obtaining more or less true random numbers is to have
the operating system watch various events that are seemingly unpredictable,
like packets hitting IO ports or user interaction with mouse and keyboard.
Linux turns that into /dev/random, which you can read from.
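A minimal sketch of such a read, assuming a POSIX/Linux system; /dev/urandom is used here instead of /dev/random so the call never blocks waiting for the kernel to gather more entropy, and true_random_ulong is an invented name:

```cpp
#include <fstream>

// Sketch of reading the kernel's entropy pool as described above.
// Assumes a POSIX/Linux system; /dev/urandom is used instead of
// /dev/random so the call never blocks waiting for more entropy.
// true_random_ulong is an invented name for illustration.
unsigned long true_random_ulong()
{
    std::ifstream dev( "/dev/urandom", std::ios::binary );
    unsigned long value = 0;
    dev.read( reinterpret_cast<char*>( &value ), sizeof( value ) );
    return value;
}
```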


There is one drawback to "true" random numbers, though, that can be
serious: you will not be able to reproduce identical behavior of your
program from one run to the next. Sometimes you will want identical yet
random-like behavior; then you should go for pseudo-random numbers.
Moreover, because of non-reproducibility, true random number applications
are harder to debug. Thus in writing drafts of code, I would always prefer
pseudo-random numbers.


Best

Kai-Uwe
 

Kai-Uwe Bux

Siemel said:
Kai-Uwe Bux said:
Now for the one that you have chosen: relying on std::rand() from
puts you at the mercy of the operating system. Your code assumes that the
low order bits of rand() are reasonably well distributed. Please note
that some random number generators (e.g. straight forward linear
congruence RNGs) have some troubles with the low order bytes. On some
operating systems those might still be in use. Thus it is good practice
to use the high order bytes instead like so (quoted from the man page):

One of the tests says the green die rolls 1 half of the time. So to test,
roll 6000 times, and about 3000 should be 1. Is this test sufficient? (I
imagine there's more to it though.)

BTW, I wrote this class once. It might be useful. Don't know how good it
really is, but it seems to work.


// class crandom
// congruential random number generator
// idea from Wannier: Numerical Methods for Scientists and Engineers
// crandom<3> generates random numbers in [0,8)
template <unsigned LogRange>
class crandom
{
public:
    crandom() : number(1) { ++(*this); }

    unsigned lower() const { return 0; }
    unsigned upper() const { return (1<<LogRange)-1; }

    crandom& operator++()
    {
        number = (number*37) & ((4<<LogRange)-1);
        return *this;
    }

    const crandom operator++(int)
    {
        crandom tmp(*this);
        ++(*this);
        return tmp;
    }

    unsigned operator*() const { return number>>2; }

private:
    unsigned number;
};

Hi,


thanks a lot for this code. It provides an excellent illustration of the
weakness of low order bits. Let us just test the lowest one:

// class crandom
// congruential random number generator
// idea from Wannier: Numerical Methods for Scientists and Engineers
// crandom<3> generates random numbers in [0,8)
template <unsigned LogRange>
class crandom
{
public:
    crandom() : number(1) { ++(*this); }

    unsigned lower() const { return 0; }
    unsigned upper() const { return (1<<LogRange)-1; }

    crandom& operator++()
    {
        number = (number*37) & ((4<<LogRange)-1);
        return *this;
    }

    const crandom operator++(int)
    {
        crandom tmp(*this);
        ++(*this);
        return tmp;
    }

    unsigned operator*() const { return number>>2; }

private:
    unsigned number;
};

#include <iostream>

int main ( void ) {
    crandom<16> R;
    for ( unsigned int i = 0; i < 1000; ++i ) {
        std::cout << ( *R % 2 );
        ++R;
    }
}

output: 10101010101010...

It sure gives 1 half of the time, but it does not feel very random, does
it?


And now for the most significant bit:

int main ( void ) {
    crandom<17> R;
    for ( unsigned int i = 0; i < 1000; ++i ) {
        std::cout << ( *R / 65536 );
        ++R;
    }
}

output: 000101011000111001110001100101001010000010010100100010111010110...

BTW, what is the meaning of lower and upper, and why is there no
constructor that allows seeding? Or am I missing something?


Thanks again

Kai-Uwe
 

Siemel Naran

Kai-Uwe Bux said:
BTW, what is the meaning of lower and upper, and why is there no
constructor that allows seeding? Or am I missing something?

The random number generator produces numbers in the range [0, 2^N),
depending on the template argument N; lower() always returns 0, and
upper() returns 2^N - 1. There is no seed (at present).
 

AngleWyrm

Siemel Naran said:
The code looks good, well written, and well commented, and the web page is
good too.

Thanks :)
Kai's suggestion of separating the random number generation from the logic
of the hat class is good too.

Agreed; I'll post a revision soon
One of your tests says the green die rolls 1 half of the time. So create a
new array of 10 elements,

{ 1, 1, 1, 1, 1, 2, 3, 4, 5, 6 }

Then pick a random number in the range [0, 10) and that's your weighted
random number, with 1 showing up about 50% of the time.

In the actual implementation you don't need to create a new array of 10
elements. Instead you can just pick a number in the range [0, 10) where 10
is the sum of weights. The original array is { 1, 2, 3, 4, 5, 6 }. Numbers
[0,4] map to index 0, numbers [5,5] map to index 1, etc.

What benefit does the tree structure give you?

If the objects being chosen are small (like ints in the above example), and
we are only going to randomly select with replacement (like rolling dice),
then the above implementation is fine. If the objects are larger (like
personnel records, or a drawing program's shapes, or DNA strands) then
keeping multiple copies becomes a time-space trade-off. More important,
though, is random selection _without_ replacement, such as drawing cards
from a deck, or lottery balls from a cage. This requires being able to
efficiently delete items from the set.

If the above representation were cards instead of die faces, and we draw
the #1 card, how then shall we remove it so that it isn't drawn again the
second time around? This would require the search for and removal of all
instances of #1 from the set. Even if we grant the premise that the objects
are sorted on insertion, we still wouldn't know how many copies of a given
item there were, and the worst-case performance would be examining all
records in order to delete, then deleting many records.

The balanced tree structure guarantees that at most log_2(n) records will
be examined, where n is the total number of records; this is the number of
bits it takes to represent that total. Thus if we have 32,768 items in the
container, the worst-case performance would be examining 15 records, and
then deleting one.
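The log_2 bound described above can be illustrated with a weight-augmented complete binary tree. This is a sketch of the principle only, not the hat's actual source; weight_tree, find, and remove are invented names:

```cpp
#include <vector>
#include <cstddef>

// Illustrative sketch (not the hat's actual code): store the weights
// as leaves of a complete binary tree where each internal node holds
// the total weight of its subtree. A draw descends from the root in
// O(log n) steps, and removing an item updates only its O(log n)
// ancestors. The leaf count is padded to a power of two so the items
// keep their original order.
class weight_tree {
public:
    explicit weight_tree( const std::vector<int>& w ) {
        n = 1;
        while ( n < w.size() ) n *= 2;           // pad to a power of two
        tree.assign( 2 * n, 0 );
        for ( std::size_t i = 0; i < w.size(); ++i ) tree[n + i] = w[i];
        for ( std::size_t i = n - 1; i > 0; --i )
            tree[i] = tree[2 * i] + tree[2 * i + 1];
    }

    int total() const { return tree[1]; }

    // index of the item whose weight interval contains r, 0 <= r < total()
    std::size_t find( int r ) const {
        std::size_t i = 1;
        while ( i < n ) {
            if ( r < tree[2 * i] ) {
                i = 2 * i;                       // descend left
            } else {
                r -= tree[2 * i];                // skip the left subtree
                i = 2 * i + 1;                   // descend right
            }
        }
        return i - n;
    }

    // selection without replacement: zero the item's weight and
    // update its ancestors; only O(log n) nodes are touched
    void remove( std::size_t item ) {
        std::size_t i = item + n;
        int d = tree[i];
        for ( ; i >= 1; i /= 2 ) tree[i] -= d;
    }

private:
    std::size_t n;                               // number of leaves
    std::vector<int> tree;                       // tree[1] is the root
};
```

A draw without replacement is then find(r) for a random r in [0, total()), followed by remove() of the returned index.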
 

AngleWyrm

AngleWyrm said:
Siemel Naran said:
In the actual implementation you don't need to create a new array of 10
elements. Instead you can just pick a number in the range [0, 10)
where
10 is the sum of weights. The original array is { 1, 2, 3, 4, 5, 6 }.
Numbers [0,4] map to index 0, numbers [5,5] map to index 1, etc.
What benefit does the tree structure give you?

Apologies, yes, that solves the time-space trade-off. Sorry, I hadn't had
my coffee yet. Useful for selection with replacement.

The map would still require a search and possibly multiple deletes to
remove a record, if used for selection without replacement.
 

AngleWyrm

Kai-Uwe Bux said:
I only had a very brief look at what is a pet subject of mine: the random
number generator:

template <class T>
int hat<T>::random(int range)
{
    // make sure srand has been called before using rand
    if( !rand_seeded )
    {
        srand( time(NULL) );
        rand();
        rand_seeded = true;
    }
    int random_number = rand() % range;

    return random_number;
}


I would leave the choice of a random number generator to the client, as is
done in std::random_shuffle. A second template parameter can be used
for that. You could also provide a constructor that takes an instance of an
RNG as an argument. You would have to provide a reasonable default, of
course. There are good reasons for not making this choice on the user's
behalf. For example:

a) There is a *huge* variety of random number generators out there. It is
impossible to predict for a library designer which of these will be good
for a given client problem.

b) It is a big advantage if the user can have independent instances of
random number generators.

Good idea. The new version of the hat container class now supports
user-specifiable random number generators during construction of a hat
object. If none is specified, std::rand() is used as a default.

USAGE:
hat<my_object_type> my_hat;           // uses std::rand()
hat<my_object_type> my_hat( my_rng ); // uses my_rng() for random number generation

Any function that takes no arguments and returns an int may be used.

http://home.comcast.net/~anglewyrm/hat.html
 

AngleWyrm

Kai-Uwe Bux said:
Now for the one that you have chosen: relying on std::rand() from
puts you at the mercy of the operating system. Your code assumes that the
low order bits of rand() are reasonably well distributed. Please note that
some random number generators (e.g. straight forward linear congruence
RNGs) have some troubles with the low order bytes. On some operating
systems those might still be in use. Thus it is good practice to use the
high order bytes instead like so (quoted from the man page):

In Numerical Recipes in C: The Art of Scientific Computing
(William H. Press, Brian P. Flannery, Saul A. Teukolsky,
William T. Vetterling; New York: Cambridge University
Press, 1992 (2nd ed., p. 277)), the following comments are
made:
"If you want to generate a random integer between 1
and 10, you should always do it by using high-order
bits, as in

j=1+(int) (10.0*rand()/(RAND_MAX+1.0));

and never by anything resembling

j=1+(rand() % 10);

(which uses lower-order bits)."

Question: Is it reasonable to assume that a user-specified random number
generator will redefine RAND_MAX? If so, then the above formula is doable;
otherwise it relies on an inconsistent constant.

Question #2: Is it appropriate to make this a constraint of user-specified
random number generators? "If you specify a random number generator, then
you must redefine RAND_MAX as necessary."
 

Siemel Naran

AngleWyrm said:
Then pick a random number in the range [0, 10) and that's your weighted
random number, with 1 showing up about 50% of the time.

In the actual implementation you don't need to create a new array of 10
elements. Instead you can just pick a number in the range [0, 10) where 10
is the sum of weights. The original array is { 1, 2, 3, 4, 5, 6 }. Numbers
[0,4] map to index 0, numbers [5,5] map to index 1, etc.

What benefit does the tree structure give you?

If the objects being chosen are small (like ints in the above example), and
we are only going to randomly select with replacement (like rolling dice),
then the above implementation is fine. If the objects are larger (like
personnel records, or a drawing program's shapes, or DNA strands) then
keeping multiple copies becomes a time-space trade-off. More important,
though, is random selection _without_ replacement, such as drawing cards
from a deck, or lottery balls from a cage. This requires being able to
efficiently delete items from the set.

In the original algorithm, you don't have to copy objects. If you have to
copy anything, copy pointers (ie. make your implementation use pointers).
Moreover I think you can handle replacement too.

The original array is { 1, 2, 3, 4, 5, 6 } with indices 0 to 5. We make a
virtual array { 1, 1, 1, 1, 1, 2, 3, 4, 5, 6 } with indices 0 to 10. We
don't actually have to construct this array, but instead pretend it's there,
so there's no copying. When we pick a random number in the range 0 to 10,
say 3, this means the 3rd element of the virtual array which is 1. There's
an algorithm to determine which element to choose from the real array
(basically subtract numbers in a loop). If we need to remove this element
from the array we just change the virtual array to { 2, 3, 4, 5, 6 }.
Doubtless we may need an array of storage space to make this work
efficiently, but at worst we could just have an array of pointers, where
each pointer points to an object in the original array.

If the above representation were cards instead of die faces, and we draw
the #1 card, how then shall we remove it so that it isn't drawn again the
second time around? This would require the search for and removal of all
instances of #1 from the set. Even if we grant the premise that the objects
are sorted on insertion, we still wouldn't know how many copies of a given
item there were, and the worst-case performance would be examining all
records in order to delete, then deleting many records.

OK, so the element #1 may occur several times? That doesn't make sense. If
you wanted that you could just increase the weight of #1.
The balanced tree structure guarantees that at most log_2(n) records will
be examined, where n is the total number of records; this is the number of
bits it takes to represent that total. Thus if we have 32,768 items in the
container, the worst-case performance would be examining 15 records, and
then deleting one.

OK.
 

Kai-Uwe Bux

AngleWyrm said:
Question: Is it reasonable to assume that a user-specified random number
generator will redefine RAND_MAX? If so, then the above formula is doable;
otherwise it relies on an inconsistent constant.

Question #2: Is it appropriate to make this a constraint of user-specified
random number generators? "If you specify a random number generator, then
you must redefine RAND_MAX as necessary."


Hi,


I think RAND_MAX is neither a reasonable expectation nor an appropriate
constraint. Any definition of this identifier will be a global constant and
thus not tied to a particular instance of an RNG.

Personally, I prefer to go with the precedent established by
std::random_shuffle, and assume that the user-supplied RNG will be a
function object R that provides the signature

unsigned long R ( unsigned long );

where R(n) returns a pseudo-random number in [0,n). Compare 25.2.11
[lib.alg.random.shuffle] in the standard.

The random number generators from the Boost libraries can all be used in
this way, as the library comes with an appropriate adapter. Most clients of
your class will probably use the default RNG or RNGs from that library.
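A minimal function object meeting that convention might look like this; shuffle_rng is an invented name, and the scaling uses high-order bits as in the Numerical Recipes formula quoted earlier in the thread:

```cpp
#include <cstdlib>

// A minimal function object meeting the random_shuffle convention:
// R(n) returns a pseudo-random number in [0, n). shuffle_rng is an
// invented name for illustration; the division by RAND_MAX + 1.0
// makes the high-order bits of rand() decide the result.
struct shuffle_rng {
    unsigned long operator()( unsigned long n ) const {
        // rand()/(RAND_MAX+1.0) lies in [0,1), so the product lies in [0,n)
        return static_cast<unsigned long>( n * ( std::rand() / ( RAND_MAX + 1.0 ) ) );
    }
};
```

Such an object can be handed both to the hat and to std::random_shuffle itself.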


Best

Kai-Uwe
 

Kai-Uwe Bux

Kai-Uwe Bux said:
Hi,


I think RAND_MAX is neither a reasonable expectation nor an appropriate
constraint. Any definition of this identifier will be a global constant and
thus not tied to a particular instance of an RNG.

Personally, I prefer to go with the precedent established by
std::random_shuffle, and assume that the user-supplied RNG will be a
function object R that provides the signature

unsigned long R ( unsigned long );

where R(n) returns a pseudo-random number in [0,n). Compare 25.2.11
[lib.alg.random.shuffle] in the standard.

The random number generators from the Boost libraries can all be used in
this way, as the library comes with an appropriate adapter. Most clients of
your class will probably use the default RNG or RNGs from that library.


Best

Kai-Uwe

To elaborate on this a little, I have drafted some code that wraps around
the two RNGs proposed in this thread: here are two classes RNG_cstdlib and
RNG_crandom. Both implement a constructor that does the seeding, and define

unsigned long operator() ( unsigned long );

The implementation of this operator uses higher order bits. This is
especially important in the case of RNG_crandom as the lower order bits
of that RNG are particularly weak.

The sample code below uses these classes to simulate a coin. It then
proceeds to measure the average length of runs. The expected result is 2.

It should be noted that RNG_crandom consistently has runs that are a
little too long. The reason is that the multiplier 37 is way too small.

#include <cstdlib>

class RNG_cstdlib {
private:

    static
    const unsigned long rand_max = RAND_MAX;

public:

    RNG_cstdlib ( void ) {}

    RNG_cstdlib ( unsigned long seed ) {
        std::srand( seed );
    }

    unsigned long operator() ( unsigned long bound ) {
        /*
        | We will use the higher order bits of rand().
        | Moreover, we avoid floating point arithmetic.
        | The cost is that we might have to call rand()
        | multiple times.
        */
        unsigned long int int_width = rand_max / bound;
        unsigned long int max_valid = int_width * bound;
        /*
        | max_valid is at least rand_max / 2, so the
        | following loop will be traversed
        | rand_max/max_valid < 2 times on the average:
        */
        while ( true ) {
            unsigned long int raw_random ( std::rand() );
            if ( raw_random < max_valid ) {
                return( raw_random / int_width );
            }
        }
    }

}; // RNG_cstdlib

class RNG_crandom {
private:

    static
    const unsigned long internal_max = 4UL << 30;

    static
    const unsigned long rand_max = 4UL << 28;

    unsigned long number;

public:

    RNG_crandom ( void ) :
        number( std::rand() )
    {}

    RNG_crandom ( unsigned long int seed ) :
        number( seed + 1 )
    {}

    unsigned long operator() ( unsigned long bound ) {
        unsigned long int int_width = rand_max / bound;
        unsigned long int max_valid = int_width * bound;
        /*
        | max_valid is at least rand_max / 2, so the
        | following loop will be traversed
        | rand_max/max_valid < 2 times on the average:
        */
        while ( true ) {
            number = ( number * 37 ) & ( internal_max - 1 );
            unsigned long raw_random ( number >> 2 );
            if ( raw_random < max_valid ) {
                return( raw_random / int_width );
            }
        }
    }

}; // RNG_crandom


template < typename RNG >
double average_length_of_runs ( unsigned long tries,
                                RNG R ) {
    unsigned int count = 1;
    unsigned int parity = R(2);
    for ( unsigned int i = 0; i < tries; ++i ) {
        unsigned int next = R(2);
        while ( parity == next ) {
            ++count;
            next = R(2);
        }
        parity = next;
        ++count;
    }
    return( double( count ) / double( tries ) );
}

#include <iostream>

int main ( void ) {
    for ( unsigned long seed = 0; seed < 1000000; seed += 12334 ) {
        std::cout << average_length_of_runs( 1000000, RNG_cstdlib( seed ) )
                  << " ";
        std::cout << average_length_of_runs( 1000000, RNG_crandom( seed ) )
                  << std::endl;
    }
}
 

AngleWyrm

Siemel Naran said:
OK, so the element #1 may occur several times? That doesn't make sense. If
you wanted that you could just increase the weight of #1.

Sorry, maybe I wasn't very clear. It might help if we used chars for the
items, in order to distinguish the items from a virtual lookup table.
Let's redefine the set of objects as { 'a', 'b', 'c', 'd', 'e', 'f' }, and
say that 'a' has a 50% chance of being selected. The proposed virtual
array might be { 1, 1, 1, 1, 1, 2, 3, 4, 5, 6 }, whose entries refer to
item numbers. And further, let's say that we rolled a three out of ten.

Can item #1 be removed from the virtual mapping array in fewer
examinations/calculations than a number linearly proportional to the
number of records?
 
