ideal interface for Random Number Generators?

O

orz

I am currently in the process of putting together a C++ library that
includes a variety of (pseudo-)random number generators (RNGs) and
related functionality, and I am interested in what people think the
best interface to an individual RNG instance would be. The current
interface has each RNG implemented as a class or template class with
the following methods:

//====== random number generating methods

Uint8 raw8 ();
Uint16 raw16 ();
Uint32 raw32 ();
Uint64 raw64 ();
//random bits - raw8() will return 0 to 255 with uniform probability,
//... raw16() will return 0 to 65535 with uniform probability, etc.

Uint32 randi ( Uint32 max );
Uint32 randi ( Uint32 min, Uint32 max ) {return randi(max-min) + min;}
//random integer - returns a uniform random integer less than "max"
and no less than "min".
//if no "min" is provided then the minimum is treated as zero
//note that there is no possible return value that could meet those
criteria if "min" is equal to "max"...
//... in that case, it will return any possible 32 bit value,
equivalent to raw32()
//also note that max must be greater than min...
//...if max is less than min then it will return any value that is >=
min OR < max
//that produces aproximately correct behavior in most cases when
negative signed integers
//... are used instead of the unsigned integers that these methods are
prototyped for

Uint64 randli ( Uint64 max );
Uint64 randli ( Uint64 min, Uint64 max ) {return randli(max-min) +
min;}
//random long integer - same as random integer except using 64 bit
values instead of 32 bit values

Uint32 randi_fast ( Uint32 max );
Uint32 randi_fast ( Uint32 min, Uint32 max ) {return randi_fast(max-
min) + min;}
//fast random integer - same as random integer, except optimized for
speed instead of correctness
//specifically, the distribution returned is only approximately
uniform instead of exactly uniform
//also, behavior when max == min is different - in that case, the
return value is
//... always equal min, just as if max was equal to min+1
//no 64 bit version is provided for the fast variant of randi

float randf ();
float randf (float max) {return randf() * max;}
float randf (float min, float max) {return randf() * (max-min) + min;}
//random floating point number - returns a uniform random floating
point number less than "max" and no less than "min"
//if no "min" is provided, then the minimum is treated as zero. if no
"max" is provided then the maximum is treated as 1

double randlf ();
double randlf (double max) {return randlf() * max;}
double randlf (double min, double max) {return randlf() * max;}
//random long floating point number - as random floating point
numbers, only double-precision instead of single

//====== end of random number generating methods

That's the basics for getting output from an RNG using this
interface.

I'm not at all sure about having the maximum values for randi
exclusive but the minimum values inclusive. Possibly randi (and
randi_fast and randli) should be changed to treat both max and min as
inclusive.

I'm also considering overloading to make more of those methods share a
common name, so that the compiler would automatically figure out which
to use. randi and randli could be combined, randf and randlf could be
combined, and conceivably randi and randf could be combined. However,
I'm a bit nervous about the extra ambiguity created by function
overloading, and in a few cases some of those functions would have to
get dropped (the zero parameter versions of randf and randlf in
particular). And I don't like the compile errors generated when the
compiler can't figure out which version to use.
Note this does not include any non-uniform distributions... the plan
is that for non-uniform distributions you use an external adaptor
object that takes an RNG as a parameter and uses it to generate
whatever distributions you want. That would look like this:
double a = distributions::gaussian(rng, mean, deviation);//
distributions::gaussian is a template function
They are not included inside the RNG class definitions because there
are a whole bunch of distributions that might be of interest to some
mathematician somewhere, and I don't want to clutter up the interface
with that many. But it would be possible to add one or two of the
most common distributions as methods of the RNGs if that was an
inconvenient interface to that functionality.


//====== state management methods

void seed32 (Uint32 seed);
void seed64 (Uint64 seed);
void seed (EntropyPool *seed);
//seeding functions
//seed the RNG with either a 32 bit integer, a 64 bit integer, or a
//... special object designed to help with special seeding
requirements


int serialize ( Uint8 *dest, int maxlength );
//writes the state to the destination pointer, provided that
//... the state would fit in to maxlength bytes or less
//if the state will not fit, it returns 0, otherwise it returns
//... the number of bytes written

int deserialize ( Uint8 *src, int maxlength );
//reads the state from the source pointer, provided that
//... maxlength bytes is sufficient to read the state from
//if maxlength is insufficient or if there is an error then it
//.. will return 0, otherwise it returns the # of bytes read

//====== end of state management methods

That's the basics for managing the state of an RNG using this
interface. Serialize and deserialize for saving state in a platform-
independant manner, seed for seeding, what more could you want?



Other details in the current interface:
The underlieing RNG engine definitions don't actually define all of
the standard methods... many are added by by adaptor templates that
convert the underlying RNG engine to this interface.
Some RNGs support additional special functionality (fast-forwarding,
rewinding, cryptographic checkpoints, etc).
The normal RNG interface has no virtual methods (and thus no vtable),
but you can get a virtualized version of any RNG, with all standard
interface methods available as virtual methods.


Also, there are currently 4 broad categories of ways RNGs are defined
in the library at the moment, which seems a bit excessive:

1. Standard RNGs:
Provides the complete standard interface as a simple class. Intended
for normal RNG use. No virtual methods or vtable. Only a few RNGs
algorithms are provided this way.
2. Raw RNGs:
The core of each RNG algorithm is provided as a simple class or
templated class, and it is adapted to the standard interface by a
system of templates. This is advantegous because it results in full
speed and functionality RNGs from minimal code, but it can also
produce slow compile times and possibly even extra sensitivity to
compiler bugs, so it's only really recommended for use in research
projects or other applications that may need a wide variety of RNG
implementations. A very large number of RNG algorithms are provided
this way - every algorithm included in the library, including
duplicates of those are are provided by other means.
3. Virtual RNGs:
Provides the complete standard interface through vtable, so that you
can pick which RNG you want to use at runtime. By default only a few
RNGs are available this way, but you can easily add support for any
Raw RNG via a simple template.
4. Inline-only RNGs:
Just in case you happen to like this stuff but don't want to link with
it or are unable to link with it, a very small number of RNG
algorithms are provided in an inline-only format that does not require
linking with this library. Inline-only RNGs may be missing some
functionality (like seeding from EntropyPool objects) because they are
completely stand-alone and don't rely on anything else from this
library except the integer type definitions (Uint8, Uint16, Uint32,
Uint64).

Unfortunately, I happen to be quite fond of each of those... I don't
want to have to deal with the long compile times of Raw RNGs in
ordinary applications that just happen to use RNGs, nor do I want
hundreds of RNG algorithms to each have to define as much code for
itself as the Standard RNGs, and certainly the virtual RNGs are needed
at least for research projects but the vtable calls have too much
overhead for normal use, and the inline-only RNGs are nice to have
around if there is difficulty building this library or linking to it
for some reason (or if you just don't want to link with it).

Other things to go in the library include:
statistical tests for RNG output
hash functions
statistical tests for hash functions
functions to adapt uniform random numbers to the gaussian distribution
cryptographic functions
tests for RNGs based upon automated analysis of their internal state
rather than of their output
etc
 
P

Puppet_Sock

Other things to go in the library include:
statistical tests for RNG output
[snip]

Do these belong in the library? Or do the naturally
go into a package that is provided with the library?
Say, a theory manual and some good source code to
implement the tests described in the theory manual.

Didn't see (though I admit to not looking all that
carefully) the ability to start the RNG over at
a specific item. Say I've got a program that has
been running for 300 hours of CPU (on a cluster
of about 30 CPUs) and it goes berzerk. I need to
be able to start the RNG over close to the spot
it went nuts, not have to fill the cluster for
days doing the creep-up-and-fail thing. And I don't
want to have to store huge amounts of internal state
of the RNG to do it. In the application described,
I may have pulled 100's of billions of numbers from
the RNG. If I had to store any significant state
from each of those, the data requirements would get
pretty annoying.
Socks
 
Ö

Öö Tiib

I am currently in the process of putting together a C++ library that
includes a variety of (pseudo-)random number generators (RNGs) and
related functionality, and I am interested in what people think the
best interface to an individual RNG instance would be.

Best interface is one that cooperates with upcoming standard. Standard
drafts are somewhat based on Boost.Random.

If your library has random engines (or distributions) with features
that TR1/boost does not implement, then best idea is perhaps to make
the engines (or distributions) that integrate there.

People think that on 99% cases the standard things are more than
plenty, so go ahead and prove them wrong. ;)
 
O

orz

Why bother? TR1/C++0x comes with RNGs.

/Leigh
Best interface is one that cooperates with upcoming standard. Standard
drafts are somewhat based on Boost.Random.

If your library has random engines (or distributions) with features
that TR1/boost does not implement, then best idea is perhaps to make
the engines (or distributions) that integrate there.

People think that on 99% cases the standard things are more than
plenty, so go ahead and prove them wrong. ;)

I had not actually realized that C++0x proposals included new RNG
stuff. It seems to be rather fond of the idea of a distribution & RNG
pair as a discrete object. I'd rather go lighter on templates than
they like, at least for the stuff not targeted at researchers... in my
experience using that much C++y sugar involves slow compile times and
encountering many many compiler bugs. Once C++0x comes out I'll
likely adapt my RNG framework to provide some degree of compatibility
with the framework from C++0x.
It's somewhat disappointing to see which RNG engines they incorporated
in Boost and TR1... mt19937 looks like their best, and it's not one of
my favorites by any means (large state size / slow seeding, medium
speed & quality (low quality relative to state size), and generally
reflects a mentality that emphasizes period length over any other
consideration).

After glancing through the Boost / TR1 RNG stuff, I'd say that my RNG
stuff is roughly midway between Boost / TR1 RNG stuff and L'Ecuyer's
TestU01 library. ( http://www.iro.umontreal.ca/~simardr/testu01/tu01.html
)

statistical tests for RNG output

[snip]

Do these belong in the library? Or do the naturally
go into a package that is provided with the library?
Say, a theory manual and some good source code to
implement the tests described in the theory manual.

Didn't see (though I admit to not looking all that
carefully) the ability to start the RNG over at
a specific item. Say I've got a program that has
been running for 300 hours of CPU (on a cluster
of about 30 CPUs) and it goes berzerk. I need to
be able to start the RNG over close to the spot
it went nuts, not have to fill the cluster for
days doing the creep-up-and-fail thing. And I don't
want to have to store huge amounts of internal state
of the RNG to do it. In the application described,
I may have pulled 100's of billions of numbers from
the RNG. If I had to store any significant state
from each of those, the data requirements would get
pretty annoying.
Socks

The unifying theme (to the extent that there is one) of the codebase
is research oriented, as much of it grew out of attempts to verify the
functions of other parts of it. It is vaguely similar to L'Ecuyer's
TestU01 library ( http://www.iro.umontreal.ca/~simardr/testu01/tu01.html
). However, the RNG interface, and to a lesser extent some other bits
and pieces, are intended to be as usable as any RNG code for ordinary C
++ coding unrelated to RNG research.

The ability to restart an RNG at a specific point is covered by the
"serialize" and "deserialize" methods. A few specific small-state
RNGs in my package also offer alternative state save/restore
interfaces for improved speed & convenience in some circumstances, but
the serialize/deserialize interface is the one recommended for typical
usage. In most applications there's only one RNG per thread, so the
total amount of state to save is tiny (the smallest serious RNGs
recommend for use on PCs in this package are about 16 bytes, the
largest around 16 kilobytes). Those rare applications that like to
have millions of RNGs might conceivably have some difficulty saving
all RNG states, but they're likely to have even more difficulty saving
all non-RNG states.
 
J

Juha Nieminen

In comp.lang.c++ orz said:
I am currently in the process of putting together a C++ library that
includes a variety of (pseudo-)random number generators (RNGs) and
related functionality, and I am interested in what people think the
best interface to an individual RNG instance would be.

RNGs have a state, and the idea of having one single state which is
global to the entire program (which is what eg. std::rand() does) is
braindead. It greatly reduces the usability of the RNG because you can't
have independent RNG streams which produce repeatable streams of values
independently of each other.

Since we are talking about C++, there's an idea solution to that problem:
Instead of having one single global state, the RNG is a class which has
its own internal state, and thus each instance of this class will produce
a RNG stream independently of all the other instances. Thus when you want
to produce random numbers, you do something along the lines of:

TheRNG rng(123);
for(int i = 0; i < 100; ++i) std::cout << rng.next() << std::endl;

Thus the best interface for a RNG is a class which can be constructed
with an initial seed, which can be re-seeded, and from which random values
can be pulled out.
 
O

orz

  RNGs have a state, and the idea of having one single state which is
global to the entire program (which is what eg. std::rand() does) is
braindead. It greatly reduces the usability of the RNG because you can't
have independent RNG streams which produce repeatable streams of values
independently of each other.

  Since we are talking about C++, there's an idea solution to that problem:
Instead of having one single global state, the RNG is a class which has
its own internal state, and thus each instance of this class will produce
a RNG stream independently of all the other instances. Thus when you want
to produce random numbers, you do something along the lines of:

    TheRNG rng(123);
    for(int i = 0; i < 100; ++i) std::cout << rng.next() << std::endl;

  Thus the best interface for a RNG is a class which can be constructed
with an initial seed, which can be re-seeded, and from which random values
can be pulled out.

Well, yes. I attempted to say in my first post that the codebase is
currently set up like that... sample code using it looks like:
RNGs::jsf32 rng;//a fast, medium quality, small-state RNG
rng.seed64(123);
for (int i = 0; i < 100; i++) std::cout << rng.randf() << std::endl;

That prints out some floating point numbers between 0 and 1. The
differences that and your sample are:
1. My constructors don't seed the RNGs. It seems a little easier to
structure my code if I don't insist that the state must be initialized
at the same time the RNG instance is constructed. I suppose I could
change that... it's no really in tune with the C++ way of doing things
I suppose.
2. My equivalent of "next()" requires that the user specify what kind
of distribution he wants next, he can't just ask the RNG for output
without specifying the form of that output.
 
A

aruzinsky

Too verbose. It would be easier for me to write my own random number
generator than read your instructions on how to use yours. And, then
I would also know whom to blame when something goes wrong.
 
Ö

Öö Tiib

I had not actually realized that C++0x proposals included new RNG
stuff. It seems to be rather fond of the idea of a distribution & RNG
pair as a discrete object.
I'd rather go lighter on templates than
they like, at least for the stuff not targeted at researchers... in my
experience using that much C++y sugar involves slow compile times and
encountering many many compiler bugs.

It is really not that heavy template wizardry like you seem to
suspect. Boost random distributions are ~80 lines with comments.
Random generators that substract_with_carry and lagged_fibonacci are
longest ~500 lines. I doubt that you will notice compile time slow
down using these.

Templates are least intrusive. Users want to encapsulate and hide the
details-schmetails anyway and have functions in interface that make
sense within their application like oneFrom(100) or dice(3,6)+4.

Bugs are most likely discovered for Boost since it has large user
base.

Performance issues you can never solve universally. When one has to
refactor away boost usage because of performance, he will likely go
raw all the way and not use some other library as well.
It's somewhat disappointing to see which RNG engines they incorporated
in Boost and TR1... mt19937 looks like their best, and it's not one of
my favorites by any means (large state size / slow seeding, medium
speed & quality (low quality relative to state size), and generally
reflects a mentality that emphasizes period length over any other
consideration).

You can add your own generator. You can propose it to boost. I bet
that some sort of chaos_of_orz engine in Boost.Random gives you more
fame and users than your own full library.

I was more hoping that you say that their interface lacks some
important functionality that people want to do with RNG. I feel that
they have plenty of engines.
 
O

orz

It is really not that heavy template wizardry like you seem to
suspect. Boost random distributions are ~80 lines with comments.
Random generators that substract_with_carry and lagged_fibonacci are
longest ~500 lines. I doubt that you will notice compile time slow
down using these.

Templates are least intrusive. Users want to encapsulate and hide the
details-schmetails anyway and have functions in interface that make
sense within their application like oneFrom(100) or dice(3,6)+4.

Bugs are most likely discovered for Boost since it has large user
base.

Performance issues you can never solve universally. When one has to
refactor away boost usage because of performance, he will likely go
raw all the way and not use some other library as well.


You can add your own generator. You can propose it to boost. I bet
that some sort of chaos_of_orz engine in Boost.Random gives you more
fame and users than your own full library.

I was more hoping that you say that their interface lacks some
important functionality that people want to do with RNG. I feel that
they have plenty of engines.
template-heavy:
I suspect we just have different standards of template-heavy. I
consider STL template-heavy, and have seen dramatically slower
compiles when equivalent non-STL projects are switched to STL (though
admittedly precompiled headers were not used... not sure how much
impact they have under what circumstances), and see comments in
portable STL implementations (and Boost IIRC) about working around
various compiler bugs. Despite all that I consider STL worth the
cost. Still, I prefer to minimize the amount of that kind of thing
that my projects have to #include... some of the RNGs intended for
research in this project use systems of templates that are a little
complex, but I'd prefer to only use small simple templates in the
interface intended for the most basic and common activities.

adding a generator to Boost:
I suppose I could propose a generator or two to Boost, but there's no
way they would be interested in most of this package, as it has lots
of code for doing things Boost has no interest in, like statistical
tests. All I could really do is stick in a few more appropriate
RNGs. Which might be worthwhile and I'll take a glance at Boosts
contribution process, but that's not really my focus. I suppose I
could modify my RNG interface to match Boosts, but that's a fairly
major change, and I'd prefer to offer an interface to basic
functionality that's less templated than Boost.

interface features missing from Boost / TR1:
I expect they'll support all the basic features. Although... I can't
actually find any kind of serialization / save & restore type
functionality in their interface... probably an omission they'll fix
sometime soon, or maybe it's there and I'm just not seeing it. And my
current codebase includes interfaces to allow any RNG to act as an
entropy accumulator, though in retrospect I've decided that that's not
a great idea and that portion of the interface is omitted from the
list of methods I posted above.
 
O

orz

Too verbose.  It would be easier for me to write my own random number
generator than read your instructions on how to use yours.  And, then
I would also know whom to blame when something goes wrong.
Yeah, I should probably trim the descriptions there down a bit.
 
Ö

Öö Tiib

template-heavy:
I suspect we just have different standards of template-heavy.

Oh. Yes, everybody have. :) I am also worried when heavy template
wizardry is used. In Boost.Random it seems just to be straightforward
for to parametrize and combine distribution and engine classes compile
time. That is classical usage of templates and i feel comfortable with
such. The major effect of such templates is that it produces lighter
results especially at low end and is still efficient.

Typical user is usally fine with maybe one or two combinations (so he
#includes and gets binary code only for these). For a power user who
simulates natural processes the virtual calls (or even callbacks)
between distribution and engine may be too slow.
adding a generator to Boost:
I suppose I could propose a generator or two to Boost, but there's no
way they would be interested in most of this package, as it has lots
of code for doing things Boost has no interest in, like statistical
tests.

Boost libraries have tests, but usually against bugs and not for
statistics. With your experience you probably would be of great
contributor even if you simply evaluated the efficiency of the
templates and compared it with your functions.
interface features missing from Boost / TR1:
I expect they'll support all the basic features.  Although... I can't
actually find any kind of serialization / save & restore type
functionality in their interface... probably an omission they'll fix
sometime soon, or maybe it's there and I'm just not seeing it.

The pseudo-random-generators are all seedable and streamable in Boost.
"streamable" means these have "ostream << engine" for serialization
and "istream >> engine" for deserialization. These ara usually enough
to save/restore, copy/paste or undo/redo.
 
J

Jorgen Grahn

["Followup-To:" header set to comp.lang.c++.]

On Mon, 2010-06-07, aruzinsky wrote:

[top-posting fixed]
....

Too verbose. It would be easier for me to write my own random number
generator than read your instructions on how to use yours. And, then
I would also know whom to blame when something goes wrong.

Uh, you *do* realize that creating high-quality PRNGs is a very tricky
task which requires lots of different skills, right?

/Jorgen
 
J

Jorgen Grahn

["Followup-To:" header set to comp.lang.c++.]

I am currently in the process of putting together a C++ library that
includes a variety of (pseudo-)random number generators (RNGs) and
related functionality, and I am interested in what people think the
best interface to an individual RNG instance would be. The current
interface has each RNG implemented as a class or template class with
the following methods:

//====== random number generating methods
....

Uint32 randi ( Uint32 max );
Uint32 randi ( Uint32 min, Uint32 max ) {return randi(max-min) + min;}
//random integer - returns a uniform random integer less than "max"
and no less than "min".

That is what I want in 99.9% of the cases. Except:

- Don't call it "max" if it's actually not in the range of the function.
If randi(4) can return 0--3, then 3 is max, not 4.
- Don't use home-made types! That Uint32 is a certain showstopper for
me. uint32_t has been semi-standard for over ten years.

Also, the objects should be copyable.

....
I'm not at all sure about having the maximum values for randi
exclusive but the minimum values inclusive. Possibly randi (and
randi_fast and randli) should be changed to treat both max and min as
inclusive.

Oh, here you discussed that. I'm so used to the [begin, end) idiom that
exclusive seems like the sane choice.

/Jorgen
 
O

orz

Oh. Yes, everybody have. :) I am also worried when heavy template
wizardry is used. In Boost.Random it seems just to be straightforward
for to parametrize and combine distribution and engine classes compile
time. That is classical usage of templates and i feel comfortable with
such. The major effect of such templates is that it produces lighter
results especially at low end and is still efficient.

Typical user is usally fine with maybe one or two combinations (so he
#includes and gets binary code only for these). For a power user who
simulates natural processes the virtual calls (or even callbacks)
between distribution and engine may be too slow.

I downloaded the latest Boost and glanced through it, and maybe I
spoke too soon on that being too template-heavy. The way they handle
distributions is lighter-weight than I thought from glancing at the
docs. All in all it's not really that different than my setup aside
from the detail that all distributions (except that native to the RNG
engine) are separate Boost, while only non-uniform ones are in mine,
and they're willing to incur a tiny bit of runtime overhead to allow
persistent RNG-distribution pairings.
Boost libraries have tests, but usually against bugs and not for
statistics. With your experience you probably would be of great
contributor even if you simply evaluated the efficiency of the
templates and compared it with your functions.

They could start by flipping through the paper L'Ecuyer published with
TestU01 to the list of results near the end and arbitrarily picking a
few RNGs listed as both passing all statistical tests and taking less
than 1.5 seconds on an Athlon 64 2.4 GHrz. Admittedly I think all of
those are large-state RNGs (there are relatively few good small-state
RNGs, and they tend to be more obscure too), but asside from that,
there's a dozen RNGs there that meet those criteria and have no major
drawbacks.
The pseudo-random-generators are all seedable and streamable in Boost.
"streamable" means these have "ostream << engine" for serialization
and "istream >> engine" for deserialization. These ara usually enough
to save/restore, copy/paste or undo/redo.

Ah, thanks. I see the serialization code now that I've downloaded the
package instead of browsing the online documentation. Though... it
seems oriented to serializing to a string of text describing RNG state
rather than to a binary representation as I do in my interface. Their
text serialization & deserialization will thus be slower and take more
memory, but will be (slightly) human-readable if you wanted to print a
serialized state out on the users screen. Still, 95% of users won't
use serialization, and 95% of the ones that do won't care about its
efficiency.
 
O

orz

["Followup-To:" header set to comp.lang.c++.]

I am currently in the process of putting together a C++ library that
includes a variety of (pseudo-)random number generators (RNGs) and
related functionality, and I am interested in what people think the
best interface to an individual RNG instance would be.  The current
interface has each RNG implemented as a class or template class with
the following methods:
//====== random number generating methods
...

Uint32 randi ( Uint32 max );
Uint32 randi ( Uint32 min, Uint32 max ) {return randi(max-min) + min;}
//random integer - returns a uniform random integer less than "max"
and no less than "min".

That is what I want in 99.9% of the cases. Except:

- Don't call it "max" if it's actually not in the range of the function.
  If randi(4) can return 0--3, then 3 is max, not 4.
- Don't use home-made types!  That Uint32 is a certain showstopper for
  me.  uint32_t has been semi-standard for over ten years.

Also, the objects should be copyable.

...
I'm not at all sure about having the maximum values for randi
exclusive but the minimum values inclusive.  Possibly randi (and
randi_fast and randli) should be changed to treat both max and min as
inclusive.

Oh, here you discussed that. I'm so used to the [begin, end) idiom that
exclusive seems like the sane choice.

/Jorgen

I don't have a good name besides "max". I'm currently leaning towards
[min..max] on integers (to match the Boost / TR1 standard and because
the corner cases get a little less confusing to document if it's don
that way), but [min..max) on floating point (I prefer it that way for
technical reasons), despite the small inconsistency.
Most (all?) of the RNG objects are copyable by either copy constructor
or serialization.
 
O

orz

Oh, on the subject or RNGs in Boost, their MT19937 implementation
looks rather odd. It basically includes two copies of the RNG state.
It looks like a clever optimization, but I don't see anything that it
actually optimizes.
 
T

Thomas J. Gritzan

Am 09.06.2010 17:52, schrieb orz:
Oh, on the subject or RNGs in Boost, their MT19937 implementation
looks rather odd. It basically includes two copies of the RNG state.
It looks like a clever optimization, but I don't see anything that it
actually optimizes.

From the source file:
// The goal is to always have x(i-n) ... x(i-1) available for
// operator== and save/restore.
(with i = <next word to output> and n = <# of words in state>)

You could store x(i)...x(i+n-1) instead, but you'ld had to precalculate
them each time you do a operator==.

(f'up to c.l.c++)
 
O

orz

Am 09.06.2010 17:52, schrieb orz:


From the source file:
  // The goal is to always have x(i-n) ... x(i-1) available for
  // operator== and save/restore.
(with i = <next word to output> and n = <# of words in state>)

You could store x(i)...x(i+n-1) instead, but you'ld had to precalculate
them each time you do a operator==.

(f'up to c.l.c++)

It is true that that makes efficient full equality checks a bit
easier. But...
1. Equality check is not a common operation on MT.
2. Even without the state double, full equality checks on MT instances
are not inefficient.
3. You could do mostly the same thing for equality checks with just a
single extra integer instead of 624 extras.
4. The naive approach to MT equality checks mostly works okay anyway.
Not perfectly, but I think the only ways to trip it up require either
running an instance of MT through an entire period (functionally
impossible to do without a skip-ahead type method, which is not
provided), or comparing two instances of MT originally seeded with
different seeds but then moved to the exact same position (on average
requires going through half of a full cycle, though it is possible to
do so with slightly less by way of bad luck, or a lot less if you pick
your seeds with great care).
5. The Boost random docs list MT as taking up the normal amount of
memory, not the doubled amount of memory.

Okay, now I'm just whining.
 
O

orz

The design of RNG serialization for TR1 includes use on non-homogeneous
systems. Binary representations just don't work for that.

If you're talking about endianness, I was presuming that endianness
would be taken care of in the serialization code. If you're talking
about floating point formats that's a bit harder but it could
certainly be taken care of more efficiently than conversion from and
to text. Why would binary RNG state serialization be harder?
 
K

Keith H Duggar

Because text transfer already deals with all the issues that you're
going to have to ferret out and manage with your hypothetical binary
protocol.

So it seems more a choice to utilize an existing subsystem (ie
using the text serialization code rather than designing second
(binary) format serialization code) rather than a "binary just
doesn't work" situation?

After all, text is just binary with a certain format. And text
is designed not only to be portable but notably to be readable
to humans. Surely if one removes the human readable constraint
one can design a more efficient format that is still portable
across modes, machines, etc.

And indeed such schemes have been designed before and deployed
widely for example xdr.

But then why should the C++ committee be concerned with this?
If there are already libraries or we are given the ability to
overload/override as we see fit then all is well.

Related question, will floating point special values (NaN, Inf,
etc) have standardized text serialization/deserialization?

KHD
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,755
Messages
2,569,536
Members
45,013
Latest member
KatriceSwa

Latest Threads

Top