Random Seeding

Discussion in 'C Programming' started by porterboy76@yahoo.com, May 15, 2006.

  1. Guest

    If you only use a 32 bit seed for a random number generator,
    does that mean you can only ever produce a maximum of
    2^32 (approx 4 billion) different sequences?

    What about the Mersenne Twister, with it's massive period
    of 2^19937-1. Will you only ever have access to a tiny
    portion of this ring of numbers, if you only use a 32-bit seed?
    Will this set of sequences be confined to the beginning of the
    period, ie, your sequence will have to start at some place
    between index number 1 and 4 billion of the period (ignores
    all the possible data after this)?.

    Or have I misunderstood the concept of a seed?

    (I am using the original c code from...
    http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/emt.html)
     
    , May 15, 2006
    #1
    1. Advertising

  2. Guest

    I would suggest that your possibly get 2^32 sequences with 2^19937-1
    number
    of elements in each.
     
    , May 15, 2006
    #2
    1. Advertising

  3. Guest

    > (ignores all the possible data after this)?.

    this should have read:

    (ignores all the possible "starting points" after this)?.

    >I would suggest that your possibly get 2^32 sequences with 2^19937-1
    >number of elements in each.


    That's kind of what I thought. It seems fairly limited.
    However, I just realised the Mersenne Twister c code
    provided by it's inventors allows for longer seeds.

    "Those who need initial seed with more than 32-bit
    length may use init_by_array() for initialization which
    admits an array of arbitrary length as seeds"

    I guess you could use an array seeded with the time,
    the process ID, and some compressed info from
    /dev/random or /dev/audio. Still I imagine it would
    still be very difficult to guarantee starting points
    distibuted uniformly around the ring of numbers?
     
    , May 15, 2006
    #3
  4. >If you only use a 32 bit seed for a random number generator,
    >does that mean you can only ever produce a maximum of
    >2^32 (approx 4 billion) different sequences?


    Yes, assuming you always start a sequence with the first number
    generated after seeding. It may be possible to use a "secondary
    seed" by getting and throwing away N pseudo-random numbers before
    using one, which, depending on the generator, may or may not improve
    anything.

    >What about the Mersenne Twister, with it's massive period
    >of 2^19937-1. Will you only ever have access to a tiny
    >portion of this ring of numbers, if you only use a 32-bit seed?


    It is quite possible to cripple a good pseudo-random number generator
    with a poor seed.

    >Will this set of sequences be confined to the beginning of the
    >period, ie, your sequence will have to start at some place
    >between index number 1 and 4 billion of the period (ignores
    >all the possible data after this)?.


    I don't think there's any guarantee of that, and it would depend
    on the generator.

    Gordon L. Burditt
     
    Gordon Burditt, May 15, 2006
    #4
  5. In article <> writes:
    > If you only use a 32 bit seed for a random number generator,
    > does that mean you can only ever produce a maximum of
    > 2^32 (approx 4 billion) different sequences?


    Depends on the generator. With some you may generate entirely different
    sequences, with others you just generate different starting points in
    a single sequence.

    > What about the Mersenne Twister, with it's massive period
    > of 2^19937-1. Will you only ever have access to a tiny
    > portion of this ring of numbers, if you only use a 32-bit seed?


    No, you will ultimately get the whole range. If I understand it well,
    MT uses a linear congruence method, and they inherently have only a
    single loop as result.

    > Will this set of sequences be confined to the beginning of the
    > period, ie, your sequence will have to start at some place
    > between index number 1 and 4 billion of the period (ignores
    > all the possible data after this)?.


    Not at all. In the Mersenne Twister you have a state of 19937 bits.
    The state loops around all those different states, but when the state
    has the appropriate 19905 bits equal to 0 (representing the 32 bit
    starting points) is not clear. So you do indeed jump in a different
    position of the generator, but those positions are not regularly
    spaced around the loop.
    --
    dik t. winter, cwi, kruislaan 413, 1098 sj amsterdam, nederland, +31205924131
    home: bovenover 215, 1025 jn amsterdam, nederland; http://www.cwi.nl/~dik/
     
    Dik T. Winter, May 15, 2006
    #5
  6. writes:
    >> (ignores all the possible data after this)?.

    >
    > this should have read:
    >
    > (ignores all the possible "starting points" after this)?.
    >
    >>I would suggest that your possibly get 2^32 sequences with 2^19937-1
    >>number of elements in each.

    >
    > That's kind of what I thought. It seems fairly limited.
    > However, I just realised the Mersenne Twister c code
    > provided by it's inventors allows for longer seeds.
    >
    > "Those who need initial seed with more than 32-bit
    > length may use init_by_array() for initialization which
    > admits an array of arbitrary length as seeds"
    >
    > I guess you could use an array seeded with the time,
    > the process ID, and some compressed info from
    > /dev/random or /dev/audio. Still I imagine it would
    > still be very difficult to guarantee starting points
    > distibuted uniformly around the ring of numbers?


    <OT>
    If you have /dev/random, why would you bother with the time, the
    process ID, or /dev/audio?

    For that matter, if you have /dev/random, it's not entirely clear you
    need to bother with a Mersenne Twister (but the question is beyond my
    expertise).
    </OT>

    --
    Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
    San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
    We must do something. This is something. Therefore, we must do this.
     
    Keith Thompson, May 15, 2006
    #6
  7. Guest

    Well, that kind of leads into the next question I
    was going to ask. How can I make the seeding
    work on all platforms? I have /dev/random on
    my linux box, but not on Windows (and I havent
    checked if its on my MAC, which is BSD based).

    Does Windows have an Entropy Pool like
    /dev/random that can be accessed by C.
    (OK, I guess that's slightly off topic).

    More on topic, do I have to use #ifdef statements
    to provide for entropy pools on different platforms?
    I was just going to use the functions provided by
    #include <time.h>, but I'm not happy that it gives
    me a wide enough range of seeds.

    (BTW, /dev/random may or may not be "random enough"
    compared to the MT, but it more than likely is not fast
    enough, since the disk must be accessed).
     
    , May 15, 2006
    #7
  8. Eric Sosman Guest

    Keith Thompson wrote On 05/15/06 16:13,:
    > writes:
    >
    >>>(ignores all the possible data after this)?.

    >>
    >>this should have read:
    >>
    >>(ignores all the possible "starting points" after this)?.
    >>
    >>
    >>>I would suggest that your possibly get 2^32 sequences with 2^19937-1
    >>>number of elements in each.

    >>
    >>That's kind of what I thought. It seems fairly limited.
    >>However, I just realised the Mersenne Twister c code
    >>provided by it's inventors allows for longer seeds.
    >>
    >>"Those who need initial seed with more than 32-bit
    >>length may use init_by_array() for initialization which
    >>admits an array of arbitrary length as seeds"
    >>
    >>I guess you could use an array seeded with the time,
    >>the process ID, and some compressed info from
    >>/dev/random or /dev/audio. Still I imagine it would
    >>still be very difficult to guarantee starting points
    >>distibuted uniformly around the ring of numbers?

    >
    >
    > <OT>
    > If you have /dev/random, why would you bother with the time, the
    > process ID, or /dev/audio?
    >
    > For that matter, if you have /dev/random, it's not entirely clear you
    > need to bother with a Mersenne Twister (but the question is beyond my
    > expertise).
    > </OT>


    <OT>

    A reproducible seed (and a subsequent reproducible
    sequence) can be of value. Consider testing a module by
    throwing pseudo-random data at it; when a failure turns
    up you'd like to be able to regenerate the exact same data
    as part of verifying your fix.

    For the second matter, data rate is generally the issue.
    It takes a significant amount of time for a physical source
    to generate random data and pass it through assorted bias-
    removing filters. If you try to read a zillion random numbers
    from such a source, you may need to wait quite a while. It's
    more common to use /dev/random or whatever to concoct a "truly
    random" seed for a deterministic pseudo-random generator that
    can generate subsequent numbers at computer speeds. For even
    less predictability, the "truly random" source can also be
    used to perturb the deterministic generator now and then.

    Oddly enough, D.H. Lehmer used something very like this
    perturbation technique in his original (and oft-criticized)
    linear congruential generator. According to Knuth:

    It is not fair to blame Lehmer for using a "bad"
    random-number generator in 1948, since his actual
    use of the generator was quite valid. The ENIAC
    computer was a highly parallel machine, programmed
    by means of a plugboard; Lehmer set it up so that
    one of its accumulators was repeatedly multiplying
    its own contents by 23, mod 10^8+1, yielding a new
    value every few microseconds. Since this multiplier
    23 is too small, we know that each value obtained
    by this process was too strongly related to the
    preceding value to be considered sufficiently random;
    but the durations of time between actual uses of the
    values in the special accumulator by the accompanying
    program were comparatively long and subject to some
    fluctuation. So the effective multiplier was 23^k
    for large, varying values of k!

    -- "The Art of Computer Programming, Volume II:
    Seminumerical Algorithms," section 3.3.1

    Similar techniques can be found nowadays in games, where
    it is common to generate and discard a pseudo-random value
    each time the program polls its input devices; the eventual
    sequence used in the main part of the program thus depends in
    part on the delays in keystroke, mouse, or joystick timings,
    generally considered unpredictable if not downright twitchy.

    </OT>

    --
     
    Eric Sosman, May 15, 2006
    #8
  9. In article <> writes:
    > (BTW, /dev/random may or may not be "random enough"
    > compared to the MT, but it more than likely is not fast
    > enough, since the disk must be accessed).


    No, /dev/random does *not* imply disk access. It is a pseudo device,
    just like /dev/zero. The random numbers are generated within the
    kernel, but I can find no place where it is stated how. It appears
    that most systems that implement it use statistics from kernel
    interrupts or whatever to get the information. In general it is
    cryptographically secure (which MT is not), but there is no way to
    ascertain whether it complies to standard randomness tests.

    So regarding your earlier question:
    > Does Windows have an Entropy Pool like
    > /dev/random that can be accessed by C.


    /dev/random does not come from an "entropy pool", it just generates
    random numbers within the kernel.

    When you need various subsequent random numbers /dev/random is a
    very bad idea. It appears to be good to generate just a single
    random number. When you need more, you can use the output as a
    seed for a standard random number generator (e.g. MT).

    When you are so bothered about the randomness of your numbers, you
    really should study the way the generators work. When you do that
    you will also appreciate that there are no generators that cover
    all possibilities.
    --
    dik t. winter, cwi, kruislaan 413, 1098 sj amsterdam, nederland, +31205924131
    home: bovenover 215, 1025 jn amsterdam, nederland; http://www.cwi.nl/~dik/
     
    Dik T. Winter, May 16, 2006
    #9
  10. Guest

    > No, /dev/random does *not* imply disk access. It is a pseudo device,
    > just like /dev/zero.


    Thanks, I didn't know that....

    > When you need various subsequent random numbers /dev/random is a
    > very bad idea. It appears to be good to generate just a single
    > random number. When you need more, you can use the output as a
    > seed for a standard random number generator (e.g. MT).


    ....whereas I am well aware of this. I will certainly be sticking to the
    Mersenne Twister for random sequence generation. My itch is random
    seeding, and I want to scratch it in a safe way.

    With a 32 bit seed, I will only ever generate 4 billion different
    sequences, which I think emasculates the MT. By allowing
    seeding by an arbitrary array of 32-bit numbers in their c-code,
    the inventors have overcome this limitation, but I still need to
    generate this array of 32-bit numbers in a way that is unpredictable,
    and which works on all platforms.

    <time.h> tools certainly work on all platforms of interest to me
    (MAC, M$ & *NIX), but I am looking for one or two other orthogonal
    seeds, to fill my array (hence the query about /dev/random and
    PID).
     
    , May 16, 2006
    #10
  11. Guest

    > A reproducible seed (and a subsequent reproducible
    > sequence) can be of value. Consider testing a module by
    > throwing pseudo-random data at it; when a failure turns
    > up you'd like to be able to regenerate the exact same data
    > as part of verifying your fix.


    Agreed. I intend testing my algorithms and simulations
    using a fixed unchanging seed, so I can see how changes
    affect the system. However, once I am satisfied that all is
    well, I would like to switch in a more unpredictable seed, to
    expand my range of possible tested sequences.

    /* for your info I am doing simulation of card playing
    strategies. At first I will deal the same sequence of hands
    over and over to a lot of different strategies. Once I have
    identified what is a good strategy and why (by examining
    reproducible data from a fixed seed) I will then move onto
    unreproducible data to see how the strategies fare in the
    "real world". */

    > Similar techniques can be found nowadays in games, where
    > it is common to generate and discard a pseudo-random value
    > each time the program polls its input devices; the eventual
    > sequence used in the main part of the program thus depends in
    > part on the delays in keystroke, mouse, or joystick timings,
    > generally considered unpredictable if not downright twitchy.


    I hadn't thought of user input as a seed, although for a fixed
    set of actions (such as typing a few inputs) the variance of the
    seed could be limited.

    Now that I think of it, if I am not trying to be cryptographically
    secure, then maybe operations on <time.h> functions will
    suffice for my needs, since I will always be guaranteed to
    advance through the Mersenne Twister sequence
    as the time always increases... I think that solves my problem.
     
    , May 16, 2006
    #11
  12. Guest

    wrote:
    > If you only use a 32 bit seed for a random number generator,
    > does that mean you can only ever produce a maximum of
    > 2^32 (approx 4 billion) different sequences?
    >
    > What about the Mersenne Twister, with it's massive period
    > of 2^19937-1. Will you only ever have access to a tiny
    > portion of this ring of numbers, if you only use a 32-bit seed?
    > Will this set of sequences be confined to the beginning of the
    > period, ie, your sequence will have to start at some place
    > between index number 1 and 4 billion of the period (ignores
    > all the possible data after this)?.


    The Mersenne Twister starts with a massively large cycled sequence of
    2^19937-1 numbers (it repeats after requesting that many random numbers
    in order.) The way it is seeded, each one of the possible 2^32 seeds
    is mapped to its own starting point within this cycle. So there are
    2^32 sequences, each of which is 2^19937-1 in length, and in fact are
    just shifted version of each other. BTW, (2^19937-1) / 2^32 is still
    really huge, but I don't know that any pair of seeds doesn't overlap a
    lot sooner -- maybe there is more information in the original paper
    describing it.

    So the seed does not embody the entire internal entropy which generates
    the output.

    --
    Paul Hsieh
    http://www.pobox.com/~qed/
    http://bstring.sf.net/
     
    , May 16, 2006
    #12
  13. Guest

    Keith Thompson wrote:
    > writes:
    > >> (ignores all the possible data after this)?.

    > > this should have read:
    > > (ignores all the possible "starting points" after this)?.
    > >>I would suggest that your possibly get 2^32 sequences with 2^19937-1
    > >>number of elements in each.

    > >
    > > That's kind of what I thought. It seems fairly limited.
    > > However, I just realised the Mersenne Twister c code
    > > provided by it's inventors allows for longer seeds.
    > >
    > > "Those who need initial seed with more than 32-bit
    > > length may use init_by_array() for initialization which
    > > admits an array of arbitrary length as seeds"
    > >
    > > I guess you could use an array seeded with the time,
    > > the process ID, and some compressed info from
    > > /dev/random or /dev/audio. Still I imagine it would
    > > still be very difficult to guarantee starting points
    > > distibuted uniformly around the ring of numbers?

    >
    > <OT>
    > If you have /dev/random, why would you bother with the time, the
    > process ID, or /dev/audio?


    http://www.pinkas.net/PAPERS/gpr06.pdf

    > For that matter, if you have /dev/random, it's not entirely clear you
    > need to bother with a Mersenne Twister (but the question is beyond my
    > expertise).


    Because Entropy is typically slow/expensive to obtain, while PRNGs are
    generally very fast. Most typically you can only get new entropy a
    some slow rate like no more than 10 samples per second (if, say, you
    are tracking mouse movement, or key presses) while a PRNG can be
    computed with speeds proportional to the speed of the CPU. So its like
    asking what you need a disk for if you have lots of memory.

    Its actually kind of curious -- CPU manufacturers have actually been
    looking at building circuit-level entropy generators, which of course,
    would completely nullify the problem, but software solutions have been
    getting better over time to the point, where direct hardware support
    may be overkill.

    --
    Paul Hsieh
    http://www.pobox.com/~qed/
    http://bstring.sf.net/
     
    , May 16, 2006
    #13
  14. Eric Sosman Guest

    wrote On 05/16/06 03:38,:
    >> A reproducible seed (and a subsequent reproducible
    >>sequence) can be of value. Consider testing a module by
    >>throwing pseudo-random data at it; when a failure turns
    >>up you'd like to be able to regenerate the exact same data
    >>as part of verifying your fix.

    >
    >
    > Agreed. I intend testing my algorithms and simulations
    > using a fixed unchanging seed, so I can see how changes
    > affect the system. However, once I am satisfied that all is
    > well, I would like to switch in a more unpredictable seed, to
    > expand my range of possible tested sequences.
    >
    > /* for your info I am doing simulation of card playing
    > strategies. At first I will deal the same sequence of hands
    > over and over to a lot of different strategies. Once I have
    > identified what is a good strategy and why (by examining
    > reproducible data from a fixed seed) I will then move onto
    > unreproducible data to see how the strategies fare in the
    > "real world". */
    >
    >
    >> Similar techniques can be found nowadays in games, where
    >>it is common to generate and discard a pseudo-random value
    >>each time the program polls its input devices; the eventual
    >>sequence used in the main part of the program thus depends in
    >>part on the delays in keystroke, mouse, or joystick timings,
    >>generally considered unpredictable if not downright twitchy.

    >
    >
    > I hadn't thought of user input as a seed, although for a fixed
    > set of actions (such as typing a few inputs) the variance of the
    > seed could be limited.


    I think you've misunderstood: The idea isn't to use these
    little timing variations as seeds, but as perturbations to a
    deterministic sequence that's seeded elsewhere. By throwing
    away an unpredictable quantity of numbers every so often,
    you change the deterministic sequence

    X1, X2, X3, X4, ...

    into the much less predictable

    X1, X2, X9, X10, X11, X42, ...

    --
     
    Eric Sosman, May 16, 2006
    #14
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Joe
    Replies:
    2
    Views:
    485
    Howard
    Nov 19, 2004
  2. Jack
    Replies:
    4
    Views:
    415
  3. HumanJHawkins
    Replies:
    2
    Views:
    500
    peter koch
    Nov 30, 2006
  4. (-Peter-)

    seeding random numbers

    (-Peter-), Feb 20, 2008, in forum: Java
    Replies:
    18
    Views:
    595
    John W. Kennedy
    Feb 22, 2008
  5. Replies:
    3
    Views:
    556
    JimLewis
    May 7, 2009
Loading...

Share This Page