Randomly selecting an element from an STL collection

Discussion in 'C++' started by Generic Usenet Account, Jun 27, 2007.

  1. I had a need to randomly select an element from an STL collection. It
    does not appear that this functionality is provided out-of-the-box
    with STL. Here is my crude implementation. I am using advance and
    distance in my approach. Is distance guaranteed to return an integral
    value? If not, can anyone suggest improvements:

    --Song

    ///// Header File //////

    #ifndef _RANDOM_ENTRY_H_
    #define _RANDOM_ENTRY_H_

    #include <stdlib.h>
    #include <sys/time.h>
    #include <algorithm>

    template<typename iterator>
    iterator
    random_entry(iterator first, iterator last)
    {
    int separation = distance(first, last);

    if(separation)
    {
    struct timeval tod; // time-of-day

    gettimeofday(&tod, NULL);
    srand(tod.tv_sec+tod.tv_usec);

    int randOffset = rand()% separation;

    advance(first, randOffset);
    }

    iterator retVal = first;

    return retVal;
    }

    #endif //_RANDOM_ENTRY_H_


    ///// Driver File /////
    #include "random_entry.h"

    #include <list>
    #include <vector>
    #include <map>
    #include <string>
    #include <iostream>

    using namespace std;


    main()
    {
    list<int> li;

    li.push_back(10);
    li.push_back(19);
    li.push_back(28);
    li.push_back(37);

    list<int>::iterator liItor = random_entry(li.begin(), li.end());
    if(liItor != li.end())
    cout << *liItor << endl;
    else
    cout << "Empty collection\n";

    vector<string> vs;
    vs.push_back("Brad Pitt");
    vs.push_back("Angelina Jolie");
    vs.push_back("Tom Cruise");
    vs.push_back("Renee Zellweger");
    vs.push_back("Julia Roberts");
    vs.push_back("Denzel Washington");
    vs.push_back("Will Smith");

    vector<string>::iterator vsItor = random_entry(vs.begin(),
    vs.end());
    if(vsItor != vs.end())
    cout << *vsItor << endl;
    else
    cout << "Empty collection\n";

    map<string, string> ms;

    ms["President"] = "George W Bush";
    ms["Vice President"] = "Dick Cheney";
    ms["Senate Majority Leader"] = "Harry Reid";
    ms["Senate Minority Leader"] = "Mitch Mcconnell ";
    ms["Speaker"] = "Nancy Pelosi";

    map<string, string>::iterator msItor = random_entry(ms.begin(),
    ms.end());
    if(msItor != ms.end())
    cout << msItor->first << ", " << msItor->second << endl;
    else
    cout << "Empty collection\n";
    }
     
    Generic Usenet Account, Jun 27, 2007
    #1
    1. Advertisements

  2. Not sure if it's guaranteed or not, but it's usually 'ptrdiff_t'.
    Do NOT use identifiers that begin with an underscore and a capital letter,
    those are *reserved* by the implementation.
    It is a BAD IDEA(tm) to seed the random number generator on every
    iteration. Seed it once in your program, then let it run freely.
     
    Victor Bazarov, Jun 27, 2007
    #2
    1. Advertisements

  3. Generic Usenet Account

    Alan Johnson Guest

    Just to strengthen Victor's suggestion ..

    24.4.1 makes the guarantee:
    "For every iterator type X for which equality is defined, there is a
    corresponding signed integral type called the difference type of the
    iterator."
     
    Alan Johnson, Jun 27, 2007
    #3
  4. Generic Usenet Account

    James Kanze Guest

    It's guaranteed for the standard allocator.
    I wonder if it's really necessary to bother. He's using rand()
    to choose an element, and RAND_MAX can be (and far too often is)
    as little as 32767. So if he has more elements that that, his
    code isn't going to work. Also, he used the expression
    rand() % distance to choose the element. This introduces a bias
    toward the smaller elements; the closer distance is to RAND_MAX,
    the larger the biais. Unless the number of elements is actually
    very small, the biais will be noticeable. So we can conclude
    that the actual number of elements will probably not be much
    more than 10 or 20. And that a simple int will work just fine.

    --
     
    James Kanze, Jun 27, 2007
    #4
  5. First of all my thanks to all of you for your excellent feedback ----
    Alan, James, Victor and others. I would really appreciate if you
    could tell me how I can avoid the "bias towards smaller elements". I
    have always used the "modulo division by random number" trick to
    restrict my pseudo-random numbers to a certain range. I am keen to
    avoid this mistake at some future date. A sample code snippet would
    be great. Although I do not expect anywhere close to 32767 entries in
    my collection, it is certainly going to be much more than 10 or 20.

    Thanks,
    Song
     
    Generic Usenet Account, Jun 28, 2007
    #5
  6. When I need a random number in a certain range, I use

    int rand_n(int lower, int upper)
    {
    return int(rand() * (upper - lower + 1.0) / (RAND_MAX + 1.0))
    + lower;
    }

    That essentially subdivides the 0..RAND_MAX range into (upper-lower+1)
    subranges and gives you the index of the subrange in which the random
    number you get from 'rand' fits, offset by 'lower'. I am sure that
    this doesn't work very well for (upper-lower) larger than RAND_MAX/2.

    V
     
    Victor Bazarov, Jun 28, 2007
    #6
  7. Generic Usenet Account

    Marcus Kwok Guest

    The C FAQ has a couple suggestions; see FAQ #13.16:

    http://www.c-faq.com/lib/randrange.html
     
    Marcus Kwok, Jun 28, 2007
    #7
  8. Generic Usenet Account

    James Kanze Guest

    That's usually sufficient for fairly small values. The bias
    towards the smaller elements problems is easily understood,
    however, if you consider a perfect random generator with a very
    small RAND_MAX. Suppose RAND_MAX is 9, i.e. your generator
    generates all of the numbers from 0 to 9, with equal
    probability. Now consider what happens if you do a modulo 6.
    What are the results for each number generated:

    0 % 6 = 0
    1 % 6 = 1
    2 % 6 = 2
    3 % 6 = 3
    4 % 6 = 4
    5 % 6 = 5
    6 % 6 = 0
    7 % 6 = 1
    8 % 6 = 2
    9 % 6 = 3

    It's obvious that the values 0-3 each appear 20% of the time,
    whereas 4 and 5 only appear 10%.

    The problem doesn't occur if RAND_MAX + 1 is a multiple of your
    value, so the usual solution is just to throw out any values
    above the last multiple. For a random value in the range
    0...N-1, I use something like:

    unsigned const limit = RAND_MAX+1 - (RAND_MAX+1) % N ;
    unsigned result = next() ;
    while ( result >= limit ) {
    result = next() ;
    }
    return result % N ;
    Be careful with rand(). In the past, at least, a lot of
    implementations have been very bad... some systematically
    alternate even and odd, for example. For anything more than
    just playing around, get a good generator with known behavior.
    (Boost has quite a few.)
     
    James Kanze, Jun 29, 2007
    #8
  9. That is almost guaranteed to fail on many implementations namely those
    that have RAND_MAX equal to INT_MAX (e.g. gcc on x86(-64)).

    At least make it RAND_MAX+1u. That should work on most implementations.
     
    Markus Schoder, Jun 30, 2007
    #9
  10. Generic Usenet Account

    James Kanze Guest

    Good point. I actually copied it from my own implementation of
    RandomInt, which is based on my own random number generator,
    with known properties, modifying it to use RAND_MAX+1, rather
    than Random::max(). (In my case, Random::max() defines the
    upper bound of a half open interval, i.e. is exclusive. And my
    own generator uses linear congruence, so it isn't be a power of
    2.)
    I think I'd cast RAND_MAX to unsigned; that u can easily be
    overlooked:).
     
    James Kanze, Jul 1, 2007
    #10
    1. Advertisements

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments (here). After that, you can post your question and our members will help you out.