Random Subset of Range of Integers

T

Thomas J. Clancy

Chris Dutrow said:
I searched around on the net for a bit, couldn't find anything though.

I would like to find some code for a function where I input A Range Of
Integers
For example: Function( 1, 100 );

And the function will return me an array holding a random subset of integers
in that range of a size that I specify

So the Function would Probabaly look something like this:

std::vector<int> RandomSubsetOfIntergers( int BeginRanger, int EndRanger,
int SubsetSize );

Have you tried to implement this on your own? Just curious, because it is a
trivial problem to solve.

tom
 
D

Dan Cernat

Don't top post in this NG

{reformatted}
Chris Dutrow said:
is
I thought about it for a while.

The best I could come up with was an O(NlogN) algorithm that would be a pain
for me to code.

So I figured I might as well not re-invent the wheel and try and find some
code written by experts who are likely much smarter than me and worked hard
on an optimal and bugless solution.

-Chris

O(NlogN)???
Look up in your documentation the function rnd -> it returns a random
integer from 0 to RAND_MAX

Your function becomes trivial now, isnt it?
I just cannot stop wonder, O(NlogN)???


/dan
 
N

Nudge

Chris said:
I would like to find some code for a function where I input A
Range Of Integers And the function will return me an array
holding a random subset of integers in that range of a size that
I specify

May the numbers repeat?

On a related note, there is an O(n) algorithm to generate a
permutation of 0 .. N-1

The pseudo code is the following:

for i=0 to N-1 do V = i; done // initialize V

for i=0 to N-2 do
j = rand() % (N-i) + i // i <= j < N
swap V and V[j]
done
 
C

Chris Dutrow

I searched around on the net for a bit, couldn't find anything though.

I would like to find some code for a function where I input A Range Of
Integers
For example: Function( 1, 100 );

And the function will return me an array holding a random subset of integers
in that range of a size that I specify

So the Function would Probabaly look something like this:

std::vector<int> RandomSubsetOfIntergers( int BeginRanger, int EndRanger,
int SubsetSize );


Anyone know where I can find something along these lines?
Also, if you know of any general good random number libratries that are fast
and have an even distribution, let me know as well!

Thanks!
-Chris
 
C

Chris Dutrow

I thought about it for a while.

The best I could come up with was an O(NlogN) algorithm that would be a pain
for me to code.

So I figured I might as well not re-invent the wheel and try and find some
code written by experts who are likely much smarter than me and worked hard
on an optimal and bugless solution.

-Chris
 
D

David White

Chris Dutrow said:
Hmmmm, I must not have explianed well.

I need random subSET of a RANGE of integers

So The function does not get a data structure holding integers from which to
choose. It just gets two integers: a BeginRange and EndRange

Then it needs to give me a data structure, most likely a std::vector that
holds a subSET of integers from that ranger that have been randomly picked.

For the returned integers to be a SET, they must all be unique.

It's just like a card dealer then. Instead of a subset of cards (a player's
hand) from a deck of cards, it is a subset of integers from a range. The
most efficient way I know to do it is to have a deck (an array) containing
the entire set of cards (or numbers) - in sequential order if you like, or
any other order - and to select each card from the array as follows:

int CardDeck::DealCard()
{
// get index of random card
int icard = rand() % nr_cards_remaining;
// get card
int card = deck[icard];
// move last available card into remainder of deck
deck[icard] = deck[nr_cards_remaining-1];
// move dealt card to bottom of remaining cards
deck[nr_cards_remaining-1] = card;
// reduce number of available cards by one
--nr_cards_remaining;
return card;
}

Just call this once for each card in the hand (or value in the subset). Note
that the line
deck[nr_cards_remaining-1] = card;
is only needed if you want to reuse the same deck in future without having
to refill it.

DW
 
C

Chris Dutrow

Dan Cernat said:
Don't top post in this NG

{reformatted}
it

O(NlogN)???
Look up in your documentation the function rnd -> it returns a random
integer from 0 to RAND_MAX

Your function becomes trivial now, isnt it?
I just cannot stop wonder, O(NlogN)???


/dan

Hmmmm, I must not have explianed well.

I need random subSET of a RANGE of integers

So The function does not get a data structure holding integers from which to
choose. It just gets two integers: a BeginRange and EndRange

Then it needs to give me a data structure, most likely a std::vector that
holds a subSET of integers from that ranger that have been randomly picked.

For the returned integers to be a SET, they must all be unique.

So the algorithm must pick out an integer from the Range and add it to the
vector, then pick out another number from the Range that has not already
been picked.

The first algorithm that came to mind was to just discard the number if it
has already been picked and try to pick again, but this doesn't work very
well when the subset size approaches the size of the range.

The next approach that I thought of would be to increment up the list when
an integer has been picked that has already been picked, so if integer 10
were to be generated by the random number generator, then try 11, if that
has already been picked, try 12 and so on. The problem with this approach
is that it makes numbers close to numbers that have already been picked MORE
likely to be picked. So this algorithm is FAULTY.

So the final algorithm that I thought of was a but more complicated, and not
so easy to code, I will post a description of it by request, but it led me
to desire to find a library that has a function that already does this.

Thanks again for your help guys,
-Chris
 
C

Chris Dutrow

Nudge said:
Chris said:
I would like to find some code for a function where I input A
Range Of Integers And the function will return me an array
holding a random subset of integers in that range of a size that
I specify

May the numbers repeat?

On a related note, there is an O(n) algorithm to generate a
permutation of 0 .. N-1

The pseudo code is the following:

for i=0 to N-1 do V = i; done // initialize V

for i=0 to N-2 do
j = rand() % (N-i) + i // i <= j < N
swap V and V[j]
done


Cool, I think you are getting what I am saying to some degree, I think
others here assumed I was stupid.

So, since a sebset is returned, the numbers CANNOT repeat and theren lies
the problem.

I also thought of the O(N) algorithms that return a permutation of a an
array or vector, but the PROBLEM with these is that the function does not
receive a vector or array full of integers, it only receives two integers
that specify a range, giving the function a vector full of integers in the
range would require much more memory than needed.

Thanks a lot for considering this!
-Chris
 
D

David White

Chris Dutrow said:
This still doesn't solve the problem though, heres why:

// get card
int card = deck[icard];

This assumes a data structure has been given that holds the set from which
the subset is derived from.

It doesn't assume that anything's been given. It's just an algorithm for
generating a unique subset. You could have a class called RandomSubset that
implements the algorithm I suggested, e.g.,

class RandomIntegerSelector
{
std::vector<int> m_allValues;
int m_nrValuesRemaining;
public:
// constructor fills m_allValues with the full set of values
RandomIntegerSelector(int begin, int end);
int NextRandomInteger();
};

The NextRandomInteger() member does the same as my CardDeck::DealCard()
member.
Your function then uses an object of this class to generate the subset:

std::vector<int> RandomSubsetOfIntegers( int beginRange, int endRange, int
subsetSize )
{
IntegerRandomSubset integerSelector(beginRange, endRange);
std::vector<int> subset;
int nrRemaining = subsetSize;
while(nrRemaining-- > 0)
subset.push_back(integerSelector.NextRandomInteger());
return subset;
}

DW
 
D

David White

A second correction...
class RandomIntegerSelector
{
[snip]

std::vector<int> RandomSubsetOfIntegers( int beginRange, int endRange, int
subsetSize )
{
IntegerRandomSubset integerSelector(beginRange, endRange);

That is: RandomIntegerSelector integerSelector(beginRange, endRange);

This is what happens when you change your mind at the last moment.

DW
 
D

David White

Chris Dutrow said:
Also, if you know of any general good random number libratries that are fast
and have an even distribution, let me know as well!

Try looking up Mersenne Twister.

DW
 
T

Thomas J. Clancy

Chris Dutrow said:
Range

Hmmmm, I must not have explianed well.

I need random subSET of a RANGE of integers

So The function does not get a data structure holding integers from which to
choose. It just gets two integers: a BeginRange and EndRange

Then it needs to give me a data structure, most likely a std::vector that
holds a subSET of integers from that ranger that have been randomly picked.

This is simply one loop, a rand() statement, and some push_backs into a
vector.
 
T

Thomas J. Clancy

Hmmmm, I must not have explianed well.

I need random subSET of a RANGE of integers

Would this not work?

std::vector<int> RandomSubsetOfIntergers( int BeginRanger, int EndRanger,
int SubsetSize )
{
vector<int> r;
for(int i = 0; i < SubsetSize; i++)
{
int j = (rand() % EndRanger) + BeginRanger;
r.push_back(j);
}
return r;
}

Of course it would be better if you passed in an out vector (reference to)
so that you can avoid the implicit copy on return. For example:

void RandomSubsetOfIntergers( int BeginRanger, int EndRanger, int
SubsetSize, std::vector<int>& r);
 
T

Thomas J. Clancy

Chris Dutrow said:
Nah, that wouldn't work because the in order for the vector to be a subset,
the integers in it must be unique.

In other words, with the above algorithm, there is a possibility of picking
the same number twice, this is not acceptable.

-Chris

To me, a subset doesn't necessary connote a set, although it certainly
contains the word set. I know, I know, what the heck do I mean? Nevermind.
So I suppose the following solution is too slow, then?

std::vector<int> RandomSubsetOfIntergers( int BeginRanger, int EndRanger,
int SubsetSize )
{
std::set<int> s;
for(int i = 0; i < SubsetSize; i++)
{
int j = (rand() % EndRanger) + BeginRanger;
if (s.find(j) != s.end())
{
s.insert(j);
}
else
{
i--;
}
}
std::copy(s.begin(), s.end(), back_inserter<int>(r.begin());
return r;
}
 
C

Chris Dutrow

David White said:
Chris Dutrow said:
Hmmmm, I must not have explianed well.

I need random subSET of a RANGE of integers

So The function does not get a data structure holding integers from
which
to
choose. It just gets two integers: a BeginRange and EndRange

Then it needs to give me a data structure, most likely a std::vector that
holds a subSET of integers from that ranger that have been randomly picked.

For the returned integers to be a SET, they must all be unique.

It's just like a card dealer then. Instead of a subset of cards (a player's
hand) from a deck of cards, it is a subset of integers from a range. The
most efficient way I know to do it is to have a deck (an array) containing
the entire set of cards (or numbers) - in sequential order if you like, or
any other order - and to select each card from the array as follows:

int CardDeck::DealCard()
{
// get index of random card
int icard = rand() % nr_cards_remaining;
// get card
int card = deck[icard];
// move last available card into remainder of deck
deck[icard] = deck[nr_cards_remaining-1];
// move dealt card to bottom of remaining cards
deck[nr_cards_remaining-1] = card;
// reduce number of available cards by one
--nr_cards_remaining;
return card;
}

Just call this once for each card in the hand (or value in the subset). Note
that the line
deck[nr_cards_remaining-1] = card;
is only needed if you want to reuse the same deck in future without having
to refill it.

DW

This still doesn't solve the problem though, heres why:

// get card
int card = deck[icard];

This assumes a data structure has been given that holds the set from which
the subset is derived from.

In this case, I do not have a data structure holding all the integers from
which to choose. I only have a range: BeginInt and EndInt and a random
subset of unique integers must be chosen from that Range.

Thanks anyway though.
-Chris
 
T

Thomas J. Clancy

Chris Dutrow said:
I searched around on the net for a bit, couldn't find anything though.

I would like to find some code for a function where I input A Range Of
Integers
For example: Function( 1, 100 );

And the function will return me an array holding a random subset of integers
in that range of a size that I specify

So the Function would Probabaly look something like this:

std::vector<int> RandomSubsetOfIntergers( int BeginRanger, int EndRanger,
int SubsetSize );

Would this be suitable?

#include <iostream>
#include <vector>
#include <set>
using namespace std;
vector<int> randSubset(int BeginRanger, int EndRanger, int SubsetSize)
{
set<int> s;
vector<int> r;
for(int i = 0; i < SubsetSize; i++)
{
int j = (rand() % EndRanger) + BeginRanger;
if (s.find(j) == s.end())
{
s.insert(j);
}
else
{
i--;
}
}
copy(s.begin(), s.end(), back_inserter(r));
return r;
}
 
T

Thomas J. Clancy

std::vector said:
Hmmmmm, I may have read this wrong but...

The algorithm does not appear too slow.

If I read it right, it does not return a subset of length "SubsetSize"
because it simply skips the insertion when a duplicate is encountered.
Also, FYI, I beleive that if you insert a key into a std::set data structure
that is already there, it doesn't insert it by defaul so you don't have to
have if statement there to check if it is already in the set:

if (s.find(j) != s.end())

Also, I beleive the correct definition of a subset in this case would be
that no intergers repeat because there are no duplicate integers in the
input set defined by BeginRang and EndRange. But even if I am wrong about
the definition, what I need is a subset of unique integers that fall in the
input range.

I'm about %100 sure that the solution to this problem is NOT trivial.

Though in the end what I'm looking for is not for someone to suggest an
algorithm for me to code, but for someone to suggest a library that I might
refer to that is elegant and well written.


Okay, I don't know what youy mean by NOT trivial. If you run the code I
thay I just entered at top (and with these corrections) it works as
expected. The if statement is necessary here because if the newly random
number DOES exist, I want to repeat the loop, hence the i--, which will
effectively get me, say, 500 unique random numbers between, say, 999 and
1,900,234 (for example). If these are the requirements the code below
works. So I'm not sure what else you really need. You could simply expect
to receive back a set rather than a vector, thus doing away with the
std::copy() at the end.

std::vector<int> RandomSubsetOfIntergers( int BeginRanger, int EndRanger,
int SubsetSize )
{
std::set<int> s;
for(int i = 0; i < SubsetSize; i++)
{
int j = (rand() % EndRanger) + BeginRanger;
if (s.find(j) == s.end())
{
s.insert(j);
}
else
{
i--;
}
}
std::copy(s.begin(), s.end(), back_inserter(r));
return r;
}
 
D

David White

Chris Dutrow said:
What I meant though is that the function uses a data structure that holds
the complete set of intergers.

Which is fine for this input range of
1 to 100

But very bad for an input range of
1 to 10000000000

I don't know of a single algorithm that's suitable for generating both a
small subset of a large set and a large subset of a small set. For the
latter case you can use a card dealer. For the former case you could forget
the array and just generate numbers in the required range and use a std::set
to check whether a value has already been used. If so, then generate another
value.
So this implementation is not very versitile because it uses memory that an
optimal algorithm would not need to use.

It depends on what you call optimal. The card dealer looks about optimal in
terms of execution time when the subset is a large proportion of the set. As
I see it you have a choice between fast and small for that case. I don't
think you can have both. You could choose which algorithm to use according
to the values passed. If the subset size is more than, say, 10 percent of
the entire set, use the card dealer, otherwise use the std::set.
what I'm doing here is designing an Artificial Neural Network. A Neuron in
the first layer links to a random number of Neurons in the second layer
between 1 and SecondLayer.size(). So I need a subset of integers that
correspond to the index's of the Neurons in the second layer to link to.
This Neural network could potentially be very large depending on the input
values so I don't want to create a large array of integers each time I need
to decided what each Neuron in Layer1 connects to.

Looks like an interesting project. Maybe it calls for techniques specially
designed for it rather than off-the-shelf solutions.

DW
 
T

Thomas J. Clancy

Hmmmmm, I may have read this wrong but...
Okay, I don't know what youy mean by NOT trivial. If you run the code I
thay I just entered at top (and with these corrections) it works as
expected. The if statement is necessary here because if the newly random
number DOES exist, I want to repeat the loop, hence the i--, which will
effectively get me, say, 500 unique random numbers between, say, 999 and
1,900,234 (for example). If these are the requirements the code below
works. So I'm not sure what else you really need. You could simply expect
to receive back a set rather than a vector, thus doing away with the
std::copy() at the end.

std::vector<int> RandomSubsetOfIntergers( int BeginRanger, int EndRanger,
int SubsetSize )
{
std::set<int> s;
for(int i = 0; i < SubsetSize; i++)
{
int j = (rand() % EndRanger) + BeginRanger;
if (s.find(j) == s.end())
{
s.insert(j);
}
else
{
i--;
}
}
std::copy(s.begin(), s.end(), back_inserter(r));
return r;
}

My apologies, this is the correct form of the function that will do what you
want.

std::vector<int> RandomSubsetOfIntergers( int BeginRanger, int EndRanger,
int SubsetSize )
{
std::vector<int> r;
std::set<int> s;
for(int i = 0; i < SubsetSize; i++)
{
int j = (rand() % EndRanger) + BeginRanger;
if (s.find(j) == s.end())
{
s.insert(j);
}
else
{
i--;
}
}
std::copy(s.begin(), s.end(), back_inserter(r));
return r;
}
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,767
Messages
2,569,572
Members
45,046
Latest member
Gavizuho

Latest Threads

Top