Random Subset of Range of Integers

C

Chris Dutrow

David White said:
Chris Dutrow said:
This still doesn't solve the problem though, heres why:

// get card
int card = deck[icard];

This assumes a data structure has been given that holds the set from which
the subset is derived from.

It doesn't assume that anything's been given. It's just an algorithm for
generating a unique subset. You could have a class called RandomSubset that
implements the algorithm I suggested, e.g.,

==============================>

Yeah, you're right, it doesn't assume that.

What I meant though is that the function uses a data structure that holds
the complete set of intergers.

Which is fine for this input range of
1 to 100

But very bad for an input range of
1 to 10000000000

So this implementation is not very versitile because it uses memory that an
optimal algorithm would not need to use.



class RandomIntegerSelector
{
std::vector<int> m_allValues;

==================>
See, this is what I have a problem with.
int m_nrValuesRemaining;
public:
// constructor fills m_allValues with the full set of values

==========================>
And this. What if a subset of length 10 is needed from an integer range of
0 to 10000000000000?
Then this implementation is wildly inappropriate.
RandomIntegerSelector(int begin, int end);
int NextRandomInteger();
};

The NextRandomInteger() member does the same as my CardDeck::DealCard()
member.
Your function then uses an object of this class to generate the subset:

std::vector<int> RandomSubsetOfIntegers( int beginRange, int endRange, int
subsetSize )
{
IntegerRandomSubset integerSelector(beginRange, endRange);
std::vector<int> subset;
int nrRemaining = subsetSize;
while(nrRemaining-- > 0)
subset.push_back(integerSelector.NextRandomInteger());
return subset;
}

DW

FYI,

what I'm doing here is designing an Artificial Neural Network. A Neuron in
the first layer links to a random number of Neurons in the second layer
between 1 and SecondLayer.size(). So I need a subset of integers that
correspond to the index's of the Neurons in the second layer to link to.
This Neural network could potentially be very large depending on the input
values so I don't want to create a large array of integers each time I need
to decided what each Neuron in Layer1 connects to.

-Chris
 
C

Chris Dutrow

Thomas J. Clancy said:
be

Would this not work?

std::vector<int> RandomSubsetOfIntergers( int BeginRanger, int EndRanger,
int SubsetSize )
{
vector<int> r;
for(int i = 0; i < SubsetSize; i++)
{
int j = (rand() % EndRanger) + BeginRanger;
r.push_back(j);
}
return r;
}

Of course it would be better if you passed in an out vector (reference to)
so that you can avoid the implicit copy on return. For example:

void RandomSubsetOfIntergers( int BeginRanger, int EndRanger, int
SubsetSize, std::vector<int>& r);

Nah, that wouldn't work because the in order for the vector to be a subset,
the integers in it must be unique.

In other words, with the above algorithm, there is a possibility of picking
the same number twice, this is not acceptable.

-Chris
 
C

Chris Dutrow

Thomas J. Clancy said:
To me, a subset doesn't necessary connote a set, although it certainly
contains the word set. I know, I know, what the heck do I mean? Nevermind.
So I suppose the following solution is too slow, then?

std::vector<int> RandomSubsetOfIntergers( int BeginRanger, int EndRanger,
int SubsetSize )
{
std::set<int> s;
for(int i = 0; i < SubsetSize; i++)
{
int j = (rand() % EndRanger) + BeginRanger;
if (s.find(j) != s.end())
{
s.insert(j);
}
else
{
i--;
}
}
std::copy(s.begin(), s.end(), back_inserter<int>(r.begin());
return r;
}

Hmmmmm, I may have read this wrong but...

The algorithm does not appear too slow.

If I read it right, it does not return a subset of length "SubsetSize"
because it simply skips the insertion when a duplicate is encountered.
Also, FYI, I beleive that if you insert a key into a std::set data structure
that is already there, it doesn't insert it by defaul so you don't have to
have if statement there to check if it is already in the set:

if (s.find(j) != s.end())

Also, I beleive the correct definition of a subset in this case would be
that no intergers repeat because there are no duplicate integers in the
input set defined by BeginRang and EndRange. But even if I am wrong about
the definition, what I need is a subset of unique integers that fall in the
input range.

I'm about %100 sure that the solution to this problem is NOT trivial.

Though in the end what I'm looking for is not for someone to suggest an
algorithm for me to code, but for someone to suggest a library that I might
refer to that is elegant and well written.

Thanks,
Chris
 
C

Chris Dutrow

Thomas J. Clancy said:
Okay, I don't know what youy mean by NOT trivial. If you run the code I
thay I just entered at top (and with these corrections) it works as
expected. The if statement is necessary here because if the newly random
number DOES exist, I want to repeat the loop, hence the i--, which will
effectively get me, say, 500 unique random numbers between, say, 999 and
1,900,234 (for example). If these are the requirements the code below
works. So I'm not sure what else you really need. You could simply expect
to receive back a set rather than a vector, thus doing away with the
std::copy() at the end.

std::vector<int> RandomSubsetOfIntergers( int BeginRanger, int EndRanger,
int SubsetSize )
{
std::set<int> s;
for(int i = 0; i < SubsetSize; i++)
{
int j = (rand() % EndRanger) + BeginRanger;
if (s.find(j) == s.end())
{
s.insert(j);
}
else
{
i--;
}
}
std::copy(s.begin(), s.end(), back_inserter(r));
return r;
}

Aha! I did read it wrong, I looked over it a couple of times but for some
reason by brain interpreted the i-- out as the loop iterator even though it
opviously is not.

I thought of this algorithm and actually described it in a previous post on
this thread.

The problem I had with it is that it will break down givin certain inputs
where the both the range size and the subset size are large and could
theoretically create an infinite loop although this in unlikely.

So I figure it is an unsound algorithm that I would not want to put into a
large program.

-Chris
 
D

David White

Chris Dutrow said:
Exactly, thats why I was looking for a library. Hoping one of those crazy
smart genious algorithm people into random numbers ran accross this
algorithm at one point and added it to a library.

I don't use many libraries, but those I have used have tended to keep things
simple. I'm not sure you'll find a library function that has the broad
specifications you are looking for.
The best solution I could think up was to have a SetSize variable
initialized to the size of the Input Range.

Then each time a number is selected, subract 1 from that variable so it
denotes the actual size of the set to choose the numbers through. The
random numbers "i" selected would then be i'th number in that set, but in
order to find which number corresponds to the i'th number, a binary search
would be done to find what numbers that are already in the subset come
before the i'th number.

In other words this is how the algorithm would work:

i = RandomNumberInSetRange();

ActualNumberChosen = i + NumbersThatComBeforeThei'thNumber

Without any detail I can't tell if this is potentially better than anything
suggested so far.
So thats where I got O(NlogN) from.

But I do have gut feeling that there is some slick O(N) algorithm....

I'm not so sure. To be able to handle any range of values and place a subset
of them in a unique sequence every time over a large number of repetitions
seems to me to be calling for explicitly excluding each value that was
previously generated, one way or another. But there are many brilliant
algorithms for doing all kinds of things, so I'm not prepared to say it
can't be done.
The thing is, if I coded complicated crap like this whenever I ran into the
need for it, I would never finish this freakin program!

But yeah, thanks for your help David, you're the only person who responded
to my thread who I didn't feel like was out to prove they were smarter than
me and I will end up using that random number code that you referred me to.

By the way, is this not the board to post such questions? It seemed like
what I was asking was beyond most of the people here? Or maybe algorithm
questions are not right for this board?

No, they aren't really. The charter of this NG is discussion of the C++
langauge, as defined by the standard, so this thread has been more off-topic
than on. Perhaps we can claim to be barely on-topic by posting plenty of
standard C++ code, but I don't think that would convince most of the
regulars here. There ought to be a newsgroup for algorithms, but I don't
know of one.

DW
 
C

Chris Dutrow

David White said:
I don't know of a single algorithm that's suitable for generating both a
small subset of a large set and a large subset of a small set. For the
latter case you can use a card dealer. For the former case you could forget
the array and just generate numbers in the required range and use a std::set
to check whether a value has already been used. If so, then generate another
value.

======================================>

I usually try to stay away from such algorithms where it makes choices as to
the method to solve it depending on the inputs and such. I don't consider
these "elegent" solutions.

But I think you are right, I think sometimes I don't get near as much
accomplished because I spend too much time looking for the "elegent"
solution. When a more practical solution like you suggest is both easier to
code, easier to undersatnd, and faster than the elegent solution in most if
not all cases.



It depends on what you call optimal. The card dealer looks about optimal in
terms of execution time when the subset is a large proportion of the set. As
I see it you have a choice between fast and small for that case. I don't
think you can have both. You could choose which algorithm to use according
to the values passed. If the subset size is more than, say, 10 percent of
the entire set, use the card dealer, otherwise use the std::set.


Looks like an interesting project. Maybe it calls for techniques specially
designed for it rather than off-the-shelf solutions.

Exactly, thats why I was looking for a library. Hoping one of those crazy
smart genious algorithm people into random numbers ran accross this
algorithm at one point and added it to a library.

The best solution I could think up was to have a SetSize variable
initialized to the size of the Input Range.

Then each time a number is selected, subract 1 from that variable so it
denotes the actual size of the set to choose the numbers through. The
random numbers "i" selected would then be i'th number in that set, but in
order to find which number corresponds to the i'th number, a binary search
would be done to find what numbers that are already in the subset come
before the i'th number.

In other words this is how the algorithm would work:

i = RandomNumberInSetRange();

ActualNumberChosen = i + NumbersThatComBeforeThei'thNumber

So thats where I got O(NlogN) from.

But I do have gut feeling that there is some slick O(N) algorithm....

The thing is, if I coded complicated crap like this whenever I ran into the
need for it, I would never finish this freakin program!

But yeah, thanks for your help David, you're the only person who responded
to my thread who I didn't feel like was out to prove they were smarter than
me and I will end up using that random number code that you referred me to.

By the way, is this not the board to post such questions? It seemed like
what I was asking was beyond most of the people here? Or maybe algorithm
questions are not right for this board?

Thanks!
-Chris
 
R

Rolf Magnus

David said:
No, they aren't really. The charter of this NG is discussion of the
C++ langauge, as defined by the standard, so this thread has been more
off-topic than on. Perhaps we can claim to be barely on-topic by
posting plenty of standard C++ code, but I don't think that would
convince most of the regulars here. There ought to be a newsgroup for
algorithms, but I don't know of one.

Well, the OP might try in an AI newsgroup like comp.ai.neural-nets,
maybe someone there already had a similar problem and solved it. Or a
more general newsgroup like comp.programming.
 
G

Gianni Mariani

Chris said:
I searched around on the net for a bit, couldn't find anything though.

I would like to find some code for a function where I input A Range Of
Integers
For example: Function( 1, 100 );

And the function will return me an array holding a random subset of integers
in that range of a size that I specify

So the Function would Probabaly look something like this:

std::vector<int> RandomSubsetOfIntergers( int BeginRanger, int EndRanger,
int SubsetSize );


Anyone know where I can find something along these lines?
Also, if you know of any general good random number libratries that are fast
and have an even distribution, let me know as well!


One solution is this - this is not cryptogrphically safe but for testing
or monte-carlo algorithms it's more than fine.

Theory:

It is fact that if

a is a positive integer and b is a positive integer and a and b a re
relatively prime (i.e. gcd(a,b)=1) (gcd is greatest common divisor)

then in the series:

p(n+1) = (p(n)+a)%b

Then:

p(n+b) = p(n)


So a simple fast way to generate your array is:

(warning - untested code)

void MakeRandomVector( std::vector<int> & out, int begin, int end )
{

unsigned period = end - begin + 1; // begin .. end (including end).

// pick a random key

unsigned long key;

do {

key = rand() + 1;

} while ( ( key <= period ) || ( gcd( key, period ) != 1 ) );

out.resize( period );

// pick a random starting point;
unsigned int p0 = rand() % period;

std::vector<int>::iterator itr = out.begin();

unsigned int pn = p0;

unsigned count = period;

while ( count )
{
* ( ++ itr ) = pn + begin;

pn = ( pn + key ) % period;

-- count;
}

assert( pn == p0 );
}

Now - to make this cryptograpgically secure, you could a) pick REALLY
large keys and b) you could repeat the procedure within itself

i.e.

where : MakeRandomVector( vecnmx, 0, period-1 );

and

vecn = vecnm1[ vecnm2[ vecnm3[ ... vec1[ i ] ... ] ] ];

( need to make sure that all the keys are prime (or relatively prime) ).
 
G

Gianni Mariani

Gianni Mariani wrote:
....
Now - to make this cryptograpgically secure, you could a) pick REALLY
large keys

I gave that some more thought and that's wrong - the size of the key is
irrelevant. The only thing that is important about the key is it's
modulus relative to the period.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,774
Messages
2,569,596
Members
45,132
Latest member
TeresaWcq1
Top