Picking X random entries from linked list of Y entries

D

Don Bruder

Subject tries to say it tersely, maybe doesn't do well...

I've got a double-linked list of entries - looks similar to this:

typedef struct ListEntry
{
Boolean Used;
// elide approximately two dozen more fields of various types
// for the sake of post brevity
// ...
struct ListEntry *Prev;
struct ListEntry *Next;
} ListEntries;

Elsewhere in the code, I set up a "root" ListEntry, like so:

ListEntries *ListRoot = calloc(1, sizeof(ListEntries));
ListRoot->Next = NULL;
ListRoot->Prev = NULL;

Everything else that happens operates against ListRoot.

I stuff it with (unpredictably variable from run to run, hence the use
of a linked list rather than an array of struct pointers) a number of
entries. That part is all good - everything associated with it works
exactly as intended.

As part of the handling of this list, one of the things I need to do is
select a random sub-group of "PickCount" entries from the list.
Currently, I do so like this:

// Select PickCount entries from List, returning them as a second list.
ListEntries *PickSomeEntries(int PickCount, ListEntries *List)
{
int Limit = CountEntries(List); // Should be self explanatory...
int Index = 0;
int Selected = 0;
ListEntries *Pointer = NULL;
// CList will be a sub-list of List to be returned to caller
ListEntries *CList = calloc(1, sizeof(ListEntries));
ListEntries *Pointer2 = CList;

srandomdev();
for (Index = 0;Index < PickCount; Index++)
{
Selected = random() % Limit; // Pick a number from 0 to Limit

// Yes, I realize mod-ing a random number is relatively poor
// practice. However, since this code doesn't need
// "crypto-strong" randoms - just a reasonably random sub-set
// of a list - I deem it acceptable.

Pointer = List; // Start at the first entry in the list and
// step through the list "Selected" times to reach the
// "Selected"th list entry
while (Selected)
{
Pointer = Pointer->Next;
Selected--;
}
// So Pointer now points at the "Selected"th entry of List.
// Copy it to the CList via Pointer2. Note that CopyOneEntry()
// DOES NOT copy Next/Prev. (doing so was found to be the cause
// of some REALLY nasty misbehavior in early development stages)
CopyOneEntry(Pointer, Pointer2);
// Now we set up Next/Prev to reflect the entry being added to the
// new list. (In the process, creating a new entry at the end of
// the growing CList and bumping Pointer2 to point at it for the
// next (if any) pass through the for() loop)
Pointer2->Prev = Pointer2;
Pointer2->Next = calloc(1, sizeof(ListEntries));
Pointer2 = Pointer2->Next;
}
// And finally, hand back the list of selected entries.
return CList;
}

This is functional, but I can't help feeling that it's clunky as hell,
and that there's probably a better way to do it. However, I can't come
up with that "better way".

Anybody see any obvious improvements or "gotchas" I might have missed?
 
E

Ersek, Laszlo

As part of the handling of this list, one of the things I need to do is
select a random sub-group of "PickCount" entries from the list.

If you can use vectors: see the Fisher-Yates shuffle, and stop after
filling in the first PickCount slots.

http://en.wikipedia.org/wiki/Fisher–Yates_shuffle#The_modern_algorithm

You could initialize the temporary array with pointers to the list nodes
(by walking the list once), then shuffle the array of pointers in the way
described above.

lacos
 
B

Ben Pfaff

Don Bruder said:
Subject tries to say it tersely, maybe doesn't do well...

The approaches that come to mind immediately are the ones from
Knuth 3.4.2 "Random Sampling and Shuffling", either algorithm S
or R depending on whether you know the number of elements in the
linked list. I guess that you do, so algorithm S is appropriate.
Paraphrased:

To select n items out of a set of N >= n:

t = m = 0
For each of the N records, in arbitrary order:
Let U be a random number between 0 and 1.
If (N - t) * U < n - m:
Add the current record to the sample.
m += 1
If m >= n, then we're done, otherwise continue.
t += 1
 
S

spikeysnack

Aside from the obvious admonishment
" This is an Objective-C Discussion Group <Cough>
Please see a C Language Group list ...<cough cough>" ...

I would suggest you create an array (randomly picking or not picking
as you traverse your list) of pointers to records and then iterate
through them for your tests. they don't need to be attached (linked)
to each other as the array organizes them linearly even though in no
particular order. This retains your original ordering of your linked
list -- discard (free) the pointer array when done.

I am trying to recall what advantages a linked list has over an array
or other logical container in a modern computer without a tape memory
but I think other than keeping the list in a certain restricted memory
space
there is little advantage. Database records in olden days "Drum memory
days"
used massively linked lists to generate searches on fields but modern
databases use associative lists, etc.
PCs love arrays and C loves pointers and arrays.
Shuffling links around vs. sorting arrays of pointers yield
differences
in milliseconds.

non-32/64 bit architectures and other programming languages (lisp,
scheme, perl,...) are sometimes better with lists than arrays.

Objective-C uses various container objects and its own object system
internally is linked-list-based [written in ANSI C, lists simplify the
class hierarchy ) but it is lightweight and has been optimized for
fast access to class ,sub-class and method traversals. A look a the
Objective-C run-time sources might be enlightening and give you some
ideas for your design.

C, Objective-C, and C++ object systems are all C-structures at heart;
the languages provide various kinds of support for handling them.
C provides a little -- memory allocation, unions, typedef

Objective-C provides the same as C plus full object support -- strong
AND dynamic typing, method dispatch, messaging, creation, destruction,
copy and inheritance, along with an active run-time system to manage
and keep track of objects and methods. Easy to learn if you know C.

c++ has a minimal run-time mostly concerned with type-information.
Objects and structs are identical but can have functions associated
with them. Strong typing
is extremely tightly controlled, so a class template system similar
to shell script macros can be used to simulate dynamic typing. high
learing curve to master c++.

You could abstract your design a little by making your data structs
more encapsulated; (that is -- without link pointers -- just data).
Then create LinkNodes and attach the data struct to the node via a
pointer. This allows you to put other kinds of meta-info about your
data into the LinkNode instead of the data structure. (ie. last access
time, ListName, etc)
It also makes the LinkNode responsible for traversal and relieves the
data from awareness of other data, in object-oriented fashion. You
can then have different kinds of data structs and reuse your
traversal code without major surgery on your data code. RecordType
should be part of the data struct if there is more than one type.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,731
Messages
2,569,432
Members
44,832
Latest member
GlennSmall

Latest Threads

Top