Picking X random entries from linked list of Y entries

Discussion in 'C Programming' started by Don Bruder, Aug 2, 2010.

  1. Don Bruder

    Don Bruder Guest

    Subject tries to say it tersely, maybe doesn't do well...

    I've got a double-linked list of entries - looks similar to this:

    typedef struct ListEntry
    Boolean Used;
    // elide approximately two dozen more fields of various types
    // for the sake of post brevity
    // ...
    struct ListEntry *Prev;
    struct ListEntry *Next;
    } ListEntries;

    Elsewhere in the code, I set up a "root" ListEntry, like so:

    ListEntries *ListRoot = calloc(1, sizeof(ListEntries));
    ListRoot->Next = NULL;
    ListRoot->Prev = NULL;

    Everything else that happens operates against ListRoot.

    I stuff it with (unpredictably variable from run to run, hence the use
    of a linked list rather than an array of struct pointers) a number of
    entries. That part is all good - everything associated with it works
    exactly as intended.

    As part of the handling of this list, one of the things I need to do is
    select a random sub-group of "PickCount" entries from the list.
    Currently, I do so like this:

    // Select PickCount entries from List, returning them as a second list.
    ListEntries *PickSomeEntries(int PickCount, ListEntries *List)
    int Limit = CountEntries(List); // Should be self explanatory...
    int Index = 0;
    int Selected = 0;
    ListEntries *Pointer = NULL;
    // CList will be a sub-list of List to be returned to caller
    ListEntries *CList = calloc(1, sizeof(ListEntries));
    ListEntries *Pointer2 = CList;

    for (Index = 0;Index < PickCount; Index++)
    Selected = random() % Limit; // Pick a number from 0 to Limit

    // Yes, I realize mod-ing a random number is relatively poor
    // practice. However, since this code doesn't need
    // "crypto-strong" randoms - just a reasonably random sub-set
    // of a list - I deem it acceptable.

    Pointer = List; // Start at the first entry in the list and
    // step through the list "Selected" times to reach the
    // "Selected"th list entry
    while (Selected)
    Pointer = Pointer->Next;
    // So Pointer now points at the "Selected"th entry of List.
    // Copy it to the CList via Pointer2. Note that CopyOneEntry()
    // DOES NOT copy Next/Prev. (doing so was found to be the cause
    // of some REALLY nasty misbehavior in early development stages)
    CopyOneEntry(Pointer, Pointer2);
    // Now we set up Next/Prev to reflect the entry being added to the
    // new list. (In the process, creating a new entry at the end of
    // the growing CList and bumping Pointer2 to point at it for the
    // next (if any) pass through the for() loop)
    Pointer2->Prev = Pointer2;
    Pointer2->Next = calloc(1, sizeof(ListEntries));
    Pointer2 = Pointer2->Next;
    // And finally, hand back the list of selected entries.
    return CList;

    This is functional, but I can't help feeling that it's clunky as hell,
    and that there's probably a better way to do it. However, I can't come
    up with that "better way".

    Anybody see any obvious improvements or "gotchas" I might have missed?
    Don Bruder, Aug 2, 2010
    1. Advertisements

  2. If you can use vectors: see the Fisher-Yates shuffle, and stop after
    filling in the first PickCount slots.


    You could initialize the temporary array with pointers to the list nodes
    (by walking the list once), then shuffle the array of pointers in the way
    described above.

    Ersek, Laszlo, Aug 2, 2010
    1. Advertisements

  3. Don Bruder

    Ben Pfaff Guest

    The approaches that come to mind immediately are the ones from
    Knuth 3.4.2 "Random Sampling and Shuffling", either algorithm S
    or R depending on whether you know the number of elements in the
    linked list. I guess that you do, so algorithm S is appropriate.

    To select n items out of a set of N >= n:

    t = m = 0
    For each of the N records, in arbitrary order:
    Let U be a random number between 0 and 1.
    If (N - t) * U < n - m:
    Add the current record to the sample.
    m += 1
    If m >= n, then we're done, otherwise continue.
    t += 1
    Ben Pfaff, Aug 2, 2010
  4. Don Bruder

    spikeysnack Guest

    Aside from the obvious admonishment
    " This is an Objective-C Discussion Group <Cough>
    Please see a C Language Group list ...<cough cough>" ...

    I would suggest you create an array (randomly picking or not picking
    as you traverse your list) of pointers to records and then iterate
    through them for your tests. they don't need to be attached (linked)
    to each other as the array organizes them linearly even though in no
    particular order. This retains your original ordering of your linked
    list -- discard (free) the pointer array when done.

    I am trying to recall what advantages a linked list has over an array
    or other logical container in a modern computer without a tape memory
    but I think other than keeping the list in a certain restricted memory
    there is little advantage. Database records in olden days "Drum memory
    used massively linked lists to generate searches on fields but modern
    databases use associative lists, etc.
    PCs love arrays and C loves pointers and arrays.
    Shuffling links around vs. sorting arrays of pointers yield
    in milliseconds.

    non-32/64 bit architectures and other programming languages (lisp,
    scheme, perl,...) are sometimes better with lists than arrays.

    Objective-C uses various container objects and its own object system
    internally is linked-list-based [written in ANSI C, lists simplify the
    class hierarchy ) but it is lightweight and has been optimized for
    fast access to class ,sub-class and method traversals. A look a the
    Objective-C run-time sources might be enlightening and give you some
    ideas for your design.

    C, Objective-C, and C++ object systems are all C-structures at heart;
    the languages provide various kinds of support for handling them.
    C provides a little -- memory allocation, unions, typedef

    Objective-C provides the same as C plus full object support -- strong
    AND dynamic typing, method dispatch, messaging, creation, destruction,
    copy and inheritance, along with an active run-time system to manage
    and keep track of objects and methods. Easy to learn if you know C.

    c++ has a minimal run-time mostly concerned with type-information.
    Objects and structs are identical but can have functions associated
    with them. Strong typing
    is extremely tightly controlled, so a class template system similar
    to shell script macros can be used to simulate dynamic typing. high
    learing curve to master c++.

    You could abstract your design a little by making your data structs
    more encapsulated; (that is -- without link pointers -- just data).
    Then create LinkNodes and attach the data struct to the node via a
    pointer. This allows you to put other kinds of meta-info about your
    data into the LinkNode instead of the data structure. (ie. last access
    time, ListName, etc)
    It also makes the LinkNode responsible for traversal and relieves the
    data from awareness of other data, in object-oriented fashion. You
    can then have different kinds of data structs and reuse your
    traversal code without major surgery on your data code. RecordType
    should be part of the data struct if there is more than one type.
    spikeysnack, Aug 3, 2010
    1. Advertisements

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments (here). After that, you can post your question and our members will help you out.