Joerg said:
Sounds interesting, though I don't understand it fully and
I don't have access to any copy of Knuth's books. Code might
be interesting, though. What about posting it or making it
available for download?
If I get it correctly, the algorithm you are talking about
is unstable, works well only for powers of two (except with
some special modification to avoid this) and doesn't
recognize natural runs in the input? I would like to test
it against mine to see how it performs.
#include <stddef.h>
#include <limits.h>

typedef struct unknown Node;    /* opaque node type; only its link field is touched */

#define NEXT(ptr) *(Node**)((char*)(ptr) + offset)  /* link field of a node, `offset' bytes in */
#define MAXBITS (sizeof(Node*) * CHAR_BIT)          /* enough slots for any list that fits in memory */

static void *
merge(Node *p, Node *q,
      size_t offset,
      int (*compare)(const void *, const void *))
/*
 * Internal routine to merge two non-empty sorted lists. If elements
 * of the two lists compare equal, those from the `p' list will precede
 * those from the `q' list in the merged output.
 */
{
    Node *head, **tail;

    tail = &head;
    for (;;) {
        if (compare(p, q) <= 0) {
            *tail = p;
            tail = &NEXT(p);
            if ( (p = NEXT(p)) == NULL ) {
                *tail = q;
                break;
            }
        }
        else {
            *tail = q;
            tail = &NEXT(q);
            if ( (q = NEXT(q)) == NULL ) {
                *tail = p;
                break;
            }
        }
    }
    return head;
}

void *                      /* returns: pointer to new list head */
listsort(
    void *head,             /* first item in original list */
    size_t offset,          /* byte offset of link field in each item */
    int (*compare)          /* item comparison function */
        (const void *, const void *))
/*
 * listsort() rearranges the links of a singly-linked NULL-terminated
 * list so they traverse the items in an order defined by a caller-
 * provided comparison function, and returns a pointer to the first
 * item in the rearranged list. The comparison function accepts two
 * item pointers and returns a negative, zero, or positive integer to
 * indicate that the first item compares less than, equal to, or greater
 * than the second. The sort is stable.
 */
{
    Node *list[MAXBITS-1];  /* sorted sub-lists of 2,4,8,... items */
    Node *p, *q;
    int bits, maxbits;

    list[ maxbits = 0 ] = NULL;
    while ( (p = head) != NULL ) {
        if ( (q = NEXT(p)) == NULL )
            break;                  /* lone final item: left in `head' for cleanup */
        head = NEXT(q);
        if (compare(p, q) <= 0) {   /* sort the pair p,q in-line (note 3a) */
            NEXT(q) = NULL;
        }
        else {
            NEXT(q) = p;
            NEXT(p) = NULL;
            p = q;
        }
        for (bits = 0; (q = list[bits]) != NULL; ) {
            list[bits] = NULL;      /* slot occupied: merge and "carry" left */
            p = merge(q, p, offset, compare);
            if (++bits > maxbits) {
                maxbits = bits;
                break;
            }
        }
        list[bits] = p;             /* deposit in the first empty slot */
    }
    for (bits = 0; bits <= maxbits; ++bits) {   /* cleanup: merge what remains */
        if ( (p = list[bits]) != NULL )
            head = (head == NULL) ? p : merge(p, head, offset, compare);
    }
    return head;
}

Notes:
1) This is packaged as a type-blind "generic" list sort, the
only assumptions being that the links are struct pointers and
that a list's final node has a NULL link. A type-aware version
(with knowledge of the nodes' structure and able to compare nodes
in-line) should run faster.
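
For concreteness, here's a minimal usage sketch; the `struct item'
type, item_compare(), and the little driver are made up for
illustration, but any node type with an embedded link pointer is
handled the same way (compile it together with the code above):

#include <stdio.h>
#include <stddef.h>

struct item {
    int value;
    struct item *next;          /* link field; its offset is what listsort() needs */
};

static int item_compare(const void *a, const void *b)
{
    const struct item *p = a, *q = b;   /* the sort passes node pointers */
    return (p->value > q->value) - (p->value < q->value);
}

int main(void)
{
    struct item nodes[] = { {3}, {1}, {4}, {1}, {5} };
    size_t i, n = sizeof nodes / sizeof nodes[0];
    struct item *head;

    for (i = 0; i + 1 < n; ++i)         /* build a NULL-terminated list */
        nodes[i].next = &nodes[i + 1];
    nodes[n - 1].next = NULL;

    head = listsort(&nodes[0], offsetof(struct item, next), item_compare);
    for ( ; head != NULL; head = head->next)
        printf("%d ", head->value);     /* prints: 1 1 3 4 5 */
    putchar('\n');
    return 0;
}
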
2) The sort doesn't actually need a comparison function that
returns -ve/zero/+ve: a Boolean `precedes' function would do as
well and could be faster. In this sort, I decided to stick with
a qsort-and-bsearch style, mostly because that's what people are
accustomed to.
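
As a sketch (not part of the code above), merge() rewritten around
a Boolean precedes(a, b) that returns nonzero when `a' must sort
before `b' would look like this; taking p unless q strictly
precedes it keeps the sort stable:

static void *
merge_precedes(Node *p, Node *q,
               size_t offset,
               int (*precedes)(const void *, const void *))
{
    Node *head, **tail;

    tail = &head;
    for (;;) {
        if (!precedes(q, p)) {          /* p keeps its place on ties: stable */
            *tail = p;
            tail = &NEXT(p);
            if ( (p = NEXT(p)) == NULL ) {
                *tail = q;
                break;
            }
        }
        else {
            *tail = q;
            tail = &NEXT(q);
            if ( (q = NEXT(q)) == NULL ) {
                *tail = p;
                break;
            }
        }
    }
    return head;
}
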
3) This code implements the basic McCarthy method without
protection against badly unbalanced merges in the cleanup phase.
3a) Well, not the "most basic" McCarthy: It sorts two-node
lists in-line instead of merging two one-node lists.
3b) Knuth suggests that "we can sort groups of, say, 16 items
using straight insertion," but in tests I've conducted and on the
machines available to me that doesn't seem to be a good idea. (He
makes the suggestion in connection with merge using arrays instead
of lists, and perhaps things would be different in that setting.)
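
For what it's worth, "straight insertion" in the list setting would
look something like the sketch below; insertion_group() is made up
for illustration and is not called by listsort() above:

static Node *
insertion_group(Node **headp, size_t k, size_t offset,
                int (*compare)(const void *, const void *))
/*
 * Detach up to `k' nodes from the front of *headp, sort them by
 * straight insertion, and return the sorted NULL-terminated group;
 * *headp is left pointing at the remainder of the list.
 */
{
    Node *sorted = NULL, *p;
    size_t i;

    for (i = 0; i < k && (p = *headp) != NULL; ++i) {
        Node **link = &sorted;
        *headp = NEXT(p);
        while (*link != NULL && compare(*link, p) <= 0)
            link = &NEXT(*link);    /* skip nodes that precede or tie p */
        NEXT(p) = *link;            /* splice p in after them: stable */
        *link = p;
    }
    return sorted;
}
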
4) The McCarthy algorithm may be more understandable if you
note the parallel to the process of counting in binary. When N
items (or item pairs, with the 3a optimization) have been consumed,
the elements of list[] correspond to the binary representation of
N: each zero bit of N corresponds to a NULL in list[], and a one
bit in position i to a non-NULL link to a sorted sub-list of 2^i
items (2^(i+1) with 3a).
Consuming the next item (or pair) "increments" N by working from
right to left: if the low-order bit is zero, the new item lands
in list[0]. Otherwise, it is merged with list[0], which becomes
NULL, and then the new two-item (two-pair) list "carries" to the
list[1] place, and so on.
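
If it helps, here is a stand-alone toy (purely illustrative, not
part of the sort) that does nothing but the "incrementing": an
array of flags stands in for list[], and the carry loop has exactly
the shape of the inner for loop in listsort():

#include <stdio.h>

#define SLOTS 8

static void increment(int slot[])
{
    int bits = 0;
    while (slot[bits] != 0) {       /* occupied: "merge" and carry left */
        slot[bits] = 0;
        ++bits;
    }
    slot[bits] = 1;                 /* deposit in the first empty slot */
}

int main(void)
{
    int slot[SLOTS] = {0};
    int n, i;

    for (n = 1; n <= 6; ++n) {
        increment(slot);
        printf("N=%d: ", n);
        for (i = SLOTS - 1; i >= 0; --i)
            putchar(slot[i] ? '1' : '0');   /* the binary representation of N */
        putchar('\n');
    }
    return 0;
}
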
5) To avoid wildly unbalanced merges during cleanup, use two
arrays alist[] and blist[] instead of the single array list[].
The "incrementing" process first tries to fill the NULL link in
alist[0], but if it's non-NULL it uses blist[0] instead. If
both are non-NULL, it merges those two lists, deposits the new
sub-list in alist[0] and NULLs blist[0], then "carries" the merge
result one place to the left:

for (bits = 0; ; p = q) {
    if (alist[bits] == NULL) {
        alist[bits] = p;
        break;
    }
    if (blist[bits] == NULL) {
        blist[bits] = p;
        break;
    }
    q = merge(alist[bits], blist[bits], offset, compare);
    alist[bits] = p;
    blist[bits] = NULL;
    if (++bits > maxbits) {
        maxbits = bits;
        alist[bits] = q;
        blist[bits] = NULL;
        break;
    }
}

In my tests, this method is just a hair slower than the original
on lists of length 2^k or 2^k-1, but shows a substantial gain for
lengths like 2^k+1. It is the fastest "general-purpose" list
merge I know.
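
The cleanup phase for this variant isn't shown above; one way to
write it (my sketch, using the same maxbits convention) is to sweep
both arrays from bit 0 upward, always putting the older sub-list in
front of the accumulated result so the sort stays stable:

for (bits = 0; bits <= maxbits; ++bits) {
    if ( (p = blist[bits]) != NULL )    /* blist[] holds the newer of the pair */
        head = (head == NULL) ? p : merge(p, head, offset, compare);
    if ( (p = alist[bits]) != NULL )    /* alist[] holds the older: it goes in front */
        head = (head == NULL) ? p : merge(p, head, offset, compare);
}
return head;
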
6) All the versions mentioned above are "straight" merges.
"Natural" merges that make use of the order already present in
the input list can beat the pants off them *if* the list is in
very good order already. However, in my tests the pre-existing
order must be very good indeed for natural to beat straight;
Knuth's remark in the answer to exercise 5.2.4-12
"We may conclude that natural merging is preferable to
straight merging when linked allocation is being used,
although it was inferior for sequential allocation."
is not supported by my data. (Hypothesis: Knuth was thinking
about the number of comparisons during the merges themselves,
but forgot about the N-1 comparisons required to discover the
boundaries between the initial runs. On "random" input, a
natural merge uses N-1 comparisons to create approximately N/2
initial runs, while a straight merge uses only N/2 comparisons
to make the same amount of progress.)
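
To make the parenthetical concrete: peeling one ascending run off
the front of a list costs one comparison per link followed, so
discovering all the run boundaries in an N-item list costs N-1
comparisons however many runs there turn out to be. A sketch
(split_run() is made up for illustration, not part of any code
above):

static Node *
split_run(Node **headp, size_t offset,
          int (*compare)(const void *, const void *))
/*
 * Detach the leading ascending run from a non-empty list at *headp
 * and return it NULL-terminated; *headp is left at the remainder.
 */
{
    Node *run = *headp, *p = run;

    while (NEXT(p) != NULL && compare(p, NEXT(p)) <= 0)
        p = NEXT(p);            /* one comparison per link followed */
    *headp = NEXT(p);           /* rest of the list, or NULL */
    NEXT(p) = NULL;             /* terminate the detached run */
    return run;
}
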
7) I've been working on and off -- mostly off -- on a paper
describing test results for various kinds of linked-list merges,
but I'm not ready to "publish," however informally. When last
I set it aside, I was having a hard time writing a linked-list
Quicksort that didn't exhibit pathological behavior. Maybe if
I convince myself it's just not possible, I'll finish the damn'
thing and post it.
8) An interesting (and possibly discouraging) observation:
Efficient linked-list sorting is relatively unimportant! First,
sorting a linked list brings very little "algorithmic advantage"
the way sorting an array does: In a sorted array you can use
things like binary search, but sorting a linked list offers no
similar gain. (Sorting cuts the average time for an unsuccessful
search in half, but if N is large there are better ways to search.)
Second, linked lists are "unfriendly" to the memory implementations
of contemporary systems -- bad locality, cache thrashing, etc. --
so a good programmer usually tries to keep them short; if N is
small, even an inefficient sort will be quick. Yes, there are
circumstances where an efficient linked-list sort is needed, but
IMHO they are (or should be) on the rare side.