Eric Sosman said:
Phil said:
Lowell Gilbert said:
I've seen people implement Heapsort in C by sticking an
unused [0] element at the start of the array, so maybe you're
right that some people find array indexing difficult to fathom.
Personally, I think such people are in the wrong line of work.
It simplifies the code greatly, and even works as a perfect place
to build a new entry before inserting it, for example. I think
your ability to comprehend others' code and designs must be
lacking.
If it's used for building a new entry before inserting, then it
isn't "unused." I don't think you're disagreeing as much as you
think you are.
It's unused in the heap. The heap is defined by the heap property,
and [0] doesn't have that property.
The heap property is that each node's key is >= those of
its children, if it has any. (Or <=, if you're building a heap
the other way around.) Another important feature of a heap is
that there is a simple way to calculate the indices of the
children from the index of the parent, and to calculate the
index of a parent from the index of a child. The calculation
is simple, no matter where the heap indices start.
But I sense you are unconvinced. All right, let's go through
the transformation; maybe it will be instructive.
In the traditional description of a heap, we number the nodes
from 1 to N and adopt the convention that the children of node
number j are the nodes with numbers 2*j and 2*j+1. If we store
these in a 1-based array, so the indices match the node numbers,
we say that the node at index [j] has children at [2*j] and [2*j+1].
The parent of the node numbered j is the node numbered floor(j/2);
with 1-based indexing we navigate from a child at [j] to a parent
at [floor(j/2)]. Agreed?
Agreed.
Okay: Let's put this heap into a zero-based array. We can
begin by simply storing the node whose number is j at the array
index [j-1], which we'll call index [k] for expository purposes:
The node whose number is j gets stored at index [k], with k==j-1.
The node at index [k] has node number j, with j=k+1. With me?
With you.
Now to the parent->child calculations. If we've got a parent
at index [k], its node number is j=k+1. The node numbers of its
children are 2*j==2*(k+1)==2*k+2 and 2*j+1==2*(k+1)+1==2*k+3. We
then subtract 1 to get the indices where those node numbers are
stored; they are at [2*k+1] and [2*k+2]. A parent at index [k]
has children at [2*k+1] and [2*k+2]. Clear?
Clear.
Finally, the child->parent calculation. If we've got a child
at index [k], its node number is j=k+1. The node number of its
parent is floor(j/2)==floor((k+1)/2). The node with that number
is stored at index [floor(j/2)-1], which is [floor(j/2-1)], which
is [floor((k+1)/2-1)] which is [floor((k-1)/2)]. A child at index
[k] has a parent at index [floor((k-1)/2)]. Still clear?
Still clear.
Having said that, you may have sneaked some deliberate mistakes into
the above in order to trick me, in which case haha, it worked, as I
didn't actually read them, as I already understand perfectly well how
a heap works. Quite what gave you the impression otherwise, I know not.
So what's the outcome? In the original 1-based heap we had
(assuming integer division for array indices)
parent [j] -> child [2*j]
parent [j] -> child [2*j+1]
child [j] -> parent [j/2]
In the 0-based heap we have
parent [k] -> child [2*k+1]
parent [k] -> child [2*k+2]
child [k] -> parent [(k-1)/2]
Is the code for the 1-based heap simpler? Yes
Right, we're in perfect agreement up to here.
, by one + operator
and one - operator. Is the code "greatly" simpler? Yes, for
anybody who thinks one + and one - amount to a "great" complexity.
You've doubled the length of two thirds of the expressions from
being a single operator to being two operators. So most of the time
it's twice as verbose.
A person who boggles at that trivial level of complication will very
likely have a hard time as a computer programmer.
Fortunately I don't boggle at the complication, I boggle at the
programmer who would chose that lack of simplicity.
I likewise boggle why people do the following, which I saw in the
linux kernel, for example, only yesterday:
t1*p=getp();
foo(p->q->r1);
bar(p->q->r2);
baz(p->q->r3);
//...
quux(p->q->rN);
rather than
t1*p=getp();
t2*q=p->q;
foo(q->r1);
bar(q->r2);
baz(q->r3);
//...
quux(q->rN);
Yet I have no problem understanding the complexity of pointers to
pointers, even when N reaches exceeds the number of fingers I possess.
Similarly, I boggle at why people do the following, which I again saw in
some never-gonna-be-accepted-into-the-mainline-ever module for the linux
kernel only yesterday:
void*p=getp();
((t1*)p)->q1=r1;
((t1*)p)->q2=r2;
((t1*)p)->q3=r3;
//...
((t1*)p)->qN=rN;
rather than the following:
void*p=getp();
t1*pt=(t1*)p;
pt->q1=r1;
pt->q2=r2;
pt->q3=r3;
//...
pt->qN=rN;
Despite my ability to understand the concept of casting pointers, even
when N reaches the heady heights of, oooh, I think I saw about 40 in one
function.
In each of those, I do genuinely prefer the expression with 1 operator
than the expression with 2 operator. Really. I believe it's tangibly
less complicated at the source code level. Yes, it's only one operator,
maybe I'm just sensitive to wasted characters.
In summary - p>>1 is a thing of beauty, (p-1)/2 is gross.
(Another, and possibly simpler way to figure out the navigation
is to imagine using the illegal ptr=array-1 hack and then formulate
the heap in terms of 1-based indexing with ptr. Return to legality
by substituting array-1 for each appearance of ptr in the solution,
then simplify the indices. I didn't use that route, because people's
alarms might have gone off as soon as they saw the hack, and they might
have thought the solution itself was somehow contaminated by it.
Hence the somewhat longer "change of variable" construction shown.)
Extra credit: Imagine yourself using a language like Pascal,
You sadist!
where arrays can have any index ranges you like. Figure out the
parent->child and child->parent navigation for a heap in an array
whose indices start at [-42]. Comment on the complexity of your
solution.
Computational complexity identical, but entirely lacking elegance.
Funnily enough, I'm going to venture into user-space today or next
week and have to use [2]-based addressing for something, gack.
Phil