Dave said:
Hello all,
I'm pondering why the default underlying container for std:
riority_queue<>
is std::vector<>. It would seem that inserts are liable to happen anywhere,
which would make std::list<> a superior alternative. Why factors am I not
considering here? Why in the general case would std::vector<> be best?
This exemplifies why std::list (and linked lists in general) are
rarely as useful as they initially appear.
Linked lists allow O(k) insertion or deletion anywhere in the list,
which sounds good. Unfortunately, in typical situations, you first
have to figure out where in the list to to do that insertion or
deletion -- and traversing the list to that point is O(N). As such,
the O(k) insertion/deletion is really O(N) unless you can find the
right spot in the list without actually traversing it (cf. skip
lists).
In the case of a heap, things are a really even worse: you never
really have to insert in the middle of the heap anyway -- you make the
addition at one end of the heap, and then swap the new element with
roughly lg(N) elements to get it to the right spot. Since you can add
an item to one end of a vector (or either end of a deque) in
(amortized) constant time, the total complexity is O(lg N).
Each insertion in the list would involve O(N) traversal steps along
with the O(lg N) comparisons, and finally an O(k) insertion at the
right spot. As such, the only hope for a list would be if we could
count on those N traversal steps being _extraordinarily_ fast (it
clearly loses asymtotically, but might remain better in practical
ones).
Unfortunately, that's exactly the opposite of reality: in a vector,
the memory is contiguous, leading to high locality of reference.
Along with that, we have only the data we care about being stored --
i.e. data density is 100%. Both of these help with cache utilization.
In a list, each node is normally allocated individually, and includes
not only the data itself, but a couple of pointers as well. Therefore
we have relatively poor locality of reference, and lower density of
data (i.e. the cache ends up holding lots of pointers along with the
data proper). The result is poorer use of the cache, so not only is
the complexity higher, but the constants are too...
Later,
Jerry.