Juha Nieminen
Greg said: So, unlike, say, searching for an item in a set (in which every search could
require a "worst case" or O(log(N)) number of operations), erasing an item
from a set may require O(log(N)) operations only rarely (I estimate around 1
in e^N items erased could turn out to be a worst case).
I think you don't really understand the O() notation. The O()
notation expresses an asymptotic *upper bound* on the time required
to perform the specified operation. In other words, it describes
the *worst case*, i.e. the largest amount of time the operation could
possibly take. It doesn't matter if that worst case happens rarely. O()
expresses neither the average time nor the amortized time for the operation.
Even if erasing an element can be done in constant time one
billion times for each lg(n)-time erase, erasing an element is
still an O(lg(n))-time operation. As I said, O() expresses the
absolute worst-case scenario.
"Amortized time" is a completely different thing. It says that if
you perform the operation a very large amount of times over a typical
data set, how much *in average* it will take time.
Greg said: A clearer example would be a std::vector that doubles its capacity whenever
its size is about to exceed its current capacity. Now, even though inserting
an item into this vector could require a (worst case) N number of operations
- to allocate the added capacity - it should be clear that not every
insertion could be a worst case.
Insertion into a vector *is* an O(n) operation. It doesn't matter that
the operation is constant-time when amortized over a large number of
insertions. O() expresses the worst possible cost of a single operation.
Greg said: In fact the number of such worst cases is
limited to 1 per every 2^N insertions. So in this case, even though the
worst case has linear complexity, the average case still has a constant
complexity, because as the worst cases become worse - they also become less
frequent by the same degree.
Yes, back-insertion in a vector is an *amortized* constant-time
operation, but it's an O(n) operation.
Greg said: Your explanation is entirely backwards. First, it seems that you must be
describing the number of steps needed to find a specified item in a
container.
No, I'm describing a tree traversal from begin() to end(). The entire
traversal takes O(n) time even though one single iterator increment
is an O(lg(n)) operation.
(Common sense would dictate that if an iterator increment is an
O(lg(n)) operation and it's done n times, the entire traversal must
be an O(n*lg(n)) operation. However, quite curiously, it's not. It's
an O(n) operation. It's actually rather easy and intuitive to prove:
during a full traversal every edge of the tree is crossed at most
twice, once downwards and once upwards, and a tree with n nodes has
only n-1 edges.)
Greg said: Second, linear complexity, O(N), is much worse than logarithmic
complexity, O(log(N))
So what? I was talking about an entire tree traversal, from
begin() to end(). It's a linear operation.
Greg said: So O(N) is the expected worst case to find an item in
certain unsorted containers (such as a vector). To find an item in a set on
the other hand, has only a logarithmic worst-case complexity O(log(N)) - a
significant improvement over a vector.
I was not talking about finding an item.