Delete from a std::set in amortized constant time

Juha Nieminen

Greg said:
So, unlike, say, searching for an item in a set (in which every search could
require a "worst case" or O(log(N)) number of operations), erasing an item
from a set may require O(log(N)) operations only rarely (I estimate around 1
in e^N items erased could turn out to be a worst case).

I think you don't really understand the O() notation. The O()
notation expresses the asymptotic *upper limit* for the time required
to perform the specified operation. Or in other words, it describes
the *worst case*, i.e. the largest amount of time the operation could
possibly take. It doesn't matter if this happens rarely. O() doesn't
express the average time nor the amortized time for that operation.

Even if erasing the element can be done in constant time one
billion times for each lg(n)-time erase, erasing the element is
still an O(lg(n))-time algorithm. As I said, O() expresses the
absolute worst case scenario.

"Amortized time" is a completely different thing. It says that if
you perform the operation a very large amount of times over a typical
data set, how much *in average* it will take time.
A clearer example would be a std::vector that doubles its capacity whenever
its size is about to exceed its current capacity. Now, even though inserting
an item into this vector could require a (worst case) N number of operations
- to allocate the added capacity - it should be clear that not every
insertion could be a worst case.

Insertion into a vector *is* an O(n) operation. It doesn't matter if
the operation is constant-time when amortized over a large amount of
insertions. O() expresses the worst possible case of one operation.

In fact the number of such worst cases is
limited to 1 per every 2^N insertions. So in this case, even though the
worst case has linear complexity, the average case still has a constant
complexity, because as the worst cases become worse - they also become less
frequent by the same degree.

Yes, back-insertion in a vector is an *amortized* constant-time
operation, but it's an O(n) operation.
Your explanation is entirely backwards. First, it seems that you must be
describing the number of steps needed to find a specified item in a
container.

No, I'm describing a tree traversal from begin() to end(). The entire
traversal takes O(n) time even though one single iterator increment
is an O(lg(n)) operation.

(Common sense would dictate that if an iterator increment is an
O(lg(n)) operation and it's done n times, the entire traversal must
be an O(n*lg(n)) operation. However, quite curiously, it's not. It's
an O(n) operation. It's actually rather easy and intuitive to prove.)

Second, linear complexity, O(N), is much worse than logarithmic
complexity, O(log(N)).

So what? I was talking about an entire tree traversal, from
begin() to end(). It's a linear operation.

So O(N) is the expected worst case to find an item in
certain unsorted containers (such as a vector). Finding an item in a set, on
the other hand, has only a logarithmic worst-case complexity O(log(N)) - a
significant improvement over a vector.

I was not talking about finding an item.
 
Hans Bos

Greg Herlihy said:
Not really. Remember that deleting an item must require no more than an
amortized constant number of operations: therefore, if erasing an item from
a set takes O(log(N)) operations (say, to rebalance the underlying RB
tree), then it must be the case that none of the preceding (or following)
items erased could have required more than a constant number of operations
to complete. In other words, in order to achieve a constant amortized
operation count, there must be a sharp limit on how often a "worst case" may
occur relative to the number of "best cases" that occur when erasing an item
from a set.

So, unlike, say, searching for an item in a set (in which every search could
require a "worst case" or O(log(N)) number of operations), erasing an item
from a set may require O(log(N)) operations only rarely (I estimate around 1
in e^N items erased could turn out to be a worst case).
.....

Yes, but the worst cases occur less and less frequently and are therefore
always compensated by an ever-increasing number of constant "best cases". So
the average complexity per deletion - when the best cases are included with
the worst cases - is still a constant.

This is not true. It depends on the order of deletion.
If the root of the tree is deleted each time, each delete operation will take
O(logN) time.

So even amortized this is O(logN) per deletion.

If you always delete a leaf node, the deletion will take O(1) time (ignoring
rebalancing).

Hans.
 
John Harrison

The delete version in Introduction to Algorithms by Thomas Cormen is
the version that takes a pointer to an element (not a key that first has
to be found).

But I can't see how you can avoid the O(lg n) time since the
subroutine 'tree-successor' has a running time equal to the height of
the tree.

Has a worst case running time equal to the height of the tree. But the
average time is O(1).

john
 
John Harrison

But delete not only uses rebalancing, it also uses tree-successor, which
has a running time equal to the height of the tree.

Tree-successor has a worst case running time equal to the height of the
tree, but its average time is O(1).

john
 
John Harrison

desktop said:
As James Kanze pointed out, the constant complexity is based only on the
contained objects and not on the extra time needed to do rebalancing, which
leads to an O(lg n) bound.

That may be true of the standard (the standard is pretty useless in that
case IMHO), but it is still the case that deleting from a red-black
tree, given the node to delete, is in the average case a constant-time
operation.

john
 
Greg Herlihy

I think you don't really understand the O() notation. The O()
notation expresses the asymptotic *upper limit* for the time required
to perform the specified operation. Or in other words, it describes
the *worst case*, i.e. the largest amount of time the operation could
possibly take. It doesn't matter if this happens rarely. O() doesn't
express the average time nor the amortized time for that operation.

You misunderstand the word "limit" as it applies to big-O notation. The word
"limit" here has the same meaning as it does in calculus. A limit is not the
maximum number of operations of a "worst case" scenario - rather a limit is
the value that the entire series of individual worst cases is converging
upon (or approaching). So a big-O complexity measure really does convey
something akin to the "average" (or expected) number of operations (in the
worst case) and really says nothing about how many operations the worst
of the worst cases could take - a number that may greatly exceed the
complexity described by the big-O notation.

Even if erasing the element can be done in constant time one
billion times for each lg(n)-time erase, erasing the element is
still an O(lg(n))-time algorithm. As I said, O() expresses the
absolute worst case scenario.

No. In order for erasing an item from a set to have O(log(N)) complexity,
the limit of the series of "worst-case" deletions that could be performed
upon a set must approach O(log(N)). Yet no matter how carefully you
might select the items to be placed in the set and no matter how carefully
you might order each deletion in order to maximize the number of operations
required - the number of operations of each deletion when divided by the
number of items deleted will not approach O(log(N)) but will instead
converge toward a constant value. In other words, deleting an item from a
set has constant and not a logarithmic complexity.

"Amortized time" is a completely different thing. It says how much
time the operation will take *on average* if you perform it a very
large number of times over a typical data set.

No. Amortized complexity is not a statistical measure nor does it describe
the "average case" complexity (and the data set may not be "typical" it must
be a worst case data set). So amortized complexity describes the limit of
the series of "worst cases" (so it's similar in concept to the average
"worst" case). Note also that an amortized complexity need not be a
constant. The amortized complexity of a binary search is O(log(N)) since the
number of operations over the entire series of "worst case" searches does
not converge upon a constant but approaches an ever-increasing value - its
"limit".
Insertion into a vector *is* an O(n) operation. It doesn't matter if
the operation is constant-time when amortized over a large amount of
insertions. O() expresses the worst possible case of one operation.

Insertion at the start of a vector (or a fixed distance from the start) does
have linear complexity - while inserting at the end (or a fixed distance
before the end) has a constant time complexity.

Yes, back-insertion in a vector is an *amortized* constant-time
operation, but it's an O(n) operation.

No, inserting at the end of a vector has constant-time complexity. It is not
an O(N) operation by any means.

Greg
 
desktop

John said:
Tree-successor has a worst case running time equal to the height of the
tree, but its average time is O(1).

john



hehe everything seems to be running in amortized constant time now :)
Do you know of any documents that prove that tree-successor runs in
amortized constant time?
 
Erik Wikström

You misunderstand the word "limit" as it applies to big-O notation. The word
"limit" here has the same meaning as it does in calculus. A limit is not the
maximum number of operations of a "worst case" scenario - rather a limit is
the value that the entire series of individual worst cases is converging
upon (or approaching). So a big-O complexity measure really does convey
something akin to the "average" (or expected) number of operations (in the
worst case) and really says nothing about how many operations the worst
of the worst cases could take - a number that may greatly exceed the
complexity described by the big-O notation.


No. In order for erasing an item from a set to have O(log(N)) complexity,
the limit of the series of "worst-case" deletions that could be performed
upon a set must approach O(log(N)). Yet no matter how carefully you
might select the items to be placed in the set and no matter how carefully
you might order each deletion in order to maximize the number of operations
required - the number of operations of each deletion when divided by the
number of items deleted will not approach O(log(N)) but will instead
converge toward a constant value. In other words, deleting an item from a
set has constant and not a logarithmic complexity.


No. Amortized complexity is not a statistical measure nor does it describe
the "average case" complexity (and the data set may not be "typical" it must
be a worst case data set). So amortized complexity describes the limit of
the series of "worst cases" (so it's similar in concept to the average
"worst" case). Note also that an amortized complexity need not be a
constant. The amortized complexity of a binary search is O(log(N)) since the
number of operations over the entire series of "worst case" searches does
not converge upon a constant but approaches an ever-increasing value - its
"limit".


Insertion at the start of a vector (or a fixed distance from the start) does
have linear complexity - while inserting at the end (or a fixed distance
before the end) has a constant time complexity.


No, inserting at the end of a vector has constant-time complexity. It is not
an O(N) operation by any means.

Are you sure? I was under the impression that insertion into a vector
(even at the end) was amortized constant time due to the need to
reallocate now and then. Insertion at the end of a list (or anywhere,
given a pointer/iterator), on the other hand, is constant since there
will never be any reallocations.
 
John Harrison

desktop said:
hehe everything seems to be running in amortized constant time now :)
Do you know of any documents that prove that tree-successor runs in
amortized constant time?

Well, exercise 13.2-4 in Cormen states (in effect): prove that you can
use tree-successor to iterate through an n-node binary search tree in
O(n) time. So each individual step must be O(1).

But I'm no mathematical expert.
 
John Harrison

John said:
Well, exercise 13.2-4 in Cormen states (in effect): prove that you can
use tree-successor to iterate through an n-node binary search tree in
O(n) time. So each individual step must be O(1).

But I'm no mathematical expert.

I think the proof would be along the lines of this.

A binary tree of N nodes has N-1 links. If tree-successor is called to
iterate through the tree from beginning to end then each link will be
traversed twice (once in each direction). Therefore 2N-2 traversals are
made. Therefore an average of (2N-2)/N traversals for each step.
Therefore O(1) for each step.

john
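That argument can also be checked by counting. The sketch below (an ordinary
unbalanced BST with parent pointers, not std::set's actual red-black tree,
and keys chosen purely for illustration) implements the textbook-style
tree-successor and counts every step along a link during a full walk from
the minimum to the end; the total comes out close to 2(N-1), not N*lg(N).

#include <cstddef>
#include <iostream>

// A node of a plain (unbalanced) binary search tree with parent links.
struct Node {
    int key;
    Node* left = nullptr;
    Node* right = nullptr;
    Node* parent = nullptr;
};

static std::size_t moves = 0;   // counts every step along a parent/child link

Node* insert(Node* root, Node* z) {          // ordinary BST insertion
    Node* y = nullptr;
    for (Node* x = root; x != nullptr; ) {
        y = x;
        x = (z->key < x->key) ? x->left : x->right;
    }
    z->parent = y;
    if (y == nullptr) return z;
    (z->key < y->key ? y->left : y->right) = z;
    return root;
}

Node* minimum(Node* x) {                     // leftmost node of a subtree
    while (x->left) { x = x->left; ++moves; }
    return x;
}

Node* successor(Node* x) {                   // textbook tree-successor
    if (x->right) { ++moves; return minimum(x->right); }
    Node* y = x->parent;
    ++moves;
    while (y && x == y->right) { x = y; y = y->parent; ++moves; }
    return y;
}

int main() {
    const int N = 1000;
    Node* root = nullptr;
    for (int i = 0; i < N; ++i)              // keys inserted in scrambled order
        root = insert(root, new Node{(i * 37) % N});

    std::size_t count = 0;
    for (Node* x = minimum(root); x != nullptr; x = successor(x)) ++count;

    std::cout << count << " elements visited, "
              << moves << " link traversals\n";
    // Expect roughly 2*(N-1) traversals in total, i.e. about 2 per step,
    // because each link is crossed once going down and once coming back up.
}

The bound is a property of the walk as a whole: any single increment can
still cost up to the height of the tree, which for a red-black tree is
O(lg n).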
 
Hans Bos

John Harrison said:
Well, exercise 13.2-4 in Cormen states (in effect): prove that you can use
tree-successor to iterate through an n-node binary search tree in O(n) time. So
each individual step must be O(1).

This is only true as long as the tree doesn't change during the iteration.

If you iterate from the root, it takes O(logN) time to find the successor.
If that successor is moved to the root before you iterate to its successor,
that step will also take O(logN) time.

So if the tree is modified while iterating, the amortized time can still be
O(logN) per iteration.

Hans.
 
Jerry Coffin

[ ... ]
If the root of the tree is deleted each time, each delete operation will take
O(logN) time.

So even amortized this is O(logN) per deletion.

One minor addition to note: since you are deleting nodes, the value of N
decreases each iteration, so when you delete the last node (for
example) N is 1.

In terms of big-O notation, this has no effect since O(Log(N/2)) is the
same as O(Log(N)) -- both are logarithmic. In the real world it's often
worth keeping in mind though...
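To put a number on the decreasing-N point (a worked bound under Hans's
assumption that every deletion removes a node whose removal costs on the
order of the logarithm of the current size): emptying the container this
way costs about

    log(N) + log(N-1) + ... + log(1) = log(N!) = Theta(N log N)

in total, so the average cost per deletion is Theta(log N). The shrinking N
only changes the constant factor, which is why the amortized bound for that
deletion order stays logarithmic rather than becoming constant.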
 
Juha Nieminen

John said:
its average time is O(1).

Nitpicking here, but I'm not sure you should be using the O()
notation for the average time. The O notation expresses the worst
case scenario and is not related to the average case in any way.

More correct would probably be "on average it's a constant-time
operation".
 
Juha Nieminen

Greg said:
You misunderstand the word "limit" as it applies to big-O notation. The word
"limit" here has the same meaning as it does in calculus. A limit is not the
maximum number of operations of a "worst case" scenario - rather a limit is
the value that the entire series of individual worst cases is converging
upon (or approaching).

I don't think so.

Ok, I used the wrong term: It's not the "upper limit", but the
"upper bound" (there might be a slight difference in meaning, maybe).

The big-O notation gives the asymptotically "smallest" function
which the operation never exceeds. In other words, at no time will
the operation/algorithm perform asymptotically worse than what this
function says.

For example quicksort is an O(n^2) algorithm, even though on average
it performs faster than that. The O notation simply expresses its
worst case running time.

Back-insertion in a std::vector is an O(n) operation, even if it's
amortized constant-time. This is because from time to time the insertion
will require n operations. The O notation thus expresses the upper
bound for all the possible back-insertion operations (in other words,
it basically says that back-insertion will never be slower than linear
time, but it says nothing about whether it can run faster than that or how
fast it runs on average).

Saying that back-insertion into std::vector is an O(1) operation
would be wrong. It would be giving the wrong information. Saying that
is equivalent to saying "back-insertion never runs slower than
constant time", which is not true.

Likewise a map iterator increment is an O(lg(n)) operation. It tells
us that one increment can take lg(n) steps, but never more. However,
this information alone doesn't tell us how long an increment takes
on average or in the best case scenario.

No. In order for erasing an item from a set to have O(log(N)) complexity,
the limit of the series of "worst-case" deletions that could be performed
upon a set must approach O(log(N)). Yet no matter how carefully you
might select the items to be placed in the set and no matter how carefully
you might order each deletion in order to maximize the number of operations
required - the number of operations of each deletion when divided by the
number of items deleted will not approach O(log(N)) but will instead
converge toward a constant value. In other words, deleting an item from a
set has constant and not a logarithmic complexity.

You are still not understanding that O() tells us the upper bound
for the operation, not its average.

No, inserting at the end of a vector has constant-time complexity. It is not
an O(N) operation by any means.

O(n) tells us that "a back-insertion can take n steps, but no more".
This is true for std::vector, so the statement is true.
 
