Delete from a std::set in amortized constant time

desktop

In the C++ standard, sec. 23.1.2, table 69, it says that erase(q), where q is
a pointer to an element, can be done in amortized constant time.

I guess that is not the worst case, since std::set is in practice a red-black
tree where insert/delete takes O(lg n) time. Or is there some other
explanation for this complexity?
 
Dave Steffen

desktop said:
In the C++ standard, sec. 23.1.2, table 69, it says that erase(q), where q
is a pointer to an element, can be done in amortized constant time.

I guess that is not the worst case, since std::set is in practice a
red-black tree where insert/delete takes O(lg n) time. Or is there some
other explanation for this complexity?

IIRC insert takes log n time, since you have to search for the right
place to insert it.

"Remove this value" also takes log n time, since you have to search
for the thing to delete.

In this case, you already know what thing to delete, since you've
got an iterator to it. No searching required, just fiddling with
some internal bookkeeping.
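
For what it's worth, a minimal sketch of the two erase flavours being
discussed (the values are arbitrary):

#include <set>

int main()
{
    std::set<int> s;
    for (int i = 0; i < 100; ++i)
        s.insert(i);                    // each insert: O(lg n), must find the spot

    s.erase(42);                        // erase by value: O(lg n), searches first

    std::set<int>::iterator it = s.find(10);   // the search itself: O(lg n)
    if (it != s.end())
        s.erase(it);                    // erase(q): no search, just unlink and rebalance

    return 0;
}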
 
Greg Herlihy

In the C++ standard, sec. 23.1.2, table 69, it says that erase(q), where q is
a pointer to an element, can be done in amortized constant time.

I guess that is not the worst case, since std::set is in practice a red-black
tree where insert/delete takes O(lg n) time. Or is there some other
explanation for this complexity?

The explanation is simple. The removal time of a node, when measured for an
RB tree, includes the time needed to find the node to be deleted in the tree.

In the case of a call to set::erase(), no time needs to be spent searching
for the node (that is, the item) to be removed, because its location is
passed to the erase method as a parameter. So by skipping the search, the
item can be removed from the set in amortized constant (not logarithmic)
time.

Greg
 
desktop

Dave said:
IIRC insert takes log n time, since you have to search for the right
place to insert it.

"Remove this value" also takes log n time, since you have to search
for the thing to delete.

In this case, you already know what thing to delete, since you've
got an iterator to it. No searching required, just fiddling with
some internal bookkeeping.

But you still need to do the subsequent rebalancing, which can take O(lg
n) time.
 
Greg Herlihy

But you still need to do the following re balancing that can take O(lg
n) time

But the amortized time for rebalancing an RB tree is O(1) - in other
words, a constant amount of time. So the C++ Standard's performance
requirements for deleting an item from a set can be met by implementing the
set with an RB tree.

Greg
 
desktop

Greg said:
But the amortized time for rebalancing an RB tree is O(1) - in other
words, a constant amount of time. So the C++ Standard's performance
requirements for deleting an item from a set can be met by implementing the
set with an RB tree.

Greg

The delete routine in Introduction to Algorithms by Thomas Cormen is the
version that takes a pointer to a node (not a key that first has to be
found).

But I can't see how you can avoid the O(lg n) time, since the subroutine
'tree-successor' has a running time proportional to the height of the
tree.
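
For reference, a rough sketch of the CLRS-style successor routine in
question, assuming a plain binary-search-tree node with a parent pointer
(not any particular library's actual node layout):

// Hypothetical BST node with a parent pointer, CLRS-style.
struct Node {
    int   key;
    Node* left;
    Node* right;
    Node* parent;
};

// TREE-SUCCESSOR: either descend to the minimum of the right subtree,
// or climb until we leave a right child. Both walks are bounded by the
// height of the tree, so a single call is O(lg n) in the worst case.
Node* tree_successor(Node* x)
{
    if (x->right) {
        Node* y = x->right;
        while (y->left)
            y = y->left;
        return y;
    }
    Node* y = x->parent;
    while (y && x == y->right) {
        x = y;
        y = y->parent;
    }
    return y;
}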
 
Juha Nieminen

Greg said:
In the case of a call to set::erase(), no time needs to be spent searching
for the node (that is, the item) to be removed, because its location is
passed to the erase method as a parameter. So by skipping the search, the
item can be removed from the set in amortized constant (not logarithmic)
time.

And the algorithm somehow manages to rebalance the tree in amortized
constant time?
 
James Kanze

In the C++ standard, sec. 23.1.2, table 69, it says that erase(q),
where q is a pointer to an element, can be done in amortized
constant time.

No it doesn't. It says that the complexity is "amortized
constant". And in §21.1/2, it says clearly that "All of the
complexity requirements in this clause are stated solely in
terms of the number of operations on the contained objects."
I would expect that erase(q) requires one call to the destructor
of the object, and that is it. Which definitely makes it O(1);
I don't even know why there is an "amortized" in there.

The standard never makes any requirements with regards to time.
In a very real way, it can't; there are too many variables
involved that are beyond the control of the implementation.
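
To make that concrete, a small sketch that counts the operations on the
contained objects during an erase(q). The Counted type is made up for
illustration, and the expected result (zero comparisons, one destruction)
is what a typical node-based implementation gives, not something the
standard spells out:

#include <iostream>
#include <set>

// Hypothetical instrumented element type: counts comparisons and destructions.
struct Counted {
    int value;
    static int comparisons;
    static int destructions;

    explicit Counted(int v) : value(v) {}
    ~Counted() { ++destructions; }

    bool operator<(const Counted& other) const {
        ++comparisons;
        return value < other.value;
    }
};

int Counted::comparisons  = 0;
int Counted::destructions = 0;

int main()
{
    std::set<Counted> s;
    for (int i = 0; i < 64; ++i)
        s.insert(Counted(i));

    std::set<Counted>::iterator it = s.find(Counted(32));

    Counted::comparisons  = 0;   // reset after the search
    Counted::destructions = 0;

    s.erase(it);                 // erase via iterator: no search needed

    // With a typical node-based implementation this prints 0 comparisons
    // and 1 destruction -- the only "operations on the contained objects".
    std::cout << "comparisons:  " << Counted::comparisons  << '\n'
              << "destructions: " << Counted::destructions << '\n';
    return 0;
}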
 
desktop

James said:
No it doesn't. It says that the complexity is "amortized
constant". And in §21.1/2, it says clearly that "All of the
complexity requirements in this clause are stated solely in
terms of the number of operations on the contained objects."
I would expect that erase(q) requires one call to the destructor
of the object, and that is it. Which definitely makes it O(1);
I don't even know why there is an "amortized" in there.

OK, so when erase(p) is said to be amortized constant, they exclude the
time used to rebalance the tree, which would result in logarithmic time.
 
V.R. Marinov

desktop said:
In the C++ standard, sec. 23.1.2, table 69, it says that erase(q), where q is
a pointer to an element, can be done in amortized constant time.

I guess that is not the worst case, since std::set is in practice a red-black
tree where insert/delete takes O(lg n) time. Or is there some other
explanation for this complexity?

You don't read carefully enough.
According to the standard the function erase(T value) has log(N)
complexity where N is the number of elements. This function requires
searching.

And the function erase(iterator it) has constant complexity because
it doesn't require searching.
 
Zeppe

V.R. Marinov said:
You don't read carefully enough.
According to the standard the function erase(T value) has log(N)
complexity where N is the number of elements. This function requires
searching.

And the function erase(iterator it) has constant complexity because
it doesn't require searching.

You don't read carefully enough. The OP argues that the rebalancing of
the tree takes O(log n) time, so, even without the search, the complexity
would be O(log n) just for the rebalancing.

Regards,

Zeppe
 
V.R. Marinov

Zeppe said:
You don't read carefully enough. The OP argues that the rebalancing of
the tree takes O(log n) time, so, even without the search, the complexity
would be O(log n) just for the rebalancing.

I guess I was trying to say that rebalancing doesn't have O(lg(N))
complexity.
 
desktop

V.R. Marinov said:
You don't read carefully enough.
According to the standard the function erase(T value) has log(N)
complexity where N is the number of elements. This function requires
searching.

And the function erase(iterator it) has constant complexity because
it doesn't require searching.

As James Kanze pointed out, the constant complexity counts only
operations on the contained objects, not the extra work needed to do the
rebalancing, which would lead to an O(lg n) bound.
 
Zeppe

desktop said:
OK, so when erase(p) is said to be amortized constant, they exclude the
time used to rebalance the tree, which would result in logarithmic time.

Not at all.

1) The sentence in 23.1/2 refers to the fact that the complexity is
independent of the type of object stored in the container. The
complexity is calculated in terms of operations on the objects, exactly
as it is normally calculated in computer science books.

2) I don't have any book on data structures at hand, but Wikipedia says
that the rebalancing takes O(log N) or amortized O(1) time. So it is
reasonable to say that if you do not need to search but just to
rebalance, the required time is amortized O(1). Amortized O(1) is not
the same as O(1): the rebalancing IS O(log N) in the worst case, but
there is a guarantee that the worst case can't happen too often, so on
average it's O(1). More precisely, it means that if you have to erase a
whole sequence of elements, this will be done in O(n) (not O(n log n))
time; see the sketch after this list. The standard apparently gives
only this "weak" requirement.

3) Having said that, I would suggest you read something carefully about
these data structures if you are interested in them, because it seems
like you are trying to understand fairly complex and theoretical
material after just a quick look through some book. The questions you
are asking are quite difficult, and hardly anybody here is deeply
versed in the theory of RB trees (which are not strictly related to C++
either).
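
A minimal sketch of that sequence-erase guarantee (the element count is
arbitrary; the guarantee is about the total operation count, not
wall-clock time):

#include <set>

int main()
{
    std::set<int> s;
    for (int i = 0; i < 1000; ++i)
        s.insert(i);

    // Erase the whole container one element at a time through an iterator.
    // An individual erase may occasionally trigger O(log n) rebalancing
    // work, but the "amortized constant" guarantee means the loop as a
    // whole is O(n), not O(n log n).
    std::set<int>::iterator it = s.begin();
    while (it != s.end()) {
        std::set<int>::iterator victim = it++;   // advance before erasing
        s.erase(victim);
    }
    return 0;
}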

Regards,

Zeppe
 
desktop

Zeppe said:
Not at all.

1) The sentence in 23.1/2 refers to the fact that the complexity is
independent of the type of object stored in the container. The
complexity is calculated in terms of operations on the objects, exactly
as it is normally calculated in computer science books.

2) I don't have any book on data structures at hand, but Wikipedia says
that the rebalancing takes O(log N) or amortized O(1) time. So it is
reasonable to say that if you do not need to search but just to
rebalance, the required time is amortized O(1). Amortized O(1) is not
the same as O(1): the rebalancing IS O(log N) in the worst case, but
there is a guarantee that the worst case can't happen too often, so on
average it's O(1). More precisely, it means that if you have to erase a
whole sequence of elements, this will be done in O(n) (not O(n log n))
time. The standard apparently gives only this "weak" requirement.

But delete not only does rebalancing, it also uses tree-successor, which
has a running time proportional to the height of the tree. This
subroutine is used in the version of delete that takes a pointer to the
element that is going to be deleted.


3) Having said that, I would suggest you read something carefully about
these data structures if you are interested in them, because it seems
like you are trying to understand fairly complex and theoretical
material after just a quick look through some book. The questions you
are asking are quite difficult, and hardly anybody here is deeply
versed in the theory of RB trees (which are not strictly related to C++
either).

The literature I have read doesn't cover these special cases or uses of
the red-black tree, and the basics it does cover are not very
explanatory (I don't think you can find a book covering the requirements
on the operations in the C++ standard in this kind of detail). But I
agree that this is getting off-topic, and I will make sure to post
theoretical questions in a more appropriate group.
 
James Kanze

OK, so when erase(p) is said to be amortized constant, they
exclude the time used to rebalance the tree, which would
result in logarithmic time.

Essentially, yes. They also exclude the time to deallocate the
node; the time necessary for deallocation could depend on the
number of individual blocks allocated. They also exclude any
time necessary for paging, due to poor locality or whatever,
which might also depend (indirectly) on the size of the set. I
imagine that most people would agree that these last two are
reasonable to exclude; there's no way to really say much about
them anyway. But that means that the complexity guarantees
can't be measured in time. And measuring them in operations on
the contained objects results in the work needed to rebalance
the tree being excluded as well.

Whether this is significant depends on the object type. If
comparison or destruction are complicated operations, the few
pointer operations necessary for rebalancing won't be
significant, even if the tree is 20 deep, and you hit the worst
case. If the set only contains ints, and you use a blocked,
fixed-size allocator which can deallocate with only one or two
pointer manipulations, the rebalancing is a major part of
the job.

So I'm not saying that it's good or bad, the way it's specified.
I'm just saying that that's how it's specified.
 
Greg Herlihy

And the algorithm somehow manages to rebalance the tree in amortized
constant time?

Yes.

And if you would like to know exactly how an RB tree is rebalanced, you can
look up the relevant entry in Wikipedia as a starting point.
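
For the curious, a rough sketch of the basic restructuring primitive
involved - a left rotation, in the CLRS style, on an assumed node layout
with parent pointers:

// Hypothetical node layout (same style as the earlier successor sketch).
struct Node {
    int   key;
    Node* left;
    Node* right;
    Node* parent;
};

// Left rotation around x: the O(1) restructuring step that RB-tree
// rebalancing is built from. Deletion performs at most a small, constant
// number of rotations; only the recolouring occasionally walks far up the
// tree, which is where the amortized-constant bound comes from.
void left_rotate(Node*& root, Node* x)
{
    Node* y = x->right;          // y moves up, x becomes y's left child
    x->right = y->left;
    if (y->left)
        y->left->parent = x;
    y->parent = x->parent;
    if (!x->parent)
        root = y;
    else if (x == x->parent->left)
        x->parent->left = y;
    else
        x->parent->right = y;
    y->left = x;
    x->parent = y;
}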

Greg
 
Juha Nieminen

V.R. Marinov said:
I guess I was trying to say that rebalancing doesn't have O(lg(N))
complexity.

Note that O() and "amortized time" are different things. O() denotes
the *worst-case* scenario of one single operation. Even if rebalancing
the tree after a removal could be done in *amortized* constant time,
it could still be O(lg(N)).

Granted, I'm not acquainted with the exact algorithm used for
rebalancing the tree, but I'm pretty sure it takes lg(N) time in
the worst case.

It's the same thing as with the tree traversal: Even though a whole
traversal can be done in linear time (iow. the whole traversal takes
O(N) steps), one single iterator increment is O(lg(N)) (because there
are cases where it takes that many operations).
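
A minimal sketch of that traversal point (the set size is arbitrary):

#include <set>

int main()
{
    std::set<int> s;
    for (int i = 0; i < 1024; ++i)
        s.insert(i);

    // A single ++it may have to climb from a leaf back up toward the root,
    // which is O(lg n) in the worst case. Over a complete in-order
    // traversal, however, every tree edge is crossed at most twice, so the
    // whole loop performs O(n) work in total -- amortized O(1) per step.
    long sum = 0;
    for (std::set<int>::iterator it = s.begin(); it != s.end(); ++it)
        sum += *it;

    return sum == 1023L * 1024 / 2 ? 0 : 1;  // 0 + 1 + ... + 1023
}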
 
desktop

Juha said:
Note that O() and "amortized time" are different things. O() denotes
the *worst-case* scenario of one single operation. Even if rebalancing
the tree after a removal could be done in *amortized* constant time,
it could still be O(lg(N)).
Granted, I'm not acquainted with the exact algorithm used for
rebalancing the tree, but I'm pretty sure it takes lg(N) time in
the worst case.

It's the same thing as with the tree traversal: Even though a whole
traversal can be done in linear time (iow. the whole traversal takes
O(N) steps), one single iterator increment is O(lg(N)) (because there
are cases where it takes that many operations).

The tree-successor subroutine in delete (before the call to rebalance)
also takes O(lg n) time, so this must dominate delete even if the
rebalancing can be done in amortized constant time.
 
Greg Herlihy

Note that O() and "amortized time" are different things. O() denotes
the *worst-case* scenario of one single operation. Even if rebalancing
the tree after a removal could be done in *amortized* constant time,
it could still be O(lg(N)).

Not really. Remember that deleting an item must require no more than an
amortized constant number of operations: therefore, if erasing one item from
a set takes O(log(N)) operations (say, to rebalance the underlying RB
tree), then the preceding (and following) erasures must, on balance, have
required correspondingly fewer operations. In other words, in order to
achieve a constant amortized operation count, there must be a sharp limit on
how often a "worst case" may occur relative to the number of "best cases"
that occur when erasing items from a set.

So, unlike, say, searching for an item in a set (in which every search could
require a "worst case", or O(log(N)), number of operations), erasing an item
from a set may require O(log(N)) operations only rarely (the expensive cases
become rarer in proportion to how expensive they are).

A clearer example would be a std::vector that doubles its capacity whenever
its size is about to exceed its current capacity. Now, even though inserting
an item into this vector could require a worst-case N number of operations
- to copy the existing elements into the newly allocated storage - it should
be clear that not every insertion could be a worst case. In fact, a
reallocation that copies N elements happens only after roughly N cheap
insertions, since the capacity doubles each time. So in this case, even
though the worst case has linear complexity, the average case still has a
constant complexity, because as the worst cases become worse - they also
become less frequent by the same degree.
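
A small sketch of that vector behaviour; note that the capacities printed
are implementation-dependent, and the growth factor need not be exactly 2:

#include <iostream>
#include <vector>

int main()
{
    std::vector<int> v;
    std::size_t last_capacity = v.capacity();

    // Most push_backs are O(1); only the ones that outgrow the current
    // capacity trigger a reallocation that copies every existing element.
    // Those expensive steps become geometrically rarer, so the average
    // cost per push_back stays constant.
    for (int i = 0; i < 1000; ++i) {
        v.push_back(i);
        if (v.capacity() != last_capacity) {
            std::cout << "reallocated at size " << v.size()
                      << ", new capacity " << v.capacity() << '\n';
            last_capacity = v.capacity();
        }
    }
    return 0;
}
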
Granted, I'm not acquainted with the exact algorithm used for
rebalancing the tree, but I'm pretty sure it takes lg(N) time in
the worst case.

Yes, but the worst cases occur less and less frequently, and are therefore
always compensated by an ever-increasing number of constant "best cases". So
the average complexity per deletion - when the best cases are included with
the worst cases - is still a constant.

It's the same thing as with the tree traversal: Even though a whole
traversal can be done in linear time (iow. the whole traversal takes
O(N) steps), one single iterator increment is O(lg(N)) (because there
are cases where it takes that many operations).

Your explanation is entirely backwards. First, it seems that you must be
describing the number of steps needed to find a specified item in a
container. Second, linear complexity, O(N), is much worse than logarithmic
complexity, O(log(N)). So O(N) is the expected worst case for finding an
item in certain unsorted containers (such as a vector). Finding an item in
a set, on the other hand, has only a logarithmic worst-case complexity,
O(log(N)) - a significant improvement over a vector.

In any event, searching in a set is not comparable to erasing or adding an
item (at a known location), since when searching there is no limit on the
frequency of "worst case" searches versus "best case" searches - in fact
every search could be a worst case - so its complexity has to be reported
with the most pessimistic assumption, just in case.

Greg
 
