std::set constructor taking a sorted sequence

D

desktop

In the C++ standard page 472 it says that you can construct a std::set
in linear time if the constructor gets a sorted sequence of elements.
But how is this possible when insert takes logarithmic time? Should the
time not be nlgn where n is the number of elements?
 
R

Rosarin Roy

The answer is in the question. If the sequence is ordered, there is no
need to repeat the research each time (logn), so the set can be
constructed in a linear time (you already know where to place the
element, that would be the action requiring logn time).

Regards,

Zeppe

Most of the standard libraries make use of red-black trees for
implementing set and map. In such case the insertion cannot be linear
in time, immaterial of input is sorted or not sorted. And the claim
that "you already know where to place the element" is not true across
calls, as the search always begins at root.

I wrote a test program (which I tested on Solaris running Sun C++ 5.8
compiler) to confirm that sorted elements' insertion doesn't take
linear time.

I am still curious to know if it is possible to construct a set in
linear time.

Rosarin Roy
 
Z

Zeppe

desktop said:
In the C++ standard page 472 it says that you can construct a std::set
in linear time if the constructor gets a sorted sequence of elements.
But how is this possible when insert takes logarithmic time? Should the
time not be nlgn where n is the number of elements?

The answer is in the question. If the sequence is ordered, there is no
need to repeat the research each time (logn), so the set can be
constructed in a linear time (you already know where to place the
element, that would be the action requiring logn time).

Regards,

Zeppe
 
J

Jerry Coffin

In the C++ standard page 472

It's much better to cite section numbers or (better still) section names
-- page numbers change from one version of the standard to the next, but
the section names remain much closer to constant.
it says that you can construct a std::set
in linear time if the constructor gets a sorted sequence of elements.
But how is this possible when insert takes logarithmic time? Should the
time not be nlgn where n is the number of elements?

Doing it with random access iterators is (mostly) pretty simple. You
basically ignore the fact that you're working with a red-black tree (or
AVL tree) and instead just construct a perfectly balanced tree -- take
the median item in the input sequence, and make that your root. This
leaves two halves of the input, which you recursively insert as the left
and right sub-trees of the current node.

This means you never re-balance the tree during creation (because you're
creating it as close to perfectly balanced as possible given the number
of nodes), and each node you insert is always a direct descendent of the
current node, so you never have to do a Log(N) traversal to find where
to insert a node.

With Input Iterators, this gets a bit ugly -- doing this in linear time
requires figuring the distance (and median item) in constant time. You
could temporarily copy from the input sequence to a vector or deque, but
that would rarely be worthwhile. In theory I believe this should be a
win for a sufficiently large collection, but by the time it gets large
enough, the space for the temporary copy would probably be prohibitive.
 
V

V.R. Marinov

Doing it with random access iterators is (mostly) pretty simple. You
basically ignore the fact that you're working with a red-black tree (or
AVL tree) and instead just construct a perfectly balanced tree -- take
the median item in the input sequence, and make that your root. This
leaves two halves of the input, which you recursively insert as the left
and right sub-trees of the current node.


This is a very nice idea but unfortunately it imposes (small)
unnecessary
overhead when inserting unsorted ranges.

With Input Iterators, this gets a bit ugly -- doing this in linear time
requires figuring the distance (and median item) in constant time. You
could temporarily copy from the input sequence to a vector or deque, but
that would rarely be worthwhile. In theory I believe this should be a
win for a sufficiently large collection, but by the time it gets large
enough, the space for the temporary copy would probably be prohibitive.

This is a bit overcomplicated and I guess that the set implementation
is
able to avoid it by making very good use of the "hint version" of
insert().
According to the standard the complexity of this function is:

"logarithmic in general, but amortized constant if t is inserted right
after p."
 
J

Jerry Coffin

On Jun 11, 5:20 am, Jerry Coffin <[email protected]> wrote:

[ ... ]
This is a bit overcomplicated and I guess that the set implementation
is able to avoid it by making very good use of the "hint version" of
insert(). According to the standard the complexity of this function is:

"logarithmic in general, but amortized constant if t is inserted right
after p."

This has one minor problem: while it gives amortized linear complexity,
that doesn't (strictly speaking) meet the requirement for strictly
linear complexity.
 
Z

Zeppe

Rosarin said:
[cut]

Most of the standard libraries make use of red-black trees for
implementing set and map. In such case the insertion cannot be linear
in time, immaterial of input is sorted or not sorted. And the claim
that "you already know where to place the element" is not true across
calls, as the search always begins at root.

as you can see from the quoted message, the OP says that "you can
construct a std::set in linear time if the constructor gets a sorted
sequence of elements." Of course if you make different calls it's not
true any more. The reference constructor is that one accepting iterator
first, iterator last. In that case, the comparison is done on the fly
while inserting. I don't really know the red-black tree behaviour, but I
can expect (correct me if I'm wrong) that, being a balanced binary tree,
the insertion will imply a tree rebalancing which is O(1) (that is, it
does not depend on the number of nodes). So, if I already know where to
put the next value, I just have to do the balancing, that is O(1), and
the total complexity is O(n).
I wrote a test program (which I tested on Solaris running Sun C++ 5.8
compiler) to confirm that sorted elements' insertion doesn't take
linear time.

I am still curious to know if it is possible to construct a set in
linear time.

You have to build the set with the proper constructor. You will see a
nice linear increment in the performance. Use this test:

#include <set>
#include <vector>
#include <boost/date_time/posix_time/posix_time.hpp>

int main()
{
std::vector<long> v(100000);
for(std::size_t i = 0; i < 100000; ++i){
v = i;
}

for(std::size_t i = 0; i < 100; ++i){
std::vector<long>::const_iterator begin = v.begin();
std::vector<long>::const_iterator end = v.begin() + 1000 * i;

boost::posix_time::ptime startTime =
boost::posix_time::microsec_clock::local_time();
std::set<long> s(begin, end);
boost::posix_time::ptime endTime =
boost::posix_time::microsec_clock::local_time();

std::cout << endTime - startTime << std::endl;
}
return 0;
}

and plot the results.

Regards,

Zeppe
 
D

desktop

Jerry said:
On Jun 11, 5:20 am, Jerry Coffin <[email protected]> wrote:

[ ... ]
This is a bit overcomplicated and I guess that the set implementation
is able to avoid it by making very good use of the "hint version" of
insert(). According to the standard the complexity of this function is:

"logarithmic in general, but amortized constant if t is inserted right
after p."

This has one minor problem: while it gives amortized linear complexity,
that doesn't (strictly speaking) meet the requirement for strictly
linear complexity.

Is it not enough to argument that the re balancing only takes constant
time when the sequence are sorted?

Re balancing can potentially take O(lg n) time. But when the sequence is
sorted the while loop in re balancing will be executed at most one time
so you get O(n) * O(1) which is O(n).
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,769
Messages
2,569,581
Members
45,055
Latest member
SlimSparkKetoACVReview

Latest Threads

Top