set, insert with hint

M

Mark P

Some time ago I posted here about inserting into a set with a hint:

http://groups-beta.google.com/group...+hint"+"Mark+P"&rnum=1&hl=en#018b8d0eadb38dbf

I quoted the SGI STL docs describing a.insert(p, t), where p is the hint
iterator and t is the inserted object:

"Insert with hint is logarithmic in general, but it is amortized
constant time if t is inserted immediately before p."

Several knowledgeable people asserted that this is contrary to the
standard which says that t should be inserted immediately after p.

I wrote a little program to test this out (see below). I then obtained
the following results, which suggest that the SGI docs are in fact
correct and that the insert should occur immediately before the hint.

Using gcc 3.2.2 with -O3:

start - no hint
finish: 14 seconds
start - insert after hint
finish: 14 seconds
start - insert before hint
finish: 3 seconds

Using Sun CC 7.1 ith -O5:

start - no hint
finish: 54 seconds
start - insert after hint
finish: 43 seconds
start - insert before hint
finish: 17 seconds

Was I just confused before and the standard does say that the hint
should point after the location of the insert? How else can these
results be explained?

Thanks,
Mark


TEST CODE BELOW:

#include <set>
#include <iostream>
#include <ctime>

int main() {

const int MaxVal = 30000000;
time_t startTime, endTime;
std::set<int> is;
std::set<int>::iterator isIter;

// no hint

std::cout << "start - no hint" << std::endl;
time(&startTime);

for (int i = 0; i < MaxVal; ++i)
is.insert(i);

time(&endTime);
std::cout << "finish: " << difftime(endTime,startTime) << " seconds"
<< std::endl;

// insert after hint

is.clear();
std::cout << "start - insert after hint" << std::endl;
time(&startTime);

isIter = (is.insert(0)).first;
for (int i = 1; i < MaxVal; ++i)
isIter = is.insert(isIter,i);

time(&endTime);
std::cout << "finish: " << difftime(endTime,startTime) << " seconds"
<< std::endl;

// insert before hint

is.clear();
std::cout << "start - insert before hint" << std::endl;
time(&startTime);

isIter = (is.insert(MaxVal)).first;
for (int i = MaxVal-1; i >= 0; --i)
isIter = is.insert(isIter,i);

time(&endTime);
std::cout << "finish: " << difftime(endTime,startTime) << " seconds"
<< std::endl;

}
 
P

Pete Becker

Mark said:
I quoted the SGI STL docs describing a.insert(p, t), where p is the hint
iterator and t is the inserted object:

"Insert with hint is logarithmic in general, but it is amortized
constant time if t is inserted immediately before p."

Several knowledgeable people asserted that this is contrary to the
standard which says that t should be inserted immediately after p.

I wrote a little program to test this out (see below). I then obtained
the following results, which suggest that the SGI docs are in fact
correct and that the insert should occur immediately before the hint.

You would do better to consult the language definition. The 2003
standard gives the complexity of insert with hint as: "logarithmic in
general, but amortized cosntant if t is inserted right after p."
How else can these results be explained?

Confusion.
 
M

Mark P

Pete said:
You would do better to consult the language definition. The 2003
standard gives the complexity of insert with hint as: "logarithmic in
general, but amortized cosntant if t is inserted right after p."



Confusion.

I'll concede that I may be confused, but why does my test code run
fastest when t is inserted right before p?
 
P

Pete Becker

Mark said:
I'll concede that I may be confused, but why does my test code run
fastest when t is inserted right before p?

I meant confusion on the part of implementors. It could be that the
libraries you're using don't conform to the standard.
 
M

Mark P

Pete said:
I meant confusion on the part of implementors. It could be that the
libraries you're using don't conform to the standard.

Ah, I see. I find this surprising (which is not to say doubtful)
because my Sun compiler uses (I think) a RogueWave STL implementation
and my gcc uses an SGI STL implementation. Then again, they may derive
from common ancestry.

If anyone has a chance to try my test code on other compilers, I'd be
interested to see the results.
 
L

Larry I Smith

Mark said:
Some time ago I posted here about inserting into a set with a hint:

http://groups-beta.google.com/group...+hint"+"Mark+P"&rnum=1&hl=en#018b8d0eadb38dbf


I quoted the SGI STL docs describing a.insert(p, t), where p is the hint
iterator and t is the inserted object:

"Insert with hint is logarithmic in general, but it is amortized
constant time if t is inserted immediately before p."

Several knowledgeable people asserted that this is contrary to the
standard which says that t should be inserted immediately after p.

I wrote a little program to test this out (see below). I then obtained
the following results, which suggest that the SGI docs are in fact
correct and that the insert should occur immediately before the hint.

Using gcc 3.2.2 with -O3:

start - no hint
finish: 14 seconds
start - insert after hint
finish: 14 seconds
start - insert before hint
finish: 3 seconds

Using Sun CC 7.1 ith -O5:

start - no hint
finish: 54 seconds
start - insert after hint
finish: 43 seconds
start - insert before hint
finish: 17 seconds

Was I just confused before and the standard does say that the hint
should point after the location of the insert? How else can these
results be explained?

Thanks,
Mark


TEST CODE BELOW:

#include <set>
#include <iostream>
#include <ctime>

int main() {

const int MaxVal = 30000000;
time_t startTime, endTime;
std::set<int> is;
std::set<int>::iterator isIter;

// no hint

std::cout << "start - no hint" << std::endl;
time(&startTime);

for (int i = 0; i < MaxVal; ++i)
is.insert(i);

time(&endTime);
std::cout << "finish: " << difftime(endTime,startTime) << " seconds"
<< std::endl;

// insert after hint

is.clear();
std::cout << "start - insert after hint" << std::endl;
time(&startTime);

isIter = (is.insert(0)).first;
for (int i = 1; i < MaxVal; ++i)
isIter = is.insert(isIter,i);

time(&endTime);
std::cout << "finish: " << difftime(endTime,startTime) << " seconds"
<< std::endl;

// insert before hint

is.clear();
std::cout << "start - insert before hint" << std::endl;
time(&startTime);

isIter = (is.insert(MaxVal)).first;
for (int i = MaxVal-1; i >= 0; --i)
isIter = is.insert(isIter,i);

time(&endTime);
std::cout << "finish: " << difftime(endTime,startTime) << " seconds"
<< std::endl;

}

Many 'set' implementations are backed by a red-black tree.
The performance of these trees degrades quickly when inserting
pre-sorted data (as you are doing here) due to all of the
tree re-balancing that must occur.

Also, many implementations have special handling for the
'hint' boundary conditions ('end()' and 'begin()').
In your last example (where you inserted data in
high-to-low order) the iterator returned by the 'insert()'
call is actually 'begin()'. Since 'insert()' has special
handling for this 'hint' the performance of the NEXT 'insert()'
call is high.

If you pass an iterator that is neither 'begin()' nor 'end()',
when 'insert()' increments that iterator and it does not
point to a valid item, insert() starts its search with
'begin()' (i.e. your 'hint' iterator is ignored).

If you are inserting data in low-to-high order this
approach will speed things up greatly:

isIter = (is.insert(0)).first;
for (int i = 1; i < MaxVal; ++i)
is.insert(is.end(), i);

Regards,
Larry
 
M

Mark P

Larry said:
Many 'set' implementations are backed by a red-black tree.
The performance of these trees degrades quickly when inserting
pre-sorted data (as you are doing here) due to all of the
tree re-balancing that must occur.

An excellent point which I overlooked (based on the frequent use of "Rb"
in the internal names of the STL souce, I strongly suspect it is a
red-black tree underlying the set)
Also, many implementations have special handling for the
'hint' boundary conditions ('end()' and 'begin()').
In your last example (where you inserted data in
high-to-low order) the iterator returned by the 'insert()'
call is actually 'begin()'. Since 'insert()' has special
handling for this 'hint' the performance of the NEXT 'insert()'
call is high.

If you pass an iterator that is neither 'begin()' nor 'end()',
when 'insert()' increments that iterator and it does not
point to a valid item, insert() starts its search with
'begin()' (i.e. your 'hint' iterator is ignored).

If you are inserting data in low-to-high order this
approach will speed things up greatly:

isIter = (is.insert(0)).first;
for (int i = 1; i < MaxVal; ++i)
is.insert(is.end(), i);

Thanks for the good advice. I'll give this a try.

-Mark
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,769
Messages
2,569,578
Members
45,052
Latest member
LucyCarper

Latest Threads

Top