std::set versus tr1::unordered_set

J

Joe Gottman

What are the advantages and disadvantages of the tree-based std::set versus
the hash-based tr1::unordered_set?

set advantages:
1) iterators remain valid after insert and erase (except for iterators
to the erased element).
2) Sets can be compared using operator== or operator<. The STL set
functions, (set_union, set_intersection, etc), easily work on sets.
3) Can do searches based on ordering (for instance, given a set of
strings, it is easy to find all the elements that start with 'e').

set disadvantages:
1) Searches, insertions, and deletions are O(log n).
2) Depends on operator< or another ordering predicate.


unordered_set advantages:
1) Searches, insertions, and deletions are amortized O(1).
2) Iterators remain valid after erase (except for iterators to the erased
element).


unordered_set disadvantages:
1) Depends on a GOOD hash function, as well as operator== or another
equivalence predicate.
2) Iterators are invalidated by insertions (this can be avoided by
calling rehash() before loading the table.)
3) It is more difficult to compare unordered_sets or to do union,
intersection, etc.

One thing I'm not sure about is the space issue. Sets require three
pointers and at least a bool for balance information for every element.
Unordered_sets require at least O(bucket_count) space for the buckets, but I
don't know what the space overhead is for the buckets or how much extra
space is required for each element of a bucket.

Joe Gottman
 
H

Howard Hinnant

Joe Gottman said:
What are the advantages and disadvantages of the tree-based std::set versus
the hash-based tr1::unordered_set?
2) Sets can be compared using operator== or operator<. The STL set
functions, (set_union, set_intersection, etc), easily work on sets. --
3) It is more difficult to compare unordered_sets or to do union,
intersection, etc.

I unsuccessfully tried to get an operator== into tr1::unordered_*.
Assuming a good hash function such an operation can have the intended
semantics and O(N) complexity.

// simplified pseudo code!
bool operator==(const C& x, const C& y)
{
if (x.size() != y.size()) // assumes O(1) size()
return false;
for each element in x, look it up in y // if look up is O(1)
if it doesn't exist in y return false // then loop is O(N)
return true;
}

The final decision was to let implementors provide it as an extension,
but not to require it.
One thing I'm not sure about is the space issue. Sets require three
pointers and at least a bool for balance information for every element.

Alignment requirements often bump that up to 4 words. The Metrowerks
implementation stuffs the bool into a low order bit of one of the
pointers for platforms where alignment > 1. That keeps the overhead
down to 3 words/element.
Unordered_sets require at least O(bucket_count) space for the buckets, but I
don't know what the space overhead is for the buckets or how much extra
space is required for each element of a bucket.

This is going to vary with implementation. The Metrowerks hash ... um
unordered containers are based on the classic structure. Assuming a
good hash and loading of 1, you've got a one bucket per element.
There's a word for the bucket and a an additional word for the element
in a bucket in case of a collision (2 words overhead / element). This
design restricts the interface to forward iterators. If you go with
bidirectional iterators, you up the space requirements to 3 words
overhead / element.

As load factor rises above 1, elements start to share the 1 word/bucket
overhead and thus the overhead/element drops. However this is rarely a
good deal for the client (sacrificing performance). And as the load
factor drops below 1, you start to collect empty buckets which still
have an overhead and thus your per element overhead rises (much like a
vector with a small size and large capacity). Again, this is usually
not a good deal for the client.

-Howard
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,769
Messages
2,569,580
Members
45,055
Latest member
SlimSparkKetoACVReview

Latest Threads

Top