ordered unordered_map

R

Ralf Goertz

Hi,

I recently found out that using unordered_map<unsigned, Foo> instead of
map<unsigned, Foo> speeds up my program substantially (it contains
several hundreds of thousands elements). However, I really would like to
be able to iterate over those elements in an ordered way at the end of
the program. Is there a way to do that? If m would be that unordered map
what about m.rehash(n) where n=m.size()? According to Pete Pecker's book
that would "resize the container so that it has at least n buckets…".
And aren't the keys for integral types hashed by putting them into
residue classes mod bucket size? If so, m would be ordered, right?

Ralf
 
R

Ralf Goertz

Pete said:
No, that almost certainly wouldn't work. The problem is the hash values:
if two elements hash to the same value they end up in the same bucket,
regardless of how many buckets there are. And even if they hash to
different values, they can end up in the same bucket, because the hash
container has to adjust hash values to align with the number of buckets
(typically by taking the remainder modulo the number of buckets). You
might be able to do this with a custom hash function that returned a
value equal to the desired bucket, but that seems abusive.

So if it would be guaranteed that m contains exactly n keys in the range
[0,n) then rehashing could do the trick? Of course, in that case I could
use a vector in the first place.
If you're willing to reorganize the container, how about going a small
step further and replacing it at that point with an ordered container?

It seems I will have to do that then, thanks.
 
T

tonydee

It seems I will have to do that then, thanks.

It's a good, simple solution in most cases. If the objects are large
and keys small and your RAM limited, and/or that final step is
performance critical too, consider
- keeping the data in an ordered map, but adding an unordered (hash)
map indexing from key to ordered-map iterator
- keep the data in a vector, and having unordered and ordered maps
from key to index
- looking at boost's multi-indexed container library, which can manage
the synchronisation of data and indices for you

Cheers,
Tony
 
R

Ralf Goertz

tonydee said:
It's a good, simple solution in most cases. If the objects are large
and keys small and your RAM limited, and/or that final step is
performance critical too, consider
- keeping the data in an ordered map, but adding an unordered (hash)
map indexing from key to ordered-map iterator
- keep the data in a vector, and having unordered and ordered maps
from key to index
- looking at boost's multi-indexed container library, which can manage
the synchronisation of data and indices for you

These are worth trying, especially since I still need to find() in the
map. The reason why I want an ordered iterator is that I'd like to
compare all elements with each other and erase duplicate (type Foo)
values keeping the one with the lower (type unsigned) key (it is more
complicated than that but you get the point). So I use two nested loops:

typedef unordered_map<unsigned, Foo> Map;
typedef unordered_set<unsigned> Set;
Map m;
Map::iterator i1,i2;
Set s;
Set::iterator i3;

void find_elements_in_m_that_need_to_be_compared(unsigned to_this_element,Set&);

bool foo_equal(const Foo &,const Foo&);

for (i1=m.begin();i1!=m.end();++i1) {
find_elements_in_m_that_need_to_be_compared(i1->first,s);
for (i3=s.begin();i3!=s.end();++i3) {
i2=m.find(*i3);
if (foo_equal(i1->second,i2->second)) {
m.erase(i2); // (*)
}
}
}

Since m is not ordered, I cannot be sure that i1->first is less than
i2->first. Therefore I might erase the object with the lower key at (*).
OTOH unordered_map has an erase(iterator, iterator) member function.
This function seems to be of little use since the key 42 might be in the
range [8,15).
 
T

tonydee

It's a good, simple solution in most cases.  If the objects are large
and keys small and your RAM limited, and/or that final step is
performance critical too, consider
- keeping the data in an ordered map, but adding an unordered (hash)
map indexing from key to ordered-map iterator
- keep the data in a vector, and having unordered and ordered maps
from key to index
- looking at boost's multi-indexed container library, which can manage
the synchronisation of data and indices for you

These are worth trying, especially since I still need to find() in the
map. The reason why I want an ordered iterator is that I'd like to
compare all elements with each other and erase duplicate (type Foo)
values keeping the one with the lower (type unsigned) key (it is more
complicated than that but you get the point). So I use two nested loops:

typedef unordered_map<unsigned, Foo> Map;
typedef unordered_set<unsigned> Set;
Map m;
Map::iterator i1,i2;
Set s;
Set::iterator i3;

void find_elements_in_m_that_need_to_be_compared(unsigned to_this_element,Set&);

bool foo_equal(const Foo &,const Foo&);

for (i1=m.begin();i1!=m.end();++i1) {
        find_elements_in_m_that_need_to_be_compared(i1->first,s);
        for (i3=s.begin();i3!=s.end();++i3) {
                i2=m.find(*i3);
                if (foo_equal(i1->second,i2->second)) {
                        m.erase(i2); // (*)
                }
        }

}

Since m is not ordered, I cannot be sure that i1->first is less than
i2->first. Therefore I might erase the object with the lower key at (*).
OTOH unordered_map has an erase(iterator, iterator) member function.
This function seems to be of little use since the key 42 might be in the
range [8,15).

I'm not sure how often you need to "clean up" these duplicates, but if
Foo has (or can easily be given) a operator<, you may find it better
to have an (ordered) map (something like "std::map<Foo*, unsigned,
Compare_Via_Ptr<Foo*> >" with template <typename T> struct
Compare_Via_Ptr { bool operator()(const Foo* lhs, const Foo* rhs)
const { return *lhs < *rhs; } }; )...

That way you can do a quick lookup to see if an "incoming" Foo is
already stored, preventing the storage of duplicates.

Cheers,
Tony
 
R

Ralf Goertz

tonydee said:
I'm not sure how often you need to "clean up" these duplicates, but if
Foo has (or can easily be given) a operator<, you may find it better
to have an (ordered) map (something like "std::map<Foo*, unsigned,
Compare_Via_Ptr<Foo*> >" with template <typename T> struct
Compare_Via_Ptr { bool operator()(const Foo* lhs, const Foo* rhs)
const { return *lhs < *rhs; } }; )...

That way you can do a quick lookup to see if an "incoming" Foo is
already stored, preventing the storage of duplicates.

That is impossible. Only after thorough investigation of the whole set I
know that two given objects of type Foo are the same. And I not only
erase the duplicates but I also transfer information that is stored in
the one to be erased.

I have a solution for the problem (breaking the inner loop) but I
consider it being error prone and inelegant. That's why I hoped to find
a smarter one. Using one more level of indirection as you suggested
might be the way to go. Thanks for your input.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,764
Messages
2,569,567
Members
45,041
Latest member
RomeoFarnh

Latest Threads

Top