ordered unordered_map

Ralf Goertz · Feb 2, 2010

Hi,

I recently found out that using unordered_map<unsigned, Foo> instead of
map<unsigned, Foo> speeds up my program substantially (it contains
several hundreds of thousands elements). However, I really would like to
be able to iterate over those elements in an ordered way at the end of
the program. Is there a way to do that? If m would be that unordered map
what about m.rehash(n) where n=m.size()? According to Pete Pecker's book
that would "resize the container so that it has at least n bucketsâ€¦".
And aren't the keys for integral types hashed by putting them into
residue classes mod bucket size? If so, m would be ordered, right?

Ralf

Ralf Goertz · Feb 2, 2010

Pete said:
No, that almost certainly wouldn't work. The problem is the hash values:
if two elements hash to the same value they end up in the same bucket,
regardless of how many buckets there are. And even if they hash to
different values, they can end up in the same bucket, because the hash
container has to adjust hash values to align with the number of buckets
(typically by taking the remainder modulo the number of buckets). You
might be able to do this with a custom hash function that returned a
value equal to the desired bucket, but that seems abusive.

So if it would be guaranteed that m contains exactly n keys in the range
[0,n) then rehashing could do the trick? Of course, in that case I could
use a vector in the first place.

If you're willing to reorganize the container, how about going a small
step further and replacing it at that point with an ordered container?

It seems I will have to do that then, thanks.

tonydee · Feb 3, 2010

It seems I will have to do that then, thanks.

It's a good, simple solution in most cases. If the objects are large
and keys small and your RAM limited, and/or that final step is
performance critical too, consider
- keeping the data in an ordered map, but adding an unordered (hash)
map indexing from key to ordered-map iterator
- keep the data in a vector, and having unordered and ordered maps
from key to index
- looking at boost's multi-indexed container library, which can manage
the synchronisation of data and indices for you

Cheers,
Tony

Ralf Goertz · Feb 3, 2010

tonydee said:
It's a good, simple solution in most cases. If the objects are large
and keys small and your RAM limited, and/or that final step is
performance critical too, consider
- keeping the data in an ordered map, but adding an unordered (hash)
map indexing from key to ordered-map iterator
- keep the data in a vector, and having unordered and ordered maps
from key to index
- looking at boost's multi-indexed container library, which can manage
the synchronisation of data and indices for you

These are worth trying, especially since I still need to find() in the
map. The reason why I want an ordered iterator is that I'd like to
compare all elements with each other and erase duplicate (type Foo)
values keeping the one with the lower (type unsigned) key (it is more
complicated than that but you get the point). So I use two nested loops:

typedef unordered_map<unsigned, Foo> Map;
typedef unordered_set<unsigned> Set;
Map m;
Map::iterator i1,i2;
Set s;
Set::iterator i3;

void find_elements_in_m_that_need_to_be_compared(unsigned to_this_element,Set&);

bool foo_equal(const Foo &,const Foo&);

for (i1=m.begin();i1!=m.end();++i1) {
find_elements_in_m_that_need_to_be_compared(i1->first,s);
for (i3=s.begin();i3!=s.end();++i3) {
i2=m.find(*i3);
if (foo_equal(i1->second,i2->second)) {
m.erase(i2); // (*)
}
}
}

Since m is not ordered, I cannot be sure that i1->first is less than
i2->first. Therefore I might erase the object with the lower key at (*).
OTOH unordered_map has an erase(iterator, iterator) member function.
This function seems to be of little use since the key 42 might be in the
range [8,15).

tonydee · Feb 4, 2010

It's a good, simple solution in most cases. If the objects are large
and keys small and your RAM limited, and/or that final step is
performance critical too, consider
- keeping the data in an ordered map, but adding an unordered (hash)
map indexing from key to ordered-map iterator
- keep the data in a vector, and having unordered and ordered maps
from key to index
- looking at boost's multi-indexed container library, which can manage
the synchronisation of data and indices for you

Click to expand...

These are worth trying, especially since I still need to find() in the
map. The reason why I want an ordered iterator is that I'd like to
compare all elements with each other and erase duplicate (type Foo)
values keeping the one with the lower (type unsigned) key (it is more
complicated than that but you get the point). So I use two nested loops:

typedef unordered_map<unsigned, Foo> Map;
typedef unordered_set<unsigned> Set;
Map m;
Map::iterator i1,i2;
Set s;
Set::iterator i3;

void find_elements_in_m_that_need_to_be_compared(unsigned to_this_element,Set&);

bool foo_equal(const Foo &,const Foo&);

for (i1=m.begin();i1!=m.end();++i1) {
find_elements_in_m_that_need_to_be_compared(i1->first,s);
for (i3=s.begin();i3!=s.end();++i3) {
i2=m.find(*i3);
if (foo_equal(i1->second,i2->second)) {
m.erase(i2); // (*)
}
}

}

Since m is not ordered, I cannot be sure that i1->first is less than
i2->first. Therefore I might erase the object with the lower key at (*).
OTOH unordered_map has an erase(iterator, iterator) member function.
This function seems to be of little use since the key 42 might be in the
range [8,15).

I'm not sure how often you need to "clean up" these duplicates, but if
Foo has (or can easily be given) a operator<, you may find it better
to have an (ordered) map (something like "std::map<Foo*, unsigned,
Compare_Via_Ptr<Foo*> >" with template <typename T> struct
Compare_Via_Ptr { bool operator()(const Foo* lhs, const Foo* rhs)
const { return *lhs < *rhs; } }; )...

That way you can do a quick lookup to see if an "incoming" Foo is
already stored, preventing the storage of duplicates.

Cheers,
Tony

Ralf Goertz · Feb 4, 2010

tonydee said:
I'm not sure how often you need to "clean up" these duplicates, but if
Foo has (or can easily be given) a operator<, you may find it better
to have an (ordered) map (something like "std::map<Foo*, unsigned,
Compare_Via_Ptr<Foo*> >" with template <typename T> struct
Compare_Via_Ptr { bool operator()(const Foo* lhs, const Foo* rhs)
const { return *lhs < *rhs; } }; )...

That way you can do a quick lookup to see if an "incoming" Foo is
already stored, preventing the storage of duplicates.

That is impossible. Only after thorough investigation of the whole set I
know that two given objects of type Foo are the same. And I not only
erase the duplicates but I also transfer information that is stored in
the one to be erased.

I have a solution for the problem (breaking the inner loop) but I
consider it being error prone and inelegant. That's why I hoped to find
a smarter one. Using one more level of indirection as you suggested
might be the way to go. Thanks for your input.

ordered unordered_map	0	Feb 2, 2010
How can I release the free memory held by boost::unordered_map?	4	May 31, 2012
How should unordered_map be used correctly ... ?	16	Dec 19, 2010
TF-IDF	1	Aug 19, 2021
Weird things when calculating the size for a mergesort-buffer	0	Apr 1, 2013
Design Question	13	Feb 2, 2009
Implementing the KISS4691 RNG	22	Sep 5, 2010
On alignment (final committee draft for C++0x and n1425 for C1X)	2	Aug 21, 2010

ordered unordered_map

Ralf Goertz

Ralf Goertz

tonydee

Ralf Goertz

tonydee

Ralf Goertz

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads