intersection of two std::maps

J

jacek.dziedzic

Hi!

What is the canonical way of finding an intersection of two
std::maps?
i.e. I have

std::map<whatever,size_t> map1;
std::map<whatever,size_t> map2;

.... and I need an std::vector containing all the values that occur
simultaneously in both of the maps, nevermind the keys. In each
of the maps the values do not repeat.

One idea is to copy all values from map1 into a vector, and then
for each value from map2 check if it's present in the vector already,
if not -- then push it into the vector. This, of course, has ugly
complexity, so perhaps a set would do better. Is there a better
way?

I was also thinking of putting the values into a multiset and then
taking out all values with a count of 2, but there must be a simpler
way.

TIA,
- J.
 
M

Mark P

Hi!

What is the canonical way of finding an intersection of two
std::maps?

I'm not sure there is one. It's a pretty strange request since,
ignoring the keys, the map is not a very useful value container.
Pushing all of the values into a set is one possibility. I don't see
any advantage to your multiset proposal. I think what I would do is:

1. Copy all values from each map to a separate vector.
2. Sort each vector.
3. Use std::set_intersection to obtain the intersection.

If copying values is too costly you can also use vectors of pointers to
the original values in the map, but then you'll need to write your own
comparison functions for steps 2 and 3, and do some additional copying
at the end.

-Mark
 
J

jacek.dziedzic

Mark said:
I'm not sure there is one. It's a pretty strange request since,
ignoring the keys, the map is not a very useful value container.

The keys were needed earlier to make the values unique,
in each of the map. Or, in other words, I had pairs like this
key1, value_a
key2, value_b
key1, value_c
key3, value_d
key1, value_e

and by using a map I could eliminate 'value_a' and 'value_c'
and retain only 'value_e', since they were all stored at the
same entry of the map.
Pushing
all of the values into a set is one possibility. I don't see any
advantage to your multiset proposal. I think what I would do is:

1. Copy all values from each map to a separate vector.
2. Sort each vector.
3. Use std::set_intersection to obtain the intersection.

I understand, except for the vector part. Why not copy
each map to a separate set and then do set_intersection?
If copying values is too costly you can also use vectors of pointers to
the original values in the map, but then you'll need to write your own
comparison functions for steps 2 and 3, and do some additional copying
at the end.

Nope, the elements are lightweight (size_t's), but there
are a lot of them. Typically there would be 1M elements in
each of the maps and the maps would be almost identical,
with differences in the order of several elements.

thanks, I guess I'll stick with your proposal,
- J.
 
M

Mark P

I understand, except for the vector part. Why not copy
each map to a separate set and then do set_intersection?

You could do that as well. It's fewer lines of code, perhaps, but
conventional wisdom is that unless you need to sort "online", it's
generally more efficient to collect all values and perform a single sort
operation at the end (as I suggest) rather than maintaining a sorted
structure as items are individually added (the set approach). See Scott
Meyers "Effective STL" and do keep in mind the usual caveats about
premature optimization, but I would expect better performance with the
vector.
 
J

jacek.dziedzic

You could do that as well. It's fewer lines of code, perhaps, but
conventional wisdom is that unless you need to sort "online", it's
generally more efficient to collect all values and perform a single sort
operation at the end (as I suggest) rather than maintaining a sorted
structure as items are individually added (the set approach). See Scott
Meyers "Effective STL" and do keep in mind the usual caveats about
premature optimization, but I would expect better performance with the
vector.

I see, thank you.
- J.
 
J

Jim Langston

Mark P said:
You could do that as well. It's fewer lines of code, perhaps, but
conventional wisdom is that unless you need to sort "online", it's
generally more efficient to collect all values and perform a single sort
operation at the end (as I suggest) rather than maintaining a sorted
structure as items are individually added (the set approach). See Scott
Meyers "Effective STL" and do keep in mind the usual caveats about
premature optimization, but I would expect better performance with the
vector.

Also, your values are not guaranteed to be unique in the map since they are
not keyed. Copying to std::vector or std::set probably depends on what you
want to do with duplicate values in the same map. Do you want to enforce
uniqueness on each the values in each copied set/vector? If so, use set.
If not, use vector.

Vector should be faster however for copying the data into as it does not
have to build the index.
 
J

jacek.dziedzic

Also, your values are not guaranteed to be unique in the map since they are
not keyed.

Note however that in the OP I wrote:

JD> In each of the maps the values do not repeat.

I know for sure that in each of the maps there are no duplicate
values.
Copying to std::vector or std::set probably depends on what you
want to do with duplicate values in the same map. Do you want to enforce
uniqueness on each the values in each copied set/vector? If so, use set.
If not, use vector.

Vector should be faster however for copying the data into as it does not
have to build the index.

I guess I'll stick with vector.

thanks,
- J.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,774
Messages
2,569,599
Members
45,175
Latest member
Vinay Kumar_ Nevatia
Top