Pruning a std::map efficiently?

J

Jorgen Grahn

Around here, March--April is apple tree pruning time. I have a slighly
different problem:

I have a std::map<K, T> where I want to move all T (or pair<K, T>)
which match some predicate Pred to some other container. The map may
have a size in the millions, and Pred often matches all, or 50%, or
10% or so of the elements.

I'd prefer to use a standard algorithm for this, rather than iterating
manually, dealing with iterator invalidation and so on. Avoiding
repeated tree rebalancing would be welcome, but I suspect it's not
possible.

What's a good std algorithm for this? I guess std::remove_if() doesn't
work on maps -- since you cannot reorder the elements.

Since T is a pointer type in my case, I have some other options. I can
build a new map with only the !Pred elements in it, swap it with the
original, and blow away the original. Not sure that I would gain
performance from that (even more rebalancing?) but at least the code
would be clean ...

/Jorgen
 
A

alasham.said

Hello,

This could be one implementation: copy_if is not part of the standard
library, but is easy to implement (see The C++ Programming Language)

template<typename In, typename Out, typename Pred>
Out copy_if(In first, In last, Out res, Pred Pr)
{
while (first != last)
{
if (Pr(*first))
*res++ = *first;
++first;
}
return res;
}

It is then possible to copy from map m1 to m2, applying predicate
Pred:

copy_if( m1.begin(), m1.end(), std::inserter( m2, m2.begin() ),
Pred );

Regards
 
J

Jorgen Grahn

Hello,

This could be one implementation: copy_if is not part of the standard
library, but is easy to implement (see The C++ Programming Language)

template<typename In, typename Out, typename Pred>
Out copy_if(In first, In last, Out res, Pred Pr) ....

It is then possible to copy from map m1 to m2, applying predicate
Pred:

copy_if( m1.begin(), m1.end(), std::inserter( m2, m2.begin() ),
Pred );

Thanks, but it's not really what I asked for. That was part of my
fallback solution, which I knew how to implement, but felt was
suboptimal.

Three things I noted while going down that road:

- std::copy_if() doesn't exist, but std::remove_copy_if() does,
and has the same semantics except the predicate must be negated.
I am a bit surprised Stroustrup didn't mention that.

- It doesn't help me, since back_inserter doesn't work on std::map. Or
is there another way to fill a std::map via std::remove_copy_if()?

- But the loop is trivial enough in this case, and I can
build my new tree more efficiently using the hinted
m.insert(m.end(), val) -- by a factor 5 in my experiments.

/Jorgen
 
A

alasham.said

Hello,

Although still not providing the optimal solution you are seeking, I
think I can comment on two points:
- It doesn't help me, since back_inserter doesn't work on std::map. Or
  is there another way to fill a std::map via std::remove_copy_if()?

std::inserter works with std::map - it uses the std::map::insert
member function

- But the loop is trivial enough in this case, and I can
  build my new tree more efficiently using the hinted
  m.insert(m.end(), val) -- by a factor 5 in my experiments.

You can similarly do:

remove_copy_if(
m1.begin(), m1.end(), std::inserter( m2, m2.end() ), Pred );


Also, it might be much more efficient (depending on the use you make
of the map object) to create an iterator adaptor that would allow you
to iterate over members of the container that meet a predicate, rather
than deleting elements, or copying to another container (see the
Boost.Iterator Library). Of course I would not be able to tell if that
is of use to your particular application.

Regards
 
J

James Kanze

Around here, March--April is apple tree pruning time. I have a
slighly different problem:
I have a std::map<K, T> where I want to move all T (or pair<K,
T>) which match some predicate Pred to some other container.
The map may have a size in the millions, and Pred often
matches all, or 50%, or 10% or so of the elements.
I'd prefer to use a standard algorithm for this, rather than
iterating manually, dealing with iterator invalidation and so
on. Avoiding repeated tree rebalancing would be welcome, but I
suspect it's not possible.
What's a good std algorithm for this? I guess std::remove_if()
doesn't work on maps -- since you cannot reorder the elements.
Correct.

Since T is a pointer type in my case, I have some other
options. I can build a new map with only the !Pred elements in
it, swap it with the original, and blow away the original. Not
sure that I would gain performance from that (even more
rebalancing?) but at least the code would be clean ...

If you're going to drop the original map, then there's no point
in removing the elements from it. Some variant on the
non-existant copy_if could be used. A more interesting solution
would be to write it yourself, using the hinting form of insert;
this should speed things up, since there would be no need for
std::map to find the correct insertion point for each insert.
If you had copy_if, I think using an insert_iterator would also
work like this:

std::map< K, T > newMap ;
copy_if( oldMap.begin(), oldMap.end(),
std::inserter( newMap, newMap.end() ),
predicate ) ;
 
J

Jorgen Grahn

If you're going to drop the original map, then there's no point
in removing the elements from it.

It's the other way around for me: since I am afraid to remove elements
while iterating, I'm thinking of using "copy, swap, kill the
original". When I mentioned "even more rebalancing" above, I was
thinking of rebalancing during insertion into the new tree, not during
deletion in the original.

Actually that's what I have settled for for now. I may revisit the
problem later, maybe with a pooled allocator, or replacing the
std::map with a hash-based container.

Thanks,
Jorgen
 
G

guy.tristram

I'd prefer to use a standard algorithm for this, rather than iterating
manually, dealing with iterator invalidation and so on.

Avoiding iterator invalidation isn't too difficult here:

while( it != map.end() )
{
if( remove_condition( it ) )
it = map.erase( it );
else
++it;
}

should do the trick. It might be more efficient if you are removing
large numbers of elements to identify ranges of unwanted elements and
remove them with erase( first, last ).

HTH - Guy
 
J

Jorgen Grahn

Also, it might be much more efficient (depending on the use you make
of the map object) to create an iterator adaptor that would allow you
to iterate over members of the container that meet a predicate, rather
than deleting elements, or copying to another container (see the
Boost.Iterator Library).

Good point. Keeping the "erased" elements in the map and manually
ignoring them, right?
Of course I would not be able to tell if that
is of use to your particular application.

Me neither, at this stage :)

/Jorgen
 
J

Jorgen Grahn

Avoiding iterator invalidation isn't too difficult here:

while( it != map.end() )
{
if( remove_condition( it ) )
it = map.erase( it );
else
++it;
}

should do the trick.

Hm, map.erase( it ) returns void on my system (Gnu). Do you have a
library extension, perhaps?

I have to admit that I didn't look into the iterator invalidation
rules at all myself.
It might be more efficient if you are removing
large numbers of elements to identify ranges of unwanted elements and
remove them with erase( first, last ).

Not likely to be useful in my case, but a good idea. (BTW, it seems
that the Gnu implementation turns that into a series of erase() at the
lowest level, but I guess another std::map implementation could be
more efficient).

/Jorgen
 
J

Jerry Coffin

[ ... ]
Hm, map.erase( it ) returns void on my system (Gnu). Do you have a
library extension, perhaps?

Yes, but it's pretty easy to get by without it as well:

if (remove_condition(it))
map.erase(it++);
else
++it;

In this case, the post-increment probably doesn't really cost much (if
anything) compared to a pre-increment. In particular, the compiler's
going to create a copy of the iterator to pass to the function anyway,
so the fact that a post-increment requires creating a copy of the
iterator doesn't really add any complexity or work.

Ther's also an implicit (but widespread) assumption that copying
iterators is cheap in any case -- just for one example, the standard
algorithms (almost?) all receive them by value, not reference.
 
M

Michael Mol

Yes, but it's pretty easy to get by without it as well:

        if (remove_condition(it))
                map.erase(it++);
        else
                ++it;

In this case, the post-increment probably doesn't really cost much (if
anything) compared to a pre-increment. In particular, the compiler's
going to create a copy of the iterator to pass to the function anyway,
so the fact that a post-increment requires creating a copy of the
iterator doesn't really add any complexity or work.

Is the iterator still valid to perform an increment on after you've
erased it? I'm fairly certain you can't dereference it, and I
presumed that that extended to increment and decrement.
 
K

Kai-Uwe Bux

Michael said:
Is the iterator still valid to perform an increment on after you've
erased it?
No.

I'm fairly certain you can't dereference it, and I
presumed that that extended to increment and decrement.

Yes.

Note: that does not happen in the code snippet from above. A sequence point
between the start of the erase() function and the evaluation of its
argument ensures that the side-effect of incrementing it has taken place
before the entry marked by the previous value of it is being erased.


Best

Kai-Uwe Bux
 
J

Jerry Coffin

Is the iterator still valid to perform an increment on after you've
erased it? I'm fairly certain you can't dereference it, and I
presumed that that extended to increment and decrement.

The iterator is no longer valid after the erasure, but that doesn't
matter -- we're not incrementing it after the erasure. By the current
(soon to be obsolete for C++) terminology, there is a sequence point
after the arguments to a function are evaluated, but before the function
itself executes. That means the increment has to take place _before_ the
erasure.
 
G

guy.tristram

Hm, map.erase( it ) returns void on my system (Gnu).  Do you have a
library extension, perhaps?

Oh, that's interesting: I'm not sure if it's an MS extension or if
other implementations and docs haven't caught up with a change. Their
docs seem to disagree with other docs on the net and I don't have (or
at least can't find) access to the latest standard. A bit annoying
from a portability standpoint if they have unilaterally made this
extension without explicitly flagging it as such.

[erasing range instead of single elements]
Not likely to be useful in my case, but a good idea. (BTW, it seems
that the Gnu implementation turns that into a series of erase() at the
lowest level, but I guess another std::map implementation could be
more efficient).

Yes - just peeked at MS's and it's the same there.

Guy
 
J

James Kanze

Hm, map.erase( it ) returns void on my system (Gnu). Do you
have a library extension, perhaps?

This was a rather obvious error in the standard (which has been
corrected in the next draft). Officially, map<>::erase should
return void, but better implementations have always had it
returning the iterator, and this will be required in the next
release of the standard.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,769
Messages
2,569,582
Members
45,067
Latest member
HunterTere

Latest Threads

Top