Clean ways to count remove_if() removals?

J

Jorgen Grahn

I'm looking at std::remove_if(Iter, Iter, Pred) and
std::list::remove_if(Pred). I'd like to use it to remove elements from
a std::list *and* find out how many elements were removed.

Is there a clean way?

At first I wrote a predicate class which counted how many times its
operator() returned true -- but then I realized that predicates are
passed by value, so if I did

MyPredicate p;
my_list.remove_if(p);
cout << "removed " << p.size << " elements\n";

'p' was copied, and never used again.
I can pass a pointer or reference to a counter:

unsigned size = 0;
MyPredicate p(&size);
...

but that doesn't seem very elegant. Have I missed something? Is there
a standard idiom for dealing with this?

/Jorgen
 
V

Victor Bazarov

Jorgen said:
I'm looking at std::remove_if(Iter, Iter, Pred) and
std::list::remove_if(Pred). I'd like to use it to remove elements from
a std::list *and* find out how many elements were removed.

Is there a clean way?

At first I wrote a predicate class which counted how many times its
operator() returned true -- but then I realized that predicates are
passed by value, so if I did

MyPredicate p;
my_list.remove_if(p);
cout << "removed " << p.size << " elements\n";

'p' was copied, and never used again.
I can pass a pointer or reference to a counter:

unsigned size = 0;
MyPredicate p(&size);
...

but that doesn't seem very elegant. Have I missed something? Is there
a standard idiom for dealing with this?

How about

auto szBefore = my_list.size();
remove_if ...
auto szAfter = my_list.size();
cout << "removed " << szBefore - szAfter << " elements\n";

V
 
M

Marcel Müller

Victor said:
How about

auto szBefore = my_list.size();
remove_if ...
auto szAfter = my_list.size();

If list is what it looks like, .size() is O(n). This might be not intended.


Marcel
 
M

Marcel Müller

Jorgen said:
I can pass a pointer or reference to a counter:

unsigned size = 0;
MyPredicate p(&size);
...

but that doesn't seem very elegant. Have I missed something? Is there
a standard idiom for dealing with this?

I would say the way you showed above. This is for instance exaclty the
way that C# effectively uses when local variables are captured by lambda
expressions.


Marcel
 
V

Victor Bazarov

Marcel said:
If list is what it looks like, .size() is O(n). This might be not intended.

Counting how many times the predicate is called is also O(n). And the
list *could* cache the size somewhere (if it's smart, that is).

BTW, 'remove_if' doesn't change the size. That's my fault. 'erase'
does. So, to be semantically correct, one needs to call 'std::distance'
for the two iterators:

auto itNewEnd = remove_if...
auto removed = std::distance(itNewEnd, my_list.end());

IIRC, 'remove_if' only places the elements to be removed to the end of
the container.

V
 
J

Jorgen Grahn

That's right -- I was about to point that out in my first posting. It
feels stupid to walk through the list element-by-element three times.

Still, that is what I do as a workaround. And in this particular case
it doesn't really matter much for speed -- my lists have a mean size
of 2 and a max size of 11 ...
Counting how many times the predicate is called is also O(n). And the
list *could* cache the size somewhere (if it's smart, that is).
BTW, 'remove_if' doesn't change the size. That's my fault. 'erase'
does. So, to be semantically correct, one needs to call 'std::distance'
for the two iterators:

auto itNewEnd = remove_if...
auto removed = std::distance(itNewEnd, my_list.end());

IIRC, 'remove_if' only places the elements to be removed to the end of
the container.

You are right about std::remove_if, but there is also a
list::remove_if which *does* erase the elements.

/Jorgen
 
J

James Kanze

Counting how many times the predicate is called is also O(n).
And the list *could* cache the size somewhere (if it's smart,
that is).

BTW, 'remove_if' doesn't change the size.

std::list<>::remove_if does (and typically, should be a lot
faster than std::remove_if followed by erase on a list).
 
J

Jorgen Grahn

I would say the way you showed above. This is for instance exaclty the
way that C# effectively uses when local variables are captured by lambda
expressions.

That still doesn't make it more attractive to me ... or does it?
My problem with the code below is that I have an object 'p' with a
worthless state.

But it looks a bit nicer formulated like this:

unsigned size = 0;
baz = std::remove_if(a, b, MyPredicate(&size));

I guess I could live with that.

/Jorgen
 
S

SG

I'm looking at std::remove_if(Iter, Iter, Pred) and
std::list::remove_if(Pred). I'd like to use it to remove
elements from a std::list *and* find out how many elements were
removed.

Is there a clean way?

At first I wrote a predicate class which counted how many times its
operator() returned true -- but then I realized that predicates are
passed by value, so if I did

  MyPredicate p;
  my_list.remove_if(p);
  cout << "removed " << p.size << " elements\n";

'p' was copied, and never used again.
I can pass a pointer or reference to a counter:

  unsigned size = 0;
  MyPredicate p(&size);
  ...

but that doesn't seem very elegant. Have I missed something? Is there
a standard idiom for dealing with this?

If remove_if returned a copy of the predicate you could have written

p = list.remove_if(p);

That's how it works fo std::for_each. Since list::remove_if returns
void I think you don't have much of a choice. In C++0x you will be
able to write

list.remove_if(ref(p));

reference_wrapper<T> has an overloaded function call operator which
perfectly forwards parameters. I don't know whether Boost.Ref has any
magic in it to do something similar.

Cheers!
SG
 
M

Michael Doubez

That's right -- I was about to point that out in my first posting. It
feels stupid to walk through the list element-by-element three times.

Still, that is what I do as a workaround. And in this particular case
it doesn't really matter much for speed -- my lists have a mean size
of 2 and a max size of 11 ...

You could also use std::count_if() with your predicate to count the
elements that will be erased. That would make it only 2 walkthrough.
You are right about std::remove_if, but there is also a
list::remove_if which *does* erase the elements.

You could also roll your own loop (in a function):

template<class T>
size_t list_erase_if(std::list<T>& my_list, MyPredicate p)
{
typedef std::list<T>::iterator iterator;
size_t counter = 0;

const iterator begin = my_list.begin();
const iterator end = my_list.end();
for( iterator it = std::find_if(begin,end,p) ;
it != end ;
it = std::find(it,end,p)
)
{
it = my_list.erase( it );
++counter;
}

return counter;
}
 
M

Michael Doubez

If remove_if returned a copy of the predicate you could have written

  p = list.remove_if(p);

Which is not a big help since AFAIK there is no guarantee that the
same predicate instance is called on all elements or even that it is
called only once.

In addition to that, there is not much sense in returning a predicate,
it is not like it is supposed to hold a value.
That's how it works fo std::for_each.

But for_each() takes a functor, not a predicate and provides the
guarantee that it returns the function object after it has been
applied to each element (and local state are allowed).
Since list::remove_if returns
void I think you don't have much of a choice. In C++0x you will be
able to write

  list.remove_if(ref(p));

reference_wrapper<T> has an overloaded function call operator which
perfectly forwards parameters. I don't know whether Boost.Ref has any
magic in it to do something similar.

Which doesn't help either.
 
M

Maxim Yegorushkin

James said:
The standard requires list<>::size() to have constant time,
which means that it must be cached.

Well, here is stlport4 std::list<>::size() (bundled with Sun Studio 12):

size_type size() const {
size_type __result = distance(begin(), end());
return __result;
}
 
S

SG

Which is not a big help since AFAIK there is no guarantee that the
same predicate instance is called on all elements or even that it is
called only once.

In addition to that, there is not much sense in returning a predicate,
it is not like it is supposed to hold a value.
Right.


But for_each() takes a functor, not a predicate and provides the
guarantee that it returns the function object after it has been
applied to each element (and local state are allowed).

Yeah, I know. If you check out the requirements of the concept
Predicate you'll note only two restrictions: A predicate returns
something that is implicitly convertible to bool and it is not allowed
to modify its arguments. You still can have "state" (i.e. pointer).

I just forgot about lambdas:

int count = 0;
list.remove_if([&count](my_type const& x)->bool{...});
Which doesn't help either.

Please elaborate. I don't have the C++03 standard documents available
but the C++0x draft explicitly mentions

Complexity: Exactly distance(begin(), end()) applications of the
corresponding predicate.

which seems enough a guarantee to me to make the ref(p) or lambda
version work. It doesn't even matter if the function call operator is
applied on the /same/ predicate object.


Cheers!
SG
 
M

Michael Doubez

I just forgot about lambdas:

  int count = 0;
  list.remove_if([&count](my_type const& x)->bool{...});
Which doesn't help either.

Please elaborate. I don't have the C++03 standard documents available
but the C++0x draft explicitly mentions

   Complexity: Exactly distance(begin(), end()) applications of the
               corresponding predicate.

which seems enough a guarantee to me to make the ref(p) or lambda
version work. It doesn't even matter if the function call operator is
applied on the /same/ predicate object.

So it seems.

And C++03 has the same guarantee 23.2.2.4/18:
Complexity: Exactly size() applications of the corresponding predicate
 
J

joseph cook

The standard requires list<>::size() to have constant time,
which means that it must be cached.

So it does. The documentation for SGI STL states that it can be O(N),
and warns that one should always use "empty()" instead of "size()
==0". I wonder why that is...

Joe
 
M

Michael Doubez

The standard says it "should have constant complexity". That's a
recommendation, not a requirement. One issue here is the requirement
that list::splice have constant complexity, which can't be done if size
is also constant complexity.

The size can be memorized, invalidated upon splice and recomputed at
first call of size().

That would make it amortized constant time at worse (at the cost of a
comparison in size() which is small enough).
 
J

Jerry Coffin

[ ... ]
The standard requires list<>::size() to have constant time,
which means that it must be cached.

Where is that requirement? The only thing I know of is the infamouse
"note A" attached to Table 65, which only says it: "should have
constant complexity."
 
J

Jerry Coffin

[ ... ]
The standard says it "should have constant complexity". That's a
recommendation, not a requirement. One issue here is the requirement
that list::splice have constant complexity, which can't be done if size
is also constant complexity.

I was looking at that, but after looking carefully, I believe size()
can be made constant complexity while meeting all the complexity
requirements for splice().

There are three overloads of splice:
void splice(iterator position, list<T, Allocator> &x);
Inserts the entirety of x at before position. The cached size can be
updated in constant time:
cached_size += x.size();
x.cached_size = 0;

void splice(iterator position, list<T, Allocator>& x, iterator i);

This removes x and inserts it as this[position]. The sizes can be
updated in consant time:
++cached_size;
--x.cached_size;

void splice(iterator position, list<T, Allocator> &x, iterator first,
iterator last);

This is the (somewhat) tricky one: the requirement is:
Constant time if &x == this; otherwise, linear time

If &x==this, we're just moving elements around in the same container
-- so the container's size doesn't change. We can just modify the
pointers in constant time, and never have to touch the cached size at
all.

If we're moving elements from one container to another, linear time
is allowed. This allows walking the elements in the list from first
to last to count the elements.

If there was no requirement for updating a cached count, all the
overloads of splice could always have constant complexity.

As an aside (and perhaps I should post this in comp.std.c++ instead,
but since you're apparently reading this thread...) a quick look at
N2857 shows that it still uses the same wording, mentioning "time".
Might it be better to change that from "time" to "complexity"?
 
A

Alf P. Steinbach

* joseph cook:
So it does.

It doesn't.

There are two possible in practice list implementations: one where size is O(1),
and one where splicing is O(1).

The standard allows both.

The documentation for SGI STL states that it can be O(N),
and warns that one should always use "empty()" instead of "size()
==0". I wonder why that is...

In order to make splicing O(1).


Cheers & hth.,

- Alf
 
H

Hans Bos

Alf P. Steinbach said:
* joseph cook:

It doesn't.

There are two possible in practice list implementations: one where size is
O(1), and one where splicing is O(1).

So when using the library I cannot rely on the fact that either operation
may be O(1).
I have to assume that both size() and splice() are O(N).

It would be better if the standard specifies which one is O(1).

Greetings,
Hans.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,768
Messages
2,569,574
Members
45,051
Latest member
CarleyMcCr

Latest Threads

Top