Performance of clear vs swap


Krishanu Debnath

Hello,

I have a call to the hash_map::clear() function which takes a long time.

void someClass::someFunction()
{
    // typedef hash_map<name_id, uint> Mp;
    // Mp p;
    // assuming proper namespace, hash function for name_id obj.

    p.clear();
}

The p.clear() above takes a long time; profiling indicates the number of
buckets in the hash table is large.

Now, just for the sake of experiment, I replaced this 'clear' call with a
swap, i.e.

void someClass::someFunction()
{
    // typedef hash_map<name_id, uint> Mp;
    // Mp p;
    // assuming proper namespace, hash function for name_id obj.

    //p.clear();
    Mp tmp;
    p.swap(tmp);
}

Now the runtime drops significantly, about 10-fold.

What exactly causes this runtime reduction?

Thanks,
Krishanu
 

Noah Roberts

Krishanu said:
Hello,

I have a call to hash_map::clear() function which takes long time.

Hmmm...surprised this made it into moderated. I've taken it out as I
don't like to wait.

hash_map is not standard and is different on every implementation that
implements it. It's hard to say what the reduction in speed is caused
by.
 

mlimber

Noah said:
Hmmm...surprised this made it into moderated. I've taken it out as I
don't like to wait.

hash_map is not standard and is different on every implementation that
implements it. It's hard to say what the reduction in speed is caused
by.

But it is part of TR1 as std::tr1::unordered_map, which qualifies under
FAQ 5.9.

Cheers! --M
 

mlimber

Krishanu said:
I have a call to hash_map::clear() function which takes long time.

someClass::someFunction()
{

// typedef hash_map<name_id, uint> Mp;
// Mp p;
// assuming proper namespace, hash function for name_id obj.

p.clear();
}

Above p.clear() takes long time, profiling indicates 'number of bucket
of hash table is large'.

Now, just for sake of experiments, I replaced this 'clear' call with
swap. i.e.

someClass::someFunction()
{

// typedef hash_map<name_id, uint> Mp;
// Mp p;
// assuming proper namespace, hash function for name_id obj.

//p.clear();
Mp tmp;
p.swap(tmp);
}

Now runtime drops significantly, 10 fold less.

What's exactly cause this run time reduction?

I'm not sure why this would be. Are you using SGI's hash_map extension
(which is similar to std::tr1::unordered_map)? Are you measuring the
performance of the entire function including the destructors for
automatic objects like tmp?

Cheers! --M
 

Noah Roberts

mlimber said:
But it is part of TR1 as std::tr1::unordered_map, which qualifies under
FAQ 5.9.

But hash_map is not unordered_map. Read the first part on unordered
containers in TR1 that explains exactly why the name hash_map was _not_
used... because that name is already taken in most implementations by
incompatible and disparate versions of a hash interface.

I doubt that is the topic of discussion; otherwise the OP probably would
have used the TR1 name. More likely they are working with some
implementation-defined interface.
 

Krishanu Debnath

mlimber said:
I'm not sure why this would be. Are you using SGI's hash_map extension
(which is similar to std::tr1::unordered_map)?

Yes. I am using g++ 4.0.2, which adopts SGI's implementation, I believe.

Are you measuring the performance of the entire function including the
destructors for automatic objects like tmp?

Yes.

Krishanu
 

mlimber

Noah said:
But hash_map is not unordered_map. Read the first part on unordered
containers in TR1 that explains exactly why the name hash_map was _not_
used....because that name is already taken in most implementations by
incompatible and disparate versions of a hash interface.

Right, but it is also similar enough to the TR1 container to fall
within the bounds of this group, IMHO (and apparently in that of the
moderator of c.l.c++.m). This question may simply be a QOI issue for a
container that is quite close to TR1's, or it may be an issue of misuse
of the container or faulty measurements. In any of these cases, at
least initially I think it falls within the (fuzzy) boundaries for
this group, whereas if it is an issue of a non-standard container that
is unlike unordered_map but with the same name, then it is almost
certainly outside the bounds of this group. Given the information in
the OP, however, I don't think we can determine for certain which of
these it is, and so ISTM that we should give it the benefit of the
doubt.

Cheers! --M
 

Aaron Graham

I have a call to hash_map::clear() function which takes long time.
[...]
Now, just for sake of experiments, I replaced this 'clear' call with
swap. [...]
Now runtime drops significantly, 10 fold less.
What's exactly cause this run time reduction?

Because the two functions do different amounts of work.

clear() requires time that is linear with respect to the number of
elements in the container. Destructors are called, memory is
deallocated, etc. But swap() can be done by swapping a few pointers.

Aaron
 

Noah Roberts

Aaron said:
I have a call to hash_map::clear() function which takes long time. [...]
Now, just for sake of experiments, I replaced this 'clear' call with
swap. [...]
Now runtime drops significantly, 10 fold less.
What's exactly cause this run time reduction?

Because the two functions do different amounts of work.

clear() requires time that is linear with respect to the number of
elements in the container. Destructors are called, memory is
deallocated, etc. But swap() can be done by swapping a few pointers.

Yes, but it seems that swapping with an empty temporary and then
destroying it would also invoke the same destructors and deallocations
that clear would.
 

Seungbeom Kim

Aaron said:
I have a call to hash_map::clear() function which takes long time. [...]
Now, just for sake of experiments, I replaced this 'clear' call with
swap. [...]
Now runtime drops significantly, 10 fold less.
What's exactly cause this run time reduction?

Because the two functions do different amounts of work.

clear() requires time that is linear with respect to the number of
elements in the container. Destructors are called, memory is
deallocated, etc. But swap() can be done by swapping a few pointers.

That's true for swap() alone, but what about the destructor of the
"temporary" variable (tmp in the OP's code) when it goes out of scoppe?
Doesn't it do the same things, including destruction and deallocation?
 

Carlos Moreno

Aaron said:
clear() requires time that is linear with respect to the number of
elements in the container. Destructors are called, memory is
deallocated, etc. But swap() can be done by swapping a few pointers.

If you look carefully at his code, there should be a
destructor being called, causing all deallocations to
be necessary (the same amount of them).

I'm not sure exactly how he measured things, but if he
calls SomeFunction repeatedly (say, in a loop a few
thousand times to be able to measure with more accuracy),
then each invocation requires the destruction of the map,
which should make it roughly equivalent to the clear()
option (at least comparable execution times, if not
equivalent).

Am I missing something?

Carlos
 

James Kanze

Aaron said:
I have a call to hash_map::clear() function which takes long time. [...]
Now, just for sake of experiments, I replaced this 'clear' call with
swap. [...]
Now runtime drops significantly, 10 fold less.
What's exactly cause this run time reduction?
Because the two functions do different amounts of work.
clear() requires time that is linear with respect to the number of
elements in the container. Destructors are called, memory is
deallocated, etc. But swap() can be done by swapping a few pointers.

But he then destructs the temporary to which he swapped, and
that destruction must call destructors, deallocate the memory,
etc. Because the final results of destruction and clear are
different, it's easy to imagine some difference in performance,
but the 10 fold difference he cites? Seems a bit high to me.
In the g++ implementation I have, the destructor of the
underlying hashtable just calls clear, so his swap should
actually run slightly slower.

If the actual hash table is a local variable in someFunction,
his version with clear actually calls clear() twice on the full
table, once in his explicit call, and once in the destructor.
clear() frees the nodes, but does not reduce the number of
buckets, so there is an iteration over all of the buckets in the
destructor, even though the table is empty. This could result
in some reduction in performance, but I still can't see a
10 fold difference due to it.
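
To make that concrete, here is a hypothetical sketch of the shape of such
a clear() (invented names, assuming a node type with a 'next' pointer;
not the actual libstdc++ source), showing why its cost tracks the bucket
count rather than the element count:

#include <cstddef>
#include <vector>

template <typename Node>
struct hashtable_sketch {
    std::vector<Node*> buckets;      // one singly linked chain per bucket

    void clear() {
        // Walks every bucket, even the empty ones, so an almost-empty
        // table with a huge bucket vector still pays for the full walk.
        for (std::size_t b = 0; b < buckets.size(); ++b) {
            Node* cur = buckets[b];
            while (cur) {
                Node* next = cur->next;
                delete cur;          // destroy element, free node
                cur = next;
            }
            buckets[b] = 0;
        }
        // The bucket vector itself is not shrunk, so the next clear()
        // (or the destructor) walks all of it again.
    }
};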
 

Krishanu Debnath

Carlos said:
If you look carefully at his code, there should be a
destructor being called, causing all deallocations to
be necessary (the same amount of them).

Exactly; in the g++ implementation (I am using g++ 4.0.2) the hashtable
destructor calls the 'clear' function.

I'm not sure exactly how he measured things, but if he
calls SomeFunction repeatedly (say, in a loop a few
thousand times to be able to measure with more accuracy),
then each invokation requires the destruction of the map,
which should make it roughly equivalent to the clear()
option (at least comparable execution times, if not
equivalent).

I used gprof on the whole executable. In theory the 'swap' version should
take more time: 'clear' plus the additional steps for swapping.

Am I missing something?

Join the club. I posted this issue to gnu.g++.help, and there has been no
response so far, as expected. As has happened in the past, I will
eventually have to catch one of the gcc developers by personal email.

Krishanu
 

Aaron Graham

That's true for swap() alone, but what about the destructor of the
"temporary" variable (tmp in the OP's code) when it goes out of scoppe?
Doesn't it do the same things, including destruction and deallocation?

Hmmm... but when I try it that way, using clear() is actually quite a
bit faster than using temporary/swap:

#include <ext/hash_map>

__gnu_cxx::hash_map<int, int> m;

void func1() {
    __gnu_cxx::hash_map<int, int> t;
    t.swap(m);
}

void func2() {
    m.clear();
}

int main(int argc, char** argv) {
    for (int x = 0; x < 10000; x++) {
        for (int y = 0; y < 10000; y++) m[y] = y;
        if (argc == 1) func1();
        else func2();
    }
}

g++ -O2 foo.cc
time ./a.out
20.413u 0.000s 0:20.40 100.0% 0+0k 0+0io 0pf+0w
time ./a.out 2
13.528u 0.016s 0:13.54 99.9% 0+0k 0+0io 0pf+0w


So I'm still not convinced that he's measuring what he thinks he's
measuring. I'm using gcc 4.1.1, by the way.

Aaron
 

wade

Krishanu said:
Hello,

I have a call to hash_map::clear() function which takes long time.

someClass::someFunction()
{

// typedef hash_map<name_id, uint> Mp;
// Mp p;
// assuming proper namespace, hash function for name_id obj.

p.clear();
}

Above p.clear() takes long time, profiling indicates 'number of bucket
of hash table is large'.

Now, just for sake of experiments, I replaced this 'clear' call with
swap. i.e.
[...]
Now runtime drops significantly, 10 fold less.

What's exactly cause this run time reduction?

Note that in the Dinkum implementation (VC7.1, VC8) (if I am
reading it correctly), given a hash_map which currently has N elements,
but in the past had as many as M elements, the costs are

Destructor: O(N) (assumes compiler is smart enough to destroy
vector<iterator> in O(1))

clear(): O(M)

while(p.begin() != p.end()) erase(p.begin()); O(M*N) (quadratic)

So if you have an implementation that uses the Dinkum layout, but
implements clear() with the loop shown above, the swap/destruct
strategy would be much faster than clear(). Even with the Dinkum
clear() code, swap/destruct is a bit faster if the map was previously
much larger than it currently is.

IIRC the SGI versions I've seen did not have this behavior.
 

P.J. Plauger

wade said:
Krishanu said:
Hello,

I have a call to hash_map::clear() function which takes long time.

someClass::someFunction()
{

// typedef hash_map<name_id, uint> Mp;
// Mp p;
// assuming proper namespace, hash function for name_id obj.

p.clear();
}

Above p.clear() takes long time, profiling indicates 'number of bucket
of hash table is large'.

Now, just for sake of experiments, I replaced this 'clear' call with
swap. i.e.
[...]
Now runtime drops significantly, 10 fold less.

What's exactly cause this run time reduction?

Note that in the Dinkum implementation (VC7.1, VC8) (if I am
reading it correctly), given a hash_map which currently has N elements,
but in the past had as many as M elements, the costs are

Destructor: O(N) (assumes compiler is smart enough to destroy
vector<iterator> in O(1))

clear(): O(M)

while(p.begin() != p.end()) erase(p.begin()); O(M*N) (quadratic)

So if you have an implementation that uses the Dinkum layout, but
implements clear() with the loop shown above, the swap/destruct
strategy would be much faster than clear(). Even with the Dinkum
clear() code, swap/destruct is a bit faster if the map was previously
much larger than it currently is.

You're confounding several things here. We implement hash_* as a
list of elements plus a vector of buckets, each characterized by
a list iterator. hash_*::clear() calls list_clear, which destroys
all the elements, then assigns the list end iterator to all elements
of the hash table, which empties all the buckets. So you're talking
O(N) destructor calls plus O(M) iterator assignments, each assignment
generally being small, simple, and hence fast.
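
For illustration, a rough sketch of that layout (invented names, not the
Dinkumware source): all elements live in a single list, and each bucket
is just an iterator into that list.

#include <cstddef>
#include <list>
#include <utility>
#include <vector>

template <typename Key, typename T>
struct dinkum_style_sketch {
    typedef std::list<std::pair<const Key, T> > list_type;

    list_type                                 elements;  // every element, grouped by bucket
    std::vector<typename list_type::iterator> buckets;   // one iterator per bucket

    void clear() {
        elements.clear();                 // O(N) element destructions
        for (std::size_t b = 0; b < buckets.size(); ++b)
            buckets[b] = elements.end();  // O(M) cheap iterator assignments
    }
};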

The swap trick has the one advantage of destroying the vector
instead of reinitializing it -- at least that's an advantage if
the hash table has grown large. But if the clearing time is
dominated by calling nontrivial element destructors, you should
get about the same time either way.
IIRC the SGI versions I've seen did not have this behavior.

Our implementation is quite different from SGI's, and has a number
of advantages.

P.J. Plauger
Dinkumware, Ltd.
http://www.dinkumware.com
 

Krishanu Debnath

James said:
Aaron said:
I have a call to hash_map::clear() function which takes long time. [...]
Now, just for sake of experiments, I replaced this 'clear' call with
swap. [...]
Now runtime drops significantly, 10 fold less.
What's exactly cause this run time reduction?
Because the two functions do different amounts of work.
clear() requires time that is linear with respect to the number of
elements in the container. Destructors are called, memory is
deallocated, etc. But swap() can be done by swapping a few pointers.

But he then destructs the temporary to which he swapped, and
that destruction must call destructors, deallocate the memory,
etc. Because the final results of destruction and clear are
different, it's easy to imagine some difference in performance,
but the 10 fold difference he cites? Seems a bit high to me.
In the g++ implementation I have, the destructor of the
underlying hashtable just calls clear, so his swap should
actually run slightly slower.

If the actual hash table is a local variable in someFunction,
his version with clear actually calls clear() twice on the full
table, once in his explicit call, and once in the destructor.
clear() frees the nodes, but does not reduce the number of
buckets, so there is an iteration over all of the buckets in the
destructor, even though the table is empty. This could result
in some reduction in performance, but I still can't see a
10 fold difference due to it.

Very close. What actually happened is that I am populating and clearing
that hash_map many times (91,679 times in this particular test case), so
each hash_map::clear traverses the same or a greater number of buckets
than the last one. For a particular hash_map::clear it is perfectly
possible that the number of elements in the map is very small but the
number of buckets is very high due to previous calls.

So the swap trick gives you better performance.

clear was taking around 90% of the total execution time. Now compare
g++'s hashtable::clear implementation against the following data.

data for 'clear' version:

hash_bucket.size() #call 845505722
hash_bucket #call 1690828086
_delete node_ #call 838371

data for 'swap' version:

hash_bucket.size() #call 17982146
hash_bucket #call 35780934
_delete node_ #call 838371
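
For illustration, a small standalone sketch of that effect (this uses
std::unordered_map as a stand-in for hash_map; whether clear() keeps the
bucket array is implementation-dependent, so the printed counts will vary
by library):

#include <cstdio>
#include <unordered_map>

int main() {
    std::unordered_map<int, int> m;
    for (int i = 0; i < 1000000; ++i) m[i] = i;   // grow a large bucket array

    m.clear();                                    // elements gone...
    std::printf("after clear: size=%lu buckets=%lu\n",
                (unsigned long)m.size(), (unsigned long)m.bucket_count());

    std::unordered_map<int, int> tmp;
    m.swap(tmp);                                  // hand the buckets to tmp
    std::printf("after swap:  size=%lu buckets=%lu\n",
                (unsigned long)m.size(), (unsigned long)m.bucket_count());
}   // tmp's destructor does the actual cleanup here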

Krishanu
 

James Kanze

Krishanu said:
James said:
Aaron said:
I have a call to hash_map::clear() function which takes long time.
[...]
Now, just for sake of experiments, I replaced this 'clear' call with
swap.
[...]
Now runtime drops significantly, 10 fold less.
What's exactly cause this run time reduction?
Because the two functions do different amounts of work.
clear() requires time that is linear with respect to the number of
elements in the container. Destructors are called, memory is
deallocated, etc. But swap() can be done by swapping a few pointers.
But he then destructs the temporary to which he swapped, and
that destruction must call destructors, deallocate the memory,
etc. Because the final results of destruction and clear are
different, it's easy to imagine some difference in performance,
but the 10 fold difference he cites? Seems a bit high to me.
In the g++ implementation I have, the destructor of the
underlying hashtable just calls clear, so his swap should
actually run slightly slower.
If the actual hash table is a local variable in someFunction,
his version with clear actually calls clear() twice on the full
table, once in his explicit call, and once in the destructor.
clear() frees the nodes, but does not reduce the number of
buckets, so there is an iteration over all of the buckets in the
destructor, even though the table is empty. This could result
Very close. what happened actually I am populating and clearing that
hash_map many times(91679 times in this particular test case). So
every time hash_map::clear actually traversing over same or more number
of buckets than last time. For a particular 'hash_map::clear' it is
perfectly possible that no of elements in map is very small but no of
bucket is very high due to previous call.
So swap tricks give you better performance.

Or not, depending. Allocating new buckets and doing a rehash
isn't free either. If you typically have a fairly large number
of elements, clear could give better performance because after
the first couple of times, you've reached the maximum number of
buckets, and don't have to create any more. (Depending on the
implementation, adding buckets could require a rehash of the
entire table.) On the other hand, if you typically have only 10
or 20 entries, with only rarely a great many, swap will
generally be faster, not only in clearing the table, but
overall, with the table using less memory, etc. In an extreme
case, if the first use has a million entries, and all of the
following vary between 10 and 20, clear could be a disaster; if
you always have exactly N entries, clear will probably be
faster.

As a general rule, unless I definitely want to avoid the
reallocations (often the case with e.g. std::vector), I'll
simply create a new instance each time. But a lot depends on
context.
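
A sketch of that approach (process_one_batch is a hypothetical function
that fills and consumes the map): constructing a fresh map per pass means
no oversized bucket array survives from one pass to the next.

#include <unordered_map>

void process_one_batch(std::unordered_map<int, int>& m);  // hypothetical

void run_all_batches(int batches) {
    for (int i = 0; i < batches; ++i) {
        std::unordered_map<int, int> m;   // fresh, small table each pass
        process_one_batch(m);
    }                                     // m is destroyed here each pass
}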

--
James Kanze (GABI Software) email:[email protected]
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34


[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
 

wade

P.J. Plauger said:
Our implementation is quite different from SGI's, and has a number
of advantages.

No argument there. I was just trying to give examples where for
reasonable implementations, the swap/destruct trick could be
significantly faster than clear().

For your implementation, that should happen only if there are currently
many more buckets than elements (which presumably means that size() is
currently much less than it used to be).

However, an implementation very similar to yours could show quadratic
behavior for clear(). Modify your code so that

1) erase(first,last) doesn't do the optimization where it checks for
clear() semantics.
2) clear() calls erase(begin(),end()).

For such an implementation, clear() would take quadratic time (and I
suspect that would be non-conforming, but I haven't checked) while
swap/destruct would still take linear time.

I doubt this is what the OP ran into, but without knowing his exact
implementation, I'd consider it plausible.
 

wade

James said:
Allocating new buckets and doing a rehash
isn't free either. If you typically have a fairly large number
of elements, clear could give better performance because after
the first couple of times, you've reached the maximum number of
buckets, and don't have to create any more. (Depending on the
implementation, adding buckets could require a rehash of the
entire table.)

In most implementations the total number of hashes that will occur as
the table grows is proportional to the final size (with a small
constant). So by the time you have K elements, you've probably called
the hash function less than 4K times (of course the implementation can
cache the raw hash() result, so hash is only called once for each
element). The array growth is typically pretty cheap (O(n) splice
operations).

The extra allocations just don't matter, unless the final size() is
small. Suppose we do factor-of-two growth, and insert 1000000
elements. If the array is preallocated, we end up doing 1000000
allocations (for the elements). If the array is not preallocated, we
end up doing 1000020 allocations (elements + array growth).

On the other hand, with some implementations, iterating over a map with
many more buckets than elements will be expensive (because iterators
have to examine the empty buckets). With some other implementations,
inserting or erasing elements into a sparse bucket table is expensive
(adjacent empty buckets require updates).

Growing hash maps as needed will sometimes be less efficient by a small
constant factor. Working with sparse hash maps can turn O(1)
operations into O(capacity()/size()) operations. The worst-case for
sparse hash maps is much worse.

From his numbers, it looks like the OP was, on average, calling clear()
on maps that were very sparse (less than 1% full). For a wide range of
implementations, that is a bad thing to do.
 
