Performance of clear vs swap


Krishanu Debnath

Hello,

I have a call to the hash_map::clear() function which takes a long time.

void someClass::someFunction()
{
    // typedef hash_map<name_id, uint> Mp;
    // Mp p;
    // assuming proper namespace, hash function for name_id obj.

    p.clear();
}

The p.clear() above takes a long time; profiling indicates the number of
buckets in the hash table is large.

Now, just for the sake of experiment, I replaced this 'clear' call with a
swap, i.e.

void someClass::someFunction()
{
    // typedef hash_map<name_id, uint> Mp;
    // Mp p;
    // assuming proper namespace, hash function for name_id obj.

    //p.clear();
    Mp tmp;
    p.swap(tmp);
}

Now the runtime drops significantly, about 10-fold.

What exactly causes this runtime reduction?

Thanks,
Krishanu
 

Noah Roberts

Krishanu said:
Hello,

I have a call to hash_map::clear() function which takes long time.

Hmmm...surprised this made it into moderated. I've taken it out as I
don't like to wait.

hash_map is not standard and is different on every implementation that
implements it. It's hard to say what the reduction in speed is caused
by.
 

mlimber

Noah said:
Hmmm...surprised this made it into moderated. I've taken it out as I
don't like to wait.

hash_map is not standard and is different on every implementation that
implements it. It's hard to say what the reduction in speed is caused
by.

But it is part of TR1 as std::tr1::unordered_map, which qualifies under
FAQ 5.9.

Cheers! --M
 

mlimber

Krishanu said:
I have a call to hash_map::clear() function which takes long time.

someClass::someFunction()
{

// typedef hash_map<name_id, uint> Mp;
// Mp p;
// assuming proper namespace, hash function for name_id obj.

p.clear();
}

Above p.clear() takes long time, profiling indicates 'number of bucket
of hash table is large'.

Now, just for sake of experiments, I replaced this 'clear' call with
swap. i.e.

someClass::someFunction()
{

// typedef hash_map<name_id, uint> Mp;
// Mp p;
// assuming proper namespace, hash function for name_id obj.

//p.clear();
Mp tmp;
p.swap(tmp);
}

Now runtime drops significantly, 10 fold less.

What's exactly cause this run time reduction?

I'm not sure why this would be. Are you using SGI's hash_map extension
(which is similar to std::tr1::unordered_map)? Are you measuring the
performance of the entire function including the destructors for
automatic objects like tmp?

Cheers! --M
 

Noah Roberts

mlimber said:
But it is part of TR1 as std::tr1::unordered_map, which qualifies under
FAQ 5.9.

But hash_map is not unordered_map. Read the first part on unordered
containers in TR1 that explains exactly why the name hash_map was _not_
used... because that name is already taken in most implementations by
incompatible and disparate versions of a hash interface.

I doubt that is the topic of discussion; otherwise the OP probably would
have used the TR1 name. More likely they are working with some
implementation-defined interface.
 

Krishanu Debnath

mlimber said:
I'm not sure why this would be. Are you using SGI's hash_map extension
(which is similar to std::tr1::unordered_map)?

Yes. I am using g++ 4.0.2, which adopts SGI's implementation, I believe.

Are you measuring the performance of the entire function including the
destructors for automatic objects like tmp?

Yes.

Krishanu
 

mlimber

Noah said:
But hash_map is not unordered_map. Read the first part on unordered
containers in TR1 that explains exactly why the name hash_map was _not_
used....because that name is already taken in most implementations by
incompatible and disparate versions of a hash interface.

Right, but it is also similar enough to the TR1 container to fall
within the bounds of this group, IMHO (and apparently in that of the
moderator of c.l.c++.m). This question may simply be a QOI issue for a
container that is quite close to TR1's, or it may be an issue of misuse
of the container or faulty measurements. In any of these cases, at
least initially I think it falls within the (fuzzy) boundaries for
this group, whereas if it is an issue of a non-standard container that
is unlike unordered_map but with the same name, then it is almost
certainly outside the bounds of this group. Given the information in
the OP, however, I don't think we can determine for certain which of
these it is, and so ISTM that we should give it the benefit of the
doubt.

Cheers! --M
 

Aaron Graham

I have a call to hash_map::clear() function which takes long time.
[...]
Now, just for sake of experiments, I replaced this 'clear' call with
swap. [...]
Now runtime drops significantly, 10 fold less.
What's exactly cause this run time reduction?

Because the two functions do different amounts of work.

clear() requires time that is linear with respect to the number of
elements in the container. Destructors are called, memory is
deallocated, etc. But swap() can be done by swapping a few pointers.

Aaron
 

Noah Roberts

Aaron said:
I have a call to hash_map::clear() function which takes long time. [...]
Now, just for sake of experiments, I replaced this 'clear' call with
swap. [...]
Now runtime drops significantly, 10 fold less.
What's exactly cause this run time reduction?

Because the two functions do different amounts of work.

clear() requires time that is linear with respect to the number of
elements in the container. Destructors are called, memory is
deallocated, etc. But swap() can be done by swapping a few pointers.

Yes, but it seems that swapping with an empty temporary and then
destroying it would also invoke the same destructors and deallocations
that clear would.
 

Seungbeom Kim

Aaron said:
I have a call to hash_map::clear() function which takes long time. [...]
Now, just for sake of experiments, I replaced this 'clear' call with
swap. [...]
Now runtime drops significantly, 10 fold less.
What's exactly cause this run time reduction?

Because the two functions do different amounts of work.

clear() requires time that is linear with respect to the number of
elements in the container. Destructors are called, memory is
deallocated, etc. But swap() can be done by swapping a few pointers.

That's true for swap() alone, but what about the destructor of the
"temporary" variable (tmp in the OP's code) when it goes out of scoppe?
Doesn't it do the same things, including destruction and deallocation?
 

Carlos Moreno

Aaron said:
clear() requires time that is linear with respect to the number of
elements in the container. Destructors are called, memory is
deallocated, etc. But swap() can be done by swapping a few pointers.

If you look carefully at his code, there should be a
destructor being called, causing all deallocations to
be necessary (the same amount of them).

I'm not sure exactly how he measured things, but if he
calls SomeFunction repeatedly (say, in a loop a few
thousand times to be able to measure with more accuracy),
then each invocation requires the destruction of the map,
which should make it roughly equivalent to the clear()
option (at least comparable execution times, if not
equivalent).

Am I missing something?

Carlos
 

James Kanze

Aaron said:
I have a call to hash_map::clear() function which takes long time. [...]
Now, just for sake of experiments, I replaced this 'clear' call with
swap. [...]
Now runtime drops significantly, 10 fold less.
What's exactly cause this run time reduction?
Because the two functions do different amounts of work.
clear() requires time that is linear with respect to the number of
elements in the container. Destructors are called, memory is
deallocated, etc. But swap() can be done by swapping a few pointers.

But he then destructs the temporary to which he swapped, and
that destruction must call destructors, deallocate the memory,
etc. Because the final results of destruction and clear are
different, it's easy to imagine some difference in performance,
but the 10 fold difference he cites? Seems a bit high to me.
In the g++ implementation I have, the destructor of the
underlying hashtable just calls clear, so his swap should
actually run slightly slower.

If the actual hash table is a local variable in someFunction,
his version with clear actually calls clear() twice on the full
table, once in his explicit call, and once in the destructor.
clear() frees the nodes, but does not reduce the number of
buckets, so there is an iteration over all of the buckets in the
destructor, even though the table is empty. This could result
in some reduction in performance, but I still can't see a
10 fold difference due to it.
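
To make that concrete, here is a hypothetical sketch of the shape of such
a clear() (invented names, assuming a node type with a 'next' pointer;
not the actual libstdc++ source), showing why its cost tracks the bucket
count rather than the element count:

#include <cstddef>
#include <vector>

template <typename Node>
struct hashtable_sketch {
    std::vector<Node*> buckets;      // one singly linked chain per bucket

    void clear() {
        // Walks every bucket, even the empty ones, so an almost-empty
        // table with a huge bucket vector still pays for the full walk.
        for (std::size_t b = 0; b < buckets.size(); ++b) {
            Node* cur = buckets[b];
            while (cur) {
                Node* next = cur->next;
                delete cur;          // destroy element, free node
                cur = next;
            }
            buckets[b] = 0;
        }
        // The bucket vector itself is not shrunk, so the next clear()
        // (or the destructor) walks all of it again.
    }
};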
 

Krishanu Debnath

Carlos said:
If you look carefully at his code, there should be a
destructor being called, causing all deallocations to
be necessary (the same amount of them).

Exactly; in the g++ implementation (I am using g++ 4.0.2) the hashtable
destructor calls the 'clear' function.

I'm not sure exactly how he measured things, but if he
calls SomeFunction repeatedly (say, in a loop a few
thousand times to be able to measure with more accuracy),
then each invokation requires the destruction of the map,
which should make it roughly equivalent to the clear()
option (at least comparable execution times, if not
equivalent).

I used gprof on the whole executable. In theory the 'swap' version should
take more time: 'clear' plus the additional steps for swapping.

Am I missing something?

Join the club. I posted this issue to gnu.g++.help, and there has been no
response so far, as expected. As has happened in the past, I will
eventually have to catch one of the gcc developers by personal email.

Krishanu
 

Aaron Graham

That's true for swap() alone, but what about the destructor of the
"temporary" variable (tmp in the OP's code) when it goes out of scoppe?
Doesn't it do the same things, including destruction and deallocation?

Hmmm... but when I try it that way, using clear() is actually quite a
bit faster than using temporary/swap:

#include <ext/hash_map>

__gnu_cxx::hash_map<int, int> m;

void func1() {
    __gnu_cxx::hash_map<int, int> t;
    t.swap(m);
}

void func2() {
    m.clear();
}

int main(int argc, char** argv) {
    for (int x = 0; x < 10000; x++) {
        for (int y = 0; y < 10000; y++) m[y] = y;
        if (argc == 1) func1();
        else func2();
    }
}

g++ -O2 foo.cc
time ./a.out
20.413u 0.000s 0:20.40 100.0% 0+0k 0+0io 0pf+0w
time ./a.out 2
13.528u 0.016s 0:13.54 99.9% 0+0k 0+0io 0pf+0w


So I'm still not convinced that he's measuring what he thinks he's
measuring. I'm using gcc 4.1.1, by the way.

Aaron
 

wade

Krishanu said:
Hello,

I have a call to hash_map::clear() function which takes long time.

someClass::someFunction()
{

// typedef hash_map<name_id, uint> Mp;
// Mp p;
// assuming proper namespace, hash function for name_id obj.

p.clear();
}

Above p.clear() takes long time, profiling indicates 'number of bucket
of hash table is large'.

Now, just for sake of experiments, I replaced this 'clear' call with
swap. i.e.
[...]
Now runtime drops significantly, 10 fold less.

What's exactly cause this run time reduction?

Note that in the Dinkum implementation (VC7.1, VC8) (if I am
reading it correctly), given a hash_map which currently has N elements,
but in the past had as many as M elements, the costs are

Destructor: O(N) (assumes compiler is smart enough to destroy
vector<iterator> in O(1))

clear(): O(M)

while(p.begin() != p.end()) erase(p.begin()); O(M*N) (quadratic)

So if you have an implementation that uses the Dinkum layout, but
implements clear() with the loop shown above, the swap/destruct
strategy would be much faster than clear(). Even with the Dinkum
clear() code, swap/destruct is a bit faster if the map was previously
much larger than it currently is.

IIRC the SGI versions I've seen did not have this behavior.
 

P.J. Plauger

wade said:
Krishanu said:
Hello,

I have a call to hash_map::clear() function which takes long time.

someClass::someFunction()
{

// typedef hash_map<name_id, uint> Mp;
// Mp p;
// assuming proper namespace, hash function for name_id obj.

p.clear();
}

Above p.clear() takes long time, profiling indicates 'number of bucket
of hash table is large'.

Now, just for sake of experiments, I replaced this 'clear' call with
swap. i.e.
[...]
Now runtime drops significantly, 10 fold less.

What's exactly cause this run time reduction?

Note that in the Dinkum implementation (VC7.1, VC8) (if I am
reading it correctly), given a hash_map which currently has N elements,
but in the past had as many as M elements, the costs are

Destructor: O(N) (assumes compiler is smart enough to destroy
vector<iterator> in O(1))

clear(): O(M)

while(p.begin() != p.end()) erase(p.begin()); O(M*N) (quadratic)

So if you have an implementation that uses the Dinkum layout, but
implements clear() with the loop shown above, the swap/destruct
strategy would be much faster than clear(). Even with the Dinkum
clear() code, swap/destruct is a bit faster if the map was previously
much larger than it currently is.

You're confounding several things here. We implement hash_* as a
list of elements plus a vector of buckets, each characterized by
a list iterator. hash_*::clear() calls list_clear, which destroys
all the elements, then assigns the list end iterator to all elements
of the hash table, which empties all the buckets. So you're talking
O(N) destructor calls plus O(M) iterator assignments, each assignment
generally being small, simple, and hence fast.
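
For illustration, a rough sketch of that layout (invented names, not the
Dinkumware source): all elements live in a single list, and each bucket
is just an iterator into that list.

#include <cstddef>
#include <list>
#include <utility>
#include <vector>

template <typename Key, typename T>
struct dinkum_style_sketch {
    typedef std::list<std::pair<const Key, T> > list_type;

    list_type                                 elements;  // every element, grouped by bucket
    std::vector<typename list_type::iterator> buckets;   // one iterator per bucket

    void clear() {
        elements.clear();                 // O(N) element destructions
        for (std::size_t b = 0; b < buckets.size(); ++b)
            buckets[b] = elements.end();  // O(M) cheap iterator assignments
    }
};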

The swap trick has the one advantage of destroying the vector
instead of reinitializing it -- at least that's an advantage if
the hash table has grown large. But if the clearing time is
dominated by calling nontrivial element destructors, you should
get about the same time either way.
IIRC the SGI versions I've seen did not have this behavior.

Our implementation is quite different from SGI's, and has a number
of advantages.

P.J. Plauger
Dinkumware, Ltd.
http://www.dinkumware.com
 

Krishanu Debnath

James said:
Aaron said:
I have a call to hash_map::clear() function which takes long time. [...]
Now, just for sake of experiments, I replaced this 'clear' call with
swap. [...]
Now runtime drops significantly, 10 fold less.
What's exactly cause this run time reduction?
Because the two functions do different amounts of work.
clear() requires time that is linear with respect to the number of
elements in the container. Destructors are called, memory is
deallocated, etc. But swap() can be done by swapping a few pointers.

But he then destructs the temporary to which he swapped, and
that destruction must call destructors, deallocate the memory,
etc. Because the final results of destruction and clear are
different, it's easy to imagine some difference in performance,
but the 10 fold difference he cites? Seems a bit high to me.
In the g++ implementation I have, the destructor of the
underlying hashtable just calls clear, so his swap should
actually run slightly slower.

If the actual hash table is a local variable in someFunction,
his version with clear actually calls clear() twice on the full
table, once in his explicit call, and once in the destructor.
clear() frees the nodes, but does not reduce the number of
buckets, so there is an iteration over all of the buckets in the
destructor, even though the table is empty. This could result
in some reduction in performance, but I still can't see a
10 fold difference due to it.

Very close. What actually happened is that I am populating and clearing
that hash_map many times (91,679 times in this particular test case), so
each hash_map::clear traverses the same or a greater number of buckets
than the last one. For a particular hash_map::clear it is perfectly
possible that the number of elements in the map is very small but the
number of buckets is very high due to previous calls.

So the swap trick gives you better performance.

clear was taking around 90% of the total execution time. Now compare
g++'s hashtable::clear implementation against the following data.

data for 'clear' version:

hash_bucket.size() #call 845505722
hash_bucket #call 1690828086
_delete node_ #call 838371

data for 'swap' version:

hash_bucket.size() #call 17982146
hash_bucket #call 35780934
_delete node_ #call 838371
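
For illustration, a small standalone sketch of that effect (this uses
std::unordered_map as a stand-in for hash_map; whether clear() keeps the
bucket array is implementation-dependent, so the printed counts will vary
by library):

#include <cstdio>
#include <unordered_map>

int main() {
    std::unordered_map<int, int> m;
    for (int i = 0; i < 1000000; ++i) m[i] = i;   // grow a large bucket array

    m.clear();                                    // elements gone...
    std::printf("after clear: size=%lu buckets=%lu\n",
                (unsigned long)m.size(), (unsigned long)m.bucket_count());

    std::unordered_map<int, int> tmp;
    m.swap(tmp);                                  // hand the buckets to tmp
    std::printf("after swap:  size=%lu buckets=%lu\n",
                (unsigned long)m.size(), (unsigned long)m.bucket_count());
}   // tmp's destructor does the actual cleanup here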

Krishanu
 

James Kanze

Krishanu said:
James said:
Aaron said:
I have a call to hash_map::clear() function which takes long time.
[...]
Now, just for sake of experiments, I replaced this 'clear' call with
swap.
[...]
Now runtime drops significantly, 10 fold less.
What's exactly cause this run time reduction?
Because the two functions do different amounts of work.
clear() requires time that is linear with respect to the number of
elements in the container. Destructors are called, memory is
deallocated, etc. But swap() can be done by swapping a few pointers.
But he then destructs the temporary to which he swapped, and
that destruction must call destructors, deallocate the memory,
etc. Because the final results of destruction and clear are
different, it's easy to imagine some difference in performance,
but the 10 fold difference he cites? Seems a bit high to me.
In the g++ implementation I have, the destructor of the
underlying hashtable just calls clear, so his swap should
actually run slightly slower.
If the actual hash table is a local variable in someFunction,
his version with clear actually calls clear() twice on the full
table, once in his explicit call, and once in the destructor.
clear() frees the nodes, but does not reduce the number of
buckets, so there is an iteration over all of the buckets in the
destructor, even though the table is empty. This could result
Very close. what happened actually I am populating and clearing that
hash_map many times(91679 times in this particular test case). So
every time hash_map::clear actually traversing over same or more number
of buckets than last time. For a particular 'hash_map::clear' it is
perfectly possible that no of elements in map is very small but no of
bucket is very high due to previous call.
So swap tricks give you better performance.

Or not, depending. Allocating new buckets and doing a rehash
isn't free either. If you typically have a fairly large number
of elements, clear could give better performance because after
the first couple of times, you've reached the maximum number of
buckets, and don't have to create any more. (Depending on the
implementation, adding buckets could require a rehash of the
entire table.) On the other hand, if you typically have only 10
or 20 entries, with only rarely a great many, swap will
generally be faster, not only in clearing the table, but
overall, with the table using less memory, etc. In an extreme
case, if the first use has a million entries, and all of the
following vary between 10 and 20, clear could be a disaster; if
you always have exactly N entries, clear will probably be
faster.

As a general rule, unless I definitely want to avoid the
reallocations (often the case with e.g. std::vector), I'll
simply create a new instance each time. But a lot depends on
context.
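
A sketch of that approach (process_one_batch is a hypothetical function
that fills and consumes the map): constructing a fresh map per pass means
no oversized bucket array survives from one pass to the next.

#include <unordered_map>

void process_one_batch(std::unordered_map<int, int>& m);  // hypothetical

void run_all_batches(int batches) {
    for (int i = 0; i < batches; ++i) {
        std::unordered_map<int, int> m;   // fresh, small table each pass
        process_one_batch(m);
    }                                     // m is destroyed here each pass
}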

--
James Kanze (GABI Software) email:[email protected]
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34


[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
 

wade

P.J. Plauger said:
Our implementation is quite different from SGI's, and has a number
of advantages.

No argument there. I was just trying to give examples where for
reasonable implementations, the swap/destruct trick could be
significantly faster than clear().

For your implementation, that should happen only if there are currently
many more buckets than elements (which presumably means that size() is
currently much less than it used to be).

However, an implementation very similar to yours could show quadratic
behavior for clear(). Modify your code so that

1) erase(first,last) doesn't do the optimization where it checks for
clear() semantics.
2) clear() calls erase(begin(),end()).

For such an implementation, clear() would take quadratic time (and I
suspect that would be non-conforming, but I haven't checked) while
swap/destruct would still take linear time.

I doubt this is what the OP ran into, but without knowing his exact
implementation, I'd consider it plausible.
 

wade

James said:
Allocating new buckets and doing a rehash
isn't free either. If you typically have a fairly large number
of elements, clear could give better performance because after
the first couple of times, you've reached the maximum number of
buckets, and don't have to create any more. (Depending on the
implementation, adding buckets could require a rehash of the
entire table.)

In most implementations the total number of hashes that will occur as
the table grows is proportional to the final size (with a small
constant). So by the time you have K elements, you've probably called
the hash function less than 4K times (of course the implementation can
cache the raw hash() result, so hash is only called once for each
element). The array growth is typically pretty cheap (O(n) splice
operations).

The extra allocations just don't matter, unless the final size() is
small. Suppose we do factor-of-two growth, and insert 1000000
elements. If the array is preallocated, we end up doing 1000000
allocations (for the elements). If the array is not preallocated, we
end up doing 1000020 allocations (elements + array growth).

On the other hand, with some implementations, iterating over a map with
many more buckets than elements will be expensive (because iterators
have to examine the empty buckets). With some other implementations,
inserting or erasing elements into a sparse bucket table is expensive
(adjacent empty buckets require updates).

Growing hash maps as needed will sometimes be less efficient by a small
constant factor. Working with sparse hash maps can turn O(1)
operations into O(capacity()/size()) operations. The worst-case for
sparse hash maps is much worse.

From his numbers, it looks like the OP was, on average, calling clear()
on maps that were very sparse (less than 1% full). For a wide range of
implementations, that is a bad thing to do.
 
