Fastest way to clone a hash_map

D

devdude

Hi,

I have the need to take a snapshot of a hash_map during execution
(actually transform it to a vector). This map is a shared resource
and therefore must be locked prior to any read/write operations thus I
need to minimize the amount of time the map resource is locked.

The map is defined as type <string, boost::shared_ptr<myobject>>. My
algorithm is as such:

void SnapShotToVector( vector< pair< string,
boost::shared_ptr<myobject> >& vec )
{

lockResource(this->map);
vec.resize( map.size() );
copy(this->map.begin(), this->map.end(),list.begin());
unlockResource(this->map);
}

For about 3M elements w/in the map, I'm noticing that the resize op
takes about 150ms and the copy takes ~850ms. Is there any way to do
better? I suppose the total time doesn' t matter as it's the time the
resource is actually locked is the primary concern.

TIA
 
M

Marcel Müller

devdude said:
For about 3M elements w/in the map, I'm noticing that the resize op
takes about 150ms and the copy takes ~850ms. Is there any way to do
better? I suppose the total time doesn' t matter as it's the time the
resource is actually locked is the primary concern.

You won't come around to copy the list of elements within the mutex
section. You might do the resize outside the lock, but you risk that a
second more expensive resize is required during the copy.

If you need shorter locks then you have to use a collection with an
iterator that is thread safe in a way that iteration over the mutable
collection gives defined behavior. Directory scans behave like that for
example. But the results are not reproduceable, of course.


Marcel
 
E

Erik Wikström

Hi,

I have the need to take a snapshot of a hash_map during execution
(actually transform it to a vector). This map is a shared resource
and therefore must be locked prior to any read/write operations thus I
need to minimize the amount of time the map resource is locked.

The map is defined as type <string, boost::shared_ptr<myobject>>. My
algorithm is as such:

void SnapShotToVector( vector< pair< string,
boost::shared_ptr<myobject> >& vec )
{

lockResource(this->map);
vec.resize( map.size() );
copy(this->map.begin(), this->map.end(),list.begin());
unlockResource(this->map);
}

For about 3M elements w/in the map, I'm noticing that the resize op
takes about 150ms and the copy takes ~850ms. Is there any way to do
better? I suppose the total time doesn' t matter as it's the time the
resource is actually locked is the primary concern.

You can start by moving the vec.resize() outside the lock (actually you
should probably only have to use a reserve() and not resize()). You can
probably also use a readers/writers lock to allow reads to the map while
copying it, which might be enough if you do not perform to many
modifications on it.

Depending on how you have implemented the hash-map you might be able to
add some kind of copy on write functionality, so that if an element is
added/ remove/changed in the map it will not affect the original map but
a copy, which shares the unmodified elements with the original map.

If the readers/writers lock is not enough you need to either make the
copying faster to do something smart with the map, either way you need
some information about the implementation of the map.
 
D

devdude

You can start by moving the vec.resize() outside the lock (actually you
should probably only have to use a reserve() and not resize()). You can
probably also use a readers/writers lock to allow reads to the map while
copying it, which might be enough if you do not perform to many
modifications on it.

Depending on how you have implemented the hash-map you might be able to
add some kind of copy on write functionality, so that if an element is
added/ remove/changed in the map it will not affect the original map but
a copy, which shares the unmodified elements with the original map.

If the readers/writers lock is not enough you need to either make the
copying faster to do something smart with the map, either way you need
some information about the implementation of the map.

I am actually using a reader/writer lock (are there standard
implementations of r/w locks avail by chance?) but the numbers are
from stress testing using all writes so it doesn't really matter.
hash_map is google's dense_map. moving vec.resize() outside the lock
gives me problems with the copy() -- assertion occurs in the bowels of
stl. Are there strategies to "chunk" the copy through multiple
iterations over the map?
 
E

Erik Wikström

Please do not quote signatures. And by the way, multi-posting is
generally not a good idea, if you have to post in multiple groups you
should cross-post (that way we will see what others have to say).
I am actually using a reader/writer lock (are there standard
implementations of r/w locks avail by chance?) but the numbers are
from stress testing using all writes so it doesn't really matter.
hash_map is google's dense_map. moving vec.resize() outside the lock
gives me problems with the copy() -- assertion occurs in the bowels of
stl.

Probably because the size of the map changes between the resize and the
copying and you iterate past the end of the vector. You should insert
the elements instead of assigning them, try using reserve()and an insert
iterator.

Notice that if the size of the map increases between the reserve and the
copying the vector might have to reallocate which will cost a lot, but
since reserve() is quite cheap you can probably do it inside the lock.
If that does not work you might try a deque, insertions at the end are
O(1), and not amortised O(1) like the vector, and you never have to
reallocate.
Are there strategies to "chunk" the copy through multiple
iterations over the map?

Do you mean to only copy parts of the map and then wait a bit and copy
the next part? You probably could but the results would be quite
meaningless since some elements will be added after you copy that part,
and others will be deleted.

Have you tried to create a copy of the map inside the lock, and copy to
the vector outside it?
 
D

devdude

Have you tried to create a copy of the map inside the lock, and copy to
the vector outside it?

If you mean:

hash_map<T1,T2> newMap (oldMap);

Then I have tried this and the copy const. takes 2+ secs alone to
perform. Is there something faster?
 
J

James Kanze

I have the need to take a snapshot of a hash_map during
execution (actually transform it to a vector). This map is a
shared resource and therefore must be locked prior to any
read/write operations thus I need to minimize the amount of
time the map resource is locked.
The map is defined as type <string,
boost::shared_ptr<myobject>>. My algorithm is as such:
void SnapShotToVector( vector< pair< string,
boost::shared_ptr<myobject> >& vec )
{
lockResource(this->map);
vec.resize( map.size() );
copy(this->map.begin(), this->map.end(),list.begin());
unlockResource(this->map);
}

(Just a nit, but you really should use some sort of scoped lock
here. Both the resize and the copy can throw an exception,
leaving your resource forever locked.)
For about 3M elements w/in the map, I'm noticing that the
resize op takes about 150ms and the copy takes ~850ms. Is
there any way to do better? I suppose the total time doesn' t
matter as it's the time the resource is actually locked is the
primary concern.

There are really only two solutions: what you've done, and:

vec.reserve( map.size() ) ;
std::copy( map.begin(), map.end(),
std::back_inserter( vec ) );

Which one will be faster will depend on the implementation of
std::vector and std::string, so you'll have to measure. But
you're not going to get any order of magnitude gains.

You might also gain a little by resizing once before locking
(with perhaps a fudge factor, to allow for the map growing while
you're resizing):

size_t trialSize ;
lockResource( map ) ;
trialSize = 1.2 * map.size() ) ;
unlockResource( map ) ;
vec.resize( trialSize ) ;
lockResource( map ) ;
vec.resize( map.size() ) ;
copy( map.begin(), map.end(), vec.begin() ) ;
unlockResource( map ) ;

The resize in the locked zone should only change the size a
little, and hopefully will not increase it, so you'll only call
a few destructors.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
474,432
Messages
2,571,682
Members
48,796
Latest member
Greg L.

Latest Threads

Top