std::map efficiency


Kin Pang

Hi,

I have a routine where I'm using a std::map from a pair of ints to double
for caching evaluation results.
The cache size could be up to, say, 1000 entries.
However, my application needs to clear and replenish the cache repeatedly,
and profiling suggests the performance is handicapped by repeated calls
to new.
Can anybody advise me on how to improve the performance, or suggest
alternatives, please?

Thanks in advance.
 

Howard Hinnant

Kin Pang said:
Hi,

I have a routine where I'm using a std::map from a pair of ints to double
for caching evaluation results.
The cache size could be up to, say, 1000 entries.
However, my application needs to clear and replenish the cache repeatedly,
and profiling suggests the performance is handicapped by repeated calls
to new.
Can anybody advise me on how to improve the performance, or suggest
alternatives, please?

Sounds like an ideal application for a "pooling allocator". Create an
allocator that expects only one object to be allocated at a time, but
grabs a handful of them from the system (say 1000) and hands them out
as they are needed. You can then specify that allocator in your map,
and it will use it (a sketch follows below).

Issues you will have to answer:

1. Should the allocator serve only one map, or many?

2. If serving many, are you operating in a multi-threaded environment?
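
If you do write your own, a minimal sketch might start like the
following. This is only a sketch, under the assumptions of
single-threaded use and one pool per allocator instance; the names
PoolAllocator and kBlockSize are mine, not a standard API:

#include <cstddef>
#include <new>
#include <vector>

// Sketch of a one-object-at-a-time pooling allocator (C++98 style).
// Nodes are carved out of 1000-node blocks and recycled via a free
// list, so the system's new is hit once per 1000 insertions.
template <typename T>
class PoolAllocator {
public:
    typedef T              value_type;
    typedef T*             pointer;
    typedef const T*       const_pointer;
    typedef T&             reference;
    typedef const T&       const_reference;
    typedef std::size_t    size_type;
    typedef std::ptrdiff_t difference_type;

    template <typename U> struct rebind { typedef PoolAllocator<U> other; };

    PoolAllocator() : freeList_(0) {}
    template <typename U>
    PoolAllocator(const PoolAllocator<U>&) : freeList_(0) {}
    ~PoolAllocator() {
        for (std::size_t i = 0; i < blocks_.size(); ++i)
            ::operator delete(blocks_[i]);
    }

    pointer allocate(size_type n, const void* = 0) {
        if (n != 1)   // bulk requests bypass the pool
            return static_cast<pointer>(::operator new(n * sizeof(T)));
        if (!freeList_) refill();
        Node* node = freeList_;
        freeList_ = node->next;
        return reinterpret_cast<pointer>(node);
    }
    void deallocate(pointer p, size_type n) {
        if (n != 1) { ::operator delete(p); return; }
        Node* node = reinterpret_cast<Node*>(p);
        node->next = freeList_;   // recycle instead of freeing
        freeList_ = node;
    }
    void construct(pointer p, const T& v) { new (static_cast<void*>(p)) T(v); }
    void destroy(pointer p) { p->~T(); }
    size_type max_size() const { return size_type(-1) / sizeof(T); }

private:
    union Node {                  // big enough for a T or a free-list link
        char  storage[sizeof(T)]; // (alignment is assumed adequate here)
        Node* next;
    };
    enum { kBlockSize = 1000 };   // nodes fetched per call to operator new

    void refill() {
        Node* block = static_cast<Node*>(
            ::operator new(kBlockSize * sizeof(Node)));
        blocks_.push_back(block);
        for (std::size_t i = 0; i < kBlockSize; ++i) {
            block[i].next = freeList_;
            freeList_ = &block[i];
        }
    }

    Node*              freeList_;
    std::vector<Node*> blocks_;   // owned blocks, released in the dtor
};

// Containers may treat allocators of one type as interchangeable;
// with per-instance pools that is a simplification of this sketch.
template <typename T, typename U>
bool operator==(const PoolAllocator<T>&, const PoolAllocator<U>&)
{ return true; }
template <typename T, typename U>
bool operator!=(const PoolAllocator<T>&, const PoolAllocator<U>&)
{ return false; }

The map would then be declared with the allocator as its fourth template
argument; it rebinds the allocator to its internal node type.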

If you don't want to write your own pool allocator, you might check out
Steve Cleary's pool library at Boost (www.boost.org).
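
If I remember right, that library ships a ready-made allocator for
exactly this node-at-a-time pattern; usage would be roughly as below
(a sketch: check the Boost documentation for the exact header and names):

#include <boost/pool/pool_alloc.hpp>
#include <map>
#include <utility>

typedef std::pair<int, int> Key;

// fast_pool_allocator is tuned for one-object-at-a-time allocation,
// which is what a node-based container such as std::map performs.
typedef std::map<Key, double, std::less<Key>,
                 boost::fast_pool_allocator<std::pair<const Key, double> > >
        CachedResults;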

-Howard
 

Cy Edmunds

Kin Pang said:
Hi,

I have a routine where I'm using a std::map from a pair of ints to double
for caching evaluation results.
The cache size could be up to, say, 1000 entries.
However, my application needs to clear and replenish the cache repeatedly,
and profiling suggests the performance is handicapped by repeated calls
to new.
Can anybody advise me on how to improve the performance, or suggest
alternatives, please?

Thanks in advance.


You should probably be looking at a hash table. Binary trees have to be
rebalanced frequently as you construct them, and that can be expensive.
 

Daniel T.

Kin Pang said:
I have a routine where I'm using a std::map from a pair of ints to double
for caching evaluation results.
The cache size could be up to, say, 1000 entries.
However, my application needs to clear and replenish the cache repeatedly,
and profiling suggests the performance is handicapped by repeated calls
to new.
Can anybody advise me on how to improve the performance, or suggest
alternatives, please?

Use a vector< pair< int, int > >. Create a function like:

bool compare(const pair<int, int>& lhs, const pair<int, int>& rhs)
{
    return lhs.first < rhs.first;
}

Check out
<http://ecell.sourceforge.net/doc/refman/html/AssocVector_8h-source.html>
for a class that implements the map interface using a vector as I
describe above. Use it instead of std::map in your program and see if
performance improves.
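
For a sense of what such a class does underneath, a lookup on a
key-sorted vector might be sketched like this (the Entry layout pairs
the key with the cached double; the helper names are mine):

#include <algorithm>
#include <utility>
#include <vector>

typedef std::pair<int, int>    Key;
typedef std::pair<Key, double> Entry;

// Order entries by key only; the double is the cached value.
inline bool entryLess(const Entry& lhs, const Entry& rhs)
{
    return lhs.first < rhs.first;
}

// Binary search in a key-sorted vector; returns 0 when the key is absent.
const double* findCached(const std::vector<Entry>& v, const Key& k)
{
    Entry probe(k, 0.0);
    std::vector<Entry>::const_iterator it =
        std::lower_bound(v.begin(), v.end(), probe, entryLess);
    if (it != v.end() && it->first == k)
        return &it->second;
    return 0;
}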
 

Siemel Naran

Daniel T. said:
Use a vector< pair< int, int > >. Create a function like:

Do you mean vector<pair<pair<int,int>,double> >? That's usually faster
than map for lookup, so it's a good choice when you want to construct a
sorted array once, then use it over and over for lookup and lower bound.
In this case, try to insert all the elements at once, in any order, and
call sort (a sketch follows below). But inserting one element into the
array is expensive, because you have to keep the array in sorted order,
and that takes O(N). But who knows, it still might give better
performance for this problem.
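
In code, the bulk-load pattern might look like this (a sketch;
rebuild and byKey are my names):

#include <algorithm>
#include <utility>
#include <vector>

typedef std::pair<std::pair<int, int>, double> Entry;

inline bool byKey(const Entry& a, const Entry& b)
{
    return a.first < b.first;
}

// Bulk load: append in any order, then sort once.
// O(N log N) in total, versus O(N) per element for sorted insertion.
void rebuild(std::vector<Entry>& cache, const std::vector<Entry>& items)
{
    cache.assign(items.begin(), items.end());
    std::sort(cache.begin(), cache.end(), byKey);
}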
 

Daniel T.

Siemel Naran said:
Do you mean vector<pair<pair<int,int>,double> >?

What would the double be for?

Siemel Naran said:
That's usually faster than map for lookup, so it's a good choice when you
want to construct a sorted array once, then use it over and over for
lookup and lower bound. In this case, try to insert all the elements at
once, in any order, and call sort. But inserting one element into the
array is expensive, because you have to keep the array in sorted order,
and that takes O(N). But who knows, it still might give better
performance for this problem.

He said that profiling suggested the slowdown was in memory allocations,
because the map was constantly being made larger, then smaller. Using
the sorted vector would help that particular problem. With a class that
implements the map interface with a sorted vector and can sort the
vector lazily, the OP can easily switch between the two and see which is
better for any particular problem (a sketch of the lazy-sort idea
follows).
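
A minimal sketch of that lazy-sort idea (class and member names are
mine; duplicate keys and pointer invalidation on add() are ignored).
Note that clear() keeps the vector's capacity, which is exactly what
helps with the repeated clear-and-replenish cycle the OP profiled:

#include <algorithm>
#include <utility>
#include <vector>

// add() just appends and marks the store dirty; the O(N log N) sort
// is deferred until the next lookup().
class LazyVecCache {
public:
    typedef std::pair<int, int> Key;

    LazyVecCache() : dirty_(false) {}

    void add(const Key& k, double v)
    {
        data_.push_back(std::make_pair(k, v));
        dirty_ = true;                      // defer the sort
    }

    const double* lookup(const Key& k)
    {
        if (dirty_) {
            std::sort(data_.begin(), data_.end(), byKey);
            dirty_ = false;
        }
        std::vector<Entry>::const_iterator it =
            std::lower_bound(data_.begin(), data_.end(),
                             std::make_pair(k, 0.0), byKey);
        return (it != data_.end() && it->first == k) ? &it->second : 0;
    }

    void clear() { data_.clear(); dirty_ = false; }  // capacity is kept

private:
    typedef std::pair<Key, double> Entry;
    static bool byKey(const Entry& a, const Entry& b)
    {
        return a.first < b.first;
    }

    std::vector<Entry> data_;
    bool dirty_;
};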
 

Ivan Vecerina

| I have a routine where I'm using a std::map from a pair of ints to
| double for caching evaluation results.
| The cache size could be up to, say, 1000 entries.
| However, my application needs to clear and replenish the cache
| repeatedly, and profiling suggests the performance is handicapped by
| repeated calls to new.
| Can anybody advise me on how to improve the performance, or suggest
| alternatives, please?

As others suggested, some type of hash table is likely to provide
the best results.

Here's a basic example:

double expensiveCompute(int a, int b); // the expensive function

class CachedCompute
{
    struct Entry {
        Entry(int a_, int b_)
          : a(a_), b(b_), result(expensiveCompute(a_, b_)) {}
        int a, b;
        double result;
    };
    enum { kCacheSize = 1024 };
    std::vector<Entry> cache_;

    static int getHashedIndex(int a, int b)
    { // A LOT OF ROOM FOR IMPROVEMENT HERE
        // The cast keeps the modulo non-negative for negative inputs.
        return static_cast<unsigned>(17*a + 0x193*b) % kCacheSize;
    }

public:
    CachedCompute()
      : cache_( kCacheSize, Entry(0,0) ) // init cache with dummy contents
    { }

    double compute(int a, int b)
    {
        Entry& entry = cache_[ getHashedIndex(a,b) ];
        if( entry.a != a || entry.b != b )
            entry = Entry(a,b); // overwrite the non-matching entry
        return entry.result;
    }
};

Create an instance of CachedCompute and use its compute()
member function instead of the global expensiveCompute()
to benefit from the caching.
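
For example:

CachedCompute cc;
double r1 = cc.compute(3, 7); // first call: computed, then cached
double r2 = cc.compute(3, 7); // same arguments: served from the cache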

The getHashedIndex() function should be optimized to avoid collisions
(different values of a and b, used repeatedly, that map to the same
index).
An application-specific optimization is sometimes possible.
Alternatively, consider a proven hashing approach such as the
FNV hash: http://www.isthe.com/chongo/tech/comp/fnv/index.html
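
For illustration, a 32-bit FNV-1a over the bytes of both ints could
serve as a replacement for getHashedIndex() (a sketch; 2166136261 and
16777619 are the published FNV-1a offset basis and prime):

#include <cstddef>

// 32-bit FNV-1a over a byte range.
inline unsigned fnv1a(const unsigned char* p, std::size_t n,
                      unsigned h = 2166136261u)
{
    for (std::size_t i = 0; i < n; ++i) {
        h ^= p[i];
        h *= 16777619u;
    }
    return h;
}

// Possible drop-in replacement for getHashedIndex() above.
inline int getHashedIndex(int a, int b)
{
    unsigned h = fnv1a(reinterpret_cast<const unsigned char*>(&a), sizeof a);
    h = fnv1a(reinterpret_cast<const unsigned char*>(&b), sizeof b, h);
    return static_cast<int>(h % 1024);   // 1024 == kCacheSize
}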

The table size (kCacheSize) can also be tuned (the std::vector is in any
case more memory-efficient than std::map; the saving is probably about
50%).

Another improvement might be to look at multiple hash table
entries when calling compute (e.g. the 4 following ones).
There are various ways to handle table refreshes in this case...
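
A compute() variant probing four consecutive slots might look like this
(a sketch, with a deliberately naive eviction policy):

double compute(int a, int b)
{
    int base = getHashedIndex(a, b);
    for (int k = 0; k < 4; ++k) {
        // Check the hashed slot and the next three, wrapping around.
        Entry& e = cache_[(base + k) % kCacheSize];
        if (e.a == a && e.b == b)
            return e.result;
    }
    Entry& victim = cache_[base];   // naive choice of entry to evict
    victim = Entry(a, b);           // recompute and overwrite
    return victim.result;
}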


Anyway, I would expect the snippet above to provide much better
performance than an approach based on std::map ...
But please let me know of your results.

I hope this helps,
Ivan
 

Kin Pang

Hi,

This is a distilled version of what I'm trying to do (code snippet
guaranteed to compile). I have an Fn which has an operator that I want
to cache.
Fn is used as the basis for a functional algebra for more complex
function expressions.
The benchmark for caching performance is EvalCache, defined below.
I'm comparing the performance of EvalCacheWithVec with EvalCache, using
ideas kindly suggested by Ivan Vecerina.
For my application, I'm finding that EvalCache outperforms
EvalCacheWithVec by a factor of close to two.
In fact, EvalCacheWithVec runs quicker with smaller cache sizes.
I don't really want to move the cache outside of the class Fn.
Any suggestions on how I can improve the performance of EvalCacheWithVec
further?
Thanks in advance.

class Fn
{
private:
    typedef EvalCache<int,int,double> CACHE;
    // typedef EvalCacheWithVec<int,int,double> CACHE;
    SharedPtr<CACHE> _fnCache;

    SharedPtr<blah> _rep;
public:
    double operator()( int date_, int evalDate_ ) const
    {
        //! Check whether the cached value is valid, and use it if so.
        std::pair<int,int> args = std::make_pair(date_, evalDate_);
        const double* cached = _fnCache->lookup(args);
        if (cached == 0) //! i.e. invalid
        {
            double res = (*_rep)(date_, evalDate_);
            _fnCache->add(args, res);
            return res;
        }
        else
        {
            return *cached;
        }
    }
};

template< typename T1, typename T2, typename R >
class EvalCache
{
private:
    typedef std::pair<T1,T2> FNARGS;
    FNARGS _args;
    R _cached;
    bool _valid;

public:
    ~EvalCache() {}
    EvalCache() : _valid(false) {}

    //! A return value of 0 indicates that the cached value cannot be used.
    inline const R* lookup(const FNARGS& args_)
    {
        if (_valid == false) return 0;
        if (_args == args_) return &_cached;
        else return 0;
    }

    inline void add(const FNARGS& args_, const R& res_)
    {
        _args   = args_;
        _cached = res_;
        _valid  = true;
    }

    inline void clear() { _valid = false; }
};



template<typename T1, typename T2, typename R>
class EvalCacheWithVec
{
private:
    typedef std::pair<T1,T2> FNARGS;
    typedef std::pair<FNARGS,R> ENTRY;
    std::vector<ENTRY> _cached;
    R _cachedVal;

    enum { kCacheSize = 64 };
    std::bitset<kCacheSize> _valid;
    bool _noneValid;

    size_t getHashedIndex(T1 a, T2 b)
    { // A LOT OF ROOM FOR IMPROVEMENT HERE
        // Convert before the modulo so negative inputs cannot
        // produce an out-of-range index.
        return static_cast<size_t>(17*a + 0x193*b) % kCacheSize;
    }

public:
    ~EvalCacheWithVec() {}
    EvalCacheWithVec()
      : _cached(kCacheSize),
        _valid(),
        _noneValid(true)
    {}

    //! A return value of 0 indicates that the cached value cannot be used.
    const R* lookup(const FNARGS& args_)
    {
        if (_noneValid == true) return 0;

        T1 arg1(args_.first);
        T2 arg2(args_.second);
        size_t idx = getHashedIndex(arg1, arg2);
        if (_valid[idx] == false) return 0;

        const ENTRY& entry = _cached[idx];
        if (entry.first != args_) return 0;

        _cachedVal = entry.second;
        return &_cachedVal;
    }

    void add(const FNARGS& args_, const R& res_)
    {
        T1 arg1(args_.first);
        T2 arg2(args_.second);
        size_t idx = getHashedIndex(arg1, arg2);
        _cached[idx] = std::make_pair(args_, res_);
        _valid[idx]  = true;
        _noneValid   = false;
    }

    void clear()
    {
        _noneValid = true;
        _valid.reset();
    }
};



Ivan Vecerina said:
| As others suggested, some type of hash table is likely to provide
| the best results. [snip -- full post quoted above]
 

Ivan Vecerina

Hi Kin,

| This is a distilled version of what I'm trying to do (code snippet
| guaranteed to compile).

Actually, it won't compile, since Fn needs to be defined after EvalCache.
It would also be nice to have a complete program, with #include
directives and a main() function, if you would like anyone to actually
benchmark it.
Please also keep in mind that the solution that provides optimal
execution speed will depend on both the sequence of call inputs
and on the computation function that is being cached.
| I have an Fn which has an operator that I want to cache.
| Fn is used as the basis for a functional algebra for more complex
| function expressions.
| The benchmark for caching performance is EvalCache, defined below.
| I'm comparing the performance of EvalCacheWithVec with EvalCache, using
| ideas kindly suggested by Ivan Vecerina.
| For my application, I'm finding that EvalCache outperforms
| EvalCacheWithVec by a factor of close to two.

EvalCache only caches the single last result that has been computed, and
therefore does less work than EvalCacheWithVec.
This hints at one of the following problems:
- the function being cached executes quickly, and there is no benefit
  in caching;
- the calling code repeats the evaluation of the same input parameters
  several times, then moves on to the next;
- your input sequence creates many collisions with the current hashing
  function.

| In fact, EvalCacheWithVec runs quicker with smaller cache sizes.

This could be a sign that memory accesses (cache misses) are limiting
performance. It hints again at a cached function that executes quickly
and might not need to be cached at all (unless input values strongly
conflict with the hash function).

Could you provide a more complete code example, and performance metrics
(including a comparison with no caching at all)?

| I don't really want to move the cache outside of the class Fn.
| Any suggestions on how I can improve the performance of
| EvalCacheWithVec further?
Compared to the code I had posted, two redundancies have been added in
EvalCacheWithVec:
- A flag is kept (in a std::bitset) to indicate whether a cache entry
  is valid or not. Under intensive cache use, this will slow things
  down compared to the original approach of filling the whole cache
  with valid input-output entries.
- When a cache entry is updated, the hashed cache index is computed
  twice: once in lookup(), once in add(). A single-hash variant is
  sketched below.
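
For illustration, folding lookup and add into one member function hashes
only once per evaluation. This is only a sketch against the
EvalCacheWithVec above, and it changes the cache's interface; the
functor parameter f is my addition, since the cache itself does not
know the computing function:

template <typename F>  // F: any functor taking (T1, T2), returning R
const R& lookupOrAdd(const FNARGS& args_, F f)
{
    // The hashed index is computed exactly once per evaluation.
    size_t idx = getHashedIndex(args_.first, args_.second);
    ENTRY& entry = _cached[idx];
    if (_noneValid || !_valid[idx] || entry.first != args_)
    {
        entry = std::make_pair(args_, f(args_.first, args_.second));
        _valid[idx] = true;
        _noneValid  = false;
    }
    return entry.second;
}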

Further performance tuning is very dependent on what your application
specifically does.

Warm regards,
Ivan
 
