Generic function Memo-izer template class

R

rep_movsd

Hi folks

I was on topcoder and came across an interesting problem...

It involved dynamic programming ( storing function results in a map to
avoid repeated computation ) and I ended up having to write 3 functions
that looked pretty much the same, so I thought- "Why not template - ize
the whole thing" and ended up with a generic Memoizer template, which
can be used to memoize any recursive function whose result depends on
multiple recursive invocations of itself.

Most basic case of such a fn is fibonacci

int fib(int n)
{
return n < 2 ? n : fib(n-1) + fib(n -2);
}

this highly stack consuming function will take eternity to compute
fib(50)

a standard memoization would be like

#include <map>
using std::map;
map<int, int> fib_memo;
int fibm(int n)
{
map<int,int>::iterator fi = fib_memo.find(n);
if(fi == fib_memo.end())
{
return (fib_memo[n] = fibm(n - 1) + fibm(n - 2));
}
else
return fi->second;
}

but its tedious to do this for every function... what i came up with
lets u express the core computation clearly in the form of helper
classes and get the template class to memoize it for you automatically
thusly :

// define 3 classes needed by the template
class FibCalc
{
public:
void apply(int a, pair<bool, int>& ret)
{
ret.first = a < 2; // fib 0 , 1 => 0, 1
ret.second = a;
}
};

class FibGetNextIterArgs
{
public:
void apply(int arg, vector<int>& rets)
{
rets.push_back(arg - 1);
rets.push_back(arg - 2);
}
};

class FibCalcNextIter
{
public:
int apply(vector<int>& rets)
{
return rets[0] + rets[1];
}
};

and call it like this:

Memoize<int, int, FibCalc, FibGetNextIterArgs, FibCalcNextIter> fibo;
int i = 1000;
cout << fibo.apply(i);

the Memoize definition is below:

template<typename RET, typename ARG, class CALC, class GETNEXTITERARGS,
class CALCNEXTITER>
class Memoize
{
public:
CALC m_Calc;
GETNEXTITERARGS m_NextIterArgs;
CALCNEXTITER m_CalcNextIter;
map<ARG, RET> m_Memo;
RET apply(ARG &a)
{
pair<bool, RET> r;
m_Calc.apply(a, r);
if(r.first)
return (m_Memo[a] = r.second);

map<ARG, RET>::iterator mi = m_Memo.find(a);
if(mi != m_Memo.end())
{
return m_Memo[a];
}
else
{
vector<ARG> nextIterArgs;
m_NextIterArgs.apply(a, nextIterArgs);

vector<RET> nextIterRets;
nextIterRets.reserve(nextIterArgs.size());
for(size_t i = 0; i < nextIterArgs.size(); ++i)
{
nextIterRets.push_back(apply(nextIterArgs));
}

return (m_Memo[a] = m_CalcNextIter.apply(nextIterRets));
}
}
};


Voila ... Auto memoization for functions :D ( I beleive haskell has
that )

If any of you gurus have an idea about improving this tell me.

One immediate thing I can see is to have 1 class instead of 3 and have
3 members in it, maybe make it a struct to get rid of that "public:"
I'm adding it to my topcoder bag of tricks...

Thanks
Vivek
 
A

Alan Johnson

rep_movsd said:
Hi folks

I was on topcoder and came across an interesting problem...

It involved dynamic programming ( storing function results in a map to
avoid repeated computation ) and I ended up having to write 3 functions
that looked pretty much the same, so I thought- "Why not template - ize
the whole thing" and ended up with a generic Memoizer template, which
can be used to memoize any recursive function whose result depends on
multiple recursive invocations of itself.

Most basic case of such a fn is fibonacci

int fib(int n)
{
return n < 2 ? n : fib(n-1) + fib(n -2);
}

this highly stack consuming function will take eternity to compute
fib(50)

a standard memoization would be like

#include <map>
using std::map;
map<int, int> fib_memo;
int fibm(int n)
{
map<int,int>::iterator fi = fib_memo.find(n);
if(fi == fib_memo.end())
{
return (fib_memo[n] = fibm(n - 1) + fibm(n - 2));
}
else
return fi->second;
}

but its tedious to do this for every function... what i came up with
lets u express the core computation clearly in the form of helper
classes and get the template class to memoize it for you automatically
thusly :

// define 3 classes needed by the template
class FibCalc
{
public:
void apply(int a, pair<bool, int>& ret)
{
ret.first = a < 2; // fib 0 , 1 => 0, 1
ret.second = a;
}
};

class FibGetNextIterArgs
{
public:
void apply(int arg, vector<int>& rets)
{
rets.push_back(arg - 1);
rets.push_back(arg - 2);
}
};

class FibCalcNextIter
{
public:
int apply(vector<int>& rets)
{
return rets[0] + rets[1];
}
};

and call it like this:

Memoize<int, int, FibCalc, FibGetNextIterArgs, FibCalcNextIter> fibo;
int i = 1000;
cout << fibo.apply(i);

the Memoize definition is below:

template<typename RET, typename ARG, class CALC, class GETNEXTITERARGS,
class CALCNEXTITER>
class Memoize
{
public:
CALC m_Calc;
GETNEXTITERARGS m_NextIterArgs;
CALCNEXTITER m_CalcNextIter;
map<ARG, RET> m_Memo;
RET apply(ARG &a)
{
pair<bool, RET> r;
m_Calc.apply(a, r);
if(r.first)
return (m_Memo[a] = r.second);

map<ARG, RET>::iterator mi = m_Memo.find(a);
if(mi != m_Memo.end())
{
return m_Memo[a];
}
else
{
vector<ARG> nextIterArgs;
m_NextIterArgs.apply(a, nextIterArgs);

vector<RET> nextIterRets;
nextIterRets.reserve(nextIterArgs.size());
for(size_t i = 0; i < nextIterArgs.size(); ++i)
{
nextIterRets.push_back(apply(nextIterArgs));
}

return (m_Memo[a] = m_CalcNextIter.apply(nextIterRets));
}
}
};


Voila ... Auto memoization for functions :D ( I beleive haskell has
that )

If any of you gurus have an idea about improving this tell me.

One immediate thing I can see is to have 1 class instead of 3 and have
3 members in it, maybe make it a struct to get rid of that "public:"
I'm adding it to my topcoder bag of tricks...

Thanks
Vivek


Using map to store the history bothers me a little, just because it
makes the runtime different (worse) than what many would expect for
some algorithms. Using Fibonacci as an example, you should be able to
calculate each successive member of the sequence in constant time
(assuming constant time operations on integers), so computing the first
N members of the sequence should require O(N) time.

Storing the results in a map, though, only gives you a guarantee that
you'll be able to lookup elements in O(lg N) time, so computing the
first N members of the sequence has time in O(N lg N).

I assume you chose a map to conserve space in algorithms where not
every index is used. I would suggest somehow expanding your template
to allow the client to decide the space/speed trade off, perhaps by
supplying a container type as a template parameter.
 
R

rep_movsd

Alan said:
Using map to store the history bothers me a little, just because it
makes the runtime different (worse) than what many would expect for
some algorithms. Using Fibonacci as an example, you should be able to
calculate each successive member of the sequence in constant time
(assuming constant time operations on integers), so computing the first
N members of the sequence should require O(N) time.

While a tail recursive or iterative fibonacci can be written to run in
O(n) time, that is not the point of this.

The point is that there are some algorithms which are much easily
expressed in a recursive fashion and this technique helps to express
the solution in a recursive paradigm and still get solutions which do
not have exponential space and time requirements.

In such cases, the invocation of the function with argument x would
depend on N number of recursive invocations with x1 to xN.
In such situations, memoization is the only way to beat the clock.
Haskell is one language where functions are memoized automatically.

Take for example this function to calculate the number of possible k
digit numbers which consist of the digits 1 to n and where the digits
are all in non-ascending order :

eg for k = 2 , n = 3

11
21
22
31
32
33

are all non ascending.


Heres the raw recursive implementation

int getNonDescCombos(int k, int n)
{
if(k == 1)
return n;

int total = 0;
for(int i = 1; i <= n; ++i)
{
total += getNonDescCombos(k - 1, i);
}
return total;
}

While its working is clear, No prizes for guessing that this will
overflow the stack for even modest values of n and k.

now look at the memoized version :

map<pair<int, int>, int> nonDescCombos;
int getNonDescCombos(pair<int, int> arg)
{
int k = arg.first;
int n = arg.second;
if(k == 1)
return n;

map<pair<int, int>, int>::iterator mi = nonDescCombos.find(arg);
if(mi == nonDescCombos.end())
{
int total = 0;
for(int i = 1; i <= n; ++i)
{
total += getNonDescCombos(make_pair(k - 1, i));
}
return (nonDescCombos[arg] = total);
}
else
{
return mi->second;
}
}

getNonDescCombos( make_pair(100, 9) );
Runs with good space and time charecteristics, but is much more
complicated and the logic is more opaque.

now generalized with memoizer template...

class GetDescCombosCalc
{
public:
void apply(pair<int, int> &p, pair<bool, int>& ret)
{
ret.first = p.second == 1;
ret.second = p.first;
}
};

class GetDescCombosGetNextIterArgs
{
public:
void apply(pair<int, int>& arg, vector<pair<int, int> >& rets)
{
for(int i = 1; i <= arg.first; ++i)
{
rets.push_back(make_pair(i, arg.second - 1));
}
}
};

class GetDescCombosCalcNextIter
{
public:
int apply(vector<int>& rets)
{
int total = 0;
for(size_t i = 0; i < rets.size(); ++i)
{
total += rets;
}

return total;
}
};

Definitely more verbose, but the logic of the recursive version is
preserved...
the first class does the job of calculating the simplest cases..
the second gives a collection of values (x1 to xN) for which the
function has to be computed before computing f(x).
The last one computes f(x) given f(x1) to f(xN) computed from the
previous results.
A large number of algorithms can readily be genralized to this pattern
with clear logic.

Finally the invocation becomes as simple as....

Memoize<int, pair<int, int>, GetDescCombosCalc,
GetDescCombosGetNextIterArgs, GetDescCombosCalcNextIter> m;
int n = 100;
int k = 29;
m.apply(make_pair(k, n));
Storing the results in a map, though, only gives you a guarantee that
you'll be able to lookup elements in O(lg N) time, so computing the
first N members of the sequence has time in O(N lg N).
I assume you chose a map to conserve space in algorithms where not
every index is used. I would suggest somehow expanding your template
to allow the client to decide the space/speed trade off, perhaps by
supplying a container type as a template parameter.

One option would be to use a hash_map maybe, and anyway the Memoize
class is opaque so changing the implementation would make no difference
to the code for the computation, as long as the logic were same the
intermediate results could be stored any which way.

Regards
Vivek
 
G

Greg Buchholz

rep_movsd said:
Hi folks

I was on topcoder and came across an interesting problem...

It involved dynamic programming ( storing function results in a map to
avoid repeated computation ) and I ended up having to write 3 functions
that looked pretty much the same, so I thought- "Why not template - ize
the whole thing" and ended up with a generic Memoizer template, which
can be used to memoize any recursive function whose result depends on
multiple recursive invocations of itself.

// You might also like to try something like a memozied fixed point
// combinator. ( http://en.wikipedia.org/wiki/Y_combinator )

#include<iostream>
#include<map>

std::map<int,int> memo_table;

struct Fix
{
int (*f)(int,Fix);

int operator()(int x, Fix g)
{
return (memo_table.find(x) != memo_table.end())
? memo_table[x]
: memo_table[x] = g.f(x,g);
};
};

int fib(int n, Fix f) { return n < 2 ? n : f(n-1,f) + f(n-2,f); }

int main(int argc, char *argv[])
{
Fix g = {fib};
std::cout << g(40,g) << std::endl;
return 0;
}
 
A

Alf P. Steinbach

* Greg Buchholz:
#include<iostream>
#include<map>

std::map<int,int> memo_table;

struct Fix
{
int (*f)(int,Fix);

int operator()(int x, Fix g)
{
return (memo_table.find(x) != memo_table.end())
? memo_table[x]
: memo_table[x] = g.f(x,g);
};
};

int fib(int n, Fix f) { return n < 2 ? n : f(n-1,f) + f(n-2,f); }

int main(int argc, char *argv[])
{
Fix g = {fib};
std::cout << g(40,g) << std::endl;
return 0;
}

What's the point of the argument passing when Fix uses a global table?
Fix is just a bug waiting to happen. E.g., after using Fix g = {fib},
try using Fix h = {identityFunction}.

And what's the point of caching (which for some reason a lot of people
now call "memoization") when that involves completely rewriting the
function, which can be much more easily rewritten to be efficient?

Finally, what's the point of filling in the cache at run-time when that
will result in all values up to (n, f(n)) being placed in the cache?
 
R

rep_movsd

What's the point of the argument passing when Fix uses a global table?
Fix is just a bug waiting to happen. E.g., after using Fix g = {fib},
try using Fix h = {identityFunction}.

And what's the point of caching (which for some reason a lot of people
now call "memoization") when that involves completely rewriting the
function, which can be much more easily rewritten to be efficient?

Well you cannot express some algorithms iteratively no matter how
cleverly you code, you will end up using an explicit stack.
Finally, what's the point of filling in the cache at run-time when that
will result in all values up to (n, f(n)) being placed in the cache?

Well the whole point of memoization is that it forms the base of
"dynamic" programming problems where divide-and-conquer changes to
divide-and-conquer-memoize-to-avoid-reconquer.

I humbly suggest http://en.wikipedia.org/wiki/Dynamic_programming

Regards
Vivek
 
A

Alf P. Steinbach

* rep_movsd:
Well you cannot express some algorithms iteratively no matter how
cleverly you code, you will end up using an explicit stack.
So?



Well the whole point of memoization is that it forms the base of
"dynamic" programming problems where divide-and-conquer changes to
divide-and-conquer-memoize-to-avoid-reconquer.

Again, so? AFAICS, again there's no relevance to my question.


Well.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,755
Messages
2,569,536
Members
45,012
Latest member
RoxanneDzm

Latest Threads

Top