Function decorator that caches function results

  • Thread starter Lasse Vågsæther Karlsen

Lasse Vågsæther Karlsen

After working through a fair number of the challenges at
www.mathschallenge.net, I noticed that some long-running functions can
be helped *a lot* by caching their function results and retrieving from
cache instead of calculating again. This means that often I can use a
natural recursive implementation instead of unwinding the recursive
calls to avoid big exponential running-times.

Ok, so I thought, how about creating a decorator that caches the
function results and retrieves them from cache if possible; otherwise it
calls the function and stores the value in the cache for the next invocation.

This is what I came up with so far:

def cache_function(fn):
    cache = {}
    def cached_result(*args, **kwargs):
        if args in cache:
            return cache[args]
        result = fn(*args, **kwargs)
        cache[args] = result
        return result
    return cached_result

Example of usage:

@cache_function
def fibonacci(idx):
    if idx in (1, 2):
        return 1
    else:
        return fibonacci(idx-1) + fibonacci(idx-2)

for index in range(1, 101):
    print index, fibonacci(index)

this example goes from taking exponential time to run to being near
instantaneous.

However, this:

for index in range(1, 101):
    print index, fibonacci(idx = index)

this one uses the kwargs list of arguments, and I'm not sure how to
change my function to take that into account.

Does anyone have any clues as to how to do that?

The solution would have to consider fibonacci(50) and fibonacci(idx =
50) as the same call and thus retrieve the second one from the cache.
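For later readers: in modern Python (3.3+, long after this thread), the two call styles can be normalized into one cache key with inspect.signature — a sketch of the idea, not what was available in 2005:

```python
import functools
import inspect

def cache_function(fn):
    cache = {}
    sig = inspect.signature(fn)

    @functools.wraps(fn)
    def cached_result(*args, **kwargs):
        # Bind positional and keyword arguments to parameter names, so
        # fn(50) and fn(idx=50) produce the same canonical key.
        bound = sig.bind(*args, **kwargs)
        bound.apply_defaults()
        key = tuple(sorted(bound.arguments.items()))
        if key not in cache:
            cache[key] = fn(*args, **kwargs)
        return cache[key]

    return cached_result

@cache_function
def fibonacci(idx):
    if idx in (1, 2):
        return 1
    return fibonacci(idx - 1) + fibonacci(idx - 2)
```

With this, fibonacci(100) and fibonacci(idx=100) hit the same cache entry.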
 
Duncan Booth

Lasse said:
However, this:

for index in range(1, 101):
    print index, fibonacci(idx = index)

this one uses the kwargs list of arguments, and I'm not sure how to
change my function to take that into account.

Does anyone have any clues as to how to do that?

The solution would have to consider fibonacci(50) and fibonacci(idx =
50) as the same call and thus retrieve the second one from the cache.

Most of your gain comes from caching the recursive calls, not from caching
the original call.

If the solution doesn't consider keyword arguments equivalent to positional
arguments, it will make *one* call to the real function, and then the
recursive call will still come out of the cache, so the net result will
still be to give you almost all the speedup you wanted.

Unless the results stored in the cache are very large data structures, I
would suggest that you simply store (args, kwargs) as the cache key and
accept the hit that sometimes you'll cache the same call multiple times.
 
jepler

Unless the results stored in the cache are very large data structures, I
would suggest that you simply store (args, kwargs) as the cache key and
accept the hit that sometimes you'll cache the same call multiple times.

... except that dicts cannot be dict keys

Another 'memoize' decorator uses this to get the key:
kw = kwargs.items()
kw.sort()
key = (args, tuple(kw))
http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/325905

Jeff
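Spelling out the key construction from that recipe as a complete decorator (a Python 3 rendering using sorted(); the sort makes the key independent of keyword order, and the tuple() makes the kwargs part hashable):

```python
def memoize(fn):
    cache = {}
    def wrapper(*args, **kwargs):
        # sorted() makes the key independent of keyword order;
        # tuple() converts the unhashable list of items into a valid key.
        key = (args, tuple(sorted(kwargs.items())))
        if key not in cache:
            cache[key] = fn(*args, **kwargs)
        return cache[key]
    return wrapper

@memoize
def add(a, b=0):
    return a + b
```

Note that, as Lasse observes below, add(1, 2) and add(1, b=2) still occupy two separate cache slots with this scheme.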

 
Sam Pointon

What about not storing args at all? Something like this:

def cache_function(func, args_list):
    cache = {}
    def cached_result(*args, **kwargs):
        kwargs.update(dict(zip(args_list, args)))
        if kwargs in cache:
            return cache[kwargs]
        result = func(**kwargs)
        cache[kwargs] = result
        return result
    return cached_result

args_list is a list of all the argument names, so that they can be
converted into keyword arguments.
 
Lasse Vågsæther Karlsen

... except that dicts cannot be dict keys

Another 'memoize' decorator uses this to get the key:
kw = kwargs.items()
kw.sort()
key = (args, tuple(kw))
http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/325905

Jeff

Yeah, but as far as I can see it, this one too fails to recognize
situations where the function is called twice with essentially the same
values, except that in one call it uses named arguments:

k1 = fibonacci(100)
k2 = fibonacci(idx = 100)

this is essentially the same call, except the second one uses a named
argument, which means the function will be invoked and a second cache
entry will be stored.

Granted, not a big problem in most such cases, but here's my augmented
function. Bear in mind that I'm 2 weeks into Python so there's bound to
be room for improvement :)

import inspect

def cache(fn):
    cache = {}
    arg_names = inspect.getargspec(fn)[0]
    def cached_result(*args, **kwargs):
        # If function is called without parameters, call it without
        # using the cache
        if len(args) == 0 and len(kwargs) == 0:
            return fn()

        # Work out all parameter names and values
        values = {}
        for i in range(len(args)):
            values[arg_names[i]] = args[i]
        for key in kwargs:
            values[key] = kwargs[key]
        key = tuple([(key, value) for (key, value) in
                     sorted(values.iteritems())])

        # Check cache and return cached value if possible
        if key in cache:
            return cache[key]

        # Work out new value, cache it and return it
        result = fn(*args, **kwargs)
        cache[key] = result
        return result

    # Return wrapper function
    return cached_result
 
Lasse Vågsæther Karlsen

Sam said:
What about not storing args at all? Something like this:

def cache_function(func, args_list):
    cache = {}
    def cached_result(*args, **kwargs):
        kwargs.update(dict(zip(args_list, args)))
        if kwargs in cache:
            return cache[kwargs]
        result = func(**kwargs)
        cache[kwargs] = result
        return result
    return cached_result

args_list is a list of all the argument names, so that they can be
converted into keyword arguments.

I'll take a look at the zip function, didn't know about that one.
Your example also has the problem that dictionaries can't be used as
dictionary keys, but that can be easily solved. I think I can simplify
my solution with some of yours.

The parameter names can be obtained with inspect.getargspec(fn)[0],
so that can be done by the decorator function.
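(A note for later readers: inspect.getargspec was deprecated and finally removed in Python 3.11; inspect.signature gives the same parameter names today.)

```python
import inspect

def example(idx, scale=2):
    return idx * scale

# Equivalent of the old inspect.getargspec(example)[0]
arg_names = list(inspect.signature(example).parameters)
```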
 
Fredrik Lundh

Lasse said:
Yeah, but as far as I can see it, this one too fails to recognize
situations where the function is called twice with essentially the same
values, except that in one call it uses named arguments:

k1 = fibonacci(100)
k2 = fibonacci(idx = 100)

this is essentially the same call, except the second one uses a named
argument, which means the function will be invoked and a second cache
entry will be stored.

whoever writes code like that deserves to be punished.

I'd say this thread points to a misunderstanding of what keyword arguments
are, and how they should be used. the basic rule is that you shouldn't mix and
match; use positional arguments for things that are documented to be positional
parameters, and keyword arguments for keyword parameters. if you don't,
you'll end up with code that depends on implementation details. and that's
never a good idea...

</F>
 
Lasse Vågsæther Karlsen

<snip>

Ok, here's my updated version:

import inspect
import time

class cache(object):
    def __init__(self, timeout=0):
        self.timeout = timeout
        self.cache = {}

    def __call__(self, fn):
        arg_names = inspect.getargspec(fn)[0]
        def cached_result(*args, **kwargs):
            # Update named arguments with positional argument values
            kwargs.update(dict(zip(arg_names, args)))

            # Work out key as a tuple of ('argname', value) pairs
            key = tuple(sorted(kwargs.items()))

            # Check cache and return cached value if possible
            if key in self.cache:
                (value, last_time) = self.cache[key]
                if self.timeout <= 0 or time.time() - last_time <= self.timeout:
                    return value

            # Work out new value, cache it and return it
            result = fn(**kwargs)
            self.cache[key] = (result, time.time())
            return result

        # Return wrapper function
        return cached_result

Changed from previous versions:
- converted to class, must use () on decorator now
- added timeout, results will be recalculated when it expires
- timeout=0 means no timeout, results will always be reused
- now handles both positional and keyword arguments
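For later readers, here is a Python 3 adaptation of this class (inspect.signature in place of the removed getargspec, otherwise the same logic) together with a usage example showing that positional and keyword calls share one cache entry — a sketch, not the original 2005 code:

```python
import inspect
import time

class cache(object):
    def __init__(self, timeout=0):
        self.timeout = timeout  # seconds; 0 means entries never expire
        self.cache = {}

    def __call__(self, fn):
        arg_names = list(inspect.signature(fn).parameters)
        def cached_result(*args, **kwargs):
            # Fold positional arguments into kwargs so f(4) and f(n=4)
            # produce the same key.
            kwargs.update(dict(zip(arg_names, args)))
            key = tuple(sorted(kwargs.items()))
            if key in self.cache:
                value, last_time = self.cache[key]
                if self.timeout <= 0 or time.time() - last_time <= self.timeout:
                    return value
            result = fn(**kwargs)
            self.cache[key] = (result, time.time())
            return result
        return cached_result

calls = []

@cache(timeout=0)
def slow_square(n):
    calls.append(n)  # record real invocations to show the cache works
    return n * n
```

After slow_square(4) and slow_square(n=4), the underlying function has run only once.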
 
Lasse Vågsæther Karlsen

Fredrik Lundh wrote:
whoever writes code like that deserves to be punished.

I'd say this thread points to a misunderstanding of what keyword arguments
are, and how they should be used. the basic rule is that you shouldn't mix and
match; use positional arguments for things that are documented to be positional
<snip>

I agree completely.
However, it turns out I have to exchange code with people who follow a
coding standard that dictates using keyword arguments for all
interfaces. Those functions will also benefit from the cache system,
as many of them involve database lookups.

In any case, your response gave me another thing that my solution won't
handle so I'm going to just leave it as it is and look at it if I ever
need it for positional arguments, which I honestly don't believe I will.
 
George Sakkis

Lasse Vågsæther Karlsen said:
[snip]

Ok, so I thought, how about creating a decorator that caches the
function results and retrieves them from cache if possible; otherwise it
calls the function and stores the value in the cache for the next invocation.

[snip]

Cool, you re-invented the memoization pattern:
http://en.wikipedia.org/wiki/Memoization
http://aspn.activestate.com/ASPN/search?query=memoize&x=0&y=0&section=PYTHONCKBK&type=Subsection

Yes, it's kinda discouraging that most interesting ideas have already been conceived, implemented
and used by others...<wink>

George
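As a postscript for later readers: the standard library eventually shipped this exact pattern as functools.lru_cache (added in Python 3.2). Note that it, like the recipes above, keys positional and keyword calls separately:

```python
from functools import lru_cache

@lru_cache(maxsize=None)  # unbounded cache, like the hand-rolled versions
def fibonacci(idx):
    if idx in (1, 2):
        return 1
    return fibonacci(idx - 1) + fibonacci(idx - 2)
```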
 
Lasse Vågsæther Karlsen

George Sakkis wrote:
Cool, you re-invented the memoization pattern:
http://en.wikipedia.org/wiki/Memoization
http://aspn.activestate.com/ASPN/search?query=memoize&x=0&y=0&section=PYTHONCKBK&type=Subsection

Yes, it's kinda discouraging that most interesting ideas have already been conceived, implemented
and used by others...<wink>
<snip>

I know, I've been scouring over the ASPN recipes and digging through
code like there's no tomorrow.

But, I don't view it as pointless even though there are existing
implementations and solutions out there.

First of all, I don't like to stuff a lot of what is obviously library
type of code into a project, unless I can reference a library that got
that function or class or whatnot. Creates a rather big maintenance
nightmare :)

So, I would have to stuff that into a library, which is what I did with
my "own" function (thank you for helping btw). The recipe on ASPN is
probably better than what I got, but... I understand how my solution
came about and what makes it tick.

Secondly, I've been "programming" Python for, what, about 11 days now or
so, so I want to re-implement as much as possible right now, even to the
point where I create a worse solution than an existing one, as long as
it works for me, just to be able to learn the nuances of Python, because
Python is ... different than what I'm used to.

For instance, in C# and .NET you have attributes, but they don't actually
do anything on their own; you can't tag a method and have the operation
of that method deviate from a similar method without the attribute,
unless you pick one of the attributes the compiler knows about. It's
just meta-data that sits silent until some other method goes looking
for it.

In Python I've now learned that a function is just an object like
everything else and I can wrap a new object around it to modify its
behaviour, and I can use the decorator pattern to do it.

I'm probably going to be back here in a few days or even hours with
another "task" where you can probably cough up dozens of existing source
code solutions that I could use.

For instance, there's this thing I've heard of called the "wheel".....

:)
 
Paul Rubin

Steven D'Aprano said:
def cache_function(fn):
    cache = {}
    def cached_result(*args, **kwargs):
        if args in cache:
            return cache[args]
        result = fn(*args, **kwargs)
        cache[args] = result
        return result
    return cached_result

I'm curious... where does cache live after you use cache_function to
memoize some function? It doesn't appear to be an attribute of the newly
memoized function, nor does it look like a global variable.

It's in the closure returned by cache_function. cached_result doesn't
rebind 'cache' (it only changes the boxed content), so Python looks it
up in the enclosing scope, and you can get away with that. You couldn't
do something like:

def counter():
    n = 0
    def k():
        n += 1 # rebinds n, so Python thinks n is local, oops
        return n
    return k

Since k changes the value of n, Python thinks n is local to k, and
you get a NameError when you try to increment it the first time.
That you can't set the value of a closure variable is something of a
Python wart and is one of the things that makes me want a "local"
declaration in Python.

Closures are a Scheme idiom and Pythonistas tend to use class
instances instead, but both techniques are useful.
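(That declaration did eventually arrive: Python 3.0's nonlocal statement, PEP 3104, makes exactly this counter work.)

```python
def counter():
    n = 0
    def k():
        nonlocal n  # rebind n in the enclosing scope, not a new local
        n += 1
        return n
    return k

c = counter()
```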
 
Steven D'Aprano

Ok, so I thought, how about creating a decorator that caches the
function results and retrieves them from cache if possible; otherwise it
calls the function and stores the value in the cache for the next invocation.

This is what I came up with so far:

def cache_function(fn):
    cache = {}
    def cached_result(*args, **kwargs):
        if args in cache:
            return cache[args]
        result = fn(*args, **kwargs)
        cache[args] = result
        return result
    return cached_result

I'm curious... where does cache live after you use cache_function to
memoize some function? It doesn't appear to be an attribute of the newly
memoized function, nor does it look like a global variable.


The solution would have to consider fibonacci(50) and fibonacci(idx =
50) as the same call and thus retrieve the second one from the cache.

In general, you probably can't do that without lots and lots of really
clever code searching through the function argument lists. If you know
that your function will always have the same keyword, then you can do
something like this:

if args in cache:
    return cache[args]
elif kwargs.keys() == ["idx"]:
    return cache[(kwargs["idx"],)]

but that's a special case and you really don't want to do it *wink*

Otherwise, just accept that you may be caching multiple copies of the same
data, and do something like this:

if args in cache:
    return cache[args]
elif tuple(kwargs.items()) in cache:
    return cache[tuple(kwargs.items())]

with the appropriate changes to the rest of the code.

You may also find that you could get a slight performance increase by
using the idiom

try:
    return cache[args]
except KeyError:
    # calculate the result and store it in the cache.

instead of testing for membership. As always, timing some test functions
is your friend...
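The try/except idiom spelled out in full (catching KeyError specifically rather than using a bare except, so real errors inside fn still propagate):

```python
def cache_function(fn):
    cache = {}
    def cached_result(*args):
        try:
            # Fast path: a cache hit costs a single dict lookup,
            # with no separate membership test.
            return cache[args]
        except KeyError:
            # Miss: calculate the result and store it in the cache.
            result = fn(*args)
            cache[args] = result
            return result
    return cached_result

@cache_function
def square(n):
    return n * n
```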
 
Diez B. Roggisch

def cache_function(fn):
    cache = {}
    def cached_result(*args, **kwargs):
        if args in cache:
            return cache[args]
        result = fn(*args, **kwargs)
        cache[args] = result
        return result
    return cached_result


I'm curious... where does cache live after you use cache_function to
memoize some function? It doesn't appear to be an attribute of the newly
memoized function, nor does it look like a global variable.

It's part of the closure - as well as fn, btw. So it is
per-decorated-function.

Diez
 
Steven D'Aprano

It's in the closure returned by cache_function.

What's a closure?

Googling on define:closure isn't very useful. There are a *lot* of
different meanings for closure, none of them to do with programming.

No, wait, I tell a lie -- Wikipedia to the rescue again:

http://en.wikipedia.org/wiki/Closure_(programming)

I notice that type(some_closure) and type(ordinary_function) both return
the same result, <type 'function'>. I also notice that both the closure
and ordinary functions have an attribute "func_closure". Is there a way to
tell which is a closure and which is not? In my tests, the ordinary
function has func_closure == None, but I don't know if that will always be
the case or just for my tests.

If I wanted to inspect the value of cache, where would I find it? In other
words, if I say some_closure.SOMETHING I would get cache... what should
SOMETHING be set to?

That assumes that cache can be seen in this way. If it can't, is this a
way to create "really private" variables in Python?
 
Fredrik Lundh

Steven said:
I notice that type(some_closure) and type(ordinary_function) both return
the same result, <type 'function'>. I also notice that both the closure
and ordinary functions have an attribute "func_closure". Is there a way to
tell which is a closure and which is not? In my tests, the ordinary
function has func_closure == None, but I don't know if that will always be
the case or just for my tests.

closure != function.

(did you read the wikipedia page? the closure is a combination of the function
code itself, the information needed to call it, and the context it was defined in.
in Python, the full closure includes stuff you can reach via assorted attributes
on the function object; most notably func_closure and func_globals)
If I wanted to inspect the value of cache, where would I find it?

nowhere. at least not officially; see "Rebinding names in enclosing
scopes" in the design document for a discussion:

http://www.python.org/peps/pep-0227.html
That assumes that cache can be seen in this way. If it can't, is this a
way to create "really private" variables in Python?

no, because Python doesn't prevent you from digging into the
internals:

http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/440515

</F>
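For later readers: in Python 3 the attribute is spelled __closure__ rather than func_closure, and each cell exposes its value through cell_contents, so the cache is reachable after all — just not officially:

```python
def cache_function(fn):
    cache = {}
    def cached_result(*args):
        if args not in cache:
            cache[args] = fn(*args)
        return cache[args]
    return cached_result

@cache_function
def double(n):
    return n * 2

double(21)

# Dig the cache dict out of the closure cells; the cell order follows
# the names listed in __code__.co_freevars.
cells = dict(zip(double.__code__.co_freevars, double.__closure__))
cache = cells['cache'].cell_contents
```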
 
Diez B. Roggisch

If you create a closure, using a memoization technique as per the original
post, and then call type() on that closure, Python reports <type 'function'>.

Because it is. The closure is only sort of an extension to the locals()
available to that function. Not more, not less.
If you use dir() on that closure-that-Python-calls-a-function, it tells
you that there is an attribute "func_closure". But ordinary functions that
aren't closures also have that attribute.

Because there is no difference between them - a function _can_ have a
closure, the same way a HttpRequest _can_ have POST-data or not. That
doesn't make it different.

You are of course right that one _could_ have implemented this with
different classes. But as it happens, it isn't.
According to my tests, ordinary functions have None as the func_closure
attribute, but I don't know if that will always be the case, or just the
few examples I tested. With a sample size of three, I have no
confidence that I've found a bullet-proof test.

See this small test script for how to expose the closure's content --
however, I'm not familiar with the mechanics that bind "some_variable"
to the cell objects' content. But actually I don't care...

def closure_test():
    some_variable = {}
    print "%x" % id(some_variable)
    def get(x):
        return some_variable[x]
    def set(x, v):
        some_variable[x] = v
    return get, set


g, s = closure_test()

s(10, 20)
print g(10)
print s.func_closure[0]

Diez
 
Steven D'Aprano

closure != function.

(did you read the wikipedia page?

Yes I did. Did you read my post?

If you create a closure, using a memoization technique as per the original
post, and then call type() on that closure, Python reports <type 'function'>.

If you use dir() on that closure-that-Python-calls-a-function, it tells
you that there is an attribute "func_closure". But ordinary functions that
aren't closures also have that attribute.

According to my tests, ordinary functions have None as the func_closure
attribute, but I don't know if that will always be the case, or just the
few examples I tested. With a sample size of three, I have no
confidence that I've found a bullet-proof test.


Other that that, thank you for the links.
 
Fredrik Lundh

Steven said:
Yes I did. Did you read my post?

given that the wikipedia page says

"a closure is an abstraction representing a function, plus the
lexical environment (see static scoping) in which the function
was created."

and you're still focussing on type(function), it sure looks as if you missed
certain parts of that explanation.

let's take it again, with emphasis on some important words:

"a closure is an ABSTRACTION representing a function, PLUS the
lexical ENVIRONMENT (see static scoping) in which the function
was created."
If you create a closure, using a memoization technique as per the original
post, and then call type() on that closure, Python reports <type 'function'>.

that's because "closure" is an abstract concept. there is no "closure" object
in Python, just as there is no "program" object. function objects always
contain all the information they need about their context, and the sum of
that is what forms the closure.
If you use dir() on that closure-that-Python-calls-a-function, it tells
you that there is an attribute "func_closure". But ordinary functions that
aren't closures also have that attribute.

"ordinary functions" also have closures: the global context is also part of
the function's environment, and therefore part of the closure (in Python,
it can be argued that all attributes of the function object are part of the
closure)

</F>
 
Ron Adam

Steven said:
Ok, so I thought, how about creating a decorator that caches the
function results and retrieves them from cache if possible; otherwise it
calls the function and stores the value in the cache for the next invocation.

This is what I came up with so far:

def cache_function(fn):
    cache = {}
    def cached_result(*args, **kwargs):
        if args in cache:
            return cache[args]
        result = fn(*args, **kwargs)
        cache[args] = result
        return result
    return cached_result


I'm curious... where does cache live after you use cache_function to
memoize some function? It doesn't appear to be an attribute of the newly
memoized function, nor does it look like a global variable.

If I understand this correctly...

if args in cache:

Creates a local binding to the "cache" object that gets returned with
the cached_result function the same as...

result = fn(*args, **kwargs)

... the function 'fn' does here. So they remain local bindings to
objects in the scope they were first referenced from even after the
function is returned. In effect, 'cache' and 'fn' are replaced by the
objects they reference before the cached_result function is returned.

Is this correct?

Cheers,
Ron
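A small demonstration of the binding Ron describes, in Python 3 attribute spelling: each call to cache_function creates fresh cells for cache and fn, so every decorated function carries its own independent cache.

```python
def cache_function(fn):
    cache = {}
    def cached_result(*args):
        if args not in cache:
            cache[args] = fn(*args)
        return cache[args]
    return cached_result

@cache_function
def double(n):
    return n * 2

@cache_function
def triple(n):
    return n * 3

# Each decorated function closed over its own 'cache' and 'fn' cells,
# so the two closures are distinct objects.
same_cell = (double.__closure__ is triple.__closure__)
```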
 
