Memoization and encapsulation

Steven D'Aprano

I was playing around with simple memoization and came up with something
like this:

_cache = {}
def func(x):
    global _cache
    if _cache.has_key(x):
        return _cache[x]
    else:
        result = x+1  # or a time consuming calculation...
        _cache[x] = result
        return result

when it hit me that if I could somehow bind the cache to the function, I
could get rid of that pesky global variable.

I tried this:
def func(x):
    try:
        func.cache
    except AttributeError:
        func.cache = {}
    if func.cache.has_key(x):
        return func.cache[x]
    else:
        result = x + 1
        func.cache[x] = result
        return result

and it works as expected, but it lacks elegance.

Instead of using a function, I can also use a new-style class as if it
were a function:
class func(object):
    cache = {}
    def __new__(self, x):
        if self.cache.has_key(x):
            return self.cache[x]
        else:
            result = x+1
            self.cache[x] = result
            return result

and again it works, but I can't help feeling it is an abuse of the class
mechanism to do this.

What do folks think? Is there a better way?
 
bonono

Steven said:
I was playing around with simple memoization and came up with something
like this:

_cache = {}
def func(x):
    global _cache
    if _cache.has_key(x):
        return _cache[x]
    else:
        result = x+1  # or a time consuming calculation...
        _cache[x] = result
        return result

when it hit me that if I could somehow bind the cache to the function, I
could get rid of that pesky global variable.
What would be the problem with the following, other than exposing the
cache, which the caller may override?

def func(x, _cache={}):
 
Raymond Hettinger

Steven said:
I was playing around with simple memoization and came up with something
like this:

_cache = {}
def func(x):
    global _cache
    if _cache.has_key(x):
        return _cache[x]
    else:
        result = x+1  # or a time consuming calculation...
        _cache[x] = result
        return result

when it hit me that if I could somehow bind the cache to the function, I
could get rid of that pesky global variable.

Try something like this:

def func(x, _cache={}):
    if x in _cache:
        return _cache[x]
    result = x + 1  # or a time consuming function
    _cache[x] = result
    return result

* If cache hits are the norm, then a try/except version (see the sketch
below) would be a bit faster.

* In your original code, the global declaration wasn't needed because
_cache itself is never reassigned; only its contents change.
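
For illustration, the try/except variant mentioned above might look
something like this (a sketch, not code from the thread):

def func(x, _cache={}):
    try:
        # fast path: a single dict lookup when the value is already cached
        return _cache[x]
    except KeyError:
        result = x + 1  # or a time consuming function
        _cache[x] = result
        return result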



Raymond
 
Just

Steven D'Aprano said:
I was playing around with simple memoization and came up with something
like this:

_cache = {}
def func(x):
    global _cache

There's no need to declare _cache as global, since you're not assigning
to it. So this global isn't all that pesky after all...
    if _cache.has_key(x):
        return _cache[x]
    else:
        result = x+1  # or a time consuming calculation...
        _cache[x] = result
        return result

when it hit me that if I could somehow bind the cache to the function, I
could get rid of that pesky global variable. [ ... ]
What do folks think? Is there a better way?

I actually prefer such a global variable to the default arg trick. The
idiom I generally use is:

_cache = {}
def func(x):
    result = _cache.get(x)
    if result is None:
        result = x + 1  # or a time consuming calculation...
        _cache[x] = result
    return result

Just
 
skip

just> I actually prefer such a global variable to the default arg
just> trick. The idiom I generally use is:

just> _cache = {}
just> def func(x):
just>     result = _cache.get(x)
just>     if result is None:
just>         result = x + 1  # or a time consuming calculation...
just>         _cache[x] = result
just>     return result

None of the responses I've seen mention the use of decorators such as the
one shown here:

http://wiki.python.org/moin/PythonDecoratorLibrary

While wrapping one function in another is obviously a bit slower, you can
memoize any function without tweaking its source.

Skip
 
Steven D'Aprano

Steven said:
I was playing around with simple memoization and came up with something
like this:
[snip]

Try something like this:

def func(x, _cache={}):
    if x in _cache:
        return _cache[x]
    result = x + 1  # or a time consuming function
    _cache[x] = result
    return result

Of course. The most obvious thing is the least obvious :)

* If cache hits are the norm, then a try/except version would be a bit
faster.

* In your original code, the global declaration wasn't needed because
_cache itself is never reassigned; only its contents change.

Sure, it isn't needed, but I still like to make it explicit. Call it a
foible of mine :)
 
Steven D'Aprano

There's no need to declare _cache as global, since you're not assigning
to it. So this global isn't all that pesky after all...

It is still a global variable, with all the potential Badness thereof,
even if you don't have to declare it.
 
Tom Anderson

just> I actually prefer such a global variable to the default arg
just> trick. The idiom I generally use is:

just> _cache = {}
just> def func(x):
just>     result = _cache.get(x)
just>     if result is None:
just>         result = x + 1  # or a time consuming calculation...
just>         _cache[x] = result
just>     return result

None of the responses I've seen mention the use of decorators such as the
one shown here:

http://wiki.python.org/moin/PythonDecoratorLibrary

While wrapping one function in another is obviously a bit slower, you can
memoize any function without tweaking its source.

I'd definitely say this is the way to go.

def memoised(fn):
    cache = {}
    def memoised_fn(*args):
        if args in cache:
            return cache[args]
        else:
            rtn = fn(*args)
            cache[args] = rtn
            return rtn
    return memoised_fn

@memoised
def func(x):
    return x + 1  # or a time-consuming calculation

tom
 
drewp

But while we're at it, how about that unhashable object problem?

@memoised
def func(x, i):
    return x

L = [1,2,3]
print func(L, 1)  # TypeError: list objects are unhashable

What's the deal? My func can certainly be memoized (but possibly with a
slower lookup depending on how many args are "unhashable"). In fact,
list objects *could* be hashed:

class HashableList(list):
    def __hash__(self):
        return 0

L = HashableList([1,2,3])
print func(L, 1)

That's both correct and useful. I've written fancier memoize()
implementations in the past that added a __hash__ to incoming objects
that didn't have any, although handling lists would be a tricky case.
In my cases, most arguments could be hashed and the values that
couldn't were part of very small sets, so I actually had very few hash
table collisions.

I think python is broken here-- why aren't lists hashable, or why isn't
there a straightforward way to make memoised() work?
 
Tom Anderson

I think python is broken here-- why aren't lists hashable, or why isn't
there a straightforward way to make memoised() work?

a = [1, 2, 3]
d = {a: "foo"}  # suppose lists hashed by value...
a[0] = 0        # ...then this mutation changes a's hash
print d[a]      # and the lookup can no longer find the key

I feel your pain, but I don't think lists (and mutable objects generally)
being unhashable is brokenness. I do think there's room for a range of
opinion, though, and I'm not sure what I think is right.

tom
 
Mike Meyer

class HashableList(list):
    def __hash__(self):
        return 0

This will suck for performance if you put a lot of lists in the same
dictionary. Faster is:

class FastHashableList(list):
    def __hash__(self):
        return id(self)

I think python is broken here-- why aren't lists hashable, or why isn't
there a straightforward way to make memoised() work?

There isn't a straightforward way to make memoised work because -
well, it's not a straightforward thing to make work.

For instance, consider the two hashable list implementations
above. Consider this:

@memoised
def broken(l):
    return len(l) + 1

x = HL([1, 2, 3])  # Or FHL
print broken(x)
x.append(4)
print broken(x)

Will your memoised handle this case correctly?

Which is also why lists aren't hashable - there's no good definition
of the hash of a list that isn't broken for some common use case.

<mike
 
drewp

Faster is:
class FastHashableList(list):
    def __hash__(self):
        return id(self)

But that's wrong. Two equal objects must hash to the same value, and
you're not guaranteeing that at all. My degenerate __hash__ does
guarantee that, and I still claim the performance impact from
collisions will actually be very low if each unhashable argument is not
too diverse, which was the common scenario for me. That is, my Foo
objects were unhashable, but there were only like two Foo objects in
the program that might be passed to the memoized function (but with
many possible combinations of additional float-type arguments).
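
To see the problem concretely, here is a quick sketch (mine, not from
the thread) of how an identity-based hash breaks dict lookup for equal
keys:

class FastHashableList(list):
    def __hash__(self):
        return id(self)

d = {FastHashableList([1, 2, 3]): "cached"}
# An equal but distinct list hashes to a different id, so the lookup misses:
print d.get(FastHashableList([1, 2, 3]))  # prints None despite equal keys
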
There isn't a straightforward way to make memoised work because -
well, it's not a straightforward thing to make work.

Seems straightforward to me, as long as equality is defined for the
argument types. A hash that spreads the argument values across a hash
table is a fine optimization, but I don't think a memoizer needs to
have optimal performance in that area to be useful. In many of the
cases I have come across, even the dumbest list search (i.e. what you
get if all args always hash to 0) will be much faster than actually
rerunning the original function.
x = HL([1, 2, 3]) # Or FHL
print broken(x)
x.append(4)
print broken(x)

Will your memoised handle this case correctly?

For that one to work my fancy memoizer would have to make a copy of the
arguments. The comparisons will work fine, though. [1,2,3] !=
[1,2,3,4].
Which is also why lists aren't hashable - there's no good definition
of the hash of a list that isn't broken for some common use case.

Only if broken means 'slow'. My hunch is that lists weren't made
hashable because people would naively use them heavily as dict keys and
suffer poorer performance than if they used tuples.

I would still like to see a well-done memoizer that doesn't suffer from
the unhashable bug and maybe has a way to avoid the mutable object bug
that you presented in your last post.

The version I keep mentioning as "my fancy memoizer" is actually just a
hacked one that adds a __hash__ to the first argument, since that's all
I needed at the time.
 
Mike Meyer

But that's wrong.

You're right - I should have fixed equality while I was at it. The
results may not be what you had in mind, but it's a perfectly
reasonable use case.
There isn't a straightforward way to make memoised work because -
well, it's not a straightforward thing to make work.
Seems straightforward to me, as long as equality is defined for the
argument types. A hash that spreads the argument values across a hash
table is a fine optimization, but I don't think a memoizer needs to
have optimal performance in that area to be useful. In many of the
cases I have come across, even the dumbest list search (i.e. what you
get if all args always hash to 0) will be much faster than actually
rerunning the original function.
x = HL([1, 2, 3]) # Or FHL
print broken(x)
x.append(4)
print broken(x)

Will your memoised handle this case correctly?

For that one to work my fancy memoizer would have to make a copy of the
arguments. The comparisons will work fine, though. [1,2,3] !=
[1,2,3,4].

Right - you need to make a copy. There are three things to consider
here:

1) The copy doesn't have to be a list. You can create a tuple of the
list and use that, and get the same effect without having to create a
hashable list type (see the sketch after point 3).

2) You need to apply this hack for *every* mutable type that you want
to handle as an argument.

3) There will still be cases where it's broken:

@memoised
def still_broken_after_all_these_years(x):
    return id(x) + 1

x = [1, 2, 3]
y = [1, 2, 3]

assert still_broken_after_all_these_years(x) != still_broken_after_all_these_years(y)
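
For illustration, the tuple-copy idea from point 1 might look like this
(a sketch of mine, not Mike's code). It makes list arguments hashable
and freezes their value at call time, though as the example above shows
it still mis-memoizes identity-sensitive functions:

def memoised(fn):
    cache = {}
    def memoised_fn(*args):
        # freeze list arguments into tuples so the key is hashable
        # (assumes lists are the only mutable arguments to handle;
        # note a list and an equal tuple will share a cache entry)
        key = tuple(tuple(a) if isinstance(a, list) else a for a in args)
        try:
            return cache[key]
        except KeyError:
            result = fn(*args)
            cache[key] = result
            return result
    return memoised_fn
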
Only if broken means 'slow'. My hunch is that lists weren't made
hashable because people would naively use them heavily as dict keys and
suffer poorer performance than if they used tuples.

You're only looking at *your* use case. The definition you used isn't
broken for that, just slow. For *other* use cases, it's broken. There
are cases where you want a list to be considered equal to itself as it
changes over time.
I would still like to see a well-done memoizer that doesn't suffer from
the unhashable bug and maybe has a way to avoid the mutable object bug
that you presented in your last post.

Like I said, I don't think it's possible to do it in a way that
"works" for all possible functions. For instance, you could solve the
two bugs you mention in one fell swoop, by declaring that you don't
memoize calls that have mutable arguments, but call the function every
time for them. That's really the only solution that's going to work
for all functions. Except that "hashable" and "immutable" aren't
synonyms; there are hashable builtins that are mutable, so there are
arguments that will be memoized incorrectly.
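
A minimal sketch of that fall-through approach (my illustration, not
Mike's code) might be:

def memoised(fn):
    cache = {}
    def memoised_fn(*args):
        try:
            return cache[args]
        except TypeError:
            # unhashable (typically mutable) arguments: skip the cache
            return fn(*args)
        except KeyError:
            result = fn(*args)
            cache[args] = result
            return result
    return memoised_fn

As Mike notes, hashable-but-mutable arguments would still slip through
and be cached.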

<mike
 
