weird dict problem, how can this even happen?

Joel Hedlund · Dec 15, 2008

I'm having a very hard time explaining why this snippet *sometimes*
raises KeyError:

snippet:

print type(self.pool)
for frag in self.pool.keys():
if frag is fragment_info:
print "the fragment_info *is* in the pool", hash(frag), hash(fragment_info), hash(frag) == hash(fragment_info), frag == fragment_info, frag in self.pool, frag in self.pool.keys()
try:
renderer_index = self.pool.pop(fragment_info)
except KeyError:
print "Glorious KeyError!"
for frag in self.pool.keys():
if frag is fragment_info:
print "the fragment_info *is* in the pool", hash(frag), hash(fragment_info), hash(frag) == hash(fragment_info), frag == fragment_info, frag in self.pool, frag in self.pool.keys()
raise

output:
<type 'dict'>
the fragment_info *is* in the pool 987212075 987212075 True True False True
Glorious KeyError!
the fragment_info *is* in the pool 987212075 987212075 True True False True
Traceback (most recent call last):
File "/home/yohell/workspace/missy/core/gui.py", line 92, in process_job
renderer_index = self.pool.pop(fragment_info)
KeyError: <core.gui.FragmentInfo object at 0x8fc906c>

This snippet is part of a much larger gtk program, and the problem only
from time to time, predominantly when the cpu is under heavy load and
this method gets called a lot. If I didn't know better I'd say it's a
bug in python's dict implementation, but I do know better, so I know
it's far more likely that I've made a mistake somewhere. I'll be damned
if I can figure out what and where though. I've reproduced this bug (?)
with python-2.5.2 on Ubuntu 8.10 and python-2.5.1 on WinXP.

I would very much like an explanation to this that does not involve
threads, because I haven't made any that I'm aware of. I can't even
understand how this could happen. How do I even debug this?

Please help, I feel like I've taken crazy pills here!
/Joel Hedlund

Joel Hedlund · Dec 16, 2008

Duncan said:
It could happen quite easily if the hash value of the object has changed
since it was put in the dictionary. what does the definition of your
core.gui.FragmentInfo object look like?

Dunno if it'll help much, but:

class FragmentInfo(object):
def __init__(self, renderer, render_area):
self.renderer = renderer
self.render_area = render_area

def __hash__(self):
return hash((FragmentInfo, self.renderer, self.render_area))

def __eq__(self, other):
return (isinstance(other, self.__class__) and
other.renderer == self.renderer and
other.render_area == self.render_area)

Is the hash definitely immutable?

No. It's a hash of a tuple of its key attributes, themselves similar
objects.

The program can be thought of as a map viewer. In the gui part, image
fragments are cached for speed, and fragments are only half rendered if
there's a lot of complex features to draw. The "pool" consists of semi
rendered fragments. The reason I did it this way is the cache. The cache
is a dict with a use tracker so when the hash changes, the older
fragments eventually drop from the cache. Hmm... I'm starting to realise
now why my implementation of this isn't so hot. I'm going to hazard a
guess here, and then you can correct me ok?

I create a dict and populate it with a key-value pair, and then the
key's hash changes. When the key is returned by k = d.keys(), then k not
in d, even though k in d.keys().

Simple example:

class moo(object):
def __init__(self, a):
self.a = a
def __hash__(self):
return hash(self.a)

d = {moo(1): 1}

k = d.keys()[0]
k.a = 2

k = d.keys()[0]
print k in d, k in d.keys()
> d.pop(k)
output:
False True
Traceback (most recent call last):
File "/bioinfo/yohell/eclipse/Test/src/test.py", line 14, in <module>
d.pop(k)
KeyError: <__main__.moo object at 0x7f1c64120590>

I'd say that's pretty similar to what I observed.

I guess the logical outcome is that the cache dict will fill up with old
junk that I can't access and can't selectively flush since the hashes
have changed, unless I actually somehow save copies of the keys, which
can get pretty complex and probably won't do wonders for execution
speed. Yeah this was probably a bad soulution.

I should probably do this with lists instead because I can't really
think of a way of salvaging this. Am i right?

Thanks for your help!
/Joel

Arnaud Delobelle · Dec 16, 2008

Joel Hedlund said:
I'm having a very hard time explaining why this snippet *sometimes*
raises KeyError:

snippet:

This snippet is part of a much larger gtk program, and the problem
only from time to time, predominantly when the cpu is under heavy load
and this method gets called a lot. If I didn't know better I'd say
it's a bug in python's dict implementation, but I do know better, so I
know it's far more likely that I've made a mistake somewhere. I'll be
damned if I can figure out what and where though. I've reproduced this
bug (?) with python-2.5.2 on Ubuntu 8.10 and python-2.5.1 on WinXP.

I would very much like an explanation to this that does not involve
threads, because I haven't made any that I'm aware of. I can't even
understand how this could happen. How do I even debug this?

Please help, I feel like I've taken crazy pills here!
/Joel Hedlund

It could happen if two object x and y are such that hash(x) != hash(y)
but x == y, breaking the requirement on hash values.

HTH

Joel Hedlund · Dec 16, 2008

Duncan said:
I think you probably are correct. The only thing I can think that might
help is if you can catch all the situations where changes to the dependent
values might change the hash and wrap them up: before changing the hash pop
the item out of the dict, then reinsert it after the change.

That would probably require a lot of uncomfortable signal handling,
especially for a piece of functionality I'd like to be as unobtrusive as
possible in the application.

Alternatively give up on defining hash and __eq__ for FragmentInfo and rely
on object identity instead.

Object identity wouldn't work so well for caching. Objects would always
be drawn as they appeared for the first time. No updates would be shown
until the objects were flushed from the cache.

I've been experimenting with a list cache now and I can't say I'm
noticing any change in performance for a cache of 100 items. I'm still
using the hash to "freeze" a sort of object tag in order to detect
changes, and I require both hash and object equality for cache hits,
like so:

def index(self, key):
h = hash(key)
for i, item in enumerate(self.items):
if item.hash == h and item.key == key:
return i
raise KeyError(key)

This seems to do what I want and does OK performance wise.

Thanks again!

/Joel

Joel Hedlund · Dec 17, 2008

Scott said:
Perhaps your hash function could be something like:

I'm not sure I understand what you're suggesting.

/Joel

Joel Hedlund · Dec 17, 2008

Scott said:
Perhaps your hash function could be something like:

I'm not sure I understand what you're suggesting.

/Joel

Steven D'Aprano · Dec 17, 2008

That would probably require a lot of uncomfortable signal handling,
especially for a piece of functionality I'd like to be as unobtrusive as
possible in the application.

Object identity wouldn't work so well for caching. Objects would always
be drawn as they appeared for the first time. No updates would be shown
until the objects were flushed from the cache.

Perhaps I don't understand your program structure, but I don't see how
that follows.

I've been experimenting with a list cache now and I can't say I'm
noticing any change in performance for a cache of 100 items.

100 items isn't very big though. If you have 50,000 items you may notice
significant slow down

If having many items in the cache is possible, you should consider using
a binary search instead of a linear search through the cache. See the
bisect module.

Steven D'Aprano · Dec 17, 2008

I'm not sure I understand what you're suggesting.

Neither am I, since you've removed the hash function which might have
given readers a clue.

*wink*

Snipping irrelevant text is a Good Thing, but in this case you were a
little heavy-handed.

Joel Hedlund · Dec 17, 2008

Steven said:
Perhaps I don't understand your program structure, but I don't see how
that follows.

First off, please note that I consider my problem to be solved, many
thanks to c.l.p and especially Duncan Booth. But of course continued
discussion on this topic can be both enlightening and entertaining as
long as people are interested. So here goes:

I'm making a scientific program that visualizes data. It can be thought
of as a freely zoomable map viewer with a configurable stack of data
feature renderers, themselves also configurable to enhance different
aspects of the individual features. As you zoom in, it becomes possible
to render more types of features in a visually recognizable manner, so
consequentially more feature renderers become enabled. The determining
properties for the final image are then the coordinates and zoom level
of the view, and the current renderer stack configuration. Rendering may
be time consuming, so I cache the resulting bitmap fragments using a
"key object" called FragmentInfo that holds this information.

Some renderers may take a very long time to do their work, so in order
to keep the gui nice and responsive, I interrupt the rendering chain at
that point, put the remainder of the rendering chain in a job pool, and
postpone finishing up the rendering until there are cpu cycles to spare.
At that time, I go through the job pool and ask the cache which fragment
was most recently accessed. This fragment is the most likely to actually
be in the current view, and thus the most interesting for the user to
have finallized.

Now, when the user navigates the view to any given point, the gui asks
the cache if the bitmap fragments necessary to tile up the current view
have already been rendered, and if so, retrieves them straight from the
cache and paints them to the screen. And that's why object identity
wouldn't work. If the user changes something in the config for the
current renderer stack (=mutates the objects), the renderers still
retain the same object identities and therefore the old versions would
be retrieved from the cache, and no updates would be shown until the
fragments are flushed from the cache, and the fragment subsequently
redrawn.

I guess you could delve deep into the data members and pull out all
their object identities and hash wrt that if you'd really want to, but I
don't really see the point.

The stupid cache implementation that I originally employed used a dict
to store the FragmentInfo:BitmapFragment items plus a use tracker (list)
on the side. This obviously doesn't work, because as soon as the
renderer stack mutates, list and dict go out of sync and I can no longer
selectively flush old items from the dict, because it's not readily
apparent how to reconstruct the old keys. Purely using lists here is
vastly superior because I can just .pop() the least recently used items
from the tail and be done with them. Also for lists as small as this,
the cost in performance is at most negligible.

100 items isn't very big though. If you have 50,000 items you may notice
significant slow down

If having many items in the cache is possible, you should consider using
a binary search instead of a linear search through the cache. See the
bisect module.

Thanks for the tip, but I don't forsee this cache ever needing to be
that big. ~100 is quite enough for keeping the gui reasonably responsive
in this case.

/Joel

Joel Hedlund · Dec 19, 2008

Joel said:
First off, please note that I consider my problem to be solved, many
thanks to c.l.p and especially Duncan Booth. But of course continued
discussion on this topic can be both enlightening and entertaining as
long as people are interested. So here goes:

heh, nothing like a wall of text to kill off interest I guess.

But thank you all for your time and helpful advice! Aand happy holidays!

/Joel

Unexpected comparisons in dict lookup	1	Mar 18, 2014
How can I make this piece of code even faster?	18	Jul 20, 2013
Can this code be used? Github repository question	0	Sep 28, 2023
[LONG] java.net.URI encoding weirdness	18	May 5, 2014
set and dict iteration	42	Aug 16, 2012
del not working for (exhausted) dict iterable value (Python 3.3)	3	Mar 12, 2013
issubclass(dict, Mapping)	10	Dec 22, 2010
weird isinstance/issubclass behavior?	3	Nov 29, 2012

weird dict problem, how can this even happen?

Joel Hedlund

Joel Hedlund

Arnaud Delobelle

Joel Hedlund

Joel Hedlund

Joel Hedlund

Steven D'Aprano

Steven D'Aprano

Joel Hedlund

Joel Hedlund

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads