Retrieving an object from a set


A

Arnaud Delobelle

Dear Pythoneers,

I've got a seemingly simple problem, but for which I cannot find a
simple solution.

I have a set of objects (say S) containing an object which is equal to
a given object (say x). So

x in S

is true. So there is an object y in S which is equal to x. My
problem is how to retrieve y, without going through the whole set.
Here is a simple illustration with tuples (my actual scenario is not
with tuples but with a custom class):
y = (1, 2, 3) # This is the 'hidden object'
S = set([y] + range(10000))
x = (1, 2, 3)
x in S True
x is y
False

I haven't found y. It's a very simple problem, and this is the
simplest solution I can think of:

class FindEqual(object):
def __init__(self, obj):
self.obj = obj
def __hash__(self):
return hash(self.obj)
def __eq__(self, other):
equal = self.obj == other
if equal:
self.lastequal = other
return equal
True

I've found y! I'm not happy with this as it really is a trick. Is
there a cleaner solution?
 
Ad

Advertisements

S

Steven D'Aprano

Arnaud said:
Dear Pythoneers,

I've got a seemingly simple problem, but for which I cannot find a
simple solution.

I have a set of objects (say S) containing an object which is equal to
a given object (say x). So

x in S

is true. So there is an object y in S which is equal to x. My
problem is how to retrieve y, without going through the whole set.

Why do you care? Since x == y, what benefit do you get from extracting the
actual object y?

I'm not necessarily saying that you *should not* care, only that it is a
very rare occurrence. The only thing I can think of is interning objects,
which is best done with a dict, not a set:


CACHE = {}

def intern(obj):
return CACHE.setdefault(obj, obj)


which you could then use like this:

py> s = "hello world"
py> intern(s)
'hello world'
py> t = 'hello world'
py> t is s
False
py> intern(t) is s
True


However, there aren't very many cases where doing this is actually helpful.
Under normal circumstances, object equality is much more important than
identity, and if you find that identity is important to you, then you
probably should rethink your code.

So... now that I've told you why you shouldn't do it, here's how you can do
it anyway:

def retrieve(S, x):
"""Returns the object in set S which is equal to x."""
s = set(S) # make a copy of S
s.discard(x)
t = S.difference(s)
if t:
return t.pop()
raise KeyError('not found')


S = set(range(10))
y = (1, 2, "hello world")
x = (1, 2, "hello world")
assert x is not y and x == y
S.add(y)
z = retrieve(S, x)
assert z is y


By the way, since this makes a copy of the set, it is O(n). The
straight-forward approach:

for element in S:
if x == element:
x = element
break

is also O(n), but with less overhead. On the other hand, the retrieve
function above does most of its work in C, while the straight-forward loop
is pure Python, so it's difficult to say which will be faster. I suggest
you time them and see.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top