U
Ulrich Eckhardt
Hi!
I'm trying to solve a computational problem and of course speed and size is
important there. Apart from picking the right algorithm, I came across an
idea that could help speed up things and keep memory requirements down. What
I have is regions described by min and max coordinates. At first, I just
modeled these as a simple class containing two values for each axis.
In a second step, I derived this class from tuple instead of object. Some
code then moved from __init__ to __new__ and some code that modified these
objects had to be changed to replace them instead. The upside to this is
that they can be used as keys in sets and dicts, which isn't the case for
mutable types[1].
What I'm now considering is to only allow a single instance of these objects
for each set of values, similar to interned strings. What I would gain is
that I could safely compare objects for identity instead of equality. What
I'm not yet sure is how much overhead that would give me and/or how to keep
it low. The idea is to store each instance in a set and after creating a new
object I would first look up an equal object in the global set and return
that instead, otherwise add the new one.
The problem I foresee is that if I define equality as identity, this lookup
when creating will never eliminate duplicates. If I only fall back to
equality comparison for non-identical objects, I would probably sacrifice
most of the gain. If I build a dict mapping between the values and the
actual objects, I would have doubled the required memory and uselessly store
the same values twice there.
Am I looking in the wrong direction? Is there some better approach? Please
don't tell me to use C, as I'm specifically interested in learning Python,
I'm pretty sure I could have solved the problem quickly in C++ otherwise.
Other suggestions?
Cheers!
Uli
[1] Somebody correct me if I'm wrong, but I believe I could have defined a
hashing function for the type and thus allowed its use in a set or dict,
right? However, goofing up because you accidentally modified an object and
changed its hash value is something I don't want to risk anyway.
I'm trying to solve a computational problem and of course speed and size is
important there. Apart from picking the right algorithm, I came across an
idea that could help speed up things and keep memory requirements down. What
I have is regions described by min and max coordinates. At first, I just
modeled these as a simple class containing two values for each axis.
In a second step, I derived this class from tuple instead of object. Some
code then moved from __init__ to __new__ and some code that modified these
objects had to be changed to replace them instead. The upside to this is
that they can be used as keys in sets and dicts, which isn't the case for
mutable types[1].
What I'm now considering is to only allow a single instance of these objects
for each set of values, similar to interned strings. What I would gain is
that I could safely compare objects for identity instead of equality. What
I'm not yet sure is how much overhead that would give me and/or how to keep
it low. The idea is to store each instance in a set and after creating a new
object I would first look up an equal object in the global set and return
that instead, otherwise add the new one.
The problem I foresee is that if I define equality as identity, this lookup
when creating will never eliminate duplicates. If I only fall back to
equality comparison for non-identical objects, I would probably sacrifice
most of the gain. If I build a dict mapping between the values and the
actual objects, I would have doubled the required memory and uselessly store
the same values twice there.
Am I looking in the wrong direction? Is there some better approach? Please
don't tell me to use C, as I'm specifically interested in learning Python,
I'm pretty sure I could have solved the problem quickly in C++ otherwise.
Other suggestions?
Cheers!
Uli
[1] Somebody correct me if I'm wrong, but I believe I could have defined a
hashing function for the type and thus allowed its use in a set or dict,
right? However, goofing up because you accidentally modified an object and
changed its hash value is something I don't want to risk anyway.