Are you using millions of objects, or MB-sized objects? Otherwise,
this is no argument.
Yes, millions. In my natural language processing tasks, I almost
always need to define patterns, identify their occurrences in a huge
data set, and count them. Say I have a big text file consisting of
millions of words, and I want to count the frequency of trigrams:
trigrams([1,2,3,4,5]) == [(1,2,3),(2,3,4),(3,4,5)]
I can save the counts in a dict D1. Later, I may want to recount the
trigrams, with some minor modifications, say, doing it on every other
line of the input file, and the counts are saved in dict D2. Problem
is, D1 and D2 have almost the same set of keys (trigrams of the text),
yet the keys in D2 are new instances, even though these keys probably
have already been inserted into D1. So I end up with unnecessary
duplicates of keys. And this can be a great waste of memory with huge
input data.
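To make the duplication concrete, here is a minimal sketch of one way the
keys could be shared between the two dicts. The names `canonical` and
`count_trigrams` and the shared table `_canonical` are made up for this
illustration, and whitespace tokenization is just an assumption about the
input:

```python
from collections import Counter

def trigrams(tokens):
    """Consecutive 3-tuples: [1,2,3,4,5] -> [(1,2,3),(2,3,4),(3,4,5)]."""
    return list(zip(tokens, tokens[1:], tokens[2:]))

# Hypothetical shared table: maps each key to the first instance seen.
_canonical = {}

def canonical(key):
    """Return the stored instance equal to key, storing key if it is new."""
    return _canonical.setdefault(key, key)

def count_trigrams(lines):
    """Count trigrams over lines, using the shared instance as dict key."""
    counts = Counter()
    for line in lines:
        for tri in trigrams(line.split()):
            counts[canonical(tri)] += 1
    return counts

text = ["a b c d", "a b c e", "a b c d"]
D1 = count_trigrams(text)        # every line
D2 = count_trigrams(text[::2])   # every other line
```

With this, each key in D2 is the very same tuple object as the equal key in
D1, so only one copy of each trigram is kept alive. The price is the extra
table, which itself holds a reference to every distinct key.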
BTW, what happens if you, by some operation, make a == b, and
afterwards change b so another object instance must be created?
This instance management is quite a runtime overhead.
I probably need this class to be immutable.
I maintain that object management will often cost more performance
than equality testing, unless you have a huge number of equal
objects. If the latter is the case, you should rethink your program
design.
Yeah, counting is all about equal or not.
Generally not. In CPython, only very short strings are created just
once.
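For strings, sharing can also be requested explicitly instead of relying on
what the interpreter happens to do. A small sketch using `sys.intern` (the
built-in `intern` in Python 2); the two strings are built at runtime so that
no compile-time sharing is assumed:

```python
import sys

# Two equal strings built at runtime.
a = "".join(["tri", "gram"])
b = "".join(["tri", "gram"])
# a == b holds; whether `a is b` holds is an implementation detail.

ia = sys.intern(a)
ib = sys.intern(b)
same_object = ia is ib  # interning equal strings yields one shared object
```

Note that `sys.intern` only accepts strings, so it does not help directly
with tuple keys such as trigrams.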
Wow, I didn't know this. But how exactly does Python manage these strings?
My interpreter gave me these results: