I use defaultdict(int) all the time, but it's been fairly rare
to want to do set operations (union, intersection, difference) on
these. I'm not even sure what the intersection should mean.
Let me write the multiset with 3 x's and 2y y' as {2x, 3y} (this is
not a suggested Python syntax!).
Note: in the following my multisets can have negative multiplicities,
whereas the usual definition only accepts positive multiplicities.
Let H be the collection of all python hashable objects.
Let Z be the set of integers.
So a multiset is a mapping from H to Z which maps cofinitely many
objects
to 0 (a sort of defaultdict(int))
=== For mathematically minded people ===
Well, if one thinks about the mathematical meaning of multisets, then
the collection of all multisets is the free Z-module over H.
Therefore the meaningful operations for multisets are addition,
subtraction and multiplication by a scalar.
=== For not so mathematically minded people ===
One way of thinking about a multiset can be as a vector whose
dimensions are explicitly named, instead of being implicit. For
example in 3D the vector [2, 3, 4] could instead be written as {2x,
3y, 4z}. Just like vectors, multisets can then be added, subtracted
and multiplied by as scalar:
{2x, 3y} + {4x, 7z} = {6x, 3y, 7z}
{4x, 2y, 1z} - {1x, 2y, 3z} = {3x, -2z}
{3x, 2y}*5 = {15x, 10y}
Now if one wants to extend union and intersection to multisets:
* For sets {x, y} union {y, z} = {x, y, z}. The natural way of
extending this to multisets is having the union operator take the
max of the multiplicities of each element, i.e.
{2x, 7y, 1z} union {3y, 1z, 5t} = {2x, 7y, 1z, 5t}
{3x, -5y} union {2z} = {3x, 2z} (!)
* Similarly, for intersection one would take the min of
multiplicities, i.e.
{2x, 7y, 1z} intersection {3y, 1z, 5t} = {3y, 1z}
{3x, -5y} intersection {2z} = {-5y} (!)
* difference and symmetric_difference can be defined in terms of +, -,
union, intersection:
A difference B = A - (A intersection B)
A symmetric_difference B = (A union B) - (A intersection B)
Note that union, intersection, difference and symmetric_difference
defined this way are fully compatible with the corresponding set
operations in the following sense:
If A and B are sets, then
- multiset(A).union(multiset(B)) == multiset(A.union(B))
- multiset(A).intersection(multiset(B)) == multiset(A.intersection(B))
(same for difference, symmetric_difference)
I think this is how I would like my multisets