# EXOR or symmetric difference for the Counter class

Discussion in 'Python' started by Paddy, Aug 12, 2010.

I find myself needing to calculate the difference between two Counters
or multisets or bags.

I want those items that are unique to each bag. I know how to
calculate it:

>>> b = Counter(a=1, b=2)
>>> c = Counter(a=3, b=1)
>>> diff = (b - c) + (c - b)
>>> (b - c)

Counter({'b': 1})
>>> (c - b)

Counter({'a': 2})
>>> diff

Counter({'a': 2, 'b': 1})

But thought why doesn't this operation appear already as a method of
the class?

2. ### Steven D'ApranoGuest

On Thu, 12 Aug 2010 13:20:19 -0700, Paddy wrote:

> I find myself needing to calculate the difference between two Counters
> or multisets or bags.

Is this collections.Counter from Python 3.1? If so, you should say so,
and if not, you should tell us which Counter class this is. It will save
people (by which I mean *me*) from spending an unrewarding few minutes
trying to import Counter in Python 2.5 and writing up a sarcastic
response ...

> I want those items that are unique to each bag. I know how to calculate
> it:
>
> >>> b = Counter(a=1, b=2)
> >>> c = Counter(a=3, b=1)
> >>> diff = (b - c) + (c - b)
> >>> (b - c)

> Counter({'b': 1})
> >>> (c - b)

> Counter({'a': 2})
> >>> diff

> Counter({'a': 2, 'b': 1})
>
> But thought why doesn't this operation appear already as a method of the
> class?

Don't know. Perhaps you should put in a feature request.

--
Steven

Steven D'Aprano, Aug 13, 2010

3. ### Chris RebertGuest

On Thu, Aug 12, 2010 at 10:36 PM, Steven D'Aprano
<> wrote:
> On Thu, 12 Aug 2010 13:20:19 -0700, Paddy wrote:
>
>> I find myself needing to calculate the difference between two Counters
>> or multisets or bags.

>
> Is this collections.Counter from Python 3.1?

"Changed in version **2.7**: Added Counter and OrderedDict."

That is still ahead of the curve though.

Cheers,
Chris

Chris Rebert, Aug 13, 2010

On Aug 13, 6:36 am, Steven D'Aprano <st...@REMOVE-THIS-
cybersource.com.au> wrote:
> On Thu, 12 Aug 2010 13:20:19 -0700, Paddy wrote:
> > I find myself needing to calculate the difference between two Counters
> > or multisets or bags.

>
> Is this collections.Counter from Python 3.1? If so, you should say so,
> and if not, you should tell us which Counter class this is. It will save
> people (by which I mean *me*) from spending an unrewarding few minutes
> trying to import Counter in Python 2.5 and writing up a sarcastic
> response ...
>
> > I want those items that are unique to each bag. I know how to calculate
> > it:

>
> >     >>> b = Counter(a=1, b=2)
> >     >>> c = Counter(a=3, b=1)
> >     >>> diff = (b - c) + (c - b)
> >     >>> (b - c)
> >     Counter({'b': 1})
> >     >>> (c - b)
> >     Counter({'a': 2})
> >     >>> diff
> >     Counter({'a': 2, 'b': 1})

>
> > But thought why doesn't this operation appear already as a method of the
> > class?

>
> Don't know. Perhaps you should put in a feature request.
>
> --
> Steven

Yes it is Counter in both 3.1 and 2.7 (And somewhere on Activestate
too).

Before I put in a feature request, I wanted to know if other have seen
the need.

5. ### Raymond HettingerGuest

On Aug 12, 1:20 pm, Paddy <> wrote:
> I find myself needing to calculate the difference between two Counters
> or multisets or bags.
>
> I want those items that are unique to each bag.

Tell us about your use cases. I'm curious how a program would ascribe
semantic meaning to the result. The phrase "unique to each bag"
doesn't quite cover it, perhaps something like "number in either
source above the minimum held in common".

AFAICT, I've never needed something like this as a primitive. Even
the xor operation for regular sets is rarely used.

> I know how to
> calculate it:
>
>     >>> b = Counter(a=1, b=2)
>     >>> c = Counter(a=3, b=1)
>     >>> diff = (b - c) + (c - b)
> >>> diff

> Counter({'a': 2, 'b': 1})

That seems simple enough.
You could also use:

diff = (b | c) - (b & c) # max(b,c) - min(b,c)

Raymond

Raymond Hettinger, Aug 14, 2010

On 14 Aug, 18:14, Raymond Hettinger <> wrote:
> On Aug 12, 1:20 pm, Paddy <> wrote:
>
> > I find myself needing to calculate the difference between two Counters
> > or multisets or bags.

>
> > I want those items that are unique to each bag.

>
> Tell us about your use cases.  I'm curious how a program would ascribe
> semantic meaning to the result.  The phrase "unique to each bag"
> doesn't quite cover it, perhaps something like "number in either
> source above the minimum held in common".
>
> AFAICT, I've never needed something like this as a primitive.  Even
> the xor operation for regular sets is rarely used.
>
> > I know how to
> > calculate it:

>
> >     >>> b = Counter(a=1, b=2)
> >     >>> c = Counter(a=3, b=1)
> >     >>> diff = (b - c) + (c - b)
> >     >>> diff
> >     Counter({'a': 2, 'b': 1})

>
> That seems simple enough.
> You could also use:
>
>  diff = (b | c) - (b & c)   # max(b,c) - min(b,c)
>
> Raymond

Hi Raymond and others,

Lets say you have two *sets* of integers representing two near-copies
of some system, then a measure of their difference could be calculated
as:

len(X.symmetric_difference(Y)) / (len(X) + len(Y)) * 100 %

If the two collections of integers are allowed duplicates then you
need a Counter/bag/multi-set type and the diff calculation I gave
originally.

Thanks.

7. ### Raymond HettingerGuest

> Lets say you have two *sets* of integers representing two near-copies
> of some system, then a measure of their difference could be calculated
> as:
>
> len(X.symmetric_difference(Y)) / (len(X) + len(Y)) * 100 %
>
> If the two collections of integers are allowed duplicates then you
> need a Counter/bag/multi-set type and the diff calculation I gave
> originally.

Thanks for sharing your use case.

It's unlikely that I will add this method to the Counter API because
the rarity of use case does not warrant the added API complexity.
IMO, adding a method like this makes the class harder to learn,
understand and remember. It doesn't seem like much of a win over
using the existing alternatives:

* (b - c) + (c - b)
* (b | c) - (b & c)
* DIY using the two counters as simple dicts
* writing a subclass providing additional binary operations

I would like to see someone post a subclass to the ASPN Cookbook that
adds a number of interesting, though not common operations. Your
symmetric_difference() method could be one. A dot_product() operation
could be another. Elementwise arithmetic is another option (we
divide, etc). Scaling operations are another possibility (multiple
all elements by five, for example).

The Counter() class has low aspirations. It is a dictionary that
fills-in missing values with zero and is augmented by a handful of
basic methods for managing the counts.

Raymond

Raymond Hettinger, Aug 17, 2010

On 17 Aug, 02:29, Raymond Hettinger <> wrote:
>
> > Lets say you have two *sets* of integers representing two near-copies
> > of some system, then a measure of their difference could be calculated
> > as:

>
> > len(X.symmetric_difference(Y)) / (len(X) + len(Y)) * 100 %

>
> > If the two collections of integers are allowed duplicates then you
> > need a Counter/bag/multi-set type and the diff calculation I gave
> > originally.

>
> Thanks for sharing your use case.
>
> It's unlikely that I will add this method to the Counter API because
> the rarity of use case does not warrant the added API complexity.
> IMO, adding a method like this makes the class harder to learn,
> understand and remember.  It doesn't seem like much of a win over
> using the existing alternatives:
>
>  * (b - c) + (c - b)
>  * (b | c) - (b & c)
>  * DIY using the two counters as simple dicts
>  * writing a subclass providing additional binary operations
>
> I would like to see someone post a subclass to the ASPN Cookbook that
> adds a number of interesting, though not common operations.  Your
> symmetric_difference() method could be one.  A dot_product() operation
> could be another.  Elementwise arithmetic is another option (we
> divide, etc).  Scaling operations are another possibility (multiple
> all elements by five, for example).
>
> The Counter() class has low aspirations.  It is a dictionary that
> fills-in missing values with zero and is augmented by a handful of
> basic methods for managing the counts.
>
> Raymond

I created this that could be an addition to the bottom of the Python 3
collections.Counter class definition:

def __xor__(self, other):
''' symmetric difference: Subtract count, but keep only abs
results with non-zero counts.

>>> Counter('abbbc') ^ Counter('bccd')

Counter({'b': 2, 'a': 1, 'c': 1, 'd': 1})
>>> a, b = Counter('abbbc'), Counter('bccd')
>>> (a-b) + (b - a) == a ^ b

True

'''
if not isinstance(other, Counter):
return NotImplemented
result = Counter()
for elem in set(self) | set(other):
newcount = self[elem] - other[elem]
if newcount != 0:
result[elem] = newcount if newcount > 0 else -newcount
return result

On Aug 17, 10:47 pm, Paddy <> wrote:
> On 17 Aug, 02:29, Raymond Hettinger <> wrote:
>
>
>

>
> > > Lets say you have two *sets* of integers representing two near-copies
> > > of some system, then a measure of their difference could be calculated
> > > as:

>
> > > len(X.symmetric_difference(Y)) / (len(X) + len(Y)) * 100 %

>
> > > If the two collections of integers are allowed duplicates then you
> > > need a Counter/bag/multi-set type and the diff calculation I gave
> > > originally.

>
> > Thanks for sharing your use case.

>
> > It's unlikely that I will add this method to the Counter API because
> > the rarity of use case does not warrant the added API complexity.
> > IMO, adding a method like this makes the class harder to learn,
> > understand and remember.  It doesn't seem like much of a win over
> > using the existing alternatives:

>
> >  * (b - c) + (c - b)
> >  * (b | c) - (b & c)
> >  * DIY using the two counters as simple dicts
> >  * writing a subclass providing additional binary operations

>
> > I would like to see someone post a subclass to the ASPN Cookbook that
> > adds a number of interesting, though not common operations.  Your
> > symmetric_difference() method could be one.  A dot_product() operation
> > could be another.  Elementwise arithmetic is another option (we
> > already have add and subtract, but could possibly use multiply,
> > divide, etc).  Scaling operations are another possibility (multiple
> > all elements by five, for example).

>
> > The Counter() class has low aspirations.  It is a dictionary that
> > fills-in missing values with zero and is augmented by a handful of
> > basic methods for managing the counts.

>
> > Raymond

>
> I created this that could be an addition to the bottom of the Python 3
> collections.Counter class definition:
>
>     def __xor__(self, other):
>         ''' symmetric difference: Subtract count, but keep only abs
> results with non-zero counts.
>
>         >>> Counter('abbbc') ^ Counter('bccd')
>         Counter({'b': 2, 'a': 1, 'c': 1, 'd': 1})
>         >>> a, b = Counter('abbbc'), Counter('bccd')
>         >>> (a-b) + (b - a) == a ^ b
>         True
>
>         '''
>         if not isinstance(other, Counter):
>             return NotImplemented
>         result = Counter()
>         for elem in set(self) | set(other):
>             newcount = self[elem] - other[elem]
>             if newcount != 0:
>                 result[elem] = newcount if newcount > 0 else -newcount
>         return result
>

And heres the cartesian product/multiply:

def __mul__(self, other):
'''Multiply counts by an integer; or cartesioan product
of two counters.

>>> Counter('abbb') * 3

Counter({'b': 9, 'a': 3})
>>> Counter('12') * Counter('21')

Counter({('2', '1'): 1, ('1', '2'): 1, ('1', '1'): 1, ('2',
'2'): 1})
>>> Counter('122') * Counter('211')

Counter({('2', '1'): 4, ('1', '1'): 2, ('2', '2'): 2, ('1',
'2'): 1})
'''
if isinstance(other, int):
return Counter(**dict((k, v*other)
for k,v in self.items()))
elif isinstance(other, Counter):
return Counter( (x, y)
for x in self.elements()
for y in other.elements() )
else:
return NotImplemented

(Although I don't have a use case for this one).

On Aug 17, 2:29 am, Raymond Hettinger <> wrote:

> I would like to see someone post a subclass to the ASPN Cookbook that
> adds a number of interesting, though not common operations.  Your
> symmetric_difference() method could be one.  A dot_product() operation
> could be another.  Elementwise arithmetic is another option (we
> divide, etc).  Scaling operations are another possibility (multiple
> all elements by five, for example).
>

>
> Raymond

Sample code is at http://code.activestate.com/recipes/577362-extension-to-python-3-counter-class/