EXOR or symmetric difference for the Counter class

P

Paddy

I find myself needing to calculate the difference between two Counters
or multisets or bags.

I want those items that are unique to each bag. I know how to
calculate it:
Counter({'a': 2, 'b': 1})

But thought why doesn't this operation appear already as a method of
the class?

- Paddy.
 
S

Steven D'Aprano

I find myself needing to calculate the difference between two Counters
or multisets or bags.

Is this collections.Counter from Python 3.1? If so, you should say so,
and if not, you should tell us which Counter class this is. It will save
people (by which I mean *me*) from spending an unrewarding few minutes
trying to import Counter in Python 2.5 and writing up a sarcastic
response ... :)

I want those items that are unique to each bag. I know how to calculate
it:

Counter({'a': 2, 'b': 1})

But thought why doesn't this operation appear already as a method of the
class?

Don't know. Perhaps you should put in a feature request.
 
C

Chris Rebert

Is this collections.Counter from Python 3.1?

"Changed in version **2.7**: Added Counter and OrderedDict."

That is still ahead of the curve though.

Cheers,
Chris
 
P

Paddy

Is this collections.Counter from Python 3.1? If so, you should say so,
and if not, you should tell us which Counter class this is. It will save
people (by which I mean *me*) from spending an unrewarding few minutes
trying to import Counter in Python 2.5 and writing up a sarcastic
response ... :)




Don't know. Perhaps you should put in a feature request.

Yes it is Counter in both 3.1 and 2.7 (And somewhere on Activestate
too).

Before I put in a feature request, I wanted to know if other have seen
the need.
 
R

Raymond Hettinger

I find myself needing to calculate the difference between two Counters
or multisets or bags.

I want those items that are unique to each bag.

Tell us about your use cases. I'm curious how a program would ascribe
semantic meaning to the result. The phrase "unique to each bag"
doesn't quite cover it, perhaps something like "number in either
source above the minimum held in common".

AFAICT, I've never needed something like this as a primitive. Even
the xor operation for regular sets is rarely used.

I know how to
calculate it:

    >>> b = Counter(a=1, b=2)
    >>> c = Counter(a=3, b=1)
    >>> diff = (b - c) + (c - b)
Counter({'a': 2, 'b': 1})

That seems simple enough.
You could also use:

diff = (b | c) - (b & c) # max(b,c) - min(b,c)


Raymond
 
P

Paddy

Tell us about your use cases.  I'm curious how a program would ascribe
semantic meaning to the result.  The phrase "unique to each bag"
doesn't quite cover it, perhaps something like "number in either
source above the minimum held in common".

AFAICT, I've never needed something like this as a primitive.  Even
the xor operation for regular sets is rarely used.



That seems simple enough.
You could also use:

 diff = (b | c) - (b & c)   # max(b,c) - min(b,c)

Raymond

Hi Raymond and others,

Lets say you have two *sets* of integers representing two near-copies
of some system, then a measure of their difference could be calculated
as:

len(X.symmetric_difference(Y)) / (len(X) + len(Y)) * 100 %

If the two collections of integers are allowed duplicates then you
need a Counter/bag/multi-set type and the diff calculation I gave
originally.

Thanks.
 
R

Raymond Hettinger

[Paddy]
Lets say you have two *sets* of integers representing two near-copies
of some system, then a measure of their difference could be calculated
as:

len(X.symmetric_difference(Y)) / (len(X) + len(Y)) * 100 %

If the two collections of integers are allowed duplicates then you
need a Counter/bag/multi-set type and the diff calculation I gave
originally.

Thanks for sharing your use case.

It's unlikely that I will add this method to the Counter API because
the rarity of use case does not warrant the added API complexity.
IMO, adding a method like this makes the class harder to learn,
understand and remember. It doesn't seem like much of a win over
using the existing alternatives:

* (b - c) + (c - b)
* (b | c) - (b & c)
* DIY using the two counters as simple dicts
* writing a subclass providing additional binary operations

I would like to see someone post a subclass to the ASPN Cookbook that
adds a number of interesting, though not common operations. Your
symmetric_difference() method could be one. A dot_product() operation
could be another. Elementwise arithmetic is another option (we
already have add and subtract, but could possibly use multiply,
divide, etc). Scaling operations are another possibility (multiple
all elements by five, for example).

The Counter() class has low aspirations. It is a dictionary that
fills-in missing values with zero and is augmented by a handful of
basic methods for managing the counts.


Raymond
 
P

Paddy

[Paddy]
Lets say you have two *sets* of integers representing two near-copies
of some system, then a measure of their difference could be calculated
as:
len(X.symmetric_difference(Y)) / (len(X) + len(Y)) * 100 %
If the two collections of integers are allowed duplicates then you
need a Counter/bag/multi-set type and the diff calculation I gave
originally.

Thanks for sharing your use case.

It's unlikely that I will add this method to the Counter API because
the rarity of use case does not warrant the added API complexity.
IMO, adding a method like this makes the class harder to learn,
understand and remember.  It doesn't seem like much of a win over
using the existing alternatives:

 * (b - c) + (c - b)
 * (b | c) - (b & c)
 * DIY using the two counters as simple dicts
 * writing a subclass providing additional binary operations

I would like to see someone post a subclass to the ASPN Cookbook that
adds a number of interesting, though not common operations.  Your
symmetric_difference() method could be one.  A dot_product() operation
could be another.  Elementwise arithmetic is another option (we
already have add and subtract, but could possibly use multiply,
divide, etc).  Scaling operations are another possibility (multiple
all elements by five, for example).

The Counter() class has low aspirations.  It is a dictionary that
fills-in missing values with zero and is augmented by a handful of
basic methods for managing the counts.

Raymond

I created this that could be an addition to the bottom of the Python 3
collections.Counter class definition:

def __xor__(self, other):
''' symmetric difference: Subtract count, but keep only abs
results with non-zero counts.
True

'''
if not isinstance(other, Counter):
return NotImplemented
result = Counter()
for elem in set(self) | set(other):
newcount = self[elem] - other[elem]
if newcount != 0:
result[elem] = newcount if newcount > 0 else -newcount
return result

- Paddy.
 
P

Paddy

[Paddy]
Lets say you have two *sets* of integers representing two near-copies
of some system, then a measure of their difference could be calculated
as:
len(X.symmetric_difference(Y)) / (len(X) + len(Y)) * 100 %
If the two collections of integers are allowed duplicates then you
need a Counter/bag/multi-set type and the diff calculation I gave
originally.
Thanks for sharing your use case.
It's unlikely that I will add this method to the Counter API because
the rarity of use case does not warrant the added API complexity.
IMO, adding a method like this makes the class harder to learn,
understand and remember.  It doesn't seem like much of a win over
using the existing alternatives:
 * (b - c) + (c - b)
 * (b | c) - (b & c)
 * DIY using the two counters as simple dicts
 * writing a subclass providing additional binary operations
I would like to see someone post a subclass to the ASPN Cookbook that
adds a number of interesting, though not common operations.  Your
symmetric_difference() method could be one.  A dot_product() operation
could be another.  Elementwise arithmetic is another option (we
already have add and subtract, but could possibly use multiply,
divide, etc).  Scaling operations are another possibility (multiple
all elements by five, for example).
The Counter() class has low aspirations.  It is a dictionary that
fills-in missing values with zero and is augmented by a handful of
basic methods for managing the counts.

I created this that could be an addition to the bottom of the Python 3
collections.Counter class definition:

    def __xor__(self, other):
        ''' symmetric difference: Subtract count, but keep only abs
results with non-zero counts.

        >>> Counter('abbbc') ^ Counter('bccd')
        Counter({'b': 2, 'a': 1, 'c': 1, 'd': 1})
        >>> a, b = Counter('abbbc'), Counter('bccd')
        >>> (a-b) + (b - a) == a ^ b
        True

        '''
        if not isinstance(other, Counter):
            return NotImplemented
        result = Counter()
        for elem in set(self) | set(other):
            newcount = self[elem] - other[elem]
            if newcount != 0:
                result[elem] = newcount if newcount > 0 else -newcount
        return result

- Paddy.

And heres the cartesian product/multiply:

def __mul__(self, other):
'''Multiply counts by an integer; or cartesioan product
of two counters.
Counter({('2', '1'): 1, ('1', '2'): 1, ('1', '1'): 1, ('2',
'2'): 1}) Counter({('2', '1'): 4, ('1', '1'): 2, ('2', '2'): 2, ('1',
'2'): 1})
'''
if isinstance(other, int):
return Counter(**dict((k, v*other)
for k,v in self.items()))
elif isinstance(other, Counter):
return Counter( (x, y)
for x in self.elements()
for y in other.elements() )
else:
return NotImplemented

(Although I don't have a use case for this one).

- Paddy.
 
P

Paddy

I would like to see someone post a subclass to the ASPN Cookbook that
adds a number of interesting, though not common operations.  Your
symmetric_difference() method could be one.  A dot_product() operation
could be another.  Elementwise arithmetic is another option (we
already have add and subtract, but could possibly use multiply,
divide, etc).  Scaling operations are another possibility (multiple
all elements by five, for example).

Sample code is at http://code.activestate.com/recipes/577362-extension-to-python-3-counter-class/
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
474,433
Messages
2,571,683
Members
48,796
Latest member
Greg L.

Latest Threads

Top