Counter Class -- Bag/Multiset

G

Giovanni Bajo

The collections module in Python 2.7 and Python 3.1 has gotten a new
Counter class that works like bags and multisets in other languages.

I've adapted it for Python2.5/2.6 so people can start using it right
away:
http://docs.python.org/dev/library/collections.html#counter-objects

Here's a link to the docs for the new class:
http://code.activestate.com/recipes/576611/

Hi Raymond,

* I'm not a native speaker, but why use the word "Counter"? A "counter"
to my ear sounds like a number that is increased each time an event
occurs; the website counter, eg, comes to mind. I can understanda its
meaning probably stretches to "an object that counts", but I really can't
think of it as a group of object, or a container of object. Moreover, I
find it a much more useful abstraction the idea of a "multi-set" (that
is, a set where elements can appear with multiple cardinality), rather
than stressing the concept of "counting" how many times each element
appears in the set.

* I find it *very* confusing c.items() vs c.elements(). Items and
elements are synonymous (again, in my understanding of English).

All in all, I think I prefer your previous bag class:
http://code.activestate.com/recipes/259174/
 
C

Chris Rebert

Hi Raymond,

* I'm not a native speaker, but why use the word "Counter"? A "counter"
to my ear sounds like a number that is increased each time an event
occurs; the website counter, eg, comes to mind. I can understanda its
meaning probably stretches to "an object that counts", but I really can't
think of it as a group of object, or a container of object. Moreover, I
find it a much more useful abstraction the idea of a "multi-set" (that
is, a set where elements can appear with multiple cardinality), rather
than stressing the concept of "counting" how many times each element
appears in the set.

* I find it *very* confusing c.items() vs c.elements(). Items and
elements are synonymous (again, in my understanding of English).

I concur and would like to say additionally that having Counter's
len() be the number of *unique* items as opposed to just the number of
items seems a bit counterintuitive.

Cheers,
Chris
 
P

Paul Rubin

Giovanni Bajo said:
* I'm not a native speaker, but why use the word "Counter"?

I agree with this, the new functionality is welcome but I think
the traditional term "multiset" or "bag" would have been better.
 
T

Terry Reedy

Yes, little clicker devices for instances.

Me neither.
I concur and would like to say additionally that having Counter's
len() be the number of *unique* items as opposed to just the number of
items seems a bit counterintuitive.

bag/multiset/counter has been implemented as dict with items as keys and
counts as values, and len(dict) = #keys, so unique items is what I
expect. Giving the bag a sum-of-counts attributes also might be nice,
though.
 
M

msrachel.e

* I find it *very* confusing c.items() vs c.elements(). Items and
elements are synonymous (again, in my understanding of English).

Would have used the term "items" but that term has a different meaning
in the context of dictionaries where "items" means (key,value) pairs.
The word "elements" is used for members of the set object and has a
parallel meaning in the context of multisets. Since the class is a
dict subclass (as recommended by Guido), the term "items" was off
limits because it means something else. So, "elements" as used in the
sets documentation was the next, natural choice.

Raymond
 
M

msrachel.e

I agree with this, the new functionality is welcome but I think
the traditional term "multiset" or "bag" would have been better.

The term counter was what was originally approved. At first, I didn't
like it but found that it had some advantages. The main advantage is
that there are *bazillions* of programmers (including some very good
ones) who have never heard the term multiset or bag but have an
instant,
intuitive understanding of counting and counters.

Also, in the context of this implementation, "multiset" would not be a
good choice. The mathematical entity, multiset, is defined with
elements
having a count of one or more. In contrast, the Counter class allows
counts to go to zero or become negative. In addition, I worried that
calling it a multiset would suggest that it has a set-like API instead
of a dict-like API. With the former, you write c.add(elem). With the
latter, you write c[elem] += 1. So the name, "multiset" would be
misleading.

Bike-shed discussions aside (everyone has an opinion of how to name
things),
how do you guys like the functionality? Do you find it useable in
real-world
use cases?

Raymond
 
P

Paul Rubin

how do you guys like the functionality? Do you find it useable in
real-world use cases?

It's interesting. I wouldn't have thought of that API--I'm used to
populating defaultdict(int) in the obvious ways for this sort of
thing--but it is attractive and I think it will be useful. Thanks for
implementing it.
 
B

bearophileHUGS

Raymond Hettinger:
The collections module in Python 2.7 and Python 3.1 has gotten a new
Counter class that works like bags and multisets in other languages.

Very nice. Python std lib is growing more data structures, increasing
the power of a default Python installation. I can remove more and more
modules from my bag of tricks.

I like the name Bag(), it's shorter. Names are important enough.

Are keys restricted to be long and int values only? Or are they
general (referenced) objects + a control of their integral nature?

I think you can add better explanations to this, like an example:
c += Counter() # remove zero and negative counts

Bye,
bearophile
 
B

bearophileHUGS

bearophile:
Are keys restricted to be long and int values only? Or are they
general (referenced) objects + a control of their integral nature?

Here I meant values, sorry.
Also: what's the rationale of allowing negative values too?

Bye,
bearophile
 
M

msrachel.e

bearophile:


Here I meant values, sorry.
Also: what's the rationale of allowing negative values too?

The dict values can be anything, but only ints make sense in the
context of bags/multisets/counters etc.
Since it is a dict subclass, we really have no control of the
content. It's pretty much a "consenting adults" data structure (much
like the heapq module which cannot enforce a heap condition on the
underlying list which is exposed to the user).

To block negative values, the setter methods would have to be
overridden and would dramatically slow down the major use cases for
counters (and we would not be able to let people freely convert to and
from arbitrary dicts). Also, we would have to introduce and document
some kind of exception for attempts to set a negative value (i.e. c[x]
-= 1 could raise an exception). This would unnecessarily complicate
the API in an attempt to limit what a user can do (perhaps barring
them from a use case they consider to be important).

So, if you don't want negative counts, I recommend that you don't
subtract more than you started off with ;-)

Raymond
 
T

Terry Reedy

The term counter was what was originally approved. At first, I didn't
like it but found that it had some advantages. The main advantage is
that there are *bazillions* of programmers (including some very good
ones) who have never heard the term multiset or bag but have an
instant,
intuitive understanding of counting and counters.

Also, in the context of this implementation, "multiset" would not be a
good choice. The mathematical entity, multiset, is defined with
elements
having a count of one or more. In contrast, the Counter class allows
counts to go to zero or become negative. In addition, I worried that
calling it a multiset would suggest that it has a set-like API instead
of a dict-like API. With the former, you write c.add(elem). With the
latter, you write c[elem] += 1. So the name, "multiset" would be
misleading.

Agreed. I withdraw my objection.
 
T

Terry Reedy

Also: what's the rationale of allowing negative values too?

I can guess two:
1) Nuisance to check given that Python does not have a count (0,1,2...)
type.
2. Useful. + = items on hand; - = items back-ordered.
Bank account go negative also ;-). So does elevation.
Similarly, an app could set 0 to mean 'desired quantity on hand', so
non-zero count is surplus or deficit.

I might not have thought to allow this, but it seems to open new
applications. (This definitely makes bag or multiset inappropriate as
names.)

tjr
 
G

Giovanni Bajo

I concur and would like to say additionally that having Counter's
len() be the number of *unique* items as opposed to just the number of
items seems a bit counterintuitive.

In fact, I think that it makes sense when you're stressing the fact that
it's just a dictionary of objects and their counters (which the name
"Counter" in fact stresses).

And my main objection is exactly this: a multiset is a differnt abstract
data structure (which is probably worth its own ABC in Python); the fact
that can be implemented through a mapping of objects and counters is
just its concrete implementation. This was perfectly represented by
Raymond's previous bag class, that was *using* a dictionary instead of
*being* a dictionary.
 
M

MRAB

Terry said:
I can guess two:
1) Nuisance to check given that Python does not have a count
(0,1,2...) type.
2. Useful. + = items on hand; - = items back-ordered. Bank account go
negative also ;-). So does elevation.
Similarly, an app could set 0 to mean 'desired quantity on hand', so
non-zero count is surplus or deficit.

I might not have thought to allow this, but it seems to open new
applications. (This definitely makes bag or multiset inappropriate
as names.)
I would've limited the counts to non-negative values too, but being able
to store negative values does allow credit/debit or in-stock/on-order.
In such cases, the name 'Counter' makes more sense.
 
P

pataphor

The collections module in Python 2.7 and Python 3.1 has gotten a new
Counter class that works like bags and multisets in other languages.

I like that! Now that we have a multiset or Counter I think a
redefinition of itertools.permutations is in order.

For example something like this:

def genperm(D, R = []):
#generate the permutations of a multiset
if not max(D.values()):
yield tuple(R)
else:
for k,v in sorted(D.items()):
if v:
D[k] -= 1
for g in genperm(D,R+[k]):
yield g
D[k] += 1

def perm(seq):
D = {}
for x in seq:
D[x] = D.get(x,0)+1
for X in genperm(D):
yield X

def test():
for i,x in enumerate(perm('aabbcc')):
print i,x
print len(set(perm('aabbcc')))

if __name__ == '__main__':
test()

The dictionary I'm using here could be seamlessly replaced by your
Counter class.

By the way, I like your itertools recipes a lot, but I miss something
that repeats elements n times, an xcycle or ncycle or whatever, like I
used in this (older code, sorry for the name overlap):

def repeat(x,n = 0):
i = 0
while i < n :
yield x
i += 1

def xcycle(seq,n):
while 1:
for x in seq:
for y in repeat(x,n):
yield y

def counter(symbols,width):
base = len(symbols)
R = []
k = width
while k:
k -= 1
R.append(xcycle(symbols,base**k))
nc = base**width
while nc:
yield list(x.next() for x in R)
nc -=1

def test():
symbols = '01'
width = 4
for state in counter(symbols,width):
print state

if __name__=='__main__':
test()

I think such a thing could be handy for generating more kinds of
periodic sequences. By the way, itertools is starting to feel like it's
becoming some sort of alternate programming design all by itself. I
wonder when it is going to break loose from python! I like that
way of thinking a lot, thanks.

P.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,773
Messages
2,569,594
Members
45,120
Latest member
ShelaWalli
Top