Pythonic way to count sequences

Discussion in 'Python' started by CM, Apr 25, 2013.

1. CMGuest

I have to count the number of various two-digit sequences in a list
such as this:

mylist = [(2,4), (2,4), (3,4), (4,5), (2,1)]  # (Here the (2,4) sequence appears 2 times.)

and tally up the results, assigning each to a variable. The inelegant
first pass at this was something like...

# Create names and set them all to 0
alpha = 0
beta = 0
delta = 0
gamma = 0
# etc...

# loop over all the tuple sequences and increment appropriately
for sequence_tuple in list_of_tuples:
    if sequence_tuple == (1,2):
        alpha += 1
    if sequence_tuple == (2,4):
        beta += 1
    if sequence_tuple == (2,5):
        delta += 1
    # etc... But I actually have more than 10 sequence types.

# Finally, I need a list created like this:
result_list = [alpha, beta, delta, gamma] #etc...in that order

I can sense there is very likely an elegant/Pythonic way to do this,
and probably with a dict, or possibly with some Python structure I
don't typically use. Suggestions sought. Thanks.
CM, Apr 25, 2013

2. Chris AngelicoGuest

On Thu, Apr 25, 2013 at 3:05 PM, CM wrote:
> I have to count the number of various two-digit sequences in a list
> such as this:
>
> mylist = [(2,4), (2,4), (3,4), (4,5), (2,1)]  # (Here the (2,4) sequence appears 2 times.)
>
> and tally up the results, assigning each to a variable.

You can use a tuple as a dictionary key, just like you would a string.
So you can count them up directly with a dictionary:

count = {}
for sequence_tuple in list_of_tuples:
    count[sequence_tuple] = count.get(sequence_tuple,0) + 1

Also, since this is such a common thing to do, there's a standard
library way of doing it:

import collections
count = collections.Counter(list_of_tuples)

This doesn't depend on knowing ahead of time what your elements will
be. At the end of it, you can simply iterate over 'count' and get all
your counts:

for sequence,number in count.items():
    print("%d of %r" % (number,sequence))
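
As a minimal sketch of the above, here is the Counter approach run against the sample list from the original question. The name list_of_tuples is reused from the snippets above, and the data is simply the question's mylist:

```python
from collections import Counter

# Sample data taken from the original question.
list_of_tuples = [(2, 4), (2, 4), (3, 4), (4, 5), (2, 1)]

# Counter tallies every distinct hashable element in a single pass.
count = Counter(list_of_tuples)

# Build one report line per distinct sequence, then print them.
lines = ["%d of %r" % (number, sequence) for sequence, number in count.items()]
for line in lines:
    print(line)
```

No variable per sequence type is needed; any tuple that shows up in the data gets its own entry automatically.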

ChrisA
Chris Angelico, Apr 25, 2013

3. Steven D'ApranoGuest

On Wed, 24 Apr 2013 22:05:52 -0700, CM wrote:

> I have to count the number of various two-digit sequences in a list such
> as this:
>
> mylist = [(2,4), (2,4), (3,4), (4,5), (2,1)]  # (Here the (2,4) sequence appears 2 times.)
>
> and tally up the results, assigning each to a variable. The inelegant
> first pass at this was something like...
>
> # Create names and set them all to 0
> alpha = 0
> beta = 0
> delta = 0
> gamma = 0
> # etc...

Do they absolutely have to be global variables like that? Seems like a
bad design, especially if you don't know in advance exactly how many
there are.

> # loop over all the tuple sequences and increment appropriately
> for sequence_tuple in list_of_tuples:
>     if sequence_tuple == (1,2):
>         alpha += 1
>     if sequence_tuple == (2,4):
>         beta += 1
>     if sequence_tuple == (2,5):
>         delta += 1
>     # etc... But I actually have more than 10 sequence types.

counts = {}
for t in list_of_tuples:
    counts[t] = counts.get(t, 0) + 1

Or, use collections.Counter:

from collections import Counter
counts = Counter(list_of_tuples)

> # Finally, I need a list created like this:
> result_list = [alpha, beta, delta, gamma] #etc...in that order

Dicts are unordered, so getting the results in a specific order will be a
bit tricky. You could do this:

results = sorted(counts.items(), key=lambda t: t[0])
results = [t[1] for t in results]

if you are lucky enough to have the desired order match the natural order
of the tuples. Otherwise:

desired_order = [(2, 3), (3, 1), (1, 2), ...]
results = [counts.get(t, 0) for t in desired_order]
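
Putting the pieces together, one possible sketch of the whole task (counts returned in a caller-chosen order) might look like the following. The contents of desired_order here are made up purely for illustration; substitute the real sequence types in whatever order the result list needs:

```python
from collections import Counter

# Sample data taken from the original question.
list_of_tuples = [(2, 4), (2, 4), (3, 4), (4, 5), (2, 1)]

# Hypothetical ordering: the first entry plays the role of "alpha",
# the second "beta", and so on.
desired_order = [(1, 2), (2, 4), (2, 5), (3, 4)]

counts = Counter(list_of_tuples)

# A Counter returns 0 for keys it has never seen, so plain indexing is safe.
result_list = [counts[t] for t in desired_order]
print(result_list)  # [0, 2, 0, 1]
```

Sequences that occur in the data but are absent from desired_order, such as (4, 5) here, are simply left out of the result.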

--
Steven
Steven D'Aprano, Apr 25, 2013

4. Serhiy StorchakaGuest

25.04.13 08:26, Chris Angelico wrote:
> So you can count them up directly with a dictionary:
>
> count = {}
> for sequence_tuple in list_of_tuples:
>     count[sequence_tuple] = count.get(sequence_tuple,0) + 1

Or alternatives:

count = {}
for sequence_tuple in list_of_tuples:
    if sequence_tuple in count:
        count[sequence_tuple] += 1
    else:
        count[sequence_tuple] = 1

count = {}
for sequence_tuple in list_of_tuples:
    try:
        count[sequence_tuple] += 1
    except KeyError:
        count[sequence_tuple] = 1

import collections
count = collections.defaultdict(int)
for sequence_tuple in list_of_tuples:
    count[sequence_tuple] += 1

But of course collections.Counter is the preferable way now.
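
To make the comparison concrete, here is a small sketch that wraps three of the hand-rolled variants from this thread in functions (the function names are invented for this example) and checks that each produces the same tallies as collections.Counter on the question's sample data:

```python
from collections import Counter, defaultdict

list_of_tuples = [(2, 4), (2, 4), (3, 4), (4, 5), (2, 1)]

def count_with_get(items):
    # dict.get supplies 0 for keys not seen yet.
    count = {}
    for item in items:
        count[item] = count.get(item, 0) + 1
    return count

def count_with_try(items):
    # Increment first, and fall back to 1 on the first sighting.
    count = {}
    for item in items:
        try:
            count[item] += 1
        except KeyError:
            count[item] = 1
    return count

def count_with_defaultdict(items):
    # defaultdict(int) creates missing entries as 0 on access.
    count = defaultdict(int)
    for item in items:
        count[item] += 1
    return count

expected = dict(Counter(list_of_tuples))
variants = (count_with_get, count_with_try, count_with_defaultdict)
all_equal = all(dict(f(list_of_tuples)) == expected for f in variants)
print(all_equal)  # True
```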
Serhiy Storchaka, Apr 25, 2013

5. Denis McMahonGuest

On Wed, 24 Apr 2013 22:05:52 -0700, CM wrote:

> I have to count the number of various two-digit sequences in a list such
> as this:
>
> mylist = [(2,4), (2,4), (3,4), (4,5), (2,1)]  # (Here the (2,4) sequence appears 2 times.)
>
> and tally up the results, assigning each to a variable. The inelegant
> first pass at this was something like...
>
> # Create names and set them all to 0
> alpha = 0
> beta = 0
> delta = 0
> gamma = 0
> # etc...
>
> # loop over all the tuple sequences and increment appropriately
> for sequence_tuple in list_of_tuples:
>     if sequence_tuple == (1,2):
>         alpha += 1
>     if sequence_tuple == (2,4):
>         beta += 1
>     if sequence_tuple == (2,5):
>         delta += 1
>     # etc... But I actually have more than 10 sequence types.
>
> # Finally, I need a list created like this:
> result_list = [alpha, beta, delta, gamma] #etc...in that order
>
> I can sense there is very likely an elegant/Pythonic way to do this, and
> probably with a dict, or possibly with some Python structure I don't
> typically use. Suggestions sought. Thanks.

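# Note: ("monkey") is just the string "monkey"; a one-element tuple
# would need a trailing comma.
mylist = [ (3,3), (1,2), "fred", ("peter",1,7), 1, 19, 37, 28.312,
           ("monkey"), "fred", "fred", (1,2) ]

bits = {}

# Tally every distinct (hashable) item, whatever its type.
for thing in mylist:
    if thing in bits:
        bits[thing] += 1
    else:
        bits[thing] = 1

# Python 2 print statement, as in the original post.
for thing in bits:
    print thing, " occurs ", bits[thing], " times"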

outputs:

(1, 2) occurs 2 times
1 occurs 1 times
('peter', 1, 7) occurs 1 times
(3, 3) occurs 1 times
28.312 occurs 1 times
fred occurs 3 times
19 occurs 1 times
monkey occurs 1 times
37 occurs 1 times

If you want to check that thing is a tuple of two ints, use something like:

for thing in mylist:
    # Python 2 code: `long` is a separate integer type there.
    if (isinstance( thing, tuple ) and len( thing ) == 2
            and isinstance( thing[0], ( int, long ) )
            and isinstance( thing[1], ( int, long ) )):
        if thing in bits:
            bits[thing] += 1
        else:
            bits[thing] = 1
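
For readers on Python 3, where `long` no longer exists, the same filter might be sketched as below. The helper name is_int_pair is invented for this example, and note one assumption baked in: bool is a subclass of int, so a pair such as (True, 1) would also pass this check.

```python
from collections import Counter

def is_int_pair(thing):
    """Return True only for a tuple of exactly two ints."""
    return (
        isinstance(thing, tuple)
        and len(thing) == 2
        and all(isinstance(x, int) for x in thing)
    )

# Same mixed data as above; "monkey" is written as the plain string it is.
mylist = [(3, 3), (1, 2), "fred", ("peter", 1, 7), 1, 19, 37, 28.312,
          "monkey", "fred", "fred", (1, 2)]

# Count only the items that pass the check.
bits = Counter(thing for thing in mylist if is_int_pair(thing))
print(bits)
```

Only (3, 3) and (1, 2) survive the filter, so everything else in the list is ignored rather than tallied.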

--
Denis McMahon,
Denis McMahon, Apr 26, 2013

6. ModulokGuest

On 4/25/13, Denis McMahon wrote:
> On Wed, 24 Apr 2013 22:05:52 -0700, CM wrote:
>
>> I have to count the number of various two-digit sequences in a list such
>> as this:
>>
>> mylist = [(2,4), (2,4), (3,4), (4,5), (2,1)]  # (Here the (2,4) sequence appears 2 times.)
>>
>> and tally up the results, assigning each to a variable.

....

Consider using the ``collections`` module::

from collections import Counter

mylist = [(2,4), (2,4), (3,4), (4,5), (2,1)]
count = Counter()
for k in mylist:
    count[k] += 1

print(count)

# Output looks like this:
# Counter({(2, 4): 2, (4, 5): 1, (3, 4): 1, (2, 1): 1})

You then have access to methods to return the most common items, etc. See more
examples here:

http://docs.python.org/3.3/library/collections.html#collections.Counter
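
As a brief illustration of those extra methods, here is a minimal sketch using Counter.most_common on the same sample list:

```python
from collections import Counter

mylist = [(2, 4), (2, 4), (3, 4), (4, 5), (2, 1)]
count = Counter(mylist)

# most_common(n) returns the n highest counts as (element, count) pairs.
top = count.most_common(1)
print(top)  # [((2, 4), 2)]

# The total number of items counted is the sum of the values.
total = sum(count.values())
print(total)  # 5
```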

Good luck!
-Modulok-
Modulok, Apr 26, 2013

7. CMGuest

Thank you, everyone, for the answers. Very helpful and knowledge-expanding.
CM, Apr 26, 2013

8. Matthew GilsonGuest

A Counter is definitely the way to go about this. Just as a little more
information, the example below can be simplified:

from collections import Counter
count = Counter(mylist)

With the other example, you could have achieved the same thing (and been
backward compatible to Python 2.5) with

from collections import defaultdict
count = defaultdict(int)
for k in mylist:
    count[k] += 1
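
One practical difference between the two is worth a quick sketch: looking up a key that was never counted returns 0 in both, but only the defaultdict stores that key as a side effect. The key (9, 9) below is an arbitrary value chosen because it is not in the data.

```python
from collections import Counter, defaultdict

mylist = [(2, 4), (2, 4), (3, 4), (4, 5), (2, 1)]

counter = Counter(mylist)

dd = defaultdict(int)
for k in mylist:
    dd[k] += 1

missing = (9, 9)  # arbitrary key that never appears in mylist

counter_value = counter[missing]  # 0, and the Counter is left unchanged
dd_value = dd[missing]            # 0, but the key is now stored in dd

print(missing in counter)  # False
print(missing in dd)       # True
```

If reads of unseen keys are common, dd.get(missing, 0) avoids growing the defaultdict.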

On 4/25/13 9:16 PM, Modulok wrote:
> On 4/25/13, Denis McMahon wrote:
>> On Wed, 24 Apr 2013 22:05:52 -0700, CM wrote:
>>
>>> I have to count the number of various two-digit sequences in a list such
>>> as this:
>>>
>>> mylist = [(2,4), (2,4), (3,4), (4,5), (2,1)]  # (Here the (2,4) sequence appears 2 times.)
>>>
>>> and tally up the results, assigning each to a variable.

> ...
>
> Consider using the ``collections`` module::
>
>
> from collections import Counter
>
> mylist = [(2,4), (2,4), (3,4), (4,5), (2,1)]
> count = Counter()
> for k in mylist:
>     count[k] += 1
>
> print(count)
>
> # Output looks like this:
> # Counter({(2, 4): 2, (4, 5): 1, (3, 4): 1, (2, 1): 1})
>
>
> You then have access to methods to return the most common items, etc. See more
> examples here:
>
> http://docs.python.org/3.3/library/collections.html#collections.Counter
>
>
> Good luck!
> -Modulok-
Matthew Gilson, Apr 26, 2013