Counter for items in lists in lists?

C

Charlotte Henkle

Hello;

I'm pondering how to count the number of times an item appears in
total in a nested list. For example:

myList=[[a,b,c,d],[a,f,g,h],[a,b,x,y]]

I'd like to know that a appeared three times, and b appeared twice,
and the rest appeard only once.

Every solution I've touched on so far has seemed clumsy. Does anyone
have any thoughts for a down and dirty solution to this?

Many thanks...

/Charlotte
 
S

Steven Bethard

Charlotte Henkle said:
I'm pondering how to count the number of times an item appears in
total in a nested list.

How about this:
myList=[['a','b','c','d'],['a','f','g','h'],['a','b','x','y']]
def count(item):
.... if not isinstance(item, list):
.... return {item:1}
.... counts = {}
.... for i in item:
.... for key, ct in count(i).items():
.... counts[key] = counts.get(key, 0) + ct
.... return counts
....{'a': 3, 'c': 1, 'b': 2, 'd': 1, 'g': 1, 'f': 1, 'h': 1, 'y': 1, 'x': 1}

Steve
 
B

Bryan

Steven said:
Charlotte Henkle said:
I'm pondering how to count the number of times an item appears in
total in a nested list.


How about this:

myList=[['a','b','c','d'],['a','f','g','h'],['a','b','x','y']]
def count(item):

... if not isinstance(item, list):
... return {item:1}
... counts = {}
... for i in item:
... for key, ct in count(i).items():
... counts[key] = counts.get(key, 0) + ct
... return counts
...

{'a': 3, 'c': 1, 'b': 2, 'd': 1, 'g': 1, 'f': 1, 'h': 1, 'y': 1, 'x': 1}

Steve

or maybe a less general approach might work if the nested list is always one deep:
>>> myList=[['a','b','c','d'],['a','f','g','h'],['a','b','x','y']]
>>> tmp = []
>>> d = {}
>>> for item in myList: tmp += item
>>> for key in tmp: d[key] = d.get(key, 0) + 1
>>> d
{'a': 3, 'c': 1, 'b': 2, 'd': 1, 'g': 1, 'f': 1, 'h': 1, 'y': 1, 'x': 1}


bryan
 
S

Steven Bethard

Bryan said:
or maybe a less general approach might work if the nested list is always one
deep:
myList=[['a','b','c','d'],['a','f','g','h'],['a','b','x','y']]
tmp = []
d = {}
for item in myList: tmp += item
for key in tmp: d[key] = d.get(key, 0) + 1
d
{'a': 3, 'c': 1, 'b': 2, 'd': 1, 'g': 1, 'f': 1, 'h': 1, 'y': 1, 'x': 1}

Yeah, if that's the case, you don't really even need to construct a temporary
list:
myList=[['a','b','c','d'],['a','f','g','h'],['a','b','x','y']]
counts = {}
for lst in myList:
.... for item in lst:
.... counts[item] = counts.get(item, 0) + 1
.... {'a': 3, 'c': 1, 'b': 2, 'd': 1, 'g': 1, 'f': 1, 'h': 1, 'y': 1, 'x': 1}

Steve
 
A

Alex Martelli

Charlotte Henkle said:
Hello;

I'm pondering how to count the number of times an item appears in
total in a nested list. For example:

myList=[[a,b,c,d],[a,f,g,h],[a,b,x,y]]

I'd like to know that a appeared three times, and b appeared twice,
and the rest appeard only once.

Two problems enmeshed, here: how to count items of various values if you
have a way to iterate over the values; and how to iterate on a nested
list. You may code them together, but factoring them as two reusable
functions is preferable -- you'll end up with more reusable building
blocks than by coding them as one monolithic function. Python is good
at allowing such factorings. There may be a third sub-problem, about
how best to _present_ the counts -- hard to say from your statement
about the problem, whether that's a problem, but, see later.

You're not telling us what is the type of the items. If they're strings
or numbers, basically anything that can be a key to a dictionary, the
counting problem is best solved by building a dictionary:

def itemcount(sequence):
d = {}
for item in sequence: d[item] = 1 + d.get(item, 0)
return d

this kind of function is known as an 'accumulator': given a sequence it
loops over it computing/building some resulting object and returns it.
Not a specific Python term, but a nice one to keep in mind.

You don't clarify the structure of your list -- is it always two levels
as you present it, could it be more, are all items always going to be at
the same level or could some level have a mixture of items and sublists?
Taking the simplest hypothesis -- always two levels and all items at the
second level, the first one being made up of sublists only:

def itemseq(nested_list):
for sublist in nested_list:
for item in sublist: yield item

this kind of function is known as a 'simple generator': it produces a
sequence you may loop on with a for statement, pass to an accumulator,
etc. This is a specific Python term, and the keyword that lets you tell
whether a function is a generator is 'yield'.

So, given these two functions -- or enhanced equivalents if your list or
its items don't meet these hypotheses -- the dict of counts is easy to
obtain:

myCounts = itemcount(itemseq(myList))

Now for the presentation issue -- you say you want to receive the
information "that a appeared three times, and b appeared twice,
and the rest appeard only once". This appears to mean that you want to
get a sort of items by count, with items appearing only once just
summarized as a group, not listed individually. If I correctly divined
your intention, this might help:

def summarize_counts(counts_dict):
number_of_singletons = 0
count_per_item_list = []
for item, count in counts_dict.iteritems():
if count == 1: number_of_singletons += 1
else: count_per_item_list.append((count, item))
count_per_item_list.sort()
for count, item in count_per_item_list:
print 'Item %r appeared %d times' % (item, count)
print '%d items appeared only once each' % number_of_singletons


So, overall, if I've read your mind correctly,

summarize_counts(itemcount(itemseq(myList)))

should do exactly what you required. However, mindreading is always an
iffy business (that's part of why the Zen of Python says "explicit is
better than implicit"...:). So, if you can clarify the various
ambiguous and uncertain details of your specs, which forced me to guess
(violating another Zen injunction: "in the face of ambiguity, resist the
temptation to guess"), over and over, at what exactly you meant, I'm
confident that the various pieces presented here can be enhanced to meet
whatever your requirements actually _are_.


Alex
 
M

Michael Hoffman

If you want to be able to use any iterable, you can use itertools.chain().
>>> import itertools
>>> myList=[['a','b','c','d'],['a','f','g','h'],['a','b','x','y']]
>>> counts = {}
>>> for item in itertools.chain(*myList):
.... counts[item] = counts.get(item, 0) + 1
....{'a': 3, 'c': 1, 'b': 2, 'd': 1, 'g': 1, 'f': 1, 'h': 1, 'y': 1, 'x': 1}
 
C

Charlotte Henkle

Ahhhh...thank you all!

Actually, I wanted to be able to count the items in the list to check
my thinking on a problem, and as it turns out, my thinking was
incorrect. So I have a follow up question now ;)

Some background:

I was given a problem by a friend. He is in a tennis group that has 6
members. They play once a week for 10 weeks. They were having
trouble generating a schedule that was random but fair (IE everyone
gets to play a base number of times , and the mod is evenly and
randomly distributed).

I decided that I wanted to abstract as much of the problem as possible
so that it would be reusable for other groups (IE to solve it for
games with N players, Y times over X weeks). And then I decided I
wanted this to be my second program in python, so forgive my
clumsiness ;)

My first pass through was short and simple: Figure out the total
number of games that will be played, and then use something like this:

gamePlayers=random.sample(templist, players_per_game)

to fill up all the game slots. Neat, simple.

The problem I found with this solution was that it didn't give me a
weighted list for remainder. The same person could get in all of the
"extra" games.

So instead I decided to fill games from my list of players by removal
until I had a number of players left less than a full game, and then
have a back-up list. The back up list would act like a queue, but if
you were already in the game, it would take the next guy. If you got
into a game from the back-up list, you were sent to the end of the
line. My 2 lines grew to 50 plus ;)

Sadly, this isn't quite working either. Now that I can print out the
players, I see that generally things work, but every once in a while,
I get an unexpected result. For example, with 6 players, 10 weeks, 2
games per week, I would expect combinations of:

{'a': 13, 'c': 14, 'b': 14, 'e': 13, 'd': 13, 'f': 13}

(4 13s and 2 14s)

But sometimes I get:

{'a': 13, 'c': 14, 'b': 14, 'e': 14, 'd': 12, 'f': 13}

(2 13s, 3 14s and a 12)

That 12 breaks the planned even distribution. :(

I suspect the problem comes from the random pulling in the first part,
but I'm not sure. I also feel that some sections (espcially the
print) don't have a "python-grace", so I would love some suggestions
to make them more...slithery? :)

To make a long story longer, here's the code:

....#! /usr/bin/env python
....
....#A program to randomly fill a tennis schedule
....#
....#The original theory looked like this:
....# gamePlayers=random.sample(templist, players_per_game)
....# print gamePlayers
....#
....#But that didn't give a weighted list for extra games
....
....import random
....
....#Eventually these will get set dynamically
....number_of_weeks=10
....players=['a', 'b', 'c', 'd', 'e', 'f']
....games_per_week=2
....players_per_game=4
....games=number_of_weeks*games_per_week
....
....#this will be used to pull off "extra game" players
....backupList=players[:]
....random.shuffle(backupList)
....
....#a templist so we can modify it.
....templist=players[:]
....
....#our finished product:
....finishedList=[]
....
....while len(finishedList)!=games:
.... if len(templist)>=players_per_game:
.... gamePlayers=[]
.... while len(gamePlayers)!=players_per_game:
.... randomNumber=random.randint(0, len(templist)-1)
.... potentialPlayer=templist.pop(randomNumber)
.... gamePlayers.append(potentialPlayer)
.... finishedList.append(gamePlayers)
.... else:
.... gamePlayers=templist
.... print "I am the leftover game players", gamePlayers
.... print "I am the list of backup players", backupList
.... count=0
.... while len(gamePlayers)!=players_per_game:
.... print "I am a potential player"
.... potentialPlayer=backupList[count]
.... print potentialPlayer
.... print "checking to see if I'm in the game"
.... if potentialPlayer not in gamePlayers:
.... print "I do not think the player is in the game"
.... print "I am the back-up list", backupList
.... potentialPlayer=backupList.pop(count)
.... gamePlayers.append(potentialPlayer)
.... backupList.append(potentialPlayer)
.... print "I am the back-up list after reorder", backupList
.... print "I am the gamePlayers after test and insertion",
gamePlayers
....
.... else:
.... print "I think that player is in the game"
.... count+=1
.... finishedList.append(gamePlayers)
.... templist=players[:]
....
....#count the list (thank you, Steve!
....http://groups.google.com/groups?dq=...lang.python&ie=UTF-8&hl=en&btnG=Google+Search
....
....def count(item):
.... if not isinstance(item, list):
.... return {item:1}
.... counts = {}
.... for i in item:
.... for key, ct in count(i).items():
.... counts[key] = counts.get(key, 0) + ct
.... return counts
....
....def printList(weeks, games, list):
.... x=0
.... while x!=weeks:
.... y=0
.... print "Week: ", x+1
.... while y<games:
.... print "Game ",y+1, " players are ", list[y]
.... y+=1
.... x+=1
....
....#printing out and counting the final list
....
....printList(number_of_weeks, games_per_week, finishedList)
....print finishedList.count("a")
 
A

Alex Martelli

Charlotte Henkle said:
I was given a problem by a friend. He is in a tennis group that has 6
members. They play once a week for 10 weeks. They were having
trouble generating a schedule that was random but fair (IE everyone
gets to play a base number of times , and the mod is evenly and
randomly distributed).

How many of the 6 members can play on each of those once-weekly
occasions?
My first pass through was short and simple: Figure out the total
number of games that will be played, and then use something like this:

gamePlayers=random.sample(templist, players_per_game)

to fill up all the game slots. Neat, simple.

The problem I found with this solution was that it didn't give me a
weighted list for remainder. The same person could get in all of the
"extra" games.

Yep, you're being a bit "too random" there -- sampling with repetition.

Consider random.shuffle: puts the players in arbitrary order but without
repetition. Just pick whatever number at a time, say from the back --
doing so from the front would be just as good, but slower, so it might
as well be from the back. When you're trying to pick more players than
are left in the shuffled list you're peeling stuff from, make sure those
get in then reshuffle the original list minus those who are already in,
for the next time.

Ah, natural language is SO complicated, let's put it in pseudocode.
First, you'll need sets, since obviously there are set operations
involved (e.g., "minus those who are already in" is goofy language for a
set difference operation). So let's make sure we have sets. In Python
2.4, they're built-in; in 2.3, they're in module sets of the standard
library. To ensure you run optimally in either release, star your
program with:

import random
try: set=set
except NameError: from sets import Set as set

Now we're ready to write our pseudocode. At each step, i.e. each time
another weekly game is planned, we'll have: some list X of players who
haven't played yet in this 'round' in shuffled order; a request for N
players for tonight; and of course the original set P of players. If N
is less than the items in X we're in an easy case:
if N < len(X):
yield X[-N:]
del X[-N:]
i.e. we just yield the last N items of list X then remove those same
items from further consideration in this round.

If more players are going to play tonight, than there are items in X,
it's a little bit more complicated. We need to prepare a shuffled list
of all players except those who are alreayd in X...:
else:
newX = list(P-set(X))
random.shuffle(newX)
and then yield the contents of X and the needed N-len(X) items of newX:
fromNewX = N-len(X)
yield X + newX[-fromNewX:]
finally, we need to remove from newX the items we just yielded, and set
the remainder as the new value of X for the next evening of play:
X = newX[:-fromNewX]

Great, but how do we get started? Heh, funny, if X is empty we're just
in a starting position -- len(X) is 0 so we're gonna prepare newX, and
we're gonna prepare it from the whole of set P... just as we want. So
all the initialization we need is to make sure that P _is_ a set and
that X is empty:
P = set(P)
X = []

Great -- now that we have our pseudocode, how do we turn it into actual
working code? That's where Python shines, for the pseudocode we jotted
down to help our thinking as we reasoned about the problem IS working
Python code. We could rename X, P and N, but apart from that, here is
the generator we need...:

def choose_players(P, N):
assert len(P) >= N > 0
P = set(P)
X = []
while True:
if N < len(X):
yield X[-N:]
del X[-N:]
else:
newX = list(P-set(X))
random.shuffle(newX)
fromNewX = N-len(X)
yield X + newX[-fromNewX:]
X = newX[:-fromNewX]

I've added a check that N makes sense (0 or fewer players, or more than
the club's membership, being needed each time, obviously makes no
sense!) and an unending loop around the whole thing. This is a
nonterminating generator -- it won't ever stop iterating by itself; it
needs to be called the appropriate number of times. I find that more
useful than passing the generator a parameter telling it how many times
to loop (you'd just need to add such a parameter T and change the while
to 'for i in xrange(T):' -- but if you need to do that, you could just
as well use iterator.islice or whatever to limit the numbers of steps
over the generator...).


Here's an example use...:

for i, players in enumerate(choose_players(range(13), 4))):
print players
if i > 9: break

Here there are 13 members, named 0 to 12, and each week 4 can play, for
11 weeks. You can run it a few times to eyeball it for correctness, but
of course you'll want to check it out better eventually, for example:

for number_of_tests in range(10):
number_of_plays = 13*[0]
for i, players in enumerate(choose_players(range(13), 4)):
for p in players: number_of_plays[p] += 1
if i > 9: break
for i in range(max(number_of_plays)+1):
print '%d:%d'%(i, number_of_plays.count(i)),
print

You'll soon see that the results aren't as even as we wished, e.g.:

kallisti:~/downloads alex$ python2.3 pla.py
0:0 1:0 2:2 3:4 4:7
0:0 1:0 2:1 3:6 4:6
0:0 1:0 2:0 3:8 4:5
0:0 1:0 2:0 3:8 4:5
0:0 1:0 2:2 3:4 4:7
0:0 1:0 2:1 3:6 4:6
0:0 1:0 2:1 3:6 4:6
0:0 1:0 2:0 3:8 4:5
0:0 1:0 2:0 3:8 4:5
0:0 1:0 2:0 3:8 4:5

ooops -- the 44 available 'slots' are NOT always divided "three apiece
and a fourth one for 5 lucky ones of the 13 members" -- sometimes one or
even two members play only twice! So what's wrong with the pseudode we
had so nicely worked out...? Hmmm, clearly the 'left over' 13th player
who first gets to play on the 4th night isn't given a chance to play
again soon enough... he should go right back into the new X. Can we fix
that? We can sure try, just change in the above:
newX = list(P-set(X))
random.shuffle(newX)
fromNewX = N-len(X)
yield X + newX[-fromNewX:]
X = newX[:-fromNewX]
into:
newX = list(P-set(X))
random.shuffle(newX)
fromNewX = N-len(X)
yield X + newX[-fromNewX:]
X += newX[:-fromNewX]
random.shuffle(X)

Ah, NOW our tests are more satisfactory -- huge runs of
0:0 1:0 2:0 3:8 4:5
just as we wanted! Of course, that's no mathematical _proof_ that our
procedure is correct -- indeed, it's QUITE interesting to provide such a
proof (but I have no space left in the margins of this post to do
so...:).


I hope that by showing you how a little program can evolve in real life,
from specs to thinking in pseudocode to Python, eyeball-testing, better
testing (with counting;-), correction of some problem, ..., I may have
helped you on your way to "thinking like a programmer"!-)


Alex
 
C

Charlotte Henkle

To make a long story longer, here's the code:

Whoops...Correction:

....#! /usr/bin/env python
....#Charlotte Henkle
....
....#A program to randomly fill a tennis schedule
....#
....#The original theory looked like this:
....# gamePlayers=random.sample(templist, players_per_game)
....# print gamePlayers
....#
....#But that didn't give a weighted list for extra games
....
....import random
....
....#Eventually these will get set dynamically
....number_of_weeks=10
....players=['a', 'b', 'c', 'd', 'e', 'f', 'g']
....games_per_week=2
....players_per_game=4
....games=number_of_weeks*games_per_week
....
....#this will be used to pull off "extra game" players
....backupList=players[:]
....random.shuffle(backupList)
....
....#a templist so we can modify it.
....templist=players[:]
....
....#our finished product:
....finishedList=[]
....
....while len(finishedList)!=games:
.... if len(templist)>=players_per_game:
.... gamePlayers=[]
.... while len(gamePlayers)!=players_per_game:
.... randomNumber=random.randint(0,
len(templist)-1)
.... potentialPlayer=templist.pop(randomNumber)
.... gamePlayers.append(potentialPlayer)
.... finishedList.append(gamePlayers)
.... else:
.... gamePlayers=templist
.... print "I am the leftover game players", gamePlayers
.... print "I am the list of backup players", backupList
.... count=0
.... while len(gamePlayers)!=players_per_game:
.... print "I am a potential player "
.... potentialPlayer=backupList[count]
.... print potentialPlayer
.... print "checking to see if I'm in the game"
.... if potentialPlayer not in gamePlayers:
.... print "I do not think the player is
in the game"
.... print "I am the back-up list",
backupList
....
potentialPlayer=backupList.pop(count)
.... gamePlayers.append(potentialPlayer)
.... backupList.append(potentialPlayer)
.... print "I am the back-up list after
reorder", backupList
.... print "I am the gamePlayers after
test and insertion", gamePlayers
....
.... else:
.... print "I think that player is in
the game"
.... count+=1
.... finishedList.append(gamePlayers)
.... templist=players[:]
....
....#count the list (thank you, Steve!
....http://groups.google.com/groups?dq=...lang.python&ie=UTF-8&hl=en&btnG=Google+Search
....
....def count(item):
.... if not isinstance(item, list):
.... return {item:1}
.... counts = {}
.... for i in item:
.... for key, ct in count(i).items():
.... counts[key] = counts.get(key, 0) + ct
.... return counts
....
....def printList(weeks, games, list):
.... x=0
.... y=0
.... index=0
.... while x!=weeks:
.... print "Week: ", x+1
.... y=0
.... while y<games:
.... print "Game ",y+1, " players are ",
list[index]
.... y+=1
.... index+=1
.... x+=1
....
....#printing out and counting the final list
....
....printList(number_of_weeks, games_per_week, finishedList)
....print count(finishedList)
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,755
Messages
2,569,536
Members
45,008
Latest member
HaroldDark

Latest Threads

Top