# Summing a 2D list

Discussion in 'Python' started by Mark, Jun 12, 2008.

1. ### MarkGuest

Hi all,

I have a scenario where I have a list like this:

User Score
1 0
1 1
1 5
2 3
2 1
3 2
4 3
4 3
4 2

And I need to add up the score for each user to get something like
this:

User Score
1 6
2 4
3 2
4 8

Is this possible? If so, how can I do it? I've tried looping through
the arrays and not had much luck so far.

Any help much appreciated,

Mark

Mark, Jun 12, 2008

2. ### Diez B. RoggischGuest

Mark wrote:

> Hi all,
>
> I have a scenario where I have a list like this:
>
> User Score
> 1 0
> 1 1
> 1 5
> 2 3
> 2 1
> 3 2
> 4 3
> 4 3
> 4 2
>
> And I need to add up the score for each user to get something like
> this:
>
> User Score
> 1 6
> 2 4
> 3 2
> 4 8
>
> Is this possible? If so, how can I do it? I've tried looping through
> the arrays and not had much luck so far.
>
> Any help much appreciated,

Show us your efforts in code so far. Especially what the actual data looks
like. Then we can suggest a solution.

Diez

Diez B. Roggisch, Jun 12, 2008

3. ### ChrisGuest

On Jun 12, 3:48 pm, Mark <> wrote:
> Hi all,
>
> I have a scenario where I have a list like this:
>
> User            Score
> 1                 0
> 1                 1
> 1                 5
> 2                 3
> 2                 1
> 3                 2
> 4                 3
> 4                 3
> 4                 2
>
> And I need to add up the score for each user to get something like
> this:
>
> User            Score
> 1                 6
> 2                 4
> 3                 2
> 4                 8
>
> Is this possible? If so, how can I do it? I've tried looping through
> the arrays and not had much luck so far.
>
> Any help much appreciated,
>
> Mark

user_score = {}
for record in list:
user, score = record.split()
if user in user_score: user_score[user] += score
else: user_score[user] = score

print '\n'.join(['%s\t%s' % (user, score) for user,score in
sorted(user_score.items())])

You don't mention what data structure you are keeping your records in
but hopefully this helps you in the right direction.

Chris, Jun 12, 2008
4. ### MarkGuest

On Jun 12, 3:02 pm, "Diez B. Roggisch" <> wrote:
> Mark wrote:
> > Hi all,

>
> > I have a scenario where I have a list like this:

>
> > User            Score
> > 1                 0
> > 1                 1
> > 1                 5
> > 2                 3
> > 2                 1
> > 3                 2
> > 4                 3
> > 4                 3
> > 4                 2

>
> > And I need to add up the score for each user to get something like
> > this:

>
> > User            Score
> > 1                 6
> > 2                 4
> > 3                 2
> > 4                 8

>
> > Is this possible? If so, how can I do it? I've tried looping through
> > the arrays and not had much luck so far.

>
> > Any help much appreciated,

>
> Show us your efforts in code so far. Especially what the actual data looks
> like. Then we can suggest a solution.
>
> Diez

Hi Diez, thanks for the quick reply.

To be honest I'm relatively new to Python, so I don't know too much
about how all the loop constructs work and how they differ to other
languages. I'm building an app in Django and this data is coming out
of a database and it looks like what I put up there!

This was my (failed) attempt:

predictions = Prediction.objects.all()
scores = []
for prediction in predictions:
i = [prediction.predictor.id, 0]
if prediction.predictionscore:
i[1] += int(prediction.predictionscore)
scores.append(i)

I did have another loop in there (I'm fairly sure I need one) but that
didn't work either. I don't imagine that snippet is very helpful,
sorry!

Any tips would be gratefully recieved!

Thanks,

Mark

Mark, Jun 12, 2008
5. ### AidanGuest

Mark wrote:
> Hi all,
>
> I have a scenario where I have a list like this:
>
> User Score
> 1 0
> 1 1
> 1 5
> 2 3
> 2 1
> 3 2
> 4 3
> 4 3
> 4 2
>
> And I need to add up the score for each user to get something like
> this:
>
> User Score
> 1 6
> 2 4
> 3 2
> 4 8
>
> Is this possible? If so, how can I do it? I've tried looping through
> the arrays and not had much luck so far.
>
> Any help much appreciated,
>
> Mark

does this work for you?

users = [1,1,1,2,2,3,4,4,4]
score = [0,1,5,3,1,2,3,3,2]

d = dict()

for u,s in zip(users,score):
if d.has_key(u):
d += s
else:
d = s

for key in d.keys():
print 'user: %d\nscore: %d\n' % (key,d[key])

Aidan, Jun 12, 2008
6. ### John SalernoGuest

"Mark" <> wrote in message
news:...
On Jun 12, 3:02 pm, "Diez B. Roggisch" <> wrote:
> Mark wrote:

---
This was my (failed) attempt:

predictions = Prediction.objects.all()
scores = []
for prediction in predictions:
i = [prediction.predictor.id, 0]
if prediction.predictionscore:
i[1] += int(prediction.predictionscore)
scores.append(i)
---

Your question sounds like a fun little project, but can you post what the
actual list of users/scores looks like? Is it a list of tuples like this:

[(1, 0), (1, 1) ... ]

Or something else?

John Salerno, Jun 12, 2008
7. ### Diez B. RoggischGuest

> To be honest I'm relatively new to Python, so I don't know too much
> about how all the loop constructs work and how they differ to other
> languages. I'm building an app in Django and this data is coming out
> of a database and it looks like what I put up there!
>
> This was my (failed) attempt:
>
> predictions = Prediction.objects.all()
> scores = []
> for prediction in predictions:
> i = [prediction.predictor.id, 0]
> if prediction.predictionscore:
> i[1] += int(prediction.predictionscore)
> scores.append(i)
>
> I did have another loop in there (I'm fairly sure I need one) but that
> didn't work either. I don't imagine that snippet is very helpful,
> sorry!

It is helpful because it tells us what your actual data looks like.

What you need is to get a list of (predictor, score)-pairs. These you should
be able to get like this:

l = [(p.predictor.id, p.predictionscore) for p in predictions]

Now you need to sort this list - because in the next step, we will aggregate
the values for each predictor.

result = []
current_predictor = None
total_sum = 0
for predictor, score in l:
if predictor != current_predictor:
# only if we really have a current_predictor,
# the non-existent first one doesn't count
if current_predictor is not None:
result.append((predictor, total_sum))
total_sum = 0
current_predictor = predictor
total_sum += score

That should be roughly it.

Diez

Diez B. Roggisch, Jun 12, 2008
8. ### MarkGuest

John, it's a QuerySet coming from a database in Django. I don't know
enough about the structure of this object to go into detail I'm
afraid.

Aidan, I got an error trying your suggestion: 'zip argument #2 must
support iteration', I don't know what this means!

Thanks to all who have answered! Sorry I'm not being very specific!

Mark, Jun 12, 2008
9. ### AidanGuest

Mark wrote:
> John, it's a QuerySet coming from a database in Django. I don't know
> enough about the structure of this object to go into detail I'm
> afraid.
>
> Aidan, I got an error trying your suggestion: 'zip argument #2 must
> support iteration', I don't know what this means!

well, if we can create 2 iterable sequences one which contains the user
the other the scores, it should work

the error means that the second argument to the zip function was not an
iterable, such as a list tuple or string

can you show me the lines you're using to retrieve the data sets from
the database? then i might be able to tell you how to build the 2 lists
you need.

> Thanks to all who have answered! Sorry I'm not being very specific!

Aidan, Jun 12, 2008
10. ### AidanGuest

Aidan wrote:
> Mark wrote:
>> John, it's a QuerySet coming from a database in Django. I don't know
>> enough about the structure of this object to go into detail I'm
>> afraid.
>>
>> Aidan, I got an error trying your suggestion: 'zip argument #2 must
>> support iteration', I don't know what this means!

>
> well, if we can create 2 iterable sequences one which contains the user
> the other the scores, it should work
>
> the error means that the second argument to the zip function was not an
> iterable, such as a list tuple or string
>
> can you show me the lines you're using to retrieve the data sets from
> the database? then i might be able to tell you how to build the 2 lists
> you need.
>

predictions = Prediction.objects.all()
pairs = [(p.predictor.id,p.predictionscore) for p in predictions]

those 2 lines will will build a list of user/score pairs. you can then
replace the call to zip with pairs

any luck?

Aidan, Jun 12, 2008
11. ### Gerhard HäringGuest

Mark wrote:
> John, it's a QuerySet coming from a database in Django. I don't know
> enough about the structure of this object to go into detail I'm
> afraid. [...]

Then let the database do the summing up. That's what it's there for

select user, sum(score) from score_table
group by user

or something very similar, depending on the actual database schema. I
don't know how to do this with Django's ORM, but is the way to do it in
plain SQL.

-- Gerhard

Gerhard Häring, Jun 12, 2008
12. ### Gerhard HäringGuest

Counting things fast - was Re: Summing a 2D list

Aidan wrote:
> does this work for you?
>
> users = [1,1,1,2,2,3,4,4,4]
> score = [0,1,5,3,1,2,3,3,2]
>
> d = dict()
>
> for u,s in zip(users,score):
> if d.has_key(u):
> d += s
> else:
> d = s
>
> for key in d.keys():
> print 'user: %d\nscore: %d\n' % (key,d[key])

I've recently had the very same problem and needed to optimize for the
best solution. I've tried quite a few, including:

1) using a dictionary with a default value

d = collections.defaultdict(lambda: 0)
d[key] += value

2) Trying out if avoiding object allocation is worth the effort. Using
Cython:

cdef class Counter:
cdef int _counter
def __init__(self):
self._counter = 0

def inc(self):
self._counter += 1

def __int__(self):
return self._counter

self._counter += 1
return self

And no, this was *not* faster than the final solution. This counter
class, which is basically a mutable int, is exactly as fast as just
using this one (final solution) - tada!

counter = {}
try:
counter[key] += 1
except KeyError:
counter[key] = 1

Using psyco makes this a bit faster still. psyco can't optimize
defaultdict or my custom Counter class, though.

-- Gerhard

Gerhard Häring, Jun 12, 2008
13. ### MarkGuest

On Jun 12, 3:45 pm, Aidan <> wrote:
> Aidan wrote:
> > Mark wrote:
> >> John, it's a QuerySet coming from a database in Django. I don't know
> >> enough about the structure of this object to go into detail I'm
> >> afraid.

>
> >> Aidan, I got an error trying your suggestion: 'zip argument #2 must
> >> support iteration', I don't know what this means!

>
> > well, if we can create 2 iterable sequences one which contains the user
> > the other the scores, it should work

>
> > the error means that the second argument to the zip function was not an
> > iterable, such as a list tuple or string

>
> > can you show me the lines you're using to retrieve the data sets from
> > the database? then i might be able to tell you how to build the 2 lists
> > you need.

>
>
> predictions = Prediction.objects.all()
> pairs = [(p.predictor.id,p.predictionscore) for p in predictions]
>
> those 2 lines will will build a list of user/score pairs.  you can then
> replace the call to zip with pairs
>
> any luck?

Thanks Aidan, this works great!

Thanks also to everyone else, I'm sure your suggestions would have
worked too if I'd been competent enough to do them properly!

Mark, Jun 12, 2008

Re: Counting things fast - was Re: Summing a 2D list

On Jun 12, 4:14 pm, Gerhard Häring <> wrote:
> Aidan wrote:
> > does this work for you?

>
> > users = [1,1,1,2,2,3,4,4,4]
> > score = [0,1,5,3,1,2,3,3,2]

>
> > d = dict()

>
> > for u,s in zip(users,score):
> > if d.has_key(u):
> > d += s
> > else:
> > d = s

>
> > for key in d.keys():
> > print 'user: %d\nscore: %d\n' % (key,d[key])

>
> I've recently had the very same problem and needed to optimize for the
> best solution. I've tried quite a few, including:
>
> 1) using a dictionary with a default value
>
> d = collections.defaultdict(lambda: 0)
> d[key] += value
>

<<SNIP>>
> -- Gerhard

This might be faster, by avoiding the lambda:

d = collections.defaultdict(int)
d[key] += value

15. ### Karsten HeymannGuest

Hi Mark,

Mark <> writes:
> I have a scenario where I have a list like this:
>
> User Score
> 1 0
> 1 1
> 1 5
> 2 3
> 2 1
> 3 2
> 4 3
> 4 3
> 4 2
>
> And I need to add up the score for each user to get something like
> this:
>
> User Score
> 1 6
> 2 4
> 3 2
> 4 8
>
> Is this possible? If so, how can I do it? I've tried looping through
> the arrays and not had much luck so far.

Although your problem has already been solved, I'd like to present a
different approach which can be quite a bit faster. The most common
approach seems to be using a dictionary:

summed_up={}
for user,vote in pairs:
if summed_up.has_key(user):
summed_up[user]+=vote
else:
summed_up[user]=vote

But if the list of users is compact and the maximum value is known
before, the using a list and coding the user into the list position is
much more elegant:

summed_up=list( (0,) * max_user )
for user,vote in pairs:
summed_up[user] += vote

I've run a quick and dirty test on these approaches and found that the
latter takes only half the time than the first. More precisely, with
about 2 million pairs, i got:

* dict approach: 2s
(4s with "try: ... except KeyError:" instead of the "if")
* list approach: 0.9s

BTW this was inspired by the book "Programming Pearls" I read some
years ago where a similar approach saved some magnitudes of time
(using a bit field instead of a list to store reserved/free phone
numbers IIRC).

Yours,
Karsten

Karsten Heymann, Jun 13, 2008
16. ### BJörn LindqvistGuest

On Fri, Jun 13, 2008 at 2:12 PM, Karsten Heymann
<> wrote:
> Although your problem has already been solved, I'd like to present a
> different approach which can be quite a bit faster. The most common
> approach seems to be using a dictionary:
>
> summed_up={}
> for user,vote in pairs:
> if summed_up.has_key(user):
> summed_up[user]+=vote
> else:
> summed_up[user]=vote

You'll save even more by using:

if user in summed_up:

--
mvh Björn

BJörn Lindqvist, Jun 13, 2008
17. ### BJörn LindqvistGuest

On Thu, Jun 12, 2008 at 3:48 PM, Mark <> wrote:
> Hi all,
>
> I have a scenario where I have a list like this:
>
> User Score
> 1 0
> 1 1
> 1 5
> 2 3
> 2 1
> 3 2
> 4 3
> 4 3
> 4 2
>
> And I need to add up the score for each user to get something like
> this:
>
> User Score
> 1 6
> 2 4
> 3 2
> 4 8
>
> Is this possible? If so, how can I do it? I've tried looping through
> the arrays and not had much luck so far.

Here is another solution:

from itertools import groupby
from operator import itemgetter

users = [1, 1, 1, 2, 2, 3, 4, 4, 4]
scores = [0, 1, 5, 3, 1, 2, 3, 3, 2]

for u, s in groupby(zip(users, scores), itemgetter(0)):
print u, sum(y for x, y in s)

You probably should reconsider how you can store your data in a more
efficient format.

--
mvh Björn

BJörn Lindqvist, Jun 13, 2008
18. ### Gerhard HäringGuest

BJörn Lindqvist wrote:
> [...]
> Here is another solution:
>
> from itertools import groupby
> from operator import itemgetter
>
> users = [1, 1, 1, 2, 2, 3, 4, 4, 4]
> scores = [0, 1, 5, 3, 1, 2, 3, 3, 2]
>
> for u, s in groupby(zip(users, scores), itemgetter(0)):
> print u, sum(y for x, y in s)

Except that this won't work unless users and scores are sorted by user
first. groupby() only coalesces identical values, and doesn't do what a
"GROUP BY" clause in SQL is doing.

Adding a sorted() call around zip() should be enough to make groupby()
actually useful.

But then, this code definitely starts to look like somebody desperately
wanted to find a use for Python's new gimmicks.

Here's more of the same sort ;-)

>>> import sqlite3
>>> sqlite3.connect(":memory:").execute("create table tmp(user,

score)").executemany("insert into tmp(user, score) values (?, ?)",
zip(users, scores)).execute("select user, sum(score) from tmp group by
user").fetchall()
[(1, 6), (2, 4), (3, 2), (4, 8)]

-- Gerhard

Gerhard Häring, Jun 13, 2008
19. ### Karsten HeymannGuest

Hi BjÃ¶rn,

"BJÃ¶rn Lindqvist" <> writes:
> On Fri, Jun 13, 2008 at 2:12 PM, Karsten Heymann
> <> wrote:
>> summed_up={}
>> for user,vote in pairs:
>> if summed_up.has_key(user):
>> summed_up[user]+=vote
>> else:
>> summed_up[user]=vote

>
> You'll save even more by using:
>
> if user in summed_up:
>

You're right, then it goes down to 1.5s (compared to 0.9 for the pure
list version). Pythons dictionaries are really great

Yours
Karsten

Karsten Heymann, Jun 13, 2008

On Jun 13, 1:12 pm, Karsten Heymann <>
wrote:
> Hi Mark,
>
>
>
> Mark <> writes:
> > I have a scenario where I have a list like this:

>
> > User Score
> > 1 0
> > 1 1
> > 1 5
> > 2 3
> > 2 1
> > 3 2
> > 4 3
> > 4 3
> > 4 2

>
> > And I need to add up the score for each user to get something like
> > this:

>
> > User Score
> > 1 6
> > 2 4
> > 3 2
> > 4 8

>
> > Is this possible? If so, how can I do it? I've tried looping through
> > the arrays and not had much luck so far.

>
> Although your problem has already been solved, I'd like to present a
> different approach which can be quite a bit faster. The most common
> approach seems to be using a dictionary:
>
> summed_up={}
> for user,vote in pairs:
> if summed_up.has_key(user):
> summed_up[user]+=vote
> else:
> summed_up[user]=vote
>
> But if the list of users is compact and the maximum value is known
> before, the using a list and coding the user into the list position is
> much more elegant:
>
> summed_up=list( (0,) * max_user )
> for user,vote in pairs:
> summed_up[user] += vote
>
> I've run a quick and dirty test on these approaches and found that the
> latter takes only half the time than the first. More precisely, with
> about 2 million pairs, i got:
>
> * dict approach: 2s
> (4s with "try: ... except KeyError:" instead of the "if")
> * list approach: 0.9s
>
> BTW this was inspired by the book "Programming Pearls" I read some
> years ago where a similar approach saved some magnitudes of time
> (using a bit field instead of a list to store reserved/free phone
> numbers IIRC).
>
> Yours,
> Karsten

How does your solution fare against the defaultdict solution of:

d = collections.defaultdict(int)
for u,s in zip(users,score): d += s