Summing a 2D list

M

Mark

Hi all,

I have a scenario where I have a list like this:

User Score
1 0
1 1
1 5
2 3
2 1
3 2
4 3
4 3
4 2

And I need to add up the score for each user to get something like
this:

User Score
1 6
2 4
3 2
4 8

Is this possible? If so, how can I do it? I've tried looping through
the arrays and not had much luck so far.

Any help much appreciated,

Mark
 
D

Diez B. Roggisch

Mark said:
Hi all,

I have a scenario where I have a list like this:

User Score
1 0
1 1
1 5
2 3
2 1
3 2
4 3
4 3
4 2

And I need to add up the score for each user to get something like
this:

User Score
1 6
2 4
3 2
4 8

Is this possible? If so, how can I do it? I've tried looping through
the arrays and not had much luck so far.

Any help much appreciated,

Show us your efforts in code so far. Especially what the actual data looks
like. Then we can suggest a solution.

Diez
 
C

Chris

Hi all,

I have a scenario where I have a list like this:

User            Score
1                 0
1                 1
1                 5
2                 3
2                 1
3                 2
4                 3
4                 3
4                 2

And I need to add up the score for each user to get something like
this:

User            Score
1                 6
2                 4
3                 2
4                 8

Is this possible? If so, how can I do it? I've tried looping through
the arrays and not had much luck so far.

Any help much appreciated,

Mark

user_score = {}
for record in list:
user, score = record.split()
if user in user_score: user_score[user] += score
else: user_score[user] = score

print '\n'.join(['%s\t%s' % (user, score) for user,score in
sorted(user_score.items())])

You don't mention what data structure you are keeping your records in
but hopefully this helps you in the right direction.
 
M

Mark

Show us your efforts in code so far. Especially what the actual data looks
like. Then we can suggest a solution.

Diez

Hi Diez, thanks for the quick reply.

To be honest I'm relatively new to Python, so I don't know too much
about how all the loop constructs work and how they differ to other
languages. I'm building an app in Django and this data is coming out
of a database and it looks like what I put up there!

This was my (failed) attempt:

predictions = Prediction.objects.all()
scores = []
for prediction in predictions:
i = [prediction.predictor.id, 0]
if prediction.predictionscore:
i[1] += int(prediction.predictionscore)
scores.append(i)

I did have another loop in there (I'm fairly sure I need one) but that
didn't work either. I don't imagine that snippet is very helpful,
sorry!

Any tips would be gratefully recieved!

Thanks,

Mark
 
A

Aidan

Mark said:
Hi all,

I have a scenario where I have a list like this:

User Score
1 0
1 1
1 5
2 3
2 1
3 2
4 3
4 3
4 2

And I need to add up the score for each user to get something like
this:

User Score
1 6
2 4
3 2
4 8

Is this possible? If so, how can I do it? I've tried looping through
the arrays and not had much luck so far.

Any help much appreciated,

Mark


does this work for you?


users = [1,1,1,2,2,3,4,4,4]
score = [0,1,5,3,1,2,3,3,2]

d = dict()

for u,s in zip(users,score):
if d.has_key(u):
d += s
else:
d = s

for key in d.keys():
print 'user: %d\nscore: %d\n' % (key,d[key])
 
J

John Salerno

Mark wrote:

---
This was my (failed) attempt:

predictions = Prediction.objects.all()
scores = []
for prediction in predictions:
i = [prediction.predictor.id, 0]
if prediction.predictionscore:
i[1] += int(prediction.predictionscore)
scores.append(i)
---

Your question sounds like a fun little project, but can you post what the
actual list of users/scores looks like? Is it a list of tuples like this:

[(1, 0), (1, 1) ... ]

Or something else?
 
D

Diez B. Roggisch

To be honest I'm relatively new to Python, so I don't know too much
about how all the loop constructs work and how they differ to other
languages. I'm building an app in Django and this data is coming out
of a database and it looks like what I put up there!

This was my (failed) attempt:

predictions = Prediction.objects.all()
scores = []
for prediction in predictions:
i = [prediction.predictor.id, 0]
if prediction.predictionscore:
i[1] += int(prediction.predictionscore)
scores.append(i)

I did have another loop in there (I'm fairly sure I need one) but that
didn't work either. I don't imagine that snippet is very helpful,
sorry!

It is helpful because it tells us what your actual data looks like.

What you need is to get a list of (predictor, score)-pairs. These you should
be able to get like this:

l = [(p.predictor.id, p.predictionscore) for p in predictions]

Now you need to sort this list - because in the next step, we will aggregate
the values for each predictor.

result = []
current_predictor = None
total_sum = 0
for predictor, score in l:
if predictor != current_predictor:
# only if we really have a current_predictor,
# the non-existent first one doesn't count
if current_predictor is not None:
result.append((predictor, total_sum))
total_sum = 0
current_predictor = predictor
total_sum += score

That should be roughly it.

Diez
 
M

Mark

John, it's a QuerySet coming from a database in Django. I don't know
enough about the structure of this object to go into detail I'm
afraid.

Aidan, I got an error trying your suggestion: 'zip argument #2 must
support iteration', I don't know what this means!

Thanks to all who have answered! Sorry I'm not being very specific!
 
A

Aidan

Mark said:
John, it's a QuerySet coming from a database in Django. I don't know
enough about the structure of this object to go into detail I'm
afraid.

Aidan, I got an error trying your suggestion: 'zip argument #2 must
support iteration', I don't know what this means!

well, if we can create 2 iterable sequences one which contains the user
the other the scores, it should work

the error means that the second argument to the zip function was not an
iterable, such as a list tuple or string

can you show me the lines you're using to retrieve the data sets from
the database? then i might be able to tell you how to build the 2 lists
you need.
 
A

Aidan

Aidan said:
well, if we can create 2 iterable sequences one which contains the user
the other the scores, it should work

the error means that the second argument to the zip function was not an
iterable, such as a list tuple or string

can you show me the lines you're using to retrieve the data sets from
the database? then i might be able to tell you how to build the 2 lists
you need.

wait you already did...

predictions = Prediction.objects.all()
pairs = [(p.predictor.id,p.predictionscore) for p in predictions]

those 2 lines will will build a list of user/score pairs. you can then
replace the call to zip with pairs

any luck?
 
G

Gerhard Häring

Mark said:
John, it's a QuerySet coming from a database in Django. I don't know
enough about the structure of this object to go into detail I'm
afraid. [...]

Then let the database do the summing up. That's what it's there for :)

select user, sum(score) from score_table
group by user

or something very similar, depending on the actual database schema. I
don't know how to do this with Django's ORM, but is the way to do it in
plain SQL.

-- Gerhard
 
G

Gerhard Häring

Aidan said:
does this work for you?

users = [1,1,1,2,2,3,4,4,4]
score = [0,1,5,3,1,2,3,3,2]

d = dict()

for u,s in zip(users,score):
if d.has_key(u):
d += s
else:
d = s

for key in d.keys():
print 'user: %d\nscore: %d\n' % (key,d[key])


I've recently had the very same problem and needed to optimize for the
best solution. I've tried quite a few, including:

1) using a dictionary with a default value

d = collections.defaultdict(lambda: 0)
d[key] += value

2) Trying out if avoiding object allocation is worth the effort. Using
Cython:

cdef class Counter:
cdef int _counter
def __init__(self):
self._counter = 0

def inc(self):
self._counter += 1

def __int__(self):
return self._counter

def __iadd__(self, operand):
self._counter += 1
return self

And no, this was *not* faster than the final solution. This counter
class, which is basically a mutable int, is exactly as fast as just
using this one (final solution) - tada!

counter = {}
try:
counter[key] += 1
except KeyError:
counter[key] = 1

Using psyco makes this a bit faster still. psyco can't optimize
defaultdict or my custom Counter class, though.

-- Gerhard
 
M

Mark

well, if we can create 2 iterable sequences one which contains the user
the other the scores, it should work
the error means that the second argument to the zip function was not an
iterable, such as a list tuple or string
can you show me the lines you're using to retrieve the data sets from
the database? then i might be able to tell you how to build the 2 lists
you need.

wait you already did...

predictions = Prediction.objects.all()
pairs = [(p.predictor.id,p.predictionscore) for p in predictions]

those 2 lines will will build a list of user/score pairs.  you can then
replace the call to zip with pairs

any luck?

Thanks Aidan, this works great!

Thanks also to everyone else, I'm sure your suggestions would have
worked too if I'd been competent enough to do them properly!
 
P

Paddy

Aidan said:
does this work for you?
users = [1,1,1,2,2,3,4,4,4]
score = [0,1,5,3,1,2,3,3,2]
d = dict()
for u,s in zip(users,score):
if d.has_key(u):
d += s
else:
d = s

for key in d.keys():
print 'user: %d\nscore: %d\n' % (key,d[key])

I've recently had the very same problem and needed to optimize for the
best solution. I've tried quite a few, including:

1) using a dictionary with a default value

d = collections.defaultdict(lambda: 0)
d[key] += value

-- Gerhard

This might be faster, by avoiding the lambda:

d = collections.defaultdict(int)
d[key] += value

- Paddy.
 
K

Karsten Heymann

Hi Mark,

Mark said:
I have a scenario where I have a list like this:

User Score
1 0
1 1
1 5
2 3
2 1
3 2
4 3
4 3
4 2

And I need to add up the score for each user to get something like
this:

User Score
1 6
2 4
3 2
4 8

Is this possible? If so, how can I do it? I've tried looping through
the arrays and not had much luck so far.

Although your problem has already been solved, I'd like to present a
different approach which can be quite a bit faster. The most common
approach seems to be using a dictionary:

summed_up={}
for user,vote in pairs:
if summed_up.has_key(user):
summed_up[user]+=vote
else:
summed_up[user]=vote

But if the list of users is compact and the maximum value is known
before, the using a list and coding the user into the list position is
much more elegant:

summed_up=list( (0,) * max_user )
for user,vote in pairs:
summed_up[user] += vote

I've run a quick and dirty test on these approaches and found that the
latter takes only half the time than the first. More precisely, with
about 2 million pairs, i got:

* dict approach: 2s
(4s with "try: ... except KeyError:" instead of the "if")
* list approach: 0.9s

BTW this was inspired by the book "Programming Pearls" I read some
years ago where a similar approach saved some magnitudes of time
(using a bit field instead of a list to store reserved/free phone
numbers IIRC).

Yours,
Karsten
 
B

BJörn Lindqvist

Although your problem has already been solved, I'd like to present a
different approach which can be quite a bit faster. The most common
approach seems to be using a dictionary:

summed_up={}
for user,vote in pairs:
if summed_up.has_key(user):
summed_up[user]+=vote
else:
summed_up[user]=vote

You'll save even more by using:

if user in summed_up:

instead of has_key.
 
B

BJörn Lindqvist

Hi all,

I have a scenario where I have a list like this:

User Score
1 0
1 1
1 5
2 3
2 1
3 2
4 3
4 3
4 2

And I need to add up the score for each user to get something like
this:

User Score
1 6
2 4
3 2
4 8

Is this possible? If so, how can I do it? I've tried looping through
the arrays and not had much luck so far.

Here is another solution:

from itertools import groupby
from operator import itemgetter

users = [1, 1, 1, 2, 2, 3, 4, 4, 4]
scores = [0, 1, 5, 3, 1, 2, 3, 3, 2]

for u, s in groupby(zip(users, scores), itemgetter(0)):
print u, sum(y for x, y in s)

You probably should reconsider how you can store your data in a more
efficient format.
 
G

Gerhard Häring

BJörn Lindqvist said:
[...]
Here is another solution:

from itertools import groupby
from operator import itemgetter

users = [1, 1, 1, 2, 2, 3, 4, 4, 4]
scores = [0, 1, 5, 3, 1, 2, 3, 3, 2]

for u, s in groupby(zip(users, scores), itemgetter(0)):
print u, sum(y for x, y in s)

Except that this won't work unless users and scores are sorted by user
first. groupby() only coalesces identical values, and doesn't do what a
"GROUP BY" clause in SQL is doing.

Adding a sorted() call around zip() should be enough to make groupby()
actually useful.

But then, this code definitely starts to look like somebody desperately
wanted to find a use for Python's new gimmicks.

Here's more of the same sort ;-)
score)").executemany("insert into tmp(user, score) values (?, ?)",
zip(users, scores)).execute("select user, sum(score) from tmp group by
user").fetchall()
[(1, 6), (2, 4), (3, 2), (4, 8)]

-- Gerhard
 
K

Karsten Heymann

Hi Björn,

BJörn Lindqvist said:
summed_up={}
for user,vote in pairs:
if summed_up.has_key(user):
summed_up[user]+=vote
else:
summed_up[user]=vote

You'll save even more by using:

if user in summed_up:

instead of has_key.

You're right, then it goes down to 1.5s (compared to 0.9 for the pure
list version). Pythons dictionaries are really great :)

Yours
Karsten
 
P

Paddy

Hi Mark,



Mark said:
I have a scenario where I have a list like this:
User Score
1 0
1 1
1 5
2 3
2 1
3 2
4 3
4 3
4 2
And I need to add up the score for each user to get something like
this:
User Score
1 6
2 4
3 2
4 8
Is this possible? If so, how can I do it? I've tried looping through
the arrays and not had much luck so far.

Although your problem has already been solved, I'd like to present a
different approach which can be quite a bit faster. The most common
approach seems to be using a dictionary:

summed_up={}
for user,vote in pairs:
if summed_up.has_key(user):
summed_up[user]+=vote
else:
summed_up[user]=vote

But if the list of users is compact and the maximum value is known
before, the using a list and coding the user into the list position is
much more elegant:

summed_up=list( (0,) * max_user )
for user,vote in pairs:
summed_up[user] += vote

I've run a quick and dirty test on these approaches and found that the
latter takes only half the time than the first. More precisely, with
about 2 million pairs, i got:

* dict approach: 2s
(4s with "try: ... except KeyError:" instead of the "if")
* list approach: 0.9s

BTW this was inspired by the book "Programming Pearls" I read some
years ago where a similar approach saved some magnitudes of time
(using a bit field instead of a list to store reserved/free phone
numbers IIRC).

Yours,
Karsten

How does your solution fare against the defaultdict solution of:

d = collections.defaultdict(int)
for u,s in zip(users,score): d += s


- Paddy
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,755
Messages
2,569,536
Members
45,020
Latest member
GenesisGai

Latest Threads

Top