Summing a 2D list

Maric Michaud · Jun 13, 2008

Le Friday 13 June 2008 14:12:40 Karsten Heymann, vous avez écrit :

Mark said:
Hi Mark,

Mark said:

I have a scenario where I have a list like this:

User Score
1 0
1 1
1 5
2 3
2 1
3 2
4 3
4 3
4 2

And I need to add up the score for each user to get something like
this:

User Score
1 6
2 4
3 2
4 8

Is this possible? If so, how can I do it? I've tried looping through
the arrays and not had much luck so far.

Click to expand...

Although your problem has already been solved, I'd like to present a
different approach which can be quite a bit faster. The most common
approach seems to be using a dictionary:

summed_up={}
for user,vote in pairs:
if summed_up.has_key(user):
summed_up[user]+=vote
else:
summed_up[user]=vote

But if the list of users is compact and the maximum value is known
before, the using a list and coding the user into the list position is
much more elegant:

So, writing C in python, which has dictionnary as builtin type, should be
considered "more elegant" ?

summed_up=list( (0,) * max_user )
for user,vote in pairs:
summed_up[user] += vote

I've run a quick and dirty test on these approaches and found that the
latter takes only half the time than the first. More precisely, with
about 2 million pairs, i got:

* dict approach: 2s
(4s with "try: ... except KeyError:" instead of the "if")
* list approach: 0.9s

You are comparing apples with lemons, there is no such a difference between
list index access and dictionnary key access in Python.

BTW this was inspired by the book "Programming Pearls" I read some
years ago where a similar approach saved some magnitudes of time
(using a bit field instead of a list to store reserved/free phone
numbers IIRC).

If you know in advance the number and names of users, what prevent you to
initialize completelly the target dictionnary ?

The following code compare the same algorithm, once with list and the second
time with dict :

#!/usr/bin/env python

def do(f, u, v) :
from time import time
n=time()
f(u, v)
return time() -n

def c_dict(users, votes) :
d = dict(((e, 0) for e in users))
for u, v in votes : d += v
return d.values()

def c_list(users, votes) :
d = [ 0 for i in users ]
for u, v in votes : d += v
return d

u = range(3000)

import random

v = list((u[r%3000], random.randint(0,10000)) for r in range(5*10**6))

print "with list", do(c_list, u, v)
print "with dict", do(c_dict, u, v)

The result is pretty close now :

maric@redflag1 17:04:36:~$ ./test.py
with list 1.40726399422
with dict 1.63094091415

So why use list where the obvious and natural data structure is a
dictionnary ?

Karsten Heymann · Jun 13, 2008

Paddy said:
How does your solution fare against the defaultdict solution of:

d = collections.defaultdict(int)
for u,s in zip(users,score): d += s

list: 0.931s
dict + "in": 1.495s
defaultdict : 1.991s
dict + "if": ~2s
dict + "try": ~4s

I've posted the (very rough) code to dpaste:

http://dpaste.com/hold/56468/

Yours
Karsten

Karsten Heymann · Jun 13, 2008

Hi Maric,

Maric Michaud said:
So, writing C in python, which has dictionnary as builtin type,
should be considered "more elegant" ?

IMO that's a bit harsh.

You are comparing apples with lemons, there is no such a difference
between list index access and dictionnary key access in Python. [...]
If you know in advance the number and names of users, what prevent
you to initialize completelly the target dictionnary ?

The following code compare the same algorithm, once with list and
the second time with dict : [...]
The result is pretty close now :

maric@redflag1 17:04:36:~$ ./test.py
with list 1.40726399422
with dict 1.63094091415

So why use list where the obvious and natural data structure is a
dictionnary ?

I'd never argue that using a dictionary is the obvious and natural
data structure for this case. But is it the best? Honestly, as your
very nice example shows, we have two solutions that are equally fast,
equally complex to code and equally robust, but one needs
approximately the double amount of memory compared to the other. So,
as much as i like dictionaries, what's the gain you get from using it
in this corner case?

Yours,
Karsten

Maric Michaud · Jun 13, 2008

Hello,

Le Friday 13 June 2008 17:55:44 Karsten Heymann, vous avez écrit :

IMO that's a bit harsh.

harsh ? Sorry, I'm not sure to understand.

You are comparing apples with lemons, there is no such a difference
between list index access and dictionnary key access in Python.
[...]

If you know in advance the number and names of users, what prevent
you to initialize completelly the target dictionnary ?

The following code compare the same algorithm, once with list and
the second time with dict :
[...]

The result is pretty close now :

maric@redflag1 17:04:36:~$ ./test.py
with list 1.40726399422
with dict 1.63094091415

So why use list where the obvious and natural data structure is a
dictionnary ?

Click to expand...

I'd never argue that using a dictionary is the obvious and natural
data structure for this case. But is it the best? Honestly, as your
very nice example shows, we have two solutions that are equally fast,
equally complex to code and equally robust, but one needs

Yes, but my example take ordered integer for keys (users' names) which they
should not be in a real case, so retrieving the result is by way easier (and
faster) with a dictionnary.

approximately the double amount of memory compared to the other.

I don't see how you came to this conclusion. Are you sure the extra list take
twice more memory than the extra dictionary ?

So, as much as i like dictionaries, what's the gain you get from using it
in this corner case?

It's the very purpose of it's usage, store and retrieve data by key.

Cheers,

Maric Michaud · Jun 13, 2008

Le Friday 13 June 2008 18:55:24 Maric Michaud, vous avez écrit :

I don't see how you came to this conclusion. Are you sure the extra list
take twice more memory than the extra dictionary ?

twice less, I meant, of course...

Karsten Heymann · Jun 14, 2008

Maric Michaud said:
Le Friday 13 June 2008 17:55:44 Karsten Heymann, vous avez Ã©critÂ :

harsh ? Sorry, I'm not sure to understand.

Never mind, I got carried away.

Yes, but my example take ordered integer for keys (users' names)
which they should not be in a real case, so retrieving the result is
by way easier (and faster) with a dictionnary.

Of course. As I wrote in my first post, my proposal dependeds upon the
users being a numerical and compact list with a known maximum, as in
the OP's example. Otherwise my approach makes no sense at all

I don't see how you came to this conclusion. Are you sure the extra
list take twice more memory than the extra dictionary ?

I'm referring to the resulting data structure. A list with n items
should take approx. half the space of a dictionary with n keys and n
entries. Or, the other way round, if i have a dictionary with keys
1..n, why shouldn't I use a list instead.

Yours,
Karsten

sturlamolden · Jun 14, 2008

Is this possible?

def foobar(user,score):
sums = {}
for u,s in zip(user,score):
try:
sums += s
except KeyError:
sums = s
return [(u, sums) for u in sums].sort()

usersum = foobar(user,score)
for u,s in usersum:
print "%d %d" % (u,s)

MRAB · Jun 14, 2008

Is this possible?

Click to expand...

def foobar(user,score):
sums = {}
for u,s in zip(user,score):
try:
sums += s
except KeyError:
sums = s
return [(u, sums) for u in sums].sort()

sort() sorts the list in-place and returns None. Try this instead:

return sorted([(u, sums) for u in sums])

or, better yet:

return sorted(sums.items())

C++, SFML, SDL2 - 2D game	0	Mar 19, 2023
Range / empty list issues??	1	Dec 11, 2023
Reading File Into 2D List	2	Jul 9, 2013
Strange behavior for a 2D list	1	Apr 18, 2013
groupby - summing multiple columns in a list of lists	1	May 17, 2011
Inefficient summing	10	Oct 8, 2008
Average of MultiMode of a list of a list	1	Oct 28, 2022
Copy string from 2D array to a 1D array in C	1	Nov 1, 2023

Summing a 2D list

Maric Michaud

Karsten Heymann

Karsten Heymann

Maric Michaud

Maric Michaud

Karsten Heymann

sturlamolden

MRAB

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads