groupby - summing multiple columns in a list of lists

Jackson

I'm currently using the function pasted below. It lets me sum one column
(by index) of a list of lists, grouped on the key column(s).

So if mylist = [[1, 2, 3], [1, 3, 4], [2, 3, 4], [2, 4, 5]], then
group_results(mylist, [0], 1)

returns:
[(1, 5), (2, 7)]

What I would like to do is allow a tuple/list of index values to be
summed, rather than a single index value, so that
group_results(mylist, [0], [1, 2]) would return [(1, 5, 7), (2, 7, 9)].
I'm struggling to do this; any thoughts? Cheers

from itertools import groupby as gb
from operator import itemgetter as ig

def group_results(table, keys, value):
    res = []
    nkey = ig(*keys)
    value = ig(value)
    for k, group in gb(sorted(table, key=nkey), nkey):
        res.append((k, sum(value(row) for row in group)))
    return res
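A minimal sketch of the requested change that keeps the structure of the function above. Here `values` (renamed from `value`) is assumed to be a list of column indices, and each column is summed separately within every group. Because `operator.itemgetter` returns a bare value for a single key index, the key is wrapped in a tuple in that case so it can be concatenated with the sums.

```python
from itertools import groupby
from operator import itemgetter


def group_results(table, keys, values):
    """Group `table` by the columns in `keys` and sum each column in `values`.

    Sketch only: `values` is assumed to be a list of column indices.
    """
    get_key = itemgetter(*keys)
    res = []
    # groupby only merges adjacent rows, so sort by the same key first
    for k, group in groupby(sorted(table, key=get_key), get_key):
        rows = list(group)  # materialise the group so it can be scanned once per column
        totals = tuple(sum(row[i] for row in rows) for i in values)
        # itemgetter with one index returns a bare value, with several a tuple
        key_part = (k,) if len(keys) == 1 else k
        res.append(key_part + totals)
    return res


mylist = [[1, 2, 3], [1, 3, 4], [2, 3, 4], [2, 4, 5]]
print(group_results(mylist, [0], [1, 2]))  # [(1, 5, 7), (2, 7, 9)]
```

Passing `[1]` as `values` reproduces the original single-column result, `[(1, 5), (2, 7)]`.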
 
Peter Otten

You could write a version of sum() that can cope with tuples:

from itertools import groupby

def itemgetter(keys, rowtype=tuple):
    # Unlike operator.itemgetter, always returns a rowtype (tuple by
    # default), even when only one index is given.
    def getitem(value):
        return rowtype(value[key] for key in keys)
    return getitem

def sum_all(rows):
    # Column-wise sum of a non-empty iterable of equal-length rows.
    rows = iter(rows)
    sigma = next(rows)
    rowtype = type(sigma)
    sigma = list(sigma)
    for row in rows:
        for i, x in enumerate(row):
            sigma[i] += x
    return rowtype(sigma)

def group_results(table, key, value):
    get_key = itemgetter(key)
    get_value = itemgetter(value)
    # groupby only merges adjacent rows, so sort by the key first
    table = sorted(table, key=get_key)
    for keyvalue, group in groupby(table, get_key):
        yield keyvalue + sum_all(map(get_value, group))

but I'd probably use a dict-based approach:

def group_results(table, key, value):
    get_key = itemgetter(key)
    get_value = itemgetter(value)
    grouped = {}
    for row in table:
        row_key = get_key(row)
        row_value = get_value(row)
        if row_key in grouped:
            # add this row's values to the running totals, column by column
            grouped[row_key] = tuple(a + b for a, b in zip(grouped[row_key], row_value))
        else:
            grouped[row_key] = row_value
    return [k + v for k, v in sorted(grouped.items())]

if __name__ == "__main__":
    items = [(1, 2, 3), (1, 3, 4), (2, 3, 4), (2, 4, 5)]
    print(list(group_results(items, [0], [1, 2])))
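A variation on the dict-based approach, sketched here with `collections.defaultdict` so the "is the key already there?" branch disappears. The name `group_results_dd` is hypothetical, and the sketch assumes numeric value columns because every running total starts at 0 (the `sum_all()` version above starts from the first row instead, so it does not need that assumption).

```python
from collections import defaultdict


def group_results_dd(table, key, value):
    """Hypothetical defaultdict variant of the dict-based approach.

    `key` and `value` are lists of column indices; value columns are
    assumed to be numeric, since the totals start at 0.
    """
    # one running list of totals per distinct key, created on first use
    totals = defaultdict(lambda: [0] * len(value))
    for row in table:
        row_key = tuple(row[i] for i in key)
        acc = totals[row_key]
        # update the totals in place instead of building a new tuple per row
        for j, i in enumerate(value):
            acc[j] += row[i]
    return [k + tuple(v) for k, v in sorted(totals.items())]


items = [(1, 2, 3), (1, 3, 4), (2, 3, 4), (2, 4, 5)]
print(group_results_dd(items, [0], [1, 2]))  # [(1, 5, 7), (2, 7, 9)]
```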

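Note that the function built by my version of itemgetter() always
returns a tuple, even for a single index (operator.itemgetter(0) would
return the bare item), so the key and the sums can simply be joined with +.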
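A small demonstration of the difference described above. `tuple_getter` is a hypothetical name for a stripped-down copy of the reply's own `itemgetter()`, renamed so it does not shadow `operator.itemgetter`.

```python
from operator import itemgetter

row = [1, 2, 3]

# operator.itemgetter: one index -> bare item, several -> tuple
single = itemgetter(0)(row)
several = itemgetter(0, 1)(row)


def tuple_getter(keys):
    # hypothetical name; same idea as the itemgetter() defined in the reply
    def getitem(value):
        return tuple(value[key] for key in keys)
    return getitem


always_tuple = tuple_getter([0])(row)

print(single, several, always_tuple)  # 1 (1, 2) (1,)
```

Because `always_tuple` is `(1,)` rather than `1`, an expression like `always_tuple + (5, 7)` yields `(1, 5, 7)` directly, whereas `single + (5, 7)` would raise a `TypeError`.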
 
