head for grouped data - looking for best practice


Harald Massa

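An old, very old computing problem: I want to "print" grouped data with
header information, that is:

eingabe=[
("Stuttgart","70197","Fernsehturm","20"),
("Stuttgart","70197","Brotmuseum","123"),
("Stuttgart","70197","Porsche","123123"),
("Leipzig","01491","Messe","91822"),
("Leipzig","01491","Schabidu","9181231"),
]

should give (the braces are not important):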

'Stuttgart', '70197'
--data-- ('Fernsehturm', '20')
--data-- ('Brotmuseum', '123')
--data-- ('Porsche', '123123')
'Leipzig', '01491'
--data-- ('Messe', '91822')
--data-- ('Schabidu', '9181231')

My first approach was:

from itertools import groupby
from operator import itemgetter

for key, bereich in groupby(eingabe, itemgetter(0)):
    print "Area:", key
    headnotprinted = True  # reset the flag for every new group
    for data in bereich:
        if headnotprinted:
            print "additional head info", data[1]
            headnotprinted = False
        print "--data--", data[2:]

leading to:

Area: Stuttgart
additional head info 70197
--data-- ('Fernsehturm', '20')
--data-- ('Brotmuseum', '123')
--data-- ('Porsche', '123123')
Area: Leipzig
additional head info 01491
--data-- ('Messe', '91822')
--data-- ('Schabidu', '9181231')

which is quite what I expected. But ...

        if headnotprinted:
            print "additional head info", data[1]

REALLY looks patched, not programmed.

My second try was:

def getdoublekey(row):
    return row[0:2]

for key, bereich in groupby(eingabe, getdoublekey):
    print "Area:", key
    for data in bereich:
        print "--data--", data[2:]

which indeed leads to the expected result, while looking less "hacky".
On the other hand, that "getdoublekey" is not very flexible; when
doing the same with 3 columns forming the head information, I have to
define the next function...

return (row[1], row[0], ---yadda yadda yadda

so, what is the best recommended practice for this usual problem within


Diez B. Roggisch

Harald said:
which indeed leads to the expected result, while looking less "hacky".
On the other hand, that "getdoublekey" is not very flexible; when
doing the same with 3 columns forming the head information, I have to
define the next function...

Make getdoublekey something like this (untested):

def get_key(f=0, t=1):
    def _get_key(list_value):
        return list_value[f:t]
    return _get_key

Then use it like this:

for key, bereich in groupby(eingabe,get_key(t=key_size)):
    print "Area:",key
    for data in bereich:
        print "--data--", data[key_size:]
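
A runnable variant of this key-factory idea might look as follows. It is only a sketch, written for Python 3 (so print is a function, unlike in the rest of the thread); the helper name grouped and the choice to collect the groups into a list are assumptions made for illustration, not part of Diez's suggestion.

```python
from itertools import groupby

# Sample rows from the thread: (city, postal code, site, number).
eingabe = [
    ("Stuttgart", "70197", "Fernsehturm", "20"),
    ("Stuttgart", "70197", "Brotmuseum", "123"),
    ("Stuttgart", "70197", "Porsche", "123123"),
    ("Leipzig", "01491", "Messe", "91822"),
    ("Leipzig", "01491", "Schabidu", "9181231"),
]

def get_key(f=0, t=1):
    """Return a key function that extracts row[f:t]."""
    def _get_key(row):
        return row[f:t]
    return _get_key

def grouped(rows, key_size):
    """Hypothetical helper: return [(head, [data, ...]), ...]."""
    return [
        (key, [row[key_size:] for row in bereich])
        for key, bereich in groupby(rows, get_key(t=key_size))
    ]

result = grouped(eingabe, 2)
for key, data_rows in result:
    print("Area:", key)
    for data in data_rows:
        print("--data--", data)
```

Changing key_size from 2 to 3 is then enough to treat three columns as the head, with no new key function to write.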

Peter Otten

Harald said:
def getdoublekey(row):
    return row[0:2]

for key, bereich in groupby(eingabe, getdoublekey):
    print "Area:", key
    for data in bereich:
        print "--data--", data[2:]

which indeed leads to the expected result, while looking less "hacky".
On the other hand, that "getdoublekey" is not very flexible; when
doing the same with 3 columns forming the head information, I have to
define the next function...

Function creation is cheap and easily understood by someone reading your
code -- so you may already have the best solution. If Raymond Hettinger's
recent suggestion on python-dev makes it into Python 2.5,
itemgetter()/attrgetter() could grow support for the extraction of
multiple attributes/items.
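
For readers on a current interpreter: operator.itemgetter does accept several indexes (since Python 2.5) and then returns a tuple, so the head columns can be named directly. The following is a minimal sketch in Python 3 syntax; the variable names head, headers and swapped are illustrative only.

```python
from itertools import groupby
from operator import itemgetter

# Sample rows from the thread: (city, postal code, site, number).
eingabe = [
    ("Stuttgart", "70197", "Fernsehturm", "20"),
    ("Stuttgart", "70197", "Brotmuseum", "123"),
    ("Leipzig", "01491", "Messe", "91822"),
]

# With several indexes, itemgetter returns a tuple of the selected items.
head = itemgetter(0, 1)
headers = [key for key, _ in groupby(eingabe, head)]
print(headers)

# The columns need not be adjacent or in order, which covers the
# "return (row[1], row[0], ..." case from the question.
swapped = itemgetter(1, 0)
print(swapped(eingabe[0]))
```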

Anyway, here is a generalized getter factory that tries to handle all the
common cases in an intuitive way. E.g. you can create itemgetters using
the [] notation:

>>> extract[::3](range(5))
[0, 3]
>>> extract[3](range(5))
3
>>> extract[0,3,4](range(5))
(0, 3, 4)
>>> import os
>>> extract.path(os)
<module 'posixpath' from '/somewhere/posixpath.pyc'>


import itertools
import operator

def tuple_itemgetter(*keys):
    """Create a function that extracts a tuple of items from an
    indexable object.
    """
    # helper for extract
    getters = map(operator.itemgetter, keys)
    def get(obj):
        return tuple(get(obj) for get in getters)
    return get

def tuple_attrgetter(*names):
    """Create a function that extracts a tuple of attributes from an object.
    """
    # helper for extract
    getters = map(operator.attrgetter, names)
    def get(obj):
        return tuple(get(obj) for get in getters)
    return get

class extract(object):
    """Present unified access to the creation of
    attribute and item getters.
    """
    def __getitem__(self, index):
        if isinstance(index, tuple):
            return tuple_itemgetter(*index)
        return operator.itemgetter(index)
    def __getattribute__(self, name):
        return operator.attrgetter(name)
    def __call__(self, *names):
        return tuple_attrgetter(*names)

extract = extract() # we only ever need one instance

if __name__ == "__main__":
    # the demo is an anglo-german hotchpotch, really:
    class Site(object):
        def __init__(self, stadt, plz, name, nummer):
            self.stadt = stadt
            self.plz = plz
            self.name = name
            self.nummer = nummer
        def __str__(self):
            return "Site(stadt=%r, plz=%r, name=%r, nummer=%r)" % (
                self.stadt, self.plz, self.name, self.nummer)
        __repr__ = __str__

    def show(iterable, groupkey):
        print "-" * 20
        for group, items in itertools.groupby(iterable, groupkey):
            print group
            for item in items:
                print "\t", item

    # eingabe is the list of rows from Harald's question
    show(eingabe, extract[1])
    show(eingabe, extract[0, 1, 0:2])
    show(eingabe, extract[0:2])
    show((Site(*e) for e in eingabe), extract("stadt", "plz"))
    show((Site(*e) for e in eingabe), extract.stadt)

Steven Bethard

Harald said:
def getdoublekey(row):
    return row[0:2]

for key, bereich in groupby(eingabe, getdoublekey):
    print "Area:", key
    for data in bereich:
        print "--data--", data[2:]

Why don't you just pass a slice to itemgetter?

py> eingabe=[
...     ("Stuttgart","70197","Fernsehturm","20"),
...     ("Stuttgart","70197","Brotmuseum","123"),
...     ("Stuttgart","70197","Porsche","123123"),
...     ("Leipzig","01491","Messe","91822"),
...     ("Leipzig","01491","Schabidu","9181231"),
... ]
py> from itertools import groupby
py> from operator import itemgetter
py> for key, bereich in groupby(eingabe, itemgetter(slice(0, 2))):
...     print "Area:", key
...     for data in bereich:
...         print "--data--", data[2:]
...
Area: ('Stuttgart', '70197')
--data-- ('Fernsehturm', '20')
--data-- ('Brotmuseum', '123')
--data-- ('Porsche', '123123')
Area: ('Leipzig', '01491')
--data-- ('Messe', '91822')
--data-- ('Schabidu', '9181231')
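
The same slice trick can be parameterized by the number of head columns. The sketch below uses Python 3 syntax; format_grouped and head_width are hypothetical names introduced here, and yielding strings instead of printing directly is just one possible design that keeps the grouping logic separate from the output.

```python
from itertools import groupby
from operator import itemgetter

# Sample rows from the thread: (city, postal code, site, number).
eingabe = [
    ("Stuttgart", "70197", "Fernsehturm", "20"),
    ("Stuttgart", "70197", "Brotmuseum", "123"),
    ("Stuttgart", "70197", "Porsche", "123123"),
    ("Leipzig", "01491", "Messe", "91822"),
    ("Leipzig", "01491", "Schabidu", "9181231"),
]

def format_grouped(rows, head_width):
    """Hypothetical helper: yield a header line per group, then its data lines."""
    # The first head_width columns form the group key.
    head = itemgetter(slice(0, head_width))
    for key, bereich in groupby(rows, head):
        yield "Area: %s" % (key,)
        for data in bereich:
            yield "--data-- %s" % (data[head_width:],)

lines = list(format_grouped(eingabe, 2))
print("\n".join(lines))
```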


Harald Massa


Steven said:
Why don't you just pass a slice to itemgetter?
py> for key, bereich in groupby(eingabe, itemgetter(slice(0, 2))):

Wow, that is great! That makes it really simple; I just have to structure
the SQL to make a real "cut first, serve first" structure.

Thanks to all who helped!

Also, the "function factory function" by Diez was very helpful, and
Peter's classes looked really impressive!

Thanks again... now my code will be even clearer.
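
One point worth keeping in mind when structuring the query: itertools.groupby only merges rows that are adjacent, so the input has to arrive ordered by the head columns (for example via ORDER BY in the SQL, or by sorting first). The following is a small sketch in Python 3 syntax that assumes the remark about structuring the SQL refers to this ordering; the shuffled sample rows are made up from the thread's data for illustration.

```python
from itertools import groupby
from operator import itemgetter

# Rows deliberately NOT ordered by city, to show the effect.
rows = [
    ("Stuttgart", "70197", "Fernsehturm", "20"),
    ("Leipzig", "01491", "Messe", "91822"),
    ("Stuttgart", "70197", "Porsche", "123123"),
]

head = itemgetter(slice(0, 2))

# Without sorting, Stuttgart shows up as two separate groups.
unsorted_keys = [key for key, _ in groupby(rows, head)]

# Sorting by the same key first gives one group per head.
sorted_keys = [key for key, _ in groupby(sorted(rows, key=head), head)]

print(unsorted_keys)
print(sorted_keys)
```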

