head for grouped data - looking for best practice

Harald Massa

An old, very old computer-science problem: I want to "print" grouped data with
header information, that is:


eingabe=[
("Stuttgart","70197","Fernsehturm","20"),
("Stuttgart","70197","Brotmuseum","123"),
("Stuttgart","70197","Porsche","123123"),
("Leipzig","01491","Messe","91822"),
("Leipzig","01491","Schabidu","9181231"),
]

should give (the parentheses are not important):

'Stuttgart', '70197'
--data-- ('Fernsehturm', '20')
--data-- ('Brotmuseum', '123')
--data-- ('Porsche', '123123')
'Leipzig', '01491'
--data-- ('Messe', '91822')
--data-- ('Schabidu', '9181231')

my first approach was:


from itertools import groupby
from operator import itemgetter

for key, bereich in groupby(eingabe, itemgetter(0)):
    print "Area:", key
    headnotprinted = True
    for data in bereich:
        if headnotprinted:
            headnotprinted = False
            print "additional head info", data[1]
        print "--data--", data[2:]

leading to:

Area: Stuttgart
additional head info 70197
--data-- ('Fernsehturm', '20')
--data-- ('Brotmuseum', '123')
--data-- ('Porsche', '123123')
Area: Leipzig
additional head info 01491
--data-- ('Messe', '91822')
--data-- ('Schabidu', '9181231')


which is quite what I expected. But ...
if headnotprinted:
    headnotprinted = False
    print "additional head info", data[1]

REALLY looks patched, not programmed.
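
One possible way to drop the flag while still grouping on the first column only is to pull the first row of each group with next(), print the header from it, and then chain it back in front of the remaining rows. This is only a sketch in Python 3 syntax, not something proposed in the thread; the helper name grouped_lines is made up for illustration.

```python
from itertools import chain, groupby
from operator import itemgetter

eingabe = [
    ("Stuttgart", "70197", "Fernsehturm", "20"),
    ("Stuttgart", "70197", "Brotmuseum", "123"),
    ("Stuttgart", "70197", "Porsche", "123123"),
    ("Leipzig", "01491", "Messe", "91822"),
    ("Leipzig", "01491", "Schabidu", "9181231"),
]

def grouped_lines(rows):
    """Yield output lines, grouping on column 0 (hypothetical helper)."""
    for key, bereich in groupby(rows, itemgetter(0)):
        # groupby never yields an empty group, so next() is safe here.
        first = next(bereich)
        yield "Area: %s" % key
        yield "additional head info %s" % first[1]
        # Put the first row back in front of the rest of the group.
        for data in chain([first], bereich):
            yield "--data-- %r" % (data[2:],)

for line in grouped_lines(eingabe):
    print(line)
```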

my second try was:


def getdoublekey(row):
    return row[0:2]

for key, bereich in groupby(eingabe, getdoublekey):
    print "Area:", key
    for data in bereich:
        print "--data--", data[2:]

which indeed leads to the expected result, while looking less "hacky".
On the other hand, that "getdoublekey" is not very flexible; when
doing the same with 3 columns forming the head information, I have to
define the next function...

def gettriplekey(row):
    return (row[1], row[0], ---yadda yadda yadda

So, what is the recommended best practice for this common problem in
Python?

Harald
 
Diez B. Roggisch

Harald said:
which indeed leads to the expected result, while looking less "hacky".
On the other hand, that "getdoublekey" is not very flexible; when
doing the same with 3 columns forming the head information, I have to
define the next function...

Make getdoublekey something like this (untested):

def get_key(f=0, t=1):
    def _get_key(list_value):
        return list_value[f:t]
    return _get_key

Then use it like this:

for key, bereich in groupby(eingabe,get_key(t=key_size)):
    print "Area:",key
    for data in bereich:
        print "--data--", data[key_size:]
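
For completeness, here is a self-contained sketch of the factory idea above in Python 3 syntax. The value key_size = 2 and the helper group_rows are assumptions added for illustration; the snippet above leaves key_size undefined.

```python
from itertools import groupby

def get_key(f=0, t=1):
    """Return a function that extracts the slice row[f:t] from a row."""
    def _get_key(row):
        return row[f:t]
    return _get_key

eingabe = [
    ("Stuttgart", "70197", "Fernsehturm", "20"),
    ("Stuttgart", "70197", "Brotmuseum", "123"),
    ("Stuttgart", "70197", "Porsche", "123123"),
    ("Leipzig", "01491", "Messe", "91822"),
    ("Leipzig", "01491", "Schabidu", "9181231"),
]

key_size = 2  # assumed: number of leading columns that form the header

def group_rows(rows, key_size):
    """Collect (header, [detail rows]) pairs (hypothetical helper)."""
    return [(key, [row[key_size:] for row in bereich])
            for key, bereich in groupby(rows, get_key(t=key_size))]

for key, details in group_rows(eingabe, key_size):
    print("Area:", key)
    for data in details:
        print("--data--", data)
```

Changing key_size to 3 groups on the first three columns without defining another key function.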
 
Peter Otten

Harald said:
def getdoublekey(row):
    return row[0:2]

for key, bereich in groupby(eingabe, getdoublekey):
    print "Area:", key
    for data in bereich:
        print "--data--", data[2:]

which indeed leads to the expected result, while looking less "hacky".
On the other hand, that "getdoublekey" is not very flexible; when
doing the same with 3 columns forming the head information, I have to
define the next function...

Function creation is cheap and easily understood by someone reading your
code, so you may already have the best solution. If Raymond Hettinger's
recent suggestion on python-dev makes it into Python 2.5,
itemgetter()/attrgetter() could grow support for the extraction of
multiple attributes/items.
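
For readers on a later interpreter: operator.itemgetter has accepted several indices since Python 2.5 and then returns a tuple, which also covers the case where the header columns are not adjacent or should be reordered. A minimal sketch in Python 3 syntax; the names head and groups are just illustrative.

```python
from itertools import groupby
from operator import itemgetter

eingabe = [
    ("Stuttgart", "70197", "Fernsehturm", "20"),
    ("Stuttgart", "70197", "Brotmuseum", "123"),
    ("Stuttgart", "70197", "Porsche", "123123"),
    ("Leipzig", "01491", "Messe", "91822"),
    ("Leipzig", "01491", "Schabidu", "9181231"),
]

# With more than one index, itemgetter returns a tuple of the items,
# here postal code first, then city.
head = itemgetter(1, 0)

groups = [(key, [row[2:] for row in bereich])
          for key, bereich in groupby(eingabe, head)]

for key, details in groups:
    print("Area:", key)
    for data in details:
        print("--data--", data)
```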

Anyway, here is a generalized getter factory that tries to handle all the
common cases in an intuitive way. E.g. you can create itemgetters using
the [] notation:

>>> extract[::3](range(5))
[0, 3]
>>> extract[3](range(5))
3
>>> extract[0,3,4](range(5))
(0, 3, 4)
>>> import os
>>> extract.path(os)
<module 'posixpath' from '/somewhere/posixpath.pyc'>

Peter

import itertools
import operator

def tuple_itemgetter(*keys):
    """Create a function that extracts a tuple of items from an
    indexable object.
    """
    # helper for extract
    getters = map(operator.itemgetter, keys)
    def get(obj):
        return tuple(get(obj) for get in getters)
    return get

def tuple_attrgetter(*names):
    """Create a function that extracts a tuple of attributes from an object.
    """
    # helper for extract
    getters = map(operator.attrgetter, names)
    def get(obj):
        return tuple(get(obj) for get in getters)
    return get

class extract(object):
    """Present unified access to the creation of
    attribute and item getters.
    """
    def __getitem__(self, index):
        if isinstance(index, tuple):
            return tuple_itemgetter(*index)
        return operator.itemgetter(index)
    def __getattribute__(self, name):
        return operator.attrgetter(name)
    def __call__(self, *names):
        return tuple_attrgetter(*names)

extract = extract() # we only ever need one instance

if __name__ == "__main__":
    # the demo is an anglo-german hotchpotch, really:
    eingabe = [
        ("Stuttgart", "70197", "Fernsehturm", "20"),
        ("Stuttgart", "70197", "Brotmuseum", "123"),
        ("Stuttgart", "70197", "Porsche", "123123"),
        ("Leipzig", "01491", "Messe", "91822"),
        ("Leipzig", "01491", "Schabidu", "9181231"),
    ]

    class Site(object):
        def __init__(self, stadt, plz, name, nummer):
            self.stadt = stadt
            self.plz = plz
            self.name = name
            self.nummer = nummer
        def __str__(self):
            return "Site(stadt=%r, plz=%r, name=%r, nummer=%r)" % (
                self.stadt, self.plz, self.name, self.nummer)
        __repr__ = __str__

    def show(iterable, groupkey):
        print "-" * 20
        for group, items in itertools.groupby(iterable, groupkey):
            print group
            for item in items:
                print "\t", item

    show(eingabe, extract[1])
    show(eingabe, extract[0, 1, 0:2])
    show(eingabe, extract[0:2])
    show((Site(*e) for e in eingabe), extract("stadt", "plz"))
    show((Site(*e) for e in eingabe), extract.stadt)
 
Steven Bethard

Harald said:
def getdoublekey(row):
    return row[0:2]

for key, bereich in groupby(eingabe, getdoublekey):
    print "Area:", key
    for data in bereich:
        print "--data--", data[2:]

Why don't you just pass a slice to itemgetter?

py> eingabe=[
...     ("Stuttgart","70197","Fernsehturm","20"),
...     ("Stuttgart","70197","Brotmuseum","123"),
...     ("Stuttgart","70197","Porsche","123123"),
...     ("Leipzig","01491","Messe","91822"),
...     ("Leipzig","01491","Schabidu","9181231"),
... ]
py> from itertools import groupby
py> from operator import itemgetter
py> for key, bereich in groupby(eingabe, itemgetter(slice(0, 2))):
...     print "Area:", key
...     for data in bereich:
...         print "--data--", data[2:]
...
Area: ('Stuttgart', '70197')
--data-- ('Fernsehturm', '20')
--data-- ('Brotmuseum', '123')
--data-- ('Porsche', '123123')
Area: ('Leipzig', '01491')
--data-- ('Messe', '91822')
--data-- ('Schabidu', '9181231')
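
The same idea can be wrapped so that the number of header columns is a parameter. The following is only a sketch in Python 3 syntax; the function name format_grouped and its key_width argument are made up for illustration.

```python
from itertools import groupby
from operator import itemgetter

def format_grouped(rows, key_width):
    """Return output lines, treating the first key_width columns as header."""
    head = itemgetter(slice(0, key_width))
    lines = []
    for key, bereich in groupby(rows, head):
        lines.append("Area: %r" % (key,))
        for data in bereich:
            lines.append("--data-- %r" % (data[key_width:],))
    return lines

eingabe = [
    ("Stuttgart", "70197", "Fernsehturm", "20"),
    ("Stuttgart", "70197", "Brotmuseum", "123"),
    ("Stuttgart", "70197", "Porsche", "123123"),
    ("Leipzig", "01491", "Messe", "91822"),
    ("Leipzig", "01491", "Schabidu", "9181231"),
]

for line in format_grouped(eingabe, 2):
    print(line)
```

Note that groupby only merges consecutive rows with equal keys, so the input has to arrive already ordered by the header columns (for example via ORDER BY in the query).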

STeVe
 
Harald Massa

Steve,

Why don't you just pass a slice to itemgetter?
py> for key, bereich in groupby(eingabe, itemgetter(slice(0, 2))):


Wow, that is great! That makes it really simple; I just have to structure
the SQL to produce a real "cut first, serve first" layout.

Thanks to all who helped!

Also, the "function factory function" by Diez was very helpful, and
Peter's classes looked really impressive!

Thanks again... now my code will be even clearer.

Harald
 
