Raymond Hettinger
The existing groupby() itertool works great when every element in a
group has the same key, but it is not so handy when groups are
determined by boundary conditions.
For edge-triggered events, we need to convert a boundary-event
predicate into a groupby-style key function. The code below encapsulates
that process in a new itertool called split_on().
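
To make the conversion concrete, here is a small sketch of the keys that such a key function would hand to groupby(). The helper name boundary_keys() is hypothetical and only for illustration; it keeps a running count of boundary events and reports the count either after the event (start boundaries) or before it (stop boundaries).

```python
def boundary_keys(iterable, event, start=True):
    """Return the running boundary count used as each element's group key.

    Hypothetical illustration helper, not part of the proposed itertool.
    """
    keys = []
    count = 0
    for x in iterable:
        before = count
        if event(x):
            count += 1          # a boundary event bumps the counter
        keys.append(count if start else before)
    return keys

data = 'X1X23X456X'
is_x = lambda c: c == 'X'

start_keys = boundary_keys(data, is_x, start=True)
stop_keys = boundary_keys(data, is_x, start=False)
print(start_keys)   # [1, 1, 2, 2, 2, 3, 3, 3, 3, 4]
print(stop_keys)    # [0, 1, 1, 2, 2, 2, 3, 3, 3, 3]
```

Elements that share a key end up in the same group, so a start boundary opens a new group and a stop boundary closes the current one.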
Would love you guys to experiment with it for a bit and confirm that
you find it useful. Suggestions are welcome.
Raymond
-----------------------------------------
from itertools import groupby, islice

def split_on(iterable, event, start=True):
    'Split iterable on event boundaries (either start events or stop events).'
    # split_on('X1X23X456X', 'X'.__eq__, True)  --> X1 X23 X456 X
    # split_on('X1X23X456X', 'X'.__eq__, False) --> X 1X 23X 456X
    def transition_counter(x, start=start, cnt=[0]):
        # cnt is created anew on each split_on() call and holds the
        # running count of boundary events seen so far.
        before = cnt[0]
        if event(x):
            cnt[0] += 1
        after = cnt[0]
        return after if start else before
    return (g for k, g in groupby(iterable, transition_counter))

if __name__ == '__main__':
    for start in True, False:
        for g in split_on('X1X23X456X', 'X'.__eq__, start):
            print(list(g))
        print()

    from pprint import pprint
    boundary = '--===============2615450625767277916==\n'
    with open('email.txt') as email:
        for mime_section in split_on(email, boundary.__eq__):
            # Skip the first line of each section (normally the boundary line).
            pprint(list(islice(mime_section, 1, None)))
            print('= = ' * 30)
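
One thing to keep in mind when experimenting: the groups come straight from groupby(), so each one shares the underlying iterator and has to be consumed before moving on to the next. As a rough sketch of an alternative, the variant below (the name split_on_lists() is hypothetical, not part of the proposal) builds each group eagerly as a list, so the groups stay valid after iteration moves on.

```python
def split_on_lists(iterable, event, start=True):
    """Yield each group as a list, split on start or stop boundary events.

    Hypothetical eager variant for illustration; it does not use groupby().
    """
    group = []
    for x in iterable:
        if start and event(x) and group:
            # A start event closes the previous group before joining a new one.
            yield group
            group = []
        group.append(x)
        if not start and event(x):
            # A stop event closes the group it belongs to.
            yield group
            group = []
    if group:
        yield group

is_x = lambda c: c == 'X'
by_start = [''.join(g) for g in split_on_lists('X1X23X456X', is_x, True)]
by_stop = [''.join(g) for g in split_on_lists('X1X23X456X', is_x, False)]
print(by_start)   # ['X1', 'X23', 'X456', 'X']
print(by_stop)    # ['X', '1X', '23X', '456X']
```

The trade-off is memory: each group is held in full, whereas the groupby()-based version stays lazy.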