# Break up list into groups

Discussion in 'Python' started by danmcleran@yahoo.com, Jul 16, 2007.

1. ### Guest

All,

I can't seem to find an answer to this question anywhere, but I'm
still looking. My problem is I have a list of values like this:

l = [0xF0, 1, 2, 3, 0xF0, 4, 5, 6, 0xF1, 7, 8, 0xF2, 9, 10, 11, 12,
13, 0xF0, 14, 0xF1, 15]

A value with bit 0x80 set delineates the start of a new packet of
information. What I want to do is to group the packets so that 1, 2, 3
go with the 1st packet tagged 0xF0, 4, 5, 6 go with the 2nd packet
tagged 0xF0, 7 & 8 go with the packet tagged 0xF1, and so on. The
length of the data associated with each tag can vary. I've already
written an algorithm to do this, but I was wondering if some
combination of itertools functions could do the job faster?

Here's what I've done and the expected output of the algorithm:

def splitIntoGroups(data):
    groups = []
    local = []

    for value in data:
        if 0x80 & value:
            if len(local) > 0:
                groups.append(local)

            local = []
            local.append(value)
        else:
            local.append(value)

    if len(local) > 0:
        groups.append(local)

    return groups

l = [0xF0, 1, 2, 3, 0xF0, 4, 5, 6, 0xF1, 7, 8, 0xF2, 9, 10, 11, 12,
13, 0xF0, 14, 0xF1, 15]

print splitIntoGroups(l)

Desired result:

[[240, 1, 2, 3], [240, 4, 5, 6], [241, 7, 8], [242, 9, 10, 11, 12,
13], [240, 14], [241, 15]]

Thanks,

Dan McLeran

, Jul 16, 2007
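One possible itertools-based answer to the question, written in modern Python 3 syntax rather than the thread's Python 2 (the function name `split_into_groups` and the `bool` grouping key are not from the thread):

```python
from itertools import groupby

def split_into_groups(data):
    """Group values so that each group starts at a value with bit 0x80 set."""
    groups = []
    for is_tag, run in groupby(data, key=lambda v: bool(v & 0x80)):
        if is_tag:
            # Each tag byte in the run opens its own group.
            for tag in run:
                groups.append([tag])
        elif groups:
            # Data bytes extend the most recently opened group.
            groups[-1].extend(run)
    return groups

l = [0xF0, 1, 2, 3, 0xF0, 4, 5, 6, 0xF1, 7, 8,
     0xF2, 9, 10, 11, 12, 13, 0xF0, 14, 0xF1, 15]
print(split_into_groups(l))
# [[240, 1, 2, 3], [240, 4, 5, 6], [241, 7, 8],
#  [242, 9, 10, 11, 12, 13], [240, 14], [241, 15]]
```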

2. ### marduk

On Mon, 2007-07-16 at 14:11 -0700, wrote:
> I can't seem to find an answer to this question anywhere, but I'm
> still looking. My problem is I have a list of values like this:
>
> l = [0xF0, 1, 2, 3, 0xF0, 4, 5, 6, 0xF1, 7, 8, 0xF2, 9, 10, 11, 12,
> 13, 0xF0, 14, 0xF1, 15]
>
> A value with bit 0x80 set delineates the start of a new packet of
> information. What I want to do is to group the packets so that 1, 2, 3
> go with the 1st packet tagged 0xF0, 4 ,5, 6 go with the 2nd packet
> tagged 0xF0, 7 & 8 go with the packet tagged 0xF1 and so on. The
> length of the data associated with each tag can vary. I've already
> written an algorithm to do this but I was wondering if some
> combination of itertools functions could do the job faster?
>
> Here's what I've done and the expected output of the algorithm:
>
> def splitIntoGroups(data):
>     groups = []
>     local = []
>
>     for value in data:
>         if 0x80 & value:
>             if len(local) > 0:
>                 groups.append(local)
>
>             local = []
>             local.append(value)
>         else:
>             local.append(value)
>
>     if len(local) > 0:
>         groups.append(local)
>
>     return groups
>
> l = [0xF0, 1, 2, 3, 0xF0, 4, 5, 6, 0xF1, 7, 8, 0xF2, 9, 10, 11, 12,
> 13, 0xF0, 14, 0xF1, 15]
>
> print splitIntoGroups(l)
>
> Desired result:
>
> [[240, 1, 2, 3], [240, 4, 5, 6], [241, 7, 8], [242, 9, 10, 11, 12,
> 13], [240, 14], [241, 15]]

Assuming you meant '0xF0' instead of '0x80'.... do you mean any value
>=240 starts a new group? If so:

groups = []
current = []  # probably not necessary, but as a safety
for i in l:
    if i >= 240:
        current = []
        groups.append(current)
    current.append(i)

marduk, Jul 16, 2007

3. ### marduk

On Mon, 2007-07-16 at 16:31 -0500, marduk wrote:
> Assuming you meant '0xF0' instead of '0x80'.... do you mean any value
> >=240 starts a new group? If so:

>
> groups = []
> current = []  # probably not necessary, but as a safety
> for i in l:
>     if i >= 240:
>         current = []
>         groups.append(current)
>     current.append(i)
>
>

Misunderstood... actually that should have read

groups = []
current = []  # probably not necessary, but as a safety
for i in l:
    if 240 & i:
        current = []
        groups.append(current)
    current.append(i)

marduk, Jul 16, 2007
4. ### James Stroud

wrote:
> All,
>
> I can't seem to find an answer to this question anywhere, but I'm
> still looking. My problem is I have a list of values like this:
>
> l = [0xF0, 1, 2, 3, 0xF0, 4, 5, 6, 0xF1, 7, 8, 0xF2, 9, 10, 11, 12,
> 13, 0xF0, 14, 0xF1, 15]
>
> A value with bit 0x80 set delineates the start of a new packet of
> information. What I want to do is to group the packets so that 1, 2, 3
> go with the 1st packet tagged 0xF0, 4 ,5, 6 go with the 2nd packet
> tagged 0xF0, 7 & 8 go with the packet tagged 0xF1 and so on. The
> length of the data associated with each tag can vary. I've already
> written an algorithm to do this but I was wondering if some
> combination of itertools functions could do the job faster?
>
> Here's what I've done and the expected output of the algorithm:
>
> def splitIntoGroups(data):
>     groups = []
>     local = []
>
>     for value in data:
>         if 0x80 & value:
>             if len(local) > 0:
>                 groups.append(local)
>
>             local = []
>             local.append(value)
>         else:
>             local.append(value)
>
>     if len(local) > 0:
>         groups.append(local)
>
>     return groups
>
> l = [0xF0, 1, 2, 3, 0xF0, 4, 5, 6, 0xF1, 7, 8, 0xF2, 9, 10, 11, 12,
> 13, 0xF0, 14, 0xF1, 15]
>
> print splitIntoGroups(l)
>
> Desired result:
>
> [[240, 1, 2, 3], [240, 4, 5, 6], [241, 7, 8], [242, 9, 10, 11, 12,
> 13], [240, 14], [241, 15]]
>
> Thanks,
>
> Dan McLeran
>

Here's how I *would* do it:

py> def doit(alist):
...     ary = []
...     for i in alist:
...         if 0xf0 & i:
...             ary.append([i])
...         else:
...             ary[-1].append(i)
...     return [x for x in ary if x]
...
py> doit(alist)

[[240, 1, 2, 3],
[240, 4, 5, 6],
[241, 7, 8],
[242, 9, 10, 11, 12, 13],
[240, 14],
[241, 15]]

Here's an ugly way I wouldn't recommend (using itertools groupby):

py> from itertools import groupby
py> alist = [0xF0, 1, 2, 3, 0xF0, 4, 5, 6,
...          0xF1, 7, 8, 0xF2, 9, 10, 11, 12, 13,
...          0xF0, 14, 0xF1, 15]
py> def doit(alist):
...     i = (list(g) for k,g in groupby(alist, lambda x: 0xf0&x))
...     return [k for k in [j + i.next() for j in i] if len(k)>1]
...
py> doit(alist)

[[240, 1, 2, 3],
[240, 4, 5, 6],
[241, 7, 8],
[242, 9, 10, 11, 12, 13],
[240, 14],
[241, 15]]

James

--
James Stroud
UCLA-DOE Institute for Genomics and Proteomics
Box 951570
Los Angeles, CA 90095

http://www.jamesstroud.com/

James Stroud, Jul 16, 2007
5. ### James Stroud

James Stroud wrote:
> Here's how I *would* do it:
>
> py> def doit(alist):
> ...     ary = []
> ...     for i in alist:
> ...         if 0xf0 & i:
> ...             ary.append([i])
> ...         else:
> ...             ary[-1].append(i)
> ...     return [x for x in ary if x]
> ...

To be absolutely compliant with the specification, the last line of the
recommended version should be

    return [x for x in ary if len(x)>1]

This does not affect the example given, though.

James

--
James Stroud
UCLA-DOE Institute for Genomics and Proteomics
Box 951570
Los Angeles, CA 90095

http://www.jamesstroud.com/

James Stroud, Jul 16, 2007
6. ### Matimus

This is a perfect place for a generator:

<code>
seq = [0xF0, 1, 2, 3, 0xF0, 4, 5, 6, 0xF1, 7, 8, 0xF2, 9, 10, 11, 12,
       13, 0xF0, 14, 0xF1, 15]

def gengroups(seq):
    group = []
    for val in seq:
        if val & 0x80 and group:
            yield group
            group = []
        group.append(val)
    yield group

if __name__ == "__main__":
    print list(gengroups(seq))
</code>

The above assumes that the first value in the input sequence will have
0x80 set. Your implementation seems to make the same assumption,
though.

Also, just a note...
<code>
if len(local) > 0:
    ...
</code>

is better written

<code>
if local:
    ...
</code>

Matimus, Jul 16, 2007
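Matimus's generator, rendered here in Python 3 syntax for anyone running it today. The trailing `if group:` guard is an addition, not from the post: his version yields the final group unconditionally, so an empty input produces one empty group.

```python
def gengroups(seq):
    """Yield groups of values; a value with bit 0x80 set starts a new group."""
    group = []
    for val in seq:
        if val & 0x80 and group:
            yield group       # close out the previous group
            group = []
        group.append(val)
    if group:                 # added guard: an empty input yields nothing
        yield group

# Values before the first tag byte end up in a group of their own,
# which makes the assumption noted above visible:
print(list(gengroups([1, 2, 0xF0, 3])))   # [[1, 2], [240, 3]]
```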
7. ### Paul Rubin

"" <> writes:
> I can't seem to find an answer to this question anywhere, but I'm
> still looking. My problem is I have a list of values like this:
>
> l = [0xF0, 1, 2, 3, 0xF0, 4, 5, 6, 0xF1, 7, 8, 0xF2, 9, 10, 11, 12,
> 13, 0xF0, 14, 0xF1, 15]
>
> A value with bit 0x80 set delineates the start of a new packet of
> information. What I want to do is to group the packets so that 1, 2, 3
> go with the 1st packet tagged 0xF0, 4 ,5, 6 go with the 2nd packet
> tagged 0xF0, 7 & 8 go with the packet tagged 0xF1 and so on. The
> length of the data associated with each tag can vary. I've already
> written an algorithm to do this but I was wondering if some
> combination of itertools functions could do the job faster?

See:

Paul Rubin, Jul 16, 2007
8. ### Guest

On Jul 16, 3:56 pm, marduk <> wrote:
> On Mon, 2007-07-16 at 16:31 -0500, marduk wrote:
> > Assuming you meant '0xF0' instead of '0x80'.... do you mean any value
> > >=240 starts a new group? If so:

No, I meant 0x80. 0x80 is only the most significant bit of the 0xF0
value. Every value with this bit set marks the start of a new group.
Anyway, I like this algorithm better than mine; less code. I've
modified yours to make the fn a parameter:

def splitIntoGroups(data, fn):
    groups = []
    current = []
    for i in data:
        if fn(i):
            current = []
            groups.append(current)
        current.append(i)

    return groups

print splitIntoGroups(l, lambda x: x & 0x80)

, Jul 16, 2007
9. ### Guest

> Here's an ugly way I wouldn't recommended (using itertools groupby):
>
> py> from itertools import groupby
> py> alist = [0xF0, 1, 2, 3, 0xF0, 4, 5, 6,
> ... 0xF1, 7, 8, 0xF2, 9, 10, 11, 12, 13,
> ... 0xF0, 14, 0xF1, 15]
> py> def doit(alist):
> ...     i = (list(g) for k,g in groupby(alist, lambda x: 0xf0&x))
> ...     return [k for k in [j + i.next() for j in i] if len(k)>1]
> ...
> py> doit(alist)
>
> [[240, 1, 2, 3],
> [240, 4, 5, 6],
> [241, 7, 8],
> [242, 9, 10, 11, 12, 13],
> [240, 14],
> [241, 15]]
>
> James

Wow that is ugly. I think I like the simpler way better.

Thanks

, Jul 16, 2007
10. ### James Stroud

Paul Rubin wrote:

Groupby is damn slow as far as I can tell (the Bates numbering in the
above link assumes more than the OP intended, I assume). It looks like
the OP's original algorithm is the fastest pure-Python way, as it
bypasses a lot of attribute lookup, etc.

Here's the output from the script below (doit2 => groupby way):

doit
11.96 usec/pass
doit2
87.14 usec/pass

James

# timer script
from itertools import groupby
from timeit import Timer

alist = [0xF0, 1, 2, 3, 0xF0, 4, 5, 6,
         0xF1, 7, 8, 0xF2, 9, 10, 11, 12, 13,
         0xF0, 14, 0xF1, 15]

def doit(alist):
    ary = []
    for i in alist:
        if 0xf0 & i:
            ary.append([i])
        else:
            ary[-1].append(i)
    return [x for x in ary if x]

def c(x):
    return 0xf0 & x

def doit2(alist):
    i = (list(g) for k,g in groupby(alist, c))
    return [k for k in [j + i.next() for j in i] if len(k)>1]

print doit(alist)

print 'doit'
t = Timer('doit(alist)',
          'from __main__ import groupby, doit, alist, c')
print "%.2f usec/pass" % (1000000 * t.timeit(number=100000)/100000)

print 'doit2'
t = Timer('doit2(alist)',
          'from __main__ import groupby, doit2, alist, c')
print "%.2f usec/pass" % (1000000 * t.timeit(number=100000)/100000)

--
James Stroud
UCLA-DOE Institute for Genomics and Proteomics
Box 951570
Los Angeles, CA 90095

http://www.jamesstroud.com/

James Stroud, Jul 17, 2007
11. ### Matimus

I did some more experimenting and came up with the code below. It
shows several methods. When run, the script tests the robustness of
each method (roughly), and profiles it using timeit. The results from
running on my laptop are shown below the code.

<code>
seqs = [# Original:
        [0xF0, 1, 2, 3, 0xF0, 4, 5, 6, 0xF1, 7, 8, 0xF2, 9, 10, 11, 12, 13,
         0xF0, 14, 0xF1, 15],
        # Single entry:
        [0xF0, 1, 2, 3],
        # empty
        [],
        # No values with 0x80 set
        [1, 2, 3, 14, 15],
        # Does not start with a value that has 0x80 set
        [1, 2, 3, 14, 15, 0xF0, 1, 2, 3]]

expected = [# Original:
            [[0xF0, 1, 2, 3], [0xF0, 4, 5, 6], [0xF1, 7, 8],
             [0xF2, 9, 10, 11, 12, 13], [0xF0, 14], [0xF1, 15]],
            # Single entry:
            [[0xF0, 1, 2, 3]],
            # empty
            [],
            # No values with 0x80 set
            [],
            # Does not start with a value that has 0x80 set
            [[0xF0, 1, 2, 3]]]

def gengroups0(seq):
    group = None
    for val in seq:
        if val & 0x80:
            if group: yield group
            group = []
        try:
            group.append(val)
        except AttributeError:
            pass
    if group: yield group

def getgroups0(seq):
    groups = []
    group = None
    for val in seq:
        if val & 0x80:
            if group:
                groups.append(group)
            group = []
        try:
            group.append(val)
        except AttributeError:
            pass
    if group:
        groups.append(group)
    return groups

def gengroups1(seq):
    idxs = [i for i,v in enumerate(seq) if v&0x80]
    for i,j in zip(idxs,idxs[1:]+[None]):
        yield seq[i:j]

def getgroups1(seq):
    idxs = [i for i,v in enumerate(seq) if v&0x80]
    return [seq[i:j] for i,j in zip(idxs,idxs[1:]+[None])]

# Similar to the partition method on strings
def partition(seq, f=None):
    if f is None:
        f = lambda x:x
    for i,v in enumerate(seq):
        if f(v):
            return seq[:i], [seq[i]], seq[i+1:]
    return seq, [], []

def rpartition(seq, f=None):
    if f is None:
        f = lambda x:x
    for i,v in zip(range(len(seq)-1,-1,-1), seq[::-1]):
        if f(v):
            return seq[:i], [seq[i]], seq[i+1:]
    return ([], [], seq)

def gengroups2(seq):
    while seq:
        seq, key, val = rpartition(seq, lambda x: x&0x80)
        if key and val: yield key+val

def getgroups2(seq):
    groups = []
    while seq:
        seq, key, val = rpartition(seq, lambda x: x&0x80)
        if key and val:
            groups.append(key+val)
    return groups

def getgroups3(seq):
    groups = []
    for i in seq:
        if 0x80 & i:
            groups.append([i])
        else:
            groups[-1].append(i)
    return [x for x in groups if x]

seq = seqs[0]
if __name__ == "__main__":
    from timeit import Timer
    import __main__
    for i in range(4):
        fname = "getgroups"+str(i)
        f = getattr(__main__, fname)
        print fname
        for i,(s,e) in enumerate(zip(seqs,expected)):
            print "test %d:"%i,
            try:
                if f(s) == e:
                    print "pass"
                else:
                    print "fail"
            except:
                print "error"

        t = Timer(fname+'(seq)',
                  'from __main__ import seq,'+fname)
        print "%.2f usec/pass" % (1000000 * t.timeit(number=100000)/100000)
        print
</code>

Output from running the above:
getgroups0
test 0: pass
test 1: pass
test 2: pass
test 3: pass
test 4: pass
14.85 usec/pass

getgroups1
test 0: pass
test 1: pass
test 2: pass
test 3: pass
test 4: pass
13.81 usec/pass

getgroups2
test 0: fail
test 1: pass
test 2: pass
test 3: pass
test 4: pass
56.38 usec/pass

getgroups3
test 0: pass
test 1: pass
test 2: pass
test 3: error
test 4: error
16.23 usec/pass

`getgroups2' fails test 0 because it produces a reversed list. That
can easily be fixed by re-reversing the output before returning. But,
since it is by far the slowest method, I didn't bother.

`getgroups3' is a method I got from another post in this thread, just
for comparison.

From my benchmarks it looks like getgroups1 is the winner. I didn't
scour the thread to test all the methods however.

Matimus, Jul 17, 2007

Matimus wrote:
> I did some more experimenting and came up with the code below. It
> shows several methods. When run, the script tests the robustness of
> each method (roughly), and profiles it using timeit. The results from
> running on my laptop are shown below the code.

Try this one...

def groupdata(data, fn):
    start = 0
    for n in range(1, len(data)):
        if fn(data[n]):
            yield data[start:n]
            start = n
    yield data[start:]

print list(groupdata(l, lambda x: x & 0x80))

Cheers ;-)
Ron



Matt McCredie wrote:
> That certainly is fast, unfortunately it doesn't pass all of the tests.
> I came up with those tests so I don't know how important they are to the
> original poster. I modified it and came up with a generator and a
> non-generator version based (roughly) on your algorithm, that are almost
> as quick, and pass all of the tests. Some of the modifications were done
> just to make it quicker, so it would be fair when comparing against the
> other methods. I hard-coded the comparison instead of using a function
> and created a function that directly generates and returns a list
> instead of a generator. I would probably use the generator version in my
> code, but wrapping `list' around a generator adds about 4us (on my
> machine). Anyway, getgroups7 passes all of the tests I mentioned and it
> was timed at 10.37usec/pass. The down side: the code doesn't seem nearly
> as elegant.
>
> Matt

In most cases you wouldn't wrap the generator version in a list(), but use
it directly as a loop iterator.

A little renaming of variables helps it be a bit more elegant I think ...

def getgroups8(seq):
    groups = []
    iseq = iter(xrange(len(seq)))
    for start in iseq:
        if seq[start] & 0x80:
            for stop in iseq:
                if seq[stop] & 0x80:
                    groups.append(seq[start:stop])
                    start = stop
            groups.append(seq[start:])
    return groups

This passes all the tests and runs about the same speed.

Cheers,
Ron

> <code>
> def gengroups7(seq):
>     iseq = iter(xrange(len(seq)))
>     start = 0
>     for i in iseq:
>         if seq[i]&0x80:
>             start = i
>             break
>     else:
>         return
>     for i in iseq:
>         if seq[i]&0x80:
>             yield seq[start:i]
>             start = i
>     yield seq[start:]
>
>
> def getgroups7(seq):
>     groups = []
>     iseq = iter(xrange(len(seq)))
>     start = 0
>     for i in iseq:
>         if seq[i]&0x80:
>             start = i
>             break
>     else:
>         return groups
>     for i in iseq:
>         if seq[i]&0x80:
>             groups.append(seq[start:i])
>             start = i
>     groups.append(seq[start:])
>     return groups
> </code>
>
>
>
>


16. ### Miki

Hello Dan,

Yet another option (using itertools.groupby):

from itertools import groupby

class GrouperToggler:
    def __init__(self):
        self.group = 1

    def __call__(self, value):
        # New packet, toggle group
        if value & 0x80:
            self.group = 1 - self.group
        return self.group

def group(items):
    for group, items in groupby(items, GrouperToggler()):
        # groupby returns (key, group_iterator) pairs
        yield [item for item in items]

i = [
    0xF0, 1, 2, 3,
    0xF0, 4, 5, 6,
    0xF1, 7, 8,
    0xF2, 9, 10, 11, 12, 13,
    0xF0, 14,
    0xF1, 15
]

for g in group(i):
    print g

HTH,
--
Miki <>
http://pythonwise.blogspot.com

Miki, Jul 18, 2007
17. ### George Sakkis

On Jul 17, 1:48 pm, Matimus <> wrote:
> I did some more experimenting and came up with the code below. It
> shows several methods. When run, the script tests the robustness of
> each method (roughly), and profiles it using timeit. The results from
> running on my laptop are shown below the code.
>
> <code>
> [...]
> def getgroups1(seq):
>     idxs = [i for i,v in enumerate(seq) if v&0x80]
>     return [seq[i:j] for i,j in zip(idxs,idxs[1:]+[None])]
> [...]
> </code>
>
> From my benchmarks it looks like getgroups1 is the winner.

Here's a small improvement of getgroups1, both in time and memory:

from itertools import islice, izip

def getgroups4(seq):
    idxs = [i for i,v in enumerate(seq) if v&0x80]
    idxs.append(None)
    return [seq[i:j] for i,j in izip(idxs, islice(idxs,1,None))]

George

George Sakkis, Jul 19, 2007
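A note for Python 3 readers: `itertools.izip` is gone and the built-in `zip` is already lazy, so George's variant might be rendered as follows (a sketch, not from the thread):

```python
from itertools import islice

def getgroups4(seq):
    """Slice seq at every index whose value has bit 0x80 set."""
    idxs = [i for i, v in enumerate(seq) if v & 0x80]
    idxs.append(None)  # sentinel so the final slice runs to the end
    # zip replaces izip; islice still avoids copying the index list.
    return [seq[i:j] for i, j in zip(idxs, islice(idxs, 1, None))]

print(getgroups4([0xF0, 1, 2, 3, 0xF1, 4, 0xF0]))
# [[240, 1, 2, 3], [241, 4], [240]]
```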
18. ### Matimus

> A little renaming of variables helps it be a bit more elegant I think ...
>
> def getgroups8(seq):
>     groups = []
>     iseq = iter(xrange(len(seq)))
>     for start in iseq:
>         if seq[start] & 0x80:
>             for stop in iseq:
>                 if seq[stop] & 0x80:
>                     groups.append(seq[start:stop])
>                     start = stop
>             groups.append(seq[start:])
>     return groups

Excellent work! One more modification and I get below 10us/pass:

def getgroups(seq):
    groups = []
    push = groups.append
    iseq = iter(xrange(len(seq)))
    for start in iseq:
        if seq[start] & 0x80:
            for stop in iseq:
                if seq[stop] & 0x80:
                    push(seq[start:stop])
                    start = stop
            push(seq[start:])
    return groups

-Matt

Matimus, Jul 19, 2007

Matimus wrote:

> Excellent work! One more modification and I get below 10us/pass:
>
> def getgroups(seq):
>     groups = []
>     push = groups.append
>     iseq = iter(xrange(len(seq)))
>     for start in iseq:
>         if seq[start] & 0x80:
>             for stop in iseq:
>                 if seq[stop] & 0x80:
>                     push(seq[start:stop])
>                     start = stop
>             push(seq[start:])
>     return groups
>
> -Matt

Looks good to me.

So a generator version would be...

(not tested)

def getgroups(seq):
    iseq = iter(xrange(len(seq)))
    for start in iseq:
        if seq[start] & 0x80:
            for stop in iseq:
                if seq[stop] & 0x80:
                    yield seq[start:stop]
                    start = stop
            yield seq[start:]

(I also wanted to compare this to Georges solution, maybe later.)

Now if there were some way to generalize this so it could be used in a
broader range of situations without losing too much of its efficiency.
Of course, then maybe groupby would be better. <shrug>

Cheers,
Ron
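One way Ron's generalization might look, sketched with the index-slicing approach and a caller-supplied start-of-group predicate (the name `split_groups` and its signature are not from the thread):

```python
def split_groups(seq, pred):
    """Split seq into groups, starting a new group at each item where
    pred(item) is true. Items before the first group start are dropped,
    matching the behaviour expected of getgroups1 in the tests above."""
    idxs = [i for i, v in enumerate(seq) if pred(v)]
    if not idxs:
        return []           # no group starts at all
    idxs.append(len(seq))   # sentinel for the final slice
    return [seq[i:j] for i, j in zip(idxs, idxs[1:])]

print(split_groups([0xF0, 1, 2, 0xF1, 3], lambda v: v & 0x80))
# [[240, 1, 2], [241, 3]]
```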
