Break up list into groups

Discussion in 'Python' started by danmcleran@yahoo.com, Jul 16, 2007.

  1. Guest

    All,

    I can't seem to find an answer to this question anywhere, but I'm
    still looking. My problem is I have a list of values like this:

    l = [0xF0, 1, 2, 3, 0xF0, 4, 5, 6, 0xF1, 7, 8, 0xF2, 9, 10, 11, 12,
    13, 0xF0, 14, 0xF1, 15]

    A value with bit 0x80 set delineates the start of a new packet of
    information. What I want to do is to group the packets so that 1, 2, 3
    go with the 1st packet tagged 0xF0, 4 ,5, 6 go with the 2nd packet
    tagged 0xF0, 7 & 8 go with the packet tagged 0xF1 and so on. The
    length of the data associated with each tag can vary. I've already
    written an algorithm to do this but I was wondering if some
    combination of itertools functions could do the job faster?

    Here's what I've done and the expected output of the algorithm:

    def splitIntoGroups(data):
    groups = []
    local = []

    for value in data:
    if 0x80 & value:
    if len(local) > 0:
    groups.append(local)

    local = []
    local.append(value)
    else:
    local.append(value)

    if len(local) > 0:
    groups.append(local)

    return groups

    l = [0xF0, 1, 2, 3, 0xF0, 4, 5, 6, 0xF1, 7, 8, 0xF2, 9, 10, 11, 12,
    13, 0xF0, 14, 0xF1, 15]

    print splitIntoGroups(l)

    Desired result:

    [[240, 1, 2, 3], [240, 4, 5, 6], [241, 7, 8], [242, 9, 10, 11, 12,
    13], [240, 14], [241, 15]]

    Thanks,

    Dan McLeran
     
    , Jul 16, 2007
    #1
    1. Advertising

  2. marduk Guest

    On Mon, 2007-07-16 at 14:11 -0700, wrote:
    > I can't seem to find an answer to this question anywhere, but I'm
    > still looking. My problem is I have a list of values like this:
    >
    > l = [0xF0, 1, 2, 3, 0xF0, 4, 5, 6, 0xF1, 7, 8, 0xF2, 9, 10, 11, 12,
    > 13, 0xF0, 14, 0xF1, 15]
    >
    > A value with bit 0x80 set delineates the start of a new packet of
    > information. What I want to do is to group the packets so that 1, 2, 3
    > go with the 1st packet tagged 0xF0, 4 ,5, 6 go with the 2nd packet
    > tagged 0xF0, 7 & 8 go with the packet tagged 0xF1 and so on. The
    > length of the data associated with each tag can vary. I've already
    > written an algorithm to do this but I was wondering if some
    > combination of itertools functions could do the job faster?
    >
    > Here's what I've done and the expected output of the algorithm:
    >
    > def splitIntoGroups(data):
    > groups = []
    > local = []
    >
    > for value in data:
    > if 0x80 & value:
    > if len(local) > 0:
    > groups.append(local)
    >
    > local = []
    > local.append(value)
    > else:
    > local.append(value)
    >
    > if len(local) > 0:
    > groups.append(local)
    >
    > return groups
    >
    > l = [0xF0, 1, 2, 3, 0xF0, 4, 5, 6, 0xF1, 7, 8, 0xF2, 9, 10, 11, 12,
    > 13, 0xF0, 14, 0xF1, 15]
    >
    > print splitIntoGroups(l)
    >
    > Desired result:
    >
    > [[240, 1, 2, 3], [240, 4, 5, 6], [241, 7, 8], [242, 9, 10, 11, 12,
    > 13], [240, 14], [241, 15]]


    Assuming you meant '0xF0' instead of '0x80'.... do you mean any value
    >=240 starts a new group? If so:


    groups = []
    current = [] # probably not necessary, but as a safety
    for i in l:
    if i >= 240:
    current = []
    groups.append(current)
    current.append(i)
     
    marduk, Jul 16, 2007
    #2
    1. Advertising

  3. marduk Guest

    On Mon, 2007-07-16 at 16:31 -0500, marduk wrote:
    > Assuming you meant '0xF0' instead of '0x80'.... do you mean any value
    > >=240 starts a new group? If so:

    >
    > groups = []
    > current = [] # probably not necessary, but as a safety
    > for i in l:
    > if i >= 240:
    > current = []
    > groups.append(current)
    > current.append(i)
    >
    >


    Misunderstood... actually that should have read

    groups = []
    current = [] # probably not necessary, but as a safety
    for i in l:
    if 240 & i:
    current = []
    groups.append(current)
    current.append(i)
     
    marduk, Jul 16, 2007
    #3
  4. James Stroud Guest

    wrote:
    > All,
    >
    > I can't seem to find an answer to this question anywhere, but I'm
    > still looking. My problem is I have a list of values like this:
    >
    > l = [0xF0, 1, 2, 3, 0xF0, 4, 5, 6, 0xF1, 7, 8, 0xF2, 9, 10, 11, 12,
    > 13, 0xF0, 14, 0xF1, 15]
    >
    > A value with bit 0x80 set delineates the start of a new packet of
    > information. What I want to do is to group the packets so that 1, 2, 3
    > go with the 1st packet tagged 0xF0, 4 ,5, 6 go with the 2nd packet
    > tagged 0xF0, 7 & 8 go with the packet tagged 0xF1 and so on. The
    > length of the data associated with each tag can vary. I've already
    > written an algorithm to do this but I was wondering if some
    > combination of itertools functions could do the job faster?
    >
    > Here's what I've done and the expected output of the algorithm:
    >
    > def splitIntoGroups(data):
    > groups = []
    > local = []
    >
    > for value in data:
    > if 0x80 & value:
    > if len(local) > 0:
    > groups.append(local)
    >
    > local = []
    > local.append(value)
    > else:
    > local.append(value)
    >
    > if len(local) > 0:
    > groups.append(local)
    >
    > return groups
    >
    > l = [0xF0, 1, 2, 3, 0xF0, 4, 5, 6, 0xF1, 7, 8, 0xF2, 9, 10, 11, 12,
    > 13, 0xF0, 14, 0xF1, 15]
    >
    > print splitIntoGroups(l)
    >
    > Desired result:
    >
    > [[240, 1, 2, 3], [240, 4, 5, 6], [241, 7, 8], [242, 9, 10, 11, 12,
    > 13], [240, 14], [241, 15]]
    >
    > Thanks,
    >
    > Dan McLeran
    >


    Here's how I *would* do it:

    py> def doit(alist):
    .... ary = []
    .... for i in alist:
    .... if 0xf0 & i:
    .... ary.append()
    .... else:
    .... ary[-1].append(i)
    .... return [x for x in ary if x]
    ....
    py> doit(alist)

    [[240, 1, 2, 3],
    [240, 4, 5, 6],
    [241, 7, 8],
    [242, 9, 10, 11, 12, 13],
    [240, 14],
    [241, 15]]


    Here's an ugly way I wouldn't recommended (using itertools groupby):

    py> from itertools import groupby
    py> alist = [0xF0, 1, 2, 3, 0xF0, 4, 5, 6,
    .... 0xF1, 7, 8, 0xF2, 9, 10, 11, 12, 13,
    .... 0xF0, 14, 0xF1, 15]
    py> def doit(alist):
    .... i = (list(g) for k,g in groupby(alist, lambda x: 0xf0&x))
    .... return [k for k in [j + i.next() for j in i] if len(k)>1]
    ....
    py> doit(alist)

    [[240, 1, 2, 3],
    [240, 4, 5, 6],
    [241, 7, 8],
    [242, 9, 10, 11, 12, 13],
    [240, 14],
    [241, 15]]


    James

    --
    James Stroud
    UCLA-DOE Institute for Genomics and Proteomics
    Box 951570
    Los Angeles, CA 90095

    http://www.jamesstroud.com/
     
    James Stroud, Jul 16, 2007
    #4
  5. James Stroud Guest

    James Stroud wrote:
    > Here's how I *would* do it:
    >
    > py> def doit(alist):
    > ... ary = []
    > ... for i in alist:
    > ... if 0xf0 & i:
    > ... ary.append()
    > ... else:
    > ... ary[-1].append(i)
    > ... return [x for x in ary if x]
    > ...



    To be absolutely compliant with the specifications, it should be

    return [x for x in ary if len(x)>1]

    in the recommended way. This does not affect the example given, though.

    James

    --
    James Stroud
    UCLA-DOE Institute for Genomics and Proteomics
    Box 951570
    Los Angeles, CA 90095

    http://www.jamesstroud.com/
     
    James Stroud, Jul 16, 2007
    #5
  6. Matimus Guest

    This is a perfect place for a generator:

    <code>
    seq = [0xF0, 1, 2, 3, 0xF0, 4, 5, 6, 0xF1, 7, 8, 0xF2, 9, 10, 11, 12,
    13, 0xF0, 14, 0xF1, 15]

    def gengroups(seq):
    group = []
    for val in seq:
    if val & 0x80 and group:
    yield group
    group = []
    group.append(val)
    yield group

    if __name__ == "__main__":
    print list(gengroups(seq))
    </code>

    The above assumes that the first value in the input sequence will have
    0x80 set. Your implementation seems to makes the same assumption
    though.

    Also, just a note...
    <code>
    if len(local) > 0:
    ...
    </code>

    is better written

    <code>
    if local:
    ...
    </code>
     
    Matimus, Jul 16, 2007
    #6
  7. Paul Rubin Guest

    "" <> writes:
    > I can't seem to find an answer to this question anywhere, but I'm
    > still looking. My problem is I have a list of values like this:
    >
    > l = [0xF0, 1, 2, 3, 0xF0, 4, 5, 6, 0xF1, 7, 8, 0xF2, 9, 10, 11, 12,
    > 13, 0xF0, 14, 0xF1, 15]
    >
    > A value with bit 0x80 set delineates the start of a new packet of
    > information. What I want to do is to group the packets so that 1, 2, 3
    > go with the 1st packet tagged 0xF0, 4 ,5, 6 go with the 2nd packet
    > tagged 0xF0, 7 & 8 go with the packet tagged 0xF1 and so on. The
    > length of the data associated with each tag can vary. I've already
    > written an algorithm to do this but I was wondering if some
    > combination of itertools functions could do the job faster?


    See:

    http://groups.google.com/group/comp.lang.python/msg/2410c95c7f3b3654
     
    Paul Rubin, Jul 16, 2007
    #7
  8. Guest

    On Jul 16, 3:56 pm, marduk <> wrote:
    > On Mon, 2007-07-16 at 16:31 -0500, marduk wrote:
    > > Assuming you meant '0xF0' instead of '0x80'.... do you mean any value
    > > >=240 starts a new group? If so:

    No, I meant 0x80. 0x80 is only the most significant bit of the 0xF0
    value. Every time this bit is set indicates the start of a new group.
    Anyway, I like this algorithm better than mine, less code. I've
    modified yours to make the fn a parameter:

    def splitIntoGroups(data, fn):
    groups = []
    current = []
    for i in data:
    if fn(i):
    current = []
    groups.append(current)
    current.append(i)

    return groups

    print splitIntoGroups(l, lambda x : x & 0x80)
     
    , Jul 16, 2007
    #8
  9. Guest

    > Here's an ugly way I wouldn't recommended (using itertools groupby):
    >
    > py> from itertools import groupby
    > py> alist = [0xF0, 1, 2, 3, 0xF0, 4, 5, 6,
    > ... 0xF1, 7, 8, 0xF2, 9, 10, 11, 12, 13,
    > ... 0xF0, 14, 0xF1, 15]
    > py> def doit(alist):
    > ... i = (list(g) for k,g in groupby(alist, lambda x: 0xf0&x))
    > ... return [k for k in [j + i.next() for j in i] if len(k)>1]
    > ...
    > py> doit(alist)
    >
    > [[240, 1, 2, 3],
    > [240, 4, 5, 6],
    > [241, 7, 8],
    > [242, 9, 10, 11, 12, 13],
    > [240, 14],
    > [241, 15]]
    >
    > James


    Wow that is ugly. I think I like the simpler way better.

    Thanks
     
    , Jul 16, 2007
    #9
  10. James Stroud Guest

    Paul Rubin wrote:
    > See:
    >
    > http://groups.google.com/group/comp.lang.python/msg/2410c95c7f3b3654


    Groupby is damn slow as far as I can tell (the Bates numbering in the
    above link assumes more than the OP intended, I assume). It looks like
    the author's original algorithm is the fastest python way as it bypasses
    a lot of lookup, etc.

    Here's the output from the script below (doit2 => groupby way):

    doit
    11.96 usec/pass
    doit2
    87.14 usec/pass


    James

    # timer script
    from itertools import groupby
    from timeit import Timer

    alist = [0xF0, 1, 2, 3, 0xF0, 4, 5, 6,
    0xF1, 7, 8, 0xF2, 9, 10, 11, 12, 13,
    0xF0, 14, 0xF1, 15]

    def doit(alist):
    ary = []
    for i in alist:
    if 0xf0 & i:
    ary.append()
    else:
    ary[-1].append(i)
    return [x for x in ary if x]

    def c(x):
    return 0xf0 & x

    def doit2(alist):
    i = (list(g) for k,g in groupby(alist, c))
    return [k for k in [j + i.next() for j in i] if len(k)>1]

    print doit(alist)

    print 'doit'
    t = Timer('doit(alist)',
    'from __main__ import groupby, doit, alist, c')
    print "%.2f usec/pass" % (1000000 * t.timeit(number=100000)/100000)

    print 'doit2'
    t = Timer('doit2(alist)',
    'from __main__ import groupby, doit2, alist, c')
    print "%.2f usec/pass" % (1000000 * t.timeit(number=100000)/100000)


    --
    James Stroud
    UCLA-DOE Institute for Genomics and Proteomics
    Box 951570
    Los Angeles, CA 90095

    http://www.jamesstroud.com/
     
    James Stroud, Jul 17, 2007
    #10
  11. Matimus Guest

    I did some more experimenting and came up with the code below. It
    shows several methods. When run, the script tests the robustness of
    each method (roughly), and profiles it using timeit. The results from
    running on my laptop are shown below the code.


    <code>
    seqs = [# Original:
    [0xF0, 1, 2, 3, 0xF0, 4, 5, 6, 0xF1, 7, 8, 0xF2, 9, 10, 11,
    12, 13,
    0xF0, 14, 0xF1, 15],
    # Single entry:
    [0xF0, 1, 2, 3],
    # empty
    [],
    # No values with 0x80 set
    [1, 2, 3, 14, 15],
    # Does not start with a value that has 0x80 set
    [1, 2, 3, 14, 15, 0xF0, 1, 2, 3]]

    expected = [# Original:
    [[0xF0, 1, 2, 3], [0xF0, 4, 5, 6], [0xF1, 7, 8], [0xF2, 9,
    10, 11, 12, 13],
    [0xF0, 14], [0xF1, 15]],
    # Single entry:
    [[0xF0, 1, 2, 3]],
    # empty
    [],
    # No values with 0x80 set
    [],
    # Does not start with a value that has 0x80 set
    [[0xF0, 1, 2, 3]]]

    def gengroups0(seq):
    group = None
    for val in seq:
    if val & 0x80:
    if group: yield group
    group = []
    try:
    group.append(val)
    except AttributeError:
    pass
    if group: yield group

    def getgroups0(seq):
    groups = []
    group = None
    for val in seq:
    if val & 0x80:
    if group:
    groups.append(group)
    group = []
    try:
    group.append(val)
    except AttributeError:
    pass
    if group:
    groups.append(group)
    return groups

    def gengroups1(seq):
    idxs = [i for i,v in enumerate(seq) if v&0x80]
    for i,j in zip(idxs,idxs[1:]+[None]):
    yield seq[i:j]

    def getgroups1(seq):
    idxs = [i for i,v in enumerate(seq) if v&0x80]
    return [seq[i:j] for i,j in zip(idxs,idxs[1:]+[None])]

    # Similar to the partition method on strings
    def partition(seq,f=None):
    if f is None:
    f = lambda x:x
    for i,v in enumerate(seq):
    if f(v):
    return seq[:i],[seq],seq[i+1:]
    return seq,[],[]

    def rpartition(seq, f=None):
    if f is None:
    f = lambda x:x
    for i,v in zip(range(len(seq)-1,-1,-1),seq[::-1]):
    if f(v):
    return seq[:i],[seq],seq[i+1:]
    return ([],[],seq)

    def gengroups2(seq):
    while seq:
    seq, key, val = rpartition(seq, lambda x: x&0x80)
    if key and val: yield key+val

    def getgroups2(seq):
    groups = []
    while seq:
    seq, key, val = rpartition(seq, lambda x: x&0x80)
    if key and val:
    groups.append(key+val)
    return groups

    def getgroups3(seq):
    groups = []
    for i in seq:
    if 0x80 & i:
    groups.append()
    else:
    groups[-1].append(i)
    return [x for x in groups if x]

    seq = seqs[0]
    if __name__ == "__main__":
    from timeit import Timer
    import __main__
    for i in range(4):
    fname = "getgroups"+str(i)
    f = getattr(__main__,fname)
    print fname
    for i,(s,e) in enumerate(zip(seqs,expected)):
    print "test %d:"%i,
    try:
    if f(s) == e:
    print "pass"
    else:
    print "fail"
    except:
    print "error"

    t = Timer(fname+'(seq)',
    'from __main__ import seq,'+fname)
    print "%.2f usec/pass" % (1000000 * t.timeit(number=100000)/
    100000)
    print
    </code>

    Output from running the above:
    getgroups0
    test 0: pass
    test 1: pass
    test 2: pass
    test 3: pass
    test 4: pass
    14.85 usec/pass

    getgroups1
    test 0: pass
    test 1: pass
    test 2: pass
    test 3: pass
    test 4: pass
    13.81 usec/pass

    getgroups2
    test 0: fail
    test 1: pass
    test 2: pass
    test 3: pass
    test 4: pass
    56.38 usec/pass

    getgroups3
    test 0: pass
    test 1: pass
    test 2: pass
    test 3: error
    test 4: error
    16.23 usec/pass

    `getgropus2' fails test 0 because it produces a reversed list. That
    can easily be fixed by re-reversing the output before returning. But,
    since it is by far the slowest method, I didn't bother.

    `getgroups3' is a method I got from another post in this thread, just
    for comparison.

    >From my benchmarks it looks like getgroups1 is the winner. I didn't

    scour the thread to test all the methods however.
     
    Matimus, Jul 17, 2007
    #11
  12. Ron Adam Guest

    Matimus wrote:
    > I did some more experimenting and came up with the code below. It
    > shows several methods. When run, the script tests the robustness of
    > each method (roughly), and profiles it using timeit. The results from
    > running on my laptop are shown below the code.


    Try this one...

    def groupdata(data, fn):
    start = 0
    for n in range(1, len(data)):
    if fn(data[n]):
    yield data[start:n]
    start = n
    yield data[start:]

    print list(groupdata(l, lambda x: x & 0x80))


    Cheers ;-)
    Ron



    > <code>
    > seqs = [# Original:
    > [0xF0, 1, 2, 3, 0xF0, 4, 5, 6, 0xF1, 7, 8, 0xF2, 9, 10, 11,
    > 12, 13,
    > 0xF0, 14, 0xF1, 15],
    > # Single entry:
    > [0xF0, 1, 2, 3],
    > # empty
    > [],
    > # No values with 0x80 set
    > [1, 2, 3, 14, 15],
    > # Does not start with a value that has 0x80 set
    > [1, 2, 3, 14, 15, 0xF0, 1, 2, 3]]
    >
    > expected = [# Original:
    > [[0xF0, 1, 2, 3], [0xF0, 4, 5, 6], [0xF1, 7, 8], [0xF2, 9,
    > 10, 11, 12, 13],
    > [0xF0, 14], [0xF1, 15]],
    > # Single entry:
    > [[0xF0, 1, 2, 3]],
    > # empty
    > [],
    > # No values with 0x80 set
    > [],
    > # Does not start with a value that has 0x80 set
    > [[0xF0, 1, 2, 3]]]
    >
    > def gengroups0(seq):
    > group = None
    > for val in seq:
    > if val & 0x80:
    > if group: yield group
    > group = []
    > try:
    > group.append(val)
    > except AttributeError:
    > pass
    > if group: yield group
    >
    > def getgroups0(seq):
    > groups = []
    > group = None
    > for val in seq:
    > if val & 0x80:
    > if group:
    > groups.append(group)
    > group = []
    > try:
    > group.append(val)
    > except AttributeError:
    > pass
    > if group:
    > groups.append(group)
    > return groups
    >
    > def gengroups1(seq):
    > idxs = [i for i,v in enumerate(seq) if v&0x80]
    > for i,j in zip(idxs,idxs[1:]+[None]):
    > yield seq[i:j]
    >
    > def getgroups1(seq):
    > idxs = [i for i,v in enumerate(seq) if v&0x80]
    > return [seq[i:j] for i,j in zip(idxs,idxs[1:]+[None])]
    >
    > # Similar to the partition method on strings
    > def partition(seq,f=None):
    > if f is None:
    > f = lambda x:x
    > for i,v in enumerate(seq):
    > if f(v):
    > return seq[:i],[seq],seq[i+1:]
    > return seq,[],[]
    >
    > def rpartition(seq, f=None):
    > if f is None:
    > f = lambda x:x
    > for i,v in zip(range(len(seq)-1,-1,-1),seq[::-1]):
    > if f(v):
    > return seq[:i],[seq],seq[i+1:]
    > return ([],[],seq)
    >
    > def gengroups2(seq):
    > while seq:
    > seq, key, val = rpartition(seq, lambda x: x&0x80)
    > if key and val: yield key+val
    >
    > def getgroups2(seq):
    > groups = []
    > while seq:
    > seq, key, val = rpartition(seq, lambda x: x&0x80)
    > if key and val:
    > groups.append(key+val)
    > return groups
    >
    > def getgroups3(seq):
    > groups = []
    > for i in seq:
    > if 0x80 & i:
    > groups.append()
    > else:
    > groups[-1].append(i)
    > return [x for x in groups if x]
    >
    > seq = seqs[0]
    > if __name__ == "__main__":
    > from timeit import Timer
    > import __main__
    > for i in range(4):
    > fname = "getgroups"+str(i)
    > f = getattr(__main__,fname)
    > print fname
    > for i,(s,e) in enumerate(zip(seqs,expected)):
    > print "test %d:"%i,
    > try:
    > if f(s) == e:
    > print "pass"
    > else:
    > print "fail"
    > except:
    > print "error"
    >
    > t = Timer(fname+'(seq)',
    > 'from __main__ import seq,'+fname)
    > print "%.2f usec/pass" % (1000000 * t.timeit(number=100000)/
    > 100000)
    > print
    > </code>
    >
    > Output from running the above:
    > getgroups0
    > test 0: pass
    > test 1: pass
    > test 2: pass
    > test 3: pass
    > test 4: pass
    > 14.85 usec/pass
    >
    > getgroups1
    > test 0: pass
    > test 1: pass
    > test 2: pass
    > test 3: pass
    > test 4: pass
    > 13.81 usec/pass
    >
    > getgroups2
    > test 0: fail
    > test 1: pass
    > test 2: pass
    > test 3: pass
    > test 4: pass
    > 56.38 usec/pass
    >
    > getgroups3
    > test 0: pass
    > test 1: pass
    > test 2: pass
    > test 3: error
    > test 4: error
    > 16.23 usec/pass
    >
    > `getgropus2' fails test 0 because it produces a reversed list. That
    > can easily be fixed by re-reversing the output before returning. But,
    > since it is by far the slowest method, I didn't bother.
    >
    > `getgroups3' is a method I got from another post in this thread, just
    > for comparison.
    >
    >>From my benchmarks it looks like getgroups1 is the winner. I didn't

    > scour the thread to test all the methods however.
    >
     
    Ron Adam, Jul 17, 2007
    #12
  13. Ron Adam Guest

    Matimus wrote:
    > I did some more experimenting and came up with the code below. It
    > shows several methods. When run, the script tests the robustness of
    > each method (roughly), and profiles it using timeit. The results from
    > running on my laptop are shown below the code.


    Try this one...

    def groupdata(data, fn):
    start = 0
    for n in range(1, len(data)):
    if fn(data[n]):
    yield data[start:n]
    start = n
    yield data[start:]

    print list(groupdata(l, lambda x: x & 0x80))


    Cheers ;-)
    Ron



    > <code>
    > seqs = [# Original:
    > [0xF0, 1, 2, 3, 0xF0, 4, 5, 6, 0xF1, 7, 8, 0xF2, 9, 10, 11,
    > 12, 13,
    > 0xF0, 14, 0xF1, 15],
    > # Single entry:
    > [0xF0, 1, 2, 3],
    > # empty
    > [],
    > # No values with 0x80 set
    > [1, 2, 3, 14, 15],
    > # Does not start with a value that has 0x80 set
    > [1, 2, 3, 14, 15, 0xF0, 1, 2, 3]]
    >
    > expected = [# Original:
    > [[0xF0, 1, 2, 3], [0xF0, 4, 5, 6], [0xF1, 7, 8], [0xF2, 9,
    > 10, 11, 12, 13],
    > [0xF0, 14], [0xF1, 15]],
    > # Single entry:
    > [[0xF0, 1, 2, 3]],
    > # empty
    > [],
    > # No values with 0x80 set
    > [],
    > # Does not start with a value that has 0x80 set
    > [[0xF0, 1, 2, 3]]]
    >
    > def gengroups0(seq):
    > group = None
    > for val in seq:
    > if val & 0x80:
    > if group: yield group
    > group = []
    > try:
    > group.append(val)
    > except AttributeError:
    > pass
    > if group: yield group
    >
    > def getgroups0(seq):
    > groups = []
    > group = None
    > for val in seq:
    > if val & 0x80:
    > if group:
    > groups.append(group)
    > group = []
    > try:
    > group.append(val)
    > except AttributeError:
    > pass
    > if group:
    > groups.append(group)
    > return groups
    >
    > def gengroups1(seq):
    > idxs = [i for i,v in enumerate(seq) if v&0x80]
    > for i,j in zip(idxs,idxs[1:]+[None]):
    > yield seq[i:j]
    >
    > def getgroups1(seq):
    > idxs = [i for i,v in enumerate(seq) if v&0x80]
    > return [seq[i:j] for i,j in zip(idxs,idxs[1:]+[None])]
    >
    > # Similar to the partition method on strings
    > def partition(seq,f=None):
    > if f is None:
    > f = lambda x:x
    > for i,v in enumerate(seq):
    > if f(v):
    > return seq[:i],[seq],seq[i+1:]
    > return seq,[],[]
    >
    > def rpartition(seq, f=None):
    > if f is None:
    > f = lambda x:x
    > for i,v in zip(range(len(seq)-1,-1,-1),seq[::-1]):
    > if f(v):
    > return seq[:i],[seq],seq[i+1:]
    > return ([],[],seq)
    >
    > def gengroups2(seq):
    > while seq:
    > seq, key, val = rpartition(seq, lambda x: x&0x80)
    > if key and val: yield key+val
    >
    > def getgroups2(seq):
    > groups = []
    > while seq:
    > seq, key, val = rpartition(seq, lambda x: x&0x80)
    > if key and val:
    > groups.append(key+val)
    > return groups
    >
    > def getgroups3(seq):
    > groups = []
    > for i in seq:
    > if 0x80 & i:
    > groups.append()
    > else:
    > groups[-1].append(i)
    > return [x for x in groups if x]
    >
    > seq = seqs[0]
    > if __name__ == "__main__":
    > from timeit import Timer
    > import __main__
    > for i in range(4):
    > fname = "getgroups"+str(i)
    > f = getattr(__main__,fname)
    > print fname
    > for i,(s,e) in enumerate(zip(seqs,expected)):
    > print "test %d:"%i,
    > try:
    > if f(s) == e:
    > print "pass"
    > else:
    > print "fail"
    > except:
    > print "error"
    >
    > t = Timer(fname+'(seq)',
    > 'from __main__ import seq,'+fname)
    > print "%.2f usec/pass" % (1000000 * t.timeit(number=100000)/
    > 100000)
    > print
    > </code>
    >
    > Output from running the above:
    > getgroups0
    > test 0: pass
    > test 1: pass
    > test 2: pass
    > test 3: pass
    > test 4: pass
    > 14.85 usec/pass
    >
    > getgroups1
    > test 0: pass
    > test 1: pass
    > test 2: pass
    > test 3: pass
    > test 4: pass
    > 13.81 usec/pass
    >
    > getgroups2
    > test 0: fail
    > test 1: pass
    > test 2: pass
    > test 3: pass
    > test 4: pass
    > 56.38 usec/pass
    >
    > getgroups3
    > test 0: pass
    > test 1: pass
    > test 2: pass
    > test 3: error
    > test 4: error
    > 16.23 usec/pass
    >
    > `getgropus2' fails test 0 because it produces a reversed list. That
    > can easily be fixed by re-reversing the output before returning. But,
    > since it is by far the slowest method, I didn't bother.
    >
    > `getgroups3' is a method I got from another post in this thread, just
    > for comparison.
    >
    >>From my benchmarks it looks like getgroups1 is the winner. I didn't

    > scour the thread to test all the methods however.
    >
     
    Ron Adam, Jul 17, 2007
    #13
  14. Ron Adam Guest

    Matt McCredie wrote:
    > That certainly is fast, unfortunately it doesn't pass all of the tests.
    > I came up with those tests so I don't know how important they are to the
    > original poster. I modified it and came up with a generator and a
    > non-generator version based (roughly) on your algorithm, that are almost
    > as quick, and pass all of the tests. Some of the modifications were done
    > just to make it quicker, so it would be fair when comparing against the
    > other methods. I hard-coded the comparison instead of using a function
    > and created a function that directly generates and returns a list
    > instead of a generator. I would probably use the generator version in my
    > code, but wrapping `list' around a generator adds about 4us (on my
    > machine). Anyway, getgroups7 passes all of the tests I mentioned and it
    > was timed at 10.37usec/pass. The down side: the code doesn't seem nearly
    > as elegant.
    >
    > Matt


    In most cases you wouldn't wrap the generator version in a list(), but use
    it directly as a loop iterator.


    A little renaming of variables helps it be a bit more elegant I think ...

    def getgroups8(seq):
    groups = []
    iseq = iter(xrange(len(seq)))
    for start in iseq:
    if seq[start] & 0x80:
    for stop in iseq:
    if seq[stop] & 0x80:
    groups.append(seq[start:stop])
    start = stop
    groups.append(seq[start:])
    return groups

    This passes all the tests and runs about the same speed.


    Cheers,
    Ron





    > <code>
    > def gengroups7(seq):
    > iseq = iter(xrange(len(seq)))
    > start = 0
    > for i in iseq:
    > if seq&0x80:
    > start = i
    > break
    > else:
    > return
    > for i in iseq:
    > if seq&0x80:
    > yield seq[start:i]
    > start = i
    > yield seq[start:]
    >
    >
    > def getgroups7(seq):
    > groups = []
    > iseq = iter(xrange(len(seq)))
    > start = 0
    > for i in iseq:
    > if seq&0x80:
    > start = i
    > break
    > else:
    > return groups
    > for i in iseq:
    > if seq&0x80:
    > groups.append(seq[start:i])
    > start = i
    > groups.append(seq[start:])
    > return groups
    >
    > </code>
    >
    >
    >
    >
     
    Ron Adam, Jul 18, 2007
    #14
  15. Ron Adam Guest

    Matt McCredie wrote:
    > That certainly is fast, unfortunately it doesn't pass all of the tests.
    > I came up with those tests so I don't know how important they are to the
    > original poster. I modified it and came up with a generator and a
    > non-generator version based (roughly) on your algorithm, that are almost
    > as quick, and pass all of the tests. Some of the modifications were done
    > just to make it quicker, so it would be fair when comparing against the
    > other methods. I hard-coded the comparison instead of using a function
    > and created a function that directly generates and returns a list
    > instead of a generator. I would probably use the generator version in my
    > code, but wrapping `list' around a generator adds about 4us (on my
    > machine). Anyway, getgroups7 passes all of the tests I mentioned and it
    > was timed at 10.37usec/pass. The down side: the code doesn't seem nearly
    > as elegant.
    >
    > Matt


    In most cases you wouldn't wrap the generator version in a list(), but use
    it directly as a loop iterator.


    A little renaming of variables helps it be a bit more elegant I think ...

    def getgroups8(seq):
    groups = []
    iseq = iter(xrange(len(seq)))
    for start in iseq:
    if seq[start] & 0x80:
    for stop in iseq:
    if seq[stop] & 0x80:
    groups.append(seq[start:stop])
    start = stop
    groups.append(seq[start:])
    return groups

    This passes all the tests and runs about the same speed.


    Cheers,
    Ron





    > <code>
    > def gengroups7(seq):
    > iseq = iter(xrange(len(seq)))
    > start = 0
    > for i in iseq:
    > if seq&0x80:
    > start = i
    > break
    > else:
    > return
    > for i in iseq:
    > if seq&0x80:
    > yield seq[start:i]
    > start = i
    > yield seq[start:]
    >
    >
    > def getgroups7(seq):
    > groups = []
    > iseq = iter(xrange(len(seq)))
    > start = 0
    > for i in iseq:
    > if seq&0x80:
    > start = i
    > break
    > else:
    > return groups
    > for i in iseq:
    > if seq&0x80:
    > groups.append(seq[start:i])
    > start = i
    > groups.append(seq[start:])
    > return groups
    >
    > </code>
    >
    >
    >
    >
     
    Ron Adam, Jul 18, 2007
    #15
  16. Miki Guest

    Hello Dan,

    Yet another option (using itertools.groupby):

    from itertools import groupby

    class GrouperToggler:
    def __init__(self):
    self.group = 1

    def __call__(self, value):
    # New packet, toggle group
    if value & 0x80:
    self.group = 1 - self.group
    return self.group

    def group(items):
    for group, items in groupby(items, GrouperToggler()):
    # groupby return [key, group_iterator]
    yield [item for item in items]

    i = [
    0xF0, 1, 2, 3,
    0xF0, 4, 5, 6,
    0xF1, 7, 8,
    0xF2, 9, 10, 11, 12, 13,
    0xF0, 14,
    0xF1, 15
    ]

    for g in group(i):
    print g

    HTH,
    --
    Miki <>
    http://pythonwise.blogspot.com
     
    Miki, Jul 18, 2007
    #16
  17. On Jul 17, 1:48 pm, Matimus <> wrote:
    > I did some more experimenting and came up with the code below. It
    > shows several methods. When run, the script tests the robustness of
    > each method (roughly), and profiles it using timeit. The results from
    > running on my laptop are shown below the code.
    >
    > <code>
    > seqs = [# Original:
    > [0xF0, 1, 2, 3, 0xF0, 4, 5, 6, 0xF1, 7, 8, 0xF2, 9, 10, 11,
    > 12, 13,
    > 0xF0, 14, 0xF1, 15],
    > # Single entry:
    > [0xF0, 1, 2, 3],
    > # empty
    > [],
    > # No values with 0x80 set
    > [1, 2, 3, 14, 15],
    > # Does not start with a value that has 0x80 set
    > [1, 2, 3, 14, 15, 0xF0, 1, 2, 3]]
    >
    > expected = [# Original:
    > [[0xF0, 1, 2, 3], [0xF0, 4, 5, 6], [0xF1, 7, 8], [0xF2, 9,
    > 10, 11, 12, 13],
    > [0xF0, 14], [0xF1, 15]],
    > # Single entry:
    > [[0xF0, 1, 2, 3]],
    > # empty
    > [],
    > # No values with 0x80 set
    > [],
    > # Does not start with a value that has 0x80 set
    > [[0xF0, 1, 2, 3]]]
    >
    > def gengroups0(seq):
    > group = None
    > for val in seq:
    > if val & 0x80:
    > if group: yield group
    > group = []
    > try:
    > group.append(val)
    > except AttributeError:
    > pass
    > if group: yield group
    >
    > def getgroups0(seq):
    > groups = []
    > group = None
    > for val in seq:
    > if val & 0x80:
    > if group:
    > groups.append(group)
    > group = []
    > try:
    > group.append(val)
    > except AttributeError:
    > pass
    > if group:
    > groups.append(group)
    > return groups
    >
    > def gengroups1(seq):
    > idxs = [i for i,v in enumerate(seq) if v&0x80]
    > for i,j in zip(idxs,idxs[1:]+[None]):
    > yield seq[i:j]
    >
    > def getgroups1(seq):
    > idxs = [i for i,v in enumerate(seq) if v&0x80]
    > return [seq[i:j] for i,j in zip(idxs,idxs[1:]+[None])]
    >
    > # Similar to the partition method on strings
    > def partition(seq,f=None):
    > if f is None:
    > f = lambda x:x
    > for i,v in enumerate(seq):
    > if f(v):
    > return seq[:i],[seq],seq[i+1:]
    > return seq,[],[]
    >
    > def rpartition(seq, f=None):
    > if f is None:
    > f = lambda x:x
    > for i,v in zip(range(len(seq)-1,-1,-1),seq[::-1]):
    > if f(v):
    > return seq[:i],[seq],seq[i+1:]
    > return ([],[],seq)
    >
    > def gengroups2(seq):
    > while seq:
    > seq, key, val = rpartition(seq, lambda x: x&0x80)
    > if key and val: yield key+val
    >
    > def getgroups2(seq):
    > groups = []
    > while seq:
    > seq, key, val = rpartition(seq, lambda x: x&0x80)
    > if key and val:
    > groups.append(key+val)
    > return groups
    >
    > def getgroups3(seq):
    > groups = []
    > for i in seq:
    > if 0x80 & i:
    > groups.append()
    > else:
    > groups[-1].append(i)
    > return [x for x in groups if x]
    >
    > seq = seqs[0]
    > if __name__ == "__main__":
    > from timeit import Timer
    > import __main__
    > for i in range(4):
    > fname = "getgroups"+str(i)
    > f = getattr(__main__,fname)
    > print fname
    > for i,(s,e) in enumerate(zip(seqs,expected)):
    > print "test %d:"%i,
    > try:
    > if f(s) == e:
    > print "pass"
    > else:
    > print "fail"
    > except:
    > print "error"
    >
    > t = Timer(fname+'(seq)',
    > 'from __main__ import seq,'+fname)
    > print "%.2f usec/pass" % (1000000 * t.timeit(number=100000)/
    > 100000)
    > print
    > </code>
    >
    > Output from running the above:
    > getgroups0
    > test 0: pass
    > test 1: pass
    > test 2: pass
    > test 3: pass
    > test 4: pass
    > 14.85 usec/pass
    >
    > getgroups1
    > test 0: pass
    > test 1: pass
    > test 2: pass
    > test 3: pass
    > test 4: pass
    > 13.81 usec/pass
    >
    > getgroups2
    > test 0: fail
    > test 1: pass
    > test 2: pass
    > test 3: pass
    > test 4: pass
    > 56.38 usec/pass
    >
    > getgroups3
    > test 0: pass
    > test 1: pass
    > test 2: pass
    > test 3: error
    > test 4: error
    > 16.23 usec/pass
    >
    > `getgropus2' fails test 0 because it produces a reversed list. That
    > can easily be fixed by re-reversing the output before returning. But,
    > since it is by far the slowest method, I didn't bother.
    >
    > `getgroups3' is a method I got from another post in this thread, just
    > for comparison.
    >
    > >From my benchmarks it looks like getgroups1 is the winner. I didn't

    >
    > scour the thread to test all the methods however.


    Here's a small improvement of getgroups1, both in time and memory:

    from itertools import islice

    def getgroups4(seq):
    idxs = [i for i,v in enumerate(seq) if v&0x80]
    idxs.append(None)
    return [seq[i:j] for i,j in izip(idxs, islice(idxs,1,None))]


    George
     
    George Sakkis, Jul 19, 2007
    #17
  18. Matimus Guest

    > A little renaming of variables helps it be a bit more elegant I think ...
    >
    > def getgroups8(seq):
    > groups = []
    > iseq = iter(xrange(len(seq)))
    > for start in iseq:
    > if seq[start] & 0x80:
    > for stop in iseq:
    > if seq[stop] & 0x80:
    > groups.append(seq[start:stop])
    > start = stop
    > groups.append(seq[start:])
    > return groups


    Excellent work! One more modification and I get below 10us/pass:

    def getgroups(seq):
    groups = []
    push = groups.append
    iseq = iter(xrange(len(seq)))
    for start in iseq:
    if seq[start] & 0x80:
    for stop in iseq:
    if seq[stop] & 0x80:
    push(seq[start:stop])
    start = stop
    push(seq[start:])
    return groups

    -Matt
     
    Matimus, Jul 19, 2007
    #18
  19. Ron Adam Guest

    Matimus wrote:

    > Excellent work! One more modification and I get below 10us/pass:
    >
    > def getgroups(seq):
    > groups = []
    > push = groups.append
    > iseq = iter(xrange(len(seq)))
    > for start in iseq:
    > if seq[start] & 0x80:
    > for stop in iseq:
    > if seq[stop] & 0x80:
    > push(seq[start:stop])
    > start = stop
    > push(seq[start:])
    > return groups
    >
    > -Matt



    Looks good to me. :)

    So a generator versions would be...

    (not tested)

    def getgroups(seq):
    iseq = iter(xrange(len(seq)))
    for start in iseq:
    if seq[start] & 0x80:
    for stop in iseq:
    if seq[stop] & 0x80:
    yield seq[start:stop]
    start = stop
    yield seq[start:]

    (I also wanted to compare this to Georges solution, maybe later.)

    Now if there is some way to generalize this so it can be used in a broader
    range of situations without loosing too much of it's efficiency. Of course
    then maybe group by would be better. <shrug>

    Cheers,
    Ron
     
    Ron Adam, Jul 20, 2007
    #19
  20. Ron Adam Guest

    Matimus wrote:

    > Excellent work! One more modification and I get below 10us/pass:
    >
    > def getgroups(seq):
    > groups = []
    > push = groups.append
    > iseq = iter(xrange(len(seq)))
    > for start in iseq:
    > if seq[start] & 0x80:
    > for stop in iseq:
    > if seq[stop] & 0x80:
    > push(seq[start:stop])
    > start = stop
    > push(seq[start:])
    > return groups
    >
    > -Matt



    Looks good to me. :)

    So a generator versions would be...

    (not tested)

    def getgroups(seq):
    iseq = iter(xrange(len(seq)))
    for start in iseq:
    if seq[start] & 0x80:
    for stop in iseq:
    if seq[stop] & 0x80:
    yield seq[start:stop]
    start = stop
    yield seq[start:]

    (I also wanted to compare this to Georges solution, maybe later.)

    Now if there is some way to generalize this so it can be used in a broader
    range of situations without loosing too much of it's efficiency. Of course
    then maybe group by would be better. <shrug>

    Cheers,
    Ron
     
    Ron Adam, Jul 20, 2007
    #20
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Petra Hübner
    Replies:
    0
    Views:
    449
    Petra Hübner
    Feb 16, 2004
  2. anonymous
    Replies:
    1
    Views:
    4,599
    Francisco Padron
    May 8, 2005
  3. MetalOne
    Replies:
    4
    Views:
    335
    MetalOne
    Feb 22, 2004
  4. Replies:
    3
    Views:
    365
    Peter Hansen
    Jun 10, 2005
  5. Replies:
    12
    Views:
    968
Loading...

Share This Page