Chunking sequential values in a list

D

David Hirschfield

I have this function:

def sequentialChunks(l, stride=1):
chunks = []
chunk = []
for i,v in enumerate(l[:-1]):
v2 = l[i+1]
if v2-v == stride:
if not chunk:
chunk.append(v)
chunk.append(v2)
else:
if not chunk:
chunk.append(v)
chunks.append(chunk)
chunk = []
if chunk:
chunks.append(chunk)
return chunks

Which takes a list of numerical values "l" and splits it into chunks
where each chunk is sequential, where sequential means each value in a
chunk is
separated from the next by "stride".

So sequentialChunks([1,2,3,5,6,8,12]) returns:

[[1,2,3],[5,6],[8],[12]]

I don't think the code above is the most efficient way to do this, but
it is relatively clear. I tried fiddling with list-comprehension ways of
accomplishing it, but kept losing track of things...so if anyone has a
suggestion, I'd appreciate it.

Thanks,
-Dave
 
B

bearophileHUGS

It looks like homework. Sometimes the simpler code is better:

def splitter(seq):
if not seq:
return []
result = []
current = [seq[0]]
for pos, el in enumerate(seq[1:]):
if el - current[-1] > 1:
result.append(current[:])
current = []
current.append(el)
result.append(current)
return result

print splitter([1, 2, 3, 5, 6, 8, 12])
print splitter([1, 1, 1, 2, 2, 5, 6])
print splitter([1, 1, 1, 1, 1, 1, 1])
print splitter([1, 2, 3])
print splitter([1])
print splitter([])
print splitter([1.0, 1.1, 1.99, 4.01])
print splitter([1.0, 2.0, 3.0, 4.01])

To reduce memory used you can use islice instead of the slice seq[1:]
If you want something less easy to understand, you can try fixing the
following code that doesn't work:

from itertools import groupby
l = [1,2,3,5,6,8,12]
keyf = lambda (pos,x): x-l[pos]>1
[[el[1] for el in gr] for he,gr in groupby(enumerate(l[:-1]), keyf)]

Bye,
bearophile
 
G

Gerard Flanagan

David said:
I have this function:

def sequentialChunks(l, stride=1):
chunks = []
chunk = []
for i,v in enumerate(l[:-1]):
v2 = l[i+1]
if v2-v == stride:
if not chunk:
chunk.append(v)
chunk.append(v2)
else:
if not chunk:
chunk.append(v)
chunks.append(chunk)
chunk = []
if chunk:
chunks.append(chunk)
return chunks

Which takes a list of numerical values "l" and splits it into chunks
where each chunk is sequential, where sequential means each value in a
chunk is
separated from the next by "stride".

So sequentialChunks([1,2,3,5,6,8,12]) returns:

[[1,2,3],[5,6],[8],[12]]

I don't think the code above is the most efficient way to do this, but
it is relatively clear. I tried fiddling with list-comprehension ways of
accomplishing it, but kept losing track of things...so if anyone has a
suggestion, I'd appreciate it.

see the groupby example here:

http://docs.python.org/lib/itertools-example.html

Gerard
 
S

skip

Gerard> see the groupby example here:

Gerard> http://docs.python.org/lib/itertools-example.html

I've never been a fan of the operator module, so I find the example in the
docs ugly even though it's succinct. I'd also never used groupby but
thought it was the solution to the problem. As I was walking home from the
train I realized how it would work. You beat me to the punch with your
reply, but I'll post my solution anyway:

from itertools import groupby

class Counter:
def __init__(self):
self.last = None
self.value = True

def __call__(self, val):
if self.last is not None and self.last+1 != val:
self.value = not self.value
self.last = val
return self.value

for key, group in groupby([1,2,3, 5,6, 8, 12,13,14], Counter()):
print list(group)

Skip
 
P

Paul Rubin

David Hirschfield said:
So sequentialChunks([1,2,3,5,6,8,12]) returns:
[[1,2,3],[5,6],[8],[12]]

Ugly and not too efficient: find the break points and use them to make
sub-lists.

def sequentialChunks(l, stride=1):
p = [0] + [i for i in xrange(1,len(l)) if l-l[i-1] != stride] + [len(l)]
return [l[p:p[i+1]] for i in xrange(len(p)-1)]

print sequentialChunks([1,2,3,5,6,8,12])
 
G

Gerard Flanagan

Gerard> see the groupby example here:

Gerard> http://docs.python.org/lib/itertools-example.html

I've never been a fan of the operator module, so I find the example in the
docs ugly even though it's succinct. I'd also never used groupby but
thought it was the solution to the problem. As I was walking home from the
train I realized how it would work. You beat me to the punch with your
reply, but I'll post my solution anyway:

from itertools import groupby

class Counter:
def __init__(self):
self.last = None
self.value = True

def __call__(self, val):
if self.last is not None and self.last+1 != val:
self.value = not self.value
self.last = val
return self.value

for key, group in groupby([1,2,3, 5,6, 8, 12,13,14], Counter()):
print list(group)

I've no strong feelings about the operator module myself, or the
groupby example, but I do prefer your code in this case. I tweaked it
a little to take into account the stride, though I'm not sure it gives
what the OP wants.

all the best

Gerard

-----------------------------

class Counter:
def __init__(self, stride=1):
self.last = None
self.value = True
self.stride = stride

def __call__(self, val):
if self.last is not None:
d = val - self.last
if d > self.stride:
self.value = not self.value
self.last = val
return self.value

def group(data,stride=1):
for key, group in groupby(data, Counter(stride)):
yield list(group)

-----------------------------------
 
G

Gerard Flanagan

David said:
I have this function:

def sequentialChunks(l, stride=1):
chunks = []
chunk = []
for i,v in enumerate(l[:-1]):
v2 = l[i+1]
if v2-v == stride:
if not chunk:
chunk.append(v)
chunk.append(v2)
else:
if not chunk:
chunk.append(v)
chunks.append(chunk)
chunk = []
if chunk:
chunks.append(chunk)
return chunks

Which takes a list of numerical values "l" and splits it into chunks
where each chunk is sequential, where sequential means each value in a
chunk is
separated from the next by "stride".

So sequentialChunks([1,2,3,5,6,8,12]) returns:

[[1,2,3],[5,6],[8],[12]]

I don't think the code above is the most efficient way to do this, but
it is relatively clear. I tried fiddling with list-comprehension ways of
accomplishing it, but kept losing track of things...so if anyone has a
suggestion, I'd appreciate it.

tweaking the example from the docs to take the stride into account:

def stride(length):
i = 0
while True:
yield i
i += length

def group(data,d=1):
for k, g in groupby(zip(stride(d),data), lambda (i,x):i-x):
yield map(operator.itemgetter(1), g)

data = [1,2,3, 5, 6, 8, 12,13,14]

print list( group( data,2 ) )

hth

Gerard
 
P

Peter Otten

If you want something less easy to understand, you can try fixing the
following code that doesn't work:

from itertools import groupby
l = [1,2,3,5,6,8,12]
keyf = lambda (pos,x): x-l[pos]>1
[[el[1] for el in gr] for he,gr in groupby(enumerate(l[:-1]), keyf)]
from itertools import groupby
items = [1, 2, 3, 5, 6, 8, 12]
[[v for i, v in group] for key, group in groupby(enumerate(items),
lambda (i, v): i-v)]
[[1, 2, 3], [5, 6], [8], [12]]

which is almost identical to the last example in
http://docs.python.org/lib/itertools-example.html

Peter
 
B

bearophileHUGS

Peter Otten:
which is almost identical to the last example in
http://docs.python.org/lib/itertools-example.html

I see, thank you. I haven't had enoug time and brain to fix it (and the
OP question seemed like homework, so leaving some other things to do is
positive).

I think still that too much clever code can be bad, it isn't easy to
understand, fix and modify if you need something a little different.
To solve such problem in 'production code' I probably prefer a longer
function like the one I have shown.

Bye,
bearophile
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,769
Messages
2,569,579
Members
45,053
Latest member
BrodieSola

Latest Threads

Top