"Collapsing" a list into a list of changes

Jeremy Bowers · Feb 4, 2005

def straightforward_collapse(myList):
collapsed = [myList[0]]
for n in myList[1:]:
if n != collapsed[-1]:
collapsed.append(n)

return collapsed

Is there an elegant way to do this, or should I just stick with the code
above?

I think that's pretty elegant; I read it and immediately understood what
you were doing. There may be some performance tweaks you could make if you
were doing this to large lists, and my instincts say to re-write it as an
iterator if you use it a lot like:

for item in collapse(yourList):

but other than that which may not even apply, "straightforward" is
generally a *good* thing, don't you think?

Alan McIntyre · Feb 4, 2005

Hi all,

I have a list of items that has contiguous repetitions of values, but
the number and location of the repetitions is not important, so I just
need to strip them out. For example, if my original list is
[0,0,1,1,1,2,2,3,3,3,2,2,2,4,4,4,5], I want to end up with [0,1,2,3,2,4,5].

Here is the way I'm doing this now:

def straightforward_collapse(myList):
collapsed = [myList[0]]
for n in myList[1:]:
if n != collapsed[-1]:
collapsed.append(n)

return collapsed

Is there an elegant way to do this, or should I just stick with the code
above?

Thanks,
Alan McIntyre
http://www.esrgtech.com

Steven Bethard · Feb 4, 2005

Alan said:
Hi all,

I have a list of items that has contiguous repetitions of values, but
the number and location of the repetitions is not important, so I just
need to strip them out. For example, if my original list is
[0,0,1,1,1,2,2,3,3,3,2,2,2,4,4,4,5], I want to end up with [0,1,2,3,2,4,5].

Here is the way I'm doing this now:

def straightforward_collapse(myList):
collapsed = [myList[0]]
for n in myList[1:]:
if n != collapsed[-1]:
collapsed.append(n)

return collapsed

Is there an elegant way to do this, or should I just stick with the code
above?

Well, this does about the same thing, but using enumerate and a list
comprehension:

py> lst = [0,0,1,1,1,2,2,3,3,3,2,2,2,4,4,4,5]
py> [item for i, item in enumerate(lst) if i == 0 or item != lst[i-1]]
[0, 1, 2, 3, 2, 4, 5]

Similar code that doesn't check 'if i == 0' each time through:

py> itr = enumerate(lst)
py> itr.next()
(0, 0)
py> [lst[0]] + [item for i, item in itr if item != lst[i-1]]
[0, 1, 2, 3, 2, 4, 5]

I don't know if either of these is really more elegant though...

Steve

Jack Diederich · Feb 4, 2005

Hi all,

I have a list of items that has contiguous repetitions of values, but
the number and location of the repetitions is not important, so I just
need to strip them out. For example, if my original list is
[0,0,1,1,1,2,2,3,3,3,2,2,2,4,4,4,5], I want to end up with [0,1,2,3,2,4,5].

Here is the way I'm doing this now:

def straightforward_collapse(myList):
collapsed = [myList[0]]
for n in myList[1:]:
if n != collapsed[-1]:
collapsed.append(n)

return collapsed

Is there an elegant way to do this, or should I just stick with the code
above?

If you are using python2.4,

import itertools as it
[x[0] for (x) in it.groupby([0,0,1,1,1,2,2,3,3,3,2,2,2,4,4,4,5])] [0, 1, 2, 3, 2, 4, 5]

Click to expand...

Click to expand...

Since this is 2.4 you could also return a generator expression.
.... return (x[0] for (x) in it.groupby([0,0,1,1,1,2,2,3,3,3,2,2,2,4,4,4,5]))
....

i = iter_collapse([0,0,1,1,1,2,2,3,3,3,2,2,2,4,4,4,5])
i

Click to expand...

list(i) [0, 1, 2, 3, 2, 4, 5]

Click to expand...

Click to expand...

-Jack

Alan McIntyre · Feb 4, 2005

Your first example is along the lines of what I was thinking when I said
"elegant."

I was looking for something that I could drop into one
or two lines of code; I may not do that if I'm writing code that will
have to be maintained, but it's still nice to know how to do it.

Thanks

Alan

Steven Bethard · Feb 4, 2005

Alan said:
Hi all,

I have a list of items that has contiguous repetitions of values, but
the number and location of the repetitions is not important, so I just
need to strip them out. For example, if my original list is
[0,0,1,1,1,2,2,3,3,3,2,2,2,4,4,4,5], I want to end up with [0,1,2,3,2,4,5].

Here is the way I'm doing this now:

def straightforward_collapse(myList):
collapsed = [myList[0]]
for n in myList[1:]:
if n != collapsed[-1]:
collapsed.append(n)

return collapsed

Is there an elegant way to do this, or should I just stick with the code
above?

Here's a solution that works for iterables other than lists:

py> def collapse(iterable):
.... enumeration = enumerate(iterable)
.... _, lastitem = enumeration.next()
.... yield lastitem
.... for i, item in enumeration:
.... if item != lastitem:
.... yield item
.... lastitem = item
....
py> lst = [0,0,1,1,1,2,2,3,3,3,2,2,2,4,4,4,5]
py> list(collapse(lst))
[0, 1, 2, 3, 2, 4, 5]

Again, I'm still not sure I'd call this more elegant...

STeVe

Alan McIntyre · Feb 4, 2005

I think you're right; sometimes I'm susceptible to the "I can do that in
one line of code" temptation.

Since this current bit of code will probably end up in something that's
going to be maintained, I will probably stick with the straightforward
method just to be nice to the maintainer (especially if it's me!).

Alan McIntyre · Feb 4, 2005

Jack,

I'm not using 2.4 yet; still back in 2.3x.

Thanks for the examples,
though - they are clear enough that I will probably use them when I upgrade.

Thanks,
Alan

Jack said:
If you are using python2.4,

import itertools as it
[x[0] for (x) in it.groupby([0,0,1,1,1,2,2,3,3,3,2,2,2,4,4,4,5])]

Click to expand...

Click to expand...

[0, 1, 2, 3, 2, 4, 5]

Since this is 2.4 you could also return a generator expression.

... return (x[0] for (x) in it.groupby([0,0,1,1,1,2,2,3,3,3,2,2,2,4,4,4,5]))
...

i = iter_collapse([0,0,1,1,1,2,2,3,3,3,2,2,2,4,4,4,5])
i

Click to expand...

list(i)

Click to expand...

Click to expand...

[0, 1, 2, 3, 2, 4, 5]

-Jack

Mike C. Fletcher · Feb 4, 2005

Alan McIntyre wrote:
....

I have a list of items that has contiguous repetitions of values, but
the number and location of the repetitions is not important, so I just
need to strip them out. For example, if my original list is
[0,0,1,1,1,2,2,3,3,3,2,2,2,4,4,4,5], I want to end up with
[0,1,2,3,2,4,5].
....

Is there an elegant way to do this, or should I just stick with the
code above?

.... last = None
.... for value in dataset:
.... if value != last:
.... yield value
.... last = value
....
which is quite readable/elegant IMO.

Have fun,
Mike

________________________________________________
Mike C. Fletcher
Designer, VR Plumber, Coder
http://www.vrplumber.com
http://blog.vrplumber.com
PyCon is coming...

Steven Bethard · Feb 4, 2005

Mike said:
Alan McIntyre wrote:
...

I have a list of items that has contiguous repetitions of values, but
the number and location of the repetitions is not important, so I just
need to strip them out. For example, if my original list is
[0,0,1,1,1,2,2,3,3,3,2,2,2,4,4,4,5], I want to end up with
[0,1,2,3,2,4,5].

...

Is there an elegant way to do this, or should I just stick with the
code above?

def changes( dataset ):

Click to expand...

Click to expand...

... last = None
... for value in dataset:
... if value != last:
... yield value
... last = value
... >>> print list(changes(data ))

which is quite readable/elegant IMO.

But fails if the list starts with None:

py> lst = [None,0,0,1,1,1,2,2,3,3,3,2,2,2,4,4,4,5]
py> def changes(dataset):
.... last = None
.... for value in dataset:
.... if value != last:
.... yield value
.... last = value
....
py> list(changes(lst))
[0, 1, 2, 3, 2, 4, 5]

A minor modification that does appear to work:

py> def changes(dataset):
.... last = object()
.... for value in dataset:
.... if value != last:
.... yield value
.... last = value
....
py> list(changes(lst))
[None, 0, 1, 2, 3, 2, 4, 5]

STeVe

John J. Lee · Feb 4, 2005

Steven Bethard said:
Mike C. Fletcher wrote: [...]

... last = None
... for value in dataset:
... if value != last:
... yield value
... last = value
... >>> print list(changes(data ))
which is quite readable/elegant IMO.

Click to expand...

But fails if the list starts with None:

py> lst = [None,0,0,1,1,1,2,2,3,3,3,2,2,2,4,4,4,5]
py> def changes(dataset):
... last = None
... for value in dataset:
... if value != last:
... yield value
... last = value
...
py> list(changes(lst))
[0, 1, 2, 3, 2, 4, 5]

A minor modification that does appear to work:

py> def changes(dataset):
... last = object()
... for value in dataset:
... if value != last:
... yield value
... last = value
...
py> list(changes(lst))
[None, 0, 1, 2, 3, 2, 4, 5]

Unless the first object in the list has a weird __cmp__ (does
happen...). OK, weird __cmp__s are nasty anyway, but still, why
compound it through cleverness when you can write a really plodding
function that *always* does what it says on the tin?

clever-is-evil-ly y'rs,

John

Nick Coghlan · Feb 4, 2005

John said:
Mike C. Fletcher wrote:
[...]

def changes( dataset ):
... last = None
... for value in dataset:
... if value != last:
... yield value
... last = value
... >>> print list(changes(data ))
which is quite readable/elegant IMO.

Click to expand...

But fails if the list starts with None:

py> lst = [None,0,0,1,1,1,2,2,3,3,3,2,2,2,4,4,4,5]
py> def changes(dataset):
... last = None
... for value in dataset:
... if value != last:
... yield value
... last = value
...
py> list(changes(lst))
[0, 1, 2, 3, 2, 4, 5]

A minor modification that does appear to work:

py> def changes(dataset):
... last = object()
... for value in dataset:
... if value != last:
... yield value
... last = value
...
py> list(changes(lst))
[None, 0, 1, 2, 3, 2, 4, 5]

Click to expand...

Unless the first object in the list has a weird __cmp__ (does
happen...). OK, weird __cmp__s are nasty anyway, but still, why
compound it through cleverness when you can write a really plodding
function that *always* does what it says on the tin?

In that case:

Py> def collapsed(iterable):
.... itr = iter(iterable)
.... last = itr.next()
.... yield last
.... for value in itr:
.... if value != last:
.... yield value
.... last = value
....
Py> from Tkinter import _flatten #**
Py> lst = list(_flatten([ * 5 for i in range(10)]))
Py> lst
[0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 5, 5
, 5, 5, 5, 6, 6, 6, 6, 6, 7, 7, 7, 7, 7, 8, 8, 8, 8, 8, 9, 9, 9, 9, 9]
Py> list(collapsed(lst))
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

** Hmm, this function really is quite handy when generating test data. Maybe in
Python 2.5 the line should be spelt "from itertools import flatten"

Cheers,
Nick.

Nick Coghlan · Feb 4, 2005

Jack said:
Since this is 2.4 you could also return a generator expression.

... return (x[0] for (x) in it.groupby([0,0,1,1,1,2,2,3,3,3,2,2,2,4,4,4,5]))
...

But why write a function that returns a generator expression, when you could
just turn the function itself into a generator?

Py>def iter_collapse(myList):
.... for x in it.groupby(myList):
.... yield x[0]

Cheers,
Nick.

Alex Martelli · Feb 5, 2005

Steven Bethard said:
Here's a solution that works for iterables other than lists:

py> def collapse(iterable):
... enumeration = enumerate(iterable)
... _, lastitem = enumeration.next()
... yield lastitem
... for i, item in enumeration:
... if item != lastitem:
... yield item
... lastitem = item
...
py> lst = [0,0,1,1,1,2,2,3,3,3,2,2,2,4,4,4,5]
py> list(collapse(lst))
[0, 1, 2, 3, 2, 4, 5]

Again, I'm still not sure I'd call this more elegant...

Hmmmm, what role does the enumeration play here? I don't see how you're
using it, at all. Why not just:

def collapse(iterable):
it = iter(iterable)
lastitem = it.next()
yield lastitem
for item in it:
if item != lastitem:
yield item
lastitem = item

that's basically just the same as your code but without the strangeness
of making an enumerate for the purpose of ignoring what the enumerate
adds to an ordinary iterator.

Alex

Tony · Feb 5, 2005

Alan said:
Hi all,

I have a list of items that has contiguous repetitions of values, but

the number and location of the repetitions is not important, so I just
need to strip them out. For example, if my original list is
[0,0,1,1,1,2,2,3,3,3,2,2,2,4,4,4,5], I want to end up with [0,1,2,3,2,4,5].

Here is the way I'm doing this now:

def straightforward_collapse(myList):
collapsed = [myList[0]]
for n in myList[1:]:
if n != collapsed[-1]:
collapsed.append(n)

return collapsed

Is there an elegant way to do this, or should I just stick with the code
above?

Thanks,
Alan McIntyre
http://www.esrgtech.com

Here is a version using dictionary properties, ie no duplication of
keys.

def condense(l):
d={}
for item in l:
d[item]=1
l=d.keys()
return l

Cheers
Tony

Alan McIntyre · Feb 5, 2005

Tony,

Actually I only want to remove a certain kind of duplication; if an item
occurs twice - say like this: [1,1,1,2,2,2,1,1,1], then I need to keep
the order and occurrence of the individual values: [1,2,1]. Using a
dict as you proposed loses the order of occurrence, as well as multiple
occurrences of groups of the same item.

If I didn't need those two qualities of the list to be preserved,
though, I think I'd use something like your solution (if I was using a
Python older than 2.3) or Steve Coats' solution posted above using Set.

Thanks!
Alan

Alan McIntyre · Feb 5, 2005

Alex,

Wow, that method turns out to be the fastest so far in a simple
benchmark on Python2.3 (on my machine, of course, YMMV); it takes 14%
less time than the one that I deemed most straightforward.

Thanks,
Alan

Jack Diederich · Feb 5, 2005

Jack said:
Jack said:

Since this is 2.4 you could also return a generator expression.

def iter_collapse(myList):

Click to expand...

... return (x[0] for (x) in
it.groupby([0,0,1,1,1,2,2,3,3,3,2,2,2,4,4,4,5]))
...

Click to expand...

But why write a function that returns a generator expression, when you
could just turn the function itself into a generator?

Py>def iter_collapse(myList):
... for x in it.groupby(myList):
... yield x[0]

Here is where I say, "because it is faster!"
except it isn't. maybe. The results are unexpected, anyway.

import timeit
import itertools as it

def collapse1(myl):
for x in it.groupby(myl):
yield x[0]

def collapse2(myl):
return (x[0] for (x) in it.groupby(myl))

list_str = '[0,0,1,1,1,2,2,3,3,3,2,2,2,4,4,4,5]'

print "collapse1", timeit.Timer('collapse1(%s)' % (list_str), 'from __main__ import collapse1').timeit()
print "collapse2", timeit.Timer('collapse2(%s)' % (list_str), 'from __main__ import collapse2').timeit()
print "list(collapse1)", timeit.Timer('list(collapse1(%s))' % (list_str), 'from __main__ import collapse1').timeit()
print "list(collapse2)", timeit.Timer('list(collapse2(%s))' % (list_str), 'from __main__ import collapse2').timeit()

collapse1 1.06855082512
collapse2 3.40627384186
list(collapse1) 8.31489896774
list(collapse2) 9.49903011322

The overhead of creating the generator expression seems much higher than
creating the equivalent function. If you subtract our the setup difference
actually running through the whole iterator is slightly faster for genexps
than functions that yield. At a guess it has something to do with how
they handle lexical scope and some byte code differences.

I said guess, right?

-Jack

Tony · Feb 5, 2005

Alan said:
Tony,

Actually I only want to remove a certain kind of duplication; <snip>

How about this one liner?
def condense(m):
print [m[0]]+[m[k] for k in range(1,len(m)) if
m[k]!=m[k-1]]
b=[1,1,1,2,2,2,1,1,1]
condense(b)
Tony Clarke

Steven Bethard · Feb 5, 2005

Alex said:
Here's a solution that works for iterables other than lists:

py> def collapse(iterable):
... enumeration = enumerate(iterable)
... _, lastitem = enumeration.next()
... yield lastitem
... for i, item in enumeration:
... if item != lastitem:
... yield item
... lastitem = item
...
py> lst = [0,0,1,1,1,2,2,3,3,3,2,2,2,4,4,4,5]
py> list(collapse(lst))
[0, 1, 2, 3, 2, 4, 5]

Again, I'm still not sure I'd call this more elegant...

Click to expand...

Hmmmm, what role does the enumeration play here?

None =) Sorry, it was an accidental carry-over from the list solution
that looked at lst[i-1]. I thought about correcting it when I realized
this (after posting), but I was too lazy.

Steve

Problem with Collapsing List Script	0	Nov 20, 2007
is list comprehension necessary?	15	Oct 26, 2010
groupby - summing multiple columns in a list of lists	1	May 17, 2011
make sublists of a list broken at nth certain list items	2	Jul 8, 2013
How to try a range of hex values in C# code ?	0	Nov 19, 2022
Engineering a list container. Part 1.	71	Dec 7, 2013
can't generate iterator from list	3	Sep 9, 2011
Even basic math is at risk? Why is a simple math and logic solution being ignored?	2	Jul 3, 2025

"Collapsing" a list into a list of changes

Jeremy Bowers

Alan McIntyre

Steven Bethard

Jack Diederich

Alan McIntyre

Steven Bethard

Alan McIntyre

Alan McIntyre

Mike C. Fletcher

Steven Bethard

John J. Lee

Nick Coghlan

Nick Coghlan

Alex Martelli

Tony

Alan McIntyre

Alan McIntyre

Jack Diederich

Tony

Steven Bethard

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads