"Collapsing" a list into a list of changes

J

Jeremy Bowers

def straightforward_collapse(myList):
collapsed = [myList[0]]
for n in myList[1:]:
if n != collapsed[-1]:
collapsed.append(n)

return collapsed

Is there an elegant way to do this, or should I just stick with the code
above?

I think that's pretty elegant; I read it and immediately understood what
you were doing. There may be some performance tweaks you could make if you
were doing this to large lists, and my instincts say to re-write it as an
iterator if you use it a lot like:

for item in collapse(yourList):

but other than that which may not even apply, "straightforward" is
generally a *good* thing, don't you think? :)
 
A

Alan McIntyre

Hi all,

I have a list of items that has contiguous repetitions of values, but
the number and location of the repetitions is not important, so I just
need to strip them out. For example, if my original list is
[0,0,1,1,1,2,2,3,3,3,2,2,2,4,4,4,5], I want to end up with [0,1,2,3,2,4,5].

Here is the way I'm doing this now:

def straightforward_collapse(myList):
collapsed = [myList[0]]
for n in myList[1:]:
if n != collapsed[-1]:
collapsed.append(n)

return collapsed

Is there an elegant way to do this, or should I just stick with the code
above?

Thanks,
Alan McIntyre
http://www.esrgtech.com
 
S

Steven Bethard

Alan said:
Hi all,

I have a list of items that has contiguous repetitions of values, but
the number and location of the repetitions is not important, so I just
need to strip them out. For example, if my original list is
[0,0,1,1,1,2,2,3,3,3,2,2,2,4,4,4,5], I want to end up with [0,1,2,3,2,4,5].

Here is the way I'm doing this now:

def straightforward_collapse(myList):
collapsed = [myList[0]]
for n in myList[1:]:
if n != collapsed[-1]:
collapsed.append(n)

return collapsed

Is there an elegant way to do this, or should I just stick with the code
above?

Well, this does about the same thing, but using enumerate and a list
comprehension:

py> lst = [0,0,1,1,1,2,2,3,3,3,2,2,2,4,4,4,5]
py> [item for i, item in enumerate(lst) if i == 0 or item != lst[i-1]]
[0, 1, 2, 3, 2, 4, 5]

Similar code that doesn't check 'if i == 0' each time through:

py> itr = enumerate(lst)
py> itr.next()
(0, 0)
py> [lst[0]] + [item for i, item in itr if item != lst[i-1]]
[0, 1, 2, 3, 2, 4, 5]

I don't know if either of these is really more elegant though...

Steve
 
J

Jack Diederich

Hi all,

I have a list of items that has contiguous repetitions of values, but
the number and location of the repetitions is not important, so I just
need to strip them out. For example, if my original list is
[0,0,1,1,1,2,2,3,3,3,2,2,2,4,4,4,5], I want to end up with [0,1,2,3,2,4,5].

Here is the way I'm doing this now:

def straightforward_collapse(myList):
collapsed = [myList[0]]
for n in myList[1:]:
if n != collapsed[-1]:
collapsed.append(n)

return collapsed

Is there an elegant way to do this, or should I just stick with the code
above?
If you are using python2.4,
import itertools as it
[x[0] for (x) in it.groupby([0,0,1,1,1,2,2,3,3,3,2,2,2,4,4,4,5])] [0, 1, 2, 3, 2, 4, 5]

Since this is 2.4 you could also return a generator expression.
.... return (x[0] for (x) in it.groupby([0,0,1,1,1,2,2,3,3,3,2,2,2,4,4,4,5]))
....
i = iter_collapse([0,0,1,1,1,2,2,3,3,3,2,2,2,4,4,4,5])
i
list(i) [0, 1, 2, 3, 2, 4, 5]


-Jack
 
A

Alan McIntyre

Your first example is along the lines of what I was thinking when I said
"elegant." :) I was looking for something that I could drop into one
or two lines of code; I may not do that if I'm writing code that will
have to be maintained, but it's still nice to know how to do it.

Thanks :)
Alan
 
S

Steven Bethard

Alan said:
Hi all,

I have a list of items that has contiguous repetitions of values, but
the number and location of the repetitions is not important, so I just
need to strip them out. For example, if my original list is
[0,0,1,1,1,2,2,3,3,3,2,2,2,4,4,4,5], I want to end up with [0,1,2,3,2,4,5].

Here is the way I'm doing this now:

def straightforward_collapse(myList):
collapsed = [myList[0]]
for n in myList[1:]:
if n != collapsed[-1]:
collapsed.append(n)

return collapsed

Is there an elegant way to do this, or should I just stick with the code
above?

Here's a solution that works for iterables other than lists:

py> def collapse(iterable):
.... enumeration = enumerate(iterable)
.... _, lastitem = enumeration.next()
.... yield lastitem
.... for i, item in enumeration:
.... if item != lastitem:
.... yield item
.... lastitem = item
....
py> lst = [0,0,1,1,1,2,2,3,3,3,2,2,2,4,4,4,5]
py> list(collapse(lst))
[0, 1, 2, 3, 2, 4, 5]

Again, I'm still not sure I'd call this more elegant...

STeVe
 
A

Alan McIntyre

I think you're right; sometimes I'm susceptible to the "I can do that in
one line of code" temptation. :)

Since this current bit of code will probably end up in something that's
going to be maintained, I will probably stick with the straightforward
method just to be nice to the maintainer (especially if it's me!).
 
A

Alan McIntyre

Jack,

I'm not using 2.4 yet; still back in 2.3x. :) Thanks for the examples,
though - they are clear enough that I will probably use them when I upgrade.

Thanks,
Alan

Jack said:
If you are using python2.4,

import itertools as it
[x[0] for (x) in it.groupby([0,0,1,1,1,2,2,3,3,3,2,2,2,4,4,4,5])]

[0, 1, 2, 3, 2, 4, 5]


Since this is 2.4 you could also return a generator expression.


... return (x[0] for (x) in it.groupby([0,0,1,1,1,2,2,3,3,3,2,2,2,4,4,4,5]))
...
i = iter_collapse([0,0,1,1,1,2,2,3,3,3,2,2,2,4,4,4,5])
i

[0, 1, 2, 3, 2, 4, 5]



-Jack
 
M

Mike C. Fletcher

Alan McIntyre wrote:
....
I have a list of items that has contiguous repetitions of values, but
the number and location of the repetitions is not important, so I just
need to strip them out. For example, if my original list is
[0,0,1,1,1,2,2,3,3,3,2,2,2,4,4,4,5], I want to end up with
[0,1,2,3,2,4,5].
....

Is there an elegant way to do this, or should I just stick with the
code above?
.... last = None
.... for value in dataset:
.... if value != last:
.... yield value
.... last = value
....
which is quite readable/elegant IMO.

Have fun,
Mike

________________________________________________
Mike C. Fletcher
Designer, VR Plumber, Coder
http://www.vrplumber.com
http://blog.vrplumber.com
PyCon is coming...
 
S

Steven Bethard

Mike said:
Alan McIntyre wrote:
...
I have a list of items that has contiguous repetitions of values, but
the number and location of the repetitions is not important, so I just
need to strip them out. For example, if my original list is
[0,0,1,1,1,2,2,3,3,3,2,2,2,4,4,4,5], I want to end up with
[0,1,2,3,2,4,5].

...

Is there an elegant way to do this, or should I just stick with the
code above?
def changes( dataset ):
... last = None
... for value in dataset:
... if value != last:
... yield value
... last = value
... >>> print list(changes(data ))

which is quite readable/elegant IMO.

But fails if the list starts with None:

py> lst = [None,0,0,1,1,1,2,2,3,3,3,2,2,2,4,4,4,5]
py> def changes(dataset):
.... last = None
.... for value in dataset:
.... if value != last:
.... yield value
.... last = value
....
py> list(changes(lst))
[0, 1, 2, 3, 2, 4, 5]

A minor modification that does appear to work:

py> def changes(dataset):
.... last = object()
.... for value in dataset:
.... if value != last:
.... yield value
.... last = value
....
py> list(changes(lst))
[None, 0, 1, 2, 3, 2, 4, 5]

STeVe
 
J

John J. Lee

Steven Bethard said:
Mike C. Fletcher wrote: [...]
... last = None
... for value in dataset:
... if value != last:
... yield value
... last = value
... >>> print list(changes(data ))
which is quite readable/elegant IMO.

But fails if the list starts with None:

py> lst = [None,0,0,1,1,1,2,2,3,3,3,2,2,2,4,4,4,5]
py> def changes(dataset):
... last = None
... for value in dataset:
... if value != last:
... yield value
... last = value
...
py> list(changes(lst))
[0, 1, 2, 3, 2, 4, 5]

A minor modification that does appear to work:

py> def changes(dataset):
... last = object()
... for value in dataset:
... if value != last:
... yield value
... last = value
...
py> list(changes(lst))
[None, 0, 1, 2, 3, 2, 4, 5]

Unless the first object in the list has a weird __cmp__ (does
happen...). OK, weird __cmp__s are nasty anyway, but still, why
compound it through cleverness when you can write a really plodding
function that *always* does what it says on the tin?

clever-is-evil-ly y'rs,


John
 
N

Nick Coghlan

John said:
Mike C. Fletcher wrote:
[...]
def changes( dataset ):
... last = None
... for value in dataset:
... if value != last:
... yield value
... last = value
... >>> print list(changes(data ))
which is quite readable/elegant IMO.

But fails if the list starts with None:

py> lst = [None,0,0,1,1,1,2,2,3,3,3,2,2,2,4,4,4,5]
py> def changes(dataset):
... last = None
... for value in dataset:
... if value != last:
... yield value
... last = value
...
py> list(changes(lst))
[0, 1, 2, 3, 2, 4, 5]

A minor modification that does appear to work:

py> def changes(dataset):
... last = object()
... for value in dataset:
... if value != last:
... yield value
... last = value
...
py> list(changes(lst))
[None, 0, 1, 2, 3, 2, 4, 5]


Unless the first object in the list has a weird __cmp__ (does
happen...). OK, weird __cmp__s are nasty anyway, but still, why
compound it through cleverness when you can write a really plodding
function that *always* does what it says on the tin?

In that case:

Py> def collapsed(iterable):
.... itr = iter(iterable)
.... last = itr.next()
.... yield last
.... for value in itr:
.... if value != last:
.... yield value
.... last = value
....
Py> from Tkinter import _flatten #**
Py> lst = list(_flatten([ * 5 for i in range(10)]))
Py> lst
[0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 5, 5
, 5, 5, 5, 6, 6, 6, 6, 6, 7, 7, 7, 7, 7, 8, 8, 8, 8, 8, 9, 9, 9, 9, 9]
Py> list(collapsed(lst))
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

** Hmm, this function really is quite handy when generating test data. Maybe in
Python 2.5 the line should be spelt "from itertools import flatten"

Cheers,
Nick.
 
N

Nick Coghlan

Jack said:
Since this is 2.4 you could also return a generator expression.


... return (x[0] for (x) in it.groupby([0,0,1,1,1,2,2,3,3,3,2,2,2,4,4,4,5]))
...

But why write a function that returns a generator expression, when you could
just turn the function itself into a generator?

Py>def iter_collapse(myList):
.... for x in it.groupby(myList):
.... yield x[0]

Cheers,
Nick.
 
A

Alex Martelli

Steven Bethard said:
Here's a solution that works for iterables other than lists:

py> def collapse(iterable):
... enumeration = enumerate(iterable)
... _, lastitem = enumeration.next()
... yield lastitem
... for i, item in enumeration:
... if item != lastitem:
... yield item
... lastitem = item
...
py> lst = [0,0,1,1,1,2,2,3,3,3,2,2,2,4,4,4,5]
py> list(collapse(lst))
[0, 1, 2, 3, 2, 4, 5]

Again, I'm still not sure I'd call this more elegant...

Hmmmm, what role does the enumeration play here? I don't see how you're
using it, at all. Why not just:

def collapse(iterable):
it = iter(iterable)
lastitem = it.next()
yield lastitem
for item in it:
if item != lastitem:
yield item
lastitem = item

that's basically just the same as your code but without the strangeness
of making an enumerate for the purpose of ignoring what the enumerate
adds to an ordinary iterator.


Alex
 
T

Tony

Alan said:
Hi all,

I have a list of items that has contiguous repetitions of values, but
the number and location of the repetitions is not important, so I just
need to strip them out. For example, if my original list is
[0,0,1,1,1,2,2,3,3,3,2,2,2,4,4,4,5], I want to end up with [0,1,2,3,2,4,5].

Here is the way I'm doing this now:

def straightforward_collapse(myList):
collapsed = [myList[0]]
for n in myList[1:]:
if n != collapsed[-1]:
collapsed.append(n)

return collapsed

Is there an elegant way to do this, or should I just stick with the code
above?

Thanks,
Alan McIntyre
http://www.esrgtech.com

Here is a version using dictionary properties, ie no duplication of
keys.

def condense(l):
d={}
for item in l:
d[item]=1
l=d.keys()
return l

Cheers
Tony
 
A

Alan McIntyre

Tony,

Actually I only want to remove a certain kind of duplication; if an item
occurs twice - say like this: [1,1,1,2,2,2,1,1,1], then I need to keep
the order and occurrence of the individual values: [1,2,1]. Using a
dict as you proposed loses the order of occurrence, as well as multiple
occurrences of groups of the same item.

If I didn't need those two qualities of the list to be preserved,
though, I think I'd use something like your solution (if I was using a
Python older than 2.3) or Steve Coats' solution posted above using Set.

Thanks!
Alan
 
A

Alan McIntyre

Alex,

Wow, that method turns out to be the fastest so far in a simple
benchmark on Python2.3 (on my machine, of course, YMMV); it takes 14%
less time than the one that I deemed most straightforward. :)

Thanks,
Alan
 
J

Jack Diederich

Jack said:
Since this is 2.4 you could also return a generator expression.

def iter_collapse(myList):

... return (x[0] for (x) in
it.groupby([0,0,1,1,1,2,2,3,3,3,2,2,2,4,4,4,5]))
...

But why write a function that returns a generator expression, when you
could just turn the function itself into a generator?

Py>def iter_collapse(myList):
... for x in it.groupby(myList):
... yield x[0]
Here is where I say, "because it is faster!"
except it isn't. maybe. The results are unexpected, anyway.

import timeit
import itertools as it

def collapse1(myl):
for x in it.groupby(myl):
yield x[0]

def collapse2(myl):
return (x[0] for (x) in it.groupby(myl))

list_str = '[0,0,1,1,1,2,2,3,3,3,2,2,2,4,4,4,5]'

print "collapse1", timeit.Timer('collapse1(%s)' % (list_str), 'from __main__ import collapse1').timeit()
print "collapse2", timeit.Timer('collapse2(%s)' % (list_str), 'from __main__ import collapse2').timeit()
print "list(collapse1)", timeit.Timer('list(collapse1(%s))' % (list_str), 'from __main__ import collapse1').timeit()
print "list(collapse2)", timeit.Timer('list(collapse2(%s))' % (list_str), 'from __main__ import collapse2').timeit()

collapse1 1.06855082512
collapse2 3.40627384186
list(collapse1) 8.31489896774
list(collapse2) 9.49903011322

The overhead of creating the generator expression seems much higher than
creating the equivalent function. If you subtract our the setup difference
actually running through the whole iterator is slightly faster for genexps
than functions that yield. At a guess it has something to do with how
they handle lexical scope and some byte code differences.

I said guess, right?

-Jack
 
T

Tony

Alan said:
Tony,

Actually I only want to remove a certain kind of duplication; <snip>

How about this one liner?
def condense(m):
print [m[0]]+[m[k] for k in range(1,len(m)) if
m[k]!=m[k-1]]
b=[1,1,1,2,2,2,1,1,1]
condense(b)
Tony Clarke
 
S

Steven Bethard

Alex said:
Here's a solution that works for iterables other than lists:

py> def collapse(iterable):
... enumeration = enumerate(iterable)
... _, lastitem = enumeration.next()
... yield lastitem
... for i, item in enumeration:
... if item != lastitem:
... yield item
... lastitem = item
...
py> lst = [0,0,1,1,1,2,2,3,3,3,2,2,2,4,4,4,5]
py> list(collapse(lst))
[0, 1, 2, 3, 2, 4, 5]

Again, I'm still not sure I'd call this more elegant...


Hmmmm, what role does the enumeration play here?

None =) Sorry, it was an accidental carry-over from the list solution
that looked at lst[i-1]. I thought about correcting it when I realized
this (after posting), but I was too lazy. ;)

Steve
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
474,434
Messages
2,571,691
Members
48,796
Latest member
Greg L.

Latest Threads

Top