"Collapsing" a list into a list of changes


Alan McIntyre

Thanks to everybody that responded; I appreciate all the input, even if
I didn't respond to all of it individually. :)
 

Francis Girard

This is a perfect use case for the good old "reduce" function:

--BEGIN SNAP

a_lst = [None,0,0,1,1,1,2,2,3,3,3,2,2,2,4,4,4,5]

def straightforward_collapse(lst):
    return reduce(lambda v,e: v[-1]!=e and v+[e] or v, lst[1:], [lst[0]])

def straightforward_collapse_secu(lst):
    return lst and reduce(lambda v,e: v[-1]!=e and v+[e] or v, lst[1:],
                          [lst[0]]) or []

print straightforward_collapse(a_lst)
print straightforward_collapse_secu([])

--END SNAP
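For comparison, the same collapse can be sketched with itertools.groupby, which is new in Python 2.4; this is an illustrative aside rather than part of Francis's post, and collapse_groupby is just a made-up name:

from itertools import groupby

def collapse_groupby(lst):
    # groupby yields one (key, run) pair per run of equal values,
    # so keeping only the keys collapses the repetitions
    return [key for key, run in groupby(lst)]

print collapse_groupby([0,0,1,1,1,2,2,3,3,3,2,2,2,4,4,4,5])  # [0, 1, 2, 3, 2, 4, 5]
print collapse_groupby([])                                    # []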

Regards

Francis Girard

On Friday 4 February 2005 at 20:08, Steven Bethard wrote:
Mike said:
Alan McIntyre wrote:
...
I have a list of items that has contiguous repetitions of values, but
the number and location of the repetitions is not important, so I just
need to strip them out. For example, if my original list is
[0,0,1,1,1,2,2,3,3,3,2,2,2,4,4,4,5], I want to end up with
[0,1,2,3,2,4,5].
...

Is there an elegant way to do this, or should I just stick with the
code above?

>>> def changes(dataset):
...     last = None
...     for value in dataset:
...         if value != last:
...             yield value
...         last = value
...
>>> print list(changes(data))

which is quite readable/elegant IMO.

But fails if the list starts with None:

py> lst = [None,0,0,1,1,1,2,2,3,3,3,2,2,2,4,4,4,5]
py> def changes(dataset):
...     last = None
...     for value in dataset:
...         if value != last:
...             yield value
...         last = value
...
py> list(changes(lst))
[0, 1, 2, 3, 2, 4, 5]

A minor modification that does appear to work:

py> def changes(dataset):
...     last = object()
...     for value in dataset:
...         if value != last:
...             yield value
...         last = value
...
py> list(changes(lst))
[None, 0, 1, 2, 3, 2, 4, 5]

STeVe
 

Peter Otten

Francis said:
This is a perfect use case for the good old "reduce" function:

--BEGIN SNAP

a_lst = [None,0,0,1,1,1,2,2,3,3,3,2,2,2,4,4,4,5]

def straightforward_collapse(lst):
    return reduce(lambda v,e: v[-1]!=e and v+[e] or v, lst[1:], [lst[0]])

reduce() magically increases the number of function calls by len(a_list).
There will be no good use cases for reduce() until Python can inline
functions.
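To make the overhead concrete, here is a rough sketch, not from the original posts, contrasting the two styles being compared; the function names are illustrative:

# reduce calls the lambda once per element of lst[1:], and v + [e] copies the
# accumulated list every time a new value is kept; the explicit loop below
# does neither.
def collapse_reduce(lst):
    return reduce(lambda v, e: v[-1] != e and v + [e] or v, lst[1:], lst[:1])

def collapse_loop(lst):
    collapsed = lst[:1]
    for e in lst[1:]:
        if e != collapsed[-1]:
            collapsed.append(e)
    return collapsed

# e.g. (hypothetical module name 'collapse'):
#   python -m timeit -s "from collapse import collapse_reduce as c" "c(range(1000))"
#   python -m timeit -s "from collapse import collapse_loop as c" "c(range(1000))"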

Apodictically Yours
Peter
 

Adam Przybyla

Alan McIntyre said:
Hi all,

I have a list of items that has contiguous repetitions of values, but
the number and location of the repetitions is not important, so I just
need to strip them out. For example, if my original list is
[0,0,1,1,1,2,2,3,3,3,2,2,2,4,4,4,5], I want to end up with [0,1,2,3,2,4,5].

Here is the way I'm doing this now:

def straightforward_collapse(myList):
    collapsed = [myList[0]]
    for n in myList[1:]:
        if n != collapsed[-1]:
            collapsed.append(n)

    return collapsed

Is there an elegant way to do this, or should I just stick with the code
above?
>>> p = [1,1,1,1,1,4,4,4,8,8,9]
>>> filter(lambda y: y>0, map(lambda x,y: x==y and -1 or x, [0]+p, p+[0]))
[1, 4, 8, 9]
With regards
Adam Przybyla
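A commented sketch of what that one-liner is doing (an editorial gloss, not from Adam's post):

# pair each value of [0]+p with the corresponding value of p+[0], i.e. each
# value with its successor; equal neighbours are marked with -1, and the
# filter drops the -1 markers and the 0 padding, keeping the last element of
# each run -- this relies on the data being positive integers, since 0 and -1
# serve as sentinels
p = [1, 1, 1, 1, 1, 4, 4, 4, 8, 8, 9]
marked = map(lambda x, y: x == y and -1 or x, [0] + p, p + [0])
print filter(lambda y: y > 0, marked)   # [1, 4, 8, 9]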
 

Francis Girard

Zut !

I'm very sorry that there is no good use case for the "reduce" function in
Python, like Peter Otten pretends. That's an otherwise very useful tool for
many use cases. At least on paper.

Python documentation should say "There is no good use case for the reduce
function in Python and we don't know why we bother you offering it."

Francis Girard

On Monday 7 February 2005 at 08:24, Peter Otten wrote:
Francis said:
This is a perfect use case for the good old "reduce" function:

--BEGIN SNAP

a_lst = [None,0,0,1,1,1,2,2,3,3,3,2,2,2,4,4,4,5]

def straightforward_collapse(lst):
    return reduce(lambda v,e: v[-1]!=e and v+[e] or v, lst[1:], [lst[0]])

reduce() magically increases the number of function calls by len(a_list).
There will be no good use cases for reduce() until Python can inline
functions.

Apodictically Yours
Peter
 

Peter Otten

Francis said:
Python documentation should say "There is no good use case for the reduce
function in Python and we don't know why we bother you offering it."

Submit a patch :)

Peter
 

Steven Bethard

Francis said:
I'm very sorry that there is no good use case for the "reduce" function in
Python, like Peter Otten pretends. That's an otherwise very useful tool for
many use cases. At least on paper.

Clarity aside[1], can you give an example of where reduce is as
efficient as the equivalent loop in Python?

Steve

[1] I generally find most reduce solutions much harder to read, but
we'll set aside my personal limitations for the moment. ;)
 

John Lenton

Zut !

I'm very sorry that there is no good use case for the "reduce" function in
Python, like Peter Otten pretends. That's an otherwise very useful tool for
many use cases. At least on paper.

Python documentation should say "There is no good use case for the reduce
function in Python and we don't know why we bother you offering it."

I am guessing you are joking, right? I think Peter exaggerates when he
says that "there will be no good use cases for reduce"; it is very
useful, in writing very compact code when it does exactly what you
want (and you have to go through hoops to do it otherwise). It also
can be the fastest way to do something. For example, the fastest way
to get the factorial of a (small enough) number in pure python is

factorial = lambda n: reduce(operator.mul, range(1, n+1))

--
John Lenton ([email protected]) -- Random fortune:
Laugh and the world thinks you're an idiot.

 

Francis Girard

No. I just wanted to naively use one of the basic tools Python has to offer. I
thought it was very usable and very readable. I might be wrong.

Is there someone on this list using this tool and happy with it? Or is my
mind too targeted on the FP paradigm, and do most of you really think that all
the functions that apply another function to each and every element of a
list are bad (like "reduce", "map", "filter")?

Francis Girard

On Monday 7 February 2005 at 19:25, Steven Bethard wrote:
Francis said:
I'm very sorry that there is no good use case for the "reduce" function
in Python, like Peter Otten pretends. That's an otherwise very useful
tool for many use cases. At least on paper.

Clarity aside[1], can you give an example of where reduce is as
efficient as the equivalent loop in Python?

Steve

[1] I generally find most reduce solutions much harder to read, but
we'll set aside my personal limitations for the moment. ;)
 

Steven Bethard

Francis said:
Is there someone on this list using this tool and happy with it? Or is my
mind too targeted on the FP paradigm, and do most of you really think that all
the functions that apply another function to each and every element of a
list are bad (like "reduce", "map", "filter")?

I know there are definitely proponents for map and filter, especially
for simple cases like:

map(int, lst)
filter(str.strip, lst)

Note that, unlike reduce, map and filter aren't really going to increase
the number of function calls. Consider the equivalent list comprehensions:

[int(x) for x in lst]
[x for x in lst if str.strip(x)] [1]

The list comprehensions also require the same number of function calls
in these cases. Of course, in cases where a function does not already
exist, map and filter will require more function calls. Compare:

map(lambda x: x**2 + 1, lst)

with

[x**2 + 1 for x in lst]

Where the LC allows you to essentially "inline" the function. (You can
dis.dis these to see the difference if you like.)
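A quick sketch of how one might make that comparison (an illustration, not from the post; f_map and f_comp are made-up names):

import dis

def f_map(lst):
    return map(lambda x: x**2 + 1, lst)

def f_comp(lst):
    return [x**2 + 1 for x in lst]

dis.dis(f_map)   # the lambda is a separate code object; at run time map calls
                 # it once per element
dis.dis(f_comp)  # the x**2 + 1 arithmetic appears inline in the loop bytecode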


As far as my personal preferences go, while the simple cases of map and
filter (the ones using existing functions) are certainly easy enough for
me to read, for more complicated cases, I find things like:

[x**2 + 1 for x in lst]
[x for x in lst if (x**2 + 1) % 3 == 1]

much more readable than the corresponding:

map(lambda x: x**2 + 1, lst)
filter(lambda x: (x**2 + 1) % 3 == 1, lst)

especially since I avoid lambda usage, and would have to write these as:

def f1(x):
    return x**2 + 1
map(f1, lst)

def f2(x):
    return (x**2 + 1) % 3 == 1
filter(f2, lst)

(I actually find the non-lambda code clearer, but still more complicated
than the list comprehensions.)

Given that I use list comprehensions for the complicated cases, I find
it to be more consistent if I use list comprehensions in all cases, so I
even tend to avoid the simple map and filter cases.


As far as reduce goes, I've never seen code that I thought was clearer
using reduce than using a for-loop. I have some experience with FP, and
I can certainly figure out what a given reduce call is doing given
enough time, but I can always understand the for-loop version much more
quickly.


Of course, YMMV.

STeVe


[1] While it's not technically equivalent, this would almost certainly
be written as:
[x for x in lst if x.strip()]
which in fact takes better advantage of Python's duck-typing -- it will
work for unicode objects too.
 

Steven Bethard

John said:
For example, the fastest way
to get the factorial of a (small enough) number in pure python is

factorial = lambda n: reduce(operator.mul, range(1, n+1))

Gah! I'll never understand why people use lambda, which is intended to
create _anonymous_ functions, to create named functions! This code is
basically equivalent to:

def factorial(n):
    return reduce(operator.mul, range(1, n+1))

except that factorial.__name__ is now the meaningful 'factorial',
instead of '<lambda>'. See "Inappropriate use of Lambda" in
http://www.python.org/moin/DubiousPython


My personal beef with the inappropriate use of lambda aside, and
ignoring the fact that the reduce solution as written fails for
factorial(0), some timings are in order. Given fact.py:

----------------------------------------------------------------------
import operator

def f1(n):
    result = 1
    for x in xrange(1, n+1):
        result *= x
    return result

def f2(n):
    return reduce(operator.mul, range(1, n+1))

def f3(n):
    return reduce(operator.mul, xrange(1, n+1))
----------------------------------------------------------------------

I get the following timeit results (with Python 2.4):

$ python -m timeit -s "import fact" "[fact.f1(x) for x in xrange(1, 100)]"
100 loops, best of 3: 2.53 msec per loop

$ python -m timeit -s "import fact" "[fact.f2(x) for x in xrange(1, 100)]"
100 loops, best of 3: 2.46 msec per loop

$ python -m timeit -s "import fact" "[fact.f3(x) for x in xrange(1, 100)]"
100 loops, best of 3: 2.17 msec per loop

So it does look like reduce is actually faster in this case, especially
if you use xrange instead of range.

Thanks for the example!

Steve
 

Francis Girard

On Monday 7 February 2005 at 19:51, John Lenton wrote:
I am guessing you are joking, right?

Of course I am joking. I meant the exact opposite. I only wanted to show that
Peter did exaggerate. Thank you for citing me with the full context.
I think Peter exaggerates when he
says that "there will be no good use cases for reduce"; it is very
useful, in writing very compact code when it does exactly what you
want (and you have to go through hoops to do it otherwise). It also
can be the fastest way to do something. For example, the fastest way
to get the factorial of a (small enough) number in pure python is

factorial = lambda n: reduce(operator.mul, range(1, n+1))

Great.

Regards

Francis Girard
 

Francis Girard

On Monday 7 February 2005 at 20:30, Steven Bethard wrote:
Francis said:
Is there someone on this list using this tool and happy with it? Or is
my mind too targeted on the FP paradigm, and do most of you really think
that all the functions that apply another function to each and every
element of a list are bad (like "reduce", "map", "filter")?

I know there are definitely proponents for map and filter, especially
for simple cases like:

map(int, lst)
filter(str.strip, lst)

Note that, unlike reduce, map and filter aren't really going to increase
the number of function calls. Consider the equivalent list comprehensions:

[int(x) for x in lst]
[x for x in lst if str.strip(x)] [1]

The list comprehensions also require the same number of function calls
in these cases. Of course, in cases where a function does not already
exist, map and filter will require more function calls. Compare:

map(lambda x: x**2 + 1, lst)

with

[x**2 + 1 for x in lst]

Where the LC allows you to essentially "inline" the function. (You can
dis.dis these to see the difference if you like.)


I see.

As far as my personal preferences go, while the simple cases of map and
filter (the ones using existing functions) are certainly easy enough for
me to read, for more complicated cases, I find things like:

[x**2 + 1 for x in lst]
[x for x in lst if (x**2 + 1) % 3 == 1]

much more readable than the corresponding:

map(lambda x: x**2 + 1, lst)
filter(lambda x: (x**2 + 1) % 3 == 1, lst)

I agree.
especially since I avoid lambda usage, and would have to write these as:

Why avoid "lambda" usage? Do you find them too difficult to read (I mean in
general)?
def f1(x):
    return x**2 + 1
map(f1, lst)

def f2(x):
    return (x**2 + 1) % 3 == 1
filter(f2, lst)

(I actually find the non-lambda code clearer, but still more complicated
than the list comprehensions.)

Given that I use list comprehensions for the complicated cases, I find
it to be more consistent if I use list comprehensions in all cases, so I
even tend to avoid the simple map and filter cases.


As far as reduce goes, I've never seen code that I thought was clearer
using reduce than using a for-loop. I have some experience with FP, and
I can certainly figure out what a given reduce call is doing given
enough time, but I can always understand the for-loop version much more
quickly.

I think it's a question of habit. But I agree that we should always go with
code we find easy to read and that we think others will find easy too.
I think I will stop using that function in Python if Python practitioners find
it difficult to read in general. There is no point in being cool just for
the sake of being cool.

Thank you
Francis Girard
Of course, YMMV.

STeVe


[1] While it's not technically equivalent, this would almost certainly
be written as:
[x for x in lst if x.strip()]
which in fact takes better advantage of Python's duck-typing -- it will
work for unicode objects too.
 

Steven Bethard

Francis said:
On Monday 7 February 2005 at 20:30, Steven Bethard wrote:

Why avoid "lambda" usage? Do you find them too difficult to read (I mean in
general)?

Yup, basically a readability thing. I also tend to find that if I
actually declare the function, I can often find a way to refactor things
to make that function useful in more than one place.

Additionally, 'lambda' is on Guido's regrets list, so I'm avoiding its
use in case it gets yanked for Python 3.0. I think most code can be
written now without it, and in most cases code so written is clearer.
Probably worth looking at is a thread I started that went through some
stdlib uses of lambda and how they could be rewritten:

http://mail.python.org/pipermail/python-list/2004-December/257990.html

Many were rewritable with def statements, list comprehensions, the
operator module functions, or unbound or bound methods.
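For instance, a few typical rewrites of that kind (a sketch, not taken from the linked thread):

import operator

# lambda x, y: x + y     ->  operator.add
# lambda s: s.strip()    ->  str.strip               (unbound method)
# lambda pair: pair[1]   ->  operator.itemgetter(1)  (new in Python 2.4)

pairs = [(1, 'b'), (0, 'a')]
print sorted(pairs, key=operator.itemgetter(1))   # [(0, 'a'), (1, 'b')]
print map(str.strip, [' spam ', ' eggs '])        # ['spam', 'eggs']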

Steve
 

Peter Otten

John said:
I am guessing you are joking, right? I think Peter exaggerates when he
says that "there will be no good use cases for reduce"; it is very
useful, in writing very compact code when it does exactly what you
want (and you have to go through hoops to do it otherwise). It also
can be the fastest way to do something. For example, the fastest way
to get the factorial of a (small enough) number in pure python is

factorial = lambda n: reduce(operator.mul, range(1, n+1))

I see that you are seduced by the beauty of the expression. Otherwise, if
you would really care for speed:

$ cat factorial.py

#ugly but efficient
factorial = [1]
for i in range(1, 50):
    factorial.append(i*factorial[-1])


import operator
def factorial_reduce(n):
    return reduce(operator.mul, xrange(1, n+1), 1)

if __name__ == "__main__":
    assert factorial[0] == factorial_reduce(0) == 1
    assert factorial[5] == factorial_reduce(5) == 1*2*3*4*5

$ python2.4 -m timeit -s'import factorial as f' 'f.factorial_reduce(30)'
100000 loops, best of 3: 13.3 usec per loop
$ python2.4 -m timeit -s'import factorial as f' 'f.factorial[30]'
1000000 loops, best of 3: 0.328 usec per loop

Having factorial raise a UseStirlingsFormulaError for large values is left
as an exercise to the reader :)


Peter
 

John Lenton

John said:
For example, the fastest way
to get the factorial of a (small enough) number in pure python is

factorial = lambda n: reduce(operator.mul, range(1, n+1))

I see that you are seduced by the beauty of the expression. Otherwise, if
you would really care for speed:

[...]

that's cheating: you moved the calculation into the setup. You aren't
timing what you say you are.

--
John Lenton ([email protected]) -- Random fortune:
I'd rather push my Harley than ride a rice burner.

 

Francis Girard

On Monday 7 February 2005 at 21:21, Steven Bethard wrote:
Yup, basically a readability thing. I also tend to find that if I
actually declare the function, I can often find a way to refactor things
to make that function useful in more than one place.

Additionally, 'lambda' is on Guido's regrets list, so I'm avoiding its
use in case it gets yanked for Python 3.0. I think most code can be
written now without it, and in most cases code so written is clearer.
Probably worth looking at is a thread I started that went through some
stdlib uses of lambda and how they could be rewritten:

http://mail.python.org/pipermail/python-list/2004-December/257990.html

Many were rewritable with def statements, list comprehensions, the
operator module functions, or unbound or bound methods.

Steve

I see. I personally use them frequently to bind an argument of a function with
some fixed value. Modifying one of the examples in
http://mail.python.org/pipermail/python-list/2004-December/257990.html
I frequently have something like:

SimpleXMLRPCServer.py: server.register_1arg-place_function(lambda x: x+2,
'add2')

If "Guido" don't like lambdas, he would have to give me some way to easily do
this. Something like (or similar to fit Python syntax) :

SimpleXMLRPCServer.py: server.register_1arg-place_function(\+2, 'add2')

This would be great.

Regards

Francis Girard



server.register_function(operator.add, 'add')
 

Steven Bethard

Francis said:
I see. I personally use them frequently to bind an argument of a function with
some fixed value. Modifying one of the examples in
http://mail.python.org/pipermail/python-list/2004-December/257990.html
I frequently have something like:

SimpleXMLRPCServer.py: server.register_1arg-place_function(lambda x: x+2,
'add2')

PEP 309 has already been accepted, so I assume it will appear in Python 2.5:

http://www.python.org/peps/pep-0309.html

Your code could be written as:

server.register_1arg-place_function(partial(operator.add, 2))

If your task is actually what you describe above -- "to bind an argument
of a function with some fixed value"[1] -- then in general, you should
be able to write this as[2]:

function_with_fixed_value = partial(function, fixed_value)


Steve

[1] As opposed to binding a name to be used in an _expression_ as you do
in your example.

[2] The partial function can also be used to fix multiple argument
values and keyword argument values, if that's necessary for your purposes.
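As a sketch of the idea (functools.partial only arrived later, in Python 2.5, so this is an illustrative pure-Python version along the lines of PEP 309, not the stdlib code):

import operator

def partial(func, *args, **kwargs):
    # return a callable with the leading positional arguments (and any
    # keyword arguments) already fixed
    def wrapper(*more_args, **more_kwargs):
        merged = dict(kwargs)
        merged.update(more_kwargs)
        return func(*(args + more_args), **merged)
    return wrapper

add2 = partial(operator.add, 2)
print add2(5)    # 7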
 

Peter Otten

John said:
John said:
For example, the fastest way
to get the factorial of a (small enough) number in pure python is

factorial = lambda n: reduce(operator.mul, range(1, n+1))

I see that you are seduced by the beauty of the expression. Otherwise, if
you would really care for speed:

[...]

that's cheating: you moved the calculation into the setup. You aren't
timing what you say you are.

*You* are cheating when you take a predefined function implemented in C
(operator.mul) and then claim you are using pure Python. This gives you the
extra 50 percent that overcompensates for the inefficiency of reduce() -- but
only by a hair's breadth:

$ python2.4 -m timeit -s'mul = lambda x, y: x * y' -s'a=1' 'mul(a, a)'
1000000 loops, best of 3: 0.522 usec per loop
$ python2.4 -m timeit -s'from operator import mul' -s'a=1' 'mul(a, a)'
1000000 loops, best of 3: 0.345 usec per loop


Peter
 

Francis Girard

On Monday 7 February 2005 at 22:53, Steven Bethard wrote:
Francis said:
I see. I personally use them frequently to bind an argument of a function
with some fixed value. Modifying one of the examples in
http://mail.python.org/pipermail/python-list/2004-December/257990.html
I frequently have something like:

SimpleXMLRPCServer.py: server.register_1arg-place_function(lambda x:
x+2, 'add2')

PEP 309 has already been accepted, so I assume it will appear in Python
2.5:

http://www.python.org/peps/pep-0309.html

Your code could be written as:

server.register_1arg-place_function(partial(operator.add, 2))

If your task is actually what you describe above -- "to bind an argument
of a function with some fixed value"[1] -- then in general, you should
be able to write this as[2]:

function_with_fixed_value = partial(function, fixed_value)

Great! Great! Great!
Really, Python is going in the right direction all the time!
Very good news.

Regards

Francis Girard
Steve

[1] As opposed to binding a name to be used in an _expression_ as you do
in your example.

[2] The partial function can also be used to fix multiple argument
values and keyword argument values, if that's necessary for your purposes.
 
