reduce() anomaly?

  • Thread starter Stephen C. Waterbury

Stephen C. Waterbury

This seems like it ought to work, according to the
description of reduce(), but it doesn't. Is this
a bug, or am I missing something?

Python 2.3.2 (#1, Oct 20 2003, 01:04:35)
[GCC 3.2.2 20030222 (Red Hat Linux 3.2.2-5)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> d1 = {'a':1}
>>> d2 = {'b':2}
>>> d3 = {'c':3}
>>> l = [d1, d2, d3]
>>> d4 = reduce(lambda x, y: x.update(y), l)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "<stdin>", line 1, in <lambda>
AttributeError: 'NoneType' object has no attribute 'update'

- Steve.
 

Alex Martelli

Stephen said:
This seems like it ought to work, according to the
description of reduce(), but it doesn't. Is this
a bug, or am I missing something?

the latter.
Python 2.3.2 (#1, Oct 20 2003, 01:04:35)
[GCC 3.2.2 20030222 (Red Hat Linux 3.2.2-5)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> d1 = {'a':1}
>>> d2 = {'b':2}
>>> d3 = {'c':3}
>>> l = [d1, d2, d3]
>>> d4 = reduce(lambda x, y: x.update(y), l)

the update method returns None.
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "<stdin>", line 1, in <lambda>
AttributeError: 'NoneType' object has no attribute 'update'
right.


If you want to abuse reduce at all costs for this purpose,
reduce(lambda x, y: x.update(y) or x, l) might work.


Alex
 

David C. Fox

Stephen said:
This seems like it ought to work, according to the
description of reduce(), but it doesn't. Is this
a bug, or am I missing something?

Python 2.3.2 (#1, Oct 20 2003, 01:04:35)
[GCC 3.2.2 20030222 (Red Hat Linux 3.2.2-5)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> d1 = {'a':1}
>>> d2 = {'b':2}
>>> d3 = {'c':3}
>>> l = [d1, d2, d3]
>>> d4 = reduce(lambda x, y: x.update(y), l)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "<stdin>", line 1, in <lambda>
AttributeError: 'NoneType' object has no attribute 'update'

- Steve.

The update method updates the dictionary in place, but returns None.
Thus, after the first call to x.update(y), reduce is trying to call
x.update(y) with x equal to None. Hence the error.

Alternatives which work include

def rupdate(d, other):
    d.update(other)
    return d

reduce(rupdate, l)

and

d = {}
map(lambda x: d.update(x), l)

David
 

Peter Otten

Stephen said:
This seems like it ought to work, according to the
description of reduce(), but it doesn't. Is this
a bug, or am I missing something?

Python 2.3.2 (#1, Oct 20 2003, 01:04:35)
[GCC 3.2.2 20030222 (Red Hat Linux 3.2.2-5)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> d1 = {'a':1}
>>> d2 = {'b':2}
>>> d3 = {'c':3}
>>> l = [d1, d2, d3]
>>> d4 = reduce(lambda x, y: x.update(y), l)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "<stdin>", line 1, in <lambda>
AttributeError: 'NoneType' object has no attribute 'update'

- Steve.

No bug, your lambda evaluates d1.update(d2) on the first call and then
returns the result of the update() method, which is None. So on the second
call None.update(d3) fails. Here's what you might have intended:
>>> d1 = {'a':1}
>>> d2 = {'b':2}
>>> d3 = {'c':3}
>>> l = [d1, d2, d3]
>>> d4 = reduce(lambda x, y: x.update(y) or x, l)
>>> d4
{'a': 1, 'c': 3, 'b': 2}

Note the side effect on d1
>>> d1
{'a': 1, 'c': 3, 'b': 2}

if you don't provide an initial dictionary.
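
Supplying an empty dict as reduce's third argument should sidestep that side effect; a quick sketch along the same lines (assuming a fresh session with the same d1, d2, d3 and l):

>>> d4 = reduce(lambda x, y: x.update(y) or x, l, {})
>>> d4
{'a': 1, 'c': 3, 'b': 2}
>>> d1
{'a': 1}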

Peter
 

Francis Avila

Alex Martelli said:
Stephen C. Waterbury wrote:
If you want to abuse reduce at all costs for this purpose,
reduce(lambda x, y: x.update(y) or x, l) might work.

Just out of curiosity, for what kind of problems do we find reduce to just
be the Right Way? I mean, summing a big list of numbers is fun and all, but
I never find any use for it otherwise. I *often* try to think if reduce
would be useful when I come across some collection-of-values manipulation
task, but usually one wants to repeat an operation on a whole set of values,
not reduce the set to one value.

So, are there any *non-trivial* and *unabusive* uses of reduce, where any
other way would be tedious, repetitive, slow, unclear, etc.? I'm very
curious to see them.

reduce--the black sheep of the functional-Python herd?
 

Erik Max Francis

Francis said:
Just out of curiosity, for what kind of problems do we find reduce to just
be the Right Way? I mean, summing a big list of numbers is fun and all, but
I never find any use for it otherwise. I *often* try to think if reduce
would be useful when I come across some collection-of-values manipulation
task, but usually one wants to repeat an operation on a whole set of values,
not reduce the set to one value.

I don't use reduce extremely routinely, but I certainly do find myself
using it. Grepping through my Python sources, the most common uses I
find are summing together lists of numeric values and joining together
(flattening) a list of lists (though only when the contained lists
themselves do not contain any sequences as elements).
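
For instance, a one-level flatten with reduce might look like this (a minimal sketch, not Erik's actual code):

import operator
lol = [[1, 2], [3, 4], [5]]
flat = reduce(operator.add, lol, [])   # one-level flatten: [1, 2, 3, 4, 5]
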
So, are there any *non-trivial* and *unabusive* uses of reduce, where any
other way would be tedious, repetitive, slow, unclear, etc.? I'm very
curious to see them.

My cellular automaton engine CAGE uses reduce several times, although
admittedly this use is academic (obviously a good way to implement a
ReductionRule is almost by definition with a reduce operation :). Even
then, there are times when, say, you want to get the sum of the states
of the cells in the neighborhood, and "neighborhood" is defined in a
sufficiently generic way that the states of the neighborhood are just a
list of state values.

My Solar System calculator BOTEC has a similar application; when
plotting courses, you can have an arbitrary number of transfers, and
those transfers can each have an arbitrary number of burns. So it's
quite convenient to do a reduce on the list of burns (a sequence of
sequences), and then a reduce on the list of deltavees for the transfers
(a sequence of numerics).
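
Presumably something along these lines (names and numbers made up for illustration, not BOTEC's actual API):

import operator
# each transfer is a list of burn deltavees (hypothetical values)
transfers = [[3.1, 0.2], [1.4], [0.6, 0.6, 0.1]]
deltavees = [reduce(operator.add, burns) for burns in transfers]
total = reduce(operator.add, deltavees)   # total deltavee for the course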
 

Alex Martelli

Erik Max Francis wrote:
...
I don't use reduce extremely routinely, but I certainly do find myself
using it. Grepping through my Python sources, the most common uses I
find are summing together lists of numeric values and joining together

In Python 2.3, we have sum for that (much faster, too).
(flattening) a list of lists (though only when the contained lists
themselves do not contain any sequences as elements).

_UN_fortunately sum works for that too -- but almost as slowly as reduce.
E.g., on a small example:

[alex@lancelot bo]$ timeit.py -c -s'lol=[range(20)]*20' 'reduce(list.__add__, lol, [])'
10000 loops, best of 3: 91 usec per loop

[alex@lancelot bo]$ timeit.py -c -s'lol=[range(20)]*20' -s'import operator' 'reduce(operator.add, lol, [])'
10000 loops, best of 3: 88 usec per loop

[alex@lancelot bo]$ timeit.py -c -s'lol=[range(20)]*20' 'sum(lol, [])'
10000 loops, best of 3: 82 usec per loop

while a simple loop is way faster:

[alex@lancelot bo]$ timeit.py -c -s'lol=[range(20)]*20' 'x=[]' 'for l in lol: x.extend(l)'
10000 loops, best of 3: 26 usec per loop

and can easily be sped up a little more:

[alex@lancelot bo]$ timeit.py -c -s'lol=[range(20)]*20' 'x=[]' 'xe=x.extend' 'for l in lol: xe(l)'
10000 loops, best of 3: 20 usec per loop


Given that the typical use cases of reduce are covered by sum -- and sometimes
handled even better by simple loops &c -- I would say that in Python 2.3 and
later, reduce should not be used often at all.
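
For completeness, itertools (also new in 2.3) offers another concise one-level flatten; a small sketch:

import itertools
lol = [range(20)] * 20
flat = list(itertools.chain(*lol))   # chain the sublists lazily, then listify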


Alex
 

Alex Martelli

Francis Avila wrote:
...
Just out of curiosity, for what kind of problems do we find reduce to just
be the Right Way? I mean, summing a big list of numbers is fun and all,

And best handled by sum (in Python 2.3).
reduce--the black sheep of the functional-Python herd?

Nope -- apply beats it, given that in the last few years apply(f, args) is
best spelled f(*args) ...!-)
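
For instance, both spellings below make the same call:

def f(a, b, c):
    return a + b + c

args = (1, 2, 3)
print apply(f, args)   # the legacy spelling
print f(*args)         # the modern spelling -- same result, 6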


Alex
 

anton muhin

Just out of curiosity, for what kind of problems do we find reduce to just
be the Right Way? I mean, summing a big list of numbers is fun and all, but
I never find any use for it otherwise. I *often* try to think if reduce
would be useful when I come across some collection-of-values manipulation
task, but usually one wants to repeat an operation on a whole set of values,
not reduce the set to one value.

So, are there any *non-trivial* and *unabusive* uses of reduce, where any
other way would be tedious, repetitive, slow, unclear, etc.? I'm very
curious to see them.

reduce--the black sheep of the functional-Python herd?
--

IMHO, not. I once read an article explaining that foldr and
foldl---close relatives of Python's reduce---are among the most important
HOFs. Actually, functions like map can easily be defined with reduce:

def my_map(func, seq):
    return reduce(lambda seq, el: seq + [func(el)], seq, [])

print my_map(lambda x: 2*x, [1, 2, 3, 4, 5])
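
and filter falls out the same way (a sketch in the same spirit):

def my_filter(pred, seq):
    return reduce(lambda acc, el: pred(el) and acc + [el] or acc, seq, [])

print my_filter(lambda x: x % 2, [1, 2, 3, 4, 5])   # prints [1, 3, 5]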

regards,
anton.
 

Jeremy Fincher

Duncan Booth said:
which can be written more concisely without the lambda:

d = {}
map(d.update, l)

Although he shouldn't be using map at all for this case; he's not
using the resulting list.

Just a plain for loop will suffice:

for otherDict in l:
    d.update(otherDict)

Jeremy
 

Alex Martelli

Jeremy said:
Although he shouldn't be using map at all for this case; he's not
using the resulting list.

Absolutely true.

Just a plain for loop will suffice:

for otherDict in l:
    d.update(otherDict)

And if one needs to get top performance, an _almost_ plain for loop
will give performance just as good as map (rather than, say, 15% worse
or so -- not a worry in 99.999% of the cases, of course...):

d_upda = d.update
for otherDict in l:
    d_upda(otherDict)

[the usual case of "manual constant subexpression hoisting", where the
constant subexpression is the attribute lookup for d.update ...]


Alex
 

Bob Gailer

Just out of curiosity, for what kind of problems do we find reduce to just
be the Right Way? I mean, summing a big list of numbers is fun and all, but
I never find any use for it otherwise. I *often* try to think if reduce
would be useful when I come across some collection-of-values manipulation
task, but usually one wants to repeat an operation on a whole set of values,
not reduce the set to one value.

So, are there any *non-trivial* and *unabusive* uses of reduce, where any
other way would be tedious, repetitive, slow, unclear, etc.? I'm very
curious to see them.

One's prior programming experience can affect one's view of reduce. My
favorite language, prior to Python, was APL. APL's native data container is
the array, and reduce is a native operator in APL. So we used reduce a lot,
sometimes to add things up, other times to check for all or any conditions,
other times for purposes now forgotten. A companion of reduce in APL is
scan, which does a reduce but preserves all the intermediate values
(cumulative sum, for example).
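
A rough Python equivalent of APL's scan might look like this (hypothetical helper, keeping the running values that reduce throws away):

def scan(func, seq):
    # like reduce, but collect every intermediate accumulator value
    it = iter(seq)
    acc = it.next()
    results = [acc]
    for item in it:
        acc = func(acc, item)
        results.append(acc)
    return results

print scan(lambda a, b: a + b, [1, 3, 4, 7])   # [1, 4, 8, 15] -- cumulative sums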

To expand your reduce horizons in Python, consider reduce as a way to
examine the relationship between successive pairs of a sequence. For
example, one might want the difference between successive elements: given
seq = [1,3,4,7] we'd like differences == [2,1,3].

differences = []
def neighborDifference(left, right):
    differences.append(right - left)
    return right
reduce(neighborDifference, seq)

Bob Gailer
 

David Eppstein

Bob Gailer said:
To expand your reduce horizons in Python, consider reduce as a way to
examine the relationship between successive pairs of a sequence. For
example one might want the difference between successive elements: given
seq = [1,3,4,7] we'd like differences == [2,1,3].

differences = []
def neighborDifference(left, right):
    differences.append(right - left)
    return right
reduce(neighborDifference, seq)

seq=[1,3,4,7]
[y-x for x,y in zip(seq,seq[1:])]
 

Georgy Pruss

seq=[1,3,4,7]
map( int.__sub__, seq[1:], seq[:-1] ) # nearly twice as fast as the zip version for long arrays

G-:

David Eppstein said:
seq = [1,3,4,7] we'd like differences == [2,1,3].
<...>

seq=[1,3,4,7]
[y-x for x,y in zip(seq,seq[1:])]
 

Alex Martelli

Bob Gailer wrote:
...
One's prior programming experience can affect one's view of reduce. My
favorite language, prior to Python, was APL. APL's native data container
is the array, and reduce is a native operator in APL. So we used reduce a

I've done a _lot_ of APL and APL2 (albeit well before I met Python --
basically in the 1978-1988 time frame) and I don't think that "affects
my view of reduce" _in Python_ -- any more than (e.g.) having done a
lot of Scheme would affect my view of lambda and tail recursion _in
Python_, a lot of Icon that of generators, a lot of Perl that of RE's, ...

Basically, the old quip about "coding Fortran in any language" does not
apply to Fortran _only_: you can, up to a point, code (APL, Icon, Scheme,
Perl, ...) in any language... but it's _still_ basically a bad idea.

Mastering a language means (in part) mastering the idioms thereof --
what works well/smoothly/elegantly in that language, what most of that
language's best practitioners _do_. Knowledge of other languages --
the more the merrier -- can help, giving you perspective, but in the
end it shouldn't really affect your view of the target language.

The reduce method of Numeric's ufuncs _is_ a good, close analogy
for APL's reduce (more generally, APL concepts -- though not syntax --
do help a lot with Numeric, far more than they do with base Python).
Numeric.add.reduce(xs) IS a perfect equivalent to APL's +/xs -- if
xs is a Numeric.array (else the conversion overhead's gonna kill you:):

$ timeit.py -c -s'import Numeric' -s'import operator' -s'xs=range(9999)' 'reduce(operator.add, xs)'
100 loops, best of 3: 2.9e+03 usec per loop
$ timeit.py -c -s'import Numeric' -s'import operator' -s'xs=range(9999)' 'Numeric.add.reduce(xs)'
10 loops, best of 3: 2.2e+04 usec per loop
$ timeit.py -c -s'import Numeric' -s'import operator' -s'xs=range(9999)' 'sum(xs)'
1000 loops, best of 3: 1.15e+03 usec per loop
$ timeit.py -c -s'import Numeric' -s'import operator' -s'xs=range(9999)' -s'xa=Numeric.array(xs)' 'Numeric.add.reduce(xa)'
10000 loops, best of 3: 54 usec per loop

Yeah, I know, speed isn't everything. But when the speed of a perfectly
analogous operation drops by two times (from reduce to sum) or even
more when it drops by 50 times (from reduce on a list to Numeric.add.
reduce on a Numeric.array) one _does_ get a sense of when one's using
a language "reasonably idiomatically" -- within the design space of that
language, within the pathways most often taken by the language's best
practitioners and therefore most likely to be solid and optimized.

companion of reduce in APL is scan, which did reduce but preserved all the
intermediate values (cumulative sum for example).

Yep, that's e.g. Numeric.add.accumulate in Numeric + Python.
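
e.g., a quick sketch (assuming Numeric is installed):

import Numeric
xa = Numeric.array([1, 3, 4, 7])
print Numeric.add.accumulate(xa)   # the running sums: 1 4 8 15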

To expand your reduce horizons in Python, consider reduce as a way to
examine the relationship between successive pairs of a sequence. For
example one might want the difference between successive elements: given
seq = [1,3,4,7] we'd like differences == [2,1,3].

differences = []
def neighborDifference(left, right):
    differences.append(right - left)
    return right
reduce(neighborDifference, seq)

def difs_reduce(seq):
    differences = []
    def neighborDifference(left, right):
        differences.append(right - left)
        return right
    reduce(neighborDifference, seq)
    return differences

def difs_zip(seq):
    return [ b-a for a, b in zip(seq, seq[1:]) ]

import itertools
def difs_izip(seq):
    return [ b-a for a, b in itertools.izip(seq, seq[1:]) ]

def difs_loop(seq):
    differences = []
    it = iter(seq)
    a = it.next()
    for b in it:
        differences.append(b-a)
        a = b
    return differences

$ timeit.py -c -s'import a' -s'x=range(9999)' 'a.difs_reduce(x)'
100 loops, best of 3: 1.8e+04 usec per loop
$ timeit.py -c -s'import a' -s'x=range(9999)' 'a.difs_zip(x)'
100 loops, best of 3: 1.8e+04 usec per loop

so, the list comprehension with zip has just the same performance
as the clever use of reduce -- and is far more concise. But if
you're looking for speed, then the humble loop...:

timeit.py -c -s'import a' -s'x=range(9999)' 'a.difs_loop(x)'
100 loops, best of 3: 9e+03 usec per loop

...is twice as fast! Simplicity isn't to be sneered at. And
if you're keen on higher-level reasoning and expression, itertools
is quite good for that:

timeit.py -c -s'import a' -s'x=range(9999)' 'a.difs_izip(x)'
100 loops, best of 3: 7.6e+03 usec per loop

a one-liner, and another 15% speed boost wrt the humble loop.
[I can shave another 10% performance improvement with an
experimental 'w2' type I've coded to do specifically the
window-by-two job, but that small level of acceleration is
probably not enough to justify an extra type...;-)]


Yeah, I know, sigh, I come across as a micro-optimization freak,
which most definitely _ISN'T_ the point. I'm using speed and
concision as proxies for *idiomatic use* -- simplicity and
elegance and maintainability... "elegance" &c we could argue about
forever, so I go for numbers that aren't quite as arguable:).


I think reduce (like apply, filter, and probably map too) has had
its day but really doesn't have enough good use cases any more, in
this day and age of list comprehensions and iterator-tools, to
justify its existence as a Python built-in except for legacy reasons.


Alex
 

Bengt Richter

Bob Gailer said:
To expand your reduce horizons in Python, consider reduce as a way to
examine the relationship between successive pairs of a sequence. For
example one might want the difference between successive elements: given
seq = [1,3,4,7] we'd like differences == [2,1,3].

differences = []
def neighborDifference(left, right):
    differences.append(right - left)
    return right
reduce(neighborDifference, seq)

(IMO that's an ugly way to get a global result ;-).
seq=[1,3,4,7]
[y-x for x,y in zip(seq,seq[1:])]

The OP might like to see how generators make this sort of thing easy, e.g.,
>>> def pairs(seq):
...     itseq = iter(seq)
...     earlier = itseq.next()
...     for later in itseq:
...         yield earlier, later
...         earlier = later
...

Reincarnating your list comprehension, kind of:
>>> [later-earlier for earlier,later in pairs(seq)]
[2, 1, 3]

or factoring out the first difference ...
>>> def diff1(seq):
...     for earlier,later in pairs(seq): yield later-earlier
...

that becomes
>>> list(diff1(seq))
[2, 1, 3]

And then we can play with other kinds of sequences and first differences, e.g.,
>>> linetext = """\
... abc
... abx
... bx
... zax
... """
>>> import sets
>>> for d in diff1([sets.Set(line) for line in linetext.splitlines()]): print d
...
Set(['x'])
Set([])
Set(['a', 'z'])

That was the first differences of:
>>> for d in ([sets.Set(line) for line in linetext.splitlines()]): print d
...
Set(['a', 'c', 'b'])
Set(['a', 'x', 'b'])
Set(['x', 'b'])
Set(['a', 'x', 'z'])

Regards,
Bengt Richter
 

Erik Max Francis

Alex said:
Yeah, I know, speed isn't everything. But when the speed of a perfectly
analogous operation drops by two times (from reduce to sum) or even
more when it drops by 50 times (from reduce on a list to Numeric.add.
reduce on a Numeric.array) one _does_ get a sense of when one's using
a language "reasonably idiomatically" -- within the design space of that
language, within the pathways most often taken by the language's best
practitioners and therefore most likely to be solid and optimized.

But reduce isn't simply intended for adding up numbers. It's for doing
any kind of reduction.

Yes, I'm sure specific reductions based on summing are faster than using
reduce. But that goes without saying. Furthermore, Numeric is not
builtin to Python, so that seems a red herring to me. Either you should
compare builtins to builtins, or not.
 

Alex Martelli

Erik said:
But reduce isn't simply intended for adding up numbers. It's for doing
any kind of reduction.

However, so many of reduce's practical use cases are eaten up by sum
that reduce is left without real use cases to justify its existence.

If you consider the couple of uses left in the standard library, for
example... e.g., take csv.py:

quotechar = reduce(lambda a, b, quotes = quotes:
                   (quotes[a] > quotes[b]) and a or b, quotes.keys())

...a masterpiece of clarity, right? The simple Python alternative,

quotechar = None
for k in quotes:
    if not quotechar or quotes[k] > quotes[quotechar]:
        quotechar = k

may be deemed boring, but then why not go for speed & concision with...:

quotechar = max([ (v,k) for k,v in quotes.iteritems() ])[-1]

...?-) All 4 uses in csv.py are similar to this, and the one
in difflib.py:

matches = reduce(lambda sum, triple: sum + triple[-1],
                 self.get_matching_blocks(), 0)

is clearly best expressed in Python 2.3 as:

matches = sum([ triple[-1] for triple in self.get_matching_blocks() ])


The only other uses are in Lib/test/ -- and even there, apart from
tests of reduce itself, all are "reduce(add..." [i.e., sum!] save
for *one*...:

def factorial(n):
    return reduce(int.__mul__, xrange(1, n), 1)

even in that one case (and apart from the confusing choice of having
factorial(n) return the factorial of n-1...), the most simplistic
implementation:

def fac(n):
    result = 1
    for i in xrange(2, n):
        result *= i
    return result

is only 3 times slower, and, if one is in a hurry, recursion and
memoization are obviously preferable:

def facto(n, _memo={1:1}):
    try: return _memo[n]
    except KeyError:
        result = _memo[n] = (n-1) * facto(n-1)
        return result

the performance numbers being:

[alex@lancelot bo]$ timeit.py -c -s'import facs' 'facs.factorial(13)'
100000 loops, best of 3: 10.3 usec per loop

[alex@lancelot bo]$ timeit.py -c -s'import facs' 'facs.fac(13)'
10000 loops, best of 3: 32 usec per loop

[alex@lancelot bo]$ timeit.py -c -s'import facs' 'facs.facto(13)'
1000000 loops, best of 3: 1.26 usec per loop

Yes, I'm sure specific reductions based on summing are faster than using
reduce. But that goes without saying. Furthermore, Numeric is not
builtin to Python, so that seems a red herring to me. Either you should
compare builtins to builtins, or not.

If you want APL-ish functionality with Python, Numeric is where you'll
find it (one day numarray, Numeric's successor, may finally gain entrance
into the standard library, but, don't hold your breath...).

But comparing plain Python code to a built-in that's almost bereft of
good use cases, and finding the plain Python code _faster_ on such a
regular basis, is IMHO perfectly legitimate. If a built-in gives me
obfuscated or slow code, where plain good old "let's code it out in
Python" gains clarity or speed, then it's time for that built-in to
go. 'reduce' exists for purely legacy reasons, and, IMHO, would count
as a wart were it not for Python's admirable determination to keep
old code running (however, even that determination can be overdone,
and I look forward to the 3.0 release where old by-now-unwarranted
built-ins can be put out to pasture...).


Alex
 

JCM

if one is in a hurry, recursion and
memoization are obviously preferable:
def facto(n, _memo={1:1}):
    try: return _memo[n]
    except KeyError:
        result = _memo[n] = (n-1) * facto(n-1)
        return result
the performance numbers being:
[alex@lancelot bo]$ timeit.py -c -s'import facs' 'facs.factorial(13)'
100000 loops, best of 3: 10.3 usec per loop
[alex@lancelot bo]$ timeit.py -c -s'import facs' 'facs.fac(13)'
10000 loops, best of 3: 32 usec per loop
[alex@lancelot bo]$ timeit.py -c -s'import facs' 'facs.facto(13)'
1000000 loops, best of 3: 1.26 usec per loop

I'm going off topic, but it's really not fair to compare a memoized
function to non-memoized implementations using a "best of 3" timing
test.
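
One way to level the field would be to pay the memo-building cost on every timed call, e.g. by handing facto a fresh memo dict each time (a hypothetical wrapper around Alex's facto):

def facto_fresh(n):
    # start from a bare seeded memo so no results persist between timed calls
    return facto(n, {1: 1})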
 
