reduce() anomaly?

Thread starter: Stephen C. Waterbury

Alex Martelli

SUZUKI Hisao wrote:
...
As to sum(), when learning string addition (concatenation),
one may wonder why sum() does not handle it:

I originally had sum try to detect the "summing strings" case and
delegate under the covers to ''.join -- Guido vetoed that as too
"clever" (which has a BAD connotation in Python) and had me forbid
the "summing strings" case instead, for simplicity.
>>> sum(['a', 'b', 'c'])
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: unsupported operand type(s) for +: 'int' and 'str'

whereupon, one hopes, the user checks sum's doc:
sum(sequence, start=0) -> value

Returns the sum of a sequence of numbers (NOT strings) plus the value
of parameter 'start'. When the sequence is empty, returns start.
and if one tries anyway:
>>> sum(['a', 'b', 'c'], '')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: sum() can't sum strings [use ''.join(seq) instead]

the error message should direct the user to the proper way to sum
strings. Having more than one "obvious way to do it" was avoided.

while general reduce() does it just as _expected_:
>>> reduce(str.__add__, ['a', 'b', 'c'])
'abc'

Well, if what you "expect" is the following performance...:

[alex@lancelot tmp]$ timeit.py -c -s'x=map(str,range(999))'
'reduce(str.__add__, x)'
1000 loops, best of 3: 1.82e+03 usec per loop
[alex@lancelot tmp]$ timeit.py -c -s'x=map(str,range(999))' "''.join(x)"
10000 loops, best of 3: 68 usec per loop

i.e., a slowdown by about 2700% for a 999-item sequence,

[alex@lancelot tmp]$ timeit.py -c -s'x=map(str,range(1999))'
'reduce(str.__add__, x)'
100 loops, best of 3: 5e+03 usec per loop
[alex@lancelot tmp]$ timeit.py -c -s'x=map(str,range(1999))' "''.join(x)"
10000 loops, best of 3: 143 usec per loop

growing to 3500% for a 1999-item sequence, and so on without bounds,
then no doubt ``reduce does it just as expected'' by YOU.
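
For anyone wanting to reproduce the comparison with the timeit module
rather than the timeit.py script, a minimal sketch (on a current
interpreter reduce lives in functools; absolute timings will differ
from the ones above, but the ratio keeps growing with N):

import timeit

for n in (999, 1999):
    setup = ("from functools import reduce; "
             "x = list(map(str, range(%d)))" % n)
    t_reduce = timeit.timeit("reduce(str.__add__, x)", setup=setup, number=100)
    t_join = timeit.timeit("''.join(x)", setup=setup, number=100)
    # the ratio grows roughly linearly with n: O(N squared) vs O(N)
    print(n, round(t_reduce / t_join, 1))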

Most people, however, EXPECT sensible performance, not slow-downs by
factors of tens or hundreds of times, when they use constructs that are
considered "normal and supported" in the language and its built-ins.

This makes reduce a terrible performance trap just waiting to catch the
unwary. It SEEMS to work all right, but in fact it's doing nothing of
the kind, nor can it -- it's defined to iteratively run N repetitions
of whatever function you pass as the first argument, therefore it can
never have O(N) performance when used to add up a sequence of strings,
but always, necessarily O(N squared). It's _conceivable_ (although it
currently appears unlikely that Guido will ever countenance it) that
'sum' can be specialcased (to use faster approaches to summation when it
is dealing with sequences, not numbers) to give the O(N) performance
most people (sensibly) DO expect; that just depends on loosening its
current specs, while maintaining the concept of "sum of a sequence".
No such avenue is open for 'reduce': it will always be a terrible
performance trap just waiting to pounce on the unwary.
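
To make that concrete, here is a hypothetical sketch of what such a
loosened-spec sum might look like -- emphatically not the real
built-in, just the kind of special-casing being discussed:

def loose_sum(iterable, start=0):
    # Hypothetical sketch only -- NOT the behavior of the built-in sum().
    items = list(iterable)
    if isinstance(start, str):
        # O(N) fast path for strings, delegating to the join idiom.
        return start + ''.join(items)
    if isinstance(start, list):
        # O(N) fast path for lists: in-place extend instead of repeated +.
        result = list(start)
        for item in items:
            result += item
        return result
    # General case: plain left-to-right addition, as today.
    result = start
    for item in items:
        result = result + item
    return result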

It may be sum() that is more difficult to learn...

I have enough experience teaching both built-in functions, by now,
that I can rule this hypothesis out entirely.

For this particular problem, it is better to use
''.join(['a', 'b', 'c']), you know.

Yes, and sum's error message tells you so; so, if you DON'T know,
you learn immediately.
However, it is important for Python to have an easy and generic
way to do things. If we have to read the manual through to solve
anything, what is the point to use Python instead of Perl (or Ruby,
to some extent)?

Why do you need to read the manual after getting an error message
that tells you to """ use ''.join(seq) instead """? As for the
importance of "easy and generic", I would agree -- I'd FAR rather
have 'sum' be able to handle _well_ sums of any kinds -- but Guido
doesn't, so far. If you have arguments that might convince him, make
them. But 'reduce' just isn't a candidate, as the "easy" part simply
cannot apply.

However, it may be better to give reduce() some nice notation.

"Syntax sugar causes cancer of the semicolon". The nicer the
notation, the more superficially attractive you make it to use
a construct with unacceptable performance implications, the
craziest and most terrible the performance trap you're building
for the poor unwary users. I think it's an appalling direction
to want to move the language in.


Alex
 

Alex Martelli

Bengt Richter wrote:
...
Ok, but what about returning an iterator -- e.g., funumerate(f, seq) --
that supplies f(x),x pairs like enumerate(seq) supplies i,x?

That doesn't seem to conflict with Raymond Hettinger's vision (he's the
itertools guru), but you might check with him directly.
[I'd suggest extending enumerate, but I already want to pass optional
range parameters there, so one could control the numeric values
returned, e.g., enumerate(seq,<params>) corresponding to
zip(xrange(<params>),seq).]

I know Raymond wants to be able to control the starting point if Guido
will allow that extension (RH is also enumerate's author) but I don't
know if he's interested in the stride and stop values, too.
[BTW, should xrange() default to xrange(0,sys.maxint,1)?]

I doubt xrange has enough of a future to be worth extending it. I'd
rather have an irange returning an iterator (and given that itertools.count
already does basically the job you mention, treating irange without args
as an error may be more useful). No special reason to single out sys.maxint
when the int/long distinction is rapidly withering, btw.
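
(For what it's worth, the proposed funumerate is a two-line generator --
hypothetical, mirroring the description above:)

def funumerate(f, seq):
    # yields (f(x), x) pairs, just as enumerate(seq) yields (i, x) pairs
    for x in seq:
        yield f(x), x

maxlen, maxitem = max(funumerate(len, ['bb', 'a', 'ccc']))   # 3, 'ccc'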

Sorry, I try to remember to always say something like "now (in the 2.4
pre-alpha on CVS)" but sometimes I do forget to repeat this every time I
mention 2.4 novelties.
This is a new one on me:
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: sort() takes no keyword arguments

In 2.3, yes. Get a 2.4 CVS snapshot and you'll see it work.

Do you mean the comparison function? Or is there something else now too?
I'm beginning to infer that key= is actually a keyword arg for a
_function_ to get a "key" value from a composite object (in which case
Right.

ISTM "getkeyvalue" or "valuefunc" would be a better name). But IMO "key"
suggests it will be used on elements x like x[key], not passing a
definition key=whatever and then using key(x) to get values.

The concept and terminology of a "sort key" or "sorting key" is very
popular, so the concision of the 'key' attribute name was chosen in
preference to your longer suggestions. E.g., to sort a list of strings
in order of increasing string length (and otherwise stably),

thelist.sort(key=len)

was deemed preferable to

thelist.sort(getkeyvalue=len)

Considering that the key= parameter is meant to take the place of most
uses of DSU (and thus likely become extremely popular), I concur with
the current choice for the name. However, it's not etched in stone yet:
2.4 is at the pre-alpha stage. You can check the python-dev archives
for past discussions of the issue, then join python-dev to offer your
contributions, if you think they're helpful; this is a course of action
that is always open to anybody, of course.
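
As an aside, key= is exactly a built-in decorate-sort-undecorate; a
quick sketch of the equivalence, in terms of 2.4's sorted:

words = ['ccc', 'a', 'bb', 'dd']
# Classic DSU: decorate with (key, position, item), sort, undecorate.
decorated = [(len(w), i, w) for i, w in enumerate(words)]
decorated.sort()
by_hand = [w for _, _, w in decorated]
# The key= argument performs the same decoration under the covers.
assert by_hand == sorted(words, key=len)   # ['a', 'bb', 'dd', 'ccc']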

I think the obvious approach would be to add the same optional argument
to min and max with exactly the same semantics. I.e., just as today:

x = max(somelist)
somelist.sort()
assert x == somelist[-1]

we'd also have

x = max(somelist, key=whatever)
somelist.sort(key=whatever)
assert x == somelist[-1]
I think I like it, other than the name. Maybe s/key/valuefunc/ ;-)

If max and min do acquire such an optional argument, it will of course
have the same name and semantics as it does for sort.
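
A sketch of those proposed semantics (the name max_by_key is
hypothetical, to avoid shadowing the built-in):

def max_by_key(iterable, key):
    # One O(N) pass, comparing ONLY the extracted keys, never the items.
    it = iter(iterable)
    try:
        best = next(it)
    except StopIteration:
        raise ValueError("max_by_key() arg is an empty sequence")
    best_key = key(best)
    for item in it:
        k = key(item)
        if k > best_key:
            best, best_key = item, k
    return best

assert max_by_key(['bb', 'a', 'ccc'], key=len) == 'ccc'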

That's a nice option for max (and min, and ??), but ISTM that it would

max and min are it (unless heapq or bisect grow new _classes_ where such
comparison-flexibility would also be natural to have; in the current,
function-centered state of heapq, such an optional argument would not
fit well, and the situation for bisect is not crystal-clear either way).
also be nice to have a factory for efficient iterators of this kind.
It would probably be pretty efficient then to write

maxlen, maxitem = max(funumerate(len,seq))

Yes, roughly as efficient as it is today (2.4, again) to write the
less concise
max(izip(imap(len, seq), seq))
[for a RE-iterable seq] using more general itertools entries. [If
seq is not necessarily reiterable, you'd need to add a tee to this;
again, see the itertools in the current 2.4 snapshot].

However, this does not deal with the issue of seq items needing
to be comparable (and at reasonable cost, too) when they have the
same value for len(item). max and min with an optional argument
would know to ONLY compare the "sort key", just like sort does,
NEVER the items themselves. So, to fully emulate the proposed
max and min, you need to throw in an enumerate as well.
or

def longest(seq):
    return max(funumerate(len,seq))[-1]

and it would be available as infrastructure for other efficient loops in
addition to being tied in to specific sequence processors like max and
min.

The only issue here is whether izip(imap(len, seq), seq) is frequent
enough to warrant a new itertools entry. As it cannot take the place
of the proposed optional argument to min and max, it doesn't really
affect them directly. The general interest of
izip(imap(len, seq), enumerate(seq))
is, I think, too low to warrant a special entry just to express it.
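
A small sketch of the tie-breaking point, in today's spelling (Python
3's zip and map are already lazy, so they stand in for izip/imap):

seq = ['bb', 'cc', 'aa']
# Bare (len(item), item) pairs: a tie on len() falls through to
# comparing the items themselves (here, plain string comparison).
print(max(zip(map(len, seq), seq)))               # (2, 'cc')
# Throwing in enumerate breaks ties by position instead, so the
# items themselves are never compared.
print(max(zip(map(len, seq), enumerate(seq))))    # (2, (2, 'aa'))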


Alex
 

Alex Martelli

Andrew said:
Alex:

I've only been skimming this thread. Now upon reread I see you also
looked at the standard library. No fair -- I'm supposed to be the only
one who does that! :)

On python-dev one had better prepare for the inevitable objections
with just such a preliminary check.

Yup. There's no good, existing, natural place for such a function, that
I can tell.

I agree. Which is why I've been claiming that we need a module of
"iterator consumers" (accumulators?) in 2.4 to get full advantage
from the new genexp's and itertools enhancements.

That follows Moshe's comment that a "Decorate Max Undecorate",
in parallel to Decorate Sort Undecorate. Nice.

Hadn't seen Moshe comment on it, but, yes, it IS the same idea.

we'd also have

x = max(somelist, key=whatever)
somelist.sort(key=whatever)
assert x == somelist[-1]

Is 'key' also an iterable? ...

No, a callable (just like in 2.4's sort).

Ahh, it's the function used to get the keys. What about 'getkey'
as a name?

See my comments to Bengt on this thread. Summarizing: 2.4 is in
pre-alpha, therefore suggestions to change it will be quite
appropriate on python-dev. The name and semantics of such an
optional argument must of course be the same as whatever ends up
being accepted for lists' sort method (currently, it's key=).

I looked through the back thread but didn't see your function
definition. Is it backwards compatible to the existing max
function?... You know, I never realized how strange max was
in the first place, so that
max(1, 2)
and
max( (1, 2) )
are both valid. It must assume that if there's only one argument
then the latter form is appropriate. Then your form must say
Yes:
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: iteration over non-sequence

that if there's only one non-keyword argument then it's of
the form

max(seq, **args)

But
max(a, b, key=len)
seems reasonable as well. Then again, so does
max(seq, len)

max(seq, len) must of course remain compatible with today. Keyword
arguments are handled separately (in built-in functions).

My guess is that reaction is based on the need to have both forms
max(a, b)
and
max(seq)
and allow a key function to be passed in, and make it feel
natural without running into problems when the two might be
mixed up.

Interestingly enough, this objection wasn't raised. If it were,
then it would easily be squashed by analogy with somelist.sort:
somelist.sort(foo) must mean to use foo as the comparator func,
for backwards compatibility, so that one MUST give the argument
name, somelist.sort(key=foo), to use a key-extracting func instead.

Yeah, I worry about that bias error when I do my analysis.
I really should also scan some of the other (proprietary) code
bases I have access to, but then it leads to claims of being
irreproducible. Besides, despite being a scientific programmer,
I mostly do "string and text processing, networking &c."

A corpus of freely available code from the Vaults of Parnassus
might make for a useful resource for such tasks.


Alex
 

Alex Martelli

Ville Vainio wrote:
...
[stuff that should remain builtins]
iter, len, pow [for the crucial 3-arguments case], range (or some
preferable variant that returns an iterator), and zip seem pretty

+enumerate

Yep, it sure should be there.

locals and globals seem to have a natural place in builtins IMHO.

Why? They only allow a limited form of introspection. Why are
they more essentially "built-in" than other stuff in inspect?

How about the wiki at python.org?

Sure, one could write things up there, but I don't think it's a
medium as suitable to debate as a newsgroup or mailing list.


Alex
 

Ville Vainio

[ locals() and globals() ]
Why? They only allow a limited form of introspection. Why are
they more essentially "built-in" than other stuff in inspect?

Somehow their existence in builtins has a connotation that we are
referring to the "current" namespace, instead of, say,
inspect.locals() (which might bear the connotation that we are talking
about locals of the module inspect).

I've mostly used locals() and globals() as "%(myvar)s" % locals(), and
never for inspect-ish tasks.
Sure, one could write things up there, but I don't think it's a
medium as suitable to debate as a newsgroup or mailing list.

Yes, it isn't. How about a "py3k-sig"?
 

Alex Martelli

Douglas said:
Alex Martelli said:
But to be consistent with your other arguments, no doubt you'd argue
for a .sort() followed by [-1] as "more general" than max...

That would have a larger big O time growth value -- most likely
O(n * log n) vs. O(n), for reasonable implementations. And while I

If the sequence is carefully randomized, yes. If the sequence has
any semblance of pre-existing order, the timsort is amazingly good
at exploiting it, so that in many real-world cases it DOES run as
O(N).
wouldn't sweat a factor of 2 for a feature of a RAD or scripting
language, I would be more concerned about moving to a larger big O
value.

Me too! That's why I'd like to make SURE that some benighted soul
cannot code:

onebigstring = reduce(str.__add__, lotsofstrings)

and get O(N squared) performance, versus the O(N) performance of
the correct Python idiom, ''.join(lotsofstrings) . At least sum
does give an error message, AND a reminder that ''.join should be
used, when you try sum(lotsofstrings, '') -- reduce just slows
your program down by a LOT silently and underhandedly...

Your proposed extension to max() and min() has all the same problems.

Not at all. Maybe you have totally misunderstood "my proposed
extension"? There is NO "callable which accepts two arguments"
involved -- and therefore no complications at all. Indeed, what
I propose is simply to ensure that under all circumstances

x = max(iterable, key=k)

binds x to exactly the same value which would be bound by

x = list.sorted(iterable, key=k)[-1]

in Python 2.4 (max can of course guarantee O(N) behavior, while
list.sorted cannot -- this optimization, as well as the slight
simplification, would be my motivations for having key= in max).
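
(As it happens, max and min did acquire key= in 2.5, and sorted landed
as a plain built-in; a runnable check of the equivalence, which holds
whenever the maximal key is unique -- with ties, max returns the FIRST
item with the maximal key, while a stable sort leaves the LAST such
item at position -1:)

words = ['bb', 'a', 'ccc', 'dd']
assert max(words, key=len) == sorted(words, key=len)[-1]   # both 'ccc'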
But reasonable programmers don't abuse this generality, and so there

So, you're claiming that ALL people who were defending 'reduce' by
posting use cases which DID "abuse this generality" are unreasonable?
this urge to be stifled. Don't take my word for it -- ask Paul
Graham. I believe he was even invited to give the Keynote Address at
a recent PyCon.

However, if you agree with Paul Graham's theories on language design,
you should be consistent, and use Lisp. If you consider Python to be
preferable, then there must be some point on which you disagree with
him. In my case, I would put "simplicity vs generality" issues as
the crux of my own disagreements with Dr. Graham.

So, Python is not designed as PG would design it (else it would be
Lisp). It's designed with far greater care for simplicity, and for
practicality, and with a jaundiced eye against excess generality.
Indeed, 'sum' itself lost substantial generality between my original
conception of it and the way Guido eventually accepted it into the
built-ins -- and Guido's on record as regretting that he did not
remove even _more_ generality from it.

Just what is it that I don't grasp again? I think my position is
clear: I have no intention to abuse reduce(), so I don't worry myself
with ways in which I might be tempted to.

Yet you want reduce to keep accepting ANY callable that takes two
arguments as its first argument, differently from APL's / (which does
NOT accept arbitrary functions on its left); and you claimed that reduce
could be removed if add, mul, etc, would accept arbitrary numbers of
arguments. This set of stances is not self-consistent.

So, now you *do* want multiple obviously right ways to do the same
thing?

sum(sequence) is the obviously right way to sum the numbers that are
the items of sequence. If that maps to add.reduce(sequence), no problem;
nobody in their right mind would claim the latter as "the one obvious
way", exactly because it IS quite un-obvious.

The English word "reduce" certainly has multiple meanings, but so
does "sum". It can be a noun or a verb, for instance. It can mean
Any noun can be verbed, but that's a problem, or feature, of English
as a natural language, and unrelated to programming languages.

The point is that the primary meaning of "reduce" is "diminish",
and when you're summing (positive:) numbers you are not diminishing
anything whatsoever ... unless you think in terms of multidimensional
arrays and diminishing dimensionality, but while that's quite OK in
APL or Numeric, which DO have multidimensional arrays, it's silly in
Python proper, which doesn't. The primary meaning of "sum" is "sum",
so I have never met anybody having the slightest problem understanding
or using it (including both people with English as their mother
tongue, and others, all the way to people with near-zero command of
English: it helps, here, that "summare" is a _Latin_ word -- so is
"reducere", but with the same primary meaning as in English:).
"summary" or "gist" in addition to addition. It also can be confusing
by appearing to be just a synonym for "add". Now people might have
trouble remembering what the difference between sum() and add() is.

Got any relevant experience teaching Python? I have plenty and I have
never met ANY case of the "trouble" you mention.

In Computer Science, however, "reduce" typically only has one meaning
when provided as a function in a language, and programmers might as
well learn that sooner than later.

I think you're wrong. "reduce dimensionality of a multi-dimensional
array by 1 by operating along one axis" is one such meaning, but there
are many others. For example, the second Google hit for "reduce
function" gives me:

http://www.gits.nl/dg/node65.html

where 'reduce' applies to rewriting for multi-dot grammars, and
the 5th hit is

http://www.dcs.ed.ac.uk/home/stg/NOTES/node31.html

which uses a much more complicated generalization:

reduce(g, e, m, n, f)=g(e,g(f(m),g(f(m+1),...g(f(n-1),g(f(n)))...)))

not to mention Python's own __reduce__, etc. And the 'reduce'
given at
http://w3future.com/html/stories/hop.xml
is what we might code as sum(map(templateFunction, sequence))
while
http://csdl.computer.org/comp/trans/tp/1993/04/i0364abs.htm
deals with "the derivation of general methods for the L/sub 2/
approximation of signals by polynomial splines" and defines
REDUCE as "prefilter and down-sampler" (which is exactly as I
might expect it to be defined in any language dealing mostly
with signal processing, of course).

So, it's quite sensible for people to be confused about the
meaning of 'reduce' within a programming language.

That's very easy to fix:

FAQ
---
Q. Should I ever pass a function with side effects into reduce() or
map()?

A. No.

(Unless the side-effect is just printing out debugging information,
or saving away statistics, or something like that.)

Designing an over-general approach, and "fixing it in the docs" by
telling people not to use 90% of the generality they so obviously
get, is not a fully satisfactory solution. Add in the caveats about
not using reduce(str.__add__, manystrings), etc, and any reasonable
observer would agree that reduce had better be redesigned.

Again, I commend APL's approach, also seen with more generality in Numeric
(in APL you're stuck with the existing operators on the left of / -- in
Numeric you can, in theory, write your own ufuncs), as saner. While not
quite as advisable, allowing callables such as operator.add to take
multiple arguments would afford a similarly _correctly-limited generality_
effect. reduce + a zillion warnings about not using most of its potential
is just an unsatisfactory combination.
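
A hypothetical sketch of that last idea -- a variadic add
(operator.add itself is strictly binary) that keeps the
correctly-limited generality by dispatching strings to the O(N) idiom:

def add(*args):
    # Hypothetical, for illustration; not operator.add.
    if not args:
        raise TypeError("add() requires at least one argument")
    if all(isinstance(a, str) for a in args):
        return ''.join(args)    # O(N) for the all-strings case
    total = args[0]
    for a in args[1:]:
        total = total + a
    return total

assert add('a', 'b', 'c') == 'abc'
assert add(1, 2, 3) == 6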


Alex
 

Alex Martelli

Terry said:
Alex Martelli said:
to be consistent with your other arguments, no doubt you'd argue for
a .sort() followed by [-1] as "more general" than max...

By my definition of "more general", a function that returns all order
statistics of a list is trivially more general than one that returns
any fewer, including just one. It conveys a superset of the

Of course. So...?
information conveyed by less general functions and answers a superset
of the questions they answer. If you add a control parameter
specifying which order statistics to return, then less general
functions have a restricted domain.

So I have no idea your intent in appearing to challenge this and how
you think it contributes to your overall argument. Is your idea of
'more general' so different?

Hmmm, nobody else seemed to have any problem understanding my quip,
including the poster at which it was directed. Let me spell it out
for you: a few people are attacking 'sum' and defending 'reduce'
because the latter is "more general". Why, so is sorting a sequence
"more general" than finding its maximum; for "consistency" with their
other arguments (which by this quip I'm alleging are no good) they
should then be clamoring for the removal of 'max' to leave only 'sort'.
(I did expect an O()-based objection and punctually got it, ready for
my repartee about reduce(str.__add__, lotsastrings) being O(N squared),
etc, etc).

So, no, I don't think my idea of "more general" is different from
yours: e.g., a function that, given a sequence, returns its length
AND the number of times 'None' occurs as an item, is more general
than one which just returns the length. That does not make it in
any intrinsic way "necessarily preferable" -- and THAT is my point.


Alex
 

Alex Martelli

Terry said:
Alex Martelli said:
Dave Brueck wrote:
...
results = [ func(x) for x in sequence ]
... instead of ...
results = sequence.map(func) ??
Because I find the first much more readable.
I entirely agree with both points.

For this pair, I like the second better. Different aesthetics.

I guess. I just can't imagine EVERY iterable automagically
growing a 'map' method without feeling shudders of terror at
the total ugliness and gratuitous "blackmagicness" of the idea.

They're even clearer when the contrast is between, e.g.:
results = [ x+23 for x in sequence ]
and:
results = sequence.map(lambda x: x+23)
where using the HOF approach forces you
to understand (and read) lambda too.

Here I might take the first. 'lambda' is something I feel 'stuck'
with.

I cherish being able to use the same construct, list comprehension,
whether I already have a callable ready or not.

Would the hypothetical
results = sequence.map(func x: x+23)
be any better?

Just replacing the keyword 'lambda' with 'func'? If you were
designing a green-field language, and couldn't find any other way
to express callable literals -- so it only came down to a 2-way
choice between lambda and func as keywords with the same semantics,
I guess I would suggest func as the lesser evil.
How about a very hypothetical (post backquote-as-repr deprecation)
results = sequence.map(`x+23`)

How would this notation imply that x is an argument rather than,
say, a global?


Alex
 

Alex Martelli

Terry said:
What a liberating thought. I sometimes forget that the only thing I
am really stuck with when programming in Python is the core syntax (in
the Ref Manual) and that I am free to wrap or replace anything else
(in the Lib Manual). For a project I hope to start soon, I have been
worried about how to work around or explain away the deficiencies of
Python's reduce(). Instead, I should just write the version I want
for my purposes (especially since speed does not matter). So your
voluminous writing on this has benefited at least one person.

Glad to hear this.

I have thought some about this also. Type objects could be put/left
in types, but as constructors, they are too useful and basic not to
have immediately available (unless builtins was reduced to nearly
nothing).

Yes, I think the types should stay right in the builtins.

Does it have to be in builtins for import statements to work?

No, it would be technically possible to redefine "import"'s semantics
to take it from anywhere.
Direct use of __import__ seems relatively rare.

It depends on what you're comparing against, I guess, but yes,
it's not a staple of most programming styles. It's there more
to be overridden than to be directly called.

I had to re-lineate this to parse apart the three thoughts. A static
frequency-of-use analysis for some large code collection would be
interesting. I might write something eventually.

I'm pretty sure that checking for existence and getting the value
are more frequent operations than setting a new value, and that, in
turn, more frequent than deletion. That's "just the way it is" --
it's much the same as in, e.g., SQL (SELECT overwhelming more used,
INSERT and UPDATE less often, DELETE quite rarely).

I group this with type constructor objects, even though not one
itself

It's a factory function, yes. Not technically a type, but the
connection in terms of use is indeed close.
unless made a method, keep in

A method of every sequence and container on Earth? Eeek.
pow [for the crucial 3-arguments case]
very specialized and rarely used? move to math as pow3
range (or some preferable variant that returns an iterator)
how about iter(3), etc, currently an arg type error?
think of 3 as the sentinel value.
is not typing '0,' worth the wartiness of optional first arg?
and zip seem pretty fundamental
agree except for pow

Yes, I can see pow could be controversial here.


Alex
 

Paul Rubin

Andrew Dalke said:
Yes, I think you're right about that, and I misinterpreted
his statement.

Yes, I was talking about "list-alikes".
Then again, is the change that "all Python list-alikes must
implement stable sort" or that "Python native lists shall
now and forever implement stable sort"?

The change was that native lists will implement stable sort. My gripe
is that if native lists are required to sort stably and list-alikes
can sort unstably, then list.sort and listalike.sort act differently
in a way that can lead to subtle bugs.
That's a statement only that list.sort shall be stable and
not that all .sort() methods must be stable.

It would be icky if some .sort() methods are required to be stable but
others are not.

Note that the most obvious way to implement sort() in C is to call the
standard C library qsort function, which is unstable.
 

David Eppstein

That's a statement only that list.sort shall be stable and
not that all .sort() methods must be stable.

It would be icky if some .sort() methods are required to be stable but
others are not.

Note that the most obvious way to implement sort() in C is to call the
standard C library qsort function, which is unstable.

You're complaining that stability makes implementing a list-alike's sort
trickier. However it also can make using sort simpler. Which do you
think happens more often?
 

Donn Cave

Ville Vainio said:
I asked my non-programmer girlfriend what she thinks your construct
does - she didn't know. She immediately understood what sum(seq) does.

You guys are hysterical. "reduce" is a performance trap waiting
to pounce! (But while profiting from this bogus argument, let's
not support any solution for that problem.) It's hard to
understand, just ask my girlfriend! (Great. By the way, what
does she think of list comprehensions, generators, etc.?)

Go ahead and get rid of reduce, that sounds like a good idea to me.
The reason though is just that it's not very useful in the context
of a language like Python, and it seems to confuse people who draw
the conclusion that Python must be some kind of functional programming
language. This will be a wake-up call on that score.

Donn Cave, (e-mail address removed)
 

Paul Rubin

David Eppstein said:
You're complaining that stability makes implementing a list-alike's sort
trickier. However it also can make using sort simpler. Which do you
think happens more often?

I generally haven't found stability to be important. When I've cared
about doing something other than sorting (possibly unstably) on some
obvious key, I've generally needed some kind of DSU. Just sorting
stably wouldn't be enough. If I'm using DSU anyway, then getting
stability is trivial if I happen to need it.

Anyway, requiring stability makes implementing list.sort trickier in
addition to making listalike.sort trickier. That's no big deal for
CPython or (apparently) Jython, since the work is already done, but
typical sorting libraries don't necessarily provide stability. If
stability were so important so much of the time, those libraries would
provide it.
 

David Eppstein

Paul Rubin said:
I generally haven't found stability to be important. When I've cared
about doing something other than sorting (possibly unstably) on some
obvious key, I've generally needed some kind of DSU. Just sorting
stably wouldn't be enough. If I'm using DSU anyway, then getting
stability is trivial if I happen to need it.

If you're doing the DSU by hand, getting stability is not so hard.
But it's not obvious how to do it with the new key= sort argument for
simplifying DSU. So there was a long discussion on python-dev about how
maybe sort needed yet another keyword argument on top of key= for
specifying that the DSU should include the item positions and be stable;
but this seemed redundant and overcomplicated given that both current
Python sorts are already stable. So Guido ended the discussion by
declaring that sorts would remain stable, hence no extra keyword
argument is necessary.

Since DSU is now built in to the sort mechanism anyway, if you're
rolling your own sort to match that mechanism you shouldn't find it
difficult to include the positions on top of the other DSU you already
have to do.
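
A sketch of that position trick:

data = ['bb', 'aa', 'b', 'a']
# Include the original position in the decoration and even an unstable
# underlying sort yields a stable overall result.
decorated = [(len(x), i, x) for i, x in enumerate(data)]
decorated.sort()
result = [x for _, _, x in decorated]
print(result)   # ['b', 'a', 'bb', 'aa'] -- equal-length items keep input order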
 

Ville Vainio

not support any solution for that problem.) It's hard to
understand, just ask my girlfriend! (Great. By the way, what
does she think of list comprehensions, generators, etc.?)

I was merely arguing that 'reduce' is not more readable or intuitive
than 'sum', which was the argument of OP.
Go ahead and get rid of reduce, that sounds like a good idea to me.

I don't think reduce should be altogether removed, it just shouldn't
be in stdlib. And neither should sum, btw.
The reason though is just that it's not very useful in the context
of a language like Python, and it seems to confuse people who draw
the conclusion that Python must be some kind of functional programming
language. This will be a wake-up call on that score.

I wouldn't mind Python getting more influence from functional realm,
as Python seems to me to be *the* hybrid language that can pull the FP
thing while still remaining practical and intuitive (and delightfully
non-academic).
 

Paul Rubin

Ville Vainio said:
I wouldn't mind Python getting more influence from functional realm,
as Python seems to me to be *the* hybrid language that can pull the FP
thing while still remaining practical and intuitive (and delightfully
non-academic).

Python sometimes seems to go out of its way to thwart the use of
functional style. Look at list.sort returning None, for example.
 

David Eppstein

Paul Rubin said:
Python sometimes seems to go out of its way to thwart the use of
functional style. Look at list.sort returning None, for example.

The issue here is not that it returns None, but that it changes its
input. To be truly functional, it should return a new list and leave
the original list unchanged. Returning None is just a helpful reminder
that it's not functional. Of course, the functional version would often
be less efficient...
 

Paul Rubin

David Eppstein said:
The issue here is not that it returns None, but that it changes its
input. To be truly functional, it should return a new list and leave
the original list unchanged. Returning None is just a helpful reminder
that it's not functional. Of course, the functional version would often
be less efficient...

Hmmm, good point. Returning None is still inconvenient of course.
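
(The functional counterpart did arrive: 2.4's sorted builds a new list
and leaves its input alone.)

original = [3, 1, 2]
ranked = sorted(original)   # a new list comes back...
print(ranked, original)     # [1, 2, 3] [3, 1, 2] -- ...the input is untouched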
 

Steve Lamb

This whole thread is reminiscent of vi vs emacs or an os war or similar.
It's a pity that people with a preferred style should be so dogmatic
that they want to remove language features to prevent others using them.

The difference there is clear. vi vs. emacs, OS A vs. OS B are two
completely different entities. We're talking about the same one here. That
one has a basic philosophy.
The whole 'only one way to do it' concept is almost certainly wrong.

Erm, no.
There should be maximal freedom to express algorithms. As others have
stated min, max,... sum et al are useful specialisations, but because
they cover 99% of the space doesn't mean that reduce is redundant.
-Eliminate reducespeak and control the future-ly yrs-

You have quite a few languages you can do that in. 5+ years in Perl and
I'm done with TIMTOWTDI, thank-you-very-much. I'm glad that in Python I don't
have to learn several different ways to do the same basic thing. I lament
every time I have to step into another language which has several ways to do
the same thing and if at any time Python fits the problem space that language
occupies perfectly (as in the case of Perl) then I advocate the hell out of
it.

I'm glad I no longer have to deal with 4 ways of doing a simple if
statement. I'm glad that there are only two loop constructs; one for
iterating over a sequence and one that runs until a condition is met. It
means that at the core level I can read the code and immediately see what is
going on instead of having to memorize a dozen or so specialized ways of doing
things.

Oddly enough it is in Python that I have had the most fun programming. It
is in Python that I find myself not only the most expressive but the most
elegant in my programming. In Python my code is the clearest and most
concise. I don't for one instant feel constrained by Python. I feel
liberated by it.

Because as much as it helps when reading the code to only have to learn a
minimal set of controls the same applies to writing code as well. When I
approach a problem I don't have to agonize over "well, should I do a
do...until(), a for(;;), a while(), or something else?" It breaks down to
this. Is it a sequence? For. Is it a condition to be met? While. There,
done, move along.
 

Terry Reedy

Alex Martelli said:
The point is that the primary meaning of "reduce" is "diminish",
and when you're summing (positive:) numbers you are not diminishing
anything whatsoever

Yes you are: you are reducing the number of numbers. Data reduction
is a standard term (in US at least) for literally reducing lots of
numbers to a fewer number of numbers, like count, sum, mean,
sum_of_squares, variance, and/or maybe a few others. As the volume of
data generated by observational and experimental studies explodes,
useful reduction becomes even more important.

.. unless you think in terms of multidimensional
arrays and diminishing dimensionality,

Reducing a dimension 1 vector to a dimension 0 number is reduction in
both senses. But even reduction of a homogeneous array to tuple of
statistics is reduction in number if not dimension.

Perhaps my career in statistics and data reduction made reduce() more
immediately obvious to me than some other people.
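
In that spirit, a small sketch: one pass of reduce() collapsing many
numbers into a few statistics (using functools.reduce, its home from
Python 3 on):

from functools import reduce

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
# Reduce many numbers to (count, sum, sum of squares) in one pass;
# mean and variance follow from those.
count, total, sumsq = reduce(
    lambda acc, x: (acc[0] + 1, acc[1] + x, acc[2] + x * x),
    data,
    (0, 0.0, 0.0),
)
mean = total / count
variance = sumsq / count - mean * mean
print(count, mean, variance)   # 8 5.0 4.0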

Terry J. Reedy
 
