In <
[email protected]> Terry Reedy
I hope it's not (1): I want the language I use to include more
"batteries" not fewer.
How many more? Ten? Twenty? A hundred? Ten thousand? Ten million? Where
do you draw the line?
Every extra battery is (say) a hundred extra lines of code, which means
fifty thousand more potential bugs. It means more burden on the
maintainers, and more burden on people learning the language. More
batteries can make it more, not less, difficult to code:
# Python 1.1
"I need a collection of values, should I use a list, a tuple or a dict?"
vs
# Python 5000:
"I need a collection of values, should I use a list, a tuple, a dict, an
array, a bag, a SortedBag, a table, a SortedTable, a NamedTuple, a
Record, a stack, a queue, a deque, a linked list, a HashTable, a doubly-
linked list, a set, a frozenset, a skip list, a jump list, a binary tree,
a B tree, a B+ tree, a SortedList, a red-black tree, an A* tree, a B*
tree, a SortedList, a StringList, a CaseInsensitiveSortedStringList, a
CaseInsensitiveRedBlackTree, a HashSet, a CaseInsensitiveHashSet, an
ArrayList, a ConcurrentQueue, a ConcurrentStack, a KeyedCollection, an
EnumerableBag, a MultiSet, a StringMultiSet, a SortedSet, a
DoubleEndedQueue, a Buffer, a CircularQueue, a heap, a binary search
tree, an AssociatedArray, a Vector, a SparseArray, an XOR-linked list, a
weight-balanced tree, a RandomizedBinarySearchTree, a ThreadedBinaryTree,
an AVL tree, a splay tree, a rope, a binomial heap, a fibonacci heap, a
trie, a B-trie, a judy array, an and/or tree, an enfilade, a treap, a
dancing tree, a queap, or a fusion tree?" (Whew!)
There's nothing wrong with the existence of all these data structures. If
you need one, you can use it. But they don't all need to be included in
the standard library. That just increases the cognitive load on
programmers without really adding that much benefit. I've seen people
stress over the choice of a tuple or a list.
<petpeeve>It seems that a similar "simplicity argument" was invoked to
strip the cmp option from sort in Python 3. Grrrr. Simplicity is
great, but when the drive for it starts causing useful functionality to
be thrown out, then it is going too far. Yes, I know that it is
possible to do everything that sort's cmp option does through clever
tricks with key,
Not that clever. In general, key-based sorting is simpler than cmp-based
sorting. In addition, the use of a key function is a basic technique
which every coder should have in their mental toolbox, and it is
applicable to more than just sorting. E.g. max() and min() also support
key functions in Python 2.6 and better. On the other hand a cmp function
is specific to sorting, and nothing but sorting.
but grokking and coding this maneuver requires a *lot*
more Python-fu than cmp did, which makes this functionality a lot less
accessible to beginners that the intrinsic complexity of the problem
warrants.
I disagree that key-based sorting requires any more Python-fu than cmp
sorting. I believe they're equivalent. There are cmp functions you can
write which aren't (easily?) convertible to key, but most of them (all?)
aren't useful or sensible. E.g. they rely on side-effects, don't define a
sensible sort order, or break stability of sorting:
text = "here is a number of words in some order".split()
sorted(text, cmp=lambda a,b: cmp(a.upper(), b.lower())) ['order', 'some', 'in', 'words', 'of', 'number', 'a', 'is', 'here']
sorted(text[2:] + text[:2], cmp=lambda a,b: cmp(a.upper(), b.lower()))
['is', 'here', 'order', 'some', 'in', 'words', 'of', 'number', 'a']
Key-based sorting doesn't let you do this, but since I can't think of any
application where you want the sort order to be dependent on the initial
order, I believe this is a good thing. It is much harder to write buggy
sorts using key than with cmp.
And for what? To get rid of an *option* that could be easily
disregarded by anyone who found it too "complex"? It makes no sense to
me.</petpeeve>
It's not that cmp is too complex, but that it's mere existence adds
complexity to the list data type, it is usually slow and inefficient, and
it can introduce subtle sorting bugs.
In the past, I've suggested that if people really believe that there is a
use-case for sorting with a comparison function, they should fork it from
the built-in list object and release it on PyPI as a third-party package.
I still believe this.