While we're talking about annoyances

Steven D'Aprano · Apr 29, 2007

Am I the only one who finds that I'm writing more documentation than code?

I recently needed to write a function to generate a rank table from a
list. That is, a list of ranks, where the rank of an item is the position
it would be in if the list were sorted:

alist = list('defabc')
ranks = [3, 4, 5, 0, 1, 2]

To do that, I needed to generate an index table first. In the book
"Numerical Recipes in Pascal" by William Press et al there is a procedure
to generate an index table (46 lines of code) and one for a rank table
(five lines).

In Python, my index function is four lines of code and my rank function is
five lines. I then wrote three more functions for verifying that my index
and rank tables were calculated correctly (17 more lines) and four more
lines to call doctest, giving a total of 30 lines of code.

I also have 93 lines of documentation, including doctests, or three
lines of documentation per line of code.

For those interested, here is how to generate an index table and rank
table in Python:

def index(sequence):
decorated = zip(sequence, xrange(len(sequence)))
decorated.sort()
return [idx for (value, idx) in decorated]

def rank(sequence):
table = [None] * len(sequence)
for j, idx in enumerate(index(sequence)):
table[idx] = j
return table

You can write your own damn documentation. *wink*

GHUM · Apr 29, 2007

Steven,

def index(sequence):
decorated = zip(sequence, xrange(len(sequence)))
decorated.sort()
return [idx for (value, idx) in decorated]

would'nt that be equivalent code?

def index(sequence):
return [c for _,c in sorted((b,a) for a, b in
enumerate(sequence))]

tested, gave same results. But worsens your doc2code ratio

Harald Armin Massa
--

Michael Hoffman · Apr 29, 2007

GHUM said:
Steven,

def index(sequence):
decorated = zip(sequence, xrange(len(sequence)))
decorated.sort()
return [idx for (value, idx) in decorated]

Click to expand...

would'nt that be equivalent code?

def index(sequence):
return [c for _,c in sorted((b,a) for a, b in
enumerate(sequence))]

Or even these:

def index(sequence):
return sorted(range(len(sequence)), key=sequence.__getitem__)

def rank(sequence):
return sorted(range(len(sequence)),
key=index(sequence).__getitem__)

Hint: if you find yourself using a decorate-sort-undecorate pattern,
sorted(key=func) or sequence.sort(key=func) might be a better idea.

=?ISO-8859-1?Q?BJ=F6rn_Lindqvist?= · Apr 29, 2007

To do that, I needed to generate an index table first. In the book
"Numerical Recipes in Pascal" by William Press et al there is a procedure
to generate an index table (46 lines of code) and one for a rank table
(five lines).

51 lines total.

In Python, my index function is four lines of code and my rank function is
five lines. I then wrote three more functions for verifying that my index
and rank tables were calculated correctly (17 more lines) and four more
lines to call doctest, giving a total of 30 lines of code.

So 9 lines for Python, excluding tests.

I also have 93 lines of documentation, including doctests, or three
lines of documentation per line of code.

Then, without documentation, Python is roughly 560% (51/9) as
efficient as Pascal. But with documentation (assuming you need the
same amount of documentation for the Python code as the Pascal code),
(51 + 93)/(9 + 93) = 1.41 so only 141% as efficient as Pascal.

I wonder what that means? Maybe Python the language is approaching the
upper bound for how efficient an imperative programming language can
be? On the other hand, there seem to be some progress that could be
made to reduce the amount of work in writing documentation.
Documentation in Esperanto instead of English maybe?

Ben Finney · Apr 29, 2007

BJÃ¶rn Lindqvist said:
On the other hand, there seem to be some progress that could be made
to reduce the amount of work in writing documentation.
Documentation in Esperanto instead of English maybe?

Lojban <URL:http://www.lojban.org/> is both easier to learn world-wide
than Euro-biased Esperanto, and computer-parseable. Seems a better[0]_
choice for computer documentation to me.

... _[0] ignoring the fact that it's spoken by even fewer people than
Esperanto.

Jarek Zgoda · Apr 29, 2007

Ben Finney napisa³(a):

On the other hand, there seem to be some progress that could be made
to reduce the amount of work in writing documentation.
Documentation in Esperanto instead of English maybe?

Click to expand...

Lojban <URL:http://www.lojban.org/> is both easier to learn world-wide
than Euro-biased Esperanto, and computer-parseable. Seems a better[0]_
choice for computer documentation to me.

German seems to be less "wordy" than English, despite the fact that most
of nouns is much longer.

Arnaud Delobelle · Apr 29, 2007

GHUM said:
GHUM said:

Steven,

def index(sequence):
decorated = zip(sequence, xrange(len(sequence)))
decorated.sort()
return [idx for (value, idx) in decorated]

Click to expand...

Click to expand...

would'nt that be equivalent code?

Click to expand...

def index(sequence):
return [c for _,c in sorted((b,a) for a, b in
enumerate(sequence))]

Click to expand...

Or even these:

def index(sequence):
return sorted(range(len(sequence)), key=sequence.__getitem__)

def rank(sequence):
return sorted(range(len(sequence)),
key=index(sequence).__getitem__)

Better still:

def rank(sequence):
return index(index(sequence))

But really these two versions of rank are slower than the original one
(as sorting a list is O(nlogn) whereas filling a table with
precomputed values is O(n) ).

Anyway I would like to contribute my own index function:

def index(seq):
return sum(sorted(map(list,enumerate(seq)), key=list.pop), [])

It's short and has the advantage of being self-documenting, which will
save Steven a lot of annoying typing I hope

Who said Python
couldn't rival with perl?

Paul Rubin · Apr 29, 2007

Steven D'Aprano said:
I recently needed to write a function to generate a rank table from a
list. That is, a list of ranks, where the rank of an item is the position
it would be in if the list were sorted:

alist = list('defabc')
ranks = [3, 4, 5, 0, 1, 2]

fst = operator.itemgetter(0) # these should be builtins...
snd = operator.itemgetter(1)

ranks=map(fst, sorted(enumerate(alist), key=snd))

Arnaud Delobelle · Apr 29, 2007

Steven D'Aprano said:
Steven D'Aprano said:

I recently needed to write a function to generate a rank table from a
list. That is, a list of ranks, where the rank of an item is the position
it would be in if the list were sorted:

Click to expand...

alist = list('defabc')
ranks = [3, 4, 5, 0, 1, 2]

Click to expand...

fst = operator.itemgetter(0) # these should be builtins...
snd = operator.itemgetter(1)

ranks=map(fst, sorted(enumerate(alist), key=snd))

This is what the OP calls the index table, not the ranks table (both
are the same for the example above, but that's an unfortunate
coincidence...)

Raymond Hettinger · Apr 29, 2007

[Steven D'Aprano]

I recently needed to write a function to generate a rank table from a
list. That is, a list of ranks, where the rank of an item is the position
it would be in if the list were sorted:

alist = list('defabc')
ranks = [3, 4, 5, 0, 1, 2] .. . .
def rank(sequence):
table = [None] * len(sequence)
for j, idx in enumerate(index(sequence)):
table[idx] = j
return table

FWIW, you can do ranking faster and more succinctly with the sorted()
builtin:

def rank(seq):
return sorted(range(len(seq)), key=seq.__getitem__)

Raymond

Alex Martelli · Apr 30, 2007

Arnaud Delobelle said:
...
But really these two versions of rank are slower than the original one
(as sorting a list is O(nlogn) whereas filling a table with
precomputed values is O(n) ).

Wrong, because the original one also had a sort step, of course, so it
was also, inevitably, O(N log N) -- I've quoted the .sort step above.

Anyway I would like to contribute my own index function:

def index(seq):
return sum(sorted(map(list,enumerate(seq)), key=list.pop), [])

It's short and has the advantage of being self-documenting, which will
save Steven a lot of annoying typing I hope Who said Python
couldn't rival with perl?

sum is for summing NUMBERS -- using it on lists is O(N squared).

So, this solution is asymptotically VERY slow, as well as obfuscated.

Alex

Michael Hoffman · Apr 30, 2007

Alex said:
Wrong, because the original one also had a sort step, of course, so it
was also, inevitably, O(N log N) -- I've quoted the .sort step above.

Well, counting the index() function that is called in both cases, the
original rank() had one sort, but my version has two sorts.

Alex Martelli · Apr 30, 2007

Michael Hoffman said:
Well, counting the index() function that is called in both cases, the
original rank() had one sort, but my version has two sorts.

That doesn't affet the big-O behavior -- O(N log N) holds whether you
have one sort, or three, or twentyseven.

Alex

Arnaud Delobelle · Apr 30, 2007

...

Wrong, because the original one also had a sort step, of course, so it
was also, inevitably, O(N log N) -- I've quoted the .sort step above.

I am fully aware of the meaning of O(...). Nevertheless (speed !=
asymptotic speed) and one sort is still better than two sorts IMHO.
Moreover the second sort is redundant and less clear than a simple
loop.

Anyway I would like to contribute my own index function:

Click to expand...

def index(seq):
return sum(sorted(map(list,enumerate(seq)), key=list.pop), [])

Click to expand...

It's short and has the advantage of being self-documenting, which will
save Steven a lot of annoying typing I hope Who said Python
couldn't rival with perl?

Click to expand...

sum is for summing NUMBERS -- using it on lists is O(N squared).

So, this solution is asymptotically VERY slow, as well as obfuscated.

And it was also a JOKE. There were some clues:
* I claimed that the function was self-documenting, even though it
was obviously obfuscated (as you rightly pointed out).
* It relies on a side effect of the 'key' function list.pop, which
is very bad form.
* It is indeed very slow (yes, sum() is not the best for lists)
* I mentioned that Python could rival perl.

I was meant to be a clumsy but 'concise' amalgam that would perform
the task (although not efficiently) while being difficult to make
sense of.

Michael Hoffman · Apr 30, 2007

Alex said:
That doesn't affet the big-O behavior -- O(N log N) holds whether you
have one sort, or three, or twentyseven.

I've taught programming classes before, and I would have had to fail
anybody who misunderstood speed badly enough to claim that something
repeating an O(N log N) algorithm 27 times was no faster than doing it
once. ;-)

As Arnaud points out, asymptotic behavior is not the same as speed. His
original statement that the more recently proposed definitions of rank()
are slower than the OP's may be correct. And if it's not, it's not
because they're all O(N log N).

Alex Martelli · Apr 30, 2007

Michael Hoffman said:
I've taught programming classes before, and I would have had to fail
anybody who misunderstood speed badly enough to claim that something
repeating an O(N log N) algorithm 27 times was no faster than doing it
once. ;-)

As for me, I would fail somebody who thought they could compare speed
this way -- if the O(N log N) executed 27 times took (on a given
machine) 1 millisecond times N times log N, and the other one (on the
same machine) 1 second times N times log N (and in the big-O notation
you can NEVER infer what the multiplicative constant is), for example,
such a comparison would be totally of base.

As Arnaud points out, asymptotic behavior is not the same as speed. His
original statement that the more recently proposed definitions of rank()
are slower than the OP's may be correct. And if it's not, it's not
because they're all O(N log N).

And if it is, it's not because of the "one sort" versus "two sorts": by
that sole criterion you just cannot guess (sensibly) at speed (measuring
is much better, though it has its own pitfalls).

Alex

Arnaud Delobelle · May 1, 2007

...

Again we're not talking about big-O: we're talking about which one is
faster.

As for me, I would fail somebody who thought they could compare speed
this way -- if the O(N log N) executed 27 times took (on a given
machine) 1 millisecond times N times log N, and the other one (on the
same machine) 1 second times N times log N (and in the big-O notation
you can NEVER infer what the multiplicative constant is), for example,
such a comparison would be totally of base.

Of course you are right: this is basic asymptotic analysis.
You also know very well that this is not what I or Michael were
thinking of. He was simply saying that repeating the *same thing* 27
times takes more time than simply doing it once, as it was made clear
by my earlier post that his solution involved repeating the same
index() function twice.

And if it is, it's not because of the "one sort" versus "two sorts": by
that sole criterion you just cannot guess (sensibly) at speed (measuring
is much better, though it has its own pitfalls).

I'm afraid you can draw some conclusions in this case, precisely
*because* one version has one sort less than the other (that is what I
was hinting at in my first post).

Let's say the time complexity of the first sort is:
f(n) ~ Knlogn
The time complexity of the second sort is:
g(n) ~ Lnlogn
The time complexity of the simple loop that can replace the second
sort is:
h(n) ~ Hn

Now because the whole algorithm consists of doing one sort THEN the
second thing(sort or loop), the time complexities of the two versions
are as follow:

Two sort version:
f(n)+g(n) ~ Knlogn+Lnlogn
~ (K+L)nlogn
One sort version:
f(n)+h(n) ~ Knlogn + Hn
~ Knlogn(1+H/Klogn)
~ Knlogn

So the two sort version is (K+L)/K times slower than the one-sort
version asymptotically.
(in practice I reckon K=L so that makes it twice slower)

Tim Roberts · May 2, 2007

Michael Hoffman said:
Hint: if you find yourself using a decorate-sort-undecorate pattern,
sorted(key=func) or sequence.sort(key=func) might be a better idea.

Is it? I thought I remember reading on this very list some years ago that
the performance of sequence.sort became much, much worse when a key
function was supplied.

I've tended to favor the "Schwarzian transform" (decorate-sort-undecorate)
because of that.

Steven D'Aprano · May 2, 2007

Is it? I thought I remember reading on this very list some years ago that
the performance of sequence.sort became much, much worse when a key
function was supplied.

You're probably thinking of a comparison function:

sequence.sort(cmp=lambda x, y: cmp(y, x))

is a slow way of sorting a sequence in reverse order. The fast
way is the two liner:

sequence.sort()
sequence.reverse()

Faster(?) still is:

sequence.sort(reverse=True)

I've tended to favor the "Schwarzian transform" (decorate-sort-undecorate)
because of that.

That's what the key= argument does. cmp= is slow because the comparison
function is called for EVERY comparison. The key= function is only called
once per element.

Michael Hoffman · May 2, 2007

Steven said:
That's what the key= argument does. cmp= is slow because the comparison
function is called for EVERY comparison. The key= function is only called
once per element.

Right. Using sort(key=keyfunc) is supposed to be faster than
decorate-sort-undecorate. And I think it is clearer too.

Trouble with prediction code, for the life of me I can't figure out why it isnt running properly. Help would be appreciated.	0	Jul 8, 2023
Python: Code is ignoring the if and else	11	Aug 3, 2013
My newbie annoyances so far	25	Apr 27, 2007
Range / empty list issues??	1	Dec 11, 2023
Yet another "simple" headscratcher	4	May 31, 2014
When draggable=false & draggable=true dont work ?	1	Jan 14, 2023
Class attributes / methods..... full Python script	1	Mar 1, 2010
Question of Python second loop break and indexes	1	May 9, 2012

While we're talking about annoyances

Steven D'Aprano

GHUM

Michael Hoffman

=?ISO-8859-1?Q?BJ=F6rn_Lindqvist?=

Ben Finney

Jarek Zgoda

Arnaud Delobelle

Paul Rubin

Arnaud Delobelle

Raymond Hettinger

Alex Martelli

Michael Hoffman

Alex Martelli

Arnaud Delobelle

Michael Hoffman

Alex Martelli

Arnaud Delobelle

Tim Roberts

Steven D'Aprano

Michael Hoffman

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads