Clarity vs. code reuse/generality

K

kj

I'm will be teaching a programming class to novices, and I've run
into a clear conflict between two of the principles I'd like to
teach: code clarity vs. code reuse. I'd love your opinion about
it.

The context is the concept of a binary search. In one of their
homeworks, my students will have two occasions to use a binary
search. This seemed like a perfect opportunity to illustrate the
idea of abstracting commonalities of code into a re-usable function.
So I thought that I'd code a helper function, called _binary_search,
that took five parameters: a lower limit, an upper limit, a
one-parameter function, a target value, and a tolerance (epsilon).
It returns the value of the parameter for which the value of the
passed function is within the tolerance of the target value.

This seemed straightforward enough, until I realized that, to be
useful to my students in their homework, this _binary_search function
had to handle the case in which the passed function was monotonically
decreasing in the specified interval...

The implementation is still very simple, but maybe not very clear,
particularly to programming novices (docstring omitted):

def _binary_search(lo, hi, func, target, epsilon):
assert lo < hi
assert epsilon > 0
sense = cmp(func(hi), func(lo))
if sense == 0:
return None
target_plus = sense * target + epsilon
target_minus = sense * target - epsilon
while True:
param = (lo + hi) * 0.5
value = sense * func(param)
if value > target_plus:
hi = param
elif value < target_minus:
lo = param
else:
return param

if lo == hi:
return None

My question is: is the business with sense and cmp too "clever"?

Here's the rub: the code above is more general (hence more reusable)
by virtue of this trick with the sense parameter, but it is also
a bit harder to understand.

This not an unusual situation. I find that the processing of
abstracting out common logic often results in code that is harder
to read, at least for the uninitiated...

I'd love to know your opinions on this.

TIA!

kj
 
S

Steven D'Aprano

... I find that the processing of
abstracting out common logic often results in code that is harder to
read ...

Yes. There is often a conflict between increasing abstraction and ease of
understanding.



[...]
The implementation is still very simple, but maybe not very clear,
particularly to programming novices (docstring omitted):

def _binary_search(lo, hi, func, target, epsilon):
assert lo < hi
assert epsilon > 0

You're using assert for data sanitation. That is a very bad habit for you
to be teaching novices. Your code will fail to behave as expected
whenever the caller runs Python with the -O switch.


[...]
My question is: is the business with sense and cmp too "clever"?

For production code? Maybe -- you would need to profile the code, and
compare it to a less general, more simple-minded function, and see which
performs better. If there is an excessive performance penalty, that's a
good sign that perhaps you're being too clever, too general, or too
abstract -- or all three.

Too clever for novices? That depends on how inexperienced they are -- are
they new to Python or new to programming in general? Are they children or
adults? Do they have a PhD degree or are they still at primary school? It
depends on their experience and their intelligence -- dare I say it, some
people will *never* get programming, let alone code-reuse. It also
depends on how you lead up to it -- are you dropping them straight into
the abstract code, or giving them two pieces of nearly the same code and
showing them how to pull out the common functionality?
 
A

Alan G Isaac

The context is the concept of a binary search. In one of their
homeworks, my students will have two occasions to use a binary
search. This seemed like a perfect opportunity to illustrate the
idea of abstracting commonalities of code into a re-usable function.
So I thought that I'd code a helper function, called _binary_search,
that took five parameters: a lower limit, an upper limit, a
one-parameter function, a target value, and a tolerance (epsilon).
It returns the value of the parameter for which the value of the
passed function is within the tolerance of the target value.

This seemed straightforward enough, until I realized that, to be
useful to my students in their homework, this _binary_search function
had to handle the case in which the passed function was monotonically
decreasing in the specified interval...

The implementation is still very simple, but maybe not very clear,
particularly to programming novices (docstring omitted):

def _binary_search(lo, hi, func, target, epsilon):
assert lo < hi
assert epsilon > 0
sense = cmp(func(hi), func(lo))
if sense == 0:
return None
target_plus = sense * target + epsilon
target_minus = sense * target - epsilon
while True:
param = (lo + hi) * 0.5
value = sense * func(param)
if value > target_plus:
hi = param
elif value < target_minus:
lo = param
else:
return param

if lo == hi:
return None



1. Don't use assertions to test argument values!

2.
from scipy.optimize import bisect
def _binary_search(lo, hi, func, target, epsilon):
def f(x): return func(x) - target
return bisect(f, lo, high, xtol=epsilon)

3. If you don't want to use SciPy (why?), have them
implement http://en.wikipedia.org/wiki/Bisection_method#Pseudo-code
to produce their own `bisect` function.

hth,
Alan Isaac
 
A

Aahz

This seemed straightforward enough, until I realized that, to be
useful to my students in their homework, this _binary_search function
had to handle the case in which the passed function was monotonically
decreasing in the specified interval...

def _binary_search(lo, hi, func, target, epsilon):
assert lo < hi
assert epsilon > 0
sense = cmp(func(hi), func(lo))
if sense == 0:
return None
target_plus = sense * target + epsilon
target_minus = sense * target - epsilon
while True:
param = (lo + hi) * 0.5
value = sense * func(param)
if value > target_plus:
hi = param
elif value < target_minus:
lo = param
else:
return param

if lo == hi:
return None

My question is: is the business with sense and cmp too "clever"?

First of all, cmp() is gone in Python 3, unfortunately, so I'd avoid
using it. Second, assuming I understand your code correctly, I'd change
"sense" to "direction" or "order".
 
B

Bearophile

I'm will be teaching a programming class to novices, and I've run
into a clear conflict between two of the principles I'd like to
teach: code clarity vs. code reuse.

They are both important principles, but clarity is usually more
important because short code that can't be read can't be fixed and
modified, while long code that can be read is alive still.

So I thought that I'd code a helper function, called _binary_search,
that took five parameters: a lower limit, an upper limit, a
one-parameter function, a target value, and a tolerance (epsilon).

Five arguments are a bit too much, even for non-novices. 2-3 arguments
are more than enough for novices.

Where are doctests'? Novices have to learn to think of tests as part
of the normal code :)

I think the main problem in all this discussion is that generally to
learn to program you have to write code yourself, so your students
have to invent and write their own binary search. Reading your code
they are going to learn much less. Creating a binary search from
almost scratch can turn out to be too much hard for them, it means you
have to ask them to invent something simpler :)

Programming is (among other things) problem solving, so they have to
learn to solve not easy problems from the beginning. Showing them
famous algorithms (and very good code to copy from) can be useful, but
it's less important than learning basic problem solving skills, and
much less important than learning why solving problems has to became
their purpose and pleasure :)

Bye,
bearophile
 
B

Bruno Desthuilliers

kj a écrit :
I'm will be teaching a programming class to novices, and I've run
into a clear conflict between two of the principles I'd like to
teach: code clarity vs. code reuse. I'd love your opinion about
it.

(snip - others already commented on this code)
Here's the rub: the code above is more general (hence more reusable)
by virtue of this trick with the sense parameter, but it is also
a bit harder to understand.

Perhaps better naming (s/sense/direction/g ?) and a small comment could
help ?
This not an unusual situation. I find that the processing of
abstracting out common logic often results in code that is harder
to read, at least for the uninitiated...

IOW : the notion of "clarity" depends on the context (including the
reader). Anyway, there are algorithms (or implementations of...) that
are definitly and inherently on the 'hard to read' side - well,
complexity is something you have to learn to live with, period. The key
is to try and make it as simple and readable *as possible*.

Also, factoring out common code - or even slightly complex code - often
makes _client_ code (much) more readable.
 
J

Jean-Michel Pichavant

kj said:
I'm will be teaching a programming class to novices, and I've run
into a clear conflict between two of the principles I'd like to
teach: code clarity vs. code reuse. I'd love your opinion about
it.
[...]
sense = cmp(func(hi), func(lo))
if sense == 0:
return None
My suggestion on how to improve this part for python novices:
# assuming func is monotonous
if func(high) > func(low):
direction = 1 # aka sign of the func derivative
elif func(low) > func(high):
direction = -1
else:
return None

Avoid using cmp, this will prevent the "what the hell is cmp doing ?" ,
unless you want to teach your students how to search for inline python
documentation.
Some other list members have suggested to improve the variable naming. I
couldn't agree more, in your case, I think clarity can be achieved as
well with the abstraction (these notions does not necessarily collide).

Here's a link on my how-to-name bible :
http://tottinge.blogsome.com/meaningfulnames/

Jean-Michel
 
L

Lie Ryan

kj said:
I'm will be teaching a programming class to novices, and I've run
into a clear conflict between two of the principles I'd like to
teach: code clarity vs. code reuse. I'd love your opinion about
it.

Sometimes when the decision between clarity and generality becomes too
hard; you might fare better to save the code, go out for a walk to
forget the code, and start a blank text editor. Being in a fresh mind,
you may found an alternative approach, e.g.:

from __future__ import division
def binary_search(lo, hi, func, target, epsilon):
# reverses hi and lo if monotonically decreasing
lo, hi = (lo, hi) if func(hi) > func(lo) else (hi, lo)

param = (lo + hi) / 2

# loop while not precise enough
while abs(func(param) - target) > epsilon:
param = (lo + hi) / 2

if target < func(param):
hi = param
else:
lo = param
return param
 
P

Paul Rubin

kj said:
sense = cmp(func(hi), func(lo))
if sense == 0:
return None
target_plus = sense * target + epsilon
target_minus = sense * target - epsilon
...

The code looks confusing to me and in some sense incorrect. Suppose
func(hi)==func(lo)==target. In this case the solver gives up
and returns None even though it already has found a root.

Also, the stuff with the sense parameter, and target_minus and
target_plus looks messy. I do like to use cmp. I'd just write
something like (untested):

def _binary_search(lo, hi, func, target, epsilon):
y_hi, y_lo = func(hi), func(lo)

while True:
x_new = (lo + hi) * 0.5
y_new = func(x_new)
if abs(y_new - target) < epsilon:
return x_new
elif cmp(y_new, target) == cmp(y_hi, target):
hi = x_new
else:
lo = x_new
if lo == hi:
return None

This uses an extra couple function calls in that trivial case, of
course.
 
S

Steven D'Aprano

In <[email protected]> Alan G Isaac


Out of curiosity, where does this come from?

Assertions are disabled when you run Python with the -O (optimise) flag.
If you rely on assertions to verify data, then any time the user runs
python with -O your code will be running without error checking.

assert should be used to verify program logic, not to sanitize data.
 
C

Charles Yeomans

Assertions are disabled when you run Python with the -O (optimise)
flag.
If you rely on assertions to verify data, then any time the user runs
python with -O your code will be running without error checking.

assert should be used to verify program logic, not to sanitize data.


I wouldn't describe the use of an assert statement in the original
code as data sanitizing, but rather as a precondition. And with that
description, the use of an assert statement that might be compiled
away is not unreasonable; indeed, it certainly is so in the context of
design by contract.

Charles Yeomans
 
T

Terry Reedy

Steven said:
Assertions are disabled when you run Python with the -O (optimise) flag.
If you rely on assertions to verify data, then any time the user runs
python with -O your code will be running without error checking.

assert should be used to verify program logic, not to sanitize data.

In other words, assertions should never happen, whereas your assertions
could easily happen with 'bad' input. An example of program logic
assertion would be 'assert <loop-invariant>' at the top of the loop.
That would be useful for development, but would not be needed for
production use of tested code.

tjr
 
P

Pablo Torres N.

The context is the concept of a binary search.  In one of their
homeworks, my students will have two occasions to use a binary
search.  This seemed like a perfect opportunity to illustrate the
idea of abstracting commonalities of code into a re-usable function.
So I thought that I'd code a helper function, called _binary_search,
that took five parameters: a lower limit, an upper limit, a
one-parameter function, a target value, and a tolerance (epsilon).
It returns the value of the parameter for which the value of the
passed function is within the tolerance of the target value.

Five params for a novice function is overkill. I'd drop the epsilon
thing and, assuming you are searching in a list, tuple or something
similar, make the lower limit default to the first index and the upper
default to the last one.

In fact, I'd let them to realize that a function is convenient, and
base some of the grading in whether they wrote it or not. Just a
thought.
 
K

kj

The code looks confusing to me and in some sense incorrect. Suppose
func(hi)==func(lo)==target.

In hindsight, I too think that it is incorrect, but for a different
reason. I've rewritten it like this:

sense = cmp(func(hi), func(lo))
assert sense != 0, "func is not strictly monotonic in [lo, hi]"

I regard the very special case of func(hi)==func(lo)==target as
pathological (analogous to the fact that a stopped watch is "exactly
right" twice a day), and not one I care to support.

Thanks for your feedback!

kj
 
K

kj

In said:
http://docs.python.org/reference/simple_stmts.html#grammar-token-assert_stmt
"The current code generator emits no code for an assert statement when optimization is requested at compile time."

Sorry, this doesn't say anything like "don't use assertions to test
argument values". I'm aware of the fact that assertions are silenced
under certain circumstances. But where does the principle 1. in
your first reply come from? I've never seen it before, and would
like to understand the reasoning behind it.

kj
 
P

Pablo Torres N.

Sorry, this doesn't say anything like "don't use assertions to test
argument values".  I'm aware of the fact that assertions are silenced
under certain circumstances.  But where does the principle 1. in
your first reply come from?  I've never seen it before, and would
like to understand the reasoning behind it.

kj

But...if no code is generated for assertions on some occasions, then the
parameters would go unchecked, potentially breaking your code in said
occasions.
 
K

kj

In said:
timization is requested at compile time."
But...if no code is generated for assertions on some occasions, then the
parameters would go unchecked, potentially breaking your code in said
occasions.

This implies that code that uses *any* assert statement (other than
perhaps the trivial and meaningless ones like "assert True") is
liable to break, because whatever it is that these assert statements
are checking "on some occasions, ... would go unchecked, potentially
breaking your code."

I'm beginning to think that the original "precept" was simply "cargo
cult," i.e. one of those rules that are parroted without being
fully understood.

As I wrote in my original post, the function I posted was an internal
("helper") function, not to be used outside of the file. This fact
was further (ahem) underscored by the leading underscore in the
function's name. Under these circumstances, the assert statements
seem to me perfectly in keeping with the intended use for them.

kj
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,769
Messages
2,569,577
Members
45,052
Latest member
LucyCarper

Latest Threads

Top