Exception as the primary error handling mechanism?

Peng Yu

Peng said:
I observe that Python libraries primarily use exceptions for error
handling rather than error codes.

In the article API Design Matters by Michi Henning

Communications of the ACM
Vol. 52 No. 5, Pages 46-56
10.1145/1506409.1506424
http://cacm.acm.org/magazines/2009/5/24646-api-design-matters/fulltext

It says "Another popular design flaw—namely, throwing exceptions for
expected outcomes—also causes inefficiencies because catching and
handling exceptions is almost always slower than testing a return
value."

My observation contradicts the above statement by Henning. If
my observation is wrong, please just ignore my question below.

Otherwise, could some Python expert explain to me why exceptions are
widely used for error handling in Python? Is it because efficiency
is not the primary goal of Python?

Correct; programmer efficiency is a more important goal for Python.
Python is ~60-100x slower than C;[1] if someone is worried about the
inefficiency caused by exceptions, then they're using completely the
wrong language.

Could somebody let me know how Python calls and exceptions are
dispatched? Is there a reference for it?

The source?

http://python.org/ftp/python/2.6.4/Python-2.6.4.tgz

These are really deep internals that - if they really concern you - need
intensive study, not casual reading of introductory documents. IMHO you
shouldn't worry, but then, there are a lot of things you seem to care
about that I wouldn't... :)

For my own interest, I want to understand the run-time behavior of Python
and what details make it much slower. Although people choose Python
for its programming efficiency, sometimes the runtime still
matters. This is an important aspect of the language. I wonder why
this is not even documented. Why does everybody have to go to the source
code to understand it?

Are you sure that there is no document that describes how Python
works internally (including exceptions)?
 
Martin v. Loewis

For my own interest, I want to understand the run-time behavior of Python
and what details make it much slower. Although people choose Python
for its programming efficiency, sometimes the runtime still
matters. This is an important aspect of the language. I wonder why
this is not even documented. Why does everybody have to go to the source
code to understand it?

There are two answers to this question:

a) Because the source is the most precise and most complete way of
documenting it. Any higher-level documentation would necessarily be
incomplete.
b) Because nobody has contributed documentation.

The two causes correlate: because writing documentation of VM
internals takes a lot of effort and is of questionable use, nobody
has written any.

Are you sure that there is no document that describes how Python
works internally (including exceptions)?

Such documents certainly exist, but not as part of the Python
distribution. See

http://wiki.python.org/moin/CPythonVmInternals

for one such document.

Regards,
Martin
 
Terry Reedy

For my own interest, I want to understand the run-time behavior of Python

That depends on the implementation.

and what details make it much slower.

A language feature that slows all implementations is the dynamic
name/slot binding and resolution. Any implementation can be made faster
by restricting the dynamism (which makes the implementation one of a
subset of Python).

Although people choose Python
for its programming efficiency, sometimes the runtime still
matters.

There is no 'the' runtime. Whether or not there even *is* a runtime, as
usually understood, is a matter of the implementation.

This is an important aspect of the language.

It is an aspect of each implementation, of which there are now more than
one.

Terry Jan Reedy
 
Michi

In the article API Design Matters by Michi Henning

Communications of the ACM
Vol. 52 No. 5, Pages 46-56
10.1145/1506409.1506424
http://cacm.acm.org/magazines/2009/5/24646-api-design-matters/fulltext

It says "Another popular design flaw—namely, throwing exceptions for
expected outcomes—also causes inefficiencies because catching and
handling exceptions is almost always slower than testing a return
value."

My observation contradicts the above statement by Henning. If
my observation is wrong, please just ignore my question below.

Seeing that quite a few people have put their own interpretation on
what I wrote, I figured I'll post a clarification.

The quoted sentence appears in a section of the article that deals
with efficiency. I point out in that section that bad APIs often have
a price not just in terms of usability and defect rate, but that they
are often inefficient as well. (For example, wrapper APIs often
require additional memory allocations and/or data copies.) Incorrect
use of exceptions also incurs an efficiency penalty.

In many language implementations, exception handling is expensive;
significantly more expensive than testing a return value. Consider the
following:

int x;
try {
    x = func();
} catch (SomeException) {
    doSomething();
    return;
}
doSomethingElse();

Here is the alternative without exceptions. (func() returns
SpecialValue instead of throwing.)

int x;
x = func();
if (x == SpecialValue) {
    doSomething();
    return;
}
doSomethingElse();

In many language implementations, the second version is considerably
faster, especially when the exception may be thrown from deep in the
bowels of func(), possibly many frames down the call tree.

If func() throws an exception for something that routinely occurs in
the normal use of the API, the extra cost can be noticeable. Note that
I am not advocating not to use exceptions. I *am* advocating to not
throw exceptions for conditions that are not exceptional.

The classic example of this is a lookup function that, for example,
retrieves the value of an environment variable, does a table lookup, or
similar. Many such APIs throw an exception when the lookup fails
because the key isn't in the table. However, very often, looking for
something that isn't there is a common case, such as when looking for
a value and, if the value isn't present already, adding it. Here is an
example of this:

KeyType k = ...;
ValueType v;

try {
    v = collection.lookup(k);
} catch (NotFoundException) {
    collection.add(k, defaultValue);
    v = defaultValue;
}
doSomethingWithValue(v);

The same code if collection doesn't throw when I look up something
that isn't there:

KeyType k = ...;
ValueType v;

v = collection.lookup(k);
if (v == null) {
    collection.add(k, defaultValue);
    v = defaultValue;
}
doSomethingWithValue(v);

The problem is that, if I do something like this in a loop, and the
loop is performance-critical, the exception version can cause a
significant penalty.

As the API designer, when I make the choice between returning a
special value to indicate some condition, or throwing an exception, I
should consider the following questions:

* Is the special condition such that, under most conceivable
circumstances, the caller will treat the condition as an unexpected
error?

* Is it appropriate to force the caller to deal with the condition in
a catch-handler?

* If the caller fails to explicitly deal with the condition, is it
appropriate to terminate the program?

Only if the answer to these questions is "yes" is it appropriate to
throw an exception. Note the third question, which is often forgotten.
By throwing an exception, I not only force the caller to handle the
exception with a catch-handler (as opposed to leaving the choice to
the caller), I also force the caller to *always* handle the exception:
if the caller wants to ignore the condition, he/she still has to write
a catch-handler and failure to do so terminates the program.

Apart from the potential performance penalty, throwing exceptions for
expected outcomes is bad also because it forces a try-catch block on
the caller. One example of this is the .NET socket API: if I do non-
blocking I/O on a socket, I get an exception if no data is ready for
reading (which is the common and expected case), and I get a zero
return value if the connection was lost (which is the uncommon and
unexpected case).

In other words, the .NET API gets this completely the wrong way round.
Code that needs to do non-blocking reads from a socket turns into a
proper mess as a result because the outcome of a read() call is tri-
state:

* Data was available and returned: no exception

* No data available: exception

* Connection lost: no exception

Because such code normally lives in a loop that decrements a byte
count until the expected number of bytes have been read, the control
flow becomes really awkward because the successful case must be dealt
with in both the try block and the catch handler, and the error
condition must be dealt with in the try block as well.

If the API did what it should, namely, throw an exception when the
connection is lost, and not throw when I do a read (whether data was
ready or not), the code would be far simpler and far more
maintainable.

At no point did I ever advocate not to use exception handling.
Exceptions are the correct mechanism to handle errors. However, what
is considered an error is very much in the eye of the beholder. As the
API creator, if I indicate errors with exceptions, I make a policy
decision about what is an error and what is not. It behooves me to be
conservative in that policy: I should throw exceptions only for
conditions that are unlikely to arise during routine and normal use of
the API.

Cheers,

Michi.
 
MRAB

Michi said:
Seeing that quite a few people have put their own interpretation on
what I wrote, I figured I'll post a clarification.

The quoted sentence appears in a section of the article that deals
with efficiency. I point out in that section that bad APIs often have
a price not just in terms of usability and defect rate, but that they
are often inefficient as well. (For example, wrapper APIs often
require additional memory allocations and/or data copies.) Incorrect
use of exceptions also incurs an efficiency penalty.

In many language implementations, exception handling is expensive;
significantly more expensive than testing a return value. Consider the
following:

int x;
try {
    x = func();
} catch (SomeException) {
    doSomething();
    return;
}
doSomethingElse();

Here is the alternative without exceptions. (func() returns
SpecialValue instead of throwing.)

int x;
x = func();
if (x == SpecialValue) {
    doSomething();
    return;
}
doSomethingElse();

In many language implementations, the second version is considerably
faster, especially when the exception may be thrown from deep in the
bowels of func(), possibly many frames down the call tree.

If func() throws an exception for something that routinely occurs in
the normal use of the API, the extra cost can be noticeable. Note that
I am not advocating not to use exceptions. I *am* advocating to not
throw exceptions for conditions that are not exceptional.

The classic example of this is a lookup function that, for example,
retrieves the value of an environment variable, does a table lookup, or
similar. Many such APIs throw an exception when the lookup fails
because the key isn't in the table. However, very often, looking for
something that isn't there is a common case, such as when looking for
a value and, if the value isn't present already, adding it. Here is an
example of this:

KeyType k = ...;
ValueType v;

try {
    v = collection.lookup(k);
} catch (NotFoundException) {
    collection.add(k, defaultValue);
    v = defaultValue;
}
doSomethingWithValue(v);

The same code if collection doesn't throw when I look up something
that isn't there:

KeyType k = ...;
ValueType v;

v = collection.lookup(k);
if (v == null) {
    collection.add(k, defaultValue);
    v = defaultValue;
}
doSomethingWithValue(v);

The problem is that, if I do something like this in a loop, and the
loop is performance-critical, the exception version can cause a
significant penalty.
In Python, of course, there's a method for this: setdefault.
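
A minimal sketch of that idiom, assuming a plain dict stands in for the collection and 0 for defaultValue (both are illustrative choices, not taken from the example above):

```python
# Look up a key and insert a default if it is missing, in a single call.
collection = {"a": 1}
default_value = 0  # illustrative stand-in for defaultValue

# Key present: the stored value is returned and the dict is unchanged.
v_existing = collection.setdefault("a", default_value)

# Key absent: the default is stored under the key and returned.
v_missing = collection.setdefault("b", default_value)

print(v_existing, v_missing, collection)
```

Neither a try block nor an explicit test is needed at the call site.
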
As the API designer, when I make the choice between returning a
special value to indicate some condition, or throwing an exception, I
should consider the following questions:

* Is the special condition such that, under most conceivable
circumstances, the caller will treat the condition as an unexpected
error?

* Is it appropriate to force the caller to deal with the condition in
a catch-handler?

* If the caller fails to explicitly deal with the condition, is it
appropriate to terminate the program?

Only if the answer to these questions is "yes" is it appropriate to
throw an exception. Note the third question, which is often forgotten.
By throwing an exception, I not only force the caller to handle the
exception with a catch-handler (as opposed to leaving the choice to
the caller), I also force the caller to *always* handle the exception:
if the caller wants to ignore the condition, he/she still has to write
a catch-handler and failure to do so terminates the program.

Apart from the potential performance penalty, throwing exceptions for
expected outcomes is bad also because it forces a try-catch block on
the caller. One example of this is the .NET socket API: if I do non-
blocking I/O on a socket, I get an exception if no data is ready for
reading (which is the common and expected case), and I get a zero
return value if the connection was lost (which is the uncommon and
unexpected case).

In other words, the .NET API gets this completely the wrong way round.
Code that needs to do non-blocking reads from a socket turns into a
proper mess as a result because the outcome of a read() call is tri-
state:

* Data was available and returned: no exception

* No data available: exception

* Connection lost: no exception

Because such code normally lives in a loop that decrements a byte
count until the expected number of bytes have been read, the control
flow becomes really awkward because the successful case must be dealt
with in both the try block and the catch handler, and the error
condition must be dealt with in the try block as well.

If the API did what it should, namely, throw an exception when the
connection is lost, and not throw when I do a read (whether data was
ready or not), the code would be far simpler and far more
maintainable.

At no point did I ever advocate not to use exception handling.
Exceptions are the correct mechanism to handle errors. However, what
is considered an error is very much in the eye of the beholder. As the
API creator, if I indicate errors with exceptions, I make a policy
decision about what is an error and what is not. It behooves me to be
conservative in that policy: I should throw exceptions only for
conditions that are unlikely to arise during routine and normal use of
the API.

In another area, string slicing in C# uses the Substring method, where
you provide the start position and number of characters. If the start
index is out of bounds (it must be >= 0 and < length) or the string is
too short, then it throws an exception. In practice I find Python's
behaviour easier to use (and the code is shorter too!).
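
For comparison, a short sketch of the Python behaviour being referred to; the sample string is arbitrary:

```python
s = "exception"

# Slices clamp out-of-range bounds instead of raising.
tail = s[3:100]    # stop is beyond the end: clamped to len(s)
empty = s[50:60]   # start is beyond the end: the result is an empty string

# Plain indexing, by contrast, does raise for an out-of-range position.
try:
    s[50]
    indexing_raised = False
except IndexError:
    indexing_raised = True

print(repr(tail), repr(empty), indexing_raised)
```
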

C# also misses Python's trick (in Python 2.6 and above) of giving string
instances a format method, instead making it a static method, so you need
to write:

string.Format(format_string, ...)

instead of Python's:

format_string.format(...)

On the other hand, C#'s equivalent of raw strings always treats
backslashes as normal characters. I think it's the only feature of C#'s
string handling that I prefer to Python's.
 
Steven D'Aprano

The quoted sentence appears in a section of the article that deals with
efficiency. I point out in that section that bad APIs often have a price
not just in terms of usability and defect rate, but that they are often
inefficient as well.

This is very true, but good APIs often trade off increased usability and
reduced defect rate against machine efficiency too. In fact, I would
argue that this is a general design principle of programming languages:
since correctness and programmer productivity are almost always more
important than machine efficiency, the long-term trend across virtually
all languages is to increase correctness and productivity even if doing
so costs some extra CPU cycles.


(For example, wrapper APIs often require additional
memory allocations and/or data copies.) Incorrect use of exceptions also
incurs an efficiency penalty.

And? *Correct* use of exceptions also incurs a penalty. So does the use of
functions. Does this imply that putting code in functions is a poor API?
Certainly not.

In many language implementations, exception handling is expensive;
significantly more expensive than testing a return value.

And in some it is less expensive.

But no matter how much more expensive, there will always be a cut-off
point where it is cheaper on average to suffer the cost of handling an
exception than it is to make unnecessary tests.

In Python, for dictionary key access, that cut-off is approximately at
one failure per ten or twenty attempts. So unless you expect more than
one in ten attempts to lead to a failure, testing first is actually a
pessimation, not an optimization.
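
One way to look for that cut-off on a given interpreter is to time both strategies at several miss rates. The following is only a sketch: the helper names (lookup_try, lookup_test, make_keys), the dictionary size and the miss rates are all made up for illustration, and the timings depend on the implementation and version, so it does not by itself confirm the figure quoted above.

```python
import timeit

def lookup_try(d, keys):
    """Look up every key, catching KeyError on a miss; return the hit count."""
    hits = 0
    for k in keys:
        try:
            d[k]
            hits += 1
        except KeyError:
            pass
    return hits

def lookup_test(d, keys):
    """Test for membership before every lookup; return the hit count."""
    hits = 0
    for k in keys:
        if k in d:
            d[k]
            hits += 1
    return hits

def make_keys(n, miss_every):
    """Return n keys, of which every miss_every-th one is absent from d."""
    return [-1 if i % miss_every == 0 else i for i in range(n)]

d = {i: i for i in range(1000)}

for miss_every in (2, 10, 20, 100):
    keys = make_keys(1000, miss_every)
    t_try = timeit.timeit(lambda: lookup_try(d, keys), number=100)
    t_test = timeit.timeit(lambda: lookup_test(d, keys), number=100)
    print(f"1 miss in {miss_every}: try/except {t_try:.4f}s, test first {t_test:.4f}s")
```
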



Consider the following:

int x;
try {
    x = func();
} catch (SomeException) {
    doSomething();
    return;
}
doSomethingElse();

Here is the alternative without exceptions. (func() returns SpecialValue
instead of throwing.)

int x;
x = func();
if (x == SpecialValue) {
    doSomething();
    return;
}
doSomethingElse();


In some, limited, cases you might be able to use the magic return value
strategy, but this invariably leads to lost programmer productivity, more
complex code, lowered readability and usability, and more defects,
because programmers will invariably neglect to test for the special value:

int x;
x = func();
doSomething(x);
return;

Or worse, they will write doSomething() so that it too needs to know
about SpecialValue, and so do all the functions it calls. Instead of
dealing with the failure in one place, you can end up having to deal with
it in a dozen places.


But even worse is the common case that SpecialValue is a legal value when
passed to doSomething, and you end up with the error propagating deep
into the application before being found. Or even worse, it is never found
at all, and the application simply does the wrong thing.



In many language implementations, the second version is considerably
faster, especially when the exception may be thrown from deep in the
bowels of func(), possibly many frames down the call tree.

This is a classic example of premature optimization. Unless such
inefficiency can be demonstrated to actually matter, then you do nobody
any favours by preferring the API that leads to more defects on the basis
of *assumed* efficiency.

If your test for a special value is 100 times faster than handling the
exception, and exceptions occur only one time in 1000, then using a
strategy of testing for a special value is actually ten times slower on
average than catching an exception.
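
The arithmetic behind that claim can be sketched as follows. The function name and the cost units are hypothetical, and the model assumes that a try block which does not raise costs nothing, which is an idealization:

```python
def average_cost(test_cost, exception_cost, failure_rate):
    """Return the average cost per call of each strategy.

    testing:  the special-value test is paid on every call.
    catching: the exception cost is paid only on the calls that fail
              (assumption: a try block that does not raise is free).
    """
    testing = test_cost
    catching = exception_cost * failure_rate
    return testing, catching

# A test that is 100 times cheaper than an exception, failing 1 time in 1000.
testing, catching = average_cost(test_cost=1.0,
                                 exception_cost=100.0,
                                 failure_rate=1 / 1000)
ratio = testing / catching
print(testing, catching, ratio)
```

Per 1000 calls, testing costs 1000 units while catching costs 100 units, which is where the factor of ten comes from.
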


If func() throws an exception for something that routinely occurs in the
normal use of the API, the extra cost can be noticeable.

"Can be". But it also might not be noticeable at all.


[...]
Here is an example of this:

KeyType k = ...;
ValueType v;

try {
    v = collection.lookup(k);
} catch (NotFoundException) {
    collection.add(k, defaultValue);
    v = defaultValue;
}
doSomethingWithValue(v);

The same code if collection doesn't throw when I look up something that
isn't there:

KeyType k = ...;
ValueType v;

v = collection.lookup(k);
if (v == null) {
    collection.add(k, defaultValue);
    v = defaultValue;
}
doSomethingWithValue(v);

The problem is that, if I do something like this in a loop, and the loop
is performance-critical, the exception version can cause a significant
penalty.


No, the real problems are:

(1) The caller has to remember to check the return result for the magic
value. Failure to do so leads to bugs, in some cases, serious and hard-to-
find bugs.

(2) If missing keys are rare enough, the cost of all those unnecessary
tests will outweigh the saving of avoiding catching the exception. "Rare
enough" may still be very common: in the case of Python, the cross-over
point is approximately 1 time in 15.

(3) Your collection now cannot use the magic value as a legitimate value.

This last one can be *very* problematic. In the early 1990s, I was
programming using a callback API that could only return an integer. The
standard way of indicating an error was to return -1. But what happens if
-1 is a legitimate return value, e.g. for a maths function? The solution
used was to have the function create a global variable holding a flag:

result = function(args)
if result == -1:
    if globalErrorState == -1:
        print "An error occurred"
        exit
doSomething(result)


That is simply horrible.


As the API designer, when I make the choice between returning a special
value to indicate some condition, or throwing an exception, I should
consider the following questions:

* Is the special condition such that, under most conceivable
circumstances, the caller will treat the condition as an unexpected
error?

Wrong.

It doesn't matter whether it is an error or not. They are called
EXCEPTIONS, not ERRORS. What matters is that it is an exceptional case.
Whether that exceptional case is an error condition or not is dependent
on the application.


* Is it appropriate to force the caller to deal with the condition in
a catch-handler?

* If the caller fails to explicitly deal with the condition, is it
appropriate to terminate the program?

Only if the answer to these questions is "yes" is it appropriate to
throw an exception. Note the third question, which is often forgotten.
By throwing an exception, I not only force the caller to handle the
exception with a catch-handler (as opposed to leaving the choice to the
caller), I also force the caller to *always* handle the exception: if
the caller wants to ignore the condition, he/she still has to write a
catch-handler and failure to do so terminates the program.

That's a feature of exceptions, not a problem.


Apart from the potential performance penalty, throwing exceptions for
expected outcomes is bad also because it forces a try-catch block on the
caller.

But it's okay to force a `if (result==MagicValue)` test instead?

Look, the caller has to deal with exceptional cases (which may include
error conditions) one way or the other. If you don't deal with them at
all, your code will core dump, or behave incorrectly, or something. If
the caller fails to deal with the exceptional case, it is better to cause
an exception that terminates the application immediately than it is to
allow the application to generate incorrect results.


One example of this is the .NET socket API: if I do non-
blocking I/O on a socket, I get an exception if no data is ready for
reading (which is the common and expected case), and I get a zero return
value if the connection was lost (which is the uncommon and unexpected
case).

In other words, the .NET API gets this completely the wrong way round.

Well we can agree on that!

If the API did what it should, namely, throw an exception when the
connection is lost, and not throw when I do a read (whether data was
ready or not), the code would be far simpler and far more maintainable.

At no point did I ever advocate not to use exception handling.
Exceptions are the correct mechanism to handle errors. However, what is
considered an error is very much in the eye of the beholder. As the API
creator, if I indicate errors with exceptions, I make a policy decision
about what is an error and what is not. It behooves me to be
conservative in that policy: I should throw exceptions only for
conditions that are unlikely to arise during routine and normal use of
the API.

But lost connections *are* routine and normal. Hopefully they are rare.
 
Roy Smith

Steven D'Aprano said:
This last one can be *very* problematic. In the early 1990s, I was
programming using a callback API that could only return an integer. The
standard way of indicating an error was to return -1. But what happens if
-1 is a legitimate return value, e.g. for a maths function?

One of the truly nice features of Python is the universally distinguished
value, None.
 
Steven D'Aprano

One of the truly nice features of Python is the universally
distinguished value, None.


What happens if you need to return None as a legitimate value?


Here's a good example: iterating over a list. Python generates an
exception when you hit the end of the list. If instead, Python returned
None when the index is out of bounds, you couldn't store None in a list
without breaking code.
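
A small sketch of what a for loop does under the hood, written out with the built-in iter() and next(); the list contents are arbitrary:

```python
items = ["", 12, None, [], {}]

it = iter(items)
seen = []
while True:
    try:
        obj = next(it)      # raises StopIteration once the list is exhausted
    except StopIteration:
        break
    seen.append(obj)        # None is handled like any other element

print(seen)
```

Because the end of the sequence is signalled out of band, every possible value, including None, remains usable as data.
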

So we produce a special sentinel object EndOfSequence. Now we can't do
this:

for obj in ["", 12, None, EndOfSequence, [], {}]:
    print dir(obj) # or some other useful operation

The fundamental flaw of using magic values is that there will always be
some application where you want to use the magic value as a non-special
value, and then you're screwed.

This is why, for instance, it's difficult for C strings to contain a null
byte, and there are problems with text files on DOS and CP/M (and Windows
under some circumstances) that contain a ^Z byte.
 
Michi

This is very true, but good APIs often trade off increased usability and
reduced defect rate against machine efficiency too. In fact, I would
argue that this is a general design principle of programming languages:
since correctness and programmer productivity are almost always more
important than machine efficiency, the long-term trend across virtually
all languages is to increase correctness and productivity even if doing
so costs some extra CPU cycles.

Yes, I agree with that in general. Correctness and productivity are
more important, as a rule, and should be given priority.

And? *Correct* use of exceptions also incurs a penalty. So does the use of
functions. Does this imply that putting code in functions is a poor API?
Certainly not.

It does imply that incorrect use of exceptions incurs an unnecessary
performance penalty, no more, no less, just as incorrect use of
wrappers incurs an unnecessary performance penalty.

But no matter how much more expensive, there will always be a cut-off
point where it is cheaper on average to suffer the cost of handling an
exception than it is to make unnecessary tests.

In Python, for dictionary key access, that cut-off is approximately at
one failure per ten or twenty attempts. So unless you expect more than
one in ten attempts to lead to a failure, testing first is actually a
pessimation, not an optimization.

What this really comes down to is how frequently or infrequently a
particular condition arises before that condition should be considered
an exceptional condition rather than a normal one. It also relates to
how the set of conditions partitions into "normal" conditions and
"abnormal" conditions. The difficulty for the API designer is to make
these choices correctly.

In some, limited, cases you might be able to use the magic return value
strategy, but this invariably leads to lost programmer productivity, more
complex code, lowered readability and usability, and more defects,
because programmers will invariably neglect to test for the special value:

I disagree here, to the extent that, whether something is an error or
not can very much depend on the circumstances in which the API is
used. The collection case is a very typical example. Whether failing
to locate a value in a collection is an error very much depends on
what the collection is used for. In some cases, it's a hard error
(because it might, for example, imply that internal program state has
been corrupted); in other cases, not finding a value is perfectly
normal.

For the API designer, the problem is that an API that throws an
exception when it should not sucks just as much as an API that doesn't
throw an exception when it should. For general-purpose APIs, such as a
collection API, as the designer, I usually cannot know. As I said
elsewhere in the article, general-purpose APIs should be policy-free,
and special-purpose APIs should be policy-rich. As the designer, the
more I know about the circumstances in which the API will be used, the
more fascist I can be in the design and bolt down the API more in
terms of static and run-time safety.

Wanting to ignore a return value from a function is perfectly normal
and legitimate in many cases. However, if a function throws instead of
returning a value, ignoring that value becomes more difficult for the
caller and can exact a performance penalty that may be unacceptable
to the caller. The problem really is that, at the time the API is
designed, there often is no way to tell whether this will actually be
the case; in turn, no matter whether I choose to throw an exception or
return an error code, it will be wrong for some people some of the
time.

This is a classic example of premature optimization. Unless such
inefficiency can be demonstrated to actually matter, then you do nobody
any favours by preferring the API that leads to more defects on the basis
of *assumed* efficiency.

I agree with the concern about premature optimisation. However, I
don't agree with a blanket statement that special return values always
and unconditionally lead to more defects. Returning to the .NET non-
blocking I/O example, the fact that the API throws an exception when
it shouldn't very much complicates the code and introduces a lot of
extra control logic that is much more likely to be wrong than a simple
if-then-else statement. As I said, throwing an exception when none
should be thrown can be just as harmful as the opposite case.

It doesn't matter whether it is an error or not. They are called
EXCEPTIONS, not ERRORS. What matters is that it is an exceptional case.
Whether that exceptional case is an error condition or not is dependent
on the application.

Exactly. To me, that implies that making something an exception that,
to the caller, shouldn't be is just as inconvenient as the other way
around.

That's a feature of exceptions, not a problem.

Yes, and I didn't say that it is a problem. However, making the wrong
choice for the use of the feature is a problem, just as making the
wrong choice for not using the feature is.

But it's okay to force a `if (result==MagicValue)` test instead?

Yes, in some cases it is. For example:

int numBytes;
int fd = open(...);
while ((numBytes = read(fd, …)) > 0)
{
    // process data...
}

Would you prefer to see EOF indicated by an exception rather than a
zero return value? I wouldn't.
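
Python's own file objects follow the same convention, returning an empty result at EOF instead of raising. A minimal sketch, using an in-memory io.BytesIO as a stand-in for a real file and an arbitrary chunk size:

```python
import io

stream = io.BytesIO(b"abcdefghij")   # stand-in for an open file

chunks = []
while True:
    chunk = stream.read(4)   # returns b"" at EOF rather than raising
    if not chunk:
        break
    chunks.append(chunk)     # process data...

print(chunks)
```
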
Look, the caller has to deal with exceptional cases (which may include
error conditions) one way or the other. If you don't deal with them at
all, your code will core dump, or behave incorrectly, or something. If
the caller fails to deal with the exceptional case, it is better to cause
an exception that terminates the application immediately than it is to
allow the application to generate incorrect results.

I agree that failing to deal with exceptional cases causes problems. I
also agree that exceptions, in general, are better than error codes
because they are less likely to go unnoticed. But, as I said, it
really depends on the caller whether something should be an exception
or not.

The core problem isn't whether exceptions are good or bad in a
particular case, but that most APIs make this an either-or choice. For
example, if I had an API that allowed me to choose at run time whether
an exception will be thrown for a particular condition, I could adapt
that API to my needs, instead of being stuck with whatever the
designer came up with.

There are many ways this could be done. For example, I could have a
find() operation on a collection that throws if a value isn't found,
and I could have findNoThrow() if I want a sentinel value returned.
Or, the API could offer a callback hook that decides at run time
whether to throw or not. (There are many other possible ways to do
this, such as setting the behaviour at construction time, or by having
different collection types with different behaviours.)

The point is that a more flexible API is likely to be more useful than
one that sets a single exception policy for everyone.
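
A minimal Python sketch of the find()/findNoThrow() option, assuming a hypothetical Inventory collection. The class and method names are invented for illustration and do not come from any library:

```python
class Inventory:
    """Hypothetical collection offering both lookup policies."""

    _MISSING = object()  # private marker, never handed to callers

    def __init__(self, items):
        self._items = dict(items)

    def find(self, name):
        # Throwing variant: a missing name is treated as exceptional.
        value = self._items.get(name, self._MISSING)
        if value is self._MISSING:
            raise LookupError("no such item: %r" % (name,))
        return value

    def find_no_throw(self, name, default=None):
        # Sentinel variant: the caller picks the value that means "absent".
        return self._items.get(name, default)


stock = Inventory({"bolt": 40, "nut": 0})
print(stock.find("bolt"))            # the caller expects the item to exist
print(stock.find_no_throw("gear"))   # absence is an ordinary outcome here
```

Both methods share one underlying lookup, so the two policies cannot drift apart; the cost is a wider API surface.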
But lost connections *are* routine and normal. Hopefully they are rare.

In the context of my example, they are not. The range of behaviours
naturally falls into these categories:

* No data ready
* Data ready
* EOF
* Socket error

The first three cases are the "normal" ones; they operate on the same
program state and they are completely expected: while reading a
message off the wire, the program will almost certainly encounter the
first two conditions and, if there is no error, it will always
encounter the EOF condition. The fourth case is the unexpected one, in
the sense that this case will often not arise at all. That's not to
say that lost connections aren't routine; they are. But, when a
connection is lost, the program has to do different things and operate
on different state than when the connection stays up. This strongly
suggests that the first three conditions should be dealt with by
return values and/or out parameters, and the fourth condition should
be dealt with as an exception.
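
As a rough illustration of that split, here is a Python 3 sketch for a non-blocking socket. ReadStatus and read_some() are hypothetical names chosen for this example; only socket.recv() and BlockingIOError come from the standard library:

```python
import enum
import socket


class ReadStatus(enum.Enum):
    NO_DATA = 1   # nothing to read right now, try again later
    DATA = 2      # some bytes were read
    EOF = 3       # the peer closed the connection in an orderly way


def read_some(sock, max_bytes=4096):
    """Read from a non-blocking socket (hypothetical helper).

    Returns (status, data) for the three ordinary outcomes. Any other
    socket error propagates to the caller as an OSError exception.
    """
    try:
        data = sock.recv(max_bytes)
    except BlockingIOError:
        return ReadStatus.NO_DATA, b""
    if not data:
        return ReadStatus.EOF, b""
    return ReadStatus.DATA, data
```

The caller's read loop only ever branches on the three ordinary states, while a lost connection unwinds to whatever code owns the recovery logic.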

Cheers,

Michi.
 
M

MRAB

Michi said:
On Jan 4, 1:30 pm, Steven D'Aprano


Yes, and I didn't say that it is a problem. However, making the wrong
choice for the use of the feature is a problem, just as making the
wrong choice for not using the feature is.


Yes, in some cases it is. For example:

int numBytes;
int fd = open(...);
while ((numBytes = read(fd, …)) > 0)
{
    // process data...
}

Would you prefer to see EOF indicated by an exception rather than a
zero return value? I wouldn't.
I wouldn't consider zero to be a magic value in this case. Returning a
negative number if an error occurred would be magic. A better comparison
might be str.find vs str.index, the former returning a magic value -1.
Which is used more often?
I agree that failing to deal with exceptional cases causes problems. I
also agree that exceptions, in general, are better than error codes
because they are less likely to go unnoticed. But, as I said, it
really depends on the caller whether something should be an exception
or not.

The core problem isn't whether exceptions are good or bad in a
particular case, but that most APIs make this an either-or choice. For
example, if I had an API that allowed me to choose at run time whether
an exception will be thrown for a particular condition, I could adapt
that API to my needs, instead of being stuck with whatever the
designer came up with.

There are many ways this could be done. For example, I could have a
find() operation on a collection that throws if a value isn't found,
and I could have findNoThrow() if I want a sentinel value returned.
Or, the API could offer a callback hook that decides at run time
whether to throw or not. (There are many other possible ways to do
this, such as setting the behaviour at construction time, or by having
different collection types with different behaviours.)
Or find() could have an extra keyword argument, eg.
string.find(substring, default=-1), although that should probably be
string.index(substring, default=-1) as a replacement for
string.find(substring).
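
The real str.index() has no default parameter, so the keyword-argument idea can only be sketched as a wrapper. The function below is hypothetical and assumes that omitting the default means "raise as usual":

```python
_RAISE = object()  # marker meaning "no default supplied"


def index(haystack, needle, default=_RAISE):
    """Hypothetical wrapper, not part of str: raise unless a default is given."""
    try:
        return haystack.index(needle)
    except ValueError:
        if default is _RAISE:
            raise
        return default
```

With this shape, index("spam", "z") raises ValueError as str.index does, while index("spam", "z", default=-1) behaves like str.find.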
 
S

Steven D'Aprano

Yes, I agree with that in general. Correctness and productivity are more
important, as a rule, and should be given priority.

I'm glad we agree on that, but I wonder why you previously emphasised
machine efficiency so much, and correctness almost not at all, in your
previous post?

It does imply that incorrect use of exceptions incurs an unnecessary
performance penalty, no more, no less, just as incorrect use of wrappers
incurs an unnecessary performance penalty.

If all you're arguing is that we shouldn't write crappy APIs, then I
agree with you completely. The .NET example you gave previously is a good
example of an API that is simply poor: using exceptions isn't a panacea
that magically makes code better. So I can't disagree that using
exceptions badly incurs an unnecessary performance penalty, but it also
incurs an unnecessary penalty against correctness and programmer
productivity.

What this really comes down to is how frequently or infrequently a
particular condition arises before that condition should be considered
an exceptional condition rather than a normal one. It also relates to
how the set of conditions partitions into "normal" conditions and
"abnormal" conditions. The difficulty for the API designer is to make
these choices correctly.

The first case is impossible for the API designer to predict, although
she may be able to make some educated estimates based on experience. For
instance I know that when I search a string for a substring, "on average"
I expect to find the substring present more often than not. I've put "on
average" in scare-quotes because it's not a statistical average at all,
but a human expectation -- a prejudice in fact. I *expect* to have
searching succeed more often than fail, not because I actually know how
many searches succeed and fail, but because I think of searching for an
item to "naturally" find the item. But if I actually profiled my code in
use on real data, who knows what ratio of success/failure I would find?

In the second case, the decision of what counts as "ordinary" and what
counts as "exceptional" should, in general, be rather obvious. (That's
not to discount the possibility of unobvious cases, but that's probably a
case that the function is too complex and tries to do too much.) Take the
simplest description of what the function is supposed to do: (e.g. "find
the offset of a substring in a source string"). That's the ordinary case,
and should be returned. Is there anything else that the function may do?
(E.g. fail to find the substring because it isn't there.) Then that's an
exceptional case.

(There may be other exceptional cases, which is another reason to prefer
exceptions to magic return values. In general, it's much easier to deal
with multiple exception types than it is to test for multiple magic
return values. Consider a function that returns a pointer. You can return
null to indicate an error. What if you want to distinguish between two
different error states? What about ten error states?)
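
To make the comparison concrete, here is a small Python sketch with two distinct failure states. The exception classes, home_directory() and the shape of the users mapping are all invented for the example:

```python
class LookupFailed(Exception):
    """Base class for the hypothetical failures below."""


class NoSuchUser(LookupFailed):
    pass


class AccountLocked(LookupFailed):
    pass


def home_directory(users, name):
    # Each failure state gets its own type instead of its own magic value.
    if name not in users:
        raise NoSuchUser(name)
    record = users[name]
    if record["locked"]:
        raise AccountLocked(name)
    return record["home"]


def describe(users, name):
    # The caller distinguishes the states by type, or catches LookupFailed
    # to treat them all alike.
    try:
        return home_directory(users, name)
    except NoSuchUser:
        return "unknown user"
    except AccountLocked:
        return "account locked"
```

Adding a third or a tenth failure state means adding a subclass; the return type of home_directory() never changes.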

I argue that as designers, we should default to raising an exception and
only choose otherwise if there is a good reason not to. As we agreed
earlier, exceptions (in general) are better for correctness and
productivity, which in turn are (in general) more important than machine
efficiency. The implication of this is that in general, we should prefer
exceptions, and only avoid them when necessary. Your argument seems to be
that we should avoid exceptions by default, and only use them if
unavoidable. I think that is backwards.

I disagree here, to the extent that, whether something is an error or
not can very much depend on the circumstances in which the API is used.

That's certainly true: a missing key (for example) may be an error, or a
present key may be an error, or neither may be an error, just different
branches of an algorithm. That's an application-specific decision. But I
don't see how that relates to my claim that magic return values are less
robust and usable than exceptions. Whether it is an error or not, it
still needs to be handled. If the caller neglects to handle the special
case, an exception-based strategy will almost certainly lead to the
application halting (hopefully leading to a harmless bug report rather
than the crash of a billion-dollar space probe), but a magic return value
will very often lead to the application silently generating invalid
results.

[...]
Wanting to ignore a return value from a function is perfectly normal and
legitimate in many cases.

I wouldn't say that's normal. If you don't care about the function's
result, why are you calling it? For the side-effects? In languages that
support procedures, such mutator functions should be written as
procedures that don't return anything. For languages that don't, like
Python, they should be written as de-facto procedures, always return
None, and allow the user to pretend that nothing was returned.

That is to say, ignoring the return value is acceptable as a work-around
for the lack of true procedures. But even there, procedures necessarily
operate by side-effect, and side-effects should be avoided as much as
possible. So I would say, ideally, wanting to ignore the return value
should be exceptionally rare.

However, if a function throws instead of
returning a value, ignoring that value becomes more difficult for the
caller and can extract a performance penalty that may be unacceptable to
the caller.

There's that premature micro-optimization again.

The problem really is that, at the time the API is designed,
there often is no way to tell whether this will actually be the case; in
turn, no matter whether I choose to throw an exception or return an
error code, it will be wrong for some people some of the time.

I've been wondering when you would reach the conclusion that an API
should offer both forms. For example, Python offers both key-lookup that
raises exceptions (dict[key]) and key-lookup that doesn't (dict.get(key)).

The danger of this is that it complicates the API, leads to a more
complex implementation, and may result in duplicated code (if the two
functions have independent implementations). But if you don't duplicate
the code, then the assumed performance benefit of magic return values
over exceptions might very well be completely negated:

def get(self, key):
    # This is not the real Python dict.get implementation!
    # This is merely an illustration of how it *could* be.
    try:
        return self[key]
    except KeyError:
        return None


This just emphasises the importance of not optimising code by assumption.
If you haven't *measured* the speed of a function you don't know whether
it will be faster or slower than catching an exception.

You will note that the above has nothing to do with the API, but is
entirely an implementation decision. This to me demonstrates that the
question of machine efficiency is irrelevant to API design.
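
For anyone who would rather measure than assume, a rough timeit sketch follows. The two lookup functions and the table are made up for the comparison, and the numbers will vary by machine and Python version, so no expected timings are given:

```python
import timeit

table = {"present": 1}


def lookup_with_try(key):
    # Exception-based lookup, like the illustrative get() above.
    try:
        return table[key]
    except KeyError:
        return None


def lookup_with_get(key):
    # Return-value-based lookup using the built-in dict.get().
    return table.get(key)


# Time both styles for a key that exists and a key that does not.
for key in ("present", "absent"):
    for func in (lookup_with_try, lookup_with_get):
        seconds = timeit.timeit(lambda: func(key), number=100000)
        print("%-16s %-8s %.4f s" % (func.__name__, key, seconds))
```

Running it with the hit/miss ratio of the real workload is what tells you which style is cheaper for a given caller.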

I agree with the concern about premature optimisation. However, I don't
agree with a blanket statement that special return values always and
unconditionally lead to more defects.

I can't say that they *always* lead to more defects, since that also
depends on the competence of the caller, but I will say that as a general
principle, they should be *expected* to lead to more defects.

Returning to the .NET non-
blocking I/O example, the fact that the API throws an exception when it
shouldn't very much complicates the code and introduces a lot of extra
control logic that is much more likely to be wrong than a simple
if-then-else statement. As I said, throwing an exception when none
should be thrown can be just as harmful as the opposite case.

In this case, it's worse than that -- they use a special return value
when there should be an exception, and an exception when there should be
an ordinary, non-special value (an empty string, if I recall correctly).

Exactly. To me, that implies that making something an exception that, to
the caller, shouldn't be is just as inconvenient as the other way
around.

Well, obviously I agree that you should only make things be an exception
if they actually should be an exception. I don't quite see where the
implication is -- I find myself in the curious position of agreeing with
your conclusion while questioning your reasoning, as if you had said
something like:

All cats have four legs, therefore cats are mammals.

Yes, in some cases it is. For example:

int numBytes;
int fd = open(...);
while ((numBytes = read(fd, …)) > 0) {
    // process data...
}

Would you prefer to see EOF indicated by an exception rather than a zero
return value? I wouldn't.

Why not? Assuming this is a blocking read, once you hit EOF you will
never recover from it. Is this about the micro-optimisation again? Disc
IO is almost certainly a thousand times slower than any exception you
could catch here.

In Python, we *do* use exceptions for file reads. An explicit read
returns an empty string, and we might write:


f = open(filename)
while 1:
    block = f.read(buffersize)
    if not block:
        f.close()
        break
    process(block)


This would arguably be easier to write and read, and demonstrates the
intent of the while loop better:

f = open(filename)
try:
    while 1:
        process(f.read(buffersize))
except EOFError:
    f.close()

(But the above doesn't work, because an explicit read doesn't raise an
exception.)

However, there's another idiom for reading a file which does use an
exception: line-by-line reading.

f = open(filename)
for line in f:
    process(line)
f.close()

Because iterating over the file generates a StopIteration when EOF is
reached, the for loop automatically breaks. If you wanted to handle that
by hand, something like this should work (but is unnecessary, because
Python already does it for you):


f = open(filename)
try:
    while 1:
        process(f.next())
except StopIteration:
    f.close()
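
Block reads can be given the same shape with the two-argument form of the built-in iter(), which calls a function repeatedly and ends the iteration when the sentinel value comes back. A sketch, assuming Python 3 and an in-memory io.BytesIO standing in for a real file; read_blocks() is a hypothetical helper:

```python
import io
from functools import partial


def read_blocks(f, buffersize):
    # iter(callable, sentinel) stops once f.read() returns b"" at EOF,
    # so the empty-read return value ends the for loop like StopIteration.
    return iter(partial(f.read, buffersize), b"")


chunks = []
with io.BytesIO(b"abcdefgh") as f:
    for block in read_blocks(f, 3):
        chunks.append(block)
```

Here the API still reports EOF through a return value, and the wrapper turns it into the end of iteration for callers who prefer that style.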


[...]
The core problem isn't whether exceptions are good or bad in a
particular case, but that most APIs make this an either-or choice. For
example, if I had an API that allowed me to choose at run time whether
an exception will be thrown for a particular condition, I could adapt
that API to my needs, instead of being stuck with whatever the designer
came up with.

There are many ways this could be done. For example, I could have a
find() operation on a collection that throws if a value isn't found, and
I could have findNoThrow() if I want a sentinel value returned. Or, the
API could offer a callback hook that decides at run time whether to
throw or not. (There are many other possible ways to do this, such as
setting the behaviour at construction time, or by having different
collection types with different behaviours.)

The point is that a more flexible API is likely to be more useful than
one that sets a single exception policy for everyone.


This has costs of its own. The costs of developer education -- learning
about, memorising, and deciding between such multiple APIs does not come
for free. The costs of developing and maintaining the multiple functions.
The risks of duplicated code in the implementation. The cost of writing
documentation. A bloated API is not free of costs.


In the context of my example, they are not. The range of behaviours
naturally falls into these categories:

* No data ready
* Data ready
* EOF
* Socket error

Right -- that fourth example is one of the NATURAL categories that any
half-way decent developer needs to be aware of. When you say something
isn't natural, and then immediately contradict yourself, that's a sign
you need to think about what you really mean :)

The first three cases are the "normal" ones; they operate on the same
program state and they are completely expected: while reading a message
off the wire, the program will almost certainly encounter the first two
conditions and, if there is no error, it will always encounter the EOF
condition.

I would call these the ordinary cases, as opposed to the exceptional
cases.

The fourth case is the unexpected one, in the sense that this
case will often not arise at all.

But it is still expected -- you have to expect that you might get a
socket error, and code accordingly.

That's not to say that lost connections aren't routine; they are.

Right -- we actually agree on this, we just disagree on the terminology.
I believe that talking about "normal" and "errors" is misleading. Better
is to talk about "ordinary" and "exceptional".

But, when a connection is lost,
the program has to do different things and operate on different state
than when the connection stays up. This strongly suggests that the first
three conditions should be dealt with by return values and/or out
parameters, and the fourth condition should be dealt with as an
exception.

Agreed.
 
R

r0g

Michi said:
On Jan 4, 1:30 pm, Steven D'Aprano


I disagree here, to the extent that, whether something is an error or
not can very much depend on the circumstances in which the API is
used. The collection case is a very typical example. Whether failing
to locate a value in a collection is an error very much depends on
what the collection is used for. In some cases, it's a hard error
(because it might, for example, imply that internal program state has
been corrupted); in other cases, not finding a value is perfectly
normal.



A pattern I have used a few times is that of returning an explicit
success/failure code alongside whatever the function normally returns.
While subsequent programmers might not intuit the need to test for
(implicit) "magic" return values they ought to notice if they start
getting tuples back where they expected scalars...

def foo(x):
    if x>0:
        return True, x*x
    else:
        return False, "Bad value of x in foo:",str(x)

ok, value = foo(-1)
if ok:
    print "foo of x is", value
else:
    print "ERROR:", value


Roger.
 
L

Lie Ryan

A pattern I have used a few times is that of returning an explicit
success/failure code alongside whatever the function normally returns.
While subsequent programmers might not intuit the need to test for
(implicit) "magic" return values they ought to notice if they start
getting tuples back where they expected scalars...

def foo(x):
    if x>0:
        return True, x*x
    else:
        return False, "Bad value of x in foo:",str(x)

ok, value = foo(-1)
if ok:
    print "foo of x is", value
else:
    print "ERROR:", value

Except that that is a reinvention of try-wheel:

def foo(x):
    if x > 0:
        return x*x
    else:
        raise MathError("Bad value of x in foo: %s" % x)

try:
    print foo(-1)
except MathError, e:
    print "ERROR: System integrity is doubted"

or rather; that is perhaps a good example of when to use 'assert'. If
the domain of foo() is positive integers, calling -1 on foo is a bug in
the caller, not foo().

I have been looking at Haskell recently and the way the pure functional
language handled exceptions and I/O gives me a new distinct "insight"
that exceptions can be thought of as a special return value that is
implicitly wrapped and unwrapped up the call stack until it is
explicitly handled.
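
That view can be mimicked in Python by making the wrapping explicit, loosely in the spirit of Haskell's Either type. The helpers safe() and bind() below are invented for this sketch and are not a standard API:

```python
import math


def safe(func):
    """Wrap func so it returns ("ok", value) or ("error", exc) instead of raising."""
    def wrapper(*args):
        try:
            return ("ok", func(*args))
        except Exception as exc:
            return ("error", exc)
    return wrapper


def bind(result, func):
    """Apply func to a successful result; pass a failure through untouched."""
    tag, payload = result
    if tag == "error":
        return result
    return func(payload)


safe_sqrt = safe(math.sqrt)
safe_log = safe(math.log)

good = bind(safe_sqrt(16.0), safe_log)   # success flows through both steps
bad = bind(safe_sqrt(-1.0), safe_log)    # the first failure skips the log step
```

What bind() does by hand at every step is what exception propagation does implicitly at every stack frame.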
 
S

Steven D'Aprano

A pattern I have used a few times is that of returning an explicit
success/failure code alongside whatever the function normally returns.

That doesn't work for languages that can only return a single result,
e.g. C or Pascal. You can fake it by creating a struct that contains a
flag and the result you want, but that means doubling the number of data
types you deal with.

While subsequent programmers might not intuit the need to test for
(implicit) "magic" return values they ought to notice if they start
getting tuples back where they expected scalars...

What if they're expecting tuples as the result?


def foo(x):
    if x>0:
        return True, x*x
    else:
        return False, "Bad value of x in foo:",str(x)

ok, value = foo(-1)

Oops, that gives:

ValueError: too many values to unpack


because you've returned three items instead of two. When an idiom is easy
to get wrong, it's time to think hard about it.


if ok:
    print "foo of x is", value
else:
    print "ERROR:", value


Whenever I come across a function that returns a flag and a result, I
never know whether the flag comes first or second. Should I write:

flag, result = foo(x)

or

result, flag = foo(x)



I've seen APIs that do both.

And I never know if the flag should be interpreted as a success or a
failure. Should I write:

ok, result = foo(x)
if ok: process(result)
else: fail()

or


err, result = foo(x)
if err: fail()
else: process(result)


Again, I've seen APIs that do both.

And if the flag indicates failure, what should go into result? An error
code? An error message? That's impossible for statically-typed languages,
unless they have variant records or the function normally returns a
string.

And even if you dismiss all those concerns, it still hurts readability by
obfuscating the code. Consider somebody who wants to do this:

result = foo(bar(x))

but instead has to do this:


flag, result = bar(x)
if flag: # I think this means success
    flag, result = foo(x) # oops, I meant result

Again, it's error-prone and messy. Imagine writing:


flag, a = sin(x)
if flag:
    flag, b = sqrt(x)
    if flag:
        flag, c = cos(b)
        if flag:
            flag, d = exp(a + c)
            if flag:
                flag, e = log(x)
                if flag:
                    # Finally, the result we want!!!
                    flag, y = d/e
                    if not flag:
                        fail(y)
                else:
                    fail(e)
            else:
                fail(d)
        else:
            fail(c)
    else:
        fail(b)
else:
    fail(a)



Compare that to the way with exceptions:

y = exp(sin(x) + cos(sqrt(x)))/log(x)


Which would you prefer?
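
For reference, the exception-based one-liner runs as written with the standard math module. The sketch below assumes Python 3; the wrapper names f() and report() are made up, and the single try block covers every step of the expression:

```python
from math import sin, cos, sqrt, exp, log


def f(x):
    return exp(sin(x) + cos(sqrt(x))) / log(x)


def report(x):
    # One handler covers a failure in any of the six operations.
    try:
        return "f(%r) = %r" % (x, f(x))
    except (ValueError, ZeroDivisionError) as err:
        return "f(%r) failed: %s" % (x, err.__class__.__name__)
```

A negative x fails inside sqrt() with ValueError, and x = 1.0 fails at the division because log(1.0) is zero; neither case needs a flag threaded through the intermediate results.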
 
R

r0g

Lie said:
Except that that is a reinvention of try-wheel:


True, but there's more than one way to skin a cat! Mine's faster if you
expect a high rate of failures (over 15%).



def foo(x):
    if x > 0:
        return x*x
    else:
        raise MathError("Bad value of x in foo: %s" % x)

try:
    print foo(-1)
except MathError, e:
    print "ERROR: System integrity is doubted"

or rather; that is perhaps a good example of when to use 'assert'. If
the domain of foo() is positive integers, calling -1 on foo is a bug in
the caller, not foo().


Maybe, although I recently learned on here that one can't rely on assert
statements in production code; their intended use is really to aid
debugging and testing.
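
The reason is that assert statements are compiled away when Python runs with the -O option. A sketch of the toy function with an explicit check instead; ValueError is used here as a stand-in, since MathError is not a built-in exception:

```python
def foo(x):
    # An explicit check survives "python -O"; an assert statement would not.
    if x <= 0:
        raise ValueError("Bad value of x in foo: %s" % x)
    return x * x
```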

Besides, that was just a toy example.


I have been looking at Haskell recently and the way the pure functional
language handled exceptions and I/O gives me a new distinct "insight"
that exceptions can be thought of as a special return value that is
implicitly wrapped and unwrapped up the call stack until it is
explicitly handled.



Yes there's some very interesting paradigms coming out of functional
programming but, unless you're a maths major, functional languages are a
long way off being productivity tools! Elegant: yes, provable: maybe,
practical for everyday coding: not by a long shot!


Roger.
 
C

Chris Rebert

<much snippage>
Yes there's some very interesting paradigms coming out of functional
programming but, unless you're a maths major, functional languages are a
long way off being productivity tools! Elegant: yes, provable: maybe,
practical for everyday coding: not by a long shot!

Methinks the authors of Real World Haskell (excellent read btw) have a
bone to pick with you.

Cheers,
Chris
 
R

r0g

Steven said:
That doesn't work for languages that can only return a single result,
e.g. C or Pascal. You can fake it by creating a struct that contains a
flag and the result you want, but that means doubling the number of data
types you deal with.


No, but that's why I try not to use languages where you can only return
a single result. I always found that an arbitrary and annoying
constraint to have. It leads to ugly practices like "magic" return values
in C or explicitly packing things into hashtables like PHP, yuk!

What if they're expecting tuples as the result?




Oops, that gives:

ValueError: too many values to unpack


because you've returned three items instead of two. When an idiom is easy
to get wrong, it's time to think hard about it.


That seems pretty clear to me, "too many values to unpack", either I've
not given it enough variables to unpack the result into or I've returned
too many things. That would take a couple of seconds to notice and fix.
In fact I was trying to make the point that it would be quite noticeable
if a function returned more things than the programmer was expecting,
this illustrates that quite well :)



Whenever I come across a function that returns a flag and a result, I
never know whether the flag comes first or second. Should I write:


Flag then result, isn't it obvious? The whole point of returning a flag
AND a result is so you can test the flag so you know what to do with the
result so that implies a natural order. Of course it doesn't matter
technically which way you do it, make a convention and stick to it. If
you get perpetually confused as to the order of parameters then you'd
better avoid this kind of thing, can't say as I've ever had a problem
with it though.



And I never know if the flag should be interpreted as a success or a
failure. Should I write:

ok, result = foo(x)
if ok: process(result)
else: fail()



Yes. That would be my strong preference anyway. Naturally you can do it
the other way round if you like, as long as you document it properly in
your API. As you say different APIs do it differently... Unix has a
convention of returning 0 on no-error but unix has to encapsulate a lot
in that "error code" which is a bit of an anachronism these days. I'd
argue in favour of remaining positive and using names like ok or
success, this is closer to the familiar paradigm of checking a result
does not evaluate to false before using it...

name = ""
if name:
    print name




And if the flag indicates failure, what should go into result? An error
code? An error message? That's impossible for statically-typed languages,
unless they have variant records or the function normally returns a
string.


Yeah, in my example it's an error message. Maybe I shouldn't have used
the word "pattern" above though as it has overtones of "universally
applicable" which it clearly isn't.


Again, it's error-prone and messy. Imagine writing:


flag, a = sin(x)
if flag:
    flag, b = sqrt(x)
    if flag:
Compare that to the way with exceptions:

y = exp(sin(x) + cos(sqrt(x)))/log(x)


Which would you prefer?

LOL, straw man is straw!

You know full well I'm not suggesting every function return a flag, that
would be silly. There's no reason returning flag and a value shouldn't
be quite readable and there may be times when it's preferable to raising
an exception.

I use exceptions a lot as they're often the right tool for the job and
they seem pleasingly pythonic but from time to time they can be too slow
or verbose, where's the sense in forcing yourself to use them then?

Roger.
 
P

Paul Rudin

r0g said:
No, but that's why I try not to use languages where you can only return
a single result. I always found that an arbitrary and annoying
constraint to have. It leads to ugly practices like "magic" return values
in C or explicitly packing things into hashtables like PHP, yuk!

Doesn't python just return a single result? (I know it can be a tuple and
assignment statements will unpack a tuple for you.)
 
R

r0g

Chris said:
<much snippage>


Methinks the authors of Real World Haskell (excellent read btw) have a
bone to pick with you.

Cheers,
Chris


LOL, it seems things have come a long way since ML! I'm impressed how
many useful libraries Haskell has, and that they've included
IF-THEN-ELSE in the syntax! :) For all its advantages I still think you
need to be fundamentally cleverer to write the same programs in a
functional language than an old fashioned "English like" language.

Maybe I'm just mistrusting of the new school though and you'll see me on
comp.lang.haskell in a few years having to eat my own monads!

Roger.
 
R

r0g

Paul said:
Doesn't python just return a single result? (I know it can be a tuple and
assignment statements will unpack a tuple for you.)


Yes, it returns a tuple if you return more than one value, it just has a
lovely syntax for it. In static languages you'd need to manually create
a new array or struct, pack your return vars into it and unpack them on
the other side. That's something I'd be happy never to see again, sadly
I have to write in PHP sometimes :(
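
A quick way to see this in Python 3, using a made-up min_max() function:

```python
def min_max(values):
    # The comma builds one tuple; that single object is what gets returned.
    return min(values), max(values)


result = min_max([3, 1, 2])
low, high = result          # unpacking happens on the caller's side
print(type(result).__name__, low, high)
```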

Roger.
 
