Error handling in C

James Kuyper

Richard said:
James Kuyper said:


Doing things right tends to turn out cheaper in the long run.

That depends entirely on how far you go to "do things right". The
original comment that led us down this track was consideration of a bug
whose fix would require so much developer time that it would increase
the cost of a game from £40/unit to £100/unit. It was also specified
that this bug would, in normal use, be triggered only about once per year.

Since most of the costs of a software fix are salary, that strongly
implies a corresponding increase in the total development time of the
game. The price increase would dramatically reduce both sales and
profits, and the delay in delivery of the game would increase the amount
of time that passes between the time money is spent on developing the
game, and the time that revenue is collected by selling the game - such
time delays cost money, lots of it, either through increased lending
costs or through the cost of missed investment opportunities.

Would you care to suggest ANY plausible mechanism whereby fixing such a
minor bug at such great expense could produce compensatory savings of
sufficient size to make doing so a good idea?
... Telling your customers their disk is dirty (when
what you really mean is that your program screwed up) is not a
route to excellence.

I'm in perfect agreement on that point; that's fraud, pure and simple.
However, the only person I've seen suggesting that approach was Malcolm
McLean, and his suggestion had nothing to do with the line of discussion
leading up to my message.
 
Tony

Kenny McCormack said:
The only fact is that you are akin to a warrior ant in defense of the
hive. Noted. And buh bye.

(Now switching into CLC-pedant-but-totally-missing-the-point,
can-you-say-Whoosh!!!, mode)

Ants live in hills; bees live in hives.
But both are off-topic in comp.lang.c. May I helpfully and oh-so-smugly
suggest you try alt.insects instead?

Surely you're not suggesting that I go there rather than here after solving
all the world's problems at the local watering hole!?

Tony
 
James Kuyper

Richard said:
James Kuyper said:



Bear in mind that it was a hypothetical example constructed by
Malcolm in an effort to support his point. I suggest that it was an
unrealistic example.

Nonetheless, it is this unrealistic example that you treated as if it
indicated sloppy development practices. It does not. In the unlikely
(but not impossible) event that such a bug were discovered, I would
expect even the most meticulous game developer to make the same choice
that Malcolm suggested, unless that game developer were insane, or at
least a masochist.

....
The problem here is not the bug, but the environment in which
bug-fixing is so ludicrously expensive that it makes the bug
uneconomic to fix in the short term.

I've never run into a bug that ludicrously expensive to fix; but I have
frequently run into bugs that were too expensive to justify fixing them
immediately, and I've even run into bugs so expensive to fix, and with
so little corresponding benefit, that I don't expect that I'll ever be
able to justify spending the resources needed to fix them.

For instance, one program I'm responsible for processes 138 million
packets per day, filtering out a few dozen corrupted packets per day. It
has a bug that lets a dozen or so corrupted packets slip through
undetected per year. I've worked out, at the conceptual level, a bug-fix
that would catch virtually all of those packets, but it's just a
day-dream, because that fix would require a major redesign of the
program, which would keep me busy full time for a month or two. I don't
think there's anyone who would argue that filtering out a dozen corrupt
packets per year comes anywhere near to being as valuable as two months
of my salary, not by several orders of magnitude. It's certainly not an
argument I intend to attempt.
 
Guest

I have inclined mostly to use the return value from functions, 0 for
success and everything else for failure (mostly -1 ).
ok

The only one problem
is I want to exit most of the time, or you can day

day? say?
that is the way my
current program runs and for checking error conditions at 8 places I have
to put exit(EXIT_FAILURE) at 8 places, along with using __func__ as an argument
to fprintf().

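#include <stdio.h>
#include <stdlib.h>

/* report which function failed and why, then terminate */
void error (const char *func, const char *err)
{
    fprintf (stderr, "error in %s: %s\n", func, err);
    exit (EXIT_FAILURE);
}

void do_sock_stuff (void)
{
    /* sock_call_1() stands for any call that returns -1 on failure */
    if (sock_call_1() == -1)
        error (__func__, "sock_call_1 failed");
}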

Note that __func__ is part of the 1999 C Standard (so-called C99).
C99 is not yet fully implemented by every compiler, so __func__
may not always be available.
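
One way to cope with compilers that lack __func__ is to hide it behind a macro. The following is only a sketch: CURRENT_FUNC, report_error() and the always-failing sock_call_1() are made-up names for illustration, and the helper returns -1 instead of exiting so that the caller decides where the program terminates.

```c
#include <assert.h>
#include <stdio.h>

/* __func__ is guaranteed only from C99 onward; older compilers get a
   placeholder string instead. CURRENT_FUNC is a hypothetical name. */
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
#define CURRENT_FUNC __func__
#else
#define CURRENT_FUNC "(unknown function)"
#endif

/* Print "error in <func>: <err>" to the given stream and return -1,
   so a caller can write: return report_error(stderr, CURRENT_FUNC, "..."); */
static int report_error(FILE *out, const char *func, const char *err)
{
    assert(out != NULL && func != NULL && err != NULL);
    fprintf(out, "error in %s: %s\n", func, err);
    return -1;
}

/* Hypothetical stand-in for a socket call; it always fails here. */
static int sock_call_1(void)
{
    return -1;
}

/* Returns 0 on success, -1 after reporting a failure. */
static int do_sock_stuff(void)
{
    if (sock_call_1() == -1)
        return report_error(stderr, CURRENT_FUNC, "sock_call_1 failed");
    return 0;
}
```

With this shape, main() can call do_sock_stuff() and call exit(EXIT_FAILURE) in one place rather than in eight.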
 
Guest

Which would be a very acceptable strategy for a large class of errors if C
had a return value type that "exploded" (C++ term) if it wasn't checked.
Then, the "return an error indicator" strategy becomes robust and not "error
prone" as so many C++ aficionados continually harp as a major justification
for exception handling.

"exploding return value" sounds very much like an exception
Note that many functions do not generate errors and can have a void return.
It is worth trying to make as many functions as possible void return
functions.

I come to the exactly opposite conclusion! If you want a robust
C program then you want *more* functions that return error values.
Many more functions could be void return functions if C had
pass-by-reference arguments (null ptr checks need not be done within the
function as a precondition check then).

and if the pre-condition fails?
So, I'm suggesting those 2 things (exploding error return val and
pass-by-reference) be introduced into C. :)

and I'm trying to describe an error handling for C, not for tony-C
 
James Kuyper

blargg wrote:
....
When I was a child, my friends and I knew good games from badly-programmed
ones, and avoided the latter. These days, bad programming won't just cause
occasional graphical glitches as when I was a kid; it can easily freeze
the entire machine.

Freezing the entire machine is not just a problem "these days"; that's
been a problem since the earliest days of DOS. It's not a problem with
decent operating systems, where application programs don't have the
privileges needed to bring your system to a halt. If a game does cause
your system to freeze up, it's due to a bug in your system, in that it
allows the game to have sufficient control to freeze the machine.
 
Rick Dearman

No, you can't say that - at least, not truthfully and knowledgeably.
Firstly, either the bug is there or it isn't - if it *is* there,
you might *encounter* it less than once a year in normal use, or
you might not. Secondly, playtesters are not normal users (I've
never actually been one, but I've met several professional
playtesters over the years, and believe me, they ain't normal where
games are concerned). Thirdly, your probability theory is faulty -
you're claiming that if a low-probability event doesn't occur in a
given timespan T, it will occur on average less than once per T.
That's simply wrong.

Richard, can you please elaborate on the correct probability for this
event possibility? I'm a little fuzzy on statistical probability.


BTW, I do agree that you should always do your best to have
zero-defect code, however I disagree that games programmers are
cavalier, I have seen some wonderful games which require no patches.


((Am I the only one who noticed this is off-topic?)) *Ducking the
virtual bricks being thrown at me now.*
 
jameskuyper

Richard said:
James Kuyper said:


Um, yes it does.


Consider his (hypothetical) program's reaction: it displays a
message about the need to clean the disk (instead of saying
something like "internal error"). Therefore, the problem has been
anticipated. The cheapest point at which to fix the bug is the
point at which the developer discovered it and copped out of it by
displaying a misleading message.

I agree that this way of dealing with the problem is morally
indefensible. However, he only added that detail to the discussion
long after you made the comment I'm talking about. I see nothing
inappropriate about never fixing such a bug, so long as users are
adequately (and accurately) informed about its existence and nature.
That Malcolm later indicated that users were being lied to about this
bug does not justify identifying his earlier comment as a sign of poor
software development processes.

....
If the program is not being sold for profit, I can understand that
attitude.

NASA is paying us to create the program, and to make it freely
available to the general public (after signing off on a NASA Software
Usage Agreement), and to use it to produce data products which are
also freely available to the public. We're making a profit off of the
contract. I'm not sure whether that counts as "being sold for profit"
for your purposes, so perhaps your comment is relevant.

However, it's not at all clear to me that this attitude would be any
less appropriate with a different business model. Even if our programs
were sold to the general public for profit, and our file formats were
proprietary so that people could only access the data by using our
programs, I still doubt that
 
Willem

Richard Heathfield wrote:
) Consider his (hypothetical) program's reaction: it displays a
) message about the need to clean the disk (instead of saying
) something like "internal error"). Therefore, the problem has been
) anticipated. The cheapest point at which to fix the bug is the
) point at which the developer discovered it and copped out of it by
) displaying a misleading message.

There are several ways in which an "internal error"/"clean the disk"
message could have been implemented without actually having to encounter
any bugs.

Example 1: "This can never happen" checks, such as taking the square
root of negative numbers in some polygon engine or something.

Example 2: A watchdog timer/thread that has to be updated by the main loop
every decisecond, or else it displays the above error message and halts.


I think that most "internal error" messages are produced by code that makes
such sanity checks, and not by specific errors. (And most of the rest are
probably external libraries or drivers throwing unknown error codes that
have never come up in dev/test)
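
A "this can never happen" check of the kind described in Example 1 might look roughly like the sketch below. All names (sanity_check, SANITY_CHECK, internal_errors, edge_length_squared) are invented for illustration; a real engine would choose its own reporting path.

```c
#include <assert.h>
#include <stdio.h>

/* Count of failed checks, so a caller can decide what to do next. */
static int internal_errors = 0;

/* Returns 1 if cond holds. Otherwise logs the failed condition with
   its source location, bumps the counter and returns 0. */
static int sanity_check(int cond, const char *what, const char *file, int line)
{
    if (!cond) {
        fprintf(stderr, "internal error: %s (%s:%d)\n", what, file, line);
        internal_errors++;
    }
    return cond != 0;
}

#define SANITY_CHECK(cond) sanity_check((cond), #cond, __FILE__, __LINE__)

/* Computes a squared edge length that will later be passed to a
   square root. Under IEEE arithmetic the check can fail only if an
   input is NaN, which is exactly the "can never happen" case.
   Returns 0 if *out was set, -1 otherwise. */
static int edge_length_squared(double dx, double dy, double *out)
{
    double d2 = dx * dx + dy * dy;

    if (!SANITY_CHECK(d2 >= 0.0))
        return -1;
    *out = d2;
    return 0;
}
```

The message is produced by the check itself, not by any specific known bug, which is the point being made above.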


SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT
 
Richard Bos

Bartc said:
But, obscure bugs discovered after perhaps 1000 man-years of use did tend to
get low-priority. Especially if they weren't easily repeatable..

A non-repeatable bug isn't a bug, it's a pilot error.

Richard
 
Tony

Which would be a very acceptable strategy for a large class of errors if C
had a return value type that "exploded" (C++ term) if it wasn't checked.
Then, the "return an error indicator" strategy becomes robust and not
"error
prone" as so many C++ aficionados continually harp as a major
justification
for exception handling.

""exploding return value" sounds very much like an exception"

Not really. There is no handling of "an explosion". The only thing to do is
to fix the source code by doing a check where it is required. That's the
extent of functionality that exploding return objects have.
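
Standard C has no return type that fails at run time when ignored. The nearest existing approximation is a compile-time diagnostic: GCC and Clang offer a warn_unused_result function attribute. The sketch below assumes such a compiler; MUST_CHECK and copy_first() are made-up names, and on other compilers the macro expands to nothing.

```c
#include <assert.h>
#include <stddef.h>

/* MUST_CHECK is a hypothetical macro name. GCC and Clang provide the
   warn_unused_result attribute; elsewhere the macro is empty. */
#if defined(__GNUC__)
#define MUST_CHECK __attribute__((warn_unused_result))
#else
#define MUST_CHECK
#endif

/* Copies the first element of src into *dst.
   Returns 0 on success, -1 if the arguments are unusable. */
MUST_CHECK static int copy_first(const int *src, size_t n, int *dst)
{
    if (src == NULL || dst == NULL || n == 0)
        return -1;
    *dst = src[0];
    return 0;
}
```

Writing copy_first(a, 3, &x); as a bare statement then draws a warning, and building with -Werror=unused-result turns that warning into an error. This is a compile-time check only, not the run-time "explosion" described above.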
Note that many functions do not generate errors and can have a void
return.
It is worth trying to make as many functions as possible void return
functions.

"I come to the exactly opposite conclusion! If you want a robust
C program then you want *more* functions that return error values."

No. A void return indicates that no error can occur within a function. Read:
the function has been analyzed and designed to eliminate potential error.
Hence no exploding return val or check need to be done. How you would find
the other scenario as better sounds bizarre.
Many more functions could be void return functions if C had
pass-by-reference arguments (null ptr checks need not be done within the
function as a precondition check then).

"and if the pre-condition fails?"

There aren't any in the scenario I gave after pointer args are replaced with
reference args. You won't get past compiling with ref args as you would with
ptr args.
So, I'm suggesting those 2 things (exploding error return val and
pass-by-reference) be introduced into C. :)

"and I'm trying to describe an error handling for C, not for tony-C"

Yes, you are thinking in the box while I am thinking outside of the box and
toward the next standard.

Tony
 
Ben Bacarisse

Richard Heathfield said:
Rick Dearman said:


If all you have is a single datum, and a negative datum at that,
it's not possible to determine the mean frequency of an event.

Consider this analogy: every minute, I show you a playing card,
which is selected according to either a heuristic or an algorithm
of which you have not been informed. I then return the card to the
deck (in a way that you cannot observe - e.g. I hide the deck
behind a screen when replacing each card). After an hour, I haven't
shown you any clubs. What is the probability, over the long term,
that I will show you a club less often than once per hour?

Your analogy shows that you prefer to make a different set of
assumptions to Malcolm. He is obviously happy to model gaming errors
as a Poisson process.

A single observation is simply insufficient for establishing an
average that is even remotely meaningful.

If the model is a Poisson process, you don't have one observation, but
an infinity of observations forming a dense set (the infinity is not
really important, the fact that it is dense is -- you have to have a
set with non-zero measure). It is quite justifiable to draw some
conclusions about the average rate of events from this set of
observations.

Obviously you don't accept that model (which is fine) but your message
suggested that Malcolm was being illogical. I think he is just making
different assumptions. I don't think the actual numbers were quite
right, but that is another matter.
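
To make the Poisson argument concrete, here is a short worked calculation. It assumes a homogeneous Poisson process with a constant, unknown rate \lambda (events per unit time) and an observation period T in which no event was seen; the 95% level is an arbitrary choice for illustration.

```latex
P\bigl(N(T) = 0\bigr) = e^{-\lambda T}

% Rates for which an event-free period T is not too surprising:
e^{-\lambda T} \ge 0.05
\quad\Longleftrightarrow\quad
\lambda \le \frac{\ln 20}{T} \approx \frac{3}{T}
```

So under that model an event-free period does say something about the rate, but only that it is unlikely to exceed roughly three events per T. It does not by itself establish a rate below one event per T.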
 
Richard Bos

Attitudes like that are why we're saddled with unreliable, buggy
software. Undefined behavior isn't necessarily repeatable.

No, but there's a difference between "hard to repeat" and "cannot be
repeated". I'd love to say that all bug reports are equally worthwhile,
but I've seen too many "bugs" which turned out to be the operator being,
for example, unable to read a simple instruction on the screen.

Richard
 
jameskuyper

Richard said:
No, but there's a difference between "hard to repeat" and "cannot be
repeated".

Yes - "hard to repeat" is something that can be said about real world
bugs. "cannot be repeated" is something we'd like to be able to say,
but it is inherently impossible to collect sufficient data to
justify saying so. The simple fact that you could not reproduce a
problem reported by a user might be due to any of a number of
problems. It could be that your system differs from the user's system
in some subtle way that you are unaware of. It might depend upon time
or the environment in some way that was covered by none of your
attempts to reproduce the problem. It might be due to a communications
failure between you and the user (and keep in mind that this failure
might not be entirely the user's fault).
... I'd love to say that all bug reports are equally worthwhile,
but I've seen too many "bugs" which turned out to be the operator being,
for example, unable to read a simple instruction on the screen.

If you know that the problem was due to the operator being unable to
read a simple instruction, then the right diagnosis might be "operator
error", but it could also be "poorly written instruction". This is
particularly true if the instruction was written by the developer -
such instructions often assume that the user should understand as much
about the internals of the program as the developer does.
In neither case is "unrepeatable" an appropriate diagnosis.
 
Guest

A non-repeatable bug isn't a bug, it's a pilot error.

why? That's just crazy! I've had all sorts of hard to reproduce
bugs (and "only happens once" is just the ultimate hard to reproduce
bug (or is "has never happened" the UHtRB?)).

Hard to reproduce:

- timing dependent
- address dependent
- data dependent
- data size dependent
 
Guest

Malcolm McLean said:

No, you can't say that - at least, not truthfully and knowledgeably.
Firstly, either the bug is there or it isn't

no. It depends on your definition of bug. I've sat in System Test
meets Development meetings where counting dancing angels would
have been regarded as laughably trivial compared with "is this
a bug".

To a QA guy this is easy. "The system is in error if it can be
shown not to comply with its formal specification". This of course
assumes you have a complete and correct formal specification, and
that the end user can meaningfully read it.

As an end user put it to me "if I bought a car off you I'd expect
it to have a steering wheel. I don't have to write that down."

So the working definition I use is "it's a bug if the system
behaves in a manner the end user wouldn't reasonably expect".
Which has a nice weasel in it.

There is no cure for:
Marketing Guy: "your monitoring system is reporting a lot
of bugs in our equipment"
Development Guy: "that's because there are"
MG: "but it makes us look bad!"

<snip>
 
