Kelsey Bjarnason
[snips]
> I've been around for some time, and I have yet to see a single
> case where a customer (external or internal) knows upfront what
> requirements she actually has.
> It is a sign of maturity of the programmer to anticipate (and
> suggest!) changes in specifications - and to write code with
> guards against attempted violations.
Sure. Now let me ask you this: how often do *you* write your code to,
oh, cache all the local OS files so that if something goes wrong, you can
recover?
Right, you don't. Part of the contract the software is designed to is
that it runs on a functioning OS. It's not the application's job to
ensure this.
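
To make the "contract" idea concrete, here's a minimal sketch in
Python (the function and field names are hypothetical, not from any
code discussed here) of writing to a stated precondition instead of
hardening against every conceivable environmental failure:

def apply_update(record):
    """Apply one update record from the normal data flow.

    Contract: callers pass records produced by the upstream
    pipeline, so 'id' and 'payload' are always present. A broken
    OS, a corrupt filesystem, or hand-injected garbage lie outside
    this contract, and this function makes no attempt to survive
    them.
    """
    # A cheap assertion documents the contract boundary; it is not
    # recovery code.
    assert "id" in record and "payload" in record, "contract violated"
    print("updating", record["id"])

apply_update({"id": 42, "payload": "new value"})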
> If things /really/ work, the skills have been adequate (by
> definition). Not every parking garage has to be an architectural
> marvel. None should collapse under its own weight.
> Coding skills stand out immediately in, e.g., code reviews.
This assumes the reviewers know what they're looking for, and at, which
is not always the case.
Take the case in question: every predictable failure - server outages,
for example - is dealt with in the code, which is *very* robust about
such things.
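
Purely as an illustration - the actual code isn't shown here - a
predictable failure such as a server outage might be handled along
these lines, where the outcome is either data or, in the predicted
failure mode, no data at all:

import time
import urllib.error
import urllib.request

def fetch_with_retry(url, attempts=3, delay=2.0):
    """Return response bytes, or None if the server stays down."""
    for attempt in range(attempts):
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                return resp.read()
        except (urllib.error.URLError, TimeoutError):
            time.sleep(delay * (attempt + 1))  # simple linear backoff
    return None  # the predicted failure mode: no data at all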
So what caused our failure? Was it a server outage? No. It was someone
breaking the contract. The code is designed a particular way, based on a
particular flow of data through the system. That flow was tampered with,
with undesirable results. Yet normal operation - "normal" including
all failure modes that the environment can actually be expected to
encounter in use - causes no such problem.
So, is this a case of a bad design, bad code? No; it's a case of someone
violating the contract to which the code is written. So what'll a code
review show? Oh, yes, the code does not deal with the case of someone
with admin-level access injecting invalid data into the system. Well, of
course not; it wasn't designed to, as in normal operation it does not
*get* invalid data, and in every error mode which can be predicted, it
simply gets no data at all.
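
Again as a hypothetical sketch (the names are assumptions, not the
actual system): the predicted error modes deliver no data and are
handled, while invalid data - which never arrives in normal
operation - at best trips a cheap guard rather than recovery logic:

def handle(record):
    print("processed", record["id"])  # stand-in for the real work

def process(batch):
    if batch is None:
        return  # predicted failure mode: upstream produced nothing
    for record in batch:
        # Contract tripwire: refuse to compute garbage silently if
        # someone has hand-injected malformed records.
        if "id" not in record:
            raise ValueError("contract violation: malformed record")
        handle(record)

process(None)                     # e.g. a server outage: no data
process([{"id": 1}, {"id": 2}])   # the normal flow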
You cannot prevent all failures; you can only prevent the ones which are
predictable. Nor can you even *detect* all failures, only the ones which
stand out in some algorithmically detectable way - but even there, the
effort to do so may simply not be worth it: if it requires some thumb-
fingered twit to manually mess things up in the first place, do you spend
an extra three weeks writing code to deal with that, or do you just tell
said thumb-fingered twit "don't do that"?