Hi Ian,
Because it is easier to generate code than it is to parse code and
extract information from an entire code base. It is also easier to
maintain a single document in a form that can be translated into things
other than code (especially if you have a need to get some of the
messages translated).
So, you have a "single document" for <stdio>, another for <math>,
another for <string>, another for <motor>, another for... (i.e.,
you *don't* have a single document *or* that single document
is the concatenation of all of these OTHER single documents).
What happens when some *aspect* of <math> never is used in your
final link? Do you modify *that* "single document" and put it under
configuration management/version control for this project/product?
Quite. The table of error codes/messages/whatever is all generated from
one place. If a new error code is required, it has to be added to the
master file.
And you have to manually verify that the two documents are in sync.
That the "explanation" of the error/condition/exception is in agreement
with the actual tests being performed that detect it.
Nothing. How many of the error codes in a typical system header (such as
<errno.h> on a Unix system) does any one application actually use?
But they aren't *exposed* to the end user/etc.
You put a description for EDOMAIN in your "single document" (whether
that is the <math> single document or the composite single document
formed from all-of-the-above).
The product goes to final test.
"OK, we've verified the motor subsystem responds as required to
all possible operating conditions. Let's move on. Now, how do
I cause the EDOMAIN error to manifest?"
"Um, let's see... No, it doesn't appear that any of the
functions that COULD signal that error are actually present
in the executable. So, to answer your question, EDOMAIN is
something that /* CAN'T HAPPEN */"
"<frown> And how am *I*, charged with ensuring that your product
meets its stated/documented specifications, supposed to know
which of these errors can and can't occur? How do I know that
your code isn't buggy and FAILING to accurately detect them?"
If new errors are added to the master file as required, this will not
really be a problem.
We did. However, code editors tend to be piss-poor word processors and
word processors tend to be piss-poor text editors! It's quite easy to
write code in Open Office and extract the code sections for compilation,
but it's a process not well suited to contemporary tools (IDEs).
On the project I mentioned, the master XML was used to generate many
different components, including database tables and their equivalent C
structures and enumerations. So this file had a lot more value than it
would have, had it simply been used for error codes and messages. Elsewhere
in this thread someone mentioned a function to check input values; well,
the types and ranges of common data types were also specified in this
file, so all the range checking and error reporting was centralised.
Our first "experience" with integrating separate tools for
"documentation" and "coding" followed this same rationale.
We reasoned that the sorts of things we wanted to do in the
documentation (ended up in HTML) just weren't effectively
handled in a "text/programmer's editor". And, of course, as
is always the case, no one wanted to give up the tools that
*they* had grown fond of!
We figured that we could just "dress up" our formal specifications
in a more colloquial tone and reorganize them in a way more suitable
to an "on-line manual" (of sorts). Perfect -- since it also allowed
folks to start working on that aspect of the documentation in
parallel with the coding.
We ended up with gorgeous documentation. The overhead for accessing
particular parts of the documentation was trivial -- effectively a
URL that was passed between the actual code and the "help system".
Things like EDOMAIN are easy to describe -- and forget -- once.
Things like MOTOR_OVERHEAT tend to get a bit more involved. And,
remain unsettled a lot longer: "Oh, Bob, we stumbled on another
example of a common situation that, with prolonged use, will
cause the motor to overheat. You should probably add it to the
documentation..."
It was always a game of cat and mouse -- keeping the documentation
in sync with the actual code base (these are fair-sized projects;
~40MB for the documentation) took constant diligence. We only
used two levels of "messaging" -- a simple, short phrase (which
was enough for an experienced user to recognize The Problem)
and a link to the portion of the manual describing the issue
in detail (e.g., "Section 23.1.5: Specifying Numeric Values").
We naively thought this was *ideal* -- have the manual at the
user's fingertips without requiring him to keep it nearby!
But, the manual presented too much information and, so, ended
up being largely ignored (!). Calls coming in to Tech Support
were *always* answered just by the support person consulting
that very same portion of the manual as had been offered to the
user at run time. I.e., if the user had bothered to READ it,
it would have admirably served its purpose!
Talking to users showed us that we had straddled the "sweet spot"
with our error messages -- too little or too much. This led to
the current approach of gradually increasing detail. At the
extreme end, the user finds himself "in" the "Manual", again,
where we can take our time/space explaining things in detail
along with examples illustrating usage, problems and potential
remedies (including interactive troubleshooting).
This increases the number of "references" between code and
"documentation" almost by an order of magnitude (e.g., in the
examples presented here, I show just two levels of "user
information"; in practice, it's more like 5 or 6 for each
"error"). More things to potentially get out of sync.
In this approach, we already acknowledge that programmers don't
always make good "teachers" ("explainers"). So, we don't expect
messages to be particularly "user friendly". Watching the
sorts of messages that people come up with shows big differences
in style, presentation, etc.
But, programmers can be expected to ACCURATELY describe what their
code is actually doing (or *supposedly* doing). And, can explain
to someone better equipped with "language" how errors "nest".
This other person can then clean up the actual language for each
message, and yet another can handle translations to other languages, etc.
Putting these in separate "master documents" just leads to more
"conflicts" as the two developers (linguist and coder) drift out of
sync:
"Here are the texts of the error messages you need, Bob"
"Huh? No, I no longer have to worry about decimal points
AT ALL! I chose to pass data in units of 1/10 so I can
eliminate the need for decimal points -- and the testing
for MULTIPLE decimal points -- in those interactions.
Clever, eh? Oh, and, by the way, you need to reflect
this in the 'out of range' messages since those values
can now be 10 times higher..."
If you decouple these even *more* (i.e., linguist updates the
"master file" and Bob never has to worry about consulting it
since the build mechanism does that FOR him) then you end up
with an unreferenced error message (Err_TooManyDecimalPoints)
or an *incorrect* error message (Err_OutOfRange).
I believe the tools required to generate code from XML are much
simpler than those required to extract documentation from code (other
than from code comments).
That was why I opted for a syntactically simple implementation:
"MakeErrorCode" is just a "tag" that I can locate and excise from
the source file (replacing the line(s) that it occupies with
empty lines). The tool that does so doesn't need to be aware
of any C syntax. Nor does it interfere with any C *statements*.
It plays the role of a "#specialcomment" (I had thought of trying
to implement it as a #pragma but figured that tied it to the
language too much).
But it doesn't solve the uniqueness problem. With a master file, you
don't have to worry about -- or even define -- the error values. The
translator can do that for you.
See my response to Shao. Mechanically, it (well, this newest version)
provides everything I want -- uniqueness, unused errors "disappear"
from the documentation automatically, scales linearly, can be used
independently by multiple developers (you need only be concerned with
the errors defined in the modules *you* are using), tells
me where it is "defined"/raised in the code, etc.
But, it doesn't "encourage" (force is too strong a word) developers
to "use it the right way". A "conforming" use might be, for example
(pseudocode):
function() {
    ...
    // test for "foo" condition
    if () {
        MakeErrorCode(foo1, ...)
        // handle foo condition
        ...
        error = foo1;
    }
    ...
    // test for "bar" condition
    if () {
        MakeErrorCode(bar, ...)
        // handle bar condition
        ...
        error = bar;
    }
    ...
    // test for some other variant/instance of "foo" condition
    if () {
        MakeErrorCode(foo2, ...)
        // handle other foo condition
        ...
        error = foo2;
    }
    ...
    // report <whatever> condition
    ...
}
This causes each error code to be associated with the actual
test (line number) that detected it and ensures that each
such test is uniquely identified by an error code.
But, there is nothing that prevents the developer from doing:
MakeErrorCode(foo, ...)
MakeErrorCode(bar, ...)

function() {
    ...
    // test for "foo" condition
    if () {
        // handle foo condition
        ...
        error = foo;
    }
    ...
    // test for "bar" condition
    if () {
        // handle bar condition
        ...
        error = bar;
    }
    ...
    // test for some other variant/instance of "foo" condition
    if () {
        // handle other foo condition
        ...
        error = foo;
    }
    ...
    // report <whatever> condition
    ...
}
Note that two different criteria that detect "foo" have been
reported using the same error code (foo). And, that neither foo
nor bar give any clue as to which line -- or *function* -- is
associated with the "error". (Indeed, the developer could have
placed the MakeErrorCode instances in a *header* file and
things would still have worked!)
So, it ends up much like "goto" -- relying on peer pressure, code
reviews, "policy", etc. to ensure that it is "deployed" properly.
I'm looking for ways to increase the likelihood of "conforming"
behavior from developers...