Mechanism to generate annotated "error codes"

Ian Collins

Hi Ian,



Yes. I document them IN THE CODE.


Why?

Because it is easier to generate code than it is to parse code and
extract information from an entire code base. It is also easier to
maintain a single document in a form that can be translated into things
other than code (especially if you have a need to get some of the
messages translated).
It ensures EVERY error code is documented. And, it keeps that
proximate to the actual code itself. *This* is what the code is
testing for... not something else that you THINK it is enforcing!

Quite. The table of error codes/messages/whatever is all generated from
one place. If a new error code is required, it has to be added to the
master file.
What happens when an error code is never referenced? How does
your document ELIDE that error code and its documentation?

Nothing. How many of the error codes in a typical system header (such
as <errno.h> on a Unix system) does any one application actually use?

If new errors are added to the master file as required, this will not
really be a problem.
Fair enough. You should look at Literate Programming for a
different take on this (see URL later)

We did; however, code editors tend to be piss poor word processors and
word processors tend to be piss poor text editors! It's quite easy to
write code in Open Office and extract the code sections for compilation,
but it's a process not well suited to contemporary tools (IDEs).

On the project I mentioned, the master XML was used to generate many
different components, including database tables and their equivalent C
structures and enumerations. So this file had a lot more value than it
would had it simply been used for error codes and messages. Elsewhere
in this thread someone mentioned a function to check input values, well
the types and range of common data types were also specified in this
file, so all the range checking and error reporting was centralised.
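A minimal sketch of what centralised range checking might look like once such a table exists. The type names and limits below are hypothetical and stand in for whatever a translator would generate from the master file; nothing here is taken from the actual project.

```c
/* Hypothetical data types and limits, standing in for generated output. */
enum value_type { TYPE_PERCENT, TYPE_MOTOR_RPM, TYPE_COUNT };

struct range { long min; long max; };

static const struct range ranges[TYPE_COUNT] = {
    [TYPE_PERCENT]   = { 0, 100 },
    [TYPE_MOTOR_RPM] = { 0, 6000 },
};

enum check_result { CHECK_OK, CHECK_BELOW_MIN, CHECK_ABOVE_MAX, CHECK_BAD_TYPE };

/* One checking routine serves every type; only the table differs. */
enum check_result check_value(enum value_type type, long value)
{
    if ((int)type < 0 || (int)type >= TYPE_COUNT)
        return CHECK_BAD_TYPE;
    if (value < ranges[type].min)
        return CHECK_BELOW_MIN;
    if (value > ranges[type].max)
        return CHECK_ABOVE_MAX;
    return CHECK_OK;
}
```

A caller would write something like check_value(TYPE_PERCENT, input) and map the result onto whatever error code the project uses.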
Fair enough. My approach is "inspired" by Literate Programming
(http://en.wikipedia.org/wiki/Literate_programming). In either
approach, you need to have a tool/utility massage a file (or
files) to generate the actual "source" and supporting files.

I believe the tools required to generate code from XML are much
simpler than those required to extract documentation from code (other
than from code comments).
I bundle it in the source because I can just let MakeErrorCode
be a macro that doesn't alter the source (Literate Programming
requires the source to be "extracted" -- "tangled").

But it doesn't solve the uniqueness problem. With a master file, you
don't have to worry, or even define the error values. The translator
can do that for you.
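For readers without an XML translator, the same idea can be approximated in plain C with an "X-macro" list. This is a swapped-in technique, not the tooling described in this post, and the error names and texts are hypothetical. The point it illustrates is that the values are assigned mechanically and a duplicated name will not compile.

```c
#include <stddef.h>

/* Hypothetical master list: one line per error, name plus message. */
#define ERROR_LIST(X) \
    X(Err_None,                 "no error") \
    X(Err_OutOfRange,           "value out of range") \
    X(Err_TooManyDecimalPoints, "more than one decimal point")

/* Expand the list into an enumeration. The compiler assigns the values,
   and a name that appears twice is rejected as a redeclaration. */
#define AS_ENUM(name, text) name,
enum error_code { ERROR_LIST(AS_ENUM) Err_Count };

/* Expand the same list into a message table indexed by error code. */
#define AS_TEXT(name, text) text,
static const char *const error_text[] = { ERROR_LIST(AS_TEXT) };

/* Return the message for a code, or NULL if the code is not in the list. */
const char *error_message(int code)
{
    if (code < 0 || code >= Err_Count)
        return NULL;
    return error_text[code];
}
```

Adding an error means adding one line to ERROR_LIST; the enumeration and the table cannot drift apart because both come from the same list.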
 
Don Y

Hi Ian,

Because it is easier to generate code than it is to parse code and
extract information from an entire code base. It is also easier to
maintain a single document in a form that can be translated into things
other than code (especially if you have a need to get some of the
messages translated).

So, you have a "single document" for <stdio>, another for <math>,
another for <string>, another for <motor>, another for... (i.e.,
you *don't* have a single document *or* that single document
is the concatenation of all of these OTHER single documents).

What happens when some *aspect* of <math> never is used in your
final link? Do you modify *that* "single document" and put it under
configuration management/version control for this project/product?
Quite. The table of error codes/messages/whatever is all generated from
one place. If a new error code is required, it has to be added to the
master file.

And you have to manually verify that the two documents are in sync.
That the "explanation" of the error/condition/exception is in agreement
with the actual tests being performed that detect it.
Nothing. How many of the error codes in a typical system header (such as
<errno.h> on a Unix system) does any one application actually use?

But they aren't *exposed* to the end user/etc.

You put a description for EDOMAIN in your "single document" (whether
that is the <math> single document or the composite single document
formed from all-of-the-above).

The product goes to final test.

"OK, we've verified the motor subsystem responds as required to
all possible operating conditions. Let's move on. Now, how do
I cause the EDOMAIN error to manifest?"
"Um, let's see... No, it doesn't appear that any of the
functions that COULD signal that error are actually present
in the executable. So, to answer your question, EDOMAIN is
something that /* CAN'T HAPPEN */"
"<frown> And how am *I*, charged with ensuring that your product
meets its stated/documented specifications, supposed to know
which of these errors can and can't occur? How do I know that
your code isn't buggy and FAILING to accurately detect them?"
If new errors are added to the master file as required, this will not
really be a problem.


We did; however, code editors tend to be piss poor word processors and
word processors tend to be piss poor text editors! It's quite easy to
write code in Open Office and extract the code sections for compilation,
but it's a process not well suited to contemporary tools (IDEs).

On the project I mentioned, the master XML was used to generate many
different components, including database tables and their equivalent C
structures and enumerations. So this file had a lot more value than it
would had it simply been used for error codes and messages. Elsewhere in
this thread someone mentioned a function to check input values, well the
types and range of common data types were also specified in this file,
so all the range checking and error reporting was centralised.

Our first "experience" with integrating separate tools for
"documentation" and "coding" followed this same rationale.
We reasoned that the sorts of things we wanted to do in the
documentation (ended up in HTML) just weren't effectively
handled in a "text/programmer's editor". And, of course, as
is always the case, no one wanted to give up the tools that
*they* had grown fond of!

We figured that we could just "dress up" our formal specifications
in a more colloquial tone and reorganize them in a way more suitable
to an "on-line manual" (of sorts). Perfect! since it also allowed
folks to start working on that aspect of the documentation in
parallel with the coding.

We ended up with gorgeous documentation. The overhead for accessing
particular parts of the documentation was trivial -- effectively a
URL that was passed between the actual code and the "help system".

Things like EDOMAIN are easy to describe -- and forget -- once.
Things like MOTOR_OVERHEAT tend to get a bit more involved. And,
remain unsettled a lot longer: "Oh, Bob, we stumbled on another
example of a common situation that, with prolonged use, will
cause the motor to overheat. You should probably add it to the
documentation..."

It was always a game of cat and mouse -- keeping the documentation
in sync with the actual code base (these are fair-sized projects;
~40MB for the documentation) took constant diligence. We only
used two levels of "messaging" -- a simple, short phrase (which
was enough for an experienced user to recognize The Problem)
and a link to the portion of the manual describing the issue
in detail (e.g., "Section 23.1.5: Specifying Numeric Values").

We naively thought this was *ideal* -- have the manual at the
user's fingertips without requiring him to keep it nearby!

But, the manual presented too much information and, so, ended
up being largely ignored (!). Calls coming in to Tech Support
were *always* answered just by the support person consulting
that very same portion of the manual as had been offered to the
user at run time. I.e., if the user had bothered to READ it,
it would have admirably served its purpose!

Talking to users showed us that we had straddled the "sweet spot"
with our error messages -- too little or too much. This led to
the current approach of gradually increasing detail. At the
extreme end, the user finds himself "in" the "Manual", again,
where we can take our time/space explaining things in detail
along with examples illustrating usage, problems and potential
remedies (including interactive troubleshooting).

This increases the number of "references" between code and
"documentation" almost by an order of magnitude (e.g., in the
examples presented here, I show just two levels of "user
information"; in practice, it's more like 5 or 6 for each
"error"). More things to potentially get out of sync.
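A rough sketch of how "gradually increasing detail" might be represented, assuming a fixed number of levels per error. The structure, the three-level limit and the wording of the entry are all hypothetical; the post only says that 5 or 6 levels are used in practice.

```c
#include <stddef.h>

#define DETAIL_LEVELS 3   /* assumption; the post mentions 5 or 6 */

struct error_help {
    const char *name;
    const char *level[DETAIL_LEVELS];  /* terse first, most detailed last */
};

/* Hypothetical entry; the wording is illustrative only. */
static const struct error_help help_out_of_range = {
    "Err_OutOfRange",
    {
        "Value out of range",
        "The value entered is outside the limits for this field",
        "See Section 23.1.5: Specifying Numeric Values",
    }
};

/* Return the text at the requested depth, clamped to what is available. */
const char *help_text(const struct error_help *h, int depth)
{
    if (depth < 0)
        depth = 0;
    if (depth >= DETAIL_LEVELS)
        depth = DETAIL_LEVELS - 1;
    while (depth > 0 && h->level[depth] == NULL)
        depth--;
    return h->level[depth];
}
```

Each time the user asks for "more", the caller would bump the depth by one until the last level hands off to the manual.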

In this approach, we already acknowledge that programmers don't
always make good "teachers" ("explainers"). So, we don't expect
messages to be particularly "user friendly". Watching the
sorts of messages that people come up with shows big differences
in style, presentation, etc.

But, programmers can be expected to ACCURATELY describe what their
code is actually doing (or *supposedly* doing). And, can explain
to someone better equipped with "language" how errors "nest".
This other person can then clean up the actual language for each
message and (another) handle translations to other languages, etc.

Putting these in separate "master documents" just leads to more
"conflicts" as the two developers (linguist and coder) drift out of
sync:
"Here are the texts of the error messages you need, Bob"
"Huh? No, I no longer have to worry about decimal points
AT ALL! I chose to pass data in units of 1/10 so I can
eliminate the need for decimal points -- and the testing
for MULTIPLE decimal points -- in those interactions.
Clever, eh? Oh, and, by the way, you need to reflect
this in the 'out of range' messages since those values
can now be 10 times higher..."
If you decouple these even *more* (i.e., linguist updates the
"master file" and Bob never has to worry about consulting it
since the build mechanism does that FOR him) then you end up
with an unreferenced error message (Err_TooManyDecimalPoints)
or an *incorrect* error message (Err_OutOfRange)
I believe the tools required to generate code from XML are much
simpler than those required to extract documentation from code (other
than from code comments).

That was why I opted for a syntactically simple implementation:
"MakeErrorCode" is just a "tag" that I can locate and excise from
the source file (replacing the line(s) that it occupies with
empty lines). The tool that does so doesn't need to be aware
of any C syntax. Nor does it interfere with any C *statements*.

It plays the role of a "#specialcomment" (I had thought of trying
to implement it as a #pragma but figured that tied it to the
language too much)
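A minimal sketch of the "tag" idea, assuming MakeErrorCode takes a name, a short text and a long text (the real argument list is not shown in this thread). To the compiler the macro expands to nothing; the enumeration below is a stand-in for whatever the extraction tool would generate.

```c
/* Expands to nothing, so the compiler never sees the documentation.
   The three-argument form is an assumption made for this sketch. */
#define MakeErrorCode(name, short_text, long_text)

/* Stand-in for what an extraction tool might generate from the tags. */
enum { Err_None, Err_TooManyDecimalPoints };

/* Hypothetical check: reject strings with more than one decimal point. */
int check_decimal_points(const char *s)
{
    int points = 0;

    for (; *s; s++)
        if (*s == '.')
            points++;

    if (points > 1) {
        MakeErrorCode(Err_TooManyDecimalPoints,
                      "Too many decimal points",
                      "A numeric value may contain at most one decimal point")
        return Err_TooManyDecimalPoints;
    }
    return Err_None;
}
```

Because the tag sits inside the branch that detects the condition, a text-level tool can record the file and line of the test without understanding any C syntax.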
But it doesn't solve the uniqueness problem. With a master file, you
don't have to worry, or even define the error values. The translator can
do that for you.

See my response to Shao. Mechanically, it (well, this newest version)
provides everything I want -- uniqueness, unused errors "disappear"
from the documentation automatically, scales linearly, can be used
independently by multiple developers (you need be concerned only with
the errors defined in the modules *you* are using), tells
me where it is "defined"/raised in the code, etc.

But, it doesn't "encourage" (force is too strong a word) developers
to "use it the right way". A "conforming" use might be, for example
(pseudocode):

function() {
    ...
    // test for "foo" condition
    if () {
        MakeErrorCode(foo1, ...)
        // handle foo condition
        ...
        error = foo1;
    }
    ...
    // test for "bar" condition
    if () {
        MakeErrorCode(bar, ...)
        // handle bar condition
        ...
        error = bar;
    }
    ...
    // test for some other variant/instance of "foo" condition
    if () {
        MakeErrorCode(foo2, ...)
        // handle other foo condition
        ...
        error = foo2;
    }
    ...
    // report <whatever> condition
    ...
}

This causes each error code to be associated with the actual
test (line number) that detected it and ensures that each
such test is uniquely identified by an error code.

But, there is nothing that prevents the developer from doing:

MakeErrorCode(foo, ...)
MakeErrorCode(bar, ...)

function() {
    ...
    // test for "foo" condition
    if () {
        // handle foo condition
        ...
        error = foo;
    }
    ...
    // test for "bar" condition
    if () {
        // handle bar condition
        ...
        error = bar;
    }
    ...
    // test for some other variant/instance of "foo" condition
    if () {
        // handle other foo condition
        ...
        error = foo;
    }
    ...
    // report <whatever> condition
    ...
}

Note that two different criteria that enforce "foo" have been
reported using the same error code (foo). And, that neither foo
nor bar give any clue as to which line -- or *function* -- is
associated with the "error". (Indeed, the developer could have
placed the MakeErrorCode instances in a *header* file and
things would still have worked!)

So, it ends up much like "goto" -- relying on peer pressure, code
reviews, "policy", etc. to ensure that it is "deployed" properly.
I'm looking for ways to increase the likelihood of "conforming"
behavior from developers...
 
Ian Collins

Hi Ian,



So, you have a "single document" for <stdio>, another for <math>,
another for <string>, another for <motor>, another for... (i.e.,
you *don't* have a single document *or* that single document
is the concatenation of all of these OTHER single documents).

Only for our own stuff. Everything else is already documented and tested.

That was why I opted for a syntactically simple implementation:
"MakeErrorCode" is just a "tag" that I can locate and excise from
the source file (replacing the line(s) that it occupies with
empty lines). The tool that does so doesn't need to be aware
of any C syntax. Nor does it interfere with any C *statements*.

I don't think we are disagreeing, merely describing different approaches
to a similar problem.

In our case, the master files (they were hierarchical for a family of
products with a common core) were used for a lot more than errors.
They included definitions of the types, structures and enumerations in
our database. They also defined the database and user menu structure.
A lot of code was generated from them, including range checking and
serialisation. So it made sense to do as much as possible in one place.
 
Don Y

Hi Ian,

I don't think we are disagreeing, merely describing different approaches
to a similar problem.

Agreed. I'm trying to point out the issues that we've encountered
that "strongly encourage" less "human involvement" in the process.

E.g., trying to "force" MakeErrorCode to be placed at the right
spot in the file will, I *know*, lead to arguments over exactly
where that spot should be in each instance (IMO, "wherever it
becomes clear that this *is* an instance of the particular 'error'")

*You* have to come up with a solution that fits the size, personnel,
skill and discipline of your application/team. If it works, great!
If it doesn't, tweak it (which is how *we* ended up going down this
road).
In our case, the master files (they were hierarchical for a family of
products with a common core) were used for a lot more than errors. They
included definitions of the types, structures and enumerations in our
database. They also defined the database and user menu structure. A lot
of code was generated from them, including range checking and
serialisation. So it made sense to do as much as possible in one place.

You would find our current approach, here, interesting. Damn near
*every* "table" has been removed from the code and instantiated in
the database. Couple this with lots of table driven code and
you magically expose (for modification) much of the software
without necessitating actually modifying the executables.

(the jury is still out on what the actual performance hit will be)
 
BartC

Ian Collins said:
On 03/ 5/12 06:21 PM, Don Y wrote:
On the project I mentioned, the master XML was used to generate many
different components, including database tables and their equivalent C
structures and enumerations. So this file had a lot more value than it
would had it simply been used for error codes and messages.

Sounds like you weren't writing in C at all. Especially if the source is
supposed to look like XML.

Or do your tools allow you to write actual C code (for the executable
stuff), but store it in this high-level format, and only produce properly
formatted C for the purposes of compilation? (Or to allow perusal by anybody
else?)

What would a heavily documented, commented and error-checked version of
Hello World look like in this scheme?
 
Ian Collins

Sounds like you weren't writing in C at all. Especially if the source is
supposed to look like XML.

Or do your tools allow you to write actual C code (for the executable
stuff), but store it in this high-level format, and only produce properly
formatted C for the purposes of compilation? (Or to allow perusal by anybody
else?)

We generated types, and some boilerplate, not the application logic.
The exception was the menu structure, which was another XML file read by
the application at run time.
 
Shao Miller

Hi Shao,


Ignoring those compilers that don't expand __FILE__ to the full path
of the file, this works. In an earlier implementation, I used a
hash of these three to generate the actual "error code".
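A sketch of how a hash of file, line and function could yield an error code. The choice of 32-bit FNV-1a and the names error_code_hash and HERE_CODE are assumptions for illustration; the hash actually used in that earlier implementation is not stated.

```c
#include <stdint.h>
#include <stdio.h>

/* Fold a string into a running 32-bit FNV-1a hash. */
static uint32_t fnv1a(uint32_t h, const char *s)
{
    while (*s) {
        h ^= (unsigned char)*s++;
        h *= 16777619u;               /* FNV prime */
    }
    return h;
}

/* Combine file, line and function name into one 32-bit value. */
uint32_t error_code_hash(const char *file, long line, const char *func)
{
    char digits[32];
    uint32_t h = 2166136261u;         /* FNV offset basis */

    snprintf(digits, sizeof digits, ":%ld:", line);
    h = fnv1a(h, file);
    h = fnv1a(h, digits);
    h = fnv1a(h, func);
    return h;
}

/* Hypothetical convenience macro; only valid inside a function body. */
#define HERE_CODE() error_code_hash(__FILE__, __LINE__, __func__)
```

A hash like this can collide, so a build step would still have to check the generated codes for duplicates. It also inherits the weakness noted above: the value depends on what the compiler expands __FILE__ to.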

The bigger problem is programmer discipline (something that I
obviously don't trust :> )

E.g., the programmer is under no compulsion to invoke MakeErrorCode
in a "meaningful place". So, he could choose to put all of these
"invocations" (for want of a better word) at the very top of the
file. In which case, the __LINE__ and __func__ don't evaluate to
anything "useful".

I can think of a couple of tricks for "forcing" a programmer to adhere
to certain standards of returning status to a caller that might work:

1. A rule that _all_ functions must return the same type. If that type
is a pointer type, then it can be "difficult" for a programmer to be
lazy... They can return '0' or 'NULL' or some other null pointer value.
If that type is a 'struct' type, it can be awkward to construct such a
'struct' value. ...Or they can use whatever interface you provide.

2. A rule that _no_ functions have a non-'void' return type. If the
programmer wishes to return any status at all, it might be easiest for
them to use whatever interface you provide.
Similarly, there is nothing to force a developer to differentiate
between conditionals that result in a particular error code being
returned.

E.g.,

MakeErrorCode(foo...)
// Note that the FILE/LINE/func point to the above line, which may
// not be *any* of the following conditionals!
...
...
if (value > 27)
    return foo;
...
if (value < 36)
    return foo;
...
if (today == tuesday)
    return foo;

Of course, there are countless ways of arranging control structures
to result in this same sort of behavior.

I had considered turning MakeErrorCode(error, ...) into a legitimate
*statement* that resolved to:
return error;
This would *ensure* that the error code identified the exact place
where the error was "returned". But, it still doesn't avoid the
above problem:

while (forever) {

    if (value > 27)
        break;
    ...
    if (value < 36)
        break;
    ...
    if (today == tuesday)
        break;

}
MakeErrorCode(foo, ...)

There are many other limitations that this imposes on coding style,
as well.

True. Beyond encouraging the use of some kind of interface for
returning status and making it easy, I don't know what else you can do,
short of actual discipline.

Suppose you have:

#include <string.h>
#include <stdio.h>

/*** Macros */
#define MakeStatus_(name_, ...) ( \
    (const struct s_status){ \
        .name = # name_, \
        .file = __FILE__, \
        .line = __LINE__, \
        .func = __func__, \
        __VA_ARGS__ \
    } \
  )
#define MakeStatus(...) MakeStatus_(__VA_ARGS__,)
#define Success MakeStatus(Success)
#define Status(status_, name_) ( \
    !strcmp((status_).name, # name_) \
  )

/*** Object types */
typedef struct s_status status_t;

/*** Function declarations */
status_t foo(int);
status_t bar(double);
void ShowStatus(const status_t);

/*** Struct/union definitions */
struct s_status {
    const char * name;
    const char * file;
    long line;
    const char * func;
    const char * detail;
    const char * long_detail;
  };

/*** Function definitions */
int main(void) {
    status_t status;

    status = foo(13);
    ShowStatus(status);
    return 0;
  }

status_t foo(int x) {
    status_t status;

    if (!x)
      return MakeStatus(ErrZero);

    status = bar(x);
    if (Status(status, ErrBadNum)) {
        return MakeStatus(
            ErrBadNum,
            .detail = "A bad 'bar' number resulted",
            .long_detail = "Calling 'bar' with the value provided"
              " resulted in 'bar' yielding an exception"
          );
      }

    return Success;
  }

status_t bar(double d) {
    if (d == 13.0)
      return MakeStatus(ErrBadNum);

    return Success;
  }

void ShowStatus(const status_t status) {
    printf(
        "%s : %s() : line %ld",
        status.file,
        status.func,
        status.line
      );
    if (status.name)
      printf(" : %s", status.name);
    if (status.detail)
      printf(" : %s", status.detail);
    /* Always end the first line, even when there is no detail */
    printf("\n");
    if (status.long_detail)
      printf(" %s\n", status.long_detail);
    return;
  }

If you bury the macros and definition of 'struct s_status', the
programmer might find it easy to use the interfaces you provide, such as
'MakeStatus' and 'Success' and 'Status'.

Above, the 'MakeStatus' invocations shown happen to initialize a
'const'-qualified structure with constants. GCC happened to find it
convenient to make each invocation an object with 'static' duration
"behind the scenes."
[I've not come up with a way to force developers to "do the right
thing"]

Stand over their shoulder, perhaps.
Correct. My first approach generated "static const char *" to embed
the error messages in the modules. The sizes grow too quickly. I
then tried parameterizing "MakeErrorCode" so that I could specify
*where* (segment) to store the messages. Same problem only different.

Wait, why was that the "same problem only different"? A linker script
should be able to keep things separate. The executable could omit the
extra detail and the linker could produce some object file for all the
extra detail... No?
Finally, I decided the messages need not have any relationship to
the code *other* than the "error code" which acts as a handle/index
to tie everything together.

Ok, whatever unique value the caller needs to be able to test for.
So, I could pull all of this text out of the "executable" and
load it as needed, on demand. Application starts up faster and
has a smaller footprint.

Sure. And the extra detail can still be accessed, as needed.
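A small sketch of looking up message text by error name once the text lives outside the executable. It assumes the catalogue is a block of "name<TAB>text" lines that has already been read into memory from wherever it is stored; that format and the function name find_message are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Find "name<TAB>text" in a newline-separated catalogue and copy the text
   into out (truncating if needed). Returns 1 if found, 0 otherwise. */
int find_message(const char *catalog, const char *name,
                 char *out, size_t outsz)
{
    size_t namelen = strlen(name);
    const char *line = catalog;

    if (outsz == 0)
        return 0;

    while (*line) {
        const char *end = strchr(line, '\n');
        size_t len = end ? (size_t)(end - line) : strlen(line);

        if (len > namelen && line[namelen] == '\t' &&
            strncmp(line, name, namelen) == 0) {
            size_t textlen = len - namelen - 1;

            if (textlen >= outsz)
                textlen = outsz - 1;
            memcpy(out, line + namelen + 1, textlen);
            out[textlen] = '\0';
            return 1;
        }
        line += len;
        if (*line == '\n')
            line++;
    }
    return 0;
}
```

Only the error name has to be compiled into the program; the catalogue can be loaded on the first error, or never.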

"Errors are the exception, not the norm"
I'm not sure that I agree with you, there. "Error" and "warning" and
"success" to me are just "status" or "state." If you branch based on
the current state, that seems pretty normal, to me.
Exactly ----------------------------------------^^^^^^^^^^^^^^^^^^^

I was using the C terminology, just in case you didn't catch that. By
"translation process," I simply meant the process that we sometimes see
split into preprocessing, compilation, assembly, linking.
Yes. (More follows)

The most important thing is for this to be REALLY painless for the
developer. If he has to write code to manipulate structures he's
just going to take the easy road and have *every* error handled
by a single "error exit".

Macros can help, I think.
He still has to implement the functionality to check for those
error conditions (i.e., he can't allow a string of characters
with two decimal points to be accepted as a valid numeric value!).
But, if he has to do "something extra" to report this as a
*different* "error" than "Err_MissingSign" or "Err_MultipleSigns",
he's more likely to just settle on "Err_BadValue" and lump all
of those "situations" into a single error report:

"There's something wrong with the value you typed in"

[I think most people are lazy when it comes to anything beyond
"getting the code to work"]

I don't know about "most." Maybe... I haven't met "most," so can't
really be sure.
As regarding "uniqueness", yes, all the error codes should exist
in a single namespace. In something like C++ you could easily
support multiple namespaces -- so each "library" could implement
its own "error code-space". You can do similarly in C (with a bit
of work). But, in each case, you would then have to qualify
Err_<whatever> with an indication of the namespace in which it
was defined.

I thought to tackle this with FILE/LINE/func but how does the
*developer* indicate that he wants the "Err_Overflow" that
is defined by "doubleMath.c/27/dadd" and not the one that is
defined by "networkDriver.c/85/enqueuePacket"?

I think one of the biggest challenges here is the two criteria:
- Status codes should be established in the body of the functions that
produce them.
- Status codes need to be available to calling functions.

With C, I don't know of a way without extensions for subjects defined in
a function body to be readily available to another function. If you
want this kind of granularity, every function could have "a namespace."
Every function could proceed thusly:

/* foo function header */
struct s_foo_result { char c; };
typedef const struct s_foo_result * foo_result_t;
typedef const struct s_foo_result const_foo_result_t[1];

extern const_foo_result_t ErrFooTooManyDecimalPoints;
extern const_foo_result_t ErrFooOutOfRange;

foo_result_t foo(const char * num_str);

Then the .c file that implements 'foo' can #include this header and the
.c files that call 'foo' can use this header. The callers can check for
'result == ErrFooTooManyDecimalPoints' and 'foo' itself can return
'ErrFooTooManyDecimalPoints'. While this satisfies the second criterion,
it doesn't satisfy the first; the error codes are declared outside of
the function body.
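To make the header above concrete, here is a sketch of what the implementing .c file might contain, with the header contents repeated so the fragment stands alone. Returning NULL for success and the 0 to 100 limits are assumptions made for the example, not something stated in this post.

```c
#include <stddef.h>
#include <stdlib.h>

/* Contents of the foo function header, repeated here */
struct s_foo_result { char c; };
typedef const struct s_foo_result * foo_result_t;
typedef const struct s_foo_result const_foo_result_t[1];

/* Each status is a distinct object; only its address matters. */
const_foo_result_t ErrFooTooManyDecimalPoints = { { 0 } };
const_foo_result_t ErrFooOutOfRange = { { 0 } };

/* Returns NULL on success (an assumption for this sketch). */
foo_result_t foo(const char * num_str)
{
    int points = 0;
    const char * p;
    double value;

    for (p = num_str; *p; p++)
        if (*p == '.')
            points++;
    if (points > 1)
        return ErrFooTooManyDecimalPoints;

    value = strtod(num_str, NULL);
    if (value < 0.0 || value > 100.0)   /* hypothetical limits */
        return ErrFooOutOfRange;

    return NULL;
}
```

The comparison in the caller is a plain pointer comparison, so it costs nothing at run time, but every new status means touching the header.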

If you really want to satisfy both criteria, you could use something
similar to the 'strcmp' in the code example up above.
- No header-changes for new error codes.
- Error codes are produced in the functions... That produce them.

This trades some performance, of course, but _something_ has to "give"
sooner or later, doesn't it?
It was easier to treat file/line/func as *data* that tags along
with the error code instead of making it *part* of the error code.

I'm sure a 'MakeStatus' macro could use just the 'name' member ("error
code") and "send" the other detail ("additional detail") to other linker
sections.
Stepping ahead to answer your question a few (of your) lines
further along:

In a single namespace, you run the risk of two (independent)
developers picking the same name for an error code. Just
like they could pick the same name for a function, global,
etc.

<someone> sets policy to ensure this doesn't happen. (e.g.,
all my names begin with "don_").

What I want from this error scheme (the question you pose below)
is for the mechanism to tell me if two people have chosen the
same name. I.e., if there are two error codes that I can't
distinguish *solely* based on their names. (since I want to
use *just* the name as the index into the "list of explanations",
etc.).

If you build a table of #defines (or equivalent), then the
compiler can tell you if you have redefined a symbol. But,
you may have split that table into smaller tables so that not
all of the error code "symbols" are in one place! You might
choose to do this to better manage the namespace (so you
can develop separate subsystems/libraries without having to
coordinate your choice of error names *in* a file). Or, as
an efficiency hack (so all files that refer to ANY error code
don't have to parse the entire list of error codes). Or, to
reduce the number of make dependencies (foo.c only depends
on matherrorcodes.h instead of allerrorcodes.h). Etc.

If you've split them, you need a special step to reconcile
them against each other to ferret out potential duplicates.
If you forget to do this, you run the risk of ambiguity in
an error report!
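The reconciliation step could be as simple as sorting the collected names and looking for neighbours that match. In this sketch the names are assumed to have been gathered already from the separate tables; how they are gathered, and the function name count_duplicate_names, are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* qsort comparator for an array of C strings. */
static int by_name(const void *a, const void *b)
{
    return strcmp(*(const char *const *)a, *(const char *const *)b);
}

/* Sort the names in place and return how many entries repeat
   a name that has already appeared. */
size_t count_duplicate_names(const char **names, size_t n)
{
    size_t i, dups = 0;

    qsort(names, n, sizeof *names, by_name);
    for (i = 1; i < n; i++)
        if (strcmp(names[i - 1], names[i]) == 0)
            dups++;
    return dups;
}
```

A build could fail whenever the count is non-zero, which turns the "special step" into something nobody can forget to run.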

Perhaps a convention of prefixing the status code with the name of the
function that returns that status would do. In fact, given:

struct s_status {
    struct s_status (* func)();
    const char * name;
  };

a function could populate such a 'struct' with a pointer to itself along
with the short status code (name). The result of a function call could
be assigned to a 'struct s_status result;' and a comparison function
could compare two such structures as equal if their 'func' member pointed
to the same function (namespace) and 'strcmp' revealed them to have the
same code (name). No header changes here, when handling a new state.
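A sketch of that comparison, assuming every status-returning function takes a single 'const char *' (the 'struct' above leaves the parameter list unspecified). The functions parse_a and parse_b and the helper SameStatus are hypothetical names.

```c
#include <string.h>

struct s_status {
    struct s_status (* func)(const char *);
    const char * name;
};

/* Two statuses are equal if the same function produced them
   and they carry the same short name. */
int SameStatus(struct s_status a, struct s_status b)
{
    return a.func == b.func && strcmp(a.name, b.name) == 0;
}

/* Each function tags its result with a pointer to itself. */
struct s_status parse_a(const char * s)
{
    struct s_status st = { parse_a, "Ok" };

    if (strchr(s, '.') != strrchr(s, '.'))   /* two or more points */
        st.name = "ErrTooManyDecimalPoints";
    return st;
}

struct s_status parse_b(const char * s)
{
    struct s_status st = { parse_b, "Ok" };

    if (strchr(s, '.') != strrchr(s, '.'))
        st.name = "ErrTooManyDecimalPoints";
    return st;
}
```

Both functions use the same name for their error, yet a caller can still tell them apart because the function pointer acts as the namespace.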
Building/populating data structures with explicit values
that can then be reported, etc. like your:
memset(status, 0, sizeof *status);
SetStatusMessage(status, "Bar failed while blargling");
SetStatusSourcePoint(status, __FILE__, __LINE__);
etc. What happens if the developer forgets to take some
or all of these steps? (Or, if he's just lazy!)

That's something that macros can be good at helping with, I think.
I don't follow here, either. Are you saying that, in this example, the
'Err_TooManyDecimalPoints' in this function would not collide with
another function using 'Err_TooManyDecimalPoints' in a similar fashion?

If so, what would a caller use to detect that error?

These were the questions I anticipated above.
He concentrates on writing the code to implement the functionality
desired (as specified in the contract), detecting the exceptions
that are encountered (also in the contract) and DESCRIBING those
exceptions -- *here* (where their nature is most apparent) without
requiring those descriptions to *remain* "here" (in the deployed
code).

Aha! This seems to answer one of the questions I had, above. You wish
for the additional detail associated with the exceptional condition to
be [possibly] stored _outside_ of the final resulting program. This
really seems like you could benefit from some extensions that allow you
to specify sections for things. Then a linker script can omit those
sections from the final resulting program and build something else from
those sections, used for error lookup.

Yes. As I said, I tried that in one incarnation of MakeErrorReport.
I have moved to the point where I now have different tools to
extract different bits of information from the sources. Much like
LP's tangle and weave

(see http://en.wikipedia.org/wiki/Literate_programming)

Since the error *messages* don't usually need to be updated
(i.e., available in the run-time) during debugging, this lets
me skip a fair bit of processing while I am concentrating on
"getting the code right".

Ok.
Am I following your meaning, here? Are you doing this already?


Do you mean that the programmer uses their convenient "default" language
with 'MakeErrorCode', but the entire blob of additional error code
detail can be swapped with one for another language, etc.?

MakeErrorCode will (eventually... I only speak one language -- and
poorly at that! :> ) allow a developer to add "tags" to each
message. I.e., "This is the en_US version of the short error
message for this error code." "This is the pt_BR version of
the long error message for that same error code."
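One way the language tags might be consumed once the messages have been extracted. The table layout, the fallback to en_US and the sample texts are all assumptions; MakeErrorCode's real tagging syntax is not shown in this thread.

```c
#include <stddef.h>
#include <string.h>

struct tagged_message {
    const char *code;     /* error code name */
    const char *locale;   /* e.g. "en_US", "pt_BR" */
    const char *text;     /* short message in that locale */
};

/* Hypothetical extracted table; the texts are illustrative only. */
static const struct tagged_message messages[] = {
    { "Err_OutOfMemory", "en_US", "Out of memory" },
    { "Err_OutOfMemory", "pt_BR", "Memória insuficiente" },
    { "Err_OutOfRange",  "en_US", "Value out of range" },
};

/* Return the message for code in locale, falling back to en_US,
   or NULL if the code is unknown. */
const char *message_for(const char *code, const char *locale)
{
    const char *fallback = NULL;
    size_t i;

    for (i = 0; i < sizeof messages / sizeof messages[0]; i++) {
        if (strcmp(messages[i].code, code) != 0)
            continue;
        if (strcmp(messages[i].locale, locale) == 0)
            return messages[i].text;
        if (strcmp(messages[i].locale, "en_US") == 0)
            fallback = messages[i].text;
    }
    return fallback;
}
```

Restricting a build to one market would then just mean extracting only the rows with the wanted tags.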

In the newly-posted code example offered way above, the 'MakeStatus'
macro accepts '.detail = "more detail"' and '.long_detail = "even more
detail"' arguments. I suppose you could add multiple languages, seeing
as how the designators are optional!

Again, if you use extensions (which I think you say you already have, at
one point) to put things in particular linker sections, you can
hopefully keep them out of the program.
In theory, you could do this "elsewhere" -- i.e., have someone
browse the "master error message table" and add columns for
each supported language, doing the translation *there*. But,
you would then probably want a mechanism for "back porting"
that information to the source files (so that the source files
become the authoritative reference for ALL versions of the
messages). You'd hate to have to rehire that same translator
*next* project to tell you, AGAIN, what the Spanish phrase for
"out of memory" is...

I can understand your wish to keep the meta-data close to the associated
code-points. Some people just want to see code. Some people want
meta-data to tag along with code.
[Note that you could always configure the "translation process(or)"
(your term) to only extract en_US messages (for a US market) or
some *combination* for an international market.]

My earlier use of "translation" clarified above, but I get your point.
[Grrrr... I was just distracted by a friend's (banal) question
which caused me to forget what I was going to write! :< ]

As I said, I already *have* all this. Now, I am trying to have it
EFFICIENTLY! :>

Do you mean, you want a rebuild process to rebuild only those things
whose detail/dependencies have changed, and not the entire code-base?

Yes! I'm using this in *really* big projects and can't afford to
wait a day to "make world" each time some new error code is created,
deleted, changed, etc.

Again, people are lazy (I am a people! :> ) If you waste a day
because of some mechanism, you will quickly learn NOT to use it!

It'd seem that a string comparison strategy might work for this. Adding
a string literal only changes the code where it's added. A macro or
identifier for some kind of constant that's available globally would
seem to need to have file scope, and could require more retranslation
than the former.
But that (speaking from ignorance, here) handled the problem by
partitioning the error code. I.e., math routines could use
errors 0x2340 thru 0x234F; disk I/O routines could use 0x2350 - 0x2357,
etc. (?)
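
A rough sketch of what such partitioning might look like, using only the two ranges quoted above. The enumerator names and `err_subsystem` are hypothetical; nothing here is taken from iPXE's actual sources.

```c
/* Hypothetical subsystem ranges, using the numbers from the post. */
enum {
    ERR_MATH_FIRST = 0x2340, ERR_MATH_LAST = 0x234F,
    ERR_DISK_FIRST = 0x2350, ERR_DISK_LAST = 0x2357
};

/* Map an error code to the subsystem whose range contains it. */
const char *err_subsystem(int code)
{
    if (code >= ERR_MATH_FIRST && code <= ERR_MATH_LAST)
        return "math";
    if (code >= ERR_DISK_FIRST && code <= ERR_DISK_LAST)
        return "disk I/O";
    return "unknown";
}
```

The trade-off is that someone has to own the range assignments, which is itself a shared header that changes when a subsystem is added.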

I wasn't referring to the error codes, actually. I was referring to
iPXE's use of extensions to accomplish magic: Weak symbols, specifying
linker sections, defining elements of an array from separate translation
units, etc.
So, an arbitrary source file could #include (or manually synthesize!)
a particular "SUBSYSTEMerrorcode.h" without concern for how or where
the errors were being signaled in the modules that made up that
SUBSYSTEM.

Ok.


Understood. But, that would intimidate the developer. He'd quickly
rationalize why he "doesn't need to report error details". If,
instead, all he has to do is *differentiate* individual error
"instances" and give a description of what they signify/indicate,
it gets hard for him to justify NOT doing this. Especially when
it is easy for all of his peers to do so. Reliably!

Macros that hide the complications of implementation "intimidate the
developer?"
"Gee, Bob, how come those aspects of the product/program that
you coded all give 'error 27' messages and nothing more? Alice's
part gives all sorts of detail explaining the exact nature of
each DIFFERENT problem! People are having a hard time using
*your* features..."


I'm saying that the above is the bare minimum that a developer
implementing a "parse string for numeric value" function would
have to code (well, he could technically treat ALL errors as
BAD_VALUE -- which might be a *tiny* bit less complex).

Then, I'm saying "compare this BARE MINIMUM to what I am proposing".
What I am *imposing* on the developer really doesn't look like a
whole helluvalot!
"You walked all the way to the grocery store and you
couldn't bring back a LOTTERY TICKET for me???"

The less I impose, the harder it is for folks to rationalize
not complying.

Sure.

I.e., if "Alice's" (above) parts of the project are slicker'n snot
BUT TOOK HER MONTHS LONGER THAN NECESSARY to add that extra
reporting functionality, then Bob can point this out as yet
another justification for NOT doing things that way! That
"lottery ticket" doesn't really take up much space in your pocket
as you walk back from the grocery store! :>

I'm not sure how macros to make the tricky stuff opaque and simple are
leading to this kind of response. There are probably a few ways to
accomplish your goal with different trade-offs, but certainly one of the
easier parts could be using macros to make it comfortable for programmers.
[your code elided as I fear I am already reaching the post-length
limit :< ]
Would it be difficult to wrap this up with a macro and call that macro
'MakeErrorCode'? Or is that what you're doing?

MakeErrorCode just shows the "translation processor" where the
information regarding the error code resides in the source file.
(I.e., it makes my life -- in writing that processor -- easier).
It doesn't generate *any* code (in the examples I've shown so
far). All it does (for the source file) is ensure that
"Err_Whatever" can be resolved by the compiler.

Aha. Well if you have a tool that can work with the source code and
doesn't require C at all, then you can probably accomplish anything you
like, including tracking dependencies and adding logic to a 'make'
process to control what needs rebuilding after changes are made.
[...other stuff...]
So taking the example of 'Err_TooManyDecimalPlaces':

#1 Before that is ever used in any code anywhere, you build your code.
Then one day you come along and realize it's an exceptional condition
that you'd like to be able to document and catch. So are you saying
you'd like to be able to make a minimal change to 'whatever.c' where you
introduce the exception and associate additional detail (documentation)
with it, and not have to recompile any of the other translation units
(roughly: .c files) but simply be able to re-link and have the
"additional detail database" updated accordingly?

Correct. Before that "one day", you can grep the sources and
never turn up a reference to Err_TooManyDecimalPlaces (I should
have picked a shorter name! :< ).

On that "one day", you add a MakeErrorCode(Err_TooManyDecimalPlaces,...)
to whatever.c. Of course, whatever.c has now changed so *it* has to
be recompiled (presumably, you also referred to Err_TooManyDecimalPlaces
somewhere inside whatever.c -- in addition to "defining" it).

But, since no one else references Err_TooMany..., no one else should
NEED to be recompiled!

Yet, any functions that call the function in whatever.c that can
*return* Err_TooMany... will now recognize that this is STILL an
error code (i.e., whateverFunction() has returned yet another
particular error code!) AND, if asked to expound on the nature
of that error, will be able to lookup the message associated
with Err_TooMany... and provide it to the user.

Ah yes.
Correct. caller.c should just be able to refer to Err_TooMany...
without a whole lot of fuss.

Ok.


EXACTLY!!!!! This is the approach I am now trying.

In essence, the Err_ are treated as extern's. So, the compiler
doesn't care that they aren't (yet) defined. They just have
to be resolvable at linkage time.

Instead, the linkage editor resolves and binds them to their actual
values (I have to fabricate a dummy errorcodes.c that instantiates
all of the Err_'s as "addresses").

But, it (appears to?) satisfy those goals of minimizing recompiles,
etc. Though I still have to "process" the source file to determine
which Err_'s are defined within (or, force the developer to explicitly
declare them in global scope). Duplicate names are detected by
the linker, etc. And, I can get a cross reference map showing what
is referenced, where.
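
A minimal sketch of the "error codes as link-time addresses" idea, under stated assumptions: `result_t` is taken to be a pointer to a one-byte marker object, `SUCCESS` is taken to be a null pointer, and `parse_check`, `explain` and the `messages` table are hypothetical names. In a real build the three commented parts would live in separate files; they are shown in one file here only so the sketch compiles on its own. The in-program table is a stand-in for wherever the extracted messages actually end up.

```c
#include <stddef.h>

/* Assumption: a result is just the address of a one-byte marker object. */
typedef const char *result_t;
#define SUCCESS ((result_t)NULL)

/* What callers see: declarations only. Adding a code elsewhere
   does not change anything a caller has already compiled against. */
extern const char Err_TooManyDecimalPoints;
extern const char Err_MissingSign;

/* whatever.c (hypothetical): requires a leading sign and at most one '.'. */
result_t parse_check(const char *s)
{
    int dots = 0;

    if (*s != '+' && *s != '-')
        return &Err_MissingSign;
    for (s++; *s != '\0'; s++)
        if (*s == '.' && ++dots > 1)
            return &Err_TooManyDecimalPoints;
    return SUCCESS;
}

/* errorcodes.c (hypothetical, generated): one definition per code.
   Duplicate names would show up here as multiple-definition errors. */
const char Err_TooManyDecimalPoints = 0;
const char Err_MissingSign = 0;

/* Stand-in for the separately generated message table. */
static const struct { result_t code; const char *text; } messages[] = {
    { &Err_TooManyDecimalPoints, "too many decimal points" },
    { &Err_MissingSign,          "missing sign" },
};

/* Look up the text for a result; unknown addresses get a generic reply. */
const char *explain(result_t r)
{
    for (size_t i = 0; i < sizeof messages / sizeof messages[0]; i++)
        if (messages[i].code == r)
            return messages[i].text;
    return "unknown status";
}
```

A caller only ever compares against `&Err_...` or passes the result to `explain`, so adding a new code touches the file that raises it and the generated definitions, nothing else.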

Unfortunately (?) it means sizeof(result_t) is now the same as
that of a void *. I guess I could artificially map them anywhere
in the address space and play pointer math games to ensure
SUCCESS, warn and error are easily derived from the resulting
values. That way I could shrink them to sizeof(short int), for
example...

'sizeof (result_t)' is not guaranteed to be 'sizeof (void *)', that I
know of. If that's what you get, that hardly seems expensive, given
your objectives. Sometimes 'sizeof (int) == sizeof (void *)', and 'int'
is pretty common as a return type.
I'm still chewing on this approach (busy coding something else
at the moment) but I can't see any insurmountable problems so
far...

Best of luck to you.
 
Don Y

Hi Shao,

On 3/7/2012 11:41 PM, Shao Miller wrote:

[much elided, throughout]
I can think of a couple of tricks for "forcing" a programmer to adhere
to certain standards of returning status to a caller that might work:
1. A rule that _all_ functions must return the same type. If that type
2. A rule that _no_ functions have a non-'void' return type. If the

Yes, but there is no way to force the developer to use the "macros"
in the right place. I had thought of letting MakeErrorCode actually
*return* the error code (i.e., in addition to side-effects, having
it generate "return error;"). But, that means you can't synthesize
an error code someplace where it is MEANINGFUL and return it
from someplace *else* (e.g., after some cleanup/housekeeping).
True. Beyond encouraging the use of some kind of interface for returning
status and making it easy, I don't know what else you can do, short of
actual discipline.
---^^^^^^^^^^^^^^
[I've not come up with a way to force developers to "do the right
thing"]

Stand over their shoulder, perhaps.

<frown> I like to walk away from projects once they've been shown
to work (or *be* workable). I don't want to spend my time holding
peoples' hands or arguing with them about why/when/where they should
do something.

I think showing a *pattern* is the biggest inducement to getting
folks to "do as you've done". People are lazy and easily intimidated
by big/complex pieces of code. They don't *want* to understand
what's happening. They want to be able to figure out -- with some
degree of certainty -- how to make the changes that they need
without making a career (or a FIASCO!) out of the task.

If you've ever had to interview job applicants, you've seen the
"deer-in-the-headlights" look that results from offering them a
BLANK sheet of paper and asking for them to code some <whatever>.
E.g., "Write a program to print the known elements".

Most folks have a really hard time figuring out where to start.
They often don't even know what *questions* to ask to refine
the goal -- until after they've started writing something!

Should the elements be listed in order of atomic number? Or
alphabetically by name? Or alphabetically by symbol? Or some other criteria?

OTOH, give them a bunch of tuples defining their numbers,
masses, etc. and you can see how this subtly influences how
they *interpret* the stated goal. E.g.,

{"Hydrogen", "H", 1},
{"Helium", "He", 2},
....

would *probably* see a solution that mimics the order in the
periodic table. OTOH:

{"Actinium", "Ac", 89},
{"Silver", "Ag", 47},
....

would probably show a solution that presents the elements listed
by symbol.

And, regardless of the actual implementation, a subsequent request
to ADD A NEW ELEMENT or reverse the order of presentation, etc.
is much easier (and less intimidating!) for them to address.
Wait, why was that the "same problem only different"? A linker script
should be able to keep things separate. The executable could omit the
extra detail and the linker could produce some object file for all the
extra detail... No?

Sorry, I meant that putting the text in ONE "program segment" was
no different than putting it in some *other* program segment. I.e.,
you're eating up address space *somewhere* for something (messages)
that really don't *need* to reside in ANY address space (at least
not permanently).
I'm not sure that I agree with you, there. "Error" and "warning" and
"success" to me are just "status" or "state." If you branch based on the
current state, that seems pretty normal, to me.

Understood. But, this mechanism is intended to address "unexpected
circumstances" (I.e., "What did I do wrong?") for the user. It's
different than, for example, using one factoring scheme for small
numbers vs. another for large numbers. Instead, it addresses the
case of "That number can't be factored (because it is too large
for me; or because it is not an integer; or because it is irrational;
etc.)".

My point being that there is no need to make those messages
available "QUICKLY" (i.e., having them resident in the
program image) since they aren't expected to be needed,
normally.
He still has to implement the functionality to check for those
error conditions (i.e., he can't allow a string of characters
with two decimal points to be accepted as a valid numeric value!).
But, if he has to do "something extra" to report this as a
*different* "error" than "Err_MissingSign" or "Err_MultipleSigns",
he's more likely to just settle on "Err_BadValue" and lump all
of those "situations" into a single error report:

"There's something wrong with the value you typed in"

[I think most people are lazy when it comes to anything beyond
"getting the code to work"]

I don't know about "most." Maybe... I haven't met "most," so can't
really be sure.

<grin> As "proof" I offer the FOSS products that you (and I)
can freely inspect.

For years, I listened to programmers complaining that the
reason their code wasn't {documented, well structured,
formally specified, thoroughly tested, etc.} was because
their *boss* didn't give them the time to do it. Pressures
of the workplace.

OK, let's accept that, for the time being.

But, why, when you have NO such workplace pressures (as is
the case for a FOSS project) do you STILL not document,
structure, test, specify, etc. YOUR CODE?? Which PHB is
standing over your shoulder *then*? :>

There is a greater sense of reward getting a piece of code
to "work" than actually finishing up all the other "stuff"
that goes along with the task.

It's far more fun to fell a tree -- than to clean up the
yard full of branches and leaves that are left behind
afterwards.

It's far more enjoyable to EAT a meal than to wash the
dishes and utensils that were used in its preparation.

By contrast, it's a lot more TEDIOUS to sit down and
*formally* specify how something will work; to enumerate
a litany of test cases and expected results that verify
the boundaries of a specification (and implementation!);
etc.

At least that's what I've concluded from *my* observations.
YMMV.
I think one of the biggest challenges here is the two criteria:
- Status codes should be established in the body of the functions that
produce them.

To be pedantic, the *code*/message don't need to be established
in the body of the function. Rather, some means of tying the
code *to* a special point *in* the function (from which a
developer could determine EXACTLY where the "condition" was
detected) is necessary.

My example tried to bundle everything into one package
(MakeErrorCode) -- again, in the belief that it would be
more likely for developers to do *one* thing than to
have to do *several*.
- Status codes need to be available to calling functions.

With C, I don't know of a way without extensions for subjects defined in
a function body to be readily available to another function. If you want
this kind of granularity, every function could have "a namespace." Every
function could proceed thusly:

/* foo function header */
struct s_foo_result { char c; };
typedef const struct s_foo_result * foo_result_t;
typedef const struct s_foo_result const_foo_result_t[1];

extern const_foo_result_t ErrFooTooManyDecimalPoints;
extern const_foo_result_t ErrFooOutOfRange;

foo_result_t foo(const char * num_str);

Then the .c file that implements 'foo' can #include this header and the
.c files that call 'foo' can use this header. The callers can check for
'result == ErrFooTooManyDecimalPoints' and 'foo' itself can return
'ErrFooTooManyDecimalPoints'. While this satisfies the second criterion,
it doesn't satisfy the first; the error codes are declared outside of
the function body.
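
To make the header pattern concrete, here is a sketch of a matching implementation. The declarations repeat the header from the post; the definitions, the body of 'foo', the nine-digit limit and the use of NULL for success are all assumptions added for illustration.

```c
#include <stddef.h>

/* foo function header (declarations as in the post) */
struct s_foo_result { char c; };
typedef const struct s_foo_result * foo_result_t;
typedef const struct s_foo_result const_foo_result_t[1];

extern const_foo_result_t ErrFooTooManyDecimalPoints;
extern const_foo_result_t ErrFooOutOfRange;

foo_result_t foo(const char * num_str);

/* foo.c (hypothetical): each code is a distinct one-element array,
   so its name decays to a unique pointer value. */
const_foo_result_t ErrFooTooManyDecimalPoints = { { 0 } };
const_foo_result_t ErrFooOutOfRange = { { 0 } };

foo_result_t foo(const char * num_str)
{
    size_t dots = 0, others = 0;

    for (; *num_str != '\0'; num_str++) {
        if (*num_str == '.')
            dots++;
        else
            others++;
    }
    if (dots > 1)
        return ErrFooTooManyDecimalPoints;
    if (others > 9)              /* arbitrary limit for the sketch */
        return ErrFooOutOfRange;
    return NULL;                 /* assumption: NULL means success */
}
```

A caller then writes `if (foo(text) == ErrFooOutOfRange) ...` with no integer values involved at all.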

If you really want to satisfy both criteria, you could use something
similar to the 'strcmp' in the code example up above.
- No header-changes for new error codes.
- Error codes are produced in the functions... That produce them.

This trades some performance, of course, but _something_ has to "give"
sooner or later, doesn't it?
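
A sketch of the string-comparison trade-off, assuming a status is simply a pointer to a string literal naming it. This is not the earlier 'strcmp' example from the thread, which is not reproduced here; `status_t`, `parse_sign` and `status_is` are hypothetical names.

```c
#include <string.h>

/* Assumption: a status is the name of the condition, as a string. */
typedef const char *status_t;

/* The codes are "created" right where they are detected;
   no header has to change when a new one is introduced. */
status_t parse_sign(const char *s)
{
    if (*s != '+' && *s != '-')
        return "Err_MissingSign";
    if (s[1] == '+' || s[1] == '-')
        return "Err_MultipleSigns";
    return "Success";
}

/* Callers compare by content, not by address, which costs a strcmp. */
int status_is(status_t got, const char *name)
{
    return strcmp(got, name) == 0;
}
```

The cost is a string compare per check, and a misspelled name in a caller is only caught at run time rather than by the compiler or linker.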

Note my current approach of handling this in the linkage editor
seems to give me this without adding other constraints (though
it forces error codes into the global namespace, unfortunately).

In C++, you could create a separate namespace for each function
and publish those errors (header files?) so different functions
could reuse names without conflict. But, that's just another
level of complexity that I don't think a C implementation
necessarily needs...
Perhaps a convention of prefixing the status code with the name of the
function that returns that status would do. In fact, given:

Sure, you're just imposing a *naming* discipline.
In the newly-posted code example offered way above, the 'MakeStatus'
macro accepts '.detail = "more detail"' and '.long_detail = "even more
detail"' arguments. I suppose you could add multiple languages, seeing
as how the designators are optional!

Yes. As I was going to use the MakeErrorCode as an easy to
identify *token* in the source file (for a separate tool
to extract), I could embellish it with "tags" for each
particular locale and the external "processor" could then
decide how to handle those different translations.

E.g., if you want to build "with Spanish language support",
that processor would extract the "es" tags, etc.
I can understand your wish to keep the meta-data close to the associated
code-points. Some people just want to see code. Some people want
meta-data to tag along with code.

I think I will ultimately end up moving more towards an LP-style
implementation. There is just *way* too much information that
has to be (or, *should* be, IMO) "close" to the code. It will
get difficult to sort through the documentation/error mechanisms
to figure out what the code is actually *doing* :<
Macros that hide the complications of implementation "intimidate the
developer?"

I try to NOT intimidate the developer by just adding one
concept to the file's syntax: MakeErrorCode. The developer
doesn't have to do anything other than add instances of
this *and* the associated messages. All the rest of the
mechanism is hidden from him -- no "other" macros that he
has to remember to invoke.

E.g., I rewrite the source file (after parsing it) so he
doesn't even have to remember declaring Err_TooManyDecimalPoints
as "extern" -- despite the fact that the "macro" that causes
that symbol to have meaning is invoked deep inside some code
fragment.
I'm not sure how macros to make the tricky stuff opaque and simple are
leading to this kind of response. There are probably a few ways to
accomplish your goal with different trade-offs, but certainly one of the
easier parts could be using macros to make it comfortable for programmers.

Yes. But the more macros the developer has to invoke -- and the
more constraints on how/where they are invoked -- the more tedious
the issue becomes.

E.g., when I build finite automata in my code, I have macros
that let me define individual "state transition tables" using
a nice, reader friendly syntax.

But, the developer still has to combine those tables and
inter-reference them with other structures that he needs to
create. E.g., one is a list of pointers to all "state transition
tables". He (me) has to build this manually -- each time he
adds a new state transition table, he has to remember to put an
entry for that in the "list of pointers".

There's no easy way of doing this with macros because the
effects have to apply in different parts of the file. (I
could do it by writing a tool that parses the file and
creates the other entries *prior* to compiling the file).

So, this is a brittle implementation. If I get lazy, I
get bit.
Aha. Well if you have a tool that can work with the source code and
doesn't require C at all, then you can probably accomplish anything you
like, including tracking dependencies and adding logic to a 'make'
process to control what needs rebuilding after changes are made.

The linkage editor approach, I think, will give me the bare minimum
impact on the build process. It moves everything to the global
level and lets the linker deal with resolving things. You
don't have to recompile every module that uses printf(3c) just
because you changed the implementation of printf!
'sizeof (result_t)' is not guaranteed to be 'sizeof (void *)', that I

Of course, that depends on how I typedef result_t! :>
know of. If that's what you get, that hardly seems expensive, given your

It means that I now have something on the order of a 32b "key"
that I use to "lookup" messages (instead of, perhaps, a short).
objectives. Sometimes 'sizeof (int) == sizeof (void *)', and 'int' is
pretty common as a return type.


Best of luck to you.

<shrug> Yet another learning experience! ;-)
 
