Real-time developers/designers: Can abort() be used to fail-fast in a safety-critical system?


Marc

Here is an opportunity to shine. I only seek answers from very
experienced real-time safety-critical system designers and implementors.

Can you convince me that abort() can be used to fail-fast in a
safety-critical system?

If you say "it depends", explain, but don't stay in theory land or "it's
a team process"-land, as only true usage/end-product counts this time.
Any and all real examples that you have implemented in safety-critical
systems are fair game. You did the ejection seat system design and coding
for the F15? Great! YOU are the one I would like an answer from and such
others. The more responses, the better, as long as they are from a top gun
in the field.

Can you provide an actual example that you implemented and were
responsible for? Long-term full-time real-time developers of
safety-critical systems at the level of designer/architect of entire
systems or major safety-critical subsystems as well as being the
low-level implementor of many such things for many years would help
weight your answer. Please don't answer if you have just read about it or
are theorizing and have not many years of guru-level experience designing
and implementing safety-critical real-time systems or if you simply
worked on such a project without being the technical and responsible
lead. Full-time and many years of real-time safety-critical
implementation experience only please. Don't be one of those who has 20
years of experience but repeated year one 20 times. I know that it is
rare when experience counts, but this time it does. <wink>. This is not a
job interview or screening.

In helping you answer this question to my satisfaction, expansion of
instruction-level code and an actual use case would be "a picture that
says a thousand words", but don't let that prevent your own approach. The
use case is so important and C or C++ are both fine.

(I realize I should have asked this in another forum, but since I started
it here in another thread, I will try and finish it here too if
possible.)
 
Seebs

Here is an opportunity to shine. I only seek answers from very
experienced real-time safety-critical system designers and implementors.

Ah, well, my answer will be of no interest to you then, but maybe someone
else will care.
Can you convince me that abort() can be used to fail-fast in a
safety-critical system?

I can't. And, for that matter, I'd argue that this isn't just because
you're very demanding in qualifications, but because it Just Ain't So.
Please don't answer if you have just read about it or
are theorizing and have not many years of guru-level experience designing
and implementing safety-critical real-time systems or if you simply
worked on such a project without being the technical and responsible
lead.

I thought about this request, and decided to refer you to Arkell v.
Pressdram.

I'm not coming to this from the position of a mythical guru in
safety-critical systems, whose twenty years of experience could be
largely outdated now, but from the position of someone who knows a
decent bit about C and C implementations.

I suppose someone could in theory develop a C implementation in which
abort() could be a reasonable choice for such a thing, but it wouldn't
be something they'd be expected to do for standards conformance, and
it wouldn't be a likely implementation choice for most systems.

-s
 
robertwessel2

Here is an opportunity to shine. I only seek answers from very
experienced real-time safety-critical system designers and implementors.

Can you convince me that abort() can be used to fail-fast in a
safety-critical system?

If you say "it depends", explain, but don't stay in theory land or "it's
a team process"-land, as only true usage/end-product counts this time.
Any and all real examples that you have implemented in safety-critical
systems are fair game. You did the ejection seat system design and coding
for the F15? Great! YOU are the one I would like an answer from and such
others. The more responses, the better, as long as they are from a top gun
in the field.

Can you provide an actual example that you implemented and were
responsible for? Long-term full-time real-time developers of
safety-critical systems at the level of designer/architect of entire
systems or major safety-critical subsystems as well as being the
low-level implementor of many such things for many years would help
weight your answer. Please don't answer if you have just read about it or
are theorizing and have not many years of guru-level experience designing
and implementing safety-critical real-time systems or if you simply
worked on such a project without being the technical and responsible
lead. Full-time and many years of real-time safety-critical
implementation experience only please. Don't be one of those who has 20
years of experience but repeated year one 20 times. I know that it is
rare when experience counts, but this time it does. <wink>. This is not a
job interview or screening.

In helping you answer this question to my satisfaction, expansion of
instruction-level code and an actual use case would be "a picture that
says a thousand words", but don't let that prevent your own approach. The
use case is so important and C or C++ are both fine.

(I realize I should have asked this in another forum, but since I started
it here in another thread, I will try and finish it here too if
possible.)


Well, if you're using MISRA, rule 126 specifically prohibits the use
of abort().
 
Goran Pusic

Here is an opportunity to shine. I only seek answers from very
experienced real-time safety-critical system designers and implementors.

I am not that person, so no answer from me to you.
Can you convince me that abort() can be used to fail-fast in a
safety-critical system?

IMO, this is a mighty vague question and a "guru" that does give an
answer to it, bloody isn't.

What does "fail-fast" mean? What is the system in question? What about
hooking on SIGABRT? What speed do you need? What speed can you achieve
on your system with some example uses? What are abort() speed
guarantees __on your implementation__? You're talking about real-time;
which flavor? "hard", where you control perf aspect of every single
artifact; or "soft" which in itself is too vague to answer a question?

Perhaps what abort() is supposed to do is already way too slow on
hardware or implementation you're using. Did you even measure
anything? abort() should close open file streams. How much time does
that take __on your system__ (depends on the number of handles, you
know)? Do you care if they are not closed? Do you have a system where
they stay open after you crash (OS doesn't clean up after a process
crash)? If yes, and you restart the process, you will eventually run
out of resources. Or do you re-boot the system after the crash? If so,
you don't care about those handles and for speed reasons you could
avoid abort.

Frankly, if OP had an idea/opinion/experience about things above __on
his target system__, he would not be asking here.

Methinks this question is more of a clueless shot in the dark than
anything else.

Goran.
 
Chris H

Marc said:
Here is an opportunity to shine. I only seek answers from very
experienced real-time safety-critical system designers and implementors.

Then you are probably in the wrong news group.
Try the York safety group or similar.
Can you convince me that abort() can be used to fail-fast in a
safety-critical system?
No.

If you say "it depends", explain,

Ok... It depends entirely on the specific context in your application.
There are far too many variables to give a generic answer.
Can you provide an actual example that you implemented and were
responsible for?

I doubt anyone would do that in a public space.
 
Chris H

robertwessel2 said:
Well, if you're using MISRA, rule 126 specifically prohibits the use
of abort().

Of course MISRA-C:98 Rule 126 could be deviated if you have grounds to
do it. Read the notes under the rule or better still use the 2004
version of MISRA.
 
James Kanze

Here is an opportunity to shine. I only seek answers from very
experienced real-time safety-critical system designers and implementors.
Can you convince me that abort() can be used to fail-fast in a
safety-critical system?

You've already had the answer, several times. From people (like
myself) with real experience in real-time safety-critical
systems.

[...]
Can you provide an actual example that you implemented and were
responsible for?

Locomotive brake system. We didn't use abort, because it wasn't
present (no underlying OS to return to); we did the equivalent,
however, shutting the system down as rapidly as possible.
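
The post gives no detail of what that shutdown looked like, so the following is only a sketch of what an abort() stand-in could look like on a target with no OS underneath. The register block `io_regs`, its fields, and the function names are all hypothetical and invented for illustration.

```c
#include <stdint.h>

/* Hypothetical memory-mapped I/O block; the layout is invented. */
typedef struct {
    volatile uint32_t irq_enable;    /* 1 = interrupts enabled            */
    volatile uint32_t brake_release; /* 1 = brakes held off, 0 = applied  */
    volatile uint32_t fault_latch;   /* last fault code, for post-mortem  */
} io_regs;

/* Drive the outputs to the safe state. There are no loops and no
 * library calls, so the worst-case execution time is easy to bound. */
void enter_safe_state(io_regs *io, uint32_t fault_code)
{
    io->irq_enable = 0;          /* nothing may preempt the shutdown   */
    io->brake_release = 0;       /* de-energise: brakes apply          */
    io->fault_latch = fault_code;
}

/* The abort() stand-in: reach the safe state, then never return. */
void fail_fast(io_regs *io, uint32_t fault_code)
{
    enter_safe_state(io, fault_code);
    for (;;) {
        /* Spin until an external watchdog or an operator resets the unit. */
    }
}
```

Splitting `enter_safe_state()` from `fail_fast()` is a design choice of this sketch: the part that returns can be exercised on a host against a plain struct, while the non-returning part stays trivial.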

Other more or less critical systems I've worked on (electric
power distribution, and a lot of telephone routing systems)
behaved similarly.
 
Seebs


I was thinking about this, and I've concluded that the answer is
almost certainly "yes". If you read carefully, you will note that his
question is not "Can abort() be reasonably and successfully used
to fail-fast in a safety-critical system without violating requirements
or specifications."

There are two obvious ways to get to a "yes" answer. One is to observe
that the OP never specified that the usage had to be successful, correct,
or acceptable to the client, or not result in people dying. The other
is to observe that the OP is apparently a bit on the careless side and
much impressed by Credentials in and of themselves. Thus, I would argue
both that the answer to the question literally asked is "yes", and that
even if it weren't, it would be easy for someone to convince the OP that
it was.

-s
 
Öö Tiib

I was thinking about this, and I've concluded that the answer is
almost certainly "yes".  If you read carefully, you will note that his
question is not "Can abort() be reasonably and successfully used
to fail-fast in a safety-critical system without violating requirements
or specifications."

The correct answer is "uncertain". The question is about the possibility
of convincing him that the technique can be used to fail fast. Whether
someone can be convinced of something is uncertain unless proven
otherwise.

[...]
much impressed by Credentials in and of themselves.  Thus, I would argue
both that the answer to the question literally asked is "yes", and that
even if it weren't, it would be easy for someone to convince the OP that
it was.

People have told him several times that it can be used and has been
used. He is still not convinced but displays interest in it, so it is
still uncertain. Your opinion suggests (a surprising trait on Usenet)
that you perhaps haven't met enough people who are *hard* to
convince. ;)
 
Seebs

The correct answer is "uncertain". The question is about the possibility
of convincing him that the technique can be used to fail fast. Whether
someone can be convinced of something is uncertain unless proven
otherwise.

I would say that, given a bit of research into psychology, and the fact
that he's asking the question, we can be reasonably confident that *someone*
could convince him.
People have told him several times that it can be used and has been
used. He is still not convinced but displays interest in it, so it is
still uncertain. Your opinion suggests (a surprising trait on Usenet)
that you perhaps haven't met enough people who are *hard* to
convince. ;)

Oh, I've met people who are hard to convince. But after all, when people
ask a yes or no question, they usually want the answer you think most likely,
not only answers which you are sure you can fully prove.

Usually. :)

-s
 
Chris H

Seebs said:
I would say that, given a bit of research into psychology, and the fact
that he's asking the question, we can be reasonably confident that *someone*
could convince him.

It must be a Friday afternoon :))))
 
Adam Skutt

(I realize I should have asked this in another forum, but since I started
it here in another thread, I will try and finish it here too if
possible.)

Starting another topic asking the exact same question in this forum
will not get you the answers you seek, for the reasons I already
outlined to you before. Like it or not, you don't need to directly
converse with an expert to get the answers you seek.

There's an abundance of literature on the subject all over the
Internet. Look it up. Though if you can't understand the basic
precept, "Life-critical covers a huge field of products and what is
acceptable coding entirely depends on which products you're talking
about," one begins to wonder if such a simple task is entirely beyond
you.

Adam
 
Adam Skutt

abort() should close open file streams. How much time does
that take __on your system__ (depends on the number of handles, you
know)?
No, the only thing abort() has to do is never return (in ANSI C).
Anything else is best effort behavior, like most forms of process
termination in C / C++ (including the actual termination of the
process itself, perhaps strangely enough). Several implementations of
UNIX flush stdio streams only. POSIX used to mandate that
implementations effect fclose() on all open streams; this was reduced
to 'may include an attempt to effect fclose()' for ANSI C compatibility.
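
Because the stream handling is best effort, the only reliable way to know what a given C library does is to test it. A minimal sketch for a hosted POSIX system follows; the function name `bytes_surviving_abort` is invented for illustration, and the message is assumed to be shorter than `BUFSIZ` so that it sits entirely in the stdio buffer.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Writes msg to a fully buffered stream in a child process, calls
 * abort() without flushing, and returns how many bytes reached the
 * parent. Returns -1 on a setup error. */
long bytes_surviving_abort(const char *msg)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {                       /* child */
        close(fds[0]);
        FILE *out = fdopen(fds[1], "w");
        if (out == NULL)
            _exit(2);
        setvbuf(out, NULL, _IOFBF, BUFSIZ); /* data stays in the buffer */
        fputs(msg, out);
        abort();                          /* no fflush(), no fclose() */
    }

    close(fds[1]);                        /* parent */
    char buf[256];
    long total = 0;
    ssize_t n;
    while ((n = read(fds[0], buf, sizeof buf)) > 0)
        total += n;
    close(fds[0]);
    waitpid(pid, NULL, 0);
    return total;
}
```

A result of 0 means the library discarded the buffered data on abort(); a result equal to the message length means it flushed. Either outcome is permitted, so code that needs the data on disk has to flush it explicitly before failing.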

I've worked on multiple embedded platforms where abort() was really
just a way to call the processor-specific halt instruction. You
didn't even get SIGABRT, because the platform lacked signal handling.

Adam
 
James Kanze

On Dec 19, 4:11 am, Gordon Burditt wrote:

[...]
If you're controlling the fuel rods in a nuclear reactor, killing
the power to the rod controls (which should let the rods fall back
in by gravity) and shutting down may be sufficient. Shutting down
the coolant pumps, however, is *not* acceptable, if those are also
under control by the same system.

Critical systems are, by definition, systems; the software is
only a part. Continuing operation in case of an error is not an
option; the software might actually generate commands to put the
rods out. And of course, any system in which the failure of one
piece of software would cause the coolant system to fail would
not be acceptable.
 
Michael Doubez

On Dec 19, 4:11 am, Gordon Burditt wrote:

    [...]
If you're controlling the fuel rods in a nuclear reactor, killing
the power to the rod controls (which should let the rods fall back
in by gravity) and shutting down may be sufficient.  Shutting down
the coolant pumps, however, is *not* acceptable, if those are also
under control by the same system.

Critical systems are, by definition, systems; the software is
only a part.  Continuing operation in case of an error is not an
option; the software might actually generate commands to put the
rods out.  And of course, any system in which the failure of one
piece of software would cause the coolant system to fail would
not be acceptable.

This is a question of fail-safe. The OP's question is rather about
fail-fast.

Whether the backup system is mechanical (such as a safe state of the
system) or a continuation of service (duplication, load balancing ...)
or none at all (blue screen) is IMHO outside the scope of this
thread.

I have heard of systems where the buggy system continues to live and
may be reused when its state converges to that of the backup system,
but I don't know how they work (i.e. how they decide the transitional
buggy state has passed).
 
