Signal dispositions

Giorgos Keramidas

I believe you are completely wrong on this point. Very often a SIGSEGV
will be caused by (say) a single bad array access - the consequences
will be highly localized, and carrying on with the program will not
cause any significant problems.

So the program may 'think' it has saved the important dataset from a
medical patient's important test, but the data has disappeared because
it was written ... well, nowhere in particular.

Do you *really* want this program to go on?
 
Logan Shaw

Leet said:
Perhaps you are unaware that some C code is run in safety-critical
environments - having a program that dumps core at the drop of a hat
rather than carrying on running could literally be the difference
between life and death.

If a SIGSEGV can be the difference between life and death, then such
code has *no* *right* to ever *cause* a SIGSEGV, regardless of how the
system is going to respond to the SIGSEGV (ignoring it and letting
the program continue, or aborting it).

There are several solutions that could be proper here:
(1) Keep the code simple enough that you can use mathematics to
prove it correct. This has been done successfully with
some designs. It's not easy, but then we're talking about a
life or death situation here.
(2) Exhaustively test the code. Sometimes this is not possible
due to exponential explosion of test cases, but sometimes
it actually is.
(3) Nearly-exhaustively test the code. Maybe testing every possible
program path isn't possible, but very thorough test coverage
(not just of lines of code, but of "interesting" combinations
of inputs) is possible. That might be acceptable if combined
with other quality efforts.
(4) Use a system where, on a *local* basis, *individual* faults can
be determined to be harmless and the program can proceed.
Notice that this is not the same thing as ignoring SIGSEGV
for the entire program and assuming all invalid memory
accesses are OK. Instead, what I'm talking about is a
system where you can say "if THIS block of code goes
outside the bounds of THAT array, then THAT ONE THING
should not be a fatal error, and here is the routine that
will do the error handling and keep the system in a known
good state".
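
To make option (4) concrete, here is a minimal C sketch. It assumes a hypothetical checked_read() helper and a hypothetical handler type; neither appears in this thread, and it is only one of many ways to confine a fault to one access. The point is that the bounds check and the recovery routine are attached to a single array access, rather than to SIGSEGV for the whole process.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* All names below are hypothetical, chosen for illustration only. */

/* Routine called when ONE specific access goes out of bounds. */
typedef void (*bounds_handler)(const char *what, size_t index, size_t len);

/* Number of faults seen so far by the example handler. */
int fault_count = 0;

/* Example handler: report the fault and count it; no other state changes. */
void log_and_count(const char *what, size_t index, size_t len)
{
    fprintf(stderr, "%s: index %zu out of range (length %zu)\n",
            what, index, len);
    fault_count++;
}

/*
 * Return arr[index] when the index is in range. Otherwise call the
 * handler (if any) and return the caller-supplied fallback value, so
 * the program continues in a state the caller chose deliberately.
 */
int checked_read(const int *arr, size_t len, size_t index,
                 int fallback, bounds_handler on_fault)
{
    assert(arr != NULL);
    if (index >= len) {
        if (on_fault != NULL)
            on_fault("checked_read", index, len);
        return fallback;
    }
    return arr[index];
}
```

A caller would write something like checked_read(samples, 3, 7, -1, log_and_count) and treat the fallback value as "no data". Whether a fallback is acceptable at all is exactly the per-access decision described in (4); for many accesses the right handler would still stop the program.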

Of course, it's silly to be having a discussion about safety-critical
software in comp.unix.programmer. Maybe there's one that I don't know
about, but as far as I know, there isn't a version of Unix that is
meant to be used in an environment like that. In fact, where I've
checked, license agreements often specifically exclude the use of the
software in such an environment. And for good reason: a system that
can get somebody killed needs to use software that's simpler than Unix.

- Logan
 
santosh

Logan Shaw wrote:

<snip>

You might consider dropping c.l.c. from the cross-post and perhaps
replace it with comp.programming and set followups to the same.
 
Golden California Girls

Leet said:
Perhaps you are unaware that some C code is run in safety-critical
environments - having a program that dumps core at the drop of a hat
rather than carrying on running could literally be the difference
between life and death. OK, so *maybe* the error condition causing the
SIGSEGV will propagate and bring the program down later, but taking that
chance is a better option than immediately failing.

~Jon~

Continue after error
http://www.netcomp.monash.edu.au/cpe9001/assets/readings/www_uguelph_ca_~tgallagh_~tgallagh.html

yeah right.
 
CBFalconer

Leet said:
.... snip ...

Perhaps you are unaware that some C code is run in safety-critical
environments - having a program that dumps core at the drop of a
hat rather than carrying on running could literally be the
difference between life and death. OK, so *maybe* the error
condition causing the SIGSEGV will propagate and bring the program
down later, but taking that chance is a better option than
immediately failing.

Apparently you are unaware that such a program has absolutely no
business running in such a 'safety critical environment'.
 
Al Balmer

Send any feedback, ideas, suggestions, test results to

Here's some feedback: Your advertising, release notes, and privacy
policy are inappropriate here, even in a sig block.

Limit your signature to three or four lines, which is plenty of space
to include your URL.
 
Al Balmer

Perhaps you are unaware that some C code is run in safety-critical
environments

That's pretty funny, considering I wrote safety-critical code for the
process control industry for over twenty years. Food, petroleum,
polymers, paper, you name it.

If the coolant control program on a PVC reactor crashes, you don't
ignore it and keep cooking. You kill not only the program, but the
process. Otherwise, you kill people.

- having a program that dumps core at the drop of a hat
rather than carrying on running could literally be the difference
between life and death. OK, so *maybe* the error condition causing the
SIGSEGV will propagate and bring the program down later, but taking that
chance is a better option than immediately failing.

Don't bother applying for a job here. We don't insist that all
new-hires be expert, but we do want them to be trainable.
 
Jim Cochrane

["Followup-To:" header set to comp.unix.programmer.]

Perhaps you are unaware that some C code is run in safety-critical
environments - having a program that dumps core at the drop of a hat
rather than carrying on running could literally be the difference
between life and death. OK, so *maybe* the error condition causing the

Allowing a process with corrupted data to continue running can also
cause death.

SIGSEGV will propagate and bring the program down later, but taking that
chance is a better option than immediately failing.

~Jon~

 
Nick Keighley

On 2 Nov 2007 at 20:34, Al Balmer wrote:


Perhaps you are unaware that some C code is run in safety-critical
environments

OHMYGOD

*please* tell me you don't write safety critical code!
 
Golden California Girls

Nick said:
OHMYGOD

*please* tell me you don't write safety critical code!


I suspect he did, until his boss found out what he was planning and canned his
ass. Now he's looking to prove his boss wrong. That or he's a troll.
 
Charlie Gordon

Al Balmer said:
How on earth would you know what the consequences might be? If the
program in question is calculating my paycheck, I don't want any bad
array access to be ignored.

Someone else might want to check first if the error is worth such a drastic
treatment.

With your suggested behaviour, the paycheck is not printed, and who knows
when the problem will be fixed... If you can wait for your paycheck, you'll
be OK, else too bad.

Alternately, let it print the damn check; there is a good chance the check
will be correct and arrive in time. There is some possibility that the
error is so small as to not be worth reporting. If the error is large, then
you can complain and have it fixed... Or you will not complain and wait for
the bank to figure out where these millions came from ;-)

If you are the payer, you probably want the process to stop. If you are the
payee, it is not so obvious.

What kind of programs do you write? Games?

What's not professional is writing code that causes segfaults.


In production code, those signals should never be generated. If they
are, they should crash, so that the user can complain, and someone can
fix it.

If they are, they should be logged and reported, yet best efforts should be
extended to minimize the impact on the user. Warning the user of potential
malfunction and requesting urgent attention may be more appropriate than a core
dump with no warning and no restart. Use common sense to determine what will
least impact the user. When the oil gauge trips, the dashboard turns a
light on; it does not immediately block the engine, fire the ejector seats
and vaporize the contents of the trunk.
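
For readers wondering what "logged and reported" can look like without letting a damaged process carry on, here is a hedged C sketch. The helper names (fatal_signal_handler, install_fatal_handler, crash_child_term_signal) are hypothetical; only the POSIX calls are real. The handler writes a short message using async-signal-safe calls, restores the default disposition, and re-raises the signal, so the process still dies from SIGSEGV and can still dump core. It does not attempt to continue, because POSIX leaves the behavior undefined when a process ignores a SIGSEGV that was generated by a real memory fault.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper names; only the POSIX calls are real. */

/* Log with async-signal-safe calls only, then die with the default action. */
void fatal_signal_handler(int sig)
{
    static const char msg[] =
        "fatal signal caught; program state is suspect, aborting\n";
    ssize_t ignored = write(STDERR_FILENO, msg, sizeof msg - 1);
    (void)ignored;

    signal(sig, SIG_DFL);  /* restore the default disposition */
    raise(sig);            /* re-deliver: the process terminates by this signal */
}

/* Install the handler for one signal; returns 0 on success, -1 on error. */
int install_fatal_handler(int sig)
{
    struct sigaction sa;

    assert(sig > 0);
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = fatal_signal_handler;
    sigemptyset(&sa.sa_mask);
    return sigaction(sig, &sa, NULL);
}

/*
 * Demonstration: fork a child that installs the handler and raises
 * SIGSEGV. Returns the signal that terminated the child, 0 if the
 * child exited normally, or -1 on error.
 */
int crash_child_term_signal(void)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        if (install_fatal_handler(SIGSEGV) != 0)
            _exit(2);
        raise(SIGSEGV);
        _exit(3);  /* reached only if the signal failed to terminate us */
    }
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

In this sketch a supervising parent sees the child terminated by SIGSEGV and can decide what to do next: raise an alarm, move the controlled equipment to a safe state, or refuse to restart until someone has looked at the problem. Which of those is right depends on the system, not on the signal handler.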
 
James Kuyper

Charlie said:
Someone else might want to check first if the error is worth such a drastic
treatment.

With your suggested behaviour, the paycheck is not printed, and who knows
when the problem will be fixed... If you can wait for your paycheck, you'll
be OK, else too bad.

You're much better off with the program aborting than with it producing
a large number of problems, which is the most likely case. It might
transfer money from one account to another, without making the balancing
adjustment to another account. It might accidentally wipe out all of the
employee records. It might accidentally wipe out a small random subset
of the employee records, which is worse because it will take longer to
notice.

Alternately, let it print the damn check; there is a good chance the check
will be correct and arrive in time. There is some possibility that the
error is so small as to not be worth reporting. If the error is large, then
you can complain and have it fixed... Or you will not complain and wait for
the bank to figure out where these millions came from ;-)

Everything about that paragraph is wrong. The chances are not good that
the paycheck will be correct and arrive on time. There's a large
probability that the error will be a big one. There is no error so small
that it's not worth reporting; tax auditors tend to get very concerned
about even small errors, because they think they might be a sign of
something more serious (and they are right to think that). If the error
is large, fixing it can be very expensive for the payer, and a lot of
hassle for the payee.

....
If they are, they should be logged and reported yet best efforts should be
extended to minimize the impact on the user. Warning the user of potential
malfunction, requesting urgent attention may be more appropriate than a core
dump with no warning and no restart.

The core dump IS your warning, and restart should NOT be attempted until
the problem has been resolved, otherwise you could easily add to the
damage created by the first run of the program.

Use common sense to determine what will
least impact the user. When the oil gauge trips, the dashboard turns a
light on; it does not immediately block the engine, fire the ejector seats
and vaporize the contents of the trunk.

Yes, but the oil gauge isn't analogous to a SIGSEGV. A better analogy
would be to compare a SIGSEGV to the motion sensor alarm which triggers
an air bag to explode in your face. When that air bag explodes, you are
guaranteed to lose control of the car, if you haven't already done so
(as I can unfortunately testify to from personal experience). However,
if that motion sensor trips, the situation is generally so serious that
you're probably better off losing control with an airbag in your face,
than you would be if you retained control with no airbag protection.
This isn't necessarily true; in some cases the airbag can kill you when
you would have survived without it, but in general you're safer with it
exploding in your face. That is a very accurate analogy to a SIGSEGV.
 
Al Balmer

Someone else might want to check first if the error is worth such a drastic
treatment.

With your suggested behaviour, the paycheck is not printed, and who knows
when the problem will be fixed... If you can wait for your paycheck, you'll
be OK, else too bad.

And if the error is in a control process that blows up a reactor and
kills a few people? How do you correct that mistake?

I think your point is that the problem analysis should take account of
the consequences of an error - that's obvious. Basic systems
engineering. I'm not advocating that the only possible way to treat a
segfault is to stop the program, though in a properly designed control
system, it's usually the best way.

Alternately, let it print the damn check; there is a good chance the check
will be correct and arrive in time. There is some possibility that the
error is so small as to not be worth reporting. If the error is large, then
you can complain and have it fixed... Or you will not complain and wait for
the bank to figure out where these millions came from ;-)

All of which will cause more problems, eventually, both to the payer
and the payee. If the system stops, it *will* get fixed. People in
data processing take payroll runs *very* seriously. Did you imagine
that they would just not pay anybody else, and hope for a better run
next week?

If you are the payer, you probably want the process to stop. If you are the
payee, it is not so obvious.


If they are, they should be logged and reported yet best efforts should be
extended to minimize the impact on the user. Warning the user of potential
malfunction, requesting urgent attention may be more appropriate than a core
dump with no warning and no restart.

How do you warn of a segfault before it happens?

Use common sense to determine what will
least impact the user. When the oil gauge trips, the dashboard turns a
light on; it does not immediately block the engine, fire the ejector seats
and vaporize the contents of the trunk.

Not "common sense." Systems analysis.
 
