Storing setjmp()'s rv in a variable

s0suk3

I've read that the only two ways to use setjmp without invoking
undefined behavior are

- as the expression in an expression statement,
- as part of the controlling expression in an if, switch, while, do,
or for statement. The entire controlling expression must have one of
the following forms, where constexpr is an integer constant expression
and op is a relational or equality operator:

setjmp(...)
!setjmp(...)
constexpr op setjmp(...)
setjmp(...) op constexpr
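
For reference, here is a small sketch (not from the original post) that uses setjmp only in the contexts listed above. The names jump_back and exercise_setjmp_forms are hypothetical. The locals are volatile because they change after a setjmp and are read after the matching longjmp.

```c
#include <setjmp.h>

static jmp_buf env;

/* Hypothetical helper: makes the most recent setjmp(env) return v. */
static void jump_back(int v)
{
    longjmp(env, v);
}

/* Uses setjmp only in the permitted contexts. Returns how many of the
   controlling-expression forms were re-entered via longjmp (4 expected)
   and stores in *passes how often the expression-statement form was
   reached (1 direct return + 2 via longjmp). */
int exercise_setjmp_forms(int *passes)
{
    /* Modified after a setjmp and read after a later longjmp: volatile. */
    volatile int seen = 0;
    volatile int count = 0;

    if (setjmp(env))            /* setjmp(...) */
        seen++;
    else
        jump_back(1);

    if (!setjmp(env))           /* !setjmp(...) */
        jump_back(2);
    else
        seen++;

    if (3 == setjmp(env))       /* constexpr op setjmp(...) */
        seen++;
    else
        jump_back(3);

    if (setjmp(env) != 0)       /* setjmp(...) op constexpr */
        seen++;
    else
        jump_back(4);

    setjmp(env);                /* expression statement; value discarded */
    if (++count < 3)
        jump_back(5);

    *passes = count;
    return seen;
}
```

Note that none of these forms hands the longjmp value to a variable; the third form can only test it against a constant.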

If that's correct, then a simple assignment like this would cause
undefined behavior:

rv = setjmp(env); // where 'env' is a jmp_buf and 'rv' is an int

Well, that's the theory. What about the practice? Are there actually
any compilers where this might go wrong? I'm primarily interested in
Visual C++.

Sebastian
 
Nate Eldredge

christian.bau said:
Think about it: setjmp and longjmp modify the control flow in a
program in a very brutal way. In most cases, they are implemented by
putting some assembler code for setjmp/longjmp into the standard
library, not having the compiler treat setjmp/longjmp in any way
different from any other function, and verifying that the code
generated in the presence of setjmp and longjmp meets the requirements
of the C Standard.

The compiler writers had no reason to check that rv = setjmp (env)
produces any particular behaviour in all possible situations, so it
probably won't. You may study the documentation of Visual C++; if it
gives any guarantees beyond what the C Standard requires, go ahead.

But I think the question is: why did the Standard authors feel the need
to disallow assigning the result of setjmp()? I can't imagine an
implementation of setjmp() where this wouldn't work automatically, so if
the Standard did permit it, I wouldn't expect it to be any extra effort
for an implementor to comply.

I would like to know if there is a concrete implementation (real or
conceptual) where this would go wrong, or if this is just paranoia on
the part of the Standard authors who preferred to require a bare minimum
of functionality.

I can think of a reason they might want to allow assigning the result of
setjmp(). Without it, it becomes much more awkward to use it as a
channel for information.

Consider for example the following, non-compliant code:

#include <setjmp.h>
jmp_buf j;

void compute(void);
void helper_function(void);
extern void report_error(int);

void main_loop(void) {
    int error_code;
    error_code = setjmp(j); /* sadly not legal */
    if (error_code != 0)
        report_error(error_code);
    compute();
}

void compute(void) {
    /* complicated stuff */
    helper_function();
    /* more complicated stuff */
}

void helper_function(void) {
    /* hypothetical: defined elsewhere, set by the complicated stuff */
    extern int error_occurred;
    extern int error_code;
    /* further complicated stuff */
    if (error_occurred)
        longjmp(j, error_code);
}

By the Standard, the only way to recover the error code passed to
longjmp() would be to use a `switch':

switch (setjmp(j)) {
case 0:
    break;
case FILE_NOT_FOUND:
    report_error(FILE_NOT_FOUND);
    break;
case OUT_OF_MEMORY:
    report_error(OUT_OF_MEMORY);
    break;
case MASTER_CONTROL_FAILURE:
    report_error(MASTER_CONTROL_FAILURE);
    break;
}

If there are a lot of error codes, this is really unwieldy.

A better workaround would be to introduce another global variable to
store the error code, but that clutters the program.
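
A minimal sketch of that global-variable workaround, assuming hypothetical names (pending_error, run_compute) and a fail_with parameter added to compute() and helper_function() purely to simulate an error. setjmp stays in a permitted form, and the code travels through the global instead of through setjmp's return value:

```c
#include <setjmp.h>

static jmp_buf j;

/* Hypothetical extra global that carries the error code from the
   longjmp site back to the setjmp site. It has static storage duration,
   so it keeps its value across longjmp without needing volatile. */
static int pending_error;

/* Stand-in for the original helper_function(); fail_with is a
   hypothetical parameter that simulates an error when nonzero. */
static void helper_function(int fail_with)
{
    if (fail_with != 0) {
        pending_error = fail_with;
        longjmp(j, 1);          /* the value passed no longer matters */
    }
}

static void compute(int fail_with)
{
    /* complicated stuff */
    helper_function(fail_with);
    /* more complicated stuff */
}

/* Returns the error code raised during compute(), or 0 on success. */
int run_compute(int fail_with)
{
    pending_error = 0;
    if (setjmp(j) != 0)         /* permitted form: setjmp(...) op constexpr */
        return pending_error;
    compute(fail_with);
    return 0;
}
```

This scales to any number of error codes without a switch, at the cost of the extra global.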
(As an example, the compiler may assign the same memory location to
variables that never have live values at the same time. This analysis
can go wrong if a volatile variable is defined only much later than the
setjmp(), because its live range actually begins immediately after the
setjmp().)

For this reason, then, presumably the compiler would not be able to
share the memory location of a `volatile' variable with any other
variable, since `volatile' variables are guaranteed to keep their values
after longjmp(). This is in keeping with `volatile's informal meaning
of "don't do any optimizations involving this variable".
 
Eric Sosman

Nate said:
But I think the question is: why did the Standard authors feel the need
to disallow assigning the result of setjmp()? [...]

For questions of the form "Why does the Standard do X?"
the first recourse is the Rationale that accompanies it, a
document whose entire purpose is to explain why the Standard
does X. Have you read the Rationale's discussion anent
setjmp() and longjmp()?

If you haven't, pray do so before complaining further.

If you have and you find the arguments unconvincing,
comp.std.c would be a more appropriate forum than this one.
 
jameskuyper

Nate Eldredge wrote:
.....
But I think the question is: why did the Standard authors feel the need
to disallow assigning the result of setjmp()?

The Rationale says:

| One proposed requirement on setjmp is that it be usable like any other
| function, that is, that it be callable in any expression context, and that the
| expression evaluate correctly whether the return from setjmp is direct or via
| a call to longjmp. Unfortunately, any implementation of setjmp as a
| conventional called function cannot know enough about the calling
| environment to save any temporary registers or dynamic stack locations
| used part way through an expression evaluation. (A setjmp macro seems
| to help only if it expands to inline assembly code or a call to a special
| built-in function.) The temporaries may be correct on the initial call to
| setjmp, but are not likely to be on any return initiated by a corresponding
| call to longjmp. These considerations dictated the constraint that setjmp be
| called only from within fairly simple expressions, ones not likely to need
| temporary storage.
|
| An alternative proposal considered by the C89 Committee was to require
| that implementations recognize that calling setjmp is a special case, and
| hence that they take whatever precautions are necessary to restore the
| setjmp environment properly upon a longjmp call. This proposal was
| rejected on grounds of consistency: implementations are currently allowed
| to implement library functions specially, but no other situations require
| special treatment.

I've never attempted to design a compiler, not even the toy compiler
that is often assigned as homework for CS majors (my own
degree is in theoretical physics). Therefore, I'm not sure how well
this rationale explains the committee's decision. I can easily see how
it might apply to something like

array[i++] = setjmp(j);

However, I wouldn't think that a simple assignment with a left operand
that has no sub-expressions would involve any temporary storage.
Perhaps, though, they didn't want to bother with the complication of
saying something like "no sub-expressions" in proper standardese.
 
Nate Eldredge

Eric Sosman said:
Nate said:
But I think the question is: why did the Standard authors feel the need
to disallow assigning the result of setjmp()? [...]

For questions of the form "Why does the Standard do X?"
the first recourse is the Rationale that accompanies it, a
document whose entire purpose is to explain why the Standard
does X. Have you read the Rationale's discussion anent
setjmp() and longjmp()?

I had not; indeed I am embarrassed to admit this is the first I knew of
the Rationale. Thanks for enlightening me, it's a very interesting
document.

[I had a little trouble finding it. For other readers: it's linked from
http://www.open-std.org/jtc1/sc22/wg14/ .]

It does make more sense now. I can see that using setjmp() in a
complicated expression would be trouble, since the compiler might be
storing temporary results somewhere that setjmp() doesn't preserve. It
is still hard to imagine how a simple assignment statement could go
wrong, but I have more sympathy for the Standard authors now, and can
understand their desire to require it to work in a minimal set of
cases.

setjmp() by its nature is really a hack, and doesn't carry over well
into the realm of portable programs. I'm actually a bit surprised that
the Standard incorporated it given the difficulties it causes.
If you haven't, pray do so before complaining further.

I am appropriately chastened.
 
Richard Tobin

setjmp() by its nature is really a hack, and doesn't carry over well
into the realm of portable programs.

Not at all. It's a completely natural control structure that most
other programming languages have in one form or another (catch and throw,
for example).
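
To illustrate the catch/throw reading, here is a rough sketch of a try/catch built from setjmp/longjmp. It assumes a hypothetical handler stack (struct handler, top) and a hypothetical thrown global; it is one possible approach, not a standard facility:

```c
#include <setjmp.h>
#include <stddef.h>

/* Hypothetical handler record; each "try" pushes one onto a stack. */
struct handler {
    jmp_buf env;
    struct handler *prev;
};

static struct handler *top = NULL;
static int thrown;

/* "throw": unwind to the innermost active handler (assumes one exists). */
static void throw_code(int code)
{
    thrown = code;
    longjmp(top->env, 1);
}

static int parse_digit(char c)
{
    if (c < '0' || c > '9')
        throw_code(-1);
    return c - '0';
}

/* "try { result = parse_digit(c); } catch (code) { result = code; }" */
int try_parse_digit(char c)
{
    struct handler h;
    int result;

    h.prev = top;                   /* push this handler */
    top = &h;
    if (setjmp(h.env) == 0)
        result = parse_digit(c);    /* try block */
    else
        result = thrown;            /* catch block */
    top = h.prev;                   /* pop on both paths */
    return result;
}
```

Because the thrown value travels through a global, setjmp itself only appears as setjmp(...) == 0, one of the permitted forms. h and result need not be volatile: neither is modified between the setjmp and the longjmp.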

-- Richard
 
Nate Eldredge

Not at all. It's a completely natural control structure that most
other programming languages have in one form or another (catch and throw,
for example).

Right, but in that case it's designed as part of the language, and is
integrated in a way that is consistent with the rest of the language.
In the case of C, setjmp() appears to have been added as an afterthought
(it's not mentioned in K&R I) and implemented via (an abuse of) the
function call mechanism, so that it could be done completely in the
library without the knowledge of the compiler. As we see, the
transparency is imperfect, resulting in a "function call" that behaves
differently from any other function call in the language: it can't be
used arbitrarily in an expression, and it has unexpected effects on
local variables. (Not to mention the fact that it "returns twice".)

I have nothing against the control structure per se, but in its present
form setjmp() IMHO constitutes an ugly wart on C.

Of course, we're stuck with it now, so I'm really just whining. :)
 
Nate Eldredge

christian.bau said:
It will work on most implementations in most circumstances, but not
always. Assume a highly optimising compiler. Assume that the compiler
can use the same memory location for different variables with
different life ranges. Now take this bit of code:

volatile int x;
int rv = 1, z = 2;

int* p = somecondition () ? &rv : &z;
*p = 0;

rv = setjmp (env);
if (rv == 0) {
    x = 1;
    somestuff (z);
    x = 2;
    otherstuff ();
    x = 3;
}

return x;

This code would hopefully return x = 1 if somestuff () calls longjmp,
x = 2 if otherstuff () calls longjmp, and x = 3 otherwise. But a
compiler without any knowledge of setjmp/longjmp could use the same
memory location for x and rv, so a call to longjmp () would overwrite
x and return the second argument passed to longjmp instead.

I disagree.

I think that a compiler that can use the same memory location for
different variables with different life ranges *even if one of them is
declared volatile* is in danger of non-conformance.

For example, if we take an even simpler, and legal, example:

#include <stdio.h>
#include <setjmp.h>

int main(void) {
    static jmp_buf j;
    volatile int x;
    int y;
    x = 3;
    if (setjmp(j) == 1) {
        printf("%d\n", x);
        return 0;
    }
    y = 5;
    (void)&y; /* please keep y in memory */
    longjmp(j, 1);
}

If the compiler stored x and y at the same location, this program might
output "5". By my reading of the standard this is forbidden: the implementation must
ensure that the value of x when setjmp() returns for the second time is
the same as it was when longjmp() was called. The output must be "3".
Therefore I believe that a compiler must not perform this optimization
when one of the variables is declared volatile.

I think this issue is a red herring, unrelated to the question of why
the return value of setjmp() can't be assigned.
 
Moi

Think about it: setjmp and longjmp modify the control flow in a program
in a very brutal way. In most cases, they are implemented by putting some
assembler code for setjmp/longjmp into the standard library, not having
the compiler treat setjmp/longjmp in any way different from any other
function, and verifying that the code generated in the presence of
setjmp and longjmp meets the requirements of the C Standard.

The compiler writers had no reason to check that rv = setjmp (env)
produces any particular behaviour in all possible situations, so it
probably won't. You may study the documentation of Visual C++; if it
gives any guarantees beyond what the C Standard requires, go ahead.

(As an example, the compiler may assign the same memory location to
variables that never have live values at the same time. This analysis
can go wrong if a volatile variable is defined only much later than the
setjmp(), because its live range actually begins immediately after the
setjmp().)

Which also means that

http://www.ioccc.org/1992/albert.c

uses the correct form (in a hilarious way).

[ From memory, I thought it stored setjmp()'s result in the nodes,
but he only stores the jmp_bufs in the nodes. ]

Flow of control is also modified, but there is nothing wrong if you
know what you are doing ;-)

LOL,
AvK
 
Tim Rentsch

Nate Eldredge said:
I disagree.

I think that a compiler that can use the same memory location for
different variables with different life ranges *even if one of them is
declared volatile* is in danger of non-conformance.

Right. More specifically, any accesses to any other variables
must be to memory that is disjoint from the memory of a volatile
variable, during the entire lifetime of the volatile variable.

[...]

I think this issue is a red herring, unrelated to the question of why
the return value of setjmp() can't be assigned.

It's really too bad there isn't some way of saving the result of
setjmp for later use. Practically speaking, I expect a simple
assignment statement to a volatile variable would work, at least on
most implementations. It might be worth suggesting that
addition to the list of allowed cases.
 
Tim Rentsch

I'm sure it was considered. Alas, I don't remember why it was rejected.

I see. Well, that's both reassuring and discouraging.
In any case thank you for the info.
 
Kaz Kylheku

I disagree.

I think that a compiler that can use the same memory location for
different variables with different life ranges *even if one of them is
declared volatile* is in danger of non-conformance.

volatile only affects accesses to an object. By definition of liveness,
access to a variable can only happen in a region of code where the variable
is live. The meaning of volatile is irrelevant where the variable is not live.

I disagree with the analysis, though, because we can fix the code by
not assigning the return value of setjmp:

volatile int x;
int rv = 1, z = 2;

int* p = somecondition () ? &rv : &z;
*p = 0;

if (setjmp (env) == 0)
    rv = 0;
else
    rv = 1;

if (rv == 0) {
    x = 1;
    somestuff (z);
    x = 2;
    otherstuff ();
    x = 3;
}

return x;

Now x and rv still have different lifetimes, and the compiler is still ignorant
of setjmp. If we apply Christian's analysis, the assignment to rv can still
clobber x. Only now, the program is well-defined, and so the implementation
doesn't have that excuse; it's plain broken.

(Note that I haven't made rv volatile; it doesn't have to be because its value
is never assigned in between the setjmp and longjmp and then accessed after the
longjmp. It's always overwritten after the longjmp.)

Though setjmp and longjmp are fragile, their semantics are sufficiently well
defined that an aggressively optimizing compiler /must/ be aware of setjmp to
some extent. If there is a setjmp in the function, it can't assume that it can
fold the storage of local variables that are in the same scope, even if they
have non-overlapping liveness. The compiler may in fact have to recognize that
setjmp marks the start of a basic block, as if the program were this:

volatile int x;
int rv = 1, z = 2;

int* p = somecondition () ? &rv : &z;
*p = 0;

label:
if (setjmp (env) == 0)
    rv = 0;
else
    rv = 1;

if (rv == 0) {
    x = 1;
    somestuff (z); // regarded as potential "goto label;"
    x = 2;
    otherstuff (); // ditto
    x = 3;
}

return x;

In this variant of the code, the variables x and rv no longer have
non-overlapping lifetimes, because x has to be considered live on
entry into the basic block headed by label.

In short, the implementation /cannot/ aggressively optimize in a way that is
completely agnostic of the possibility that control passes into setjmp from
other places in the function, because such passing of control does happen in
well-defined programs!
 
Tim Rentsch

Kaz Kylheku said:
volatile only affects accesses to an object. By definition of liveness,
access to a variable can only happen in a region of code where the variable
is live. The meaning of volatile is irrelevant where the variable is not live.

[...]

In short, the implementation /cannot/ aggressively optimize in a way that is
completely agnostic of the possibility that control passes into setjmp from
other places in the function, because such passing of control does happen in
well-defined programs!

This argument isn't convincing, because it disregards the requirements
imposed by the semantics of volatile.

Even though the variable rv and the variable x don't overlap in terms
of their /liveness/, they must occupy disjoint memory locations,
because x is volatile and they do overlap in their /lifetimes/. The
term /lifetime/ is used with a specific meaning in ISO C -- please see
section 6.2.4.

A volatile variable cannot occupy the same memory location as any
other variable that exists at any point during the lifetime of the
volatile variable. At some level that's the whole point of declaring
something volatile -- ordinary variables can be optimized, volatile
variables cannot. This restriction explains why local variables
accessed "around" setjmp()/longjmp() must be declared volatile, to
prevent having to consider setjmp() specially when doing optimization.
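
A minimal sketch of the case described above, using a hypothetical function count_arrivals: a local that is modified between setjmp and longjmp and read afterwards.

```c
#include <setjmp.h>

static jmp_buf env;

/* Counts arrivals at the setjmp (for limit >= 1: one direct return plus
   limit - 1 returns via longjmp). count changes between setjmp and
   longjmp and is read afterwards, so it must be volatile: without the
   qualifier its value after longjmp would be indeterminate
   (C99 7.13.2.1p3). */
int count_arrivals(int limit)
{
    volatile int count = 0;

    setjmp(env);                /* permitted: expression statement */
    if (++count < limit)
        longjmp(env, 1);
    return count;
}
```

Dropping the volatile qualifier makes the result unreliable under optimization, since the increment may live only in a register that longjmp restores to its earlier contents.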
 
