Confusion about undefined behaviour


Phil Carmody

Dik T. Winter said:
If I read it well I think it is a compiler that modifies well-behaved code
to UB.

You might want to read again. The 'fix' patch demonstrates the error
most succinctly.
"Although the code correctly checks to make sure the tun variable
doesn't point to NULL, the compiler removes the lines responsible for that
inspection during optimization routines." Although this sentence is
slightly incomprehensible.

Because it's obfuscatorily, or simply ignorantly, presented.
The only time the check can detect potential null pointer dereferences
is after one has already happened. In which case, it's at liberty to
check Scott Nudds' rear end for null pointers instead of the function's
parameter.
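
For concreteness, the pattern at issue looks roughly like this (a
sketch with hypothetical stand-in types, not the verbatim kernel
source):

#include <stddef.h>

/* Hypothetical stand-ins for the kernel types involved: */
struct sock { int dummy; };
struct tun_struct { struct sock *sk; };

static int poll_like(struct tun_struct *tun)
{
    struct sock *sk = tun->sk;  /* dereference happens before any check */
    if (tun == NULL)            /* too late: gcc may infer tun != NULL  */
        return -1;              /* from the line above and delete this  */
    return sk != NULL;
}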

Phil
 

Phil Carmody

Hallvard B Furuseth said:
A source code audit? I sure hope not.

Coverity would flash a huge warning light, I'm sure. I don't have
any association with whoever makes Coverity, and even today I marked
two supposed "static buffer overruns" with "Coverity's braindead",
but given the number of things it does find, I can't recommend it
highly enough.

Currently mulling over something flagged in an AES-related module,
with a 99% certainty that it's a nasty bug,
Phil
 

Keith Thompson

Eric Sosman said:
Just to pick the nit once more, I think you've restated the analysis
whose nit I tried to pick ... My point was that the optimization is
valid even if control *does* reach the check (where the check was, that
is). If the pointer was NULL, the behavior is undefined and removing
the check doesn't make it any more or less undefined than it already
was. The Linux environment doesn't enter into the matter, nor does the
likelihood of a trap or other crash: undefined is UN-defined, and it
simply doesn't matter what the code does afterwards.
[...]

Agreed.

On the other hand, a conforming C compiler *could* assume that the
code it generates will run under Linux, and assume whatever actual
behavior Linux guarantees for a null pointer dereference. It would
then not remove the null pointer test, even though it follows an
attempt to dereference the pointer. (Removing the test is an
optimization; compilers aren't required to perform optimizations.)

But I don't think there's much doubt that the error is in the code,
not in the compiler.
 

Keith Thompson

Stephen Sprunk said:
From a C standards perspective, that optimization is undoubtedly
correct; the dereference of NULL invokes UB, so the implementation can
do whatever it wants after that point.

However, the problem is that in Linux/x86 _kernel_ code, dereferencing
NULL is defined (by the implementation) to _not_ trap and thus what the
compiler "knows" is wrong; the optimization is not valid in that
specific context, though it happens to be valid in Linux _user_ code.

I'm curious what dereferencing a null pointer is defined to do.
Surely it's not something that a programmer would do intentionally.

A conforming implementation (say, gcc in some special mode) *could*
assume that dereferencing a null pointer has well-defined behavior; it
then wouldn't be allowed to remove the null pointer test that follows
the dereference. But this would only be for the benefit of code
that's buggy anyway.

It would have been nice if gcc had warned about removing the check;
that doesn't require anything Linux-kernel-specific.
 

Stephen Sprunk

Keith said:
I'm curious what dereferencing a null pointer is defined to do.
Surely it's not something that a programmer would do intentionally.

I'm not familiar enough with the Linux kernel to know offhand why it's
defined... It might be that traps are disabled, or the Interrupt
Descriptor Table* is mapped there, or something else entirely. The
point is that you won't get a segfault like you (and the compiler) would
expect in user code (also defined by the implementation).

(* The IDT must live at physical addresses 0-1023 on x86, which is a
large part of why buggy DOS programs crashed the computer so frequently:
writing through a null pointer would corrupt the IDT, so the next time
that interrupt fired, the CPU would wander off and start executing
something that probably wasn't code, trap to the corrupted IDT again, etc.)
A conforming implementation (say, gcc in some special mode) *could*
assume that dereferencing a null pointer has well-defined behavior; it
then wouldn't be allowed to remove the null pointer test that follows
the dereference.

GCC _does_ have an option to disable that optimization; however,
apparently nobody had thought that it needed to be used. If they had
known it was needed, they would have fixed the code and made it
unnecessary again.
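
A minimal way to see both behaviors (hypothetical file; exact results
depend on the gcc version) is to compile the function below once with
-O2 and once with -O2 -fno-delete-null-pointer-checks, and compare the
generated code:

#include <stddef.h>

int deref_then_check(int *p)
{
    int x = *p;      /* undefined behavior if p is null              */
    if (p == NULL)   /* under -O2, gcc may delete this branch, since */
        return -1;   /* the *p above lets it assume p != NULL;       */
    return x;        /* -fno-delete-null-pointer-checks keeps it     */
}
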
But this would only be for the benefit of code that's buggy anyway.

... unless, of course, dereferencing the null pointer was done
deliberately, e.g. to access the IDT on a DOS system.
It would have been nice if gcc had warned about removing the check;
that doesn't require anything Linux-kernel-specific.

The only time such a warning would be justified is when the optimization
might be invalid, i.e. only in the Linux kernel and a few other, even
rarer cases.

S
 

Richard Bos

Hallvard B Furuseth said:
Then Linux in effect needs a language extension which allows that.
Which is what the 'gcc -fno-delete-null-pointer-checks' provides.
Or, for that matter, picking optimize flags which they know will do
that. Or they could use assembly, or some trick or other which they
know the supported compiler won't optimize this way, or...

...or they could just write code that is correct.

Richard
 

Richard Bos

jameskuyper said:
Possibly I haven't tried hard enough, but I haven't located the
context of the above code. However, since 'tun' can be null, and is
being dereferenced without bothering to check for that possibility,
such code would never pass a code review in my group unless the
mechanism whereby 'tun' could be null were extremely non-obvious. It's
one of the simplest things we check; it's right up there with making
sure that indexes are within limits before using them to index an
array. Does this make us unusual?

Hell, no.

Richard
 

jameskuyper

Kenneth said:
But, isn't an implementation allowed to define what happens in situations
that the Standard leaves undefined? ...

Sure, it's permitted; writing non-portable code with behavior that is
undefined by the C standard, when the implementation itself provides a
definition, is a completely ordinary activity in the C programming
world. However, it should be done with great care and a clear
understanding of what the implementation's defined behavior is.
... What if the platform it's running on
explicitly states that dereferencing NULL is allowed?

Note: NULL is a C standard library macro; you're looking for an
adjective - the correct spelling of the adjective is 'null'.

That's precisely what happened in this case. The defined behavior for
gcc (depending upon which command line options are chosen) includes
treating the pointer's value as if it were guaranteed to be non-null,
in all subsequent code. It would appear that the authors were unaware
of that definition of the behavior, and that's what went wrong.
 

Keith Thompson

Stephen Sprunk said:
Keith Thompson wrote: [...]
It would have been nice if gcc had warned about removing the check;
that doesn't require anything Linux-kernel-specific.

The only time such a warning would be justified is when the optimization
might be invalid, i.e. only in the Linux kernel and a few other, even
rarer cases.

I disagree. The check was written explicitly in the source code.
Optimizing it away is valid, but the fact that the compiler is
eliminating user-written code indicates (at least in this case) that
the code shouldn't have been written in the first place, that the code
itself is buggy.

For example, if I write:

#include <stddef.h>

int foo(int *p)
{
    int x = *p;        /* the dereference comes first  */
    if (p == NULL) {   /* so this check comes too late */
        return -1;
    }
    /* ... */
}

the compiler is free to remove the if statement, but I'd like it to
warn me that I've done something stupid.
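
After the optimization, the function behaves as if it had been written
like this (a sketch of the likely effective result, not the guaranteed
output of any particular compiler):

int foo(int *p)
{
    int x = *p;  /* still undefined when p is null */
    /* if statement removed: the dereference above lets the compiler
       assume p != NULL, so the test could never fire anyway */
    /* ... */
}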
 

Keith Thompson

jameskuyper said:
Kenneth Brody wrote: [...]
... What if the platform it's running on
explicitly states that dereferencing NULL is allowed?

Note: NULL is a C standard library macro; you're looking for an
adjective - the correct spelling of the adjective is 'null'.

That's precisely what happened in this case. The defined behavior for
gcc (depending upon which command line options are chosen) includes
treating the pointer's value as if it were guaranteed to be non-null,
in all subsequent code. It would appear that the authors were unaware
of that definition of the behavior, and that's what went wrong.

I think that's stretching things a bit. "Behavior" refers (mostly) to
run-time behavior; there is no defined behavior in this case. Based
on the lack of any definition of the behavior if the pointer is null,
the compiler is free (but not required) to assume that the pointer is
non-null; any behavior in that case is permitted for the null case as
well.
 

jameskuyper

Keith said:
jameskuyper said:
Kenneth Brody wrote: [...]
... What if the platform it's running on
explicitly states that dereferencing NULL is allowed?

Note: NULL is a C standard library macro; you're looking for an
adjective - the correct spelling of the adjective is 'null'.

That's precisely what happened in this case. The defined behavior for
gcc (depending upon which command line options are chosen) includes
treating the pointer's value as if it were guaranteed to be non-null,
in all subsequent code. It would appear that the authors were unaware
of that definition of the behavior, and that's what went wrong.

I think that's stretching things a bit. "Behavior" refers (mostly) to
run-time behavior; ...

It's the run-time behavior of the entire program that becomes
undefined, as far as the C standard is concerned, not just the
statement whose execution rendered the behavior undefined.
... there is no defined behavior in this case.

The C standard provides no definition for the behavior, but gcc does,
as it is permitted to do. The man page for gcc (v4.2.2) defines what
-fdelete-null-pointer-checks does, and notes that this option is
turned on by default when using any of the options -O2, -O3, or -Os.
 

jameskuyper

Hallvard said:
jameskuyper writes: ....

Actually you've seen enough. A pointer is followed, and the very next
line says it may have been a null pointer. I don't see how the source
could make it easier to spot a possible null dereference.

I was basically curious to see whether the author had any good reason
to trust that a valid value was stored in 'tun'. Obviously, since
the vulnerability was exploited, whatever reason the author had was
incorrect; but was it at least plausible? Since writing that message,
I've seen the context, and it involves executing a function that might
or might not have seemed trustworthy, depending upon what that
function's definition was. Apparently it was not trustworthy - but was
it supposed to be?

Of course, having the null-checking code after the pointer is
dereferenced is pointless, whether or not there seemed to be
justification for assuming that it was dereferenceable.
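
The trivial fix is just to reorder the two operations; a sketch (using
hypothetical stand-in types, not the actual patch text):

#include <stddef.h>

struct sock;                            /* opaque is enough here */
struct tun_struct { struct sock *sk; };

static int poll_like_fixed(struct tun_struct *tun)
{
    if (tun == NULL)                    /* check first...      */
        return -1;
    struct sock *sk = tun->sk;          /* ...then dereference */
    return sk != NULL;
}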
 

Moi

Strictly speaking it is UB.

But, given the fact that the kernel-source must be considered
freestanding code, IMHO different rules _could_ apply.
( I don't know what GCC's -ffreestanding flag actually does; it possibly
just avoids library dependencies)

IIRC in freestanding environments the compiler *may* define UB to
anything it wants. (correct me if I am wrong)

I can imagine the compiler user _wants_ the "intended behaviour" (IB) to
be "If I dereference null pointers, I know what I'm doing. Trust me ..."

How else could a kernel ever address memory at location=0?
Seen in this light, the compiler should *not* be allowed to optimize out
the null-test, since the initialiser does not *test* anything, it just
proves that it did not trap.

On the basis of the code sample presented, this isn't the
situation. The code is not well-behaved, and invokes undefined
behavior, full stop. Yes (by report), the compiler removes the test for
NULL, but this does not introduce undefined behavior, it merely changes
(probably) the nature of the undefined behavior already in progress.
Since a program has no right to rely on *anything* once it indulges in
U.B., it can in particular not rely on a NULL test occurring.

After the train goes off the rails, the settings of the
switches not yet encountered make no difference to its route.

But, I agree: the code is deceptive. The dereference in the initialiser
obfuscates things, but saves a line of code.

AvK
 

robertwessel2

Stephen Sprunk said:
I'm not familiar enough with the Linux kernel to know offhand why it's
defined...  It might be that traps are disabled, or the Interrupt
Descriptor Table* is mapped there, or something else entirely.  The
point is that you won't get a segfault like you (and the compiler) would
 expect in user code (also defined by the implementation).

(* The IDT must live at physical addresses 0-1023 on x86, which is a
large part of why buggy DOS programs crashed the computer so frequently:
writing through a null pointer would corrupt the IDT, so the next time
that interrupt fired, the CPU would wander off and start executing
something that probably wasn't code, trap to the corrupted IDT again, etc.)


On x86, only in real mode is there an IDT at a fixed location, which
certainly does not apply to the Linux kernel in the usual
circumstances. In any protected mode, there's a CPU register that
points to the IDT (with complications, this being x86, of course, and
with a completely different table format than in real mode).
 

jameskuyper

Moi said:
Strictly speaking it is UB.

But, given the fact that the kernel-source must be considered
freestanding code, IMHO different rules _could_ apply.
( I don't know what GCC's -ffreestanding flag actually does; it possibly
just avoids library dependencies)

IIRC in freestanding environments the compiler *may* define UB to
anything it wants. (correct me if I am wrong)

It's not quite clear what you're saying, but all of the
interpretations I can think of make that statement wrong.

One possibility is that you're talking about the circumstances where
behavior is UB. Those are defined by the C standard (though in some
cases, they are defined only by the absence of a definition for the
behavior). A freestanding implementation of C is not allowed to change
those definitions. However, those definitions impose somewhat fewer
requirements on freestanding implementations than on hosted ones. The
most important differences are that the startup function need not be
called main(), that it has an implementation-defined interface, and
that most of the C standard library becomes optional. None of those
differences are
relevant in this context.

Another possibility is that you're referring to the behavior that
actually occurs when the C standard says that the behavior is
undefined. All implementations, freestanding or hosted, have complete
freedom to define such behavior any way they want - that's precisely
what "undefined behavior" means.
I can imagine the compiler user _wants_ the "intended behaviour" (IB) to
be "If I dereference null pointers, I know what I'm doing. Trust me ..."

How else could a kernel ever address memory at location=0?

Please note: a null pointer need not refer to location==0, and a
pointer referring to location==0 need not be a null pointer; this is
entirely up to the implementation. Try not to confuse the two
concepts. An integer constant expression with a value of 0, converted
into a pointer, is guaranteed to convert into a null pointer. However,
it's not meaningful, within the C standard, to even talk about which
memory location a null pointer points at; if an implementation chooses
to allow dereferencing such pointers, it need not define the resulting
behavior as "refer to the object of the appropriate type residing at
location=0".

How a kernel addresses the memory at location==0 is entirely
implementation-specific; there's no portable way to write such code
(though dereferencing a null pointer probably has that effect on many
platforms). An implementation is not required to provide a way to do
this; if that fact renders it impossible to compile the kernel for a
particular implementation, then you shouldn't use that implementation
to compile the kernel.
Seen in this light, the compiler should *not* be allowed to optimize out
the null-test, since the initialiser does not *test* anything, it just
proves that it did not trap.

It's trivial to fix this code to do what is actually desired.
However, if the kernel developer really feels a need to write
defective code, and to have the compiler produce the desired results
despite the fact that the code is defective, then a different compiler
should have been used (such as gcc with -fno-delete-null-pointer-checks).
 

Moi

jameskuyper wrote:

It's not quite clear what you're saying, but all of the interpretations
I can think of make that statement wrong.

I meant: "the compiler is allowed to _define_ what it does in case of UB"
(e.g. in case of a NULL pointer: handle it as if it were not null).


Another possibility is that you're referring to the behavior that
actually occurs when the C standard says that the behavior is undefined.
All implementations, freestanding or hosted, have complete freedom to
define such behavior any way they want - that's precisely what
"undefined behavior" means.

I guess this is the case I was referring to.
Please note: a null pointer need not refer to location==0, and a pointer
referring to location==0 need not be a null pointer; this is entirely up
to the implementation. Try not to confuse the two concepts. An integer

I know. This is c.l.c ;-)

I know as well that on the x86 a NULL pointer is represented as "all bits
zero". But this is not important, IMO; even if the NULL pointer were
represented as "all bits 1", the dereference could be OK (apart from
alignment, of course), the code _could_ be valid, *given that NULL
pointer dereference is allowed* (and does not trap)
How a kernel addresses the memory at location==0 is entirely
implementation-specific; there's no portable way to write such code
(though dereferencing a null pointer probably has that effect on many
platforms). An implementation is not required to provide a way to do
this; if that fact renders it impossible to compile the kernel for a
particular implementation, then you shouldn't use that implementation to
compile the kernel.


But, given it allowed the dereference, it should not optimize out the
null pointer test. Nothing was tested by the initializer.

I know: this is all about implementing/defining UB.
Slippery slope...
It's trivial to fix this code to do what is actually desired. However,
if the kernel developer really feels a need to write defective code, and
to have the compiler produce the desired results despite the fact that
the code is defective, then a different compiler should have been used
(such as gcc with -fno-delete-null-pointer-checks).

That's what I said in the last paragraph. (which you snipped)

AvK
 

Stephen Sprunk

jameskuyper said:
The C standard provides no definition for the behavior, but gcc does,
as it is permitted to do. The man page for gcc (v4.2.2) defines what
-fdelete-null-pointer-checks does, and notes that this option is
turned on by default when using any of the options -O2, -O3, or -Os.

... but Linux, which is also part of the implementation, _does_ define
what happens when you dereference a null pointer. In user code, you get
SIGSEGV; in kernel code, you read from address 0. So, the behavior is
not undefined in this particular context. Given that the Linux kernel
code obviously does not run in any other context than the Linux
implementation, it's a stretch to call it UB.

S
 

James Kuyper

Moi said:
On Mon, 20 Jul 2009 16:21:34 -0700, jameskuyper wrote: ....
zero". But this is not important, IMO; even if the NULL pointer were
represented as "all bits 1", the dereference could be OK (apart from
alignment, of course), the code _could_ be valid, *given that NULL
pointer dereference is allowed* (and does not trap)

Sure, an implementation is free to define what the consequences of a
null pointer dereference are, in a way that causes no problem for such
code. An implementation is also free to define what those consequence
are, in a way that causes an immediate segmentation fault, which seems
to be the only other option you recognize.

The point, however, is that the standard is much more lenient than you
are; it allows an infinite number of other more complicated options than
those two. Even if an implementation chooses not to generate an
immediate segmentation fault, it's still also free to have the
consequences of the dereference be that, for instance, the next call to
printf() will display the string "<author> doesn't know how to avoid
dereferencing null pointers", regardless of which arguments are passed
to printf().
But, given it allowed the dereference, it should not optimize out the
null pointer test. Nothing was tested by the initializer.

If you want to impose such a requirement, just select the appropriate
command line option for gcc. However, there's no external requirement
(such as the C standard) that prohibits such optimizations.
 
