Knowing the implementation, are all undefined behaviours become implementation-defined behaviours?

Michael Tsang · Feb 14, 2010

Dereferencing a NULL pointer is undefined behaviour, but, on Linux, the
program crashes with SIGSEGV. So, the behaviour of dereferencing a NULL
pointer is defined to "crash the program with SIGSEGV".

Signed integer overflow is undefined behaviour, but, on x86 CPUs, the number
simply wraps around so we can say that the behaviour is defined to round on
x86 CPUs.

Alf P. Steinbach · Feb 14, 2010

* Michael Tsang:

Dereferencing a NULL pointer is undefined behaviour, but, on Linux, the
program crashes with SIGSEGV. So, the behaviour of dereferencing a NULL
pointer is defined to "crash the program with SIGSEGV".

Signed integer overflow is undefined behaviour, but, on x86 CPUs, the number
simply wraps around so we can say that the behaviour is defined to round on
x86 CPUs.

Your question, from the subject line, is

"Knowing the implementation, are all undefined behaviours become
implementation-defined behaviours?"

And it's cross-posted to [comp.lang.c] and [comp.lang.c++].

At least for C++ the answer is a definite maybe: theoretically it depends on the
implementation.

In practice the answer is a more clear "no", because it's practically impossible
for an implementation to clearly define all behaviors, in particular pointer
operations and use of external libraries.

Cheers & hth.,

- Alf

Seebs · Feb 14, 2010

Dereferencing a NULL pointer is undefined behaviour, but, on Linux, the
program crashes with SIGSEGV. So, the behaviour of dereferencing a NULL
pointer is defined to "crash the program with SIGSEGV".

Not necessarily.

Signed integer overflow is undefined behaviour, but, on x86 CPUs, the number
simply wraps around so we can say that the behaviour is defined to round on
x86 CPUs.

That's not rounding, that's wrapping.

But no, it's not the case. These are not necessarily *defined* -- they may
merely be typical side-effects that are not guaranteed or supported.

Modern gcc can do some VERY strange things if you write code which might
dereference a null pointer. (For instance, loops which check whether a
pointer is null may have the test removed because, if it were null, it
would have invoked undefined behavior to dereference it...)

-s
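
To make the wrapping point concrete, here is a minimal C sketch (the function names are hypothetical, not from the thread). Because signed overflow is undefined, an optimiser is allowed to assume it never happens, so a test that relies on wraparound may be folded to a constant. A test made before the arithmetic stays within defined behaviour.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: relies on wraparound. If x == INT_MAX, x + 1
 * overflows, which is undefined behaviour, so a compiler may assume it
 * never happens and reduce the whole expression to false. */
static bool overflows_naive(int x)
{
    return x + 1 < x;
}

/* Hypothetical helper: tests the limit before doing the arithmetic, so
 * no overflow ever occurs. Stores x + 1 in *out and returns true, or
 * returns false and leaves *out untouched when x == INT_MAX. */
static bool checked_increment(int x, int *out)
{
    if (x == INT_MAX)
        return false;
    *out = x + 1;
    return true;
}
```

Whether a given compiler actually folds the first test depends on its version and options. The second form does not depend on that.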

Malcolm McLean · Feb 14, 2010

"Undefined behaviour" doesn't mean "exists in some metaphysical state
of indefiniteness" but "the C standard imposes no requirements on the
program's behaviour (and therefore the program is incorrect)". There
was a huge thread about this a few years back on gets.

So typically dereferencing null will have the same effect each time any
particular program is run, probably the same effect on any particular
platform. Dereferencing a wild pointer may have different effects,
particularly on a multi-tasking machine where exact pointer values
vary from run to run.

Robert Fendt · Feb 14, 2010

dereference a null pointer. (For instance, loops which check whether a
pointer is null may have the test removed because, if it were null, it
would have invoked undefined behavior to dereference it...)

Sorry to interrupt, but since when is checking a pointer value
for 0 the same as dereferencing it? Checking a pointer treats the
pointer itself as a value, and comparison against 0 is one of
the few things that are _guaranteed_ to work with a pointer
value. So if GCC really would remove a check of the form

if(!pointer)
    do_something(*pointer);

or even

if(pointer == 0)
    throw NullPointerException;

then GCC would be very much in violation of the standard. And
produce absolutely useless code, as well. What's the point of
having pointers in a language if you wouldn't even be able to
perform basic operations on them?

Regards,
Robert

Alf P. Steinbach · Feb 14, 2010

* Richard Heathfield:

Thread's subject line: Knowing the implementation, are all undefined
behaviours become implementation-defined behaviours?

No. For example, consider a stack exploit on gets(). There are systems
on which the behaviour could be absolutely anything at all, depending on
user input!6\b$10be5c39no carrier

Cheers,

- Alf

Bo Persson · Feb 14, 2010

Robert said:
Sorry to interrupt, but since when is checking a pointer value
for 0 the same as dereferencing it? Checking a pointer treats the
pointer itself as a value, and comparison against 0 is one of
the few things that are _guaranteed_ to work with a pointer
value. So if GCC really would remove a check of the form

if(!pointer)
    do_something(*pointer);

or even

if(pointer == 0)
    throw NullPointerException;

then GCC would be very much in violation of the standard. And
produce absolutely useless code, as well. What's the point of
having pointers in a language if you wouldn't even be able to
perform basic operations on them?

Yes, but there are cases where the compiler can determine that the
pointer is ALWAYS null or not-null, and remove code that would execute
otherwise. For example:

*pointer = 42;
if(pointer == 0)
    throw NullPointerException;

is known never to throw the exception!

Bo Persson
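
The same two orderings can be sketched in plain C, with return codes in place of the exception (the function names are hypothetical). In the first, the store comes before the test, so the compiler may treat the pointer as non-null from that point on. In the second, the test guards the store and cannot be removed.

```c
#include <stddef.h>

/* Risky ordering: the store comes first, so a compiler may conclude that
 * p is non-null from here on and delete the test below. Calling this
 * with a null pointer is undefined behaviour. */
static int store_then_check(int *p)
{
    *p = 42;
    if (p == NULL)      /* may be optimised away */
        return -1;
    return 0;
}

/* Safe ordering: the test guards the store, so it cannot be removed. */
static int check_then_store(int *p)
{
    if (p == NULL)
        return -1;
    *p = 42;
    return 0;
}
```

With a valid pointer both functions behave identically. They can only differ for a null argument, which is exactly the case the standard leaves undefined for the first one.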

Ersek, Laszlo · Feb 14, 2010

Checking a pointer treats the
pointer itself as a value, and comparison against 0 is one of
the few things that are _guaranteed_ to work with a pointer
value.

No, evaluating an invalid pointer is undefined behavior.

{
    void *p;

    p = malloc(1);
    free(p);
    p;        /* UB */
    !p;       /* UB */
    0 != p;   /* UB */
}

See the C99 Rationale 6.3.2.3 Pointers for an informative (not
normative) description.

I believe that in this paragraph:

----v----
Regardless how an invalid pointer is created, any use of it yields
undefined behavior. Even assignment, comparison with a null pointer
constant, or comparison with itself, might on some systems result in an
exception.
----^----

"any use" denotes "any evaluation", and "assignment" means "assignment
FROM the invalid pointer". I'm fairly sure the following is valid:

{
    int *ip;

    ip = malloc(sizeof *ip);
    free(ip);
    sizeof ip;
    sizeof *ip;
    ip = 0;
    ip;
    !ip;
    0 != ip;
}

Cheers,
lacos
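
One common way to stay on the safe side of this rule is to overwrite the pointer right after freeing it, so that nothing later evaluates the stale value. A minimal sketch, assuming a hypothetical helper named release_int:

```c
#include <stdlib.h>

/* Hypothetical helper: frees the int that *pp points to and then stores
 * a null pointer in *pp. Overwriting the variable does not evaluate its
 * old, now indeterminate, value. */
static void release_int(int **pp)
{
    free(*pp);      /* *pp is still valid (or null) at this point */
    *pp = NULL;     /* assignment TO the pointer, not FROM it */
}
```

After the call, comparisons such as `ip == NULL` are well-defined again, and calling the helper a second time is harmless because free(NULL) does nothing.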

Richard Tobin · Feb 14, 2010

Malcolm McLean said:
Dereferencing a wild pointer may have different effects,
particularly on a multi-tasking machine where exact pointer values
vary from run to run.

It's not a general characteristic of multi-tasking systems that
pointer values vary from run to run. Virtual memory has traditionally
been used to give all instances of a program indistinguishable address
spaces, and addresses will usually be the same.

Recently for security reasons some operating systems have started to
deliberately randomise the locations of, for example, shared
libraries, so pointers are now more likely to vary. (Fortunately this
can usually be disabled for debugging.)

-- Richard

Robert Fendt · Feb 14, 2010

Yes, but there are cases where the compiler can determine that the
pointer is ALWAYS null or not-null, and remove code that would execute
otherwise. For example:

*pointer = 42;
if(pointer == 0)
    throw NullPointerException;

is known never to throw the exception!

Yes, that's static optimisation. Nothing wrong with that.
However, the posting I was commenting on explicitly described
something different:

This would mean nothing else than the compiler removing
nullpointer checks solely on the grounds that a nullpointer
cannot be de-referenced legally. So the compiler would see a
pointer dereference, and decide "then it can't be null anyway,
since it's used later". And that's just bull, sorry.

Yes, if there's an unconditional pointer dereference and
_afterwards_ a check for null, the compiler could take this as a
hint that said pointer has been checked for null before the first
dereference and thus remove the superfluous check. So if you had
something like this:

MyType& obj = *pointer;
if (!pointer)
    throw NullPointerException;

Since the dereference happens _before_ the check, the program
has already entered the domain of undefined behaviour, and the
check is moot (even if one has not 'used' the object reference
in any other way). If the author of the previous posting meant
that, then I agree (though I have doubts whether GCC really
optimises this aggressively). But in that case his comment was at
least not very clear.

Regards,
Robert

Ben Bacarisse · Feb 14, 2010

Robert Fendt said:
Yes, if there's an unconditional pointer dereference and
_afterwards_ a check for null, the compiler could take this as a
hint that said pointer has been checked for null before the first
dereference and thus remove the superfluous check. So if you had
something like this:

MyType& obj = *pointer;
if (!pointer)
    throw NullPointerException;

Since the dereference happens _before_ the check, the program
has already entered the domain of undefined behaviour, and the
check is moot (even if one has not 'used' the object reference
in any other way). If the author of the previous posting meant
that, then I agree (though I have doubts whether GCC really
optimises this aggressively).

gcc does exactly that (with certain options). I think this is the
nature of a recent Linux kernel bug: http://lkml.org/lkml/2009/7/6/19

The pointer use was ever so slightly less obvious but it led gcc to
conclude that the following test could be removed.

Given the cross-post, I should say that I have no idea if gcc does
this for the exact case you cite (which is C++) but I wanted to point
out that similar things are done.

<snip>

Robert Fendt · Feb 14, 2010

gcc does exactly that (with certain options). I think this is the
nature of a recent Linux kernel bug: http://lkml.org/lkml/2009/7/6/19

It certainly looks that way. That's a nasty bugger to spot.

Given the cross-post, I should say that I have no idea if gcc does
this for the exact case you cite (which is C++) but I wanted to point
out that similar things are done.

Yes, I did not notice this whole thread had been crossposted to
comp.lang.c; a more appropriate example would then have been a
sizeof(*pointer) or something. Since sizeof in that case relies
only on static type information, one could assume it should work
whether the pointer is null or not. But the dereference itself
already makes the whole program ill-formed (in case of a
nullpointer).

Regards,
Robert

James Kanze · Feb 14, 2010

And thus spake Ben Bacarisse
Sun, 14 Feb 2010 13:41:23 +0000:

It certainly looks that way. That's a nasty bugger to spot.

Either the pointer can be null, or it cannot. If it can be
null, the first unit test which tests it with null should cause
a crash. If it cannot, then the test that g++ would have
removed is superfluous, and removing it shouldn't change
anything.

There are many other cases of undefined behavior which do affect
optimizations, however. Consider an expression like: f((*p)++,
(*q)++). Given this, the compiler "knows" that p and q do not
reference the same memory (since if they did, it would be
undefined behavior), which means that in other code in the
function, the compiler might have cached *p, and knows that it
doesn't have to update or purge its cached value if there is a
write through *q.
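
In C the two argument expressions are unsequenced relative to each other, which is why aliasing p and q makes that call undefined. As a sketch of the contrast (bump_both is a hypothetical name), putting each increment in its own statement introduces a sequence point between them, and the aliased case becomes well-defined:

```c
/* Hypothetical helper: each post-increment is its own full expression,
 * so there is a sequence point between them and the result is
 * well-defined even when p and q point to the same int. */
static int bump_both(int *p, int *q)
{
    int a = (*p)++;
    int b = (*q)++;
    return a + b;
}
```

The price is the one described above in reverse: in this form the compiler has to allow for p and q pointing at the same object, so it cannot keep *p cached across the write through *q.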

Yes, I did not notice this whole thread had been crossposted
to comp.lang.c; a more appropriate example would then have
been a sizeof(*pointer) or something. Since sizeof in that
case relies only on static type information, one could assume
it should work whether the pointer is null or not. But the
dereference itself already makes the whole program ill-formed
(in case of a nullpointer).

Dereferencing a null pointer is only undefined behavior if the
code is actually executed. Something like sizeof(
f(*(MyType*)0) ) is perfectly legal, and widely used in some
template idioms (although I can't think of a reasonable use for
it in C).

Malcolm McLean · Feb 14, 2010

Dereferencing a null pointer is only undefined behavior if the
code is actually executed. Something like sizeof(
f(*(MyType*)0) ) is perfectly legal, and widely used in some
template idioms (although I can't think of a reasonable use for
it in C).

Nulls are dereferenced to produce the offsetof macro hack in C.

Ersek, Laszlo · Feb 14, 2010

Nulls are dereferenced to produce the offsetof macro hack in C.

No, they are not.

I guess you mean something like this:

#define offsetof(type, member_designator) \
    ((size_t)&((type *)0)->member_designator)

Let's deal first with the conversion of the final pointer to size_t:

C99 6.3.2.3 Pointers, p6: "Any pointer type may be converted to an
integer type. Except as previously specified, the result is
implementation-defined. If the result cannot be represented in the
integer type, the behavior is undefined. The result need not be in the
range of values of any integer type."

Then wrt. dereferencing the null pointer:

C99 6.6 Constant expressions, p9: "An address constant is a null
pointer, [...]; it shall be created explicitly using the unary &
operator or an integer constant cast to pointer type, or [...]. The
[...] member-access . and -> operators, the address & and indirection *
unary operators, and pointer casts may be used in the creation of an
address constant, but the value of an object shall not be accessed by
use of these operators."

Perhaps this is relevant too:

C99 6.5.3.2 Address and indirection operators, p3: "[...] If the operand
is the result of a unary * operator, neither that operator nor the &
operator is evaluated and the result is as if both were omitted, except
that the constraints on the operators still apply and the result is not
an lvalue. [...]"

Cheers,
lacos
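
In portable code there is no need to write the cast by hand, since <stddef.h> already supplies offsetof and leaves the details to the implementation. A minimal sketch, using a hypothetical struct record for illustration:

```c
#include <stddef.h>

/* Hypothetical record type, used only for illustration. */
struct record {
    char tag;
    long value;
};

/* Returns the byte offset of `value` within struct record, using the
 * standard macro rather than a hand-written null-pointer cast. */
static size_t value_offset(void)
{
    return offsetof(struct record, value);
}
```

The exact offset of `value` depends on the implementation's padding, so the only portable guarantees are that the first member sits at offset 0 and that later members follow it.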

Ben Bacarisse · Feb 14, 2010

James Kanze said:
On Feb 14, 1:54 pm, Robert Fendt wrote:

Dereferencing a null pointer is only undefined behavior if the
code is actually executed. Something like sizeof(
f(*(MyType*)0) ) is perfectly legal, and widely used in some
template idioms (although I can't think of a reasonable use for
it in C).

For a non-literal null, it is quite common:

new_ptr = realloc(old_ptr, new_length * sizeof *new_ptr);

will work regardless of the state of new_ptr (null, well-defined or
indeterminate).

[I know you know this: I am simply illustrating the point with a
common idiom.]
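
A smaller sketch of the same point, with a literal null pointer (element_size is a hypothetical name). The operand of sizeof is only inspected for its type, so nothing is dereferenced at run time:

```c
#include <stddef.h>

/* The operand of sizeof is not evaluated here (its type is not a
 * variable length array), so the null pointer is never dereferenced. */
static size_t element_size(void)
{
    int *p = NULL;
    return sizeof *p;
}
```

The one exception in C99 is an operand whose type is a variable length array, which is evaluated.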

Seebs · Feb 14, 2010

Sorry to interrupt, but since when is checking a pointer value
for 0 the same as dereferencing it?

It's not.

But if you dereference a pointer at some point, a check against it can
be omitted. If, that is, that dereference can happen without the check.

So imagine something like:

ptr = get_ptr();

while (ptr != 0) {
    /* blah blah blah */
    ptr = get_ptr();
    x = *ptr;
}

gcc might turn the while into an if followed by an infinite loop, because
it *knows* that ptr can't become null during the loop, because if it did,
that would have invoked undefined behavior.

And there are contexts where you can actually dereference a null and not
get a crash, which means that some hunks of kernel code can become infinite
loops unexpectedly with modern gcc. Until the kernel is fixed, which I
believe it has been.

-s
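
For comparison, a sketch of a loop in which every pointer is tested before it is dereferenced, so the compiler has no grounds to drop the test. The struct source type and this get_ptr are hypothetical stand-ins, not the kernel code in question:

```c
#include <stddef.h>

/* Hypothetical stand-in for get_ptr(): walks an array of pointers that
 * ends with a null pointer. */
struct source {
    int *const *next;
};

static int *get_ptr(struct source *s)
{
    return *s->next++;
}

/* Sums the values, testing every pointer before it is dereferenced. */
static int sum_all(struct source *s)
{
    int total = 0;
    for (int *ptr = get_ptr(s); ptr != NULL; ptr = get_ptr(s))
        total += *ptr;
    return total;
}
```

Here the dereference happens only on the path where the comparison has already succeeded, so the loop ends at the first null pointer whatever the optimisation level.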

Seebs · Feb 14, 2010

Either the pointer can be null, or it cannot. If it can be
null, the first unit test which tests it with null should cause
a crash. If it cannot, then the test that g++ would have
removed is superfluous, and removing it shouldn't change
anything.

Unless you're in a context where dereferencing null exhibits the undefined
behavior of giving you access to a block of memory.

Dereferencing a null pointer is only undefined behavior if the
code is actually executed. Something like sizeof(
f(*(MyType*)0) ) is perfectly legal, and widely used in some
template idioms (although I can't think of a reasonable use for
it in C).

Implementation of offsetof(), too, although that's not exactly safe.

-s

Ben Bacarisse · Feb 14, 2010

Malcolm McLean said:
Nulls are dereferenced to produce the offsetof macro hack in C.

Then I would say that it is not an example of what James was talking
about. In his C++ example, no null pointer is dereferenced.

Obviously there is a terminology issue here in that you might want to
say that sizeof *(int *)0 is a dereference of a null pointer because,
structurally, it applies * to such a pointer; but I would rather
reserve the word dereference for an /evaluated/ application of * (or []
or ->). I'd go so far as to say that any other use is wrong.

Thad Smith · Feb 14, 2010

Michael said:
Dereferencing a NULL pointer is undefined behaviour,

Actually, dereferencing a null pointer _results in_ behavior undefined by
Standard C.

In answer to your subject line question "Knowing the implementation, are all
undefined behaviours become implementation-defined behaviours?", no.

In Standard C "implementation-defined behavior" means that the implementation
documents the behavior. Even if the behavior is consistent for a particular
implementation, it may not be documented.
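
For contrast, a short sketch of two things that Standard C does class as implementation-defined, which the implementation must therefore document (the function names are hypothetical):

```c
#include <limits.h>

/* Implementation-defined: the standard requires the implementation to
 * document the result of right-shifting a negative value. The answer
 * may differ between implementations, but it is not undefined. */
static int shift_negative(void)
{
    return -8 >> 1;
}

/* Also implementation-defined, and published in <limits.h>. */
static int bits_per_char(void)
{
    return CHAR_BIT;
}
```

A program may rely on these results for one particular implementation by reading its documentation. No such document is required for the null dereference or the signed overflow discussed in this thread.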
