Shao Miller
Hello Readers,
Please respond with the _highest_ levels of pedantry you can muster
up.
This e-mail concerns how a C translator/compiler should handle
the expression:
*(char *)0
Consider the following program:
int main(void) {
    (void)*(char *)0;
    return 0;
}
The question is: Does the above program imply undefined behaviour?
References here from the C standard draft with filename 'n1256.pdf'.
Looking at the second line of the program:
(A) "Expression and null statements", 6.8.3, Semantics 2:
"The expression in an expression statement is evaluated as a void
expression for its side effects."
The footnote 134 adds, "Such as assignments, and function calls which
have side effects."
This appears to describe the second line of the program pretty nicely.
(B) "void", 6.3.2.2, point 1:
"The (nonexistent) value of a void expression...shall not be used in
any way... If an expression of any other type is evaluated as a void
expression, its value or designator is discarded. (A void expression
is evaluated for its side effects.)"
Note that this doesn't read "_only_ evaluated for its side effects."
However, (A) doesn't read "_only_", either, but one can get that
impression due to the explicit mentioning of "side effects" in both
(A) and (B).
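To make (A) and (B) concrete, here is a minimal sketch of an expression whose value is discarded while its side effect survives. The names 'call_count', 'bump' and 'count_after_void_call' are hypothetical and exist only for this illustration; nothing here comes from the standard's text.

```c
/* Hypothetical counter, used only to make a side effect observable. */
int call_count = 0;

/* Hypothetical helper: modifies an object (a side effect) and
   also returns a value. */
int bump(void) {
    call_count += 1;
    return call_count;
}

/* Evaluates 'bump()' as a void expression, then reports the counter. */
int count_after_void_call(void) {
    (void)bump();   /* the returned value is discarded, the increment is not */
    return call_count;
}
```

Called once from a fresh program, 'count_after_void_call()' returns 1: the value of 'bump()' is thrown away, but the modification of 'call_count' still happens. Contrast this with '(void)*(char *)0;', where it is hard to name any side effect at all.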
(C) "Address and indirection operators", 6.5.3.2, Semantics 4:
"...if [the operand] points to an object, the result is an lvalue
designating the object."
(D) "Address and indirection operators", 6.5.3.2, Semantics 4:
"If the operand has type 'pointer to type', the result has type
'type'."
(E) "Address and indirection operators", 6.5.3.2, Semantics 4:
"...If an invalid value has been assigned to the pointer, the
behavior...is undefined."
The footnote 87 adds, "Among the invalid values for dereferencing a
pointer...are a null pointer..." This footnote is referenced from
(E).
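For comparison, the same footnote 87 opens by saying that '&*E' is equivalent to 'E' even if E is a null pointer, and 6.5.3.2, Semantics 3 states that in this form neither the '&' nor the '*' is evaluated. So there is at least one spelling in which '*(char *)0' appears in a program with defined behaviour. A small sketch, where the function name 'address_of_deref_null' is my own hypothetical choice:

```c
#include <stddef.h>

/* Hypothetical helper: '&' applied to the result of unary '*'.
   Per 6.5.3.2, Semantics 3, neither operator is evaluated and the
   result is as if both were omitted, so this returns a null pointer. */
char *address_of_deref_null(void) {
    return &*(char *)0;
}
```

This does not answer the question about the bare '(void)*(char *)0;', since there the '*' is not the operand of '&'. It only shows that the text already distinguishes writing the expression from evaluating it.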
(C), (D) and (E) concern the unary '*' operator, and are where I
perceive a challenge in interpretation. This operator is followed by
a cast-expression, so such an expression would make up the operand, if
I'm not mistaken. The particular cast-expression in line two of the
program is '(char *)0'. _Is_this_an_assigned_value_?
_Is_"assigned"_meant_there_purposefully_or_not_?
(F) "object", 3.14, point 1:
"region of data storage in the execution environment, the contents of
which can represent values"
(G) "value", 3.17, point 1:
"precise meaning of the contents of an object when interpreted as
having a specific type"
By (G), is '(char *)0' a value? Maybe not by (G), but there are other
parts in the text which read as though expressions can have values,
without needing any objects. The "integer constant expression with
the value 0" in (H) below is such an example. Perhaps it _may_be_ a
value iff _used_ for its value?
(H) "Pointers", 6.3.2.3, points 3 and 4:
"An integer constant expression with the value 0, or such an
expression cast to type void *, is called a null pointer constant. If
a null pointer constant is converted to a pointer type, the resulting
pointer, called a null pointer, is guaranteed to compare unequal to a
pointer to any object or function."
"Conversion of a null pointer to another pointer type yields a null
pointer of that type. Any two null pointers shall compare equal."
By (H), it would appear that '(char *)0' is a pointer and a null
pointer. Also, it cannot point to an object. Thus, this operand does
not point to an object for (C), and we must forget (C)'s application
to our case.
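The guarantees quoted in (H) can be checked directly, since comparing pointers never dereferences them. A minimal sketch follows; the three function names are hypothetical and chosen only for this illustration.

```c
#include <stddef.h>

/* The constant 0 converted to 'char *' is a null pointer, and any
   two null pointers compare equal. */
int cast_zero_equals_null(void) {
    return (char *)0 == NULL;
}

/* A null pointer compares unequal to a pointer to any object. */
int null_differs_from_object(void) {
    char c = 'x';
    return (char *)0 != &c;
}

/* Converting a null pointer to another pointer type yields a null
   pointer of that type. */
int converted_null_stays_null(void) {
    return (int *)(void *)(char *)0 == (int *)0;
}
```

Each function returns 1 on a conforming implementation. None of them applies unary '*', so none of them touches the question in (E).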
(I) "Cast operators", 6.5.4, Semantics 4:
"Preceding an expression by a parenthesized type name converts the
value of the expression to the named type. ..."
Another example where an expression _has_ a value. But the text reads
"value of the expression" to describe the _use_ of that particular
property of the expression. Similar to "...expression with the value
0" in (H).
The footnote 89 adds, "A cast does not yield an lvalue."
By (I), it would appear that '(char *)0' converts the value of '0' to
a 'char *' type. But is this value _assigned_to_a_pointer_ in (E)?
We do now know that the type for this operand is 'char *' for (D).
Thus the unary '*' operator should yield a result with type 'char', by
(D).
(J) "The sizeof operator", 6.5.3.4, Semantics 2:
"...The size is determined from the type of the operand. ...the
operand is not evaluated."
In this, we see that a particular property "type" for the operand is
used. "...the operand is not evaluated" suggests that there is at
least one case in the C language where an expression can yield a
result with a type while avoiding that expression's evaluation.
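A short sketch of (J), assuming nothing beyond the quoted text and the fact that 'sizeof (char)' is 1 by definition. The function names are hypothetical. Some compilers warn about the unevaluated 'n++'; that warning is exactly the point being illustrated.

```c
#include <stddef.h>

/* Only the type of the operand ('char') is used; the operand is not
   a variable length array, so it is not evaluated. */
size_t size_of_deref_null(void) {
    return sizeof *(char *)0;
}

/* Shows that an operand with a side effect is left unevaluated. */
int counter_after_sizeof(void) {
    int n = 0;
    size_t unused = sizeof(n++);   /* 'n++' is never executed */
    (void)unused;
    return n;                      /* still 0 */
}
```

Here 'size_of_deref_null()' returns 1 and 'counter_after_sizeof()' returns 0, so '*(char *)0' has a usable type even in a context where it is never evaluated.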
But compare (J) with (A) and (B), which do describe evaluation, albeit
with "nonexistent" values. (A) and (B) both mention side effects.
(K) "Program execution", 5.1.2.3, point 2:
"Accessing a volatile object, modifying an object, modifying a file,
or calling a function that does any of those operations are all side
effects, which are changes in the state of the execution environment.
Evaluation of an expression may produce side effects."
From (K), '*(char *)0' does not access a volatile object, nor does it
modify an object (remember that there's no assignment!), nor does it
modify a file, nor call a function doing any of those operations. It
does not appear to have any "side effects" at all. Iff '(char *)0'
can itself be considered an object (beyond being a pointer, a null
pointer, a cast expression, and having type 'char *'), then we _still_
don't have any side effects. For example: Would it be a volatile
object? Are we modifying an object?
If we constrain (A) and (B) to mean "_only_ evaluated for any side
effects", then (K) suggests '*(char *)0' has no side effects. This
constraint is not explicitly in the text, however. One can ponder if
it is meant or not.
Now then, let us please consider how '*(char *)0' evaluates if we take
"...If an invalid value has been assigned to the pointer..." from (E)
_literally_. There is no assignment here. There is conversion of the
value of the expression '0' to a null pointer. Then we are applying
the '*' operator to that null pointer. This application yields a
result with type 'char'. According to (J), this
expression can even be an operand to 'sizeof', since it has a type.
There is no object and there is no value.
Is there undefined behaviour? Perhaps consider it in terms of
variables and constants: In '*(char *)0', everything is constant. In
'*(char *)x', x is variable. Could we suppose that "has been
assigned" from (E) is used there _intentionally_, specifically because
with constants, we have full knowledge at translation-time, but with
variables, we need objects and an execution environment? In other
words, is an implementation _allowed_ to attempt to dereference a null
pointer, knowing 100% full well at translation time that that's what
the expression _looks_ like? With variables, the execution of the
program might or might not dereference a null pointer, and that can be
trapped or not.
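The variable case can be sketched as follows. This is only an illustration of "might or might not dereference a null pointer" at execution time; the function 'read_or_default' and its 'fallback' parameter are hypothetical names of my own.

```c
#include <stddef.h>

/* Hypothetical helper: whether '*p' is evaluated depends on the value
   that 'p' holds at run time, which the translator cannot know in
   general. */
int read_or_default(const char *p, int fallback) {
    return p != NULL ? *p : fallback;
}
```

With 'read_or_default(NULL, -1)' the '*p' is never evaluated and the result is -1; with a pointer to a 'char' holding 'A' the result is 'A'. In '*(char *)0' there is no such run-time choice, which is what makes the constant case interesting.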
Consider the usual idea of '*x' as "object pointed-to by x" versus
splitting the idea into the more esoteric "result having a type,
possibly designating an object, and possibly having a value, depending
on properties of x".
What do you think? Thank you with sincerity for your time,
- Shao Miller