C Standard Regarding Null Pointer Dereferencing

Shao Miller

No, it impacts the definition of what happens.  There is no definition
given for what happens.

Okay, look at it this way:

        *(char *) 0

What is the defined result of evaluating this expression?  Show us the
explicit definition of what you get when you evaluate this.

If you can't, it's undefined.  Failure-to-define is sufficient to make
behavior undefined; even if you don't think there's an explicit statement that
it's undefined, unless you can provide the definition, it's still undefined.
Thank you for this feedback once again, Seebs.

*(char *)0

is an expression defined to evaluate to a result having type 'char'.
The operand for the unary '*' operator has type pointer-to-char, which
is why.

Now then, if the operand points to an object, the result is
furthermore an lvalue.

If the operand points to a function, the result is furthermore a
function designator.

Seems defined quite nicely to me. Do you see any error with this
reasoning? Does the text of 'n1256.pdf' stipulate that the result of
this expression is required to have anything _more_ than a type; that
it must additionally be at least one of an lvalue or a function
designator?
 
Seebs

*(char *)0
is an expression defined to evaluate to a result having type 'char'.
The operand for the unary '*' operator has type pointer-to-char, which
is why.

And what is that result?
Now then, if the operand points to an object, the result is
furthermore an lvalue.

It doesn't point to an object. But it does point to an object type.

The unary * operator denotes indirection. If the operand points to
a function, the result is a function designator; if it points to
an object, the result is an lvalue designating the object. If the
operand has type "pointer to type", the result has type "type". If
an invalid value has been assigned to the pointer, the behavior of
the unary * operator is undefined.

There are three ways we can consider this text. All three yield identical
conclusions.

METHOD #1:

"Has been assigned to" is clumsy wording, but it obviously includes any
possible case in which a pointer *has* an invalid value. A null pointer
is by definition invalid, because it doesn't point to an object. The
behavior is undefined.

METHOD #2:

Consider more closely this sentence:
If the operand points to a function, the result is a function
designator; if it points to an object, the result is an
lvalue designating the object.

This offers the sum total of definitions of the behavior of the unary-*
operator. Since the operand does not point to a function or an object, its
behavior is not defined by this sentence. The behavior is undefined.

METHOD #3:

Let's imagine that the "type" argument is meaningful, and that since the
operand has a *type* of a pointer-to-object, the result is "an lvalue
designating the object". Then let's see 6.3.2.1, paragraph 1:

An lvalue is an expression with an object type or an
incomplete type other than void; if an lvalue does not
designate an object when it is evaluated, the behavior is
undefined.

It does not designate an object, we evaluate it, therefore the behavior is
undefined.

What it comes down to is: Dereferencing null pointers yields undefined
behavior. We know this, the standard is adequately clear on it, and running
around ignoring parts of it at random or adding extra significance to
"has been assigned" does not change it. The indirection operator does not
have any defined behavior when applied to something which is not a pointer
to an object. The behavior is undefined. It is up to you whether you prefer
to think of this as being undefined because the lvalue does not designate
an object, or because * is not defined in its behavior when not given a
pointer to an object-or-function; either way, it's undefined.

We could doubtless improve the text with something like "if the pointer
does not point to an object or function, the behavior is undefined", but
that the text could be improved does not mean that there is any ambiguity
here. In some cases, you can assert confidently that the wording is
poor but that the meaning is clear, and this is one of those cases.

-s
 
Ben Bacarisse

Shao Miller said:
It says "if it [the operand] points to an object the result is...".  If
the pointer does not point to an object the behaviour is undefined by
omission.
But it also defines the result in terms of its type, based on the type
of the operand. In our situation, this is defined. The omission of
when the pointer does not point to an object only impacts the
definition of the result, _when_ that result is an _lvalue_. It
doesn't apply to functions, for example. The "has type" clause covers
all three situations:
1. The operand points to an object
2. The operand points to a function
3. The operand points to neither, but has a pointer type

No. You are playing games with the language. Two clauses of one
sentence (separated by ;) talk about the result. A separate sentence
talks about the type. These are not three cases but two attributes of
this form of expression.

C defines the type of many expression forms whether the result is
defined or not. << and >> expressions have the type of the promoted
left operand. In some cases the result (or behaviour) is undefined.
The fact that the type is known does not make all << and >> expressions
defined.
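
A minimal sketch of the distinction between "has a type" and "has a defined result", assuming a C99 or later compiler. sizeof does not evaluate a non-VLA operand, so it can report the size of an expression's type even when evaluating that expression would be undefined. The function names are hypothetical and chosen only for illustration.

```c
#include <stddef.h>

/* The operand of sizeof is not evaluated here, so no null pointer is
   ever dereferenced; only the type (char) of the expression is used. */
size_t size_of_null_deref(void)
{
    return sizeof *(char *)0;
}

/* The shift expression has the type of the promoted left operand (int),
   whatever the count is. Evaluating 1 << count with a count at or above
   the width of int would be undefined, but sizeof never evaluates it. */
size_t size_of_shift(int count)
{
    return sizeof (1 << count);
}
```

Calling size_of_shift(1000) is therefore harmless: the type is known, and the shift itself is never performed.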

What, in your opinion, is the result of the expression *(char *)0? If
you can't find words from that standard to explain what the result is
(not just its type) then it is undefined by omission.
Whole-heartedly agreed as clumsy, if and only if it's not explicitly
there for a good reason.

It's clear we disagree about that phrase too. Presumably you accept as
valid code like this:

int *ip;
{ int i = 42; ip = &i; }
*ip = 0;

because no invalid pointer has been assigned to ip -- the pointer has
merely become invalid without any assignment.
Its type is defined when the operand has a pointer type. It's
_further_ defined as an lvalue or a function designator, under certain
_additional_ circumstances.

Yes and, by omission, when neither circumstance applies nothing more can be
inferred about the result (from the standard). This is the definition of
undefined. That the expression has type is not in dispute (my original
intervention using sizeof *(char *)0 relies on the expression having a
type), but having a type does not mean that the expression has a defined result.
I don't understand your text about "a value for the result of [the]
application" not being "a requirement of the text" but I don't think I
need to.  *E is defined when E points to a function or an object and I
think your example fails both tests.
Again, what do you mean by '*E'? Do you mean "the value of '*E'" or
"the type of '*E'" or both or neither or more than both?

I mean the expression form (E was a kind of syntax place-holder) is
undefined. It often has a type, but its evaluation has no defined
result.
So when 'p' points to a structure and we apply '*' to it, you suggest
that evaluation entails the requirement for knowing the value of '*p'
altogether? If so, does that knowledge require fetching the value?

The result of *p is the entire object. I can't answer your first
question because I don't know what "knowing the value of *p altogether"
means (the problem word for me is "knowing"). As for the second, all
reasonable compilers will look to see what is actually used from *p
to avoid fetching any more than is needed.

*p; /* probably fetch nothing (volatile objects excepted) */
s = *p; /* probably fetch it all -- at least we know where to put it */
(*p).m; /* probably behaves like p->m (i.e. only m is fetched) */

<snip>
 
Shao Miller

And what is that result?
What is "what"? The result is defined to have a type. The result
_is_ a result. It _has_ a type.
It doesn't point to an object.  But it does point to an object type.
Agreed for '(char *)0'. Not so for '(void *)0'. I'm with you here,
though.
There are three ways we can consider this text.  All three yield identical
conclusions.

METHOD #1:

"Has been assigned to" is clumsy wording, but it obviously includes any
possible case in which a pointer *has* an invalid value.
Agreed on clumsy wording, if and only if it's not _intentional_. That
remains to be proven or at the very least, demonstrated. What makes
this obvious to you?
 A null pointer
is by definition invalid,
Invalid _what_? A null pointer is a valid value for assignment, is it
not? If you are specifically referring to the context of the
evaluation of the expression '*(char *)0', your claim just above
_requires_ us to accept your previous claim before that; namely, that
"has been assigned" is clumsy and that it includes the case where the
pointer _has_ an invalid value. Furthermore, the non-normative
footnote is the only clue we have as to the possibility of a null
pointer being such an invalid value.
because it doesn't point to an object.  The
behavior is undefined.
You have invented the requirement for the pointer operand to "point to
an object". Consider a function pointer. You have invented
supposedly undefined behaviour. In this Method #1, you have not
convinced me, I'm sorry to say. I was hopeful. It might even be of
no consequence to you whether I've been convinced or not. That's
fine.

Where I might say "invented" below, I additionally mean, "or have not
cited a reference which supports."
METHOD #2:

Consider more closely this sentence:
        If the operand points to a function, the result is a function
        designator; if it points to an object, the result is an
        lvalue designating the object.

This offers the sum total of definitions of the behavior of the unary-*
operator.
Considered. You have just invented the offering that this sentence is
the sum total of definitions of the behaviour for the operator. If
that were so, we could discard the statement regarding type. In fact,
why is that statement in there at all? If the sentence you reference
just above is the "sum total" you describe, why would there be any
reason to add a further definition for the result's type? I propose
that the evaluation result can thus have a type and not be one of an
lvalue or a function designator. I would be much more convinced of the
validity of this Method #2 if the referenced text either explained
that one of these possibilities is required, or some other portion of
the draft explained that sentences like these have such a requirement
implicitly. This Method #2 does not address the "has been assigned to
the pointer", which would make it even more convincing to me.
 Since the operand does not point to a function or an object, its
behavior is not defined by this sentence.  The behavior is undefined.
This claim requires acceptance of the disputed claim above whereby the
sentence is the "sum total" of definitions for the result of
evaluation. Acceptance of that claim means we can discard the
possibility of invalid values being assigned to the pointer. Thus,
the "has been assigned to" sentence could be discarded. So why is it
in there at all, along with the definition of the type?
METHOD #3:

Let's imagine that the "type" argument is meaningful, and that since the
operand has a *type* of a pointer-to-object, the result is "an lvalue
designating the object".
It does _not_ have type pointer-to-object. The expression '(char *)0'
has type pointer-to-char. 'char' can certainly _be_ the type _for_ an
object. '(char *)0' certainly _isn't_ a pointer-to-object, as it is a
null pointer, "guaranteed to compare unequal to a pointer to any
object or function" according to 6.3.2.3, point 3. The result is thus
_not_ "an lvalue designating the object."
Then let's see 6.3.2.1, paragraph 1:

        An lvalue is an expression with an object type or an
        incomplete type other than void; if an lvalue does not
        designate an object when it is evaluated, the behavior is
        undefined.

It does not designate an object, we evaluate it, therefore the behavior is
undefined.
This claim requires acceptance of the disputed claim above that the
result must be an lvalue because the type of the operand is pointer-to-
char. If the result is not an lvalue, evaluation of the result is not
evaluation of an lvalue and the above reference does not apply.
What it comes down to is:  Dereferencing null pointers yields undefined
behavior.  We know this, the standard is adequately clear on it,
We _don't_ know this. Some of us, possibly the majority, _believe_
it. According to some of your arguments, the standard is adequately
clear on it. Why then does Method #1 detail "clumsy wording"? That
would appear to make it inadequately clear. Do we discard Method #1
or do you instead mean, "so far, everyone I've ever known to interpret
null pointer indirection shares the interpretation I have."
and running
around ignoring parts of it at random
Which parts have been ignored? Your 6.3.2.1, point 1 has been
considered and responded to. If I have ignored a cited reference in
another responder's response, then I would be glad of it being brought
to my attention, so I can lay this matter to rest once and for all,
and forget about all of the opinions, inventions, and what appear to
be some plain-and-simple "No, you can't [but I cannot seem to put my
finger on exactly why]" responses that I am interpreting.

I'm not running around. I'm not dead-set in thinking that there
_is_no_ undefined behaviour. I found that there _is_ undefined
behaviour for evaluation of the expression '(void)*(cast *)0;'. Nobody
has shown it (UB for '*(char *)0') yet to a satisfactorily reasonable
degree without invoking "I don't believe the intended meaning is
congruent with your interpretation; the wording could be improved". I
have provided a few arguments regarding _no_need_ for the evaluated
result to imply undefined behaviour; only the remote possibility that
implementations with a desire to conform might need to be aware of or
revisit a couple of scenarios and treat the behaviour as defined
rather than undefined. I was hopeful that "obvious" meant that there
was a simple sequence of reasoning to follow based on the text of the
draft or of a standard of C. This hope remains, with the kind
intentions and assistance of responders such as yourself.
or adding extra significance to
"has been assigned" does not change it.
This is backwards. I am _not_deducting_ significance from "has been
assigned." You and other kind responders have been deducting. My
arguments treat the significance literally, do they not?
 The indirection operator does not
have any defined behavior when applied to something which is not a pointer
to an object.
Except that indirection through pointers to functions is defined, as is
the behaviour of yielding a result with a type. Thus this claim is
false. The claim does, however, certainly re-emphasize the
commonality of this argument throughout this thread.
The behavior is undefined.  It is up to you whether you prefer
to think of this as being undefined because the lvalue does not designate
an object, or because * is not defined in its behavior when not given a
pointer to an object-or-function; either way, it's undefined.
You require one of the previous claims to be true here. You suggest
that the definition of the result's type makes for an incomplete
definition for the result. I shall continue to choose "looks as
though it's defined," instead. This might be of no consequence to
you, but I do appreciate your attempts to provide evidence.
We could doubtless improve the text with something like "if the pointer
does not point to an object or function, the behavior is undefined", but
that the text could be improved does not mean that there is any ambiguity
here.
You have brought to light two points of ambiguity:
1. The semantic point including "has been assigned to" may only be
intended to mean "if the value of the pointer is an invalid value."
In which case, there still is no normative definition for what
constitutes an invalid value, though we have a hint from the footnote.
2. The semantic point's sentence regarding the result's type should
not be specified independently from the other two definitions of
_possible_ properties of the result, given certain circumstances.
 In some cases, you can assert confidently that the wording is
poor but that the meaning is clear, and this is one of those cases.
At this time, I cannot. You and some others have.

Thank you so much!
 
Ben Bacarisse

Shao Miller said:
What is "what"? The result is defined to have a type. The result
_is_ a result. It _has_ a type.

Oh, be serious!

6.5 Expressions

1 An expression is a sequence of operators and operands that specifies
computation of a value, or that designates an object or a function,
or that generates side effects, or that performs a combination
thereof.

What about the expression *(char *)0? Does it generate side-effects?
No. Does it designate an object or a function? No. It must therefore
specify the computation of a value. We are all certain of the type of
that value (char) but we don't yet know which of the many chars it is.

By the way, I am happy just to agree to disagree about this. I'll
continue to write C avoiding constructs like *(char *)0 and so will you!
 
Seebs

What is "what"? The result is defined to have a type. The result
_is_ a result. It _has_ a type.

But unless we know what the result is -- not just its type -- it is
NOT DEFINED.
Agreed for '(char *)0'. Not so for '(void *)0'. I'm with you here,
though.

Yes, so "*(void *) 0" is a constraint violation.
Agreed on clumsy wording, if and only if it's not _intentional_. That
remains to be proven or at the very least, demonstrated. What makes
this obvious to you?

Complete consistency across dozens of implementations and the last thirty
years of writing about C, reading about C, programming in C, and looking
at the code to implementations. I was on the committee. We have all, always,
agreed absolutely that dereferencing null pointers is clearly and
unambiguously undefined behavior.
Invalid _what_?

An invalid pointer -- as opposed to one which points to an object.
A null pointer is a valid value for assignment, is it
not? If you are specifically referring to the context of the
evaluation of the expression '*(char *)0', your claim just above
_requires_ us to accept your previous claim before that; namely, that
"has been assigned" is clumsy and that it includes the case where the
pointer _has_ an invalid value. Furthermore, the non-normative
footnote is the only clue we have as to the possibility of a null
pointer being such an invalid value.

No, it doesn't. It only requires that we understand that the standard
is consistent about referring to a pointer which does not definitely point
to an object as "invalid". (Invalid includes both null pointers and pointers
to objects whose lifetime is over.)
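
As a hedged illustration of the practical consequence, here is a small sketch in which the unary * operator is applied only after the pointer is known not to be null. The helper name and the fallback parameter are hypothetical. Note that a null check cannot detect the other kind of invalid pointer mentioned above (one whose object's lifetime has ended); that case has to be avoided by construction.

```c
#include <stddef.h>

/* Hypothetical helper: returns the first character of s as an
   unsigned char value, or fallback when s is a null pointer.
   The indirection *s is evaluated only when s is known to be non-null. */
int first_char_or(const char *s, int fallback)
{
    if (s == NULL) {
        return fallback;    /* *s is never evaluated on this path */
    }
    return (unsigned char)*s;
}
```

For example, first_char_or("abc", -1) yields 'a', while first_char_or(NULL, -1) yields -1 without ever dereferencing the null pointer.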
You have invented the requirement for the pointer operand to "point to
an object".

No, I haven't.

I have observed that there is no definition provided for the behavior of
indirection through a pointer which does not point to either an object or
a function.
It might even be of
no consequence to you whether I've been convinced or not. That's
fine.

At this point, I see nothing in any of your posts to suggest that you are
even sincere; you are acting precisely like a troll.
Where I might say "invented" below, I additionally mean, "or have not
cited a reference which supports."

Because obviously, accusing someone of lying (which is what "invented" means
in this context) is the clearest way to communicate that you didn't see or
understand a citation.
Considered. You have just invented the offering that this sentence is
the sum total of definitions of the behaviour for the operator.

No, I haven't.

First off, "invented" means "newly created", and that is precisely equivalent
to accusing me of lying about what the standard says. I see no real reason
to continue arguing with you at this point.

Secondly, that is the ENTIRE POINT of a standard. It is the sum total of
the definition. You have not offered or cited or suggested or hinted at
any explanation of what value should be yielded by dereferencing a null
pointer, *because there is none*. That means it's undefined. If it were
defined, we would know not only its type, but its value.
If that were so, we could discard the statement regarding type.

No, we couldn't, because the value and the type are two different things,
and it is in some cases possible to dispute which type an expression would
have even knowing its value.
It does _not_ have type pointer-to-object. The expression '(char *)0'
has type pointer-to-char.

Yes, and since char is an object type, pointer-to-char is a pointer-to-object
type, as opposed to a pointer to an incomplete type or a pointer to a function
type.
'char' can certainly _be_ the type _for_ an
object. '(char *)0' certainly _isn't_ a pointer-to-object, as it is a
null pointer, "guaranteed to compare unequal to a pointer to any
object or function" according to 6.3.2.3, point 3. The result is thus
_not_ "an lvalue designating the object."

And that means that the standard does not define the results of the
indirection.

What is not defined is undefined. QED.

And because you are either an idiot or an unbelievable jerk, I'm plonking
you. You can't possibly be this stupid, and there is no way I'm putting
up with your continued totally unsupported accusations that other people are
lying. The only way that could be unintentional would be if your English
skills were weak enough that it is necessarily trolling for you to be
arguing with people about what they think sentences in English mean.

Either you should stop disputing people's interpretations of English, or you
know it well enough that the accusations of dishonesty are clearly
intentional. Either way, we're done, and I sincerely hope I never have to
see anything you have to say again. Go away. You do not have an attitude
conducive to learning about C, or any other language, and with the way you've
treated people, I see no reason to believe you will ever acquire one.

-s
 
Keith Thompson

Shao Miller said:
What is "what"? The result is defined to have a type. The result
_is_ a result. It _has_ a type.

And a value. What is the value of the result? Nothing in the standard
defines what that value is, therefore the value is undefined.

[...]
Agreed on clumsy wording, if and only if it's not _intentional_. That
remains to be proven or at the very least, demonstrated. What makes
this obvious to you?

How could it be intentional? If it's intentional, it doesn't make any
sense.

Once again, here's the passage in question, 6.5.3.2p4:

The unary * operator denotes indirection. If the operand points
to a function, the result is a function designator; if it points
to an object, the result is an lvalue designating the object. If
the operand has type "pointer to type", the result has
type "type". If an invalid value has been assigned to
the pointer, the behavior of the unary * operator is undefined.

It is not possible to assign a value, invalid or otherwise, to "the
pointer" unless "the pointer" is a pointer object. The context does not
imply the existence of any pointer object to which the phrase "the
pointer" could refer. Even if the operand happens to be an lvalue,
it is no longer an lvalue (and thus no longer designates an object)
before the "*" operator is applied to it. 6.3.2.1p2:

Except when it is the operand of [list snipped] an lvalue that
does not have array type is converted to the value stored in
the designated object (and is no longer an lvalue).

The author implicitly assumed the existence of a pointer object that
does not exist.
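
A short sketch of this point, with hypothetical names: the operand of unary * is a pointer value, and that value need not come from any pointer object that could have "been assigned" anything. In both functions below the operand is not an lvalue at all.

```c
static int value = 7;

/* Returns a pointer value; no named pointer object is involved. */
static int *address_of_value(void)
{
    return &value;
}

/* The operand of * is the result of a function call, not an lvalue. */
int deref_function_result(void)
{
    return *address_of_value();
}

/* The operand of * is the result of pointer arithmetic, not an lvalue. */
int deref_arithmetic(void)
{
    int a[2] = {1, 2};
    return *(a + 1);
}
```

Both indirections are well defined because each operand points to a live object, even though there is no pointer object for the phrase "the pointer" to refer to.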

[big snip]
 
Keith Thompson

Ben Bacarisse said:
6.5 Expressions

1 An expression is a sequence of operators and operands that specifies
computation of a value, or that designates an object or a function,
or that generates side effects, or that performs a combination
thereof.

What about the expression *(char *)0? Does it generate side-effects?
No. Does it designate an object or a function? No. It must therefore
specify the computation of a value. We are all certain of the type of
that value (char) but we don't yet know which of the many chars it is.

(void)0 neither computes a value, nor designates an object or function,
nor generates side effects, or any combination thereof.

This is yet another reason the standard's definition of "expression" is
flawed. The other is that some expressions, for example 42, contain no
operators or operands.

It's not *fatally* flawed; I don't think it's led anyone to an incorrect
understanding of what "expression" means. But a more accurate
definition would refer to the syntax.
 
Morris Keesan

....
What about the expression *(char *)0? Does it generate side-effects?
No.

Can you point to anything in the standard which says that the expression
doesn't generate side-effects? Since evaluating that expression results
in undefined behavior, I would argue that whether or not there are side
effects is undefined.
 
Ben Bacarisse

Morris Keesan said:
On Fri, 23 Jul 2010 19:50:15 -0400, Ben Bacarisse


Can you point to anything in the standard which says that the expression
doesn't generate side-effects? Since evaluating that expression results
in undefined behavior, I would argue that whether or not there are side
effects is undefined.

Shao Miller does not believe that. I was taking him through the
consequences of that belief so that he would have to say what value the
expression has.
 
Shao Miller

No.  You are playing games with the language.  Two clauses of one
sentence (separated by ;) talk about the result.  A separate sentence
talks about the type.  These are not three cases but two attributes of
this form of expression.
I did not intend to play games with the language. If that has been
the case, then I sincerely apologize for doing so. Any inability of
my own to have implicitly understood this is nobody's challenge but
mine. Perhaps I have been confused surrounding "expressions" and
"results". The simplest way for me to accept your claim would be to
assert that "every expression must have a value." Since there is no
definition of a value for this scenario, that would quite simply lead
to an incompletely defined result, which I would be happy to call
"undefined behaviour" regarding the evaluation of the expression, or
even during execution.
C defines the type of many expression forms whether the result is
defined or not.
Here again, I was possibly missing the equivalence between "value" and
"result." Often interchangeable in everyday usage, I perhaps have
incorrectly assumed that the referenced C draft might have very
specific meanings for each and might distinguish them. In the
original post, we see a definition for "value" which could easily
contribute to such a misunderstanding.
 << and >> expressions have the type of the promoted
left operand.  In some cases the result (or behaviour) is undefined.
The fact that the type is known does not make all << and >> expressions
defined.
Your argument for tying both "type" and "value" together before
yielding a defined result is a very good one. By my previous
suggestions, "The type of the result is
that of the promoted left operand" for these two operators would again
yield a result with a type, but possibly no defined value. The text
for these operators does seem to cover all possibilities however,
signed, unsigned, UB, implementation-defined. The constraint for
"integer type" helps. You cannot have an integer type which is
neither signed nor unsigned. It would be nice if a constraint for the
unary '*' operator were that the pointer must either point to an
object or to a function. Perhaps some kind reader could introduce
such a constraint into a future standard. Thanks for this reference,
Ben.
What, in your opinion, is the result of the expression *(char *)0?  If
you can't find words from that standard to explain what the result is
(not just its type) then it is undefined by omission.
Here again I can only say "result with a type". Accepting that
evaluation of an expression shall yield a result with both type and
value means that I would have to say that '*(char *)0' is an
incompletely defined result, which I would happily call undefined
behaviour.
It's clear we disagree about that phrase too.  Presumably you accept as
valid code like this:

  int *ip;
  { int i = 42; ip = &i; }
  *ip = 0;
I accept this code as guaranteed to imply UB. If responsible for some
portion of development of a C implementation advertising conformance
and criticized by a stake-holder regarding diligence with regard to
this kind of code, I would be much more confident to be able to point
at the (albeit, non-normative) footnote which suggests that 'ip' _has_
been assigned a value which is "the address of an object after the end
of its lifetime". I would be less confident without.
because no invalid pointer has been assigned to ip -- the pointer has
merely become invalid without any assignment.
I disagreed above, meaning I believe we share a qualification for
"undefined behaviour" here.
Yes and, by omission, when neither circumstance applies nothing more can be
inferred about the result (from the standard).  This is the definition of
undefined.  That the expression has type is not in dispute (my original
intervention using sizeof *(char *)0 relies on the expression having a
type), but having a type does not mean that the expression has a defined result.
This possible misunderstanding of mine has been detailed twice
above. You appear to suggest that the semantics must define both a
value and a type to accomplish a defined result. Anything less is
undefined.
I mean the expression form (E was a kind of syntax place-holder) is
undefined.  It often has a type, but its evaluation has no defined
result.
Ok.


The result of *p is the entire object.
This troubles me just a bit, due to section 6.5, point 2. I would
worry that in:

*x = *x + 1;

That we have "read" the "prior value" twice, inappropriately, for each
evaluation of the unary '*' operator. I have not fully explored this
avenue of thought and do not intend it as an argument by any means.
Please feel free to discard.
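
For what it is worth, a small sketch of that statement, assuming the usual reading of 6.5 paragraph 2: the object is modified once between sequence points, and its prior value is read only to determine the value to be stored, so the statement is well defined when x points to a live int. The function name is hypothetical.

```c
/* Applies "*x = *x + 1" to a local object and returns the new value. */
int incremented(int start)
{
    int v = start;
    int *x = &v;

    /* One modification of v; the read of *x on the right-hand side is
       used only to compute the value being stored. */
    *x = *x + 1;

    return v;
}
```

Under that reading, incremented(41) returns 42. The concern in 6.5 paragraph 2 is aimed at forms such as i = i++, which modify the same object twice without an intervening sequence point.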
 I can't answer your first
question
That you have answered any of the questions at all is valuable for
anyone concerned about the subject.
because I don't know what "knowing the value of *p altogether"
means (the problem word for me is "knowing").  As for the second, all
reasonable compilers will look to see what is actually used from *p
to avoid fetching any more than is needed.
Agreed.


  *p; /* probably fetch nothing (volatile objects excepted) */
  s = *p; /* probably fetch it all -- at least we know where to put it */
  (*p).m; /* probably behaves like p->m (i.e. only m is fetched) */
Also agreed. This makes me curious about something like:

#include <stddef.h>

int main(void) {
    static int foo = 15;
    struct bar {
        char c[(size_t)&foo];
        int baz;
    };
    return (*(struct bar *)0).baz;
}

for the not-yet-accepted (disputed) interpretation that something like
moving a Turing machine's head to position 0 but then moving it by the
offset of 'baz' before reading or writing to a potential object might
be a reasonable thing to do in C. (The common response has been that
it's UB to try this.) Some implementations might allow for such
behaviour, but that's obviously not evidence of any sort that it's not
UB.
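
Where only the offset of a member is wanted, the standard offsetof macro from <stddef.h> provides it without any hand-written null pointer indirection in user code. The sketch below is an assumption-laden illustration, not the code from this post: the struct layout uses a fixed array size of 16 in place of the address-derived size, and the function name is hypothetical.

```c
#include <stddef.h>

/* Illustrative layout only: a fixed-size array followed by an int. */
struct bar {
    char c[16];
    int baz;
};

/* Yields the byte offset of baz within struct bar. */
size_t offset_of_baz(void)
{
    return offsetof(struct bar, baz);
}
```

The result is at least 16 here, since baz follows the 16-byte array; the exact value depends on the implementation's alignment requirements for int.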

My sense after your post here is that it would be very easy to let go
of any uncertainty regarding the un/defined behaviour of '*(char *)0'
and friends if we easily accept that a result must have a defined
value and a defined type. Thanks for that, Ben.
 
Shao Miller

Oh, be serious!
Please accept my apologies for not meeting your expectations for
serious discussion. I will try harder. :)
6.5 Expressions

  1 An expression is a sequence of operators and operands that specifies
    computation of a value, or that designates an object or a function,
    or that generates side effects, or that performs a combination
    thereof.
This fine reference is _extremely_ helpful in accepting that
evaluation of an expression (which we don't do for sizeof, for
example, but _as_defined_ for sizeof) implies both a type and a
value. Thank you, Ben!
What about the expression *(char *)0?  Does it generate side-effects?
No.  Does it designate an object or a function?  No.  It must therefore
specify the computation of a value.  We are all certain of the type of
that value (char) but we don't yet know which of the many chars it is.
Very convincing argument. Excellent.
By the way, I am happy just to agree to disagree about this.
I would be too, but perhaps that's not a requirement. You have kindly
chipped away here a good bit.
I'll
continue to write C avoiding constructs like *(char *)0 and so will you!
Absolutely. The only reason something like '*(void *)0;' might be
interesting to me if it did _not_ have undefined behaviour would be
for development of an implementation or for an easy "do-nothing"
preprocessor macro; but something more than just ';'.
 
S

Shao Miller

A thoroughly considered response.

It is clear that my discussion has agitated you. I did not intend to
imply to any audience that you have been lying to me with your
responses. To clarify, I do not believe you have lied to anyone in
your responses. Your experience and that of the other responders is
what I was hoping to benefit from regarding this subject matter.

I _sincerely_ apologize, Peter. I will try to keep all of your
discussion's valuable points in mind as the subject matter becomes
clearer. Please do not attribute ill intent where an explanation of
another sort will do. I help people with computer-related subjects
all day, every day, for years. Sometimes I want to ask, "How much are
they paying you to waste my time?!" If that's how you feel, I don't
wish to aggravate that feeling. I won't trouble you any more, with
good fortune.

Perhaps this could lighten the mood? Agitating you with your
evaluation of my void expression '(void)*(char *)0;' was an unintended
side-effect.
 
R

Rich Webb

But unless we know what the result is -- not just its type -- it is
NOT DEFINED.

If I can jump in here ...

As I understand it, undefined behavior can be but is not required to be
defined by an implementation. And so ...
Yes, so "*(void *) 0" is a constraint violation.

... is indeed a constraint violation but nonetheless is permitted to be
defined by e.g. an implementation targeted at embedded applications.

From the Rationale: Undefined behavior gives the implementer license not
to catch certain program errors that are difficult to diagnose. It also
identifies areas of possible conforming language extension: the
implementer may augment the language by providing a definition of the
officially undefined behavior.

Or am I missing the whole point here? (Wouldn't be the first time.)
 
S

Shao Miller

(void)0 neither computes a value, nor designates an object or function,
nor generates side effects, or any combination thereof.
Wait a minute, didn't post #56 have something about 'void's and
values? Here '0' has type and value. Casting to 'void' appears to
discard that value. Does that mean that result of evaluating
'(void)0' has a type but no value? Is that void expression a
legitimate expression statement? Thanks, the Other Keith.

So is the result of:

*(void *)0

required to have a value or could it just as well be a void
expression, possibly used in an expression statement? Is it somewhere
required that unary '*' should have defined type and value in the
result but that casting to void may not? "Cast operators" has
semantics that appear to define "converts the value of the expression
to the named type." So '(void)58' yields UB, right? Even before
considered as a void expression?
This is yet another reason the standard's definition of "expression" is
flawed.  The other is that some expressions, for example 42, contain no
operators or operands.
Agreed. And while we might know what to do and what not to do while
programming, development of an implementation might require a stricter
understanding.
It's not *fatally* flawed; I don't think it's led anyone to an incorrect
understanding of what "expression" means.  But a more accurate
definition would refer to the syntax.
Agreed, with the exception of the question above.
 
S

Shao Miller

And a value.  What is the value of the result?  Nothing in the standard
defines what that value is, therefore the value is undefined.
This appears to be the critical piece. A result shall have a defined
value and a defined type. Anything less is undefined behaviour. And
yet, a void expression describes a non-existent value for an
expression, albeit with a type of 'void'. We can get one of these
from casting to 'void', right? Even though a cast converts a value to
a named type?
How could it be intentional?  If it's intentional, it doesn't make any
sense.

Once again, here's the passage in question, 6.5.3.2p4:

    The unary * operator denotes indirection. If the operand points
    to a function, the result is a function designator; if it points
    to an object, the result is an lvalue designating the object. If
    the operand has type ‘‘pointer to type’’, the result has
    type ‘‘type’’. If an invalid value has been assigned to
    the pointer, the behavior of the unary * operator is undefined.

It is not possible to assign a value, invalid or otherwise, to "the
pointer" unless "the pointer" is a pointer object.  The context does not
imply the existence of any pointer object to which the phrase "the
pointer" could refer.  Even if the operand happens to be an lvalue,
it is no longer an lvalue (and thus no longer designates an object)
before the "*" operator is applied to it.  6.3.2.1p2:
There is no constraint that the operand is a pointer object. The
constraint is that the operand is a pointer. We know from an earlier
reference of yours that the operand is a value. Perhaps this "has
been assigned..." goes beyond "the operand" (it doesn't mention the
operand, unlike the other sentences), and means something more like
"if the value was that of a pointer object, and that object had been
assigned..." It's just plain fishy, at the very least.
... ... ...
The author implicitly assumed the existence of a pointer object that
does not exist.
Could very well be. Even seems to be a common interpretation amongst
discussants.
 
S

Shao Miller

... is indeed a constraint violation but nonetheless is permitted to be
defined by e.g. an implementation targeted at embedded applications.
And what constraint is that? There is one constraint for unary '*':
"The operand of the unary * operator shall have pointer type."
Constraints versus semantics. 5.1.2.3, point 3 (again): "In the
abstract machine, all expressions are evaluated as specified by the
semantics." As a matter of fact, 'sizeof' even makes use of the
semantics insofar as the result's type, even though the expression
operand is not evaluated!
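
The 'sizeof' point can be illustrated with a minimal sketch. The
function names below are hypothetical and not from the thread; the only
thing relied upon is that 'sizeof' does not evaluate an operand that is
not a variable length array, so only the type of '*(char *)0' is used.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: yields the size of the type of '*(char *)0'.
 * The operand of sizeof is not evaluated here, so no indirection
 * through the null pointer takes place; only the type (char) is used. */
static size_t size_of_deref_type(void)
{
    return sizeof *(char *)0;
}

/* Self-check: sizeof(char) is 1 by definition. */
static void check_sizeof(void)
{
    assert(size_of_deref_type() == 1);
    assert(size_of_deref_type() == sizeof(char));
}
```

Calling check_sizeof() from a test program should pass on any
conforming implementation, since nothing here depends on what
evaluating '*(char *)0' would do.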
From the Rationale: Undefined behavior gives the implementer license not
to catch certain program errors that are difficult to diagnose.  It also
identifies areas of possible conforming language extension:  the
implementer may augment the language by providing a definition of the
officially undefined behavior.

Or am I missing the whole point here? (Wouldn't be the first time.)
An enjoyable reference. Thanks for sharing it, Rich. The question of
the original post has been answered because its cast to 'void'
requires a value because its operand does not already have type void.
There are a few useful points floating around, from my perspective:
1. Does '*(void *)0' yield undefined behaviour?
2. The last sentence of 6.5.3.2, Semantics 4 seems to apply only to an
lvalue operand, but the constraint 2 does not require this. Something
there should change.
3. The definition for "expression" doesn't appear to apply to Keith's
"42" nor his "(void)0". Something there should change.
4. It is not entirely clear whether or not a result is required to
have both a type and a value.
5. Undefined behaviour is undesirable in a general sense.
 
B

Ben Bacarisse

Shao Miller said:
I did not intend to play games with the language. If that has been
the case, then I sincerely apologize for doing so. Any inability of
my own to have implicitly understood this is nobody's challenge but
mine.

OK, I accept that. You said:

| The omission of when the pointer does not point to an object only
| impacts the definition of the result, _when_ that result is an
| _lvalue_.

To the extent I can give that any meaning at all, it is at odds with the
plain words of the paragraph that is causing you so much trouble. This
made me suspect that you are looking for trouble -- deliberately trying
to misread the plain words to find a confusion. I am happy to be wrong
about that.

To be clear, the omission (the failure to specify a result) impacts the
result when the operand is neither a function pointer nor an object
pointer.

The simplest way for me to accept your claim would be to
assert that "every expression must have a value."

But that would be wrong, e.g. (void)(1+2).
Since there is no
definition of a value for this scenario, that would quite simply lead
to an incompletely defined result, which I would be happy to call
"undefined behaviour" regarding the evaluation of the expression, or
even during execution.

OK, undefined behaviour it is, then.
Here again, I was possibly missing the equivalence between "value" and
"result." Often interchangeable in everyday usage, I perhaps have
incorrectly assumed that the referenced C draft might have very
specific meanings for each and might distinguish them. In the
original post, we see a definition for "value" which could easily
contribute to such a misunderstanding.

I think you've not taken Tim's excellent advice to heart. If you treat
the standard as a set of formal definitions with rigid consequences (like a
piece of mathematics) you will find that almost no programs have any
meaning. 0 is not an expression (it lacks operators); 42 has no value
(there is no object to have a value as per the definition) and so on.
You have to read it with a slice of common sense. The meaning of its
terms is partly to be gleaned from absorbing how they are used. Look
at how "result" and "value" are used. "Result" is not defined so you
have to guess. What is "the result of an expression"? Most people
conclude that it is some notion of a quantity with an associated type.
Your argument for tying both "type" and "value" together before
yielding a defined result is a very good one. By my previous
suggestions, "The type of the result is
that of the promoted left operand" for these two operators would again
yield a result with a type, but possibly no defined value. The text
for these operators does seem to cover all possibilities however,
signed, unsigned, UB, implementation-defined.

It does not matter. If it only covered a few cases, those not
explicitly covered would be undefined. Knowing the type would not make
the result any less defined.
The constraint for
"integer type" helps. You cannot have an integer type which is
neither signed nor unsigned.

No, it does not help. Knowing that -1 << 512 is of type int does not
make it any less undefined.
It would be nice if a constraint for the
unary '*' operator were that the pointer must either point to an
object or to a function. Perhaps some kind reader could introduce
such a constraint into a future standard.

Such a constraint is not possible. Constraints must be diagnosed by the
implementation when the program is translated and the compiler can't
tell when the operand of * does not point to an object. Is this OK or
not:

int f(int *ip) { return *ip; }

?
Here again I can only say "result with a type". Accepting that
evaluation of an expression shall yield a result with both type and
value means that I would have to say that '*(char *)0' is an
incompletely defined result, which I would happily call undefined
behaviour.

We are agreed!

This possible misunderstanding of mine has been detailed twice
above. You appear to suggest that the semantics must define both a
value and a type to accomplish a defined result. Anything less is
undefined.

Yes, that is my view. If a "defined result" could be just a type, why
does the standard not say more about what one can do with these results?
Is *(char *)0 << *(char *)0 just a pure int result? An expression's
result is either defined or undefined -- just a type is not enough.

This troubles me just a bit, due to section 6.5, point 2. I would
worry that in:

*x = *x + 1;

That we have "read" the "prior value" twice, inappropriately, for each
evaluation of the unary '*' operator. I have not fully explored this
avenue of thought and do not intend it as an argument by any means.
Please feel free to discard.

You can read the prior value as often as you like. The limit is on
modifying the stored value more than once. And whilst I agree that it's
not entirely clear what constitutes a "read" of the value, most people
would say that it requires an lvalue to value conversion to be a read.
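
A minimal sketch of that point, with hypothetical function names: the
statement from the discussion evaluates '*x' twice, but the stored
value is modified only once, and the read on the right-hand side is
used only to determine the value to be stored.

```c
#include <assert.h>

/* '*x' appears twice, yet the object is modified exactly once between
 * sequence points, so this is well defined when x points to an int.
 * By contrast, something like 'i = i++ + 1' modifies i twice and is
 * undefined under 6.5p2. */
static void increment(int *x)
{
    *x = *x + 1;
}

/* Self-check for the sketch above. */
static void check_increment(void)
{
    int v = 1;
    increment(&v);
    assert(v == 2);
}
```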

  *p; /* probably fetch nothing (volatile objects excepted) */
  s = *p; /* probably fetch it all -- at least we know where to put it */
  (*p).m; /* probably behaves like p->m (i.e. only m is fetched) */
Also agreed. This makes me curious about something like:

int main(void) {
static int foo = 15;
struct bar {
char c[(size_t)&foo];
int baz;
};
return (*(struct bar *)0).baz;
}

That's a constraint violation. If your compiler does not complain about
it, get another one! The array size must be an integer constant
expression.
for the not-yet-accepted (disputed) interpretation that something like
moving a Turing machine's head to position 0 but then moving it by the
offset of 'baz' before reading or writing to a potential object might
be a reasonable thing to do in C. (The common response has been that
it's UB to try this.) Some implementations might allow for such
behaviour, but that's obviously not evidence of any sort that it's not
UB.

I don't follow this at all. The example is undefined because of the
constraint violation -- it need not even generate any executable code.
My sense after your post here is that it would be very easy to let go
of any uncertainty regarding the un/defined behaviour of '*(char *)0'
and friends if we easily accept that a result must have a defined
value and a defined type.

Just do it. Take the red pill.
 
B

Ben Bacarisse

Ben Bacarisse said:
Shao Miller <[email protected]> writes:
int main(void) {
static int foo = 15;
struct bar {
char c[(size_t)&foo];
int baz;
};
return (*(struct bar *)0).baz;
}

That's a constraint violation. If your compiler does not complain about
it, get another one! The array size must be an integer constant
expression.

Actually it is not a constraint violation but it certainly violates a
"shall" about integer constant expressions (6.6 p6). I am surprised
this is not a CV since I can't see any reason it can't be checked at
compile time, but it's not one.
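
For comparison, here is a sketch of the defined way to ask "where is
'baz' relative to the start of the struct" without applying unary '*'
to a null pointer. It assumes a variant of the struct with a constant
array size (15 is an arbitrary choice for this sketch) and uses the
standard 'offsetof' macro from <stddef.h>. The function names are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Variant of the struct from the thread, with an integer constant
 * expression as the array size so that the declaration is valid. */
struct bar {
    char c[15];
    int baz;
};

/* offsetof yields an integer constant expression of type size_t;
 * the program never dereferences a null pointer. */
static size_t baz_offset(void)
{
    return offsetof(struct bar, baz);
}

/* Self-check: baz follows the 15-byte array, possibly after padding,
 * and lies entirely within the struct. */
static void check_offset(void)
{
    assert(baz_offset() >= sizeof(char[15]));
    assert(baz_offset() + sizeof(int) <= sizeof(struct bar));
}
```

The exact offset depends on the implementation's alignment
requirements, which is why the check only bounds it.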

<snip>
 
S

Shao Miller

OK, I accept that.  You said:

| The omission of when the pointer does not point to an object only
| impacts the definition of the result, _when_ that result is an
| _lvalue_.

To the extent I can give that any meaning at all, it is at odds with the
plain words of the paragraph that is causing you so much trouble.  This
made me suspect that you are looking for trouble -- deliberately trying
to misread the plain words to find a confusion.  I am happy to be wrong
about that.
You _are_ wrong about "looking for trouble". Please continue to be
happy. :)
To be clear, the omission (the failure to specify a result) impacts the
result when the operand is neither a function pointer nor an object
pointer.
Ok. But I'd rather that even this was clearer.
1. There is a sentence which specifies a value for the result.
2. There is a sentence which specifies a type for the result.
3. If the sentence regarding the value does not apply, the sentence
regarding the type is _insufficient_ to define a whole result.
But that would be wrong, e.g. (void)(1+2).
Ok. But I'd rather that even this was clearer:
1. There is a sentence which specifies a value for the result.
2. There is a sentence which specifies a type for the result.
3. If the sentence regarding the value does not apply, the sentence
regarding the type is _sufficient_ to define a whole result.
OK, undefined behaviour it is, then.
Agreed conditional upon acceptance of at least one of:
1. "...has been assigned..." really means something more like "is an
invalid value"

OR:

2. Casting to 'void' and application of the unary '*' operator are
treated differently. Both may fail to define a value for the result
of an evaluation, but the cast is permitted as defined behaviour.
I think you've not taken Tim's excellent advice to heart.  If you treat
the standard as a set of formal definitions with rigid consequences (like a
piece of mathematics) you will find that almost no programs have any
meaning.  0 is not an expression (it lacks operators); 42 has no value
(there is no object to have a value as per the definition) and so on.
Each of these points feels like a blow, including any failure on my
part to treat the referenced draft as anything more than a guide to be
supplemented by popular consensus.
You have to read it with a slice of common sense.
"Common sense" meaning "popular interpretation" to me. Very well;
accepted.
 The meaning of its
terms is partly to be gleaned from absorbing how they are used.  Look
at how "result" and "value" are used.  "Result" is not defined so you
have to guess.  What is "the result of an expression"?  Most people
conclude that it is some notion of a quantity with an associated type.
If writing a translator, I might have a 'struct result' with a pointer
to a type and a pointer to a value. I might initialize these with
NULL each. If an "operator" for a 'struct result' demanded one of
these properties but it was not defined, I might diagnose undefined
behaviour. It simply seemed to me that there were circumstances in
which some code path for "evaluation" might not ever use one of the
properties, which would lead me to question the validity of diagnosing
as UB if one compares as NULL but there was no expectation for it to
be non-NULL... Such as a void expression, which appears to be more
limited than I thought (casts to void and functions returning void,
for example).
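
The 'struct result' idea above might be sketched roughly as follows.
Every name here is hypothetical and models only the paragraph above,
not any real translator. Note that this models the "consumer decides"
view, which the rest of the thread rejects for everything except
'sizeof'.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for a translator's notion of type and value. */
struct type  { const char *name; };
struct value { long bits; };

struct result {
    const struct type  *type;   /* NULL: no type has been defined  */
    const struct value *value;  /* NULL: no value has been defined */
};

enum status { DEFINED, UNDEFINED };

/* A consumer that needs both properties, e.g. an operand of '<<'. */
static enum status consume_value(const struct result *r)
{
    return (r->type != NULL && r->value != NULL) ? DEFINED : UNDEFINED;
}

/* A consumer that needs only the type, e.g. the operand of 'sizeof'. */
static enum status consume_type_only(const struct result *r)
{
    return r->type != NULL ? DEFINED : UNDEFINED;
}

/* Self-check modelling '*(char *)0': type known, no value specified. */
static void check_result(void)
{
    static const struct type char_type = { "char" };
    struct result deref_null = { &char_type, NULL };

    assert(consume_type_only(&deref_null) == DEFINED);
    assert(consume_value(&deref_null) == UNDEFINED);
}
```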
It does not matter.  If it only covered a few cases, those not
explicitly covered would be undefined.  Knowing the type would not make
the result any less defined.
Well...


No, it does not help.  Knowing that -1 << 512 is of type int does not
make it any less undefined.

Such a constraint is not possible.  Constraints must be diagnosed by the
implementation when the program is translated and the compiler can't
tell when the operand of * does not point to an object.
A constraint that: "except when the '*' operator is used as the
operand to the 'sizeof' operator, an expression evaluating to a null
pointer constant or to a null pointer constant cast to any pointer
type shall not be the operand," might do, mightn't it?
 Is this OK or
not:

  int f(int *ip) { return *ip; }

?
Yes. The function call assigns the value of an argument to the 'ip'
parameter. Passing in an invalid value would result in UB.
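
A small sketch of why this cannot be diagnosed at translation time:
whether '*ip' is valid depends entirely on what the caller passes. The
wrapper 'f_or_default' is a hypothetical addition, not something from
the thread, and it only guards against null; a non-null pointer can
still be invalid.

```c
#include <assert.h>
#include <stddef.h>

/* The function from the discussion: valid or not depending on the
 * argument supplied at run time. */
static int f(int *ip) { return *ip; }

/* Hypothetical defensive wrapper: applies unary '*' only when the
 * pointer is non-null, otherwise returns the caller's fallback. */
static int f_or_default(int *ip, int fallback)
{
    return ip != NULL ? f(ip) : fallback;
}

/* Self-check for the sketch above. */
static void check_f(void)
{
    int n = 42;
    assert(f(&n) == 42);                  /* points to an object */
    assert(f_or_default(&n, -1) == 42);
    assert(f_or_default(NULL, -1) == -1); /* '*' is never applied */
}
```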
We are agreed!

Yes, that is my view.  If a "defined result" could be just a type, why
does the standard not say more about what one can do with these results?
Is *(char *)0 << *(char *)0 just a pure int result?  An expression's
result is either defined or undefined -- just a type is not enough.
Well actually, it does explain what you can do with the results. I
had made earlier references to these. "Cast operators"' first
constraint says "Unless...the operand shall have scalar type". Its
first semantic point talks about "the value of the expression."
"Simple assignment" talks about "type" and "value" for the
"operands". That explicitness (along with void expressions) was part
of why I thought a result was not required to have both, against the
consensus
here. However it was the _consumers_ of the results that I was taking
to give meaning to constraint-valid and semantically valid
expressions. The consensus appears to be that the results are defined
or not, regardless of the consumers or their properties, except for
'sizeof' (which nobody has disputed).
You can read the prior value as often as you like.  The limit is on
modifying the stored value more than once.  And whilst I agree that it's
not entirely clear what constitutes a "read" of the value, most people
would say that it requires an lvalue to value conversion to be a read.
That saved me some investigation. Very much appreciated.
Just do it.  Take the red pill.
The tiny print of the brand name appears to read "DeFacto"; I think
that's an Italian company.

Thanks so much, Ben. You've been a great help.
 
