The undefinedness of a common expression.

Ersek, Laszlo

sfuerst said:
It is not me contending that undefined behaviour is invoked. That
answer seemed to be the consensus of the thread in 2002. Its start
is at:
http://groups.google.com/group/comp.std.c/browse_thread/thread/cffd61637f5520d/5ce672676f5ab8ef
and it is a particularly interesting read. The pointer-chain example
is introduced about halfway along.

I've paged through all 329 messages and searched for the string "*a".
I've also searched the current thread. Since I haven't found what I was
looking for, I'd like to add a reformulation of the original

a[a[0]] = 1;

statement. I apologize if it's trivial.

C89 6.3.2.1 Array subscripting

"The definition of the subscript operator [] is that E1[E2] is identical
to (*(E1+(E2)))."

C99 6.5.2.1 Array subscripting

"The definition of the subscript operator [] is that E1[E2] is identical
to (*((E1)+(E2)))."

Going with the latter, the statement can be rewritten as

(*((a)+((*((a)+(0)))))) = 1;

which can be simplified to

*(a + *a) = 1;

I'm sorry if this doesn't add anything to the discussion; my hope is
that it would.

lacos
 
Kaz Kylheku

Kaz Kylheku said:
Hello, back in 2002 there was a long discussion in these newsgroups
about the undefinedness of the expression:

a[a[0]] = 1; when a[0] begins with the value 0.
The general opinion was that the above invokes undefined behaviour due

General or not, that is not a particularly well-informed opinion.
<snip more about sequencing>

I don't disagree with your arguments about sequencing, logic, and
causality, but I don't see how you can dismiss the conclusion above so
lightly -- particularly without any reference to the text that, in my
view, renders it undefined.

The oft-quoted 6.5 paragraph 2 reads:

"Between the previous and next sequence point an object shall have
its stored value modified at most once by the evaluation of an
expression. Furthermore, the prior value shall be read only to
determine the value to be stored." [footnote numbers removed]

This text can be parsed as:

[ Furthermore, the prior value shall be read only ] to
determine the value to be stored."

Indeed, before the store, the prior value shall not be written, but only read,
and this is of course necessary for determining the value to be stored.

See, people are just reading it wrong. There is no document defect to see
here, people; move along.

So this sentence does not unambiguously grant implementors a license
to gratuitously break code with cunning diagnostics (at least in
a mode that claims to be conforming).
 
Ben Bacarisse

Kaz Kylheku said:
Kaz Kylheku said:
Hello, back in 2002 there was a long discussion in these newsgroups
about the undefinedness of the expression:

a[a[0]] = 1; when a[0] begins with the value 0.
The general opinion was that the above invokes undefined behaviour due

General or not, that is not a particularly well-informed opinion.
<snip more about sequencing>

I don't disagree with your arguments about sequencing, logic, and
causality, but I don't see how you can dismiss the conclusion above so
lightly -- particularly without any reference to the text that, in my
view, renders it undefined.

The oft-quoted 6.5 paragraph 2 reads:

"Between the previous and next sequence point an object shall have
its stored value modified at most once by the evaluation of an
expression. Furthermore, the prior value shall be read only to
determine the value to be stored." [footnote numbers removed]

This text can be parsed as:

[ Furthermore, the prior value shall be read only ] to
determine the value to be stored."

Are you being serious? It seems unlikely. In case you are, I have
these arguments:

It is true that "only" can either follow or precede the thing it
limits, but not, I think, when followed by "to". The OED uses "only
to" as an example of only preceding the thing it limits (1899 Literary
Guide 1 Oct. 146/2 "Certain doctrines were imparted only to
initiates"). Your interpretation is more naturally written "the prior
value shall only be read to determine the value to be stored".

Your parse (if I have it right) is that to determine the value to be
stored, only reading of the prior value is permitted. What actions on
the prior value is your interpretation intended to prohibit? Would it
not prevent the value from being doubled to determine the value to
be stored (i = i*2;)?

In your reading of it, does the sentence have any purpose? I.e. what
kinds of expression would be defined were it not for that extra
sentence?
Indeed, before the store, the prior value shall not be written, but only read,
and this is of course necessary for determining the value to be
stored.

Your phrase "the store" is a sleight of hand. Which is "the store"
before which the prior value can not be written in:

x = i++;

and why would the standard wish to say which store comes first? The
prior value of i may very well be written (to x) before the store to
i. If you intended to limit this remark to expressions with only one
object being modified, then you are saying that "the prior value can't
be written before it is written".
See, people are just reading it wrong. There is no document defect to see
here, people; move along.

If you are right, there is a defect because the second example in
footnote 73 is then wrong. a[i++] = i; is fine by your reading, is it
not?
 
Wojtek Lerch

Kaz Kylheku said:
This text can be parsed as:

[ Furthermore, the prior value shall be read only ] to
determine the value to be stored."

I'm sorry, English is not my first language and it's not obvious to me what
you meant to say there -- do you mean that the "only" does not apply to
"determine" (i.e. only to determine the new value, but not for any other
purpose), but to "read" (i.e. only read, but not written anywhere, added to
anything, or processed in any other way)? This wouldn't seem to make a lot
of sense to me, so perhaps that's not what you meant?
 
Kaz Kylheku

Kaz Kylheku said:
Hello, back in 2002 there was a long discussion in these newsgroups
about the undefinedness of the expression:

a[a[0]] = 1; when a[0] begins with the value 0.
<snip discussion of sub-expression sequencing>
The general opinion was that the above invokes undefined behaviour due

General or not, that is not a particularly well-informed opinion.
<snip more about sequencing>

I don't disagree with your arguments about sequencing, logic, and
causality, but I don't see how you can dismiss the conclusion above so
lightly -- particularly without any reference to the text that, in my
view, renders it undefined.

The oft-quoted 6.5 paragraph 2 reads:

"Between the previous and next sequence point an object shall have
its stored value modified at most once by the evaluation of an
expression. Furthermore, the prior value shall be read only to
determine the value to be stored." [footnote numbers removed]

This text can be parsed as:

[ Furthermore, the prior value shall be read only ] to
determine the value to be stored."

Are you being serious? It seems unlikely. In case you are, I have
these arguments:

Yep.

And note that the parse that many people are assuming rules out
the assignment operator completely. Code such as

i = 1;

is undefined. Here i is modified, but it is also read, not for
computing the value to be stored. The value of an assignment expression
is that of the lvalue, after the assignment, you see. So the lvalue
is read by this expression to fetch this value. That ``after
the assignment'' bit appears like it imposes an order, but it's
not a sequence point. So the expression statement i = 1; contravenes the
naive parsing of paragraph 2.
limits, but not, I think, when followed by "to". The OED uses "only
to" as an example of only preceding the thing it limits (1899 Literary
Guide 1 Oct. 146/2 "Certain doctrines were imparted only to
initiates").

Here ``to initiates'' is a very different kind of clause from ``to
determine ...'', because the ``determine ...'' part is a full sentence
in its own right. Also, ``read'' and ``impart'' are semantically
different. The relationship between ``doctrine'' and ``impart''
is limited. What else can you do with a doctrine and an initiate,
besides impart? Why ``only impart''? Doctrines were only imparted on the
initiates (but not also tattooed on their foreheads?). Nah; such
tattooing is subsumed under extended semantics of imparting, right?

Read is different. In computing, we even have the phrase ``read-only'';
there is a relationship between read and only, which means that
writing is excluded.
Your interpretation is more naturally written "the prior
value shall only be read to determine the value to be stored".

True. All kinds of things are more naturally written than some of the
long-winded gobbledygook in the ISO C standard.
Your parse (if I have it right) is that to determine the value to be
stored, only reading of the prior value is permitted.

Permitted, but of course not required.
What actions on
the prior value is your interpretation intended to prohibit? Would it
not prevent the value from being doubled to determine the value to
be stored (i = i*2;)?

Doubling does not destroy the value; it produces a new value which is
twice the previous one, in the absence of overflow.

Clearly the ``value'' here refers to manipulation of the object: the
stored value. The term value has multiple meanings; there is the stored
value in an object which can be read, or the value of an expression.

The value of the expression i*2 is no longer the stored value in the
object i. All that i*2 does to the stored value is read it.
In your reading of it, does the sentence have any purpose?

I suspect that it doesn't, and that it's not alone in not having one.
I.e. what
kinds of expression would be defined were it not for that extra
sentence?

None, but clearly, if a value modified by some store is written before
that store, that is not a good thing. This may not be; it may be read
only. :)
Your phrase "the store" is a sleight of hand.

Not intended.
Which is "the store"
before which the prior value can not be written in:

x = i++;

The prior value of x cannot be written prior to the assignment x =,
and the prior value of i cannot be written prior to the increment i++.

Each modified object has a prior value. Each modified object is modified
only once, hence it cannot be modified prior to that one and only
modification.
and why would the standard wish to say which store comes first? The

It doesn't. The order in which the side effects to x and i happen
is not specified.
prior value of i may very well be written (to x) before the store to
i. If you intended to limit this remark to expressions with only one
object being modified, then you are saying that "the prior value can't

Paragraph 2 is not limited to expressions with just one object modified,
but it's clear that for some object that is modified, the prior value
and store refer to the same object.
be written before it is written".

Yes, precisely. Isn't that clear? If a value is written before it is
written, then it's modified twice. So this, uh, reinforces the first
sentence of the same paragraph. That's it!
See, people are just reading it wrong. There is no document defect to see
here, people; move along.

If you are right, there is a defect because the second example in
footnote 73 is then wrong. a[i++] = i; is fine by your reading, is it
not?

No, it's not fine.

This expression is ruled out by the unspecified order of evaluation of
subexpressions (except for the noted operators) and by
the unspecified order of side effect completion.
The major connective of this expression is the = operator, whose
constituents are a[i++] and i, which, being subexpressions of an
unsequenced operator, may be evaluated in either order. Moreover, the
completion of the side effect emanating from the one subexpression is
not required as a dependency for the computation of the other. Clearly,
this is ambiguous. The i++ effect may complete before the i is accessed,
after, or could be in progress while i is accessed. You will have
an actual portability problem with actual compilers if you write
this expression.

Now we could add some superfluous text to the standard to try to capture
this idea, but it's not necessary; the undefinedness follows straight
from paragraph 3.

The defect is that the example in the footnote is irrelevant to
the paragraph to which the footnote is attached. That paragraph
does not render it undefined; however, the next paragraph does.

The footnote could be moved to the next paragraph and reworded
to say that it pertains to ``this and the preceding
paragraph''.
 
Michael Foukarakis

I don't see how the quoted statement can invoke undefined behaviour,

C99 6.5p2: between two sequence points (here the whole assignment
expression), an assigned-to object (here, the "final" /next/ pointer)
shall have its value read only to determine the value to be stored
(here, clearly restricted to 'something', perhaps casted.)

And one may argue that in the above expression, the value is _also_ read
as part of the evaluation of the "intermediary" /next/ pointer, in order
to determine the object to be assigned.
It is not at all equivalent with the a[a[0]] = foo; statement, either.

I believe Dag-Erling addressed this one.

Mr. Rentsch clarified my point quite well on that one, I believe.
OTOH, if you still believe those two to be semantically identical I
cannot continue arguing.
There are no side effects with the use of the arrow operator,

I fail to see any side effect with the [] operator either. Can you
elaborate?

I can elaborate on the OP's original point, if you'd like. We just
solved the [] operator issue a few posts ago, correct?
There is nothing obscure, just that we are in the grey areas of the
legal terms of the Standard.

Well, perhaps we are referring to different versions of the Standard
(sic). My understanding is that even under N1124 the statement
wouldn't invoke UB - Mr. Kylheku clarified on that.
The thread is cross-posted to comp.lang.c and comp.std.c; in the latter
group (which is the one I read), I believe it is really on-topic.
Please take my post in this context, it might help you to understand my
point.

I'd start a thread on trolling but that'd be off-topic, wouldn't it. :-P

@OP: I understand what the discussion's consensus was. Leaving aside
the fact that it was back in 2002 and the C language is still evolving
(thankfully), perhaps I'm interpreting the standard wrong, or I'm too
familiar with gcc features and extensions - it wouldn't be the first
time. However I've yet to see an argument that will convince me I'm
wrong through proper wording (maybe I'm just asking too much) or
perhaps a PoC program.
 
frank

In comp.std.c sfuerst said:
Hello, back in 2002 there was a long discussion in these newsgroups
about the undefinedness of the expression:

a[a[0]] = 1; when a[0] begins with the value 0.

The general opinion was that the above invokes undefined behaviour due
to the fact that there is no sequence point in the expression, and
that the value of a[0] is used in a "value computation" unsequenced
with respect to the side-effect of the assignment operator.

I believe this is corrected by the new sequencing language in the C1X
draft (N1425 is the latest version).

Did they add a new variety of sequence point?
 
Nick Keighley

(e-mail address removed)-berlin.de (Stefan Ram) writes:

restored snip
****
struct node { int data; struct node *next; };
struct node *node;
node = calloc(1, sizeof *node); ****

there are two changes.
1. compression of three statements into one
2. replacement of calloc() with malloc()

I can't see any point to 1. apart from increasing the obscurity of the
code.
Check the standard, your system documentation, or any good book about C
(e.g. K&R 2) for an explanation of the difference between malloc() and
calloc().

so to repeat the question, does this have any benefits? We can all
read.
 
Richard Tobin

[ Furthermore, the prior value shall be read only ] to
determine the value to be stored."
I'm sorry, English is not my first language and it's not obvious to me what
you meant to say there -- do you mean that the "only" does not apply to
"determine" (i.e. only to determine the new value, but not for any other
purpose), but to "read" (i.e. only read, but not written anywhere, added to
anything, or processed in any other way)? This wouldn't seem to make a lot
of sense to me, so perhaps that's not what you meant?

It would not be a reasonable reading in English.

-- Richard
 
Michael Foukarakis

so to repeat the question, does this have any benefits? We can all
read.

None. Of course, counting the initialization of the allocated buffer
to 0 as a gain is debatable.
 
Ike Naar

And note that the parse that many people are assuming rules out
the assignment operator completely. Code such as

i = 1;

is undefined. Here i is modified, but it is also read, not for
computing the value to be stored. The value of an assignment expression
is that of the lvalue, after the assignment, you see. So the lvalue
is read by this expression to fetch this value. That ``after
the assignment'' bit appears like it imposes an order, but it's
not a sequence point. So the expression statement i = 1; contravenes the
naive parsing of paragraph 2.

It is not necessary to read i in order to obtain the assigned value.

tmp = 1; /* compute right hand side */
i = tmp; /* assign the computed value to i */
/* the assigned value is now available in tmp */
 
Antoine Leca

Michael said:
It is not at all equivalent with the a[a[0]] = foo; statement, either.

Mr. Rentsch clarified my point quite well on that one, I believe.

Can you point out to me the relevant post? (assuming it is not
<which is not
cross-posted but that I spotted; and was written 4 hours after mine.)

We just solved the [] operator issue a few posts ago, correct?

Sorry, I cannot follow your reasoning.

As I wrote, I am reading this thread from comp.std.c, and I believe you
are reading it exclusively from the comp.lang.c context. Also see below.

Well, perhaps we are referring to different versions of the Standard
(sic).

:) Sorry for the pedantry, but please bear with me: ISO rules want
the officially approved standards to be spelled (in English)
"International Standards", and are quite strict about the capitalization;
I just follow that convention; please remember once again that I am reading
you from the comp.std.c context.

Other than that, no, we are not referring to different versions,
I quoted C99, which is the one you ought to refer to yourself. Having this
point clearer in C1x will not render the text of C99 more clear, much
the contrary: it highlights the fact that the area is grey in C90/C99.
My understanding is that even under N1124 the statement
wouldn't invoke UB - Mr. Kylheku clarified on that.

I do not read <as a clarification,
rather the contrary: it sparked more discussion (at least here.)

However I've yet to see an argument that will convince me I'm
wrong through proper wording (maybe I'm just asking too much) or
perhaps a PoC program.

I believe there is a basic misunderstanding here.
There is one thing which I believe is applicable here: a Standard (note
the capital) has to be written in common language and have a reasonable
size, and sometimes this cannot afford to cover all the possible cases;
in such a case, necessarily the Standard has to be conservative, and
underspecifies, or if you prefer, it should "outlaw" the cases which are
not possible to spell out clearly.
Note this is clearly different from Roman Law.

It is the job of the people writing the next version of the standard
(lower case here ;-) ) to further refine the text, in order to cover
more potential corner cases and if possible include them in the accepted
and well-defined behaviours; you might notice that revisions of the
Standards (and laws in general) are increasing in size, and this is one
of the main reasons; also this is a main topic of comp.std.c, to iron
out those corner cases and to improve the words if possible (whether it
succeeds is a debate I shan't enter.) If you are concerned about the
words (as you might imply above), then comp.std.c is the correct group
to discuss it; please join.

On the other hand, comp.lang.c should focus on the practical situations;
as Kaz correctly pointed out in the first part of his post, there are no
situations (we presently know of) where those expressions can be
mishandled; as such, there are no "proofs" to be presented. So the fact
the current words of the relevant Standards may, or may not, qualify
them as "undefined behaviour" is pretty immaterial to C programmers,
since every compiler and every compiled program will end with the
expected behaviour, or, if you prefer, "/no undefined behaviour can be/
/invoked/." As such, I believe it was an initial mistake to keep the
thread cross-posted, since it led to a biased discussion; it should
rather have been redirected (use followup-to:); as you correctly pointed
out yesterday, the whole issue looks like a troll on comp.lang.c.

In fact, I felt obliged to keep yesterday's post cross-posted, since
apparently you were asking for details while writing and I felt it was
important to try to answer it, but I was not seeing
any sign of you reading comp.std.c. Maybe I was wrong and it was written
tongue-in-cheek; I do not know your style sufficiently. In such a case,
sorry for the overlong explanations (and I reiterate my invitation
to debate these issues in comp.std.c!)


Antoine
 
Nobody

It reminds me of a simple loop detection algorithm (use two pointers;
for every iteration, one pointer advances one step and the other two
steps; if there is a loop, the "fast one" will eventually catch up with
the "slow one" from behind)

What's the advantage of that over the simpler algorithm: one pointer
advances one step and the other zero steps?
 
Richard Tobin

It reminds me of a simple loop detection algorithm (use two pointers;
for every iteration, one pointer advances one step and the other two
steps; if there is a loop, the "fast one" will eventually catch up with
the "slow one" from behind)
What's the advantage of that over the simpler algorithm: one pointer
advances one step and the other zero steps?

That only works if the initial node is in the loop.

-- Richard
 
Wojtek Lerch

Richard Tobin said:
[ Furthermore, the prior value shall be read only ] to
determine the value to be stored."
I'm sorry, English is not my first language and it's not obvious to me what
you meant to say there -- do you mean that the "only" does not apply to
"determine" (i.e. only to determine the new value, but not for any other
purpose), but to "read" (i.e. only read, but not written anywhere, added to
anything, or processed in any other way)? This wouldn't seem to make a lot
of sense to me, so perhaps that's not what you meant?

It would not be a reasonable reading in English.

Thanks. But would it be a reasonable interpretation in the context of the C
standard? Clearly, the purpose of that sentence is to forbid things that
don't fit into the "only" -- this interpretation sounds as if the prior
value can be read from the object but then doing anything else with it is
forbidden. Except, it seems, when the goal is *not* to determine the value
to be stored. Is my parsing of the English correct here too?
 
jameskuyper

Kaz said:
Kaz Kylheku said:
Hello, back in 2002 there was a long discussion in these newsgroups
about the undefinedness of the expression:

a[a[0]] = 1; when a[0] begins with the value 0.
The general opinion was that the above invokes undefined behaviour due

General or not, that is not a particularly well-informed opinion.
<snip more about sequencing>

I don't disagree with your arguments about sequencing, logic, and
causality, but I don't see how you can dismiss the conclusion above so
lightly -- particularly without any reference to the text that, in my
view, renders it undefined.

The oft-quoted 6.5 paragraph 2 reads:

"Between the previous and next sequence point an object shall have
its stored value modified at most once by the evaluation of an
expression. Furthermore, the prior value shall be read only to
determine the value to be stored." [footnote numbers removed]

This text can be parsed as:

[ Furthermore, the prior value shall be read only ] to
determine the value to be stored."

Indeed, before the store, the prior value shall not be written, but only read,
and this is of course necessary for determining the value to be stored.

See, people are just reading it wrong. There is no document defect to see
here, people; move along.

As a native speaker of English, I don't see that as a valid parse of
that sentence. It's not even an ambiguous parse; the phrase "only to"
must apply forward, not backward. It restricts the ways in which the
value read may be used. It doesn't say that the stored value may
only be read, and that no other operations may be performed on it.
Writing the value was already prohibited by the immediately preceding
sentence "Between the previous and next sequence point an object shall
have its stored value modified at most once by the evaluation of an
expression.", so there's no need to repeat that prohibition. It was
the committee's intent that all other operations that might be
performed on the value (multiplication, addition, division, use as a
subscript, etc.) are permitted - so long as the end result of those
operations is to determine the value to be written. Because the new
value must be determined before the new value is written, code meeting
this requirement must have the read occur before the write. That was
the primary purpose for imposing this requirement.

A read of the prior value that determines where the value is to be
written must also necessarily precede the write, and the existing
wording doesn't cover that. That is one of the reasons why better
wording has been proposed for the next version of the standard.
So this sentence does not unambiguously grant implementors a license
to gratuitously break code with cunning diagnostics (at least in
a mode that claims to be conforming).

The clause is poorly worded and overly-restrictive, and clearer
wording is planned for the next revision, but the 2002 discussion
described possible implementations (such as ones where reads are
destructive at the hardware level) where such breakage would not be
gratuitous, but a perfectly natural result of an implementation
optimized for such hardware.
 
jameskuyper

Nick said:
restored snip
****

there are two changes.
1. compression of three statements into one
2. replacement of calloc() with malloc()

I can't see any point to 1. apart from increasing the obscurity of the
code.


so to repeat the question, does this have any benefits? We can all
read.

Yes, but reading that documentation should be sufficient to answer the
question. The key difference between calloc() and malloc() is that the
memory is zeroed out. Whether or not this has any benefits depends
entirely upon whether or not the memory needs zeroing. If it does,
this difference counts as a benefit; if it does not, calling calloc()
rather than malloc() is a waste of time.

In this particular case, since the calloc() call was followed
immediately (without error checking!) by code that ensured that the
'next' field of the structure was initialized, the only effect of the
change to calloc() was to ensure that the 'data' field was also
zeroed. Personally, in that particular case, I'd prefer to write

node->data = 0;

but I think he was concerned about the more general case where there
was more than one field to be zeroed. As long as none of the fields to
be zeroed have a floating point or pointer type, using calloc()
strikes me as a reasonable way to initialize a large structure at the
same time as allocating memory for it.
 
