C Test Incorrectly Uses printf() - Please Confirm

Shao Miller

So, on systems without hardware memory protection, every pointer dereference
must include an explicit "is this pointer NULL" check?

On a different, but still UB note, how would you handle defining array
out-of-bounds references?
'NULL' is a null pointer constant and can be diagnosed at translation-
time. That is, termination-code could be produced during compilation.

Most likely you are referring to a null pointer, I'd guess. :) In
that case, I agree that without hardware support, you'd need software
support... Or you could find out what happens when reading the
apparent memory location (assuming a pointer was a simple memory
address). If it's address 0, maybe something will be there. If it's
address XXX, maybe something will be there. If it isn't, what will
the hardware do? Hardware-specific things, I'd guess. Maybe a
pointer value isn't strictly a memory address in some cases. :S

Bounds-checking... There's a thread about "Bounds Checking as
Undefined Behaviour?" Were you asking how a C Standard should define
bounds or how an implementation would, in the absence of a Standard
definition? I believe that we currently enjoy the latter, since the
bounds of recursive objects are fairly loosely defined (in C99, and in
my opinion). "Element of an array object," "one past", etc.
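
The "software support" case mentioned above can be sketched as an explicit test before every dereference. This is only an illustration, not how any particular implementation does it; the helper name `checked_read` and its return convention are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: reads *p only after an explicit null test.
 * Returns 1 and stores the value in *out on success, 0 if p is null. */
int checked_read(const int *p, int *out)
{
    assert(out != NULL);   /* the caller must supply somewhere to store the result */
    if (p == NULL) {
        return 0;          /* report failure instead of dereferencing */
    }
    *out = *p;
    return 1;
}
```

A caller would test the return value before using `*out`. The cost is one comparison and branch per access, which is the overhead the question is asking about.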
 
Shao Miller

     A classic example is the conditional ("ternary") operator,
which has a sequence point between the evaluation of the first
expression and the evaluation of whichever of the other two is
chosen:

        ( a /* SP here */ ? b : c )

Thus

        ( ++x ? x-- : -1 )

is well-defined (for suitable `x').  However,

        ( ++x ? x-- : -1 ) + ( ++x ? x-- : -1 )

is not!  No sequence point divides the two `++x' operations from
each other; also, no sequence point separates the two `x--' bits.
The S.P.'s along one branch do not separate its operations from
those of another branch.
I think Dr. Who did that once! ;)
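
For reference, the well-defined single-ternary case quoted above can be written as a small function. This is a sketch; the name `ternary_once` and the use of a pointer parameter are assumptions for the example, and "suitable x" here means a value that does not overflow when incremented.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper mirroring ( ++x ? x-- : -1 ) from the quoted text. */
int ternary_once(int *x)
{
    assert(x != NULL);
    /* There is a sequence point after the first operand of ?: is evaluated,
     * so the increment is complete before (*x)-- is evaluated. */
    return ++*x ? (*x)-- : -1;
}
```

Starting from 1, the increment gives 2, the condition is true, the result is 2 and the object ends up holding 1 again. Starting from -1, the increment gives 0, the condition is false and the result is -1.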

Is there anything preventing an implementation from producing
effectively multiple threads during the evaluation of each operand in:

f() + g()

? Can we expect that this is a violation of 6.5p2 if 'f' and 'g' use
a 'static' object and one of them modifies it (for example)?
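
To make the question concrete, here is a sketch with a shared 'static' object. The bodies of 'f' and 'g' and the names `sum_unspecified` and `sum_f_then_g` are assumptions for the example. It does not settle the 6.5p2 question; it only shows that the order of the two calls is unspecified and how to pin it down.

```c
#include <assert.h>

static int counter = 0;                /* the shared 'static' object */

int f(void) { return ++counter; }      /* modifies counter */
int g(void) { return counter * 10; }   /* only reads counter */

/* The order of the two calls is unspecified:
 * f first gives 1 + 10, g first gives 1 + 0. */
int sum_unspecified(void)
{
    counter = 0;
    return f() + g();
}

/* Separate full expressions fix the order: f, then g. */
int sum_f_then_g(void)
{
    counter = 0;
    int r = f();
    assert(r == 1);   /* counter was reset just above */
    r += g();
    return r;
}
```

Code that needs a particular result should use the second form rather than rely on what one compiler happens to do with the first.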
 
Willem

Willem wrote:
) Malcolm McLean wrote:
) )>
) )> What advantage is there in defining the dereference of NULL?
) )>
) ) A program that dereferences NULL might produce wrong but seemingly
) ) plausible results. Depending on what type of program it was, these
) ) could be very negative (for instance if you send someone a gas bill
) ) for 150 pounds when the actual amount is 100 pounds, you could find
) ) yourself on fraud charges).
)
) <snip>

Oh, and also a very real issue: On embedded systems, the implementation
could very well define valid results for using a null pointer. Perhaps
it's some hardware register.

If the C standard were to define it, those systems would suddenly be
non-conforming.


SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT
 
Shao Miller

[...]> I suppose it just seems counter-intuitive to me that function
arguments be treated the same as sub-expressions for the computation
of a single value, instead of as independent expressions not part of a
larger expression.

[...]

A function call is an expression (6.5.2.2).
Yes of course, and that is why I've conceded.

So how about:

int a = 1;
printf("%d", (int){ ++a }, (int){ a + 5 });

? There appear to be 3 full expressions in the second line, there.
Does 6.5p2 apply there, too?

Thanks.
 
Shao Miller

I think Dr. Who did that once! ;)

Is there anything preventing an implementation from producing
effectively multiple threads during the evaluation of each operand in:

f() + g()

?  Can we expect that this is a violation of 6.5p2 if 'f' and 'g' use
a 'static' object and one of them modifies it (for example)?
Oops. Please disregard. Larry Jones has already answered this in
regards to C1X.
 
Keith Thompson

Shao Miller said:
[...]> I suppose it just seems counter-intuitive to me that function
arguments be treated the same as sub-expressions for the computation
of a single value, instead of as independent expressions not part of a
larger expression.

[...]

A function call is an expression (6.5.2.2).
Yes of course, and that is why I've conceded.

So how about:

int a = 1;
printf("%d", (int){ ++a }, (int){ a + 5 });

? There appear to be 3 full expressions in the second line, there.
Does 6.5p2 apply there, too?

Hmm. (int){ ++a } and (int){ a + 5 } are compound literals.
The syntax of a compound literal is:

( type-name ) { initializer-list }
( type-name ) { initializer-list , }

Following the syntax of initializer-list, we see that ++a and a + 5 are
both initializers. 6.8p4 seems to say both that an initializer is
a full expression, and that these particular initializers are not.
Here's the full paragraph:

A _full expression_ is an expression that is not part of another
expression or of a declarator. Each of the following is a full
expression: an initializer; the expression in an expression
statement; the controlling expression of a selection statement
(if or switch); the controlling expression of a while or do
statement; each of the (optional) expressions of a for statement;
the (optional) expression in a return statement. The end of a
full expression is a sequence point.

The first sentence is the definition of the term "full expression".
By this definition, since ++a and a + 5 are part of larger
expressions, they clearly aren't full expressions. But the next
sentence says that an initializer is a full expression.

The wording is unchanged, or nearly so, from C90, which didn't
have compound literals; initializers appeared only in declarations.
I suggest that the authors neglected to update this paragraph when
compound literals were added, and that these initializers are not
(or should not be) considered to be full expressions; thus the
behavior is undefined.
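
Whatever the status of the compound-literal version, moving the two expressions into separate declarations avoids the question, since the initializer of a declaration is not part of a larger expression. A minimal sketch follows; the helper name `format_sequenced`, the use of snprintf() and the two-conversion format are assumptions made so the result can be checked.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: formats both values into buf, returns snprintf's result. */
int format_sequenced(char *buf, size_t size)
{
    assert(buf != NULL && size > 0);
    int a = 1;
    int first = ++a;      /* full expression; sequence point at its end */
    int second = a + 5;   /* reads a only after the increment is complete */
    return snprintf(buf, size, "%d %d", first, second);
}
```

With a large enough buffer this writes "2 7", because the increment is complete before `a + 5` is evaluated.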
 
Shao Miller

Shao Miller said:
[...]> I suppose it just seems counter-intuitive to me that function
arguments be treated the same as sub-expressions for the computation
of a single value, instead of as independent expressions not part of a
larger expression.
[...]
A function call is an expression (6.5.2.2).
Yes of course, and that is why I've conceded.
So how about:
int a = 1;
printf("%d", (int){ ++a }, (int){ a + 5 });
?  There appear to be 3 full expressions in the second line, there.
Does 6.5p2 apply there, too?

Hmm.  (int){ ++a } and (int){ a + 5 } are compound literals.
The syntax of a compound literal is:

    ( type-name ) { initializer-list }
    ( type-name ) { initializer-list , }

Following the syntax of initializer-list, we see that ++a and a + 5 are
both initializers.  6.8p4 seems to say both that an initializer is
a full expression, and that these particular initializers are not.
Here's the full paragraph:

    A _full expression_ is an expression that is not part of another
    expression or of a declarator.  Each of the following is a full
    expression: an initializer; the expression in an expression
    statement; the controlling expression of a selection statement
    (if or switch); the controlling expression of a while or do
    statement; each of the (optional) expressions of a for statement;
    the (optional) expression in a return statement. The end of a
    full expression is a sequence point.

The first sentence is the definition of the term "full expression".
By this definition, since ++a and a + 5 are part of larger
expressions, they clearly aren't full expressions.  But the next
sentence says that an initializer is a full expression.

The wording is unchanged, or nearly so, from C90, which didn't
have compound literals; initializers appeared only in declarations.
I suggest that the authors neglected to update this paragraph when
compound literals were added, and that these initializers are not
(or should not be) considered to be full expressions; thus the
behavior is undefined.
That makes complete sense. Essentially, "If an initializer appears
within an expression, it is not a full expression."
 
Shao Miller

Kenneth said:
Using your logic, you would say that this is not UB:

(void)*NULL;
Perhaps because 'NULL' is an implementation-defined null pointer
constant?[1]

C1X makes 'void' an incomplete object type[2]. In C99, 'void' is an
"incomplete type"[3], which is a peer category to "object type" and
"function type"[4]. If 'NULL' is '((void *)0)' (it could be simply '0',
AFAIK), then you needn't even cast to 'void', in your example.

*((void *)0);

The operand to unary '*' has type 'void *'. The result has type
'void'[5]. An expression with type 'void' is a "void expression."[6]
'((void *)0)' is evaluated for its side effects[6], much as it is in:

char *p = ((void *)0);

If you decide that the two "ifs"[7] about pointing to a function and
pointing to an object are requirements and the bit about "has type"[5]
is by itself insufficient to define the result, then you may say that:

*((void *)0);

is undefined behaviour. If you decide that the bit about "has type"[5]
is sufficient to define the result even if the two "ifs"[7] are false,
then why should the above not be a void expression[6]?

Isn't it odd that C99's unary '*' constraint[8] did not say "pointer to
object type or pointer to function type"? Was that to specifically
allow implementations to define their own behaviour for attempts to
"dereference" awkward things with incomplete types?

Even if we agree (sure) that '*((void *)0)' is defined as undefined, for
having an operand with a null pointer value, what about:

int i = 10;
void *vp = &i;
*vp;

Here, does 'vp' point to an object?[9]

It seems to me that if you're right (seems widely supported), then every
sane compiler in existence must diagnose this, but which particular
violation might it cite?

:)

[1] 'n1256.pdf': 7.17p3 and 6.3.2.3p3
[2] 'n1494.pdf': 6.2.5p1 and 6.2.5p19
[3] 'n1256.pdf': 6.2.5p19
[4] 'n1256.pdf': 6.2.5p1
[5] 'n1256.pdf': 6.5.3.2p4 "...If the operand has type..."
[6] 'n1256.pdf': 6.3.2.2p1 "...an expression that has type void..."
[7] 'n1256.pdf': 6.5.3.2p4 "...If the operand points to..."
[8] 'n1256.pdf': 6.5.3.2p2
[9] 'n1256.pdf': 6.3.2.3p1 and 6.3.2.3p7
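
For contrast with '*vp;', the uncontroversial way to get at the object is to convert the 'void *' back to the original pointer type first (6.3.2.3p1 guarantees the round trip compares equal). A small sketch; the function name `read_through_void` is an assumption for the example.

```c
#include <assert.h>

/* Hypothetical helper: reads an int through a void * round trip. */
int read_through_void(void)
{
    int i = 10;
    void *vp = &i;      /* int * converts to void * implicitly */
    int *ip = vp;       /* and back again */
    assert(ip == &i);   /* the converted pointer compares equal to the original */
    return *ip;         /* dereference the int *, not the void * */
}
```

This says nothing either way about whether '*vp;' itself is a void expression or undefined; it only shows the form that avoids the question.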
 
Shao Miller

Shao said:
Shao Miller said:
[...]> I suppose it just seems counter-intuitive to me that function
arguments be treated the same as sub-expressions for the computation
of a single value, instead of as independent expressions not part of a
larger expression.
[...]
A function call is an expression (6.5.2.2).
Yes of course, and that is why I've conceded.
So how about:
int a = 1;
printf("%d", (int){ ++a }, (int){ a + 5 });
? There appear to be 3 full expressions in the second line, there.
Does 6.5p2 apply there, too?
Hmm. (int){ ++a } and (int){ a + 5 } are compound literals.
The syntax of a compound literal is:

( type-name ) { initializer-list }
( type-name ) { initializer-list , }

Following the syntax of initializer-list, we see that ++a and a + 5 are
both initializers. 6.8p4 seems to say both that an initializer is
a full expression, and that these particular initializers are not.
Here's the full paragraph:

A _full expression_ is an expression that is not part of another
expression or of a declarator. Each of the following is a full
expression: an initializer; the expression in an expression
statement; the controlling expression of a selection statement
(if or switch); the controlling expression of a while or do
statement; each of the (optional) expressions of a for statement;
the (optional) expression in a return statement. The end of a
full expression is a sequence point.

The first sentence is the definition of the term "full expression".
By this definition, since ++a and a + 5 are part of larger
expressions, they clearly aren't full expressions. But the next
sentence says that an initializer is a full expression.

The wording is unchanged, or nearly so, from C90, which didn't
have compound literals; initializers appeared only in declarations.
I suggest that the authors neglected to update this paragraph when
compound literals were added, and that these initializers are not
(or should not be) considered to be full expressions; thus the
behavior is undefined.
That makes complete sense. Essentially, "If an initializer appears
within an expression, it is not a full expression."
But also worth noting might be a little bit[1] about the evaluation
order and side effects for initializers. This is also odd, since each
initializer is supposed to be a full expression. Would you agree?

Perhaps that suggests that each initializer -> assignment-expression[2]
is a disjoint expression; an island. Perhaps each one can be evaluated
independently of the others without any concern(s), in an
unspecified[3] manner. Perhaps a read and write could be scheduled to
happen at the same time.

Why would there be a sequence point after each initializer and yet the
order of side effects be unspecified... Explicitly?

[1] 'n1256.pdf': 6.7.8p23 and non-normative footnote 133
[2] 'n1256.pdf': 6.7.8p1
[3] 'n1256.pdf': 3.4.4p1 and 3.4.4p2
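
As an illustration of the 6.7.8p23 point, an initializer list such as '{ next_id(), next_id() }' leaves the order of the two calls unspecified, so the array could hold 1, 2 or 2, 1. The sketch below shows one way to avoid depending on that; the names `next_id` and `fill_ordered` are assumptions for the example.

```c
#include <assert.h>
#include <stddef.h>

static int last_id = 0;

/* Hypothetical generator with a side effect on a static object. */
int next_id(void) { return ++last_id; }

/* Evaluates the side effects in separate declarations, so the
 * initializer list itself contains no side effects at all. */
void fill_ordered(int out[2])
{
    assert(out != NULL);
    last_id = 0;
    int first = next_id();
    int second = next_id();
    int pair[2] = { first, second };
    out[0] = pair[0];
    out[1] = pair[1];
}
```

After the call, the array holds 1 and 2 in that order on any conforming implementation.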
 
Keith Thompson

Kenneth Brody said:
Since this subthread was turning into "a language with UB is
inexcusable", I was asking about how one would detect out-of-bounds.
And, how much overhead does that require, simply to handle the
"just-in-case" situations? Regardless of how such a standard would
define what to do, it still has to be a detectable situation.

Experience with languages that do require bounds checking indicates
that, given good optimization, the overhead isn't as bad as you might
expect. For example, given something like:

#define N 10
int arr[N];
for (int i = 0; i < N; i ++) {
    arr[i] = i;
}

a reasonably clever optimizer can prove that the bounds check can
never fail, and omit it, saving both time and code size. In other
cases, checks can be hoisted out of loops.

I'm not suggesting that C should require bounds checks, just that
the performance hit isn't necessarily all that bad.
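
To show what such a check might look like if written by hand in C, here is a sketch. The helper names `checked_set` and `fill_checked` are assumptions for the example, and whether a given compiler actually removes the check is not guaranteed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define N 10

/* Hypothetical checked store: refuses out-of-range indices. */
bool checked_set(int *arr, size_t len, size_t i, int value)
{
    assert(arr != NULL);
    if (i >= len) {
        return false;   /* out of bounds: report it, do not write */
    }
    arr[i] = value;
    return true;
}

/* Fills arr[0..N-1] and returns how many stores succeeded. */
int fill_checked(int arr[N])
{
    int stored = 0;
    for (size_t i = 0; i < N; i++) {
        /* i < N holds on every iteration, so after inlining an optimizer
         * may be able to prove the check cannot fail and omit it. */
        if (checked_set(arr, N, i, (int)i)) {
            stored++;
        }
    }
    return stored;
}
```

Every store in the loop succeeds, while a call with an index equal to the length is rejected.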
 
Shao Miller

[... Subthread, basically "C is 'bad' and the committee 'cowards' because
      they allowed undefined behavior in the language".
...]
Who might have implied or explicitly suggested these items? C can
hardly be considered bad; it's been around a long time and is still
popular. That seems like a good thing. :) I don't really perceive a
problem with a C Standard leaving expectations open in certain areas.
The C Standard appears to be evolving, too. A couple of things that
were, perhaps, ambiguous in C99 appear to have been pondered-over for
C1X. :)
In order to have no UB, then every pointer dereference would have to have an
"is this address valid" check.  (Not just a check for NULL, but an entire
validity check.)  And that includes "is this address writable" checks as
necessary, too.
How could "valid address" be defined beyond what the C Standard
currently offers?
And every call to free() would have to validate the pointer as being a value
previously returned from *alloc(), and not yet freed.
Additionally, it would be interesting if an implementation digitally
signed pointers to prevent tampering, too. :) Is that a worth-while
expectation for conforming implementations?
Since this subthread was turning into "a language with UB is inexcusable", I
was asking about how one would detect out-of-bounds.  And, how much overhead
does that require, simply to handle the "just-in-case" situations?
Regardless of how such a standard would define what to do, it still has to
be a detectable situation.
Who is it that has said "a language with UB is inexcusable"? Sorry, I
don't remember that. If somebody did say that, I'd have to disagree
with them. That doesn't mean that a C Standard cannot be improved by
experts and practitioners and hobbyists over time, though. :)

But the bounds-checking currently defined for C (in regards to pointer
arithmetic) does seem odd to me... It's not clear what "array object"
is being referred to or how one can determine the "number of elements"
in that array object... Hence the other thread.

There're sure to be trade-offs in decision-making. It appears that
function calls in C are supposed to be akin to an N-ary operator (like
in mathematics), rather than including a list of atomic argument
evaluations used to be assigned to the parameters. The C Standard
grants a license for compilers to optimize as long as they yield
results consistent with the abstract semantics. Those abstract
semantics further grant a license for compilers to determine
evaluation order. That's permissive and consistent, but does appear
to mean that:

int a = 1;
printf("%d", ++a, a + 5);

yields undefined behaviour.

int a = 1;
/* No luck here, either. */
printf("%d", (0, ++a, a), (0, 0, 0, 0, 0, 0, a + 5));
/* Or here. */
printf("%d", (0, 0, 0, 0, 0, 0, ++a, a), (0, a + 5));

Or:

volatile int a = 1;
/* D'oh! Still a violation. */
printf("%d %d", ++a, a + 5);

Or:

#include <stdio.h>

static inline int inc_int(int *param) {
    return ++*param;
}

int main(void) {
    int a = 1;
    /**
     * Worst-case is a non-inline function call and
     * the need for 'a' to be addressable. :(
     * C99 still might not define it.
     * C1X should be ok.
     */
    printf("%d", inc_int(&a), a + 5);
    return 0;
}
 
Tim Streater

Martin O'Brien said:
I took an online C test a few months ago. I actually thought the test
was better than some I've taken, but one question in particular I
think has the wrong answer. The question is this:

What is printed:

int a = 1;
printf("%d", ++a, a + 5);

a. 1
b. 2
c. 7
d. undefined

I selected d. This is the explanation given as to why b is the correct
answer.

The first expression in the parameter list following the format
string is paired with the first (and only) conversion specification.
The increment is a prefix so the result of the operation is a + 1.
Since there are more items in the value list than there are conversion
specifications, the extra value is not shown.

I believe the correct answer is d because according to K&R2 (and by
implication the Standard) the order in which function arguments are
evaluated is not defined; in fact, K&R2's example, in Section 2.12,
shows the variable n being used twice in the same printf call (albeit
with the correct number of conversion specifications).

Am I correct that d is the correct answer?

I agree that the value of a+5 is undefined, but as that expression
doesn't modify a, ++a will have the value 2. It's a long time since I
used C, but I thought that printf, via its format string, used as many
extra arguments as required by the format string, and no more
(regardless of their order of evaluation).

This would imply to me that b is the correct answer; the value of a+5 is
undefined but as it's not used, it doesn't matter.
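
On the point about unused arguments: the Standard says that excess printf arguments are evaluated but otherwise ignored (C99 7.19.6.1p2), so "not used" does not mean "not evaluated". The sketch below shows this with a version in which the increment is sequenced first; the names `noisy` and `excess_argument_demo` and the use of snprintf() are assumptions for the example.

```c
#include <assert.h>
#include <stdio.h>

static int evaluations = 0;

/* Hypothetical helper that records that it was called. */
int noisy(int v) { evaluations++; return v; }

/* Returns how many times the excess argument was evaluated. */
int excess_argument_demo(char *buf, size_t size)
{
    /* Format kept in a variable only to keep extra-argument warnings quiet. */
    const char *fmt = "%d";
    int a = 1;
    int first = ++a;   /* sequenced before the call below */

    assert(buf != NULL && size > 1);
    evaluations = 0;
    snprintf(buf, size, fmt, first, noisy(a + 5));   /* one conversion, two arguments */
    return evaluations;
}
```

The buffer receives "2" and the function returns 1: the second argument was evaluated even though no conversion specification consumed it.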
 
Tim Streater

Kenneth Brody said:
int a = 1;
printf("%d", ++a, a + 5);

a. 1
b. 2
c. 7
d. undefined

I selected d. This is the explanation given as to why b is the correct
answer. [...]
Am I correct that d is the correct answer?

I agree that the value of a+5 is undefined, but as that expression doesn't
modify a, ++a will have the value 2. It's a long time since I used C, but I
thought that printf, via its format string, used as many extra arguments as
required by the format string, and no more (regardless of their order of
evaluation).

This would imply to me that b is the correct answer; the value of a+5 is
undefined but as it's not used, it doesn't matter.

The problem isn't that "a+5" is undefined. The statement, as a whole,
causes undefined behavior. As far as C is concerned, anything (literally
"anything") is allowed to happen. The fact that printf() won't use the
third parameter passed to it is irrelevant.

Yes, _most_ platforms and compilers you are _likely_ to run into will
_probably_ print "2". That doesn't make it defined as far as C is concerned.

Consider a CPU which can run two operations in parallel. The compiler
generates this pseudo-code

    incr  a       \  These 2 run in parallel
    load  r3,a    /

    load  r2,a    \  These also run in parallel
    add   r3,5    /

The "desired" result is to load r2 with ++a, and r3 with a+5. (And this
would be the result if it were equivalent code for something like "++a" and
"b+5".)

However, the first pair of instructions cause a write and a read to the same
memory location at the same time, triggering a hardware fault. You don't
get a "2" printed, as the program crashes before printf() is even called.

If the first two run in parallel, and so would like to access a at the
same moment, it's a piss-poor piece of hardware that can't delay one of
these instructions until the other completes. I see no excuse for a
hardware fault. The CDC 6600 (designed in the early '60s, that is 50
years ago) had 10 separate arithmetic units that operated in parallel,
doing different sorts of arithmetic operations. If they were able, in
those days, to resolve conflicts, I don't see any excuse for it today.

Hardware fault my foot.
 
Seebs

If the first two run in parallel, and so would like to access a at the
same moment, it's a piss-poor piece of hardware that can't delay one of
these instructions until the other completes. I see no excuse for a
hardware fault. The CDC 6600 (designed in the early '60s, that is 50
years ago) had 10 separate arithmetic units that operated in parallel,
doing different sorts of arithmetic operations. If they were able, in
those days, to resolve conflicts, I don't see any excuse for it today.

Usually a matter of performance and/or gate cost. If you give me a choice
between a machine where a simultaneous write and read operation to the same
hunk of cache can produce bogus results, and a machine which is 20% slower,
I'll take the faster one and expect people to write code that doesn't do
that.

I've seen a whole lot of chips, across multiple architectures, which made
the same decision; if you access something while it's being modified, you
sometimes get really strange intermediate results which are not the same as
either the previous or upcoming value. Solution: Don't do that. This allows
you to save an IMMENSE amount of extra logic and design in the CPU, which
means it can be some combination of much faster and much cheaper.

-s
 
Eric Sosman

[... a hypothetical implementation of an undefined construct ...]
However, the first pair of instructions cause a write and a read to
the same memory location at the same time, triggering a hardware
fault. You don't get a "2" printed, as the program crashes before
printf() is even called.

If the first two run in parallel, and so would like to access a at the
same moment, it's a piss-poor piece of hardware that can't delay one of
these instructions until the other completes. I see no excuse for a
hardware fault.

Your poor eyesight is cause for pity, but not cause for
rewriting the Standard.
The CDC 6600 (designed in the early '60s, that is 50
years ago) had 10 separate arithmetic units that operated in parallel,
doing different sorts of arithmetic operations. If they were able, in
those days, to resolve conflicts, I don't see any excuse for it today.

The CDC 6600 was one of the more expensive computers of its day.
Even now, using inflated 2010 dollars, I doubt you could afford its
196x list price of 6-10 million $US. However, some users of C are
not interested in paying an arm and a leg for the privilege.
Hardware fault my foot.

The one on the leg you paid with?
 
Richard Bos

Malcolm McLean said:
Someone who knows the standard might write
a << b % c.
Someone who doesn't is forced to write
a << (b % c).

Your faith in badly educated programmers is... disturbing.

_Of course_ someone who doesn't know the Standard would be equally
likely to write

(a << b%c)

and expect that to "force precedence". We've all seen it. Here, if
nowhere else.
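
For anyone who wants to check the grouping rather than take it on faith, here is a sketch. The function names are assumptions for the example; '%' binds more tightly than '<<', so the first two functions compute the same thing and the third does not.

```c
#include <assert.h>

/* Parsed as a << (b % c), because % has higher precedence than <<. */
unsigned shift_unparenthesized(unsigned a, unsigned b, unsigned c)
{
    assert(c != 0);
    return a << b % c;
}

/* The same grouping, spelled out. */
unsigned shift_parenthesized(unsigned a, unsigned b, unsigned c)
{
    assert(c != 0);
    return a << (b % c);
}

/* The other grouping, which has to be requested explicitly. */
unsigned shift_then_mod(unsigned a, unsigned b, unsigned c)
{
    assert(c != 0);
    return (a << b) % c;
}
```

With a = 1, b = 5, c = 3 the first two give 1 << 2, which is 4, and the third gives 32 % 3, which is 2. Wrapping the whole expression in parentheses, as in (a << b%c), changes nothing about the grouping inside.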

Richard
 
Richard Bos

Shao Miller said:
Programmers could expect consistent treatment across any conforming
implementation.

No, they couldn't. You would need a lot more than that to achieve
consistent treatment. You would need, for starters, to define the sizes
of all types, to a bit; and the behaviour on overflow; and the presence
or absence of padding in both primitives and structures; and that's just
the start.
And the result? A language which, on anything but a contemporary desktop
system, is useless (and that category may well include future desktop
systems!); but which, on contemporary desktop systems, is nothing but
Java Lite[sic].
Advantages can be debated.

:)

You over-use the smiley. I suggest diminishing its necessity instead.

Richard
 
Richard Bos

Tom St Denis said:
Well it's defined in the sense that logically one of two computations
can happen

PASS a+5
increment a
PASS a

or

increment a
PASS a
PASS a+5

Any other interpretation is just lunacy.

You assume a single-processor, single-pipeline, non-concurrent,
non-optimising system. _That_ is lunacy.

Here's something for you to ponder:

PASS a+1
PASS a+5 INC a
BANG!
Process has been terminated due to conflicting memory accesses.

Variations on this scenario are neither hard to imagine nor lunatic.

Richard
 
Richard Bos

Tim Streater said:
If the first two run in parallel, and so would like to access a at the
same moment, it's a piss-poor piece of hardware that can't delay one of
these instructions until the other completes.

On the contrary, it's a piss-poor piece of hardware which will not
optimise the running time of my correct code just to bail out the
piss-poor code which overly-clever newbie hacks write.

Richard
 
Shao Miller

Richard said:
No, they couldn't. You would need a lot more than that to achieve
consistent treatment. You would need, for starters, to define the sizes
of all types, to a bit; and the behaviour on overflow; and the presence
or absence of padding in both primitives and structures; and that's just
the start.
This comment of mine was specifically in regards to operand evaluation
order. If it were standardized, programmers could expect consistent
treatment on any conforming implementation. It was not meant as an
all-encompassing statement.
... ... ...
You over-use the smiley. I suggest diminishing its necessity instead.
Thank you for your feedback, Mr. R. Bos. There is no necessity. It's
an expression.
 
