Does this program exhibit undefined behaviour or output?

WP

I saw a discussion on a forum where two people were arguing if the
following program causes undefined behaviour (or at least undefined
output for a given program name and argument) or not (ignore the
failure to check argc, assume at least one argument is passed to the
program):

#include <stdio.h>

int
main(int argc, char ** argv)
{
printf("%s got %s\n", *argv++, *argv);

return 0;
}

I think I read that the order of evaluation of the arguments is not
specified so you could get the output:
thearg got progname
if you launched with
$ progname thearg

I'd like to hear what you gurus say (a lot of people hanging on forums
do not seem to realise that this is the place to go for expert advice
btw).

- WP
 
Keith Thompson

WP said:
I saw a discussion on a forum where two people were arguing if the
following program causes undefined behaviour (or at least undefined
output for a given program name and argument) or not (ignore the
failure to check argc, assume at least one argument is passed to the
program):

#include <stdio.h>

int
main(int argc, char ** argv)
{
printf("%s got %s\n", *argv++, *argv);

return 0;
}

I think I read that the order of evaluation of the arguments is not
specified so you could get the output:
thearg got progname
if you launched with
$ progname thearg

I'd like to hear what you gurus say (a lot of people hanging on forums
do not seem to realise that this is the place to go for expert advice
btw).

Yes, the order of evaluation of arguments is not specified, so at the
very least it could print either "thearg got progname" or "progname
got thearg". There are programs whose behavior can vary depending on
the order of argument evaluation without invoking undefined behavior.

Consider this rather contrived program:

#include <stdio.h>
int main(void)
{
printf("printf returns %d and %d\n",
printf("first\n"),
printf("second\n"));
return 0;
}

Its output can be either

second
first
printf returns 6 and 7

or

first
second
printf returns 6 and 7

but it can't legally do anything else (barring an I/O error).

But the printf call in your example:

printf("%s got %s\n", *argv++, *argv);

invokes undefined behavior because it violates C99 6.5p2:

Between the previous and next sequence point an object shall have
its stored value modified at most once by the evaluation of an
expression. Furthermore, the prior value shall be read only to
determine the value to be stored.

The evaluations of all three arguments occur between two consecutive
sequence points. The value of argv is not modified more than once, so
it doesn't violate the first sentence, but it's read to determine the
value to be stored in argv *and* to determine the value of the third
argument, which violates the second sentence.

You weren't asking how to write it correctly, but I'll answer that
anyway:

printf("%s got %s\n", argv[0], argv[1]);
argv++;

Or, if you insist on having side effects within your arguments:

printf("%s got", *argv++);
printf(" %s\n", *argv);
 
Thomas Lumley

I saw a discussion on a forum where two people were arguing if the
following program causes undefined behaviour (or at least undefined
output for a given program name and argument) or not (ignore the
failure to check argc, assume at least one argument is passed to the
program):

#include <stdio.h>

int
main(int argc, char ** argv)
{
printf("%s got %s\n", *argv++, *argv);

return 0;

}

I think I read that the order of evaluation of the arguments is not
specified so you could get the output:
thearg got progname
if you launched with
$ progname thearg

There are two issues here.

The order of evaluation is at best unspecified, but that would give
progname got thearg
or
progname got progname
as the possible outputs.

The only way to get
thearg got progname
would be if the program had undefined behaviour (in which case you
could get anything). I think it does in fact have undefined
behaviour:
the sequence point at the function call happens too late to stop this
being just like evaluating i++ * i.

It's not obvious to me why the Standard makes these multiple-access
errors have undefined behaviour rather than just unspecified
behaviour. That is, it doesn't seem unduly restrictive to make an
implementation give an answer that would result from some order of
evaluation and application of side effects, rather than allowing it to
produce nasal demons. In contrast, I think it's fairly clear that
unspecified behaviour isn't enough to handle the consequences of
writing off the end of an array or dereferencing a null pointer.


-thomas
 
Flash Gordon

WP wrote, On 24/08/07 21:12:
I saw a discussion on a forum where two people were arguing if the
following program causes undefined behaviour (or at least undefined
output for a given program name and argument) or not (ignore the
failure to check argc, assume at least one argument is passed to the
program):

#include <stdio.h>

int
main(int argc, char ** argv)
{
printf("%s got %s\n", *argv++, *argv);

return 0;
}

I think I read that the order of evaluation of the arguments is not
specified so you could get the output:
thearg got progname
if you launched with
$ progname thearg

The order is unspecified, but that is not the big problem. The big
problem is that you modify argv *and* evaluate it for a purpose other
than generating the new value with no intervening sequence point.

I'd like to hear what you gurus say (a lot of people hanging on forums
do not seem to realise that this is the place to go for expert advice
btw).

A good reason for not bothering with forums :)
 
WP

Flash Gordon wrote:
The order is unspecified, but that is not the big problem. The big
problem is that you modify argv *and* evaluate it for a purpose other
than generating the new value with no intervening sequence point.

A good reason for not bothering with forums :)

Thanks Flash, and the others who have replied. I suspected there might
be an error with regard to sequence points and modifying the variable
too many times but I was very unsure. Could you elaborate on that
please? Then I shall point the forum folk to this information. I get
so annoyed when people at forums are spreading disinformation
unchallenged because there are no real gurus there. I try to challenge
them when I'm confident, otherwise I ask here because I know this is
the place to go and has been for a long time. But many newcomers
don't seem to know the first thing about Usenet and instead end up
in forums of lesser quality.
 
Keith Thompson

Thomas Lumley said:
It's not obvious to me why the Standard makes these multiple-access
errors have undefined behaviour rather than just unspecified
behaviour. That is, it doesn't seem unduly restrictive to make an
implementation give an answer that would result from some order of
evaluation and application of side effects, rather than allowing it to
produce nasal demons. In contrast, I think it's fairly clear that
unspecified behaviour isn't enough to handle the consequences of
writing off the end of an array or dereferencing a null pointer.

The point is to allow for optimization. There are cases where the
compiler can't detect whether the behavior is undefined. 'i + i++' is
obvious, but '*p1 + (*p2)++' may or may not be undefined, depending on
whether p1==p2.

By making such cases undefined, the standard gives compilers the
freedom to rearrange the expression more freely, by *assuming* that
'(*p2)++' won't affect the value of '*p1'.
 
CBFalconer

Keith said:
Yes, the order of evaluation of arguments is not specified, so at the
very least it could print either "thearg got progname" or "progname
got thearg". There are programs whose behavior can vary depending on
the order of argument evaluation without invoking undefined behavior.

In fact equally likely is "thearg got thearg" and "progname got
progname". Or anything else, since it invokes undefined behaviour.
 
Flash Gordon

WP wrote, On 24/08/07 22:08:
Thanks Flash, and the others who have replied. I suspected there might
be an error with regard to sequence points and modifying the variable
too many times but I was very unsure. Could you elaborate on that
please? Then I shall point the forum folk to this information.

I suggest starting at question 3.1 of the comp.lang.c FAQ at
http://www.c-faq.com/ and following the links through the related questions.
I get
so annoyed when people at forums are spreading disinformation
unchallenged because there are no real gurus there. I try to challenge
them when I'm confident, otherwise I ask here because I know this is
the place to go and has been for a long time. But many newcomers
don't seem to know the first thing about Usenet and instead end up
in forums of lesser quality.

So point them here. We are always happy to receive new blood... ^-^
 
Thomas Lumley

[...]
It's not obvious to me why the Standard makes these multiple-access
errors have undefined behaviour rather than just unspecified
behaviour. That is, it doesn't seem unduly restrictive to make an
implementation give an answer that would result from some order of
evaluation and application of side effects, rather than allowing it to
produce nasal demons. In contrast, I think it's fairly clear that
unspecified behaviour isn't enough to handle the consequences of
writing off the end of an array or dereferencing a null pointer.

The point is to allow for optimization. There are cases where the
compiler can't detect whether the behavior is undefined. 'i + i++' is
obvious, but '*p1 + (*p2)++' may or may not be undefined, depending on
whether p1==p2.

By making such cases undefined, the standard gives compilers the
freedom to rearrange the expression more freely, by *assuming* that
'(*p2)++' won't affect the value of '*p1'.

Yes, I get that. My question is why is *p1 + (*p1)++ undefined rather
than merely unspecified? If the compiler were given freedom to perform
the two reads of *p1 and the write in any order consistent with
precedence then the result could be unspecified but not undefined. For
example, if *p1 starts off equal to 0 then possible answers would
include 0 and 1 but not -1000 or a disk crash. I know that the
Standard does in fact permit -1000 or a disk crash, I just don't know
why this extra freedom is useful to compiler writers.
 
Keith Thompson

Thomas Lumley said:
Yes, I get that. My question is why is *p1 + (*p1)++ undefined rather
than merely unspecified? If the compiler were given freedom to perform
the two reads of *p1 and the write in any order consistent with
precedence then the result could be unspecified but not undefined. For
example, if *p1 starts off equal to 0 then possible answers would
include 0 and 1 but not -1000 or a disk crash. I know that the
Standard does in fact permit -1000 or a disk crash, I just don't know
why this extra freedom is useful to compiler writers.

My understanding is that there are cases where a compiler can generate
substantially better code if it's allowed to make certain assumptions
about aliasing, sometimes substantially better than if it were limited
by restrictions on what can happen if the code violates those
assumptions.

I suppose that doesn't really answer your question; you asked why it's
useful, and I more or less repeated my assertion that it is.
 
pete

Thomas said:
My question is why is *p1 + (*p1)++ undefined rather
than merely unspecified?

Because nobody on the C standard committee
ever cared about code like that.
I don't need it.
The more crap there is, that is undefined,
the simpler the language is.
 
Keith Thompson

CBFalconer said:
In fact equally likely is "thearg got thearg" and "progname got
progname". Or anything else, since it invokes undefined behaviour.

Yes, of course. By snipping the rest of my article (in which I
explain why the behavior is undefined), you make it appear as if I
didn't already know that.

BTW, I wouldn't say those results are "equally likely". Anything
could happen; what actually does happen depends on what the
implementation does.
 
Richard Bos

Thomas Lumley said:
Yes, I get that. My question is why is *p1 + (*p1)++ undefined rather
than merely unspecified? If the compiler were given freedom to perform
the two reads of *p1 and the write in any order consistent with
precedence then the result could be unspecified but not undefined.

Because you can do a great many more optimisations - for example, ones
involving multi-processor or multi-pipeline systems reading two separate
sub-expressions in parallel - if you simply don't have to worry about
whether any complicated expression involving pointers might refer to the
same object as another CEIP. It may seem quite simple to work this out
in i + i++, but it's not about i + i++. The real problem, and the real
advantage, is with expressions involving several more terms, each of
which might be a pointer to a pointer. And conversely, while it's
relatively simple to write a rule which makes i + i++ well-defined or
just implementation-defined, it's a lot more difficult to write a rule
which allows decent optimisation, fully defines _any_ complex
expression, _and_ does not change the semantics of code which is now
well-defined.

Richard
 
