Undefined behavior in standards


xlar54

I've lurked a bit, always reading and learning (thank you all).
Regarding undefined behaviors: in some cases I can understand, but in
others I don't fully get it. Why would the standards committee allow
an undefined behavior? Why not define it? Granted when pointers are
involved, you're often at the mercy of the system itself, but for
something like reading a variable before it is initialized... seems to
me that this could be easily standardized as a compile-time error.
Your thoughts?
 

Ike Naar

I've lurked a bit, always reading and learning (thank you all).
Regarding undefined behaviors: in some cases I can understand, but in
others I don't fully get it. Why would the standards committee allow
an undefined behavior? Why not define it? Granted when pointers are
involved, you're often at the mercy of the system itself, but for
something like reading a variable before it is initialized... seems to
me that this could be easily standardized as a compile-time error.

It is not always possible to detect this situation at compile-time.

extern void init(double *);

int main(void)
{
    double d;
    init(&d);
    return d; /* reading d, but is d initialized? */
}
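
A minimal sketch of the same situation with the initializer made visible. The names try_init and read_checked are hypothetical and do not come from the post above. The idea is that when a function may or may not store through the pointer it is given, having it report what it did lets the caller avoid reading an indeterminate value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical initializer: stores a value only when `ready` is nonzero,
   and reports whether it did. In the example above, init() lives in
   another translation unit, so the compiler cannot see which case applies. */
static bool try_init(double *out, int ready)
{
    assert(out != NULL);
    if (!ready) {
        return false;   /* *out is left untouched */
    }
    *out = 42.0;
    return true;
}

/* Reads d only when try_init() reports that it was written. */
static int read_checked(int ready)
{
    double d;
    if (!try_init(&d, ready)) {
        return -1;      /* d is never read on this path */
    }
    return (int)d;
}
```

Calling read_checked(1) goes through the path where d has been written; read_checked(0) returns without ever reading d.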
 

Seebs

I've lurked a bit, always reading and learning (thank you all).
Regarding undefined behaviors: in some cases I can understand, but in
others I don't fully get it. Why would the standards committee allow
an undefined behavior? Why not define it?

Usually because there are machines on which the most natural behavior
is some kind of trap or interrupt, and avoiding this would be EXTREMELY
expensive.
Granted when pointers are
involved, you're often at the mercy of the system itself, but for
something like reading a variable before it is initialized... seems to
me that this could be easily standardized as a compile-time error.

No, it couldn't. Halting problem.

int i, n = 0;
scanf("%d", &n);
if (n != 1) {
    i = 0;
}
i; /* do we read i before it is initialized? */
Your thoughts?

In general, undefined behavior occurs when you do something fundamentally
incoherent, and the cost of expecting a compiler to check for it or deal
with it is very large, and the cost of telling you not to do that is small.

-s
 

Ben Bacarisse

Seebs said:
On 2010-06-05, xlar54 wrote:

No, it couldn't. Halting problem.

Yes, though your example might confuse someone expecting to see how the
halting problem relates to this question.
int i, n = 0;
scanf("%d", &n);
if (n != 1) {
    i = 0;
}
i; /* do we read i before it is initialized? */

This example is ironic since it introduces another source of UB: the
scanf call can produce UB when otherwise well-formed input can't be
represented as an int. That does not matter to the point you are
making, but something entirely well-defined like a simple getchar call
would have avoided the irony.

The irony is significant in that the OP is wondering why so many things
are UB in C and this is one of the most infuriating examples with, in my
opinion, the weakest justification. It means you can't use any of the
scanf family for numeric input if you take UB and corner cases
seriously. You can cripple the input with a length limit (%9ld for
example) but that is hardly satisfactory.

[Aside: my preference would be for a correctly formatted but
unrepresentable input to be classed as a matching failure.]
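
One common way to get numeric input without this particular UB is to read text and convert it with strtol(), which reports out-of-range input through errno rather than leaving the behavior undefined. A minimal sketch, assuming a hypothetical helper named parse_int and that the caller has already read the text into a string:

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical helper: parses a decimal int from `text`.
   Returns true and stores the value in *out on success; returns false on
   empty input, trailing characters, or a value outside the range of int. */
static bool parse_int(const char *text, int *out)
{
    char *end;
    long value;

    assert(text != NULL && out != NULL);
    errno = 0;
    value = strtol(text, &end, 10);
    if (end == text || *end != '\0') {
        return false;               /* no digits, or trailing characters */
    }
    if (errno == ERANGE || value < INT_MIN || value > INT_MAX) {
        return false;               /* unrepresentable: reported, not UB */
    }
    *out = (int)value;
    return true;
}
```

This treats a well-formed but unrepresentable number as an ordinary failure, which is close to the matching-failure behavior suggested in the aside.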
In general, undefined behavior occurs when you do something fundamentally
incoherent, and the cost of expecting a compiler to check for it or deal
with it is very large, and the cost of telling you not to do that is
small.

Ack "in general" but in the specific case you introduced I don't think
the cost would be very large and the benefit would be significant but
maybe I am missing the reason for this specific UB. (I'd alter atoi to
be well-defined as well, though there are relatively simple solutions
for that function.)
 

Eric Sosman

I've lurked a bit, always reading and learning (thank you all).
Regarding undefined behaviors: in some cases I can understand, but in
others I don't fully get it. Why would the standards committee allow
an undefined behavior? Why not define it? Granted when pointers are
involved, you're often at the mercy of the system itself, but for
something like reading a variable before it is initialized... seems to
me that this could be easily standardized as a compile-time error.

There've been several responses on the specific issue of
using uninitialized variables, but on the wider question of "Why
is some behavior left undefined?" here are a few quotes from
Section 3 of the Rationale:

"The terms unspecified behavior, undefined behavior, and
implementation-defined behavior are used to categorize the
result of writing programs whose properties the Standard
does not, or cannot, completely describe. The goal of adopting
this categorization is to allow a certain variety among
implementations which permits quality of implementation to be
an active force in the marketplace as well as to allow certain
popular extensions, without removing the cachet of conformance
to the Standard."

"Undefined behavior gives the implementor license not to catch
certain program errors that are difficult to diagnose. It also
identifies areas of possible conforming language extension:
the implementor may augment the language by providing a
definition of the officially undefined behavior."

I'd read this as saying there are multiple reasons to leave some
behaviors undefined. Here are my paraphrases of a few:

- Some errors are difficult to detect (most of the responses about
uninitialized variables have mentioned this aspect), so the
Standard places the burden for their detection on the programmer
rather than on the compiler. Looking at it another way, getting
the program to run correctly is a shared responsibility, and the
compiler shouldn't have to shoulder all of it unaided.

- Leaving some behaviors undefined may lead to higher-quality
implementations of defined behaviors. For example, strcpy()
may be able to use high-speed in-line instruction sequences
that would be unsuitable if it had to worry about predictable
behavior when source and destination overlap. By leaving the
behavior on overlap undefined, the Standard permits strcpy()
implementations that are faster than they might be otherwise.

- Leaving some behaviors undefined allows opportunity for language
and library extensions. If *all* behaviors were nailed down,
extensions would be impossible. (It is interesting, although
discouraging, to note that some of the more virulent anti-Standard
posters to this forum are the same people who make extensive use
of the freedoms the Standard grants them.)
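
To illustrate the strcpy() item in the list above: shifting a string left inside its own buffer makes the source and destination overlap, so strcpy(s, s + n) would be undefined. A minimal sketch of the defined alternative, using memmove(), which is specified to handle overlapping regions. The helper name drop_prefix is hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: drops the first n characters of s in place.
   If n exceeds the length of s, the result is the empty string. */
static void drop_prefix(char *s, size_t n)
{
    size_t len;

    assert(s != NULL);
    len = strlen(s);
    if (n > len) {
        n = len;
    }
    memmove(s, s + n, len - n + 1);   /* +1 copies the terminating '\0' */
}
```

The cost is that memmove() has to cope with overlap where strcpy() does not, which is the trade-off the list item describes.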

Finally, there's a further argument for undefined behavior, one
that neither the Standard nor the Rationale appears to state out loud:
It's *really* *hard* to define everything precisely! If the writers
had withheld the Standard until every single corner had been smoothed,
primed, and varnished, we would still be waiting for the first version.
 

Lew Pitcher

I don't see the problem.
If someone wants to eliminate this UB, it is easy:
"double d;" could mean "double d=0.0;"
in every compiler => no UB in this case

Except that the current standards do not support such behaviour. If, in a
*new* standard, it was specified that automatic variables that are not
explicitly initialized take on a zero (or floating-point zero, or NULL, or
zero-like (for structures and unions)) value, then your plan would work.
But, right now, the standards say (for instance, in C99, Section 6.7.8,
paragraph 10)
"If an object that has automatic storage duration is not initialized
explicitly, its value is indeterminate."

Your
double d;
declares d to be an object of type double, with automatic storage duration,
not initialized to any set value. And, thus 6.7.8 #10 applies, and the
value of d is indeterminate /as far as the standard is concerned/, no
matter /what/ individual compilers do.
 

Peter Nilsson

Richard Heathfield said:
... to avoid imposing a hidden runtime penalty, C doesn't
zero out auto scope objects by default, and it places the
burden on you the programmer...

Which is in contrast to your own policy of initialising
all objects explicitly. Given that modern compilers can
detect uninitialised objects in most cases, it seems
reasonable to believe they can detect unnecessary
initialisation with equal success.

I can understand how the initial standards didn't want to
impose burdens on implementations on small systems, but
people these days almost exclusively use cross compilers
on large systems when targeting embedded platforms.
 

Nick Keighley

it couldn't be a compile time error, in general it's too hard to
detect. Halting Problem too hard.

that wouldn't help you detect the error at compile time which is what
you suggested. I'm guessing they didn't do this because it has a
slight cost.
Except that the current standards do not support such behaviour.

since he's asking why the standard is the way it is, this is a bit of
a daft answer
 

Tim Rentsch

Eric Sosman said:
On 6/5/2010 3:57 AM, xlar54 wrote:
[why is there undefined behavior]

There've been several responses on the specific issue of
using uninitialized variables, but on the wider question of "Why
is some behavior left undefined?" here are a few quotes from
Section 3 of the Rationale:

[various good reasons given, including some by ES]

Finally, there's a further argument for undefined behavior, one
that neither the Standard nor the Rationale appears to state out loud:
It's *really* *hard* to define everything precisely! [snip]

I don't buy this argument. It might be hard to identify
where the line is, but it's the Standard's job to draw that
line sharply and precisely. Once the line is drawn, it's
very easy to say "everything on <side X> of the line causes
the program to terminate immediately; all actions
before have been done, all actions following have not been
started." It's easy to give a definition. What makes the
question hard is /what/ definition to give -- that's why there
is undefined behavior, to avoid having to be pinned down to a
single answer.

(I should add that the rest of Eric's comments were spot on.)
 

Malcolm McLean

    Finally, there's a further argument for undefined behavior, one
that neither the Standard nor the Rationale appears to state out loud:
It's *really* *hard* to define everything precisely!  [snip]

I don't buy this argument.  It might be hard to identify
where the line is, but it's the Standard's job to draw that
line sharply and precisely.  Once the line is drawn, it's
very easy to say "everything on <side X> of the line causes
the program to terminate immediately; all actions
before have been done, all actions following have not been
started."  It's easy to give a definition.  What makes the
question hard is /what/ definition to give -- that's why there
is undefined behavior, to avoid having to be pinned down to a
single answer.
On some systems writing to a null pointer will trigger a hardware
trap, on others it will place a byte at position zero in memory.
Mandating a behaviour would put a burden on one compiler, essentially
involving an if statement at every pointer write, so it's easier to
say 'the behaviour is undefined'.
 

Tim Rentsch

Malcolm McLean said:
Finally, there's a further argument for undefined behavior, one
that neither the Standard nor the Rationale appears to state out loud:
It's *really* *hard* to define everything precisely! [snip]

I don't buy this argument. It might be hard to identify
where the line is, but it's the Standard's job to draw that
line sharply and precisely. Once the line is drawn, it's
very easy to say "everything on <side X> of the line causes
the program to terminate immediately; all actions
before have been done, all actions following have not been
started." It's easy to give a definition. What makes the
question hard is /what/ definition to give -- that's why there
is undefined behavior, to avoid having to be pinned down to a
single answer.
On some systems writing to a null pointer will trigger a hardware
trap, on others it will place a byte at position zero in memory.
Mandating a behaviour would put a burden on one compiler, essentially
involving an if statement at every pointer write, so it's easier to
say 'the behaviour is undefined'.

That may be true but it's irrelevant to the point I was making.
 

Malcolm McLean

Malcolm McLean said:
    Finally, there's a further argument for undefined behavior, one
that neither the Standard nor the Rationale appears to state out loud:
It's *really* *hard* to define everything precisely!  [snip]
I don't buy this argument.  It might be hard to identify
where the line is, but it's the Standard's job to draw that
line sharply and precisely.  Once the line is drawn, it's
very easy to say "everything on <side X> of the line causes
the program to terminate immediately; all actions
before have been done, all actions following have not been
started."  It's easy to give a definition.  What makes the
question hard is /what/ definition to give -- that's why there
is undefined behavior, to avoid having to be pinned down to a
single answer.
On some systems writing to a null pointer will trigger a hardware
trap, on others it will place a byte at position zero in memory.
Mandating a behaviour would put a burden on one compiler, essentially
involving an if statement at every pointer write, so it's easier to
say 'the behaviour is undefined'.

That may be true but it's irrelevant to the point I was making.
That's a major reason why we have undefined behaviour. It doesn't
exist in Java, because you have no platform dependence. The other
reason is that you can't write arbitrary bytes to memory objects in
Java - it's difficult to mandate a behaviour when this corrupts data
objects.
 

Nobody

That's a major reason why we have undefined behaviour. It doesn't
exist in Java, because you have no platform dependence.

That specific form of UB (null pointer handling) doesn't exist in Java,
but other forms of UB do.

One form of UB which will be found in any real language is the amount of
memory available. No real language is going to specify that allocating N
bytes of memory (in total) must succeed while allocating N+1 bytes must
fail.

Similarly, any language which provides the equivalent of time() is going
to admit UB through execution times; no real language is going to specify
that a given code fragment must take N seconds (or, at least, must
*appear* to take N seconds according to time()).
 

Eric Sosman

That specific form of UB (null pointer handling) doesn't exist in Java,
but other forms of UB do.

Right.
One form of UB which will be found in any real language is the amount of
memory available. No real language is going to specify that allocating N
bytes of memory (in total) must succeed while allocating N+1 bytes must
fail.

That seems to stretch "undefined" beyond its useful elasticity.
In the language of the C Standard "implementation-defined" or perhaps
"unspecified" would cover it better than "undefined." Follow this
route a bit further and you'll call `printf("Hello, world!\n")'
undefined because of the possibility of I/O error. That way lies
madness.
Similarly, any language which provides the equivalent of time() is going
to admit UB through execution times; no real language is going to specify
that a given code fragment must take N seconds (or, at least, must
*appear* to take N seconds according to time()).

C dodges this particular bullet by not treating elapsed time
as a "behavior" in the first place. The definition "external
appearance or action" (3.4p1) is over-broad, I'd say: It includes,
for example, the fragrance of fopen() and the sound of setjmp().
Still, I think we can exclude elapsed time from consideration
because it is not listed among the attributes the Standard claims
to govern (1p1).
 

Malcolm McLean

One form of UB which will be found in any real language is the amount of
memory available. No real language is going to specify that allocating N
bytes of memory (in total) must succeed while allocating N+1 bytes must
fail.
That only applies in a hosted environment. Plenty of programs have the
entire resources of the processor available to them, so in fact an
attempt to use 65536 bytes of memory will always succeed whilst an
attempt to use 65537 will always fail. This is quite common. The
language itself isn't normally invented from scratch for that
particular processor, however.


In C 'undefined' means "anything can happen". So
ptr = malloc(N); if (!ptr) exit(EXIT_FAILURE);
is defined, even though it may not be possible to predict whether the
branch will be taken. However,
ptr = malloc(N); *ptr = 1;
is undefined if malloc() returns null. The result of
writing to the null pointer could be anything from an error message to
the failure of the keyboard to another function in a seemingly
unrelated part of the program returning the wrong result.
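
The defined variant can be sketched as a small allocation helper. This is only an illustration; the name make_filled and its parameters are hypothetical. The point is that every write happens after the null check, so the outcome is defined whether or not malloc() succeeds.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: allocates room for `count` ints and sets each one
   to `fill`. Returns NULL when the request cannot be satisfied, so no
   write ever goes through a null pointer. */
static int *make_filled(size_t count, int fill)
{
    int *p;
    size_t i;

    assert(count > 0);
    if (count > SIZE_MAX / sizeof *p) {
        return NULL;    /* the byte count would overflow size_t */
    }
    p = malloc(count * sizeof *p);
    if (p == NULL) {
        return NULL;    /* defined outcome; the caller decides what to do */
    }
    for (i = 0; i < count; i++) {
        p[i] = fill;
    }
    return p;
}
```

The caller still cannot predict which branch is taken, but both branches have defined behavior, which is the distinction drawn above.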
 

Nick Keighley

Malcolm McLean said:
    Finally, there's a further argument for undefined behavior, one
that neither the Standard nor the Rationale appears to state out loud:
It's *really* *hard* to define everything precisely!  [snip]
I don't buy this argument.  

I'm semi in agreement with you. I think C left much behaviour
undefined because it was expensive to compute. C traded off absolute
safety for speed and simplicity of implementation.

I'm not sure I agree. A standard may do this but I don't think it's
under any obligation to do so.

but not necessarily easy to implement

doesn't the freedom of action between sequence points bugger this up?


I bet it isn't. Writing standards is hard.


I don't think this is why C is the way it is
 

Tim Rentsch

Malcolm McLean said:
Malcolm McLean said:
On Jun 20, 4:24 am, Tim Rentsch wrote:
Finally, there's a further argument for undefined behavior, one
that neither the Standard nor the Rationale appears to state out loud:
It's *really* *hard* to define everything precisely! [snip]
I don't buy this argument. It might be hard to identify
where the line is, but it's the Standard's job to draw that
line sharply and precisely. Once the line is drawn, it's
very easy to say "everything on <side X> of the line causes
the program to terminate immediately; all actions
before have been done, all actions following have not been
started." It's easy to give a definition. What makes the
question hard is /what/ definition to give -- that's why there
is undefined behavior, to avoid having to be pinned down to a
single answer.
On some systems writing to a null pointer will trigger a hardware
trap, on others it will place a byte at position zero in memory.
Mandating a behaviour would put a burden on one compiler, essentially
involving an if statement at every pointer write, so it's easier to
say 'the behaviour is undefined'.

That may be true but it's irrelevant to the point I was making.
That's a major reason why we have undefined behaviour. It doesn't
exist in Java, because you have no platform dependence. The other
reason is that you can't write arbitrary bytes to memory objects in
Java - it's difficult to mandate a behaviour when this corrupts data
objects.

Once again that may be true but it's irrelevant to the point
I was making.
 

Tim Rentsch

Nick Keighley said:
Malcolm McLean said:
Finally, there's a further argument for undefined behavior, one
that neither the Standard nor the Rationale appears to state out loud:
It's *really* *hard* to define everything precisely! [snip]
I don't buy this argument.

I'm semi in agreement with you. I think C left much behaviour
undefined because it was expensive to compute. C traded off absolute
safety for speed and simplicity of implementation.

I'm not sure I agree. A standard may do this but I don't think it's
under any obligation to do so.

When I say "draw the line" what I mean is to identify which
behaviors are defined and which behaviors are undefined.
(Also, which are unspecified, etc.) My position is (still)
that it is /absolutely/ the job of the Standard to do this.
If someone can't tell after reading the Standard whether
behavior X is defined or undefined, it has failed to fulfill
(at least one of) its primary function(s).

but not necessarily easy to implement

Very true.

doesn't the freedom of action between sequence points bugger this up?

Obviously this needs to be taken into account, but I don't
think it presents any significant difficulties. Remember,
we only have to say what the behavior will be, we don't
have to write a compiler that provides that behavior.

I bet it isn't. Writing standards is hard.

Sure it is; just try it:

"Execution of any statement whose behavior is not defined by
this Standard shall cause the computer it's running on to
catch fire."

"Execution of any statement whose behavior is not defined by
this Standard shall issue launch codes to all armed nuclear
missiles."

"Execution of any statement whose behavior is not defined by
this Standard shall initiate entering the Hobart Phase where
time flows backwards instead of forwards."

It's only if we want the definition to be agreeable to potential
implementors that it gets hard.

I don't think this is why C is the way it is

I think I see the point you're making, and I believe I agree with it,
at least partly. What I meant by the statement wasn't expressed very
well. I wasn't trying to explain why C has undefined behavior _at
all_; that's historical plus a lot of other different things. But
when considering some particular aspect, and deciding whether its
behavior will be defined or undefined (and ignoring for the moment
some other possibilities such as implementation defined), it's often
true that "undefined behavior" simply means we don't want to be pinned
down to a single answer. It isn't that a choice can't be made, or
even that a choice can't be made that's reasonably cheap to implement;
but rather that we have decided /not to make a choice at all/ -- to
leave the freedom of choice (for that aspect) open to other factors.
 
