Keith said:
Philip Potter wrote: [...]
Except that yours isn't standard-compliant.
Of course it is -- under conditions of UB any behavior is
standard-compliant.
No. I'll expand on that below.
Actually you didn't. You simply tried to defend gets() by describing a
scenario outside the specification (hence under UB) that was
predictable in a way you've constructed which happens to coincide with
the optimistic things that the specification tries to describe in its
explanation of gets(). But you never removed the "UB cloud" which
covers the whole thing.
I did not say that "the C language does not contain variables". (If I
did, please cite the article in which I said that.) I said that the C
standard does not define the term "variable".
The relevant quote:
" [...] It is not obvious what the word "variable" should mean in the
context of C. [...]"
And if you think that quote is out of context, you can look it up for
yourself and see the follow up with a half dozen examples of things
in C where it supposedly can't be decided whether or not something is a
variable.
The semantics of function calling does require some sort of structure
that behaves in a stack-like manner (last-in first-out). On the other
hand, the term "stack" is also commonly used to refer to a particular
data structure implemented in hardware, where a CPU register is
dedicated as a "stack pointer", and the stack grows and shrinks
through contiguous memory addresses. This kind of "stack" is not
required or implied by the C standard, and there are implementations
that don't have such a "stack"; the data storage required for the
local objects created by a function call is allocated by something
similar to malloc(), and released by something similar to free().
Referring to "the stack" on such a system would be misleading.
What the hell are you talking about? If you think "the stack" means a
hardware stack, it's because of something in your mind. We all
understand that the C specification is implemented on an abstract
machine, and that's where its "stack" is. If someone is conflating
this stack with a particular stack implementation (such as Sparc's
register window mechanism, or Itanium's register block stack thingy),
it's no different from the people who post with gcc-specific extensions
(like an extra envp parameter in main) which happens here all the time.
And of course in this case the conflation is usually harmless since it's
a very rare thing for someone to use an *extension* or
platform-specific feature of a hardware stack in real world code. You
usually use it exactly in the same way you use it in its abstract form
-- you push and pop to it. Compilers may play games with hardware
stacks, general programmers (even hard-core low-level programmers like
myself) usually do not.
So insisting that C has no stack because the specification doesn't say
that it does is just silly. This is why confining discussion of C only
to the language in the specification is idiotic.
[...]
So regardless of what the spec says under the heading of gets(), the UB
is inescapable from within the spec, which means the behavior described
in the spec is irrelevant since UB can invoke any behavior.
Ok, getting back to gets().
You've encouraged me to do something I don't believe I've ever done
here. I'm going to defend gets().
I know your mind doesn't work very "flexibly" at all but I'll give
it a shot -- replace your bad gets() program with another program,
which say, performs a simple buffer overflow:
char digs[5];
sprintf (digs, "%d", (int) val);
Ok, then continue to apply the reasoning and statements you just made
with your gets() program, but in the obvious analogous way. Ok, so
here are the statements you made which apply equally to a program
which contains the above:
1) " ... But suppose I enter a 10-character line. The C standard
guarantees that it will work properly."
-- Similarly, if we make val small enough here, it will work
properly.
2) "If I enter a 300-character line while running this program, I get
undefined behavior. The consequences would be entirely my own fault.
[...] But suppose I enter a 10-character line. The C standard
guarantees that it will work properly."
-- Similarly if I make val a 5+ digit integer, the program that
includes the above will have UB. But if I make val a 4 digit
positive number, or 3 digit negative number, it will work
just fine.
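
The digit counting in the two points above can be checked mechanically.
What follows is only a minimal sketch: the helper names bytes_needed and
fits_in_digs are hypothetical, and it assumes a C99 or later library,
where snprintf with a null buffer and a size of 0 returns the length the
output would have had without writing anything.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: number of bytes sprintf(buf, "%d", val) would
   store, including the terminating '\0'. Nothing is written here. */
int bytes_needed(int val)
{
    int n = snprintf(NULL, 0, "%d", val);
    assert(n >= 0);             /* a negative return signals an error */
    return n + 1;               /* +1 for the terminating '\0' */
}

/* Hypothetical helper: nonzero if val fits in char digs[5]. */
int fits_in_digs(int val)
{
    return bytes_needed(val) <= 5;
}
```

Under those assumptions fits_in_digs(9999) and fits_in_digs(-999) are
nonzero, while fits_in_digs(10000) and fits_in_digs(-1000) are zero,
since each of the latter needs six bytes.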
The UB we get from overrunning digs[] here obviously can lead to
arbitrary action since it will smash any adjacent declarations
including possibly volatiles, sig_atomic_t or whatever. Same with your
gets() program. So both programs occupy the same space of what's the
worst that can go wrong. Either program could easily format your hard
drive with the right set of circumstances.
So we see the analogy is a pretty close fit, and because of that we
usually look at code such as the above very skeptically. In other
words your argument about gets() hasn't specifically bolstered gets() in
any way that doesn't also bolster the code above. Let me repeat --
your *argument* doesn't significantly distinguish gets() from the code
snippet above in the context we are in.
Where the analogy falls down, however, is that the above code can be
made to work solely through mechanisms inside the program itself. If I
have some way of guaranteeing that val is between -999 and 9999 solely
through mechanisms inside the program itself, then everything is fine.
I would be using things *IN THE C STANDARD* to make sure that the
semantics of that code remained compliant. The key point is that I do
not need to venture outside the system/program or invoke platform
specific behavior to guarantee that code and bring it within spec.
I.e., the semantic correctness is guaranteed, essentially by other
contents from the standard itself. I.e., the code above is actually
correct within certain assumptions, and those assumptions can be
enforced by nothing more than the standard itself. The potential for
UB is *eliminated* from within usage of the specification itself.
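
As one possible illustration of enforcing that assumption from inside
the program, here is a minimal sketch. The wrapper name format_small and
its return convention are hypothetical; the range -999..9999 is the one
stated above for a 5-byte buffer.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical wrapper: writes val into a 5-byte buffer only when the
   text is known to fit. Returns 0 on success, -1 if val is out of range
   (in which case digs is left untouched). */
int format_small(char digs[5], int val)
{
    assert(digs != NULL);
    if (val < -999 || val > 9999)
        return -1;              /* would need more than 4 chars plus '\0' */
    sprintf(digs, "%d", val);   /* at most 5 bytes are stored here */
    return 0;
}
```

The guard uses nothing but ordinary integer comparisons, so the caller
never reaches the sprintf call with a value that could overrun digs.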
Your gets() program cannot be similarly fixed, or similarly rely on
analogous guarantees. In order to make gets() work according to what
the standard is optimistically hoping for you *MUST* step outside of
the standard. Thus the standard is trying to specify something that
specifically needs something that isn't (and can't realistically be
put) in the standard. BTW, what does UB mean? Doesn't it mean
arbitrary behavior outside the specification? So trying to format your
hard drive (perhaps successfully, perhaps not) because you stepped
outside the spec
Your argument fails to make this distinction (can you see this?) and by
implication misses the whole point.
Having said that, if I were writing such a quick-and-dirty test
program in real life I *still* wouldn't use gets().
And in this case, it's not because of any typically wrong reasoning on
your part. You are actually behaving correctly. As would any
programmer that behaved this way. So why is this being specified? The
rationale is not convincing, and in fact is clearly meant as
subterfuge.
[...] I'd use fgets()
and remove the trailing '\n'. (There would always be one because,
again, I wouldn't feed very long lines to the program; if I
accidentally did so, the program would misbehave, but in a benign and
predictable manner.)
So you've traded one bad behavior for another? ... Whatever, that's
another discussion entirely. You won't get UB with this strategy (just get
wrong results, but predictably so.) The \n can also be omitted if EOF
is encountered without a \n just before it, btw. A \n can also
*appear* to be omitted if a \0 is consumed before a \n is, and you are
just using C's char * string semantics on the results.
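
The cases listed above (a '\n' present, EOF with no final '\n', and a
'\0' hiding the '\n') can be told apart from the ordinary case with a
small helper. This is only a sketch: the name read_line and its return
codes are hypothetical, not anything from the standard library.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads one line with fgets and strips the '\n'.
   Returns  1 if a '\n' was found and removed,
            0 if data was read but no '\n' was visible (buffer too small,
              EOF without a final '\n', or an embedded '\0' before it),
           -1 if fgets returned NULL (EOF with nothing read, or error). */
int read_line(char *buf, int size, FILE *in)
{
    size_t n;

    assert(buf != NULL && size > 0 && in != NULL);
    if (fgets(buf, size, in) == NULL)
        return -1;
    n = strcspn(buf, "\n");     /* index of first '\n', else strlen(buf) */
    if (buf[n] == '\n') {
        buf[n] = '\0';
        return 1;
    }
    return 0;
}
```

A caller that gets 0 back knows the "there would always be a '\n'"
assumption did not hold for that read, and can decide what to do instead
of silently continuing.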
[...] gets() should not be used, and it should be
removed from the standard, or at least formally deprecated.
Implementations should warn about any calls to gets().
So what are you defending?
But *if* I use it in a manner whose behavior is guaranteed by the
standard, I have every right to expect it to behave as the standard
specifies.
Ok, but the standard *CANNOT* specify that guarantee. It makes a
"chicken before the egg" kind of specification about how gets() works.
It basically says *IF* the call to gets() doesn't invoke UB, then it
reflects some kind of stdin input. But that *IF* cannot be satisfied
by any content in the standard at all. Are you following? Therefore
the standard is not *specifying* a way for gets() to behave in the
optimistic way they are hoping it does.