Why does GCC warn me when I use the gets() function to access a file?

Philip Potter

Richard Bos said:
Yes, but there are no realistic situations where you _can_ know that.
You can't even be certain if you enter the text yourself; typos are easy
to make.

It doesn't matter if it's realistic or not. If gets() is in the standard
then a conforming implementation must implement it properly when stdin gives
it "friendly" input. The example shows that gets() can have perfectly
well-defined behaviour, even if only in unrealistic situations. If gets()
did have completely undefined behaviour, it would be trivial to remove it
from the standard, since none of the programs which use it had defined
behaviour anyway.

Philip
 
Mark McIntyre

France is a member of EU.

Do tell.

*sigh* It's a reference to the "enthusiastic" expenses policy that EU
commissioners and many other staff have been reputed to enjoy.
--
Mark McIntyre

"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it."
--Brian Kernighan
 
Mark McIntyre

Unfortunately, I doubt that.

I disagree. I'd actually be quite annoyed if the ISO committee
littered the Standard with anal-retentive error checking. It's not a
blessed instruction manual.
I believe an implementation could add error checking for cases where
the version provided in the standard invokes undefined behavior, but
any suggestion that this is encouraged is not supported by the
standard.

The standard doesn't require or indeed encourage many things,
including the safe use of gets() but that doesn't mean the Standard
committee are recklessly or wilfully negligent, as JN imputed.
 
Mark McIntyre

The answer in full (quoted from the committee reports) is: ....
There is no consensus to make the suggested change or any change along
this line.
....
Thank you for posting this quote, which /entirely/ makes my point.
-----------------------------------------------------

AS ALWAYS, THE RANGE OF UNDEFINED BEHAVIOR PERMITTED INCLUDES

CORRUPTING MEMORY.

Indeed. So what? And don't shout.
Isn't this very clear?

Absolutely. So what?
Who cares about keyboards?

The standard requires the use of a basic character set. Since not all
keyboard layouts contain all those characters a means was provided to
resolve this issue.
I expected that I could submit a proposition and discuss it by email.

What, you expected the committee members to give up yet more of their
personal time (they have day jobs too), to enter into personal
correspondence with you, who couldn't even be bothered to attend
meetings?
Yes, (maybe you know this) modern communications and this "internet" fad
make sometimes traveling obsolete.

Ever heard of comp.std.c?
 
Mark McIntyre

On Tue, 05 Sep 2006 08:41:15 +0200, in comp.lang.c , jacob navia

<stuff>

Oh, and for reference, I'm threadplonking this thread, as I have no
further desire to feed Jacob's paranoia, or to read his pomposity. I
get plenty of opportunities in other threads, sadly.
 
Keith Thompson

Mark McIntyre said:
I disagree. I'd actually be quite annoyed if the ISO committee
littered the Standard with anal-retentive error checking. It's not a
blessed instruction manual.

The key phrase from the Standard, which you snipped, is:

"using the equivalent of the following algorithm"

It seems to me that the wording encourages the use of the actual code
from the standard to implement asctime().

By contrast, the standard provides a sample implementation of srand()
and rand(), but doesn't require the actual implementation to use an
equivalent algorithm. (Too many implementations do so anyway.)
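
For reference, the sample generator mentioned above looks roughly like the sketch below. The arithmetic follows the example given in the standard's description of rand(), which assumes RAND_MAX is 32767. The names sample_rand and sample_srand are invented here so that they do not clash with the library's own functions.

```c
/* Sketch of the standard's portable sample generator.
   sample_rand and sample_srand are hypothetical names used for illustration. */
static unsigned long int next_state = 1;

/* Returns a pseudo-random value in the range 0..32767. */
int sample_rand(void)
{
    next_state = next_state * 1103515245UL + 12345UL;
    return (int)((unsigned int)(next_state / 65536UL) % 32768U);
}

/* Resets the sequence; the same seed always yields the same sequence. */
void sample_srand(unsigned int seed)
{
    next_state = seed;
}
```

An implementation is free to use something better than this, which is the contrast with asctime() being drawn here.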
 
jacob navia

Mark said:
I disagree. I'd actually be quite annoyed if the ISO committee
littered the Standard with anal-retentive error checking. It's not a
blessed instruction manual.

This is the main reason why C has lost most of its
supporters.

This "macho" attitude, this disdain for careful programming,
this "anything goes" attitude.

Careful error checking is "anal retentive" for people
that go around leaving their shit in each and every place!

Careful error specification is the most important
thing to do in the C standard now.

jacob
 
Richard Heathfield

jacob navia said:
This is the main reason why C has lost most of its
supporters.

What makes you think C has lost most of its supporters?
This "macho" attitude, this disdain for careful programming,
this "anything goes" attitude.

That isn't what Mark said. He only said that he didn't want the Standard
littered with error-checking code. He didn't say he didn't want his own
programs to do error-checking.

It really is time you learned to read. I run a little course...
 
websnarf

Philip said:
The standard says that integer overflow is UB; therefore, if I add 1 to an
int containing 32767, I may have invoked UB.

Correct, that's why Richard Seacord recently created a secure integer
library. (Personally, I just make sure my ranges make sense.)
[...] However, if I know that on my
system int is 17 bits or more, I can guarantee I haven't. The size of an int
is outside the programmer/language/specification's control, so according to
your argument this is still UB, and my implementation is free to reformat my
hard drive instead. I don't think many people would agree with you on this.

Uhhh ... the ANSI C committee itself agrees with this point of view.
I would actually prefer to take your point of view, and describe a
limited scope of "bad behavior" for certain failures like numerical
overflows. But the ANSI C people decided not to bother with that. So
yes, overflowing an integer apparently can email the KGB the US nuclear
launch codes.
Similarly, if I know that on my system stdin gets a \n character at least
every 20 characters, I can use gets() and guarantee no UB.

All you're doing is going ahead and translating the universe of UB that
comes with gets() into one narrow manifestation or predictable
behavior. That's exactly what I did in my sample implementation, BTW.
We can both do this, of course, because we are covered by the UB.
Either way neither is behaving as the optimistic description in the
standard suggests.
Is this true? Please tell me more - I'd be interested to hear.

I found this in the C9X Rationale (sorry, got this mixed up with the
standard itself):

"Because gets does not check for buffer overrun,
it is generally unsafe to use when its input is not
under the programmer's control. [...]"

Ok, so they have a rudimentary understanding of the problem.

"[...] This has cause some to question whether it
should appear in the Standard at all. [...]"

Classic PR -- "some to question ...". A *LOT* of people question this.
Compare to how Fox news in the US reports on things that disagree with
their bias. Anyhow, so they recognize the people who understand what's
wrong with gets() do exist. So it all looks pretty reasonable right?

"The Committee decided that gets was useful and
convenient in those special circumstances when
the programmer does have adequate control over
the input, and as longstanding existing practice,
it needed a standard specification."

Two distortions in a single sentence:

1) Any place that gets could in theory be safely used, fgets can be
safely used just as easily (if you want to be an idiot, you can even
pass in INT_MAX as the buffer length parameter). Therefore gets is not
needed (and thus neither does a specification for it) for this reason.

2) Longtime existing practice is not a justification but rather an
indictment, because it is erroneous practice.

I.e., removing gets actually *improves* the situation for *both*
reasons. The longtime existing practice needs to stop, and even under
programmer control fgets works at least as well. Black is white, up is
down, war is peace, etc. Of course they go ahead and contradict
themselves in the very next sentence:

"In general, however, the preferred function is fgets."

Ya think? In fact do you think maybe it's so preferred that you should
always use it (or something even better) instead?

Do you see what they've done? They've gone ahead and presented the
main and sufficient reason for taking gets *out* of the standard, and
just pretended the logic was inverted and claimed that's why they are
leaving it *in* the standard. So if you try to bring up the issue to a
member of the standards committee, they can point to the rationale
and claim "that issue has been addressed". As such they don't have to
explain themselves. They don't have to justify themselves, they just
have to claim the logic implies the opposite of what it really does.
If you don't get this, go read 1984.
 
Philip Potter

Philip said:
[...] However, if I know that on my
system int is 17 bits or more, I can guarantee I haven't. The size of an int
is outside the programmer/language/specification's control, so according to
your argument this is still UB, and my implementation is free to reformat my
hard drive instead. I don't think many people would agree with you on
this.

Uhhh ... the ANSI C committee itself agrees with this point of view.
I would actually prefer to take your point of view, and describe a
limited scope of "bad behavior" for certain failures like numerical
overflows. But the ANSI C people decided not to bother with that. So
yes, overflowing an integer apparently can email the KGB the US nuclear
launch codes.

You've completely missed the point here. Re-read the sentence "If I know
that on my system int is 17 bits or more, I can guarantee I haven't [invoked
UB when adding 32767 to 1]".
All you're doing is going ahead and translating the universe of UB that
comes with gets()
http://en.wikipedia.org/wiki/Begging_the_question

into one narrow manifestation or predictable
behavior. That's exactly what I did in my sample implementation, BTW.

Except that yours isn't standard-compliant. gets() does not invoke UB unless
it actually overruns the buffer. If you believe otherwise, quote C&V, rather
than just asserting it.

Philip
 
Richard Bos

Philip said:
Is this true? Please tell me more - I'd be interested to hear.

I found this in the C9X Rationale (sorry, got this mixed up with the
standard itself):

"Because gets does not check for buffer overrun,
it is generally unsafe to use when its input is not
under the programmer's control. [...]"

Ok, so they have a rudimentary understanding of the problem.

"[...] This has cause some to question whether it
should appear in the Standard at all. [...]"

My dear fellow, if you can't even be bothered to quote them correctly,
you shouldn't be the one to whine.

Richard
 
websnarf

Philip said:
Philip said:
[...] However, if I know that on my
system int is 17 bits or more, I can guarantee I haven't. The size of an int
is outside the programmer/language/specification's control, so according to
your argument this is still UB, and my implementation is free to reformat my
hard drive instead. I don't think many people would agree with you on this.

Uhhh ... the ANSI C committee itself agrees with this point of view.
I would actually prefer to take your point of view, and describe a
limited scope of "bad behavior" for certain failures like numerical
overflows. But the ANSI C people decided not to bother with that. So
yes, overflowing an integer apparently can email the KGB the US nuclear
launch codes.

You've completely missed the point here. Re-read the sentence "If I know
that on my system int is 17 bits or more, I can guarantee I haven't [invoked
UB when adding 32767 to 1]".

What relevance is that? The C standard says nothing about such
guarantees. You are conflating a particular implementation with the
standard. Of course an implementation can do whatever it wants for
platform-specific and undefined behavior. So it does -- this does not
prove any point.

That does not apply here. That gets() comes with UB is not in dispute.
That you can remove its undefinedness on a particular platform, is
nothing more than an abuse of the meaning of the word UB.
Except that yours isn't standard-compliant.

Of course it is -- under conditions of UB any behavior is
standard-compliant.
[...] gets() does not invoke UB unless
it actually overruns the buffer.

But nothing in the program can make this condition either happen or not
happen. I.e., to be well defined the spec has to specify something
outside of the C language. Besides not being its mandate -- it
actually does not do that. So the specification does not describe
conditions within the confines of what it's describing (the C language,
not what the user should do) under which the call can be made to be well
defined. The "unless it actually overruns the buffer" is nothing that
the C standard can explain with any specificity -- and it doesn't try.
[...] If you believe otherwise, quote C&V, rather
than just asserting it.

Quoting from the standard is useless since the standard does not make
any attempt to analyze things to their logical conclusion. Taking a
classic cue from Keith Thompson, if you narrow your view of C
programming to just the spec you would conclude that the C language
does not contain variables (I kid you not, he posted this). Or as
other people keep claiming -- that C doesn't have a stack (if function
calls and returns are always bracketed like pushes and pops, do we not
have a stack? In fact a "call" stack?).

But telling me to justify my "beliefs" by citing chapter and verse you
are just being like the standards committee was with the gets()
rationale by attempting to subvert the rules for the argument.

Buffer overruns invoke UB. gets() may invoke buffer overruns
independent of how the programmer uses it. In some implementations it's
possible to go outside the ANSI specification to force gets() to not
buffer overflow and have the behavior that's described in the spec,
however clearly the spec does not delineate these conditions.

So regardless of what the spec says under the heading of gets(), the UB
is inescapable from within the spec, which means the behavior described
in the spec is irrelevant since UB can invoke any behavior.
 
Philip Potter

Philip said:
You've completely missed the point here. Re-read the sentence "If I know
that on my system int is 17 bits or more, I can guarantee I haven't [invoked
UB when adding 32767 to 1]".

What relevance is that? The C standard says nothing about such
guarantees.

It says that the size of an int is implementation-defined. It describes the
meaning of "implementation-defined" carefully. It talks about minimum values
for INT_MAX, and says that integer overflow is UB.

If integer overflow happens, the behaviour is undefined. If an addition does
not overflow, the behaviour is well-defined, and must conform to the
standard's definition of addition.

On an implementation with INT_MAX>32767, 32767+1 is not an overflow, and
therefore not UB. It must result in 32768 - no other behaviour is
conforming.

Please tell me which step in this argument you disagree with.
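
The steps above can be sketched in code. This is only an illustration, not something the standard supplies: checked_add is a hypothetical helper that tests the operands against INT_MAX and INT_MIN from <limits.h> before adding, so the addition itself is only evaluated when it cannot overflow.

```c
#include <limits.h>

/* Hypothetical helper: stores a + b in *out and returns 1 when the sum is
   representable as an int on this implementation, otherwise returns 0
   without performing the addition. */
int checked_add(int a, int b, int *out)
{
    if (b > 0 && a > INT_MAX - b)
        return 0;   /* would exceed INT_MAX */
    if (b < 0 && a < INT_MIN - b)
        return 0;   /* would fall below INT_MIN */
    *out = a + b;   /* well-defined: the result is in range */
    return 1;
}
```

On an implementation where INT_MAX is greater than 32767, checked_add(32767, 1, &sum) succeeds and stores 32768. Where INT_MAX is exactly 32767, it reports failure instead of overflowing.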
You are conflating a particular implementation with the
standard. Of course an implementation can do whatever it wants for
platform-specific and undefined behavior. So it does -- this does not
prove any point.

No it can't. Please see FAQ 11.33. Implementation-defined behaviour must be
consistent, and must fit within the restrictions imposed by the standard.

If you are going to continue to place your hands over your ears, singing to
yourself, you are welcome to. I am tired of trying to talk sense over the
endless noise you put out.
[...] If you believe otherwise, quote C&V, rather
than just asserting it.

Quoting from the standard is useless since the standard does not make
any attempt to analyze things to their logical conclusion.

It sure beats your preferred method of "Yes it is!" "No it isn't!"

Philip
 
Keith Thompson

Philip Potter wrote: [...]
Except that yours isn't standard-compliant.

Of course it is -- under conditions of UB any behavior is
standard-compliant.

No. I'll expand on that below.

[...]
Quoting from the standard is useless since the standard does not make
any attempt to analyze things to their logical conclusion. Taking a
classic cue from Keith Thompson, if you narrow your view of C
programming to just the spec you would conclude that the C language
does not contain variables (I kid you not, he posted this).

I think you're referring to the "curious about array initialization."
thread from last April and May.

I did not say that "the C language does not contain variables". (If I
did, please cite the article in which I said that.) I said that the C
standard does not define the term "variable". The discussion is
archived on groups.google.com; anyone who's interested can read it
there.
Or as
other people keep claiming -- that C doesn't have a stack (if function
calls and returns are always bracketed like pushes and pops, do we not
have a stack? In fact a "call" stack?).

The semantics of function calling does require some sort of structure
that behaves in a stack-like manner (last-in first-out). On the other
hand, the term "stack" is also commonly used to refer to a particular
data structure implemented in hardware, where a CPU register is
dedicated as a "stack pointer", and the stack grows and shrinks
through contiguous memory addresses. This kind of "stack" is not
required or implied by the C standard, and there are implementations
that don't have such a "stack"; the data storage required for the
local objects created by a function call is allocated by something
similar to malloc(), and released by something similar to free().
Referring to "the stack" on such a system would be misleading.

[...]
So regardless of what the spec says under the heading of gets(), the UB
is inescapable from within the spec, which means the behavior described
in the spec is irrelevant since UB can invoke any behavior.

Ok, getting back to gets().

You've encouraged me to do something I don't believe I've ever done
here. I'm going to defend gets().

Suppose I've written a function that takes a char* argument (that
points to a string), and I want to write a quick and dirty test
program for it. (I might write a more rigorous test framework later
on; for now, I just want to try it with a few arguments to see if the
results seem plausible.) So, I write a small program like this:

#include <stdio.h>
#include <string.h>

void show(char *s)
{
    printf("%d: \"%s\"\n", (int)strlen(s), s);
}

int main(void)
{
    char buf[256];
    while (gets(buf) != NULL) {
        show(buf);
    }
    return 0;
}

This lets me manually test my function with a few values. Since I
wrote the program in the last 5 minutes, I *know* that it could fail
if I enter too long a line. Once I've satisfied myself that the
function works more or less as I want it to, I delete the program.
I've never made it available to anyone else. I have exactly as much
control over the program's input as I do over the program itself.

If I enter a 300-character line while running this program, I get
undefined behavior. The consequences would be entirely my own fault.

But suppose I enter a 10-character line. The C standard guarantees
that it will work properly. If I'm using your proposed
implementation, on which any call to gets() attempts to reformat my
hard drive, then the damage to my system is entirely *your* fault. I
used a standard function in a safe manner, in a way that *cannot*
invoke undefined behavior (because I control the input, and I will not
enter an overly long input line ).

Having said that, if I were writing such a quick-and-dirty test
program in real life I *still* wouldn't use gets(). I'd use fgets()
and remove the trailing '\n'. (There would always be one because,
again, I wouldn't feed very long lines to the program; if I
accidentally did so, the program would misbehave, but in a benign and
predictable manner.) gets() should not be used, and it should be
removed from the standard, or at least formally deprecated.
Implementations should warn about any calls to gets().
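
A minimal sketch of the fgets() approach described above, assuming a small helper is acceptable. The name read_line and its return convention are invented for illustration. The caller passes the buffer size, so an overly long line is split across calls rather than overrunning the buffer.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads one line from fp into buf (at most size - 1
   characters) and strips the trailing '\n' if one was read.
   Returns 1 on success, 0 on end of file or read error.
   Assumes size is small enough to fit in an int. */
int read_line(char *buf, size_t size, FILE *fp)
{
    if (size == 0 || fgets(buf, (int)size, fp) == NULL)
        return 0;
    buf[strcspn(buf, "\n")] = '\0';   /* cut at the newline, if present */
    return 1;
}
```

In the test program, the loop would then read while (read_line(buf, sizeof buf, stdin)) in place of the gets() call.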

But *if* I use it in a manner whose behavior is guaranteed by the
standard, I have every right to expect it to behave as the standard
specifies.

I don't expect you to be willing to understand this, but I'm prepared
to be pleasantly surprised.
 
websnarf

Keith said:
Philip Potter wrote: [...]
Except that yours isn't standard-compliant.

Of course it is -- under conditions of UB any behavior is
standard-compliant.

No. I'll expand on that below.

Actually you didn't. You simply tried to defend gets() by describing a
scenario outside the specification (hence under UB) that was
predictable in a way you've constructed which happens to coincide with
the optimistic things that the specification tries to describe in its
explanation of gets(). But you never removed the "UB cloud" which
covers the whole thing.
I did not say that "the C language does not contain variables". (If I
did, please cite the article in which I said that.) I said that the C
standard does not define the term "variable".

The relevant quote:

" [...] It is not obvious what the word "variable" should mean in the
context of C. [...]"

And if you think that quote is out of context, you can look it up for
yourself and see the follow-up with a half dozen examples of things
in C where it supposedly can't be decided whether or not something is a
variable.
The semantics of function calling does require some sort of structure
that behaves in a stack-like manner (last-in first-out). On the other
hand, the term "stack" is also commonly used to refer to a particular
data structure implemented in hardware, where a CPU register is
dedicated as a "stack pointer", and the stack grows and shrinks
through contiguous memory addresses. This kind of "stack" is not
required or implied by the C standard, and there are implementations
that don't have such a "stack"; the data storage required for the
local objects created by a function call is allocated by something
similar to malloc(), and released by something similar to free().
Referring to "the stack" on such a system would be misleading.

What the hell are you talking about? If you think "the stack" means a
hardware stack, it's because of something in your mind. We all
understand that the C specification is implemented on an abstract
machine, and that's where its "stack" is. If someone is conflating
this stack with a particular stack implementation (such as Sparc's
register window mechanism, or Itanium's register block stack thingy),
it's no different from the people who post with gcc-specific extensions
(like an extra envp parameter in main) which happens here all the time.

And of course in this case the conflation is usually harmless since it's
a very rare thing for someone to use an *extension* or
platform-specific feature of a hardware stack in real world code. You
usually use it exactly in the same way you use it in its abstract form
-- you push and pop to it. Compilers may play games with hardware
stacks; general programmers (even hard-core low-level programmers like
myself) usually do not.

So insisting that C has no stack because the specification doesn't say
that it does is just silly. This is why confining discussion of C only
to the language in the specification is idiotic.
[...]
So regardless of what the spec says under the heading of gets(), the UB
is inescapable from within the spec, which means the behavior described
in the spec is irrelevant since UB can invoke any behavior.

Ok, getting back to gets().

You've encouraged me to do something I don't believe I've ever done
here. I'm going to defend gets().

I know your mind doesn't work very "flexibly" at all but I'll give
it a shot -- replace your bad gets() program with another program,
which say, performs a simple buffer overflow:

char digs[5];
sprintf (digs, "%d", (int) val);

Ok, then continue to apply the reasoning and statements you just made
with your gets() program, but in the obvious analogous way. Ok, so
here are the statements you made which apply equally to a program
which contains the above:

1) " ... But suppose I enter a 10-character line. The C standard
guarantees that it will work properly."

-- Similarly, if we make val small enough here, it will work
properly.

2) "If I enter a 300-character line while running this program, I get
undefined behavior. The consequences would be entirely my own fault.
[...] But suppose I enter a 10-character line. The C standard
guarantees that it will work properly."

-- Similarly if I make val a 5+ digit integer, the program that
includes the above will have UB. But if I make val a 4 digit
positive number, or 3 digit negative number, it will work
just fine.

The UB we get from overrunning digs[] here obviously can lead to
arbitrary action since it will smash any adjacent declarations
including possibly volatiles, sig_atomic_t or whatever. Same with your
gets() program. So both programs occupy the same space of what's the
worst that can go wrong. Either program could easily format your hard
drive with the right set of circumstances.

So we see the analogy is a pretty close fit, and because of that we
usually look at code such as the above very skeptically. In other
words your argument about gets() hasn't specifically bolstered gets() in
any way that doesn't also bolster the code above. Let me repeat --
your *argument* doesn't significantly distinguish gets() from the code
snippet above in the context we are in.

Where the analogy falls down, however, is that the above code can be
made to work solely through mechanisms inside the program itself. If I
have some way of guaranteeing that val is between -999 and 9999 solely
through mechanisms inside the program itself, then everything is fine.
I would be using things *IN THE C STANDARD* to make sure that the
semantics of that code remained compliant. The key point is that I do
not need to venture outside the system/program or invoke platform
specific behavior to guarantee that code and bring it within spec.
I.e., the semantic correctness is guaranteed, essentially by other
contents from the standard itself. I.e., the code above is actually
correct within certain assumptions, and those assumptions can be
enforced by nothing more than the standard itself. The potential for
UB is *eliminated* from within usage of the specification itself.
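
A minimal sketch of what enforcing that assumption from inside the program could look like. The helper name format_small is invented for illustration. The range test is the whole point: "-999" and "9999" are each four characters, so with the terminating '\0' they exactly fill the five-byte buffer.

```c
#include <stdio.h>

/* Hypothetical helper: writes val into digs only when the decimal text,
   including the terminating '\0', is known to fit in 5 bytes.
   Returns 1 if the value was formatted, 0 if it was rejected. */
int format_small(char digs[5], int val)
{
    if (val < -999 || val > 9999)
        return 0;               /* would need more than 4 characters */
    sprintf(digs, "%d", val);   /* at most 4 characters plus '\0' */
    return 1;
}
```

Nothing comparable can be written around gets(), because the length of the incoming line is not a value the program gets to inspect before the call.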

Your gets() program cannot be similarly fixed, or similarly rely on
analogous guarantees. In order to make gets() work according to what
the standard is optimistically hoping for you *MUST* step outside of
the standard. Thus the standard is trying to specify something that
specifically needs something that isn't (and can't realistically be
put) in the standard. BTW, what does UB mean? Doesn't it mean
arbitrary behavior outside the specification? So trying to format your
hard drive (perhaps successfully, perhaps not) because you stepped
outside the spec

Your argument fails to make this distinction (can you see this?) and by
implication misses the whole point.
Having said that, if I were writing such a quick-and-dirty test
program in real life I *still* wouldn't use gets().

And in this case, it's not because of any typically wrong reasoning on
your part. You are actually behaving correctly. As would any
programmer that behaved this way. So why is this being specified? The
rationale is not convincing, and in fact is clearly meant as
subterfuge.
[...] I'd use fgets()
and remove the trailing '\n'. (There would always be one because,
again, I wouldn't feed very long lines to the program; if I
accidentally did so, the program would misbehave, but in a benign and
predictable manner.)

So you've traded one bad behavior for another? ... Whatever, that's
another discussion entirely. You won't get UB with this strategy (just get
wrong results, but predictably so.) The \n can also be omitted if EOF
is encountered without a \n just before it, btw. A \n can also
*appear* to be omitted if a \0 is consumed before a \n is, and you are
just using C's char * string semantics on the results.
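
To make those cases concrete, here is a rough sketch of an fgets() wrapper that tells a complete line apart from a truncated one and accepts a final line with no '\n'. The names read_line_checked and line_status are invented for illustration. It does not handle an embedded '\0', since strlen() stops at the first one.

```c
#include <stdio.h>
#include <string.h>

enum line_status { LINE_OK, LINE_TRUNCATED, LINE_EOF };

/* Hypothetical helper: reads one line into buf without overrunning it.
   LINE_OK        - a whole line is in buf, newline stripped
   LINE_TRUNCATED - the line was too long; the excess was discarded
   LINE_EOF       - nothing was read (end of file or read error)
   Assumes size fits in an int and the input has no embedded '\0'. */
enum line_status read_line_checked(char *buf, size_t size, FILE *fp)
{
    if (size == 0 || fgets(buf, (int)size, fp) == NULL)
        return LINE_EOF;

    size_t len = strlen(buf);
    if (len > 0 && buf[len - 1] == '\n') {
        buf[len - 1] = '\0';
        return LINE_OK;
    }
    if (feof(fp))
        return LINE_OK;          /* last line had no trailing newline */

    /* The buffer filled before a newline was seen: skip to end of line
       and note whether any real characters were thrown away. */
    int c, dropped = 0;
    while ((c = getc(fp)) != EOF && c != '\n')
        dropped = 1;
    return dropped ? LINE_TRUNCATED : LINE_OK;
}
```

The caller still has to decide what a truncated line means for the program, but the outcome is reported rather than silently producing wrong results.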
[...] gets() should not be used, and it should be
removed from the standard, or at least formally deprecated.
Implementations should warn about any calls to gets().

So what are you defending?
But *if* I use it in a manner whose behavior is guaranteed by the
standard, I have every right to expect it to behave as the standard
specifies.

Ok, but the standard *CANNOT* specify that guarantee. It makes a
"chicken before the egg" kind of specification about how gets() works.
It basically says *IF* the call to gets() doesn't invoke UB, then it
reflects some kind of stdin input. But that *IF* cannot be satisfied
by any content in the standard at all. Are you following? Therefore
the standard is not *specifying* a way for gets() to behave in the
optimistic way they are hoping it does.
 
Keith Thompson

Keith said:
Philip Potter wrote: [...]
Except that yours isn't standard-compliant.

Of course it is -- under conditions of UB any behavior is
standard-compliant.

No. I'll expand on that below.

Actually you didn't. You simply tried to defend gets() by describing a
scenario outside the specification (hence under UB) that was
predictable in a way you've constructed which happens to coincide with
the optimistic things that the specification tries to describe in its
explanation of gets(). But you never removed the "UB cloud" which
covers the whole thing.
I did not say that "the C language does not contain variables". (If I
did, please cite the article in which I said that.) I said that the C
standard does not define the term "variable".

The relevant quote:

" [...] It is not obvious what the word "variable" should mean in the
context of C. [...]"

And if you think that quote is out of context, you can look it up for
yourself and see the follow-up with a half dozen examples of things
in C where it supposedly can't be decided whether or not something is a
variable.

Thank you for confirming that I did *not* say that "the C language
does not contain variables".
The semantics of function calling does require some sort of structure
that behaves in a stack-like manner (last-in first-out). On the other
hand, the term "stack" is also commonly used to refer to a particular
data structure implemented in hardware, where a CPU register is
dedicated as a "stack pointer", and the stack grows and shrinks
through contiguous memory addresses. This kind of "stack" is not
required or implied by the C standard, and there are implementations
that don't have such a "stack"; the data storage required for the
local objects created by a function call is allocated by something
similar to malloc(), and released by something similar to free().
Referring to "the stack" on such a system would be misleading.

What the hell are you talking about? If you think "the stack" means a
hardware stack, it's because of something in your mind. We all
understand that the C specification is implemented on an abstract
machine, and that's where its "stack" is. [snip]

So insisting that C has no stack because the specification doesn't say
that it does is just silly. This is why confining discussion of C only
to the language in the specification is idiotic.

In most implementations, local variables and other storage associated
with a called function are allocated on "the stack". My understanding
of the phrase "the stack" in this context is exactly the kind of
hardware-based stack I discussed above, something that is not
guaranteed by the standard. The word "the" implies something
specific.

If the phrase "the stack" doesn't carry that implication for you,
that's terrific, but I strongly suspect that it does for most people.
[...]
So regardless of what the spec says under the heading of gets(), the UB
is inescapable from within the spec, which means the behavior described
in the spec is irrelevant since UB can invoke any behavior.

Ok, getting back to gets().

You've encouraged me to do something I don't believe I've ever done
here. I'm going to defend gets().
[snip]

Your gets() program cannot be similarly fixed, or similarly rely on
analogous guarantees. In order to make gets() work according to what
the standard is optimistically hoping for you *MUST* step outside of
the standard. Thus the standard is trying to specify something that
specifically needs something that isn't (and can't realistically be
put) in the standard. BTW, what does UB mean? Doesn't it mean
arbitrary behavior outside the specification? So trying to format your
hard drive (perhaps successfully, perhaps not) because you stepped
outside the spec

Ok, there's no way within the standard to use gets() safely. Beyond
the question of whether it should be used in any circumstances, it
certainly shouldn't be used in code that's intended to be portable.
(The sample program I posted was not intended to be portable; it was
specifically designed to be used in tightly controlled conditions and
then discarded.)

Not all C code has to be portable. Most C code should be portable,
but most C *programs* are not; they depend on system-specific
features. fopen() can't be successfully called without a valid file
name, and there's no portable way (other than tmpnam()) to generate a
valid file name. (And yes, fopen() behaves in a well-defined manner
if you give it an invalid file name, which makes it more robust than
gets().)
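
For illustration, a minimal sketch of the fopen() contrast drawn above: a failed open is reported through a null return value, which the caller can test. The helper name `can_open_for_reading` and the path in the usage note are made up for this example and do not come from the thread.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper (not from the thread).
 * Returns 1 if `name` can be opened for reading, 0 otherwise.
 */
int can_open_for_reading(const char *name)
{
    FILE *f;

    assert(name != NULL);

    f = fopen(name, "r");
    if (f == NULL)
        return 0;   /* the open failed; fopen() reports this by returning NULL */

    fclose(f);
    return 1;
}
```

Calling it with a name that does not exist, say `can_open_for_reading("/no/such/directory/file.txt")`, is expected to return 0 on typical systems. Which names are valid remains system-specific, but the failure itself is something the caller can detect, unlike an overrun in gets().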
Your argument fails to make this distinction (can you see this?) and by
implication misses the whole point.

I didn't miss the point. I made a different point.
And in this case, it's not because of any typically wrong reasoning on
your part. You are actually behaving correctly. As would any
programmer that behaved this way. So why is this being specified? The
rationale is not convincing, and in fact is clearly meant as
subterfuge.

A subterfuge? Do you think that the ISO C committee keeps gets() in
the standard for malicious purposes? What is their motivation?

[...]
[...] gets() should not be used, and it should be
removed from the standard, or at least formally deprecated.
Implementations should warn about any calls to gets().

So what are you defending?

Just this: Given that gets() is defined by the standard, a conforming
implementation must implement it properly. gets() does not always
invoke undefined behavior. In those cases where it doesn't, it must
behave as specified.
Ok, but the standard *CANNOT* specify that guarantee. It makes a
"chicken before the egg" kind of specification about how gets() works.
It basically says *IF* the call to gets() doesn't invoke UB, then it
reflects some kind of stdin input.
Correct.

But that *IF* cannot be satisfied
by any content in the standard at all. Are you following? Therefore
the standard is not *specifying* a way for gets() to behave in the
optimistic way they are hoping it does.

The standard provides no portable way to use gets() safely.

There are *non-portable* ways to use gets() safely.

C is specifically designed to support both portable and non-portable
programming.
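
As a point of comparison, here is a minimal sketch of the bounded alternative that the standard library does provide, fgets(), which takes the buffer size that gets() has no way to receive. The helper name `read_line` and its return codes are assumptions made for this example, and it assumes `size` fits in an int.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not from the thread): read one line from `in`
 * into buf, which holds `size` bytes.
 *
 * Returns  1 if a line was stored (trailing newline stripped),
 *          0 if the buffer filled up before a newline was seen,
 *            so the rest of the line (if any) is still unread,
 *         -1 on end-of-file or read error with nothing stored.
 */
int read_line(FILE *in, char *buf, size_t size)
{
    size_t len;

    assert(in != NULL && buf != NULL && size > 1);

    /* fgets() never writes more than `size` bytes, terminator included. */
    if (fgets(buf, (int)size, in) == NULL)
        return -1;

    len = strlen(buf);
    if (len > 0 && buf[len - 1] == '\n') {
        buf[len - 1] = '\0';
        return 1;
    }

    if (len == size - 1)
        return 0;   /* buffer full, no newline: line may continue */

    return 1;       /* last line of input, no trailing newline */
}
```

The caller passes `sizeof buf` for an array, for example `read_line(stdin, buf, sizeof buf)`, and decides what to do when 0 comes back. That decision is exactly what gets() cannot offer.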
 
T

Tak-Shing Chan

In order to make gets() work according to what
the standard is optimistically hoping for you *MUST* step outside of
the standard. Thus the standard is trying to specify something that
specifically needs something that isn't (and can't realistically be
put) in the standard. BTW, what does UB mean? Doesn't it mean
arbitrary behavior outside the specification? So trying to format your
hard drive (perhaps successfully, perhaps not) because you stepped
outside the spec

I believe that gets() works as follows. I simply cannot see
how you could apply the as-if rule to ``optimize'' this code into
unconditional arbitrary behavior.

/*
 * 7.19.7.7 The gets function
 *
 * Implemented by Tak-Shing Chan
 */

#include <stdio.h>

char *
gets(char *s)
{
    int c;
    char *itaptbs = s;

    /*
     * 7.19.7.7 paragraph 2
     *
     * The gets function reads characters from the input
     * stream pointed to by stdin, into the array pointed to
     * by s, until end-of-file is encountered or a new-line
     * character is read.
     */
    while (!((c = getchar()) == EOF || c == '\n'))
        *itaptbs++ = c;

    /*
     * 7.19.7.7 paragraph 3
     *
     * If end-of-file is encountered and no characters have
     * been read into the array, the contents of the array
     * remain unchanged and a null pointer is returned. If
     * a read error occurs during the operation, the array
     * contents are indeterminate and a null pointer is
     * returned.
     */
    if (c == EOF && (itaptbs == s || ferror(stdin)))
        return NULL;

    /*
     * 7.19.7.7 paragraph 2
     *
     * Any new-line character is discarded, and a null
     * character is written immediately after the last
     * character read into the array.
     */
    *itaptbs = 0;

    /*
     * 7.19.7.7 paragraph 3
     *
     * The gets function returns s if successful.
     */
    return s;
}

Tak-Shing
 
W

websnarf

Tak-Shing Chan said:
I believe that gets() works as follows. I simply cannot see
how you could apply the as-if rule to ``optimize'' this code into
unconditional arbitrary behavior.

This is because you are doing a literal translation of what they are
saying without taking anything to a logical conclusion. This has
nothing to do with optimization. "As-if" also has little meaning once
UB is encountered -- every behaviour is "as-if" once you enact UB. All
that needs to be established is that there is a UB here.
/*
 * 7.19.7.7 The gets function
 *
 * Implemented by Tak-Shing Chan
 */

#include <stdio.h>

char *
gets(char *s)
{
    int c;
    char *itaptbs = s;

    /*
     * 7.19.7.7 paragraph 2
     *
     * The gets function reads characters from the input
     * stream pointed to by stdin, into the array pointed to
     * by s, until end-of-file is encountered or a new-line
     * character is read.
     */
    while (!((c = getchar()) == EOF || c == '\n'))
        *itaptbs++ = c;

This last line causes an unfixable and unaddressable UB. The fact that
this is not stated in the specification does not change it from being
so. Because of that, the code can in fact, undo the stream state, send
the characters back, send the state of s into anything it likes, then
proceed to format your hard drive. In fact it can do anything, and a
programmer cannot have any expectation that anything less happens,
except in platform-specific scenarios that are not covered by the
specification.
 
T

Tak-Shing Chan

This is because you are doing a literal translation of what they are
saying without taking anything to a logical conclusion. This has
nothing to do with optimization. "As-if" also has little meaning once
UB is encountered -- every behaviour is "as-if" once you enact UB. All
that needs to be established is that there is a UB here.

UB would only occur if the input really exceeds the size of
the array pointed to by s.
This last line causes an unfixable and unaddressable UB. The fact that
this is not stated in the specification does not change it from being
so. Because of that, the code can in fact, undo the stream state, send
the characters back, send the state of s into anything it likes, then
proceed to format your hard drive. In fact it can do anything, and a
programmer cannot have any expectation that anything less happens,
except in platform-specific scenarios that are not covered by the
specification.

It is not UB if the array pointed to by s is large enough for
the input.

Tak-Shing
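
To make the condition under dispute concrete, here is a sketch of the same read loop with the one piece of information gets() lacks, the size of the array. The name `bounded_gets`, the extra parameters, and the choice to discard the excess and return NULL are all assumptions made for this example. This is not a standard function.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical bounded variant of the loop above (not a standard
 * function). Reads one line from `in` into s, which holds `size` bytes.
 *
 * Returns s if the whole line fit. Returns NULL on end-of-file with
 * nothing read, on a read error, or when the line did not fit; in the
 * last case s holds the part that fit and the rest of the line is
 * discarded.
 */
char *bounded_gets(FILE *in, char *s, size_t size)
{
    int c;
    size_t n = 0;
    int too_long = 0;

    assert(in != NULL && s != NULL && size > 0);

    while ((c = getc(in)) != EOF && c != '\n') {
        if (n + 1 < size)
            s[n++] = (char)c;   /* room for this char plus the terminator */
        else
            too_long = 1;       /* gets() would write past the array here */
    }

    if (c == EOF && (n == 0 || ferror(in)))
        return NULL;

    s[n] = '\0';
    return too_long ? NULL : s;
}
```

The `too_long` branch marks the only place where the unbounded loop leaves defined behavior. For input that never reaches that branch, the two loops store the same characters.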
 
W

websnarf

Keith said:
Keith said:
(e-mail address removed) writes:
Philip Potter wrote:
Or as
other people keep claiming -- that C doesn't have a stack (if function
calls and returns are always bracketed like pushes and pops, do we not
have a stack? In fact a "call" stack?).

The semantics of function calling does require some sort of structure
that behaves in a stack-like manner (last-in first-out). On the other
hand, the term "stack" is also commonly used to refer to a particular
data structure implemented in hardware, where a CPU register is
dedicated as a "stack pointer", and the stack grows and shrinks
through contiguous memory addresses. This kind of "stack" is not
required or implied by the C standard, and there are implementations
that don't have such a "stack"; the data storage required for the
local objects created by a function call is allocated by something
similar to malloc(), and released by something similar to free().
Referring to "the stack" on such a system would be misleading.

What the hell are you talking about? If you think "the stack" means a
hardware stack, it's because of something in your mind. We all
understand that the C specification is implemented on an abstract
machine, and that's where its "stack" is. [snip]

So insisting that C has no stack because the specification doesn't say
that it does is just silly. This is why confining discussion of C only
to the language in the specification is idiotic.

In most implementations, local variables and other storage associated
with a called function are allocated on "the stack".

Really? Most implementations I know of actually throw these things
into registers first. Many even throw return addresses into "link
registers". Even on the x86 (a very popular platform), there are at
least *two* stacks (one for floating point, and one for the rest). We
must inhabit different planes of existence.
[...] My understanding
of the phrase "the stack" in this context is exactly the kind of
hardware-based stack I discussed above, something that is not
guaranteed by the standard. The word "the" implies something
specific.

If the phrase "the stack" doesn't carry that implication for you,
that's terrific, but I strongly suspect that it does for most people.

You think most people know assembly language? You really do live in a
bizarre fantasy world.
[...]
So regardless of what the spec says under the heading of gets(), the UB
is inescapable from within the spec, which means the behavior described
in the spec is irrelevant since UB can invoke any behavior.

Ok, getting back to gets().

You've encouraged me to do something I don't believe I've ever done
here. I'm going to defend gets().
[snip]

Your gets() program cannot be similarly fixed, or similarly rely on
analogous guarantees. In order to make gets() work according to what
the standard is optimistically hoping for you *MUST* step outside of
the standard. Thus the standard is trying to specify something that
specifically needs something that isn't (and can't realistically be
put) in the standard. BTW, what does UB mean? Doesn't it mean
arbitrary behavior outside the specification? So trying to format your
hard drive (perhaps successfully, perhaps not) because you stepped
outside the spec

Ok, there's no way within the standard to use gets() safely. Beyond
the question of whether it should be used in any circumstances, it
certainly shouldn't be used in code that's intended to be portable.

portable?!?! What? What has that got to do with anything? It fails
in *every* system. *Every* time it's put into a program it's wrong.
Only in systems where the input is redirected *and* the system does not
support multitasking can you even build a credible case for a well
defined scenario where it can satisfy the committee's fantasies about
how gets() is supposed to behave. Even there, you are relying on
specific platform behavior.
(The sample program I posted was not intended to be portable;

It is if you ignore the UB -- which of course you are. It's only not
portable because UB is not portable. Portability just isn't the issue.
Every platform must fail except by extraordinary intervention (that
can't realistically be called programming).
[...] it was
specifically designed to be used in tightly controlled conditions and
then discarded.)

I thought it was designed for you to post and make a point. If you
actually used it for any reason, besides contradicting earlier
statements you made, it would just be irresponsible.
Not all C code has to be portable. Most C code should be portable,
but most C *programs* are not; they depend on system-specific
features. fopen() can't be successfully called without a valid file
name, and there's no portable way (other than tmpnam()) to generate a
valid file name.

You are confusing platform specific with undefined behavior. Calls to
fopen(), and system() can't be made portable. This is well understood.
This has nothing to do with the situation with gets().
[...] (And yes, fopen() behaves in a well-defined manner
if you give it an invalid file name, which makes it more robust than
gets().)

It makes it well defined. As opposed to gets().
I didn't miss the point. I made a different point.

There's no point in there. You can't use non-portability as a
protection for gets(), and that clearly was not the point you were
making.
A subterfuge? Do you think that the ISO C committee keeps gets() in
the standard for malicious purposes? What is their motivation?

I have no idea *WHY* they do things like that. I just know that they
did it. I mean we *KNOW* that the committee is aware of what the issue
is. But they have gone on record to say that that doesn't matter to them
and they are leaving it in, and they've created a "doublespeak" kind of
rationale for their behavior.
[...]
[...] gets() should not be used, and it should be
removed from the standard, or at least formally deprecated.
Implementations should warn about any calls to gets().

So what are you defending?

Just this: Given that gets() is defined by the standard, a conforming
implementation must implement it properly. gets() does not always
invoke undefined behavior. In those cases where it doesn't, it must
behave as specified.

You're a broken record. I have asked and you have not explained the
difference between undefined behavior and sometimes undefined behavior.
Literally you gave an example of a platform and environment specific
way of making the undefined behavior emit some sort of predictable
results. But that's generally exactly the case for every other kind of
UB that you can create as well. So you have not made a distinction,
and thus have not made the case. There is a built-in contradiction of
language in the specification -- they just omit the blatant expression
of that contradiction, even though they cannot excise it from real
manifestations.
The standard provides no portable way to use gets() safely.

It provides *NO* way to use gets() safely. Portable or not.
There are *non-portable* ways to use gets() safely.

There are non-portable ways of making every UB safe. *EVERY*. That's
an irrelevant tautology.
C is specifically designed to support both portable and non-portable
programming.

It was *supposed* to be designed to be well defined, regardless of
portability. They specified gets() obviously -- so you have to
reinterpret the spec to realize that gets() always invokes UB, to retain
this well definedness property. Your portability argument is just a
red herring.
 
