Why does GCC warn me when I use the gets() function for accessing a file?

websnarf

Keith said:
No, gets() should not be assumed to always invoke undefined behavior,
because it *doesn't* always invoke undefined behavior.

How is "doesn't have to be UB" distinct from "always UB"? The
distinction in this case is outside of the
specification/programmer/language's control. But that's basically the
same situation for pretty much *ALL* UB.

Look, if I manage a pointer with char * type, and only store/read from
it with a cast to (int *) you are going to say that what I am doing is
poorly defined, even though it *doesn't* always invoke undefined
behavior. Whatever -- the fact that it sometimes works just fine
doesn't help -- it can do bad things that can only be soundly fixed by
recoding it to behave in another way. gets() is in exactly the same
situation.
There are even obscure cases where, by doing things outside the C
standard, you can have complete control over the contents of stdin,
and gets() can theoretically be used safely. (I still wouldn't use it
myself, but it's theoretically possible.)

This is often true for other kinds of UB as well. Why
are you so energized to protect gets() as distinct from them?
"At all costs"? Get a grip, will you?

I can't make the OP give up on computer programming. I can make James
Dow Allen give up on computer programming (and that wouldn't even
necessarily be a good thing). People use gets() and will continue to
do so. There is no natural way to make them stop. Yet it's wrong on
its face -- and they must be stopped. What would you recommend? My
proposal is pretty compelling and sort of solves the problem. I admit
it's a bit like getting rabies shots, but the alternative is to simply
not have the cure available.
For the Nth time, I agree that gets() should not be in the language
and nobody should use it, but making it more dangerous than it already
is is not the answer.

Who is proposing to make it more dangerous? The source I gave should
be fairly safe.

The fact that you don't want it in the standard, and I don't want it in
the standard doesn't matter -- the standards committee continues to
endorse its presence and usage. So agreeing with me on this point is
like agreeing that rape is bad -- it's of little consolation to the
victims.
[...] -- and what you're proposing cannot be done in a conforming implementation.

What are you talking about? So long as the UB is always there, it
satisfies the specification. In the presence of UB, it can do anything
it wants.
If you want to have a non-conforming C implementation that discourages
gets(), just reject any program that calls it. If you want to have a
*conforming* C implementation that discourages gets(), issue a warning
(as gcc does).

It's actually the linker which is issuing the warning. I.e., in order
to "conform" with C, the gcc people just go ahead and do the bad thing
in their compiler like everyone else does (*sigh*). The C
specification does not say what should be done in the linker, so they
take the bold step of crossing the line at the linker stage. This is
unfortunate, as *OTHER* languages may have a different specification
that puts implicit limits on buffer sizes or uses some other mechanism
which makes it possible for them to use gets() safely -- and thus these
other languages are forced to work around this warning. It's probably
not a big deal, since you can easily write another gets(), but you can
see the point that the gcc people have pushed back to a completely
different position in order to be both conforming and at least prod the
developers a *little* bit, and they are willing to take a slight
theoretical system-wide hit for it.
[...] If you think that having an implementation
deliberately and maliciously remove a user's files is a good idea, it
calls into question the safety of any software you've ever written.

First of all, in most systems and typical usages gets() already does
this. I.e., I am not adding anything that isn't already in there. My
suggested implementation merely makes it more predictable.
Mind you, I don't believe you really think so. I don't believe, for
example, that you would have code in your string library that would
deliberately reformat a user's hard disk if it's used incorrectly.

My string library doesn't contain any functions with these sorts of
problems (that I know of.) I am open to fixing the library whenever
anomalies of design or implementation are pointed out. My library goes
to great lengths to eliminate unpredictable UB.

This is a completely different situation from gets(). The ANSI C
committee has openly declared hostile intent towards the software
industry by putting their stamp of approval on this function. They
even go so far as to put deceptive language in the standard in an
attempt to demonstrate they've addressed the problem of potential bad
uses of gets().
But it would be consistent with the arguments you're making here.

Yeah, you can just go ahead and stop trying to "analyze my arguments"
any time you feel like. I haven't been impressed in the past, and I am
not likely to now or in the future.
 
Frederick Gotham

websnarf posted:
How is "doesn't have to be UB" distinct from "always UB"?


The two concepts are equivalent as far as the Standard is concerned. The
Standard clearly specifies what the following program must do:

#include <stdio.h>

int main(void)
{
    puts("Hello World!");
}

However, the Standard gives the implementation the freedom to implement the
following program however it pleases. No matter what the resultant program
does, the implementation is still conformant:

#include <stdio.h>

int main(void)
{
    unsigned i;

    puts((char const*)i);
}

Look, if I manage a pointer with char * type, and only store/read from
it with a cast to (int *) you are going to say that what I am doing is
poorly defined, even though it *doesn't* always invoke undefined
behavior.


The implementation however is free to do whatever it likes and still be
conformant.
 
websnarf

Eric said:
Nonsense. You might as well claim that attempting to call
any function at all must be assumed to yield undefined behavior,
because it might exceed an implementation limit.

Which limit does calling, say, rand() from main() exceed?
[...] It is true that
gets() is impossible to use safely, but that's a far cry from
saying that it must always misbehave.

It's not a far cry for Robert Morris. The long-term success of his
worm relied on the fact that there is very little distinction between
those in practice. Go read the history of the Morris worm. It's not
just that it used a simple exploit in the finger daemon. It's that the system
developers at multiple sites spent an enormous amount of energy
*afterwards* trying to decide what they should do about the situation.
Today? We see that very few *NIX installations have finger daemons
running at all -- they couldn't get gets() out of the standard, they
couldn't take it out of their compiler, and they couldn't make people
stop writing trojan horses. Removing functionality became the long-term
solution.

In fact, nearly all exploits rely on the fact that programmers like you have
that kind of attitude.
[...] You cannot cross the street
with certainty of safety, but it does not follow that you will
be run down and killed every time you try.

The analogy doesn't fly. You are in complete control of the risk and
success of crossing the street so long as everything else in the
system is functioning properly (just like the rest of the well defined
portions of C). Using gets() is more analogous to a blind and deaf man
crossing a highway which has no speed limits.
The Standard describes what gets() does, and your silly botch [...]
does not fulfil the Standard's requirements. Period.

So long as it's covered by the built-in UB that is in gets(), there is
no issue with my implementation.
 
jacob navia

Frederick Gotham wrote:
websnarf posted:

The two concepts are equivalent as far as the Standard is concerned. The
Standard clearly specifies what the following program must do:

#include <stdio.h>

int main(void)
{
    puts("Hello World!");
}

However, the Standard gives the implementation the freedom to implement the
following program however it pleases. No matter what the resultant program
does, the implementation is still conformant:

#include <stdio.h>

int main(void)
{
    unsigned i;

    puts((char const*)i);
}

The implementation however is free to do whatever it likes and still be
conformant.

The only thing that you said is:

IF IT IS WRITTEN IN THE STANDARD, IT IS OK. IF IT IS NOT, IT IS NOT OK.

So, you start defending absurdities like gets().

Why?

Because it is in the standard.

This is the quintessence of DOGMATIC.

From dictionary.com:
DOGMATIC
asserting opinions in a doctrinaire or arrogant manner; opinionated
http://dictionary.reference.com/search?q=dogmatic&x=0&y=0
 
Racaille

It's not a far cry for Robert Morris. The long-term success of his
worm relied on the fact that there is very little distinction between
those in practice. Go read the history of the Morris worm. It's not
just that it used a simple exploit in the finger daemon. It's that the system
developers at multiple sites spent an enormous amount of energy
*afterwards* trying to decide what they should do about the situation.
Today? We see that very few *NIX installations have finger daemons
running at all -- they couldn't get gets() out of the standard, they
couldn't take it out of their compiler, and they couldn't make people
stop writing trojan horses. Removing functionality became the long-term
solution.

Is it because of gets() that people don't run finger daemons?

I learn something new every day.
 
Mark McIntyre

You mean a function along the lines of

char *gets(char *buf,size_t bufsiz);

would NOT do the same with a more rational interface?

It might, but it would break pre-existing code. The standards body has
never, to my knowledge, changed function signatures so as to break
existing code.
You think a single extra argument would make such a difference?

Simpler just to drop the function entirely.
For instance, using the code published in the C standard
for asctime() you get a buffer overflow if there is
the slightest problem with the user's input.

The 'slightest' problem? Hyperbole. ISTR you can get overflow if the
year is too large, but not otherwise.
The standard does NOT specify the limits of the acceptable inputs.

When discussing this, the committee answered that it doesn't matter and
that they will not fix it.

I'd be fascinated to see actual evidence of that statement. I
strongly suspect that what was actually said was something along the
lines of:
The code published is not intended to be a rigorous implementation of
asctime() but merely an aid to understanding how it works. The purpose is
not to demonstrate error handling techniques and we do not consider it
relevant to add such code which would only serve to complicate and
obfuscate the fragments.
[...] that treated us as "anti-gets fanatics" and stubbornly defended
gets(), as he defended trigraphs, and all other obsolete
stuff that pollutes the language.

Again you ruin your argument by bringing in something irrelevant. And
it seems you think "All the World's a 386 with a US keyboard". Which
is quite astounding from a francophone. I've used many keyboards which
had no keystroke for common C symbols. And even if not, so what?
Nobody's forcing you to use trigraphs.
To submit a proposal, I was told by the French standardization committee
that I would need to pay at least 10 000 euros. Just to submit the
proposal.

That's how it works on many national Standards committees (and other
bodies for that matter, never joined a trades union or debating club?)
Interested parties have to pay their subs.
Then I would have to go to their meetings paying all travel expenses.

What, you expected them to pay you ? ROFL. This isn't the EU you
know....

--
Mark McIntyre

"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it."
--Brian Kernighan
 
Spiros Bousbouras

Mark said:
That's how it works on many national Standards committees (and other
bodies for that matter, never joined a trades union or debating club?)
Interested parties have to pay their subs.


What, you expected them to pay you ? ROFL. This isn't the EU you
know....

France is a member of EU.
 
websnarf

Racaille said:
Is it because of gets() that people don't run finger daemons?

People started turning it off by default in many instances citing the
Morris Worm. The fact that most fingerd's were fixed could not repair
its reputation. So fingerd went from on by default to off by default,
until eventually people today barely remember what fingerd was.
 
goose

the standard does not require this behaviour. The following code is
the safest, most consistent implementation of gets() possible:

#include <stdio.h>
char * gets_fixed (char * buf, const char * sourcefile) {
remove (sourcefile);

Instead of predictable but malicious UB, why not a
predictable (but non-malicious) side-effect?

fprintf (stderr, "WARNING: bug detected, please contact vendor\n");
return "Attempted callsite source file removal for calling gets()";
}

/* This should appear in stdio.h somewhere */
#undef gets
#define gets(x) gets_fixed ((x), __FILE__)
<snipped>
 
Keith Thompson

Mark McIntyre said:
I'd be fascinated to see actual evidence of that statement. I
strongly suspect that what was actually said was something along the
lines of:
The code published is not intended to be a rigorous implementation of
asctime() but merely an aid to understanding how it works. The purpose is
not to demonstrate error handling techniques and we do not consider it
relevant to add such code which would only serve to complicate and
obfuscate the fragments.

Unfortunately, I doubt that.

The standard (C99 7.23.3.1) says:

The asctime function converts the broken-down time in the
structure pointed to by timeptr into a string in the form

Sun Sep 16 01:03:52 1973\n\0

using the equivalent of the following algorithm.

char *asctime(const struct tm *timeptr)
{
[snip]
}

I believe an implementation could add error checking for cases where
the version provided in the standard invokes undefined behavior, but
any suggestion that this is encouraged is not supported by the
standard.
 
websnarf

goose said:
Instead of predictable but malicious UB, why not a
predictable (but non-malicious) side-effect?

fprintf (stderr, "WARNING: bug detected, please contact vendor\n");

UB is UB, technically you can do what you want. But in standard
negotiations you always present your most optimistic demands first.
You find common ground when *both* sides start making concessions.
AFAICT, I don't think the ANSI C committee is interested in discussion
on this point, let alone concessions.
 
Barry Schwarz

This function is dangerous because there is no way you can pass
it the size of the given buffer.

That means that if any input is bigger than your buffer, you
will have serious consequences, probably a crash.

Only if you are lucky. The bad thing about undefined behavior is that
it can lead to other problems with more serious consequences than
crashing your program.


 
goose

UB is UB, technically you can do what you want. But in standard
negotiations you always present your most optimistic demands first.
You find common ground when *both* sides start making concessions.
AFAICT, I don't think the ANSI C committee is interested in discussion
on this point, let alone concessions.

Seeing as how I neither frequent comp.std.c nor
maintain a compiler, what exactly were the
objections the standards people put forward to
the removal of gets? Would they be prepared to
massage the wording (wrt gets) so that the
"fprintf (stderr..." above in gets won't be
non-conformant?
 
Andrew Poelstra

Who is proposing to make it more dangerous? The source I gave should
be fairly safe.

While
remove (__FILE__)
is perfectly safe, chances are that it'll give an error message or
something that you don't want, because __FILE__ might not expand
to a real filename when the program is run.

I, for one, often start writing programs in my code/scratch/
hierarchy, and move them to either code/tools/ or code/libs/,
depending on how general my solution is.

If I ran a program with gets() in it, I might get a "file not found"
error message or something, which would distract from the real "Don't
use gets()!" message.

In the version of your code that I implemented:
o I changed gets_fixed() to strgetsgets() to stay out of the user namespace.
o I had it print out a mildly insulting warning message.
o Instead of deleting the file, I simply killed the program.
 
jacob navia

Mark McIntyre wrote:
I'd be fascinated to see actual evidence of that statement. I
strongly suspect that what was actually said was something along the
lines of:
The code published is not intended to be a rigorous implementation of
asctime() but merely an aid to understanding how it works. The purpose is
not to demonstrate error handling techniques and we do not consider it
relevant to add such code which would only serve to complicate and
obfuscate the fragments.

I am not the first one to point out this problem. In a “Defect Report”
submitted in 2000, Clive Feather proposed a fix. The committee's answer
was that if any of the members of the input argument was out
of range this was “undefined behavior”, and anything was permitted,
including corrupting memory.

The answer in full (quoted from the committee reports) is:

Defect Report #217
Submitter: Clive Feather (UK)
Submission Date: 2000-04-04
Reference Document: N/A
Version: 1.3
Date: 2001-09-18 15:51:36
Subject: asctime limits

Summary
The definition of the asctime function involves a sprintf call writing
into a buffer of size 26. This call will have undefined behavior if the
year being represented falls outside the range [-999, 9999]. Since
applications may have relied on the size of 26, this should not be
corrected by allowing the implementation to generate a longer string.
This is a defect because the specification is not self-consistent and
does not restrict the domain of the argument.
[snip]

Committee Response
From 7.1.4 paragraph 1:
If an argument to a function has an invalid value (such as a value
outside the domain of the function, or a pointer outside the address
space of the program, or a null pointer, or a pointer to non-modifiable
storage when the corresponding parameter is not const-qualified) or a
type (after promotion) not expected by a function with variable number
of arguments, the behavior is undefined.
Thus, asctime() may exhibit undefined behavior if any of the members of
timeptr produce undefined behavior in the sample algorithm (for example,
if the timeptr->tm_wday is outside the range 0 to 6 the function may
index beyond the end of an array).
As always, the range of undefined behavior permitted includes:
o Corrupting memory
o Aborting the program
o Range checking the argument and returning a failure indicator (e.g., a
  null pointer)
o Returning truncated results within the traditional 26 byte buffer.
There is no consensus to make the suggested change or any change along
this line.

-----------------------------------------------------

AS ALWAYS, THE RANGE OF UNDEFINED BEHAVIOR PERMITTED INCLUDES

CORRUPTING MEMORY.


Isn't this very clear?

Again you ruin your argument by bringing in something irrelevant. And
it seems you think "All the World's a 386 with a US keyboard". Which
is quite astounding from a francophone. I've used many keyboards which
had no keystroke for common C symbols. And even if not, so what?
Nobody's forcing you to use trigraphs.

Who cares about keyboards?

Why must such hardware details be in the standard?

And what if the screen does not exist and the user is blind, using
some Braille (blind people's alphabet)
output device? THAT would of course be more important!!!

But it is not in the standard.

Why must the standard talk about such problems?
That's how it works on many national Standards committees (and other
bodies for that matter, never joined a trades union or debating club?)
Interested parties have to pay their subs.

What, you expected them to pay you ? ROFL. This isn't the EU you
know....

I expected that I could submit a proposal and discuss it by email.
Yes, (maybe you know this) modern communications and this "internet" fad
sometimes make traveling obsolete.
 
Richard Bos

jacob navia said:
AS ALWAYS, THE RANGE OF UNDEFINED BEHAVIOR PERMITTED INCLUDES

Don't shout. It makes you look like a twit, even if you have a (minor)
point in the case of asctime().
Who cares about keyboards?

Why must such hardware details be in the standard?

And what if the screen does not exist and the user is blind, using some Braille (blind
people's alphabet) output device? THAT would of course be more important!!!

If a blind person uses a Braille reader he might well be grateful that
trigraphs exist, since standard Braille does not include the more exotic
codes such as { and #, and although there are extended versions which
do, these are neither as wide-spread as the standard version nor
standardised across the Latin-alphabet-using world.

Richard
 
Philip Potter

How is "doesn't have to be UB" distinct from "always UB"? The
distinction in this case is outside of the
specification/programmer/language's control. But that's basically the
same situation for pretty much *ALL* UB.

The standard says that signed integer overflow is UB; therefore, if I add 1 to an
int containing 32767, I may have invoked UB. However, if I know that on my
system int is 17 bits or more, I can guarantee I haven't. The size of an int
is outside the programmer/language/specification's control, so according to
your argument this is still UB, and my implementation is free to reformat my
hard drive instead. I don't think many people would agree with you on this.

Similarly, if I know that on my system stdin gets a \n character at least
every 20 characters, I can use gets() and guarantee no UB.
This is a completely different situation from gets(). The ANSI C
committee has openly declared hostile intent towards the software
industry by putting their stamp of approval on this function. They
even go so far as to put deceptive language in the standard in an
attempt to demonstrate they've addressed the problem of potential bad
uses of gets().

Is this true? Please tell me more - I'd be interested to hear.

Philip
 
Philip Potter

One other quibble: each occurrence of [0,RAND_MAX) should be [0, RAND_MAX].
RAND_MAX is the maximum possible output, not one-past; similarly, RANGE
should divide RAND_MAX+1 for uniformity.
Quoting the above page:
"Specifically the probability of choosing x in [(RAND_MAX % RANGE), RANGE)
is less than choosing x in [0, (RAND_MAX % RANGE))."

This seems to be your main problem with the solution:
int x = rand() % RANGE;
after you explicitly state that you're looking for a "good enough" RNG. For
RANGE much smaller than RAND_MAX, the difference in probability exists but
is negligible - something you completely fail to mention.

I specifically state that you require 1000 * (RAND_MAX / RANGE) samples
to be able to definitively detect the anomaly in the distribution.
Obviously if RANGE is small, that number may be high enough for it not
to be a problem.

So you do. My apologies.
This was added to the FAQ after I made mention of this on my website.

How is it relevant how something ended up in the FAQ?
First of all, the FAQ used to be much worse. Second of all, it's hard
to be accurate when you are incomplete.

"Two plus two is four. Two plus other numbers is outside the scope of this
sentence."
The FAQ should at least say
something like "accurate generation of finite uniform distributions is
beyond the scope of this FAQ".

You're probably right here. Question 13.15 gives references and directs
people to the sci.math.numerical-analysis list, but 13.16 doesn't.
Instead the FAQ just gives solutions
and ignores the analysis of those solutions.


The versions where I use a finite number of rand() calls to virtually
increase the range of rand() have the effect of changing RAND_MAX to
RAND_MAX**2 or RAND_MAX**3. Going back to the sample expression I
gave, we see that we are talking about 300 billion and 1x10**16
samples being required to detect the anomaly in the most extreme case.
So these are "good enough" on practical systems.

We have differing views of practical systems. Not many people will take the
300k samples you described earlier on, and those who do shouldn't be using
rand() anyway; they should be using a RNG for which they have specific
guarantees of randomness which suit their application.
Yeah but at this point we are talking about numbers where even large
super-computer problems cannot generate enough samples in reasonable
time.

Only for now. Code lasts longer than computers.
Besides, trying to operate with accuracies of better than 1ULP
in the C language, or using your computer's floating point support is
not something easily accomplished. I am just pointing out that my
solutions are running up against what your practical hard calculation
limits are anyway.

But it isn't. You can always simulate greater accuracy.

Philip
 
Richard Bos

Similarly, if I know that on my system stdin gets a \n character at least
every 20 characters, I can use gets() and guarantee no UB.

Yes, but there are no realistic situations where you _can_ know that.
You can't even be certain if you enter the text yourself; typos are easy
to make.
Is this true? Please tell me more - I'd be interested to hear.

"The ANSI C committee has openly declared hostile intent towards the
software industry"? No, of course it isn't true. It's a predictable and
rather idiotic rant with as little bearing on reality as a Harry Potter
book. The Standard committee have better things to do with their time
than trying to piss off the likes of Paul Hsieh. Nevertheless, gets()
should go, and backwards compatibility be hanged for this one.

Richard
 
