gets() is dead

G

Giorgos Keramidas

Any that still use my old code, anyway. :)

Yes, this warning is my fault.

Heh! I should have probably guessed...

I did go back in CVS history -- at least as much of it as was visible in
the current FreeBSD src repository -- and the warning appears to be
already there in the first import of libc by rgrimes/at/freebsd.org.

Now we now who we have to thank for it :)
 
R

Richard Heathfield

Giorgos Keramidas said:

I was only recently tempted to use library preloading and
override the standard gets() function with a function of my own, which
can be tuned to do one of the following:

1. Unconditionally convert any call to gets() to abort().

2. Unconditionally convert any call to gets() to an error message on
stderr, and a call to _exit().

3. Unconditionally convert any call to gets() to a warning message
(like the one printed by the C runtime library on FreeBSD).

4. Do nothing, but keep the 'standard' behavior of gets(), even if
the
result is not optimal performance-wise.

Here's my ideal gets() replacement (which, I believe, is conforming):

#include <stdio.h>

char *gets(char *s)
{
return NULL;
}

The Standard says that the gets function returns NULL if there is a read
error. Using gets to read data is a read error. Therefore, gets should
always return NULL.
 
G

Giorgos Keramidas

Giorgos Keramidas said:


Here's my ideal gets() replacement (which, I believe, is conforming):

#include <stdio.h>

char *gets(char *s)
{
return NULL;
}

The Standard says that the gets function returns NULL if there is a read
error. Using gets to read data is a read error. Therefore, gets should
always return NULL.

Haha. Touche' :)

This is a standard conforming implementation as far as the return value
is concerned, but I was aiming for a more "useful" (for some value of
"useful") override of gets(), which can be tuned at runtime to warn or
even abort.

- Giorgos
 
M

Malcolm McLean

Keith Thompson said:
<python>Yes it can.</python>

Now if you'd like to add something constructive to this discussion,
feel free to do so. I believe I've already explained why I disagree
with your assertion that most programmers cannot use fgets() safely
(or whatever it was that you asserted). If you'd like me to clarify
further why I disagree with you, you're going to have to be more
specific.
Your case seems to be that fgets() removes the UB from gets(), but is still
quite a simple function.
You say
"fgets() can be unsafe if you invoke it with invalid arguments, but
it's safe if the size argument actually reflects the size of the
buffer."
You are confusing "safe" with "doesn't show UB". Let's say I have function
that returns the integer value of a character string if the string is 4
digits in length or less, and the integer value of the first 4 digits if it
is over five digits. The compiler sees this as a perfectly legitimate
function. In fact we could construct an explanation why we might need such a
function. In everyday use, however, it is safety nightmare. Your measurement
is OK until it just happens to go over 9999.

fgets() is exactly that function, unless you check for the newline, reject
the line as a partial read if it is not present, and then consume the rest
of the input in case a partially-read line is treated as a valid input by
the next call. This is a real hassle, and most programmers simply don't do
it. Some even post wrappers to fgets() which make it impossible. I don't do
it myself, in non-safety critical code.
 
H

Hallvard B Furuseth

Keith said:
The gets() function is unique in the entire standard C library in that
there is no safe way to use it (without imposing requirements that are
outside the scope of the standard).

Shrug. It is on the other hand very non-unique in the C libarary in
that naive use leads to bugs, including unsafe programs. And in that
you need to understand that if you are going to write safe C programs.
And in that it would hopefully never have been defined that way today.
And in that features outside the standard C library also can compromise
it - like threads, and fork() vs. FILE*s.

And there is one other candidate for harebrained non-safety: The
system() function, which takes a single string which the OS/shell
parses instead of letting you pass multiple arguments.

(...)
No, I don't seriously believe that.

Ah, you were just stuffing an idiot idea into your "opponent"'s mouth,
then.
I think that some people value minor convenience over safety, and I
think that's a foolish attitude.

I think making an absolute principle out of a minor detail like that is
a foolish attitude.
I object to the language standard encouraging that attitude.

Yes, it would have been nice if they _added_ an ngets() function or
something.
Using gets() in a small throwaway program is not a hanging offense.

Well, I was commenting on Dave's notes there.
In my opinion, it's a bad habit. For myself, I prefer to avoid gets()
in all circumstances, so I don't *have* to decide whether some
particular program needs to be "safe".

I always *have* to decide whether some particular program needs to be
"safe" (at the time of writing). If it has to be, I may end up spending
a *lot* more time looking for security issues and other bugs than it it
doesn't. Thus I do not spend hours on a throw-away program I would
otherwise use half an hour on.
 
R

Richard Tobin

And there is one other candidate for harebrained non-safety: The
system() function, which takes a single string which the OS/shell
parses instead of letting you pass multiple arguments.
[/QUOTE]
What is unsafe about that? It is obviously system dependent.

It's not that it can't be safely used, but that it can easily be
misused in a simple way that results in a gaping security hole. For
example, on a Unix system:

/* assume filename and command are suitably declared */
printf("enter file name to copy\n");
fgets(filename, sizeof(filename), stdin);
/* fgets error checking elided */
sprintf(command, "cp %s /tmp", filename);
system(command);

and imagine the user types "/dev/null /dev/null; rm -rf /; echo"

The Perl equivalent of this is common in CGI scripts.

-- Richard
 
R

Richard Heathfield

Malcolm McLean said:
Your case seems to be that fgets() removes the UB from gets(), but is
still quite a simple function.
You say
You are confusing "safe" with "doesn't show UB".

"Safe" is a relative term. It is much easier to test code when its
behaviour is predictable. When the behaviour of the code cannot be
predicted, or at least predicted within a given "behaviour space", how
can it be trusted? By removing the possibility of undefined behaviour,
you remove a major source of unsafety. In that sense, it is not
unreasonable to call fgets a safe function - but of course it can still
be used unsafely. So can any function.
Let's say I have
function that returns the integer value of a character string if the
string is 4 digits in length or less, and the integer value of the
first 4 digits if it is over five digits.

And let's call it stupid_function().
 
R

Richard Heathfield

Richard Tobin said:
What is unsafe about that? It is obviously system dependent.

It's not that it can't be safely used, but that it can easily be
misused in a simple way that results in a gaping security hole. For
example, on a Unix system:[/QUOTE]

....the careful programmer will take due care to ensure that the command
is safe for a Unix system. Just to give one possible example of where
care may reasonably be taken using standard C and without special
knowledge of the system...
/* assume filename and command are suitably declared */
printf("enter file name to copy\n");
fgets(filename, sizeof(filename), stdin);
/* fgets error checking elided */
sprintf(command, "cp %s /tmp", filename);
system(command);

and imagine the user types "/dev/null /dev/null; rm -rf /; echo"

....one can reasonably protect against this attack by trying to /open/
the file that one is about to copy. After all, if the file cannot be
read, it certainly cannot be copied. And the fopen function has no
licence to fire up a shell and pass arbitrary commands to it.

Nobody is claiming that C is a padded-cell language. If you want one,
there are plenty around. C is for people who run with scissors on a
daily basis without cutting themselves, because they know how to be
careful around sharp tools.
 
I

Ian Collins

What is unsafe about that? It is obviously system dependent.

It's not that it can't be safely used, but that it can easily be
misused in a simple way that results in a gaping security hole. For
example, on a Unix system:

/* assume filename and command are suitably declared */
printf("enter file name to copy\n");
fgets(filename, sizeof(filename), stdin);
/* fgets error checking elided */
sprintf(command, "cp %s /tmp", filename);
system(command);

and imagine the user types "/dev/null /dev/null; rm -rf /; echo"

The Perl equivalent of this is common in CGI scripts.
[/QUOTE]
Which is why the first, second and third rules of web development are
validate the input, validate the input and validate the input.

This is a programming issue, not a C one. If you want to be 100% safe,
don't accept any input :)
 
J

jaysome

What is unsafe about that? It is obviously system dependent.

It's not that it can't be safely used, but that it can easily be
misused in a simple way that results in a gaping security hole. For
example, on a Unix system:

/* assume filename and command are suitably declared */
printf("enter file name to copy\n");
fgets(filename, sizeof(filename), stdin);
/* fgets error checking elided */
sprintf(command, "cp %s /tmp", filename);
system(command);

and imagine the user types "/dev/null /dev/null; rm -rf /; echo"[/QUOTE]

Have you actually tried this to see what happens? I bet not.

If what you say is true, then I think we all need to execute a
paradigm shift and focus our thinking of the most dangerous standard C
functions from using the standard C function like gets() to using the
standard C function system(). Future posts may well go like this:

OP's code:
system("pause");

Standard response:
NEVER, EVER, use the system() function. It's dangerous and could
delete all your files if you're not careful. That pales in comparison
to the dangers of using gets(), which amounts to nothing more than
undefined behavior in the form of, typically, some form of access
violation, which will, typically, be reported to you in no uncertain
terms.

The fact that modern day Unix/Linux systems don't let you log in as
root (or at least make it hard to do so) and require you to run "sudo"
to run privileged commands (which requires that you enter a password)
would leave me to believe that entering such "bad" input may,
possibly, delete files owned by you at worst, and benignly bomb out at
best. You did perform that backup last night, right?

I would also be interested in finding out what happens when the
equivalent command is entered on Windows Vista or even Windows XP or
even Mac OS X (or even Windows 98 or 95). If I have some spare time
this weekend, I'll let you all know the results. If anyone else cares
to experiments, your efforts are welcome, and well justified (since
system() is a standard C function, we deserve to know all about its
ramifications, good or bad).

Best regards
 
M

Malcolm McLean

Richard Heathfield said:
Malcolm McLean said:


"Safe" is a relative term. It is much easier to test code when its
behaviour is predictable. When the behaviour of the code cannot be
predicted, or at least predicted within a given "behaviour space", how
can it be trusted? By removing the possibility of undefined behaviour,
you remove a major source of unsafety. In that sense, it is not
unreasonable to call fgets a safe function - but of course it can still
be used unsafely. So can any function.
But what if you decrease machine B and C type errors by increasing errors of
type D and E? [See little black bag post elsethread].
 
R

Richard Heathfield

Malcolm McLean said:
Richard Heathfield said:
Malcolm McLean said:


"Safe" is a relative term. It is much easier to test code when its
behaviour is predictable. When the behaviour of the code cannot be
predicted, or at least predicted within a given "behaviour space",
how can it be trusted? By removing the possibility of undefined
behaviour, you remove a major source of unsafety. In that sense, it
is not unreasonable to call fgets a safe function - but of course it
can still be used unsafely. So can any function.
But what if you decrease machine B and C type errors by increasing
errors of type D and E? [See little black bag post elsethread].

No, thanks. I'd rather scrap the whole crappy system and write something
that works. This would involve not using any of the people involved in
writing any of the above systems, since they are clearly no good at
their job.
 
M

Malcolm McLean

Richard Heathfield said:
Malcolm McLean said:
Richard Heathfield said:
Malcolm McLean said:

You are confusing "safe" with "doesn't show UB".

"Safe" is a relative term. It is much easier to test code when its
behaviour is predictable. When the behaviour of the code cannot be
predicted, or at least predicted within a given "behaviour space",
how can it be trusted? By removing the possibility of undefined
behaviour, you remove a major source of unsafety. In that sense, it
is not unreasonable to call fgets a safe function - but of course it
can still be used unsafely. So can any function.
But what if you decrease machine B and C type errors by increasing
errors of type D and E? [See little black bag post elsethread].

No, thanks. I'd rather scrap the whole crappy system and write something
that works. This would involve not using any of the people involved in
writing any of the above systems, since they are clearly no good at
their job.
The example is a little bit silly, because the consequences of following
machine E's instructions will be death, or so I think. I was going to do a
web search for "insulin, lethal dose" then realised that it might bring the
police to my door, so it is just a made up dosage that sounded reasonable,
but in fact the blood sugar is too low rather than too high, and so any
insulin at all would be dangerous. If you are diabetic or have diabetic
friends you might know this, but a lot of people wouldn't.

Such a machine will be tested extensively and it is unlikely that such an
obvious bug would survive. But what it it were calculating potato
deliveries? If the customer gets two sacks instead of one then that means
someone has got to get in a van, drive there, and pick it up. Maybe a fifty
pound loss to the company. Clearly if the consequence of a bug is only fifty
pounds, it is not economic to so more than about fifty pounds worth of
testing, and if you have one good programmer and one idiot, you put the good
guy on the insulin machine which leaves the idiot on the potato program.
 
R

Richard Tobin

jaysome said:
The fact that modern day Unix/Linux systems don't let you log in as
root (or at least make it hard to do so) and require you to run "sudo"
to run privileged commands (which requires that you enter a password)
would leave me to believe that entering such "bad" input may,
possibly, delete files owned by you at worst, and benignly bomb out at
best.

The typical bad scenario is when such a program is remotely executable.
For example in a CGI script, processing some text typed in to a box on
a web page. The files deleted are those owned by whoever owns the
CGI script, or possibly those owned by the web server user.

And of course "rm -rf /" is not the only command possible.

-- Richard
 
H

Hallvard B Furuseth

Richard said:
Richard Tobin said:

And I should have added: It's the only function in the C language which
invokes other programs, so a pure-standard-C program which wants to do
that _has_ to use it. If I got the chance to replace one and only one
function it the standard C library, it would be system() and not gets().
...the careful programmer will take due care to ensure that the command
is safe for a Unix system.

Which means that when the command (or the command arguments) come from
user input, he'll need to _parse_ that user input the same way as the
shell does, or in a more restrictive way. And he'll need to know which
OS and maybe shell is in use, to parse it correctly.

If the system() function took several arguments, that headache would be
significantly mitigated. (Though one still has to check that e.g. the
first argument doesn't start with '-' if that would become an undesired
switch.)
Just to give one possible example of where care may reasonably be
taken using standard C and without special knowledge of the system...


...one can reasonably protect against this attack by trying to /open/
the file that one is about to copy. After all, if the file cannot be
read, it certainly cannot be copied.

$ touch "foo; rm -r /home/hbf"
$ ./naive-program
enter file name to copy
foo; rm -r /home/hbf
$
Nobody is claiming that C is a padded-cell language. If you want one,
there are plenty around.

Right. That's why it's so silly to get hung up about one function in
particular.
 
R

Richard Bos

Hallvard B Furuseth said:
And I should have added: It's the only function in the C language which
invokes other programs, so a pure-standard-C program which wants to do
that _has_ to use it. If I got the chance to replace one and only one
function it the standard C library, it would be system() and not gets().

And what, then, would you replace it with?
Which means that when the command (or the command arguments) come from
user input, he'll need to _parse_ that user input the same way as the
shell does, or in a more restrictive way. And he'll need to know which
OS and maybe shell is in use, to parse it correctly.

If the system() function took several arguments, that headache would be
significantly mitigated. (Though one still has to check that e.g. the
first argument doesn't start with '-' if that would become an undesired
switch.)

Several arguments? Whatever for? If you want to construct a command
line, that's what sprintf() is for.
$ ./naive-program

Ah, and there's your problem. You assume a naive programmer. Why assume
that a programmer who will do

sprintf(command, "base_command %s", unchecked_argument);
system(command);

won't equally happily do

alternative_system("base_command", unchecked_argument);

when it's obvious that the danger is in the lack of checking, not in
system()?
Right. That's why it's so silly to get hung up about one function in
particular.

You still don't get it, do you? system() can be used unsafely, but it
can also be used safely. strcpy() (another favourite with the gets()-
overconfident) can equally be used both safely, and unsafely. In fact,
all functions in the ISO C Standard can be used safely as well as
unsafely - _except_ gets(). That's the whole point. Other functions can
be used in all security by a halfway decent programmer. gets() requires
bondage gear, heavy padlocks, and perfect typing fingers to use safely.

Richard
 
R

Richard Heathfield

Malcolm McLean said:
Richard Heathfield said:
Malcolm McLean said:
But what if you decrease machine B and C type errors by increasing
errors of type D and E? [See little black bag post elsethread].

No, thanks. I'd rather scrap the whole crappy system and write
something that works. This would involve not using any of the people
involved in writing any of the above systems, since they are clearly
no good at their job.
The example is a little bit silly [...]

Such a machine will be tested extensively and it is unlikely that such
an obvious bug would survive. But what it it were calculating potato
deliveries?

My previous answer stands.
If the customer gets two sacks instead of one then that
means someone has got to get in a van, drive there, and pick it up.
Maybe a fifty pound loss to the company. Clearly if the consequence of
a bug is only fifty pounds, it is not economic to so more than about
fifty pounds worth of testing, and if you have one good programmer and
one idiot, you put the good guy on the insulin machine which leaves
the idiot on the potato program.

The cost to the company of employing an idiot as a programmer far
exceeds fifty pounds. In fact, even if he's on UK minimum wage, the
cost to the company of employing him will exceed fifty pounds *per day*
when you take into account Employer's NI and the cost of any ancillary
benefits.

The normal method of dealing with such people, alas, is to move them
into management or marketing or, if they're astoundingly dense,
marketing management. You certainly don't let them anywhere near a
keyboard.
 
H

Hallvard B Furuseth

Richard said:
And what, then, would you replace it with?

Like I said, one which takes several arguments. Matching the arguments
to main(). Maybe preceded by one bool to allow the one-argument variant.
Several arguments? Whatever for? If you want to construct a command
line, that's what sprintf() is for.

Assuming you are quite sure how the shell will parse it.
Ah, and there's your problem. You assume a naive programmer.

If we are _not_ assuming a naive, programmer, what's the problem with
the existence of gets(), which this was meant as a counterexample to?
Nobody forces you use it just because it's there.
Why assume that a programmer who will do

sprintf(command, "base_command %s", unchecked_argument);
system(command);

won't equally happily do

alternative_system("base_command", unchecked_argument);

when it's obvious that the danger is in the lack of checking, not in
system()?

I already said that the checking gets very much simpler that way. You
won't need to duplicate the shell's parser, only a bit of the called
program's argument parser.

Nor to implement any escaping rules to defeat the shell's parser. For
example, you don't need to know if the shell accepts single or double
quotes or both, or how it treats backslash inside or outside quotes,
etc.
You still don't get it, do you? system() can be used unsafely, but it
can also be used safely. strcpy() (another favourite with the gets()-
overconfident) can equally be used both safely, and unsafely. In fact,
all functions in the ISO C Standard can be used safely as well as
unsafely - _except_ gets(). That's the whole point. Other functions can
be used in all security by a halfway decent programmer. gets() requires
bondage gear, heavy padlocks, and perfect typing fingers to use safely.

I get it. I get it. What I don't get is why it's such a big deal
whether or not it gets removed from the standard. You don't need these
heavy padlocks. All you need is to not use it if you want the program
to stay in control of itself. And most of this talk about checking to
see if gets() is used is just nonsense. Sure, if someone hands you a
program and you happen to notice it uses gets(), that's reason to get
mighty suspicious. But quite likely, reading a bit of that person's
code in the program will also give you a reason to get mighty suspicious
- whether or not it is in a program which where he happens to use
gets().

If we _are_ talking about helping naive users, then like I said what I'd
like to see is the addition of a safe and equally convenient function.
By all means remove gets() from the standard too, but that isn't going
to remove it from existing libraries anytime soon. So programs won't
suddenly turn safe (in that particular regared) because of it.
 
R

Richard Tobin

Hallvard B Furuseth said:
And I should have added: It's the only function in the C language which
invokes other programs, so a pure-standard-C program which wants to do
that _has_ to use it.

But under what circumstances would you need to use a pure-standard-C
program and yet be able to guarantee the syntax of commands interpreted
by system()?

-- Richard
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,786
Messages
2,569,626
Members
45,328
Latest member
66Teonna9

Latest Threads

Top