gets() is dead

Bart van Ingen Schenau · May 21, 2007

Tak-Shing Chan said:
[...] My point is that gets
*can* be used in insecure programs if input is bounded; [...]

gets() can also be used in 'secure programs', if the auditors and
testers are sufficiently asleep.

Tak-Shing

Bart v Ingen Schenau

Tor Rustad · May 21, 2007

Tak-Shing Chan said:
I am perfectly aware of the context. My point is that gets
*can* be used in insecure programs if input is bounded; you said
so yourself upthread.

Yes, we do *agree* on this.

user923005 · May 21, 2007

Yes, we do *agree* on this.

Is there some guarantee that the program/routines meant for a secured
environment will never be used elsewhere?

As for the argument, far upstream, about using gets() with trivial
programs and games I would like to add:

The trivial program/game allows someone to add a hostile program that
formats your hard drive or steals your bank account.
How trivial is it now? Just because the program in question is a
little toss-away bit of poo-poo does not mean that it can't have a
huge impact on other things that aren't.
This isn't just an academic argument. I have removed gets() calls
from at least one dozen publicly distributed chess programs. These
programs are downloaded by thousands of people and are played over the
internet (e.g. on FICS). The original authors almost universally
pushed back, but eventually they saw the light.

Jack Klein's getsafe() program is not perfect, but it is a lot better
than gets() and no harder to use.
The GNU getline() function is another alternative.
I have great respect for Tak-Shing Chan, but he's wrong about gets().
It's wrong to use it[0] and far worse to advocate it[1].

[0] If your computer is housed in a vault with no internet connection
and impossible to gain access to for anyone but yourself -- then knock
yourself out and gets() all you like. If there is any possibility of
that code ever making it outside by any stretch of the imagination
then it is offensive to use gets().
[1] Most gets() users will not be nearly so smart as Tak-Shing Chan
and will eventually let little gets() beasts loose on unsuspecting
software end-users.
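Neither the getsafe() source nor the GNU getline() source appears in this thread, so here is only a minimal sketch, in portable C, of the dynamically growing approach that getline() takes. The name read_line_alloc() and the buffer-doubling policy are illustrative assumptions, not the API of either function.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper (not Jack Klein's getsafe() or GNU getline()):
   reads one line of any length from fp into a malloc()ed buffer that is
   doubled whenever it fills up. The newline is not stored.
   Returns NULL at end of file (nothing read) or when memory runs out;
   otherwise the caller must free() the result. */
char *read_line_alloc(FILE *fp)
{
    size_t cap = 16, len = 0;
    char *buf = malloc(cap);
    int c;

    if (buf == NULL)
        return NULL;
    while ((c = getc(fp)) != EOF && c != '\n') {
        if (len + 1 == cap) {               /* keep one byte for the '\0' */
            char *bigger = realloc(buf, cap * 2);
            if (bigger == NULL) {
                free(buf);
                return NULL;
            }
            buf = bigger;
            cap *= 2;
        }
        buf[len++] = (char)c;
    }
    if (c == EOF && len == 0) {             /* nothing left to read */
        free(buf);
        return NULL;
    }
    buf[len] = '\0';
    return buf;
}
```

The design choice here is "no fixed limit": the input can never overrun the buffer, but memory use grows with the input, so a maximum-length parameter would be worth adding wherever bounded memory matters.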

Tor Rustad · May 21, 2007

Flash said:
Tor Rustad wrote, On 21/05/07 01:47:

Failing to analyze and design (in my opinion you analyze the problem
before you start the design) is also doing something you know is
insecure, so that is also covered by my statement.

Well, even the detailed design specs shouldn't be language-specific. The
local C coding standard will typically ban gets(), not your design
documents.

Agreed. Although secure and safety critical are independent attributes.
For example, I have worked on safety critical SW where security was not
a requirement (it was an embedded avionics system).

This is really off-topic, but if you re-read my post, I never said safety ==
security.

However, safety, "freedom from danger or risk", is hardly an *independent*
attribute from security. I know what you mean, but some safety-critical
systems do need to protect against information leakage; others don't.

Denial-of-Service: is that a safety or a security attribute?

Your answer is not important, since we can agree that gets() is both a
safety *and* a security "bug", right?

Yes, I fully agree.

In my case it really is builders and accountants who are generally not
that knowledgeable about computing, and that factors into the risk
assessment. There is a lot of money involved, but the chances of anyone
having the access and knowledge to mount a sophisticated attack without
also having sufficient permissions to not need to attack are really small.

If there is a lot of money involved, even low-probability threats
can be highly relevant. One place I worked had an insider attack about 15
years ago, in which the attackers got away with *over* 100 million dollars,
perhaps 400-500 million dollars in today's money.

The attack was mounted at a time when the company was switching over to a new
security system. The insiders were supposed to watch each other; nobody knew
they were lovers. Approximately 30 billion dollars passed through that system
each day.

An outsider will typically go after a few accounts on the front end, while
an insider might very well strike the back end... with devastating effect.
A bank's main worry about the front end is negative publicity, not really
the money lost. However, preventing a full-scale attack on the back end can be
a question of staying in business or not.

Removal of it means there is one less thing to learn not to do and one
less thing for teachers to get wrong.

If a company lacks automated tools to detect bugs like "gets", it doesn't
take safety and/or security issues very seriously. splint is even free.
There are also commercial tools out there to enforce your "safe" C subset.

If you still worry, why not comment out the banned APIs in the header
files?
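A less invasive variant of editing the headers, assuming GCC or Clang (it is a compiler extension, not standard C), is the `#pragma GCC poison` directive, which turns any later use of a listed identifier into a hard compile error. The function first_char() below is only a hypothetical placeholder showing code that still builds under the ban.

```c
#include <stdio.h>

/* GCC and Clang extension: once an identifier is poisoned, any later
   appearance of it in this translation unit is a compile-time error.
   It comes after #include <stdio.h> so the header's own mention of
   gets() is already behind it. */
#if defined(__GNUC__)
#pragma GCC poison gets
#endif

/* Hypothetical example: code below the pragma has to use a bounded call.
   Writing gets(buf) here would stop the build on GCC or Clang. */
int first_char(FILE *fp)
{
    char buf[16];

    if (fgets(buf, sizeof buf, fp) == NULL)
        return EOF;
    return (unsigned char)buf[0];
}
```

Putting the pragma in a project-wide header that every file includes after the system headers gives roughly the same effect as Tor's suggestion without touching the installed headers.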

Heck, you can scan the binaries for banned function calls.

Tor Rustad · May 22, 2007

CBFalconer said:
How about ggets is just as simple to call, and all you have to do
is remember to eventually free what it returns? Yet it is
perfectly safe.

Really, *perfectly* safe? Wow!

Is that code even reliable? What magic powers were you born with that let you
make such a professional judgment without having the proper education or
training to do so?

Have you worked for NSA, and secured nuclear missiles or something? What do
you mean by *perfectly* safe? Does such a thing even exist?

Once a torpedo was designed to self-destruct if it turned 180 degrees after
launch. Would that be a safe design? Now, consider what happened when the
torpedo got stuck in the tube, and the sub turned 180 degrees!

FYI, your function would quite likely fail a code audit for a
safety-critical system; non-deterministic memory usage is *not* what
characterizes such code.

Tor Rustad · May 22, 2007

user923005 said:
Is there some guarantee that the program/routines meant for a secured
environment will never be used elsewhere?

Hmm... ITYM an insecure program entering a secure environment?

If so, you are quite correct that many companies have poor security policies
on this. For example, a programmer may be allowed to use his notebook both
at work and at home, hence she/he might be connected to two networks with
quite different security levels.

At home, the employee surfs porn sites or, even worse, installs an
insecure program like a game binary. The next day, he/she enters
the trusted network, and then you suddenly risk having a trojan horse on
the inside, bypassing all the security checks...

Yes, this threat is very real, and is one of the reasons I prefer to have
physically separate networks in companies, where computers exposed to the
outside are not allowed to enter the trusted network.

As for the argument, far upstream, about using gets() with trivial
programs and games I would like to add:

The trivial program/game allows someone to add a hostile program that
formats your hard drive or steals your bank account.
How trivial is it now? Just because the program in question is a
little toss-away bit of poo-poo does not mean that it can't have a
huge impact on other things that aren't.

This isn't just an academic argument. I have removed gets() calls
from at least one dozen publicly distributed chess programs. These
programs are downloaded by thousands of people and are played over the
internet (e.g. on FICS). The original authors almost universally
pushed back, but eventually they saw the light.

One solution we use is that the employee uses a different hard disk for private
surfing and gaming than the one he or she uses to log on to the company
network (from the home PC).

If an employee installs an unauthorized program on a computer on the inside,
she/he may get sacked. Also, many companies, e.g. banks, have Intrusion
Detection Systems (IDS), which are supposed to detect attacks from the
outside but can detect infected computers on the inside as well.

[0] If your computer is housed in a vault with no internet connection
and impossible to gain access to for anyone but yourself -- then knock
yourself out and gets() all you like. If there is any possibility of
that code ever making it outside by any stretch of the imagination
then it is offensive to use gets().

There is a great difference in risk between installing a binary from an
untrusted source and writing your own unsafe program with "gets", which is
never checked in anywhere else.

A program written by others can hide a trojan horse that has been
targeted to attack a trusted network from the inside.

Tak-Shing Chan · May 22, 2007

But that program cannot bind that input.

Why does an *insecure* program, designated as such, need to
bind that input? When I say ``input is bounded'', I mean the
user (and/or the execution environment) is doing the bounding.

Note that I am not advocating gets(). I am simply saying
that banning gets() is a bit too extreme, because gets() has its
proper place in *insecure* programs.

gets() is a sharp tool. If you hurt yourself when using
gets(), it is your fault, not gets(). The use of a sharp tool
deserves the highest level of carefulness---if you cannot be 200%
sure about the boundedness of your inputs, then don't use gets()!

Tak-Shing

Keith Thompson · May 22, 2007

Tak-Shing Chan said:
Why does an *insecure* program, designated as such, need to
bind that input? When I say ``input is bounded'', I mean the
user (and/or the execution environment) is doing the bounding.

Note that I am not advocating gets(). I am simply saying
that banning gets() is a bit too extreme, because gets() has its
proper place in *insecure* programs.

gets() is a sharp tool. If you hurt yourself when using
gets(), it is your fault, not gets(). The use of a sharp tool
deserves the highest level of carefulness---if you cannot be 200%
sure about the boundedness of your inputs, then don't use gets()!

If gets() weren't already in the language, would you want to add it?

Programmers use gets() unsafely in real life. I strongly suspect that
it's used unsafely far more often than it's used safely (i.e., by
programmers who actually are 200% sure about the boundedness of their
inputs). Having it in the standard tends to encourage its use.

Removing it from the language won't automatically remove it from
implementations, at least not for a long while. If you really want
that functionality, you can write your own gets(); better yet, you can
write an alternative that takes an extra argument specifying the
buffer size.
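A gets() alternative that takes the buffer size as an extra argument could look like the following minimal sketch built on fgets(). The name read_line(), the extra FILE * parameter, and the return convention (1 for a whole line, 0 for a truncated one, EOF when nothing was read) are assumptions chosen for illustration.

```c
#include <limits.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical bounded replacement for gets(): reads one line from fp into
   buf, which holds size bytes. The newline is removed. If the line does not
   fit, the first size - 1 characters are kept and the rest of the line is
   read and discarded, so the next call starts on a fresh line.
   Returns 1 for a complete line, 0 for a truncated one, and EOF if nothing
   could be read (or size is unusable). */
int read_line(char *buf, size_t size, FILE *fp)
{
    char *newline;
    int c, truncated = 0;

    if (size < 2 || size > INT_MAX)
        return EOF;
    if (fgets(buf, (int)size, fp) == NULL)
        return EOF;

    newline = strchr(buf, '\n');
    if (newline != NULL) {
        *newline = '\0';            /* the whole line fit */
        return 1;
    }
    /* No newline stored: the line was too long, exactly filled the buffer,
       or is a final line without a newline. Consume the rest of it. */
    while ((c = getc(fp)) != EOF && c != '\n')
        truncated = 1;
    return truncated ? 0 : 1;
}
```

Reporting truncation, rather than silently dropping the tail, leaves the caller to decide whether an over-long line is an error.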

user923005 · May 23, 2007

Why does an *insecure* program, designated as such, need to
bind that input? When I say ``input is bounded'', I mean the
user (and/or the execution environment) is doing the bounding.

Note that I am not advocating gets(). I am simply saying
that banning gets() is a bit too extreme, because gets() has its
proper place in *insecure* programs.

An insecure program can injure a secure program, or even a secure
network. It became insecure with the introduction of that one
insecure program.

gets() is a sharp tool. If you hurt yourself when using
gets(), it is your fault, not gets().

The gets() function is a hydrogen bomb that almost never goes off.
You can knock it over and bonk it with a hammer and the chances of
something bad happening are near zero. But someone who knows how to
set it off may be able to turn on the timer and run away, laughing
maniacally.

The use of a sharp tool
deserves the highest level of carefulness---if you cannot be 200%
sure about the boundedness of your inputs, then don't use gets()!

The problem is not so much hurting yourself. The problem is
inadvertently (since we are talking about retaining it in the standard
-- and therefore not maliciously) hurting other people. The thing of
it is -- the bomb cannot be defused, in the case where it is picked up
and hauled away by someone else or found sitting there by someone who
knows how to set it off. These bad people are few and far between and
there is little chance that something bad will happen to you,
individually. But your little program might escape, get copied onto
100,000 machines, and now there is a very big chance that it will harm
someone else.

Tak-Shing Chan · May 23, 2007

An insecure program can injure a secure program, or even a secure
network. It became insecure with the introduction of that one
insecure program.

The gets() function is a hydrogen bomb that almost never goes off.
You can knock it over and bonk it with a hammer and the chances of
something bad happening are near zero. But someone who knows how to
set it off may be able to turn on the timer and run away, laughing
maniacally.

The problem is not so much hurting yourself. The problem is
inadvertently (since we are talking about retaining it in the standard
-- and therefore not maliciously) hurting other people. The thing of
it is -- the bomb cannot be defused, in the case where it is picked up
and hauled away by someone else or found sitting there by someone who
knows how to set it off. These bad people are few and far between and
there is little chance that something bad will happen to you,
individually. But your little program might escape, get copied onto
100,000 machines, and now there is a very big chance that it will harm
someone else.

In such cases the precondition of ``200% sure'' is violated.
Therefore you get what you deserved when you use gets().

Tak-Shing

Keith Thompson · May 23, 2007

Tak-Shing Chan said:
In such cases the precondition of ``200% sure'' is violated.
Therefore you get what you deserved when you use gets().

I don't mind getting what I deserve if I use gets(). I object to
getting what others deserve when *they* use gets().

Tor Rustad · May 23, 2007

Keith said:
I don't mind getting what I deserve if I use gets(). I object to
getting what others deserve when *they* use gets().

You deserve what you get, if you install software with that low intrinsic
quality.

Keith Thompson · May 23, 2007

Tor Rustad said:
You deserve what you get, if you install software with that low intrinsic
quality.

It's just not possible for me to guarantee that. The only way I can
avoid unsafe software is to avoid software altogether, or to restrict
my use of software to such an extent that I can hardly get anything
done.

Do you audit every piece of software you use? I see you're using
KNode; how safe and/or secure is it, and how do you know?

Perhaps I deserve what I get if I fail to do what I can to encourage
high quality software and discourage low quality software. One small
way I do this is to discourage the use of the gets() function, and to
advocate that it be removed from the C standard. I'm under no
illusion that this will magically make everything better, but it's a
small step, and in my opinion it's a significant one.

user923005 · May 23, 2007

It's just not possible for me to guarantee that. The only way I can
avoid unsafe software is to avoid software altogether, or to restrict
my use of software to such an extent that I can hardly get anything
done.

Do you audit every piece of software you use? I see you're using
KNode; how safe and/or secure is it, and how do you know?

Perhaps I deserve what I get if I fail to do what I can to encourage
high quality software and discourage low quality software. One small
way I do this is to discourage the use of the gets() function, and to
advocate that it be removed from the C standard. I'm under no
illusion that this will magically make everything better, but it's a
small step, and in my opinion it's a significant one.

While we are at it, let's remove %s for scanf and sscanf only (fine
for printf, fprintf).
For scanf or sscanf, the length should be mandatory.

Richard Tobin · May 23, 2007

user923005 said:
While we are at it, let's remove %s for scanf and sscanf only (fine
for printf, fprintf).

Since you can determine the length of the input string, %s can be
perfectly safe with sscanf.

-- Richard
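Richard's point can be made concrete: with sscanf() the whole input is already in memory, so the destination can be sized from it. A minimal sketch; the helper name first_word() is hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a malloc()ed copy of the first
   whitespace-delimited word in src, or NULL if there is none or memory
   runs out. The destination has strlen(src) + 1 bytes, and no word taken
   from src can be longer than src itself, so the unbounded "%s" cannot
   overflow it. The caller must free() the result. */
char *first_word(const char *src)
{
    char *word = malloc(strlen(src) + 1);

    if (word == NULL)
        return NULL;
    if (sscanf(src, "%s", word) != 1) {   /* empty or all-whitespace input */
        free(word);
        return NULL;
    }
    return word;
}
```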

Keith Thompson · May 23, 2007

Since you can determine the length of the input string, %s can be
perfectly safe with sscanf.

Agreed.

For scanf(), it's not immediately obvious how to make the length
"mandatory".

For example, this:

char buf[10];
int result;
result = scanf("%s", buf);

is as unsafe as any call to gets(), since arbitrary input can easily
overflow buf. We can avoid the gets() problem by removing the gets()
function from the standard library, but we're not contemplating
removing the scanf() function altogether, so we can't prevent the
string "%s" being passed as the first argument to the scanf()
function.

Probably the best solution would be to mandate that a "%s" directive
with no length always fails; the call scanf("%s", buf) would then
return 0 and would not modify the buf array. (To be clear, this would
be incompatible with the current standard; we're talking about
proposed changes in a future version of the standard.)
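Such a rule means an implementation (or a lint tool) has to recognise a string conversion that has no field width. The following hypothetical checker sketches that recognition for the common cases; the function name and the choice to treat %[ scansets like %s are assumptions, and it does not validate the rest of the format.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical checker: returns 1 if a scanf-family format string contains
   a %s or %[ conversion that would store into a buffer with no field width,
   i.e. a directive the proposed rule would make fail; 0 otherwise. */
int has_unbounded_string(const char *fmt)
{
    while (*fmt != '\0') {
        int suppressed = 0, has_width = 0;

        if (*fmt++ != '%')
            continue;
        if (*fmt == '%') {              /* "%%" matches a literal '%' */
            fmt++;
            continue;
        }
        if (*fmt == '*') {              /* assignment suppressed: nothing stored */
            suppressed = 1;
            fmt++;
        }
        while (isdigit((unsigned char)*fmt)) {
            has_width = 1;
            fmt++;
        }
        while (*fmt != '\0' && strchr("hljztL", *fmt) != NULL)
            fmt++;                      /* skip length modifiers, e.g. the l in %ls */
        if ((*fmt == 's' || *fmt == '[') && !suppressed && !has_width)
            return 1;
        if (*fmt == '[') {              /* skip the scanset so its contents are not parsed */
            fmt++;
            if (*fmt == '^')
                fmt++;
            if (*fmt == ']')            /* a leading ']' is part of the set */
                fmt++;
            while (*fmt != '\0' && *fmt != ']')
                fmt++;
        }
        if (*fmt != '\0')
            fmt++;                      /* step past the conversion character */
    }
    return 0;
}
```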

In general, we want (or at least I want) to avoid cases where
arbitrary input from stdin can cause a buffer overflow. There are
other cases to consider. For example: fscanf(stdin, "%s", buf)
presents exactly the same problem as scanf("%s", buf); catching it
would require fscanf() to behave differently based on whether its
first argument is stdin.

Even more generally, I think we want to avoid cases where arbitrary
input from an interactive device (not just stdin) can cause a buffer
overflow. The standard doesn't currently require I/O operations to be
aware of whether they're dealing with an interactive device, and this
may not be possible in general. C99 7.19.3p7 provides a vague
precedent:

... the standard input and standard output streams are fully
buffered if and only if the stream can be determined not to refer
to an interactive device.

but I'm not comfortable with the idea of such a drastic change in the
behavior of fscanf(f, "%s", buf) depending on whether f *can be
determined* to refer to an interactive device.

It would be simpler to eliminate "%s" for all of scanf, sscanf, and
fscanf. sscanf and fscanf with "%s" are potentially safe because it's
possible to know the maximum length, but then it's also possible to
*know* the maximum length, and to specify it. If nothing else, the
programmer can just specify the length of the buffer, which I'd say is
good practice anyway.
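When the length is specified, it can be kept in sync with the buffer size by stringizing a macro, so the two numbers cannot drift apart. A minimal sketch; WORD_MAX, STR() and read_word() are hypothetical names.

```c
#include <stdio.h>

#define WORD_MAX 9                  /* longest word accepted */
/* Two-level stringization turns the macro's value into a string literal,
   so the field width in the format follows WORD_MAX automatically. */
#define STR_(x) #x
#define STR(x)  STR_(x)

/* Hypothetical helper: reads one whitespace-delimited word of at most
   WORD_MAX characters from fp into buf, which must hold WORD_MAX + 1 bytes.
   A longer word is split: the remainder is left for the next call.
   Returns 1 on success, EOF at end of input. */
int read_word(FILE *fp, char buf[WORD_MAX + 1])
{
    return fscanf(fp, "%" STR(WORD_MAX) "s", buf);
}
```

Note that the width must be a plain number: writing STR(WORD_MAX - 1) would produce the literal text "9 - 1", not "8".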

Tak-Shing Chan · May 23, 2007

Keith Thompson said:
[...] Probably the best solution would be to mandate that a "%s" directive
with no length always fails; [...]

Six years ago, I posted a ``solution'' to this problem, but it
seems like the better solution is just to forget it. Those who
have safety in mind wrote their own versions of gets() anyway.
Whether a safe gets() exists or not does not bother them one bit.

Tak-Shing

Tor Rustad · May 23, 2007

user923005 said:
While we are at it, let's remove %s for scanf and sscanf only (fine
for printf, fprintf).
For scanf or sscanf, the length should be mandatory.

My old signature really says it all:

"To this day, many C programmers believe that
'strong typing' just means pounding extra hard on the keyboard". PvdL

The way I see it, the basic problem here is that C lacks a high-level type
like 'string' that knows the size allocated and can enforce
runtime checks.

string buffer1[100]; /* buffer can store at most 100 characters */
string buffer2; /* buffer can grow dynamically */

Great bonus with better readability on string operations:

buffer1 = "Hello";
buffer2 = buffer1;
buffer2 += " world\n";
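C has no such built-in type, but the idea can be approximated in a library, at the cost of function calls instead of operators. A minimal sketch, assuming the hypothetical names struct str, str_init(), str_append() and str_free(): the object carries its own capacity, so every append is checked at run time; a fixed string refuses to overflow, while a dynamic one grows.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical library stand-in for the proposed 'string' type: the object
   records its own length and capacity, so appends can be checked at run time. */
struct str {
    char  *data;   /* always '\0'-terminated */
    size_t len;    /* characters in use, not counting the '\0' */
    size_t cap;    /* bytes allocated for data */
    int    fixed;  /* nonzero: like buffer1[100], never grows */
};

/* Set up s with room for cap - 1 characters (at least 1 byte is allocated).
   Returns 0 on success, -1 if memory runs out. */
int str_init(struct str *s, size_t cap, int fixed)
{
    if (cap == 0)
        cap = 1;
    s->data = malloc(cap);
    if (s->data == NULL)
        return -1;
    s->data[0] = '\0';
    s->len = 0;
    s->cap = cap;
    s->fixed = fixed;
    return 0;
}

/* Checked replacement for strcat(): appends text (which must not point
   into s itself) to s. A fixed string refuses to overflow; a dynamic one
   reallocates. Returns 0 on success, -1 if the text does not fit or
   memory runs out. */
int str_append(struct str *s, const char *text)
{
    size_t n = strlen(text);

    if (s->len + n + 1 > s->cap) {
        size_t newcap = (s->len + n + 1) * 2;
        char *p;

        if (s->fixed)
            return -1;          /* refuse rather than write past the end */
        p = realloc(s->data, newcap);
        if (p == NULL)
            return -1;
        s->data = p;
        s->cap = newcap;
    }
    memcpy(s->data + s->len, text, n + 1);   /* copies the '\0' as well */
    s->len += n;
    return 0;
}

void str_free(struct str *s)
{
    free(s->data);
    s->data = NULL;
    s->len = s->cap = 0;
}
```

In this sketch, buffer1 would be str_init(&b1, 101, 1) and buffer2 a dynamic string, with the assignment and += replaced by str_append() calls.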

I did discuss this with Dan Pop a while back, but he didn't see the point...
but I can't remember him saying that this would require a GC either.

There are so many safety issues with C that, when safety is a prime concern,
clueless programmers should really use a language with fewer pitfalls.

Sadly, I so hate the syntax of Ada, the complications of C++, and the memory
usage/slowness of Java.

user923005 · May 23, 2007

Tak-Shing Chan said:
[...] Six years ago, I posted a ``solution'' to this problem, but it
seems like the better solution is just to forget it. Those who
have safety in mind wrote their own versions of gets() anyway.
Whether a safe gets() exists or not does not bother them one bit.

It's not exactly a new problem.
http://www.wired.com/software/coolapps/news/2005/11/69355?currentPage=all

"1988 -- Buffer overflow in Berkeley Unix finger daemon. The first
internet worm (the so-called Morris Worm) infects between 2,000 and
6,000 computers in less than a day by taking advantage of a buffer
overflow. The specific code is a function in the standard input/output
library routine called gets() designed to get a line of text over the
network. Unfortunately, gets() has no provision to limit its input,
and an overly large input allows the worm to take over any machine to
which it can connect.

Programmers respond by attempting to stamp out the gets() function in
working code, but they refuse to remove it from the C programming
language's standard input/output library, where it remains to this
day."

The fact that it *is* ignored does not mean that it *should be*.

Tor Rustad · May 23, 2007

Keith said:
It's just not possible for me to guarantee that. The only way I can
avoid unsafe software is to avoid software altogether, or to restrict
my use of software to such an extent that I can hardly get anything
done.

If it's your private computer, then it's your job to raise the assurance
level sufficiently by not installing crap. It's your choice to use
that same computer for online shopping, it is your choice to use that
computer for home banking, and it's you who has selected that bank's internet
solution.

Banning gets() solves little; you still need to take backups and avoid poor
quality programs & trojans.

Do you audit every piece of software you use? I see you're using
KNode; how safe and/or secure is it, and how do you know?

I have not performed a code audit of KNode, and I don't have anything of
great value here to protect.

It's mainly out of academic interest that I do a number of things to secure my
own network and computers: my firewall routes all connections from the
outside to another machine, I dynamically trigger firewall rule updates if
an attack is detected, the binaries get scanned for modifications/trojans,
etc. Just toy research...
