gets() is dead

user923005 · May 7, 2007

...

My point is that that function is not "drop-in" compatible with gets(),
because its signature is different. That means you can't use it without
re-writing all of your code base that uses gets().

Any code using gets() is incompetently written and needs to be
rewritten.

Two notes:
1) You might argue that because it has a different name ("mygets"), it
wouldn't be drop-in compatible anyway (i.e., you'd have to edit your
code to use it). But the fact is that IRL, you'd use some trickery of
your OS/platform to allow the function name to be "over-loaded" and thus
not have to change the name. Of course, we can't talk about it (*) here.
2) The real problem with the above is that in an existing code base, it may
not be obvious what value to use for "len" (without analyzing the code).
So, again, not a drop-in.

Right. Here we have a buffer of unknown size and we are reading an
arbitrarily sized object into our buffer of unknown size. What two
things are wrong with this picture?

The Unix getline() function which allocates when needed is a pretty
good substitute for the basic idea of gets().
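As a minimal sketch of that idea, assuming a platform that provides getline(): the library grows the buffer as needed, so the caller never has to guess a size. The wrapper name read_line_alloc is hypothetical and only for illustration.

```c
#define _POSIX_C_SOURCE 200809L  /* request the getline() declaration */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical helper: read one line of any length from fp.
 * Returns a malloc'd string without the trailing newline, or NULL
 * on end-of-file or error. The caller must free() the result. */
char *read_line_alloc(FILE *fp)
{
    char *line = NULL;   /* getline() allocates when this is NULL */
    size_t cap = 0;
    ssize_t len = getline(&line, &cap, fp);

    if (len < 0) {       /* end-of-file or read error */
        free(line);
        return NULL;
    }
    line[strcspn(line, "\n")] = '\0';   /* drop the newline, if any */
    return line;
}
```

Unlike gets(), the signature takes a stream and hands ownership of the buffer to the caller, so it is still not a drop-in replacement; it only preserves the "read a whole line, whatever its length" idea.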

Richard Tobin · May 8, 2007

user923005 said:
The Unix getline() function which allocates when needed is a pretty
good substitute for the basic idea of gets().

Unfortunately it's not a standard unix function, but a GNU extension.

-- Richard

Malcolm McLean · May 9, 2007

Flash Gordon said:
Malcolm McLean wrote, On 07/05/07 19:27:

Richard Tobin said:

The problem with fgets() is that, even when you point out the problems,
people still don't get it. They are so fixated on avoiding the UB on buffer
overrun that they ignore all the other ways the code can go wrong.

I don't think that's so unreasonable. Some kinds of "going wrong" are
more serious than others. If an ftp server sends you the wrong file -
or more likely, gives you an error message - because it divided up the
line incorrectly, that's less serious than if it executed some
arbitrary code because of a buffer overflow.
From elsethread
* Let's consider a program that calculates drug doses for diabetics. I
* can't be a doctor without my little black bag, so we'll have five
* machines in the bag just in case.
*
*We enter a line.
*Mr Cyril M Kornbluth, sugar level 3000
*The level is calculated by another machine that was written in Visual
*Basic, and works.
*It is in micromoles per ml.
*
*This is the output
*Machine A
*Line too long, please enter only patient initials.
*Machine B
*This machine has performed illegal operation
*Machine C
*sdfgjhkeutrpitnhfc,s[n !!!!!
*Machine D
*Inject 6000ml insulin
*Machine E
*Inject 2ml insulin

Or, due to arbitrary code execution, it injects the wrong drug into either
that patient or the next patient.

Of course, the program that produces an output not acceptable as input to
the intended programs is faulty, despite what the above said.

Also, behaviour that is defined but incorrect is more likely to cause a
test to fail than undefined behaviour, especially behaviour so obviously
wrong.

All this and more was pointed out when the above was posted before, but
you seem to have not understood it.

You think I cannot understand such simple points?
If there is a possibility of sabotage and your system is so badly designed
that it allows input data to be executed as code, then yes, any buffer
overrun is vulnerable. Outside of sabotage there is no real danger -
executing garbage as code is so unlikely to produce meaningful results that
the possibility can be discounted.
Now an automatic tool can pick up buffer overruns. If we compile the machine
using Boundschecker or a similar tool we are forcing output B. Not ideal,
but the machine is safe. No automatic tool can pick up calculation of the
wrong value. Sometimes we do want to truncate long input lines and take only
the first characters. The error is known in aviation circles as controlled
flight into terrain.

There is an argument that defined behaviour is deterministic whilst UB is
not, although we can make UB deterministic with the right tools. That is a
genuine point. In fact the UB error output is more likely to be useful - a
crash with maybe a stack trace; the deterministic error is more likely to be
dangerous - machine D or machine E rather than machine B or machine C.

Richard Bos · May 9, 2007

Malcolm McLean said:
You think I cannot understand such simple points?

Yes. Simply because...

No automatic tool can pick up calculation of the wrong value.

...this is wrong.

And there's an end to it. You appear uneducable. Do not expect further
attempts at it from me, at least.

Richard

Malcolm McLean · May 9, 2007

Richard Bos said:
Yes. Simply because...

...this is wrong.

And there's an end to it. You appear uneducable. Do not expect further
attempts at it from me, at least.

No automatic tool can know what the purpose of the program is. Therefore if
you tell the system to calculate a value using the wrong formula, it will
happily output the wrong value. Aviation people call this the controlled
flight into terrain, and it is the leading cause of aircraft crashes.

Richard Heathfield · May 9, 2007

Malcolm McLean said:

You think I cannot understand such simple points?

Yes.

There is an argument that defined behaviour is deterministic whilst UB
is not,

Yes.

although we can make UB deterministic with the right tools.

The right tools, in this case, being a text editor and a competent C
programmer to use it.

That is a genuine point. In fact the UB error output is more likely to
be useful

No, it's more likely to be undefined.

- a crash with maybe a stack trace,

Which bit of "undefined" didn't you understand?

the deterministic error is more likely to be dangerous

No, because we can *find* deterministic errors. There's this part of
software development that you have perhaps not come across. It's called
"testing", and one key aspect of testing is the exercise of boundary
conditions and unusual inputs (empty, too short, too long, too high,
too low, and so on).
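To make those boundary cases concrete, here is a minimal sketch of a bounded line reader whose behaviour is defined for every input and can therefore be tested at the edges. The names read_line and line_status are hypothetical; the only library calls assumed are fgets(), getc() and strlen().

```c
#include <stdio.h>
#include <string.h>

enum line_status { LINE_OK, LINE_TOO_LONG, LINE_EOF };

/* Hypothetical helper: read one line into buf (capacity size, size >= 2).
 * On LINE_OK the newline has been stripped. On LINE_TOO_LONG the rest of
 * the line is discarded, so the next call starts on a fresh line instead
 * of silently treating the tail as a new record. */
enum line_status read_line(FILE *fp, char *buf, size_t size)
{
    if (fgets(buf, (int)size, fp) == NULL)
        return LINE_EOF;

    size_t len = strlen(buf);
    if (len > 0 && buf[len - 1] == '\n') {
        buf[len - 1] = '\0';
        return LINE_OK;
    }

    /* No newline was stored. Look at the next character to find out
     * whether the line fitted exactly or was cut short. */
    int c = getc(fp);
    if (c == EOF || c == '\n')
        return LINE_OK;          /* exact fit, or unterminated last line */

    while (c != '\n' && c != EOF)
        c = getc(fp);            /* drain the remainder of the long line */
    return LINE_TOO_LONG;
}
```

With an 8-byte buffer, the interesting test inputs are an empty line, a short line, a line of exactly 7 characters, a line of 8 characters, and a final line with no newline. Each has one specified result, so a failing case points at the code rather than at chance.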

Malcolm McLean · May 9, 2007

Richard Heathfield said:
Malcolm McLean said:

No, because we can *find* deterministic errors. There's this part of
software development that you have perhaps not come across. It's called
"testing", and one key aspect of testing is the exercise of boundary
conditions and unusual inputs (empty, too short, too long, too high,
too low, and so on).

What you mean is "sometimes find", even "usually find". A decent tool will
automatically trigger a message on a buffer overrun, so you can find those
with 100% accuracy and easily, as long as you actually trigger them. With a
deterministic error you have to have a little sheet saying "Kornberg, no
insulin". Then you look at the output of machine E. It looks reasonable. Now
is the error in the machine or in your sheet? There is no way of knowing, so
he's got to go to the doctor to double check. Eventually the doctor confirms
that no insulin is the output, and the source of the bug will be traced. But
there's a lot that can go wrong in that process. There is no point having a
perfect system if the people who use it aren't capable of fulfilling its
demands.

Unless you can show that UB errors are more common than deterministic errors
in dangerous malfunctions, I will find it hard to believe. Controlled flight
into terrain and all that.

Richard Heathfield · May 9, 2007

Malcolm McLean said:

What you mean is "sometimes find", even "usually find".

...and we can find them at the right time - whilst writing the code, or
during testing. Whereas undefined behaviour offers us no guarantee of
being reproducible. The two worst production bugs I've ever inflicted
on my clients were both where my program's behaviour was undefined
(because of uninitialised objects whose addresses were passed to
functions that didn't fully populate those objects before their values
were handed to an OS routine). In each case, the whole thing could
easily have been avoided simply by insisting on determinate behaviour.
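A minimal sketch of that kind of bug and its fix, with hypothetical names throughout (struct request, fill_request and make_request are illustrations, not the code from those projects): the fill function sets only some members, so the caller gives every member a determinate value first.

```c
#include <string.h>

struct request {
    int  id;
    int  flags;      /* fill_request() below never sets this member */
    char name[16];
};

/* Hypothetical partial initialiser: sets id and name, but not flags. */
void fill_request(struct request *r, int id, const char *name)
{
    r->id = id;
    strncpy(r->name, name, sizeof r->name - 1);
    r->name[sizeof r->name - 1] = '\0';
}

/* Builds a request in which every member has a determinate value. */
struct request make_request(int id, const char *name)
{
    struct request r = {0};      /* all members start at zero */
    fill_request(&r, id, name);
    return r;                    /* r.flags is 0, not indeterminate */
}
```

Without the `= {0}`, reading r.flags later would use an indeterminate value, and the program could pass every test on one day and fail on another.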

A decent tool
will automatically trigger a message on a buffer overrun, so you can
find those with 100% accuracy and easily, as long as you actually
trigger them. With a deterministic error you have to have a little
sheet saying "Kornberg, no insulin".

No, you can supply a lot more information than that during debugging.

Then you look at the output of machine E. It looks reasonable.

What has that to do with it? What matters is whether the output is what
the specification requires for the given input.

Now is the error in the machine or in
your sheet? There is no way of knowing, so he's got to go to the
doctor to double check.

What are you talking about? The programmer doesn't have to go to the
doctor. The programmer has to go to the specification!

Unless you can show that UB errors are more common than deterministic
errors in dangerous malfunctions, I will find it hard to believe.

"That is why you fail." - Yoda.

Keith Thompson · May 9, 2007

Malcolm McLean said:
You think I cannot understand such simple points?
If there is a possibility of sabotage and your system is so badly
designed that it allows input data to be executed as code, then yes,
any buffer overrun is vulnerable. Outside of sabotage there is no real
danger -
executing garbage as code is so unlikely to produce meaningful results
that the possibility can be discounted.

[...]

Isn't that how viruses work?

Chris Dollin · May 9, 2007

Keith said:
Malcolm McLean said:

You think I cannot understand such simple points?
If there is a possibility of sabotage and your system is so badly
designed that it allows input data to be executed as code, then yes,
any buffer overrun is vulnerable. Outside of sabotage there is no real
danger -
executing garbage as code is so unlikely to produce meaningful results
that the possibility can be discounted.

[...]

Isn't that how viruses work?

I think that falls under "sabotage" above.

--
"We did not have time to find out everything we wanted to know."
- James Blish, /A Clash of Cymbals/

Hewlett-Packard Limited, registered no: 690597 England
registered office: Cain Road, Bracknell, Berks RG12 1HN

Richard Tobin · May 9, 2007

Malcolm McLean said:
If there is a possibility of sabotage and your system is so badly designed
that it allows input data to be executed as code, then yes, any buffer
overrun is vulnerable.

Unfortunately many systems *are* that badly designed. They allow data
on the stack to be executed, and a buffer overrun can mess up
procedure return so that execution continues in the overrun data.

-- Richard

jaysome · May 9, 2007

On Wed, 9 May 2007 08:28:19 +0100, "Malcolm McLean" wrote:

[snip]

Unless you can show that UB errors are more common than deterministic errors
in dangerous malfunctions, I will find it hard to believe. Controlled flight
into terrain and all that.

It's unnecessary to show that "UB errors are more common than
deterministic errors in dangerous malfunctions".

You should strive to eliminate UB errors completely, so that you can
focus on the task of finding deterministic errors. Striving to
eliminate UB errors is one of the primary tenets the honorable posters
to this newsgroup attempt to instill in people like you and me. And
for good reason.

Someone may use a ring buffer in the software for a flight control
system, and they may use something like the following to increment the
current index of the buffer:

#define BUF_SIZE 256

....

i = i++ % BUF_SIZE;

Furthermore, their DO-178B Level A testing may show that this works
just fine. Nevertheless, it is a time bomb waiting to fire off,
because it is undefined behavior.

With a change to a different compiler or even a change to the next
release of the same compiler, the above may not "work" as expected.
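The statement is undefined because it modifies i twice (once through ++, once through =) with no sequence point in between. A well-defined version, as a minimal sketch (ring_next is a hypothetical name), computes the new index from the old one and stores it exactly once:

```c
#include <stddef.h>

#define BUF_SIZE 256

/* Returns the index that follows i in a ring of BUF_SIZE slots.
 * i is only read here; the caller performs the single assignment. */
size_t ring_next(size_t i)
{
    return (i + 1) % BUF_SIZE;
}
```

The caller then writes `i = ring_next(i);`, or simply `i = (i + 1) % BUF_SIZE;`, and gets the same result from every conforming compiler.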

(I can hear the echoes of management through the hall--Do we really
need to re-run *all* of the regression tests for the next N weeks,
seeing that all we did was change to the next revision of the same
compiler?).

It's better to avoid such undefined behavior in the first place, so
that you can focus on the "deterministic errors".

I think that is along the lines of what Mr. Heathfield and others are
more or less saying, and I wholeheartedly agree with them.

Keith Thompson · May 9, 2007

Chris Dollin said:
Keith said:

Malcolm McLean said:

You think I cannot understand such simple points?
If there is a possibility of sabotage and your system is so badly
designed that it allows input data to be executed as code, then yes,
any buffer overrun is vulnerable. Outside of sabotage there is no real
danger -
executing garbage as code is so unlikely to produce meaningful results
that the possibility can be discounted.

[...]

Isn't that how viruses work?

I think that falls under "sabotage" above.

Agreed. I was attempting to refute the "so unlikely ... that the
possibility can be discounted" part.

Flash Gordon · May 9, 2007

jaysome wrote, On 09/05/07 09:27:

It's better to avoid such undefined behavior in the first place, so
that you can focus on the "deterministic errors".

I think that is along the lines of what Mr. Heathfield and others are
more or less saying, and I wholeheartedly agree with them.

You have learnt your lesson well, young Jedi.

Malcolm McLean · May 9, 2007

jaysome said:
(I can hear the echoes of management through the hall--Do we really
need to re-run *all* of the regression tests for the next N weeks,
seeing that all we did was change to the next revision of the same
compiler?).

It depends what the consequence of an error is likely to be. If it is death
then you will surely run the regression tests. If it is two sacks of
potatoes to Blogg's minimarket when they ordered one, you've got to balance
the £50 the company will spend sending a van round to correct that error
against the cost of testing.
Sometimes you don't have the resources you would like. That is why buggy
programs are, in practice, released.

Flash Gordon · May 9, 2007

Richard Tobin wrote, On 09/05/07 09:13:

Unfortunately many systems *are* that badly designed. They allow data
on the stack to be executed, and a buffer overrun can mess up
procedure return so that execution continues in the overrun data.

It does not even require an executable stack (or even executable data
space) for it to have bad effects. Corrupting the return address so that
it returns to the wrong place but an address still within the executable
can have very nasty effects, and yes I *have* seen this occur, and it
can be a right abstrad to track down!

Richard Tobin · May 9, 2007

Unfortunately many systems *are* that badly designed. They allow data
on the stack to be executed, and a buffer overrun can mess up
procedure return so that execution continues in the overrun data.

It does not even require an executable stack (or even executable data
space) for it to have bad effects. Corrupting the return address so that
it returns to the wrong place but an address still within the executable
can have very nasty effects, and yes I *have* seen this occur, and it
can be a right abstrad to track down!

It's very hard to use this in an attack though. If the stack is
executable, you can arrange for the overlong string itself to contain
the evil instructions, and the attack can also be robust against minor
variations in the program code and the state of the program when the
error occurs. Otherwise you have to somehow arrange to put data in
the heap, and know where it is, or be lucky enough to find
that some existing code can be "repurposed".

But as you say, it can have hard-to-track-down bad effects.

-- Richard

Flash Gordon · May 9, 2007

Richard Tobin wrote, On 09/05/07 22:04:

It does not even require an executable stack (or even executable data
space) for it to have bad effects. Corrupting the return address so that
it returns to the wrong place but an address still within the executable
can have very nasty effects, and yes I *have* seen this occur, and it
can be a right abstrad to track down!

It's very hard to use this in an attack though.

Where did I say it was an attack? All I claimed was that it would lead to
seriously incorrect operation.

But as you say, it can have hard-to-track-down bad effects.

Hard to track down and arbitrarily bad effects.

Richard Tobin · May 10, 2007

It's very hard to use this in an attack though.

Where did I say it was an attack?

You didn't. I just thought it was useful to point out a difference
in the kinds of problems you can get.

-- Richard

Flash Gordon · May 10, 2007

Richard Tobin wrote, On 10/05/07 00:04:

You didn't. I just thought it was useful to point out a difference
in the kinds of problems you can get.

OK, then I think we are in agreement on this.
