Safe version of gets

K

Kenny McCormack

Burn them. Where did you get "all the C books" that use gets()?

He said they "talked" about gets(), not that they "used" it.

Sounds like you made a leap of logic here.

(No apologies necessary...)
 
M

Martin Ambuhl

Kenny said:
He said they "talked" about gets(), not that they "used" it.

Sounds like you made a leap of logic here.

(No apologies necessary...)

And none warranted.
 
K

Keith Thompson

Malcolm said:
Actually it's more dangerous, unless the programmer really knows what he is
doing.
Undefined behaviour is usually correct behaviour, terminating the program
with an appropriate message. Silently truncating input will usually produce
incorrect behaviour.

I'll just repeat what I wrote a couple of weeks ago in another thread
(in response to a similar statement from you).

fgets() is worse than gets()?

Nonsense.
 
M

Martin Ambuhl

Malcolm said:
Actually it's more dangerous, unless the programmer really knows what he is
doing.

Life is too short to bother with idiots that claim that fgets() is more
dangerous than gets().

*PLONK*
 
R

Randy Howard

Anton Petrusevich wrote
(in article said:
Great. What if I use your ggets() in my program and someone will run it
as ./myprogram </dev/zero?

OT: Probably depends upon the platform and resource limits (if
any). If you run it as a 32-bit app on a 64-bit platform, it
will probably bail at 2GB or slightly less, leaving the rest of
the OS with anything above that, say on a 32GB big dog Opteron
server.

Now back to your regularly schedule c.l.c. program...
Sorry I didn't look into your code, but I
suppose it will eat all available memory and then crash, right?

Or, you could simply modify it to take a size_t parameter, say
max_eat_ram, or hard code a limit that you feel is reasonable
for the specific application, set it as a tunable parameter in a
config file, etc., etc.
 
A

Anton Petrusevich

CBFalconer said:
Yes, it will eat all memory. No, it won't crash. It will return
an error indicator. If the caller then executes free(ln) the
memory will be available again.

It can crash if the system uses optimistic allocator (overcommits memory).
 
W

websnarf

Anton said:
It can crash if the system uses optimistic allocator (overcommits memory).

Well, more specifically, it can crash if you are in a multitasking
environment, and some *other* task is intolerant to memory allocation
failures because you are hogging it all with your neer ending gets
substitute. Thus, while it is possible to make such a solution sound
for the current task, its not appropriate for a system which is acting
as either a server, or just a system with lots of 3rd party tasks
running at the same time.

It also has another problem. What if you are trying to read a
password? If you use realloc() on the buffer, then its possible that
stale password prefix may be released back to heap during the process
of reading the password itself. This can be a security problem if you
have another user or process logged into the system that is able to
perform an "elevation attack" and gain access to the processes memory
outside the lifetime of the password request.

The most general answer is to read the user input into multiple
segmented buffers that are then provided back to the caller in an
incremental manner. This gives the caller the ability to performing a
length truncating style input without overallocation memory for the
most common case of small inputs. You can see an example of this here:

http://www.azillionmonkeys.com/qed/userInput.html
 
V

Villy Kruse

Actually it's more dangerous, unless the programmer really knows what he is
doing.
Undefined behaviour is usually correct behaviour, terminating the program
with an appropriate message. Silently truncating input will usually produce
incorrect behaviour.

It doesn't *silently* trunctate lines. The missing newline is an
indication that your input line were longer than the size of your buffer,
and then you can deside what the appropriate action in that case is.
You could chose to delcare this a fatal error, or you could chose that
this is no problem in which case you would read and ignore the rest of
the line, for example by using a getc loop. Or you could realloc() the
buffer and append the rest of the line.


Villy
 
M

Malcolm

Chris Torek said:
But observable, repeatable, predictable incorrect behavior.
Moreover, programs that use gets() in this same way also
silently truncate input:

gets(buf); /* XXX bad */
error = FALSE;
...

Since the "error" variable happens to alias &buf[sizeof buf], this
truncates the input, silently. In other words, the function you
claim to be "superior" has the same flaw as fgets(), plus additional
flaws.
If you're unlucky.
Undefined behaviour is always incorrect behaviour _by the program_, but
usually correct behaviour _by the computer_.
In this case, if the buffer overflow is only two or three bytes, then the
input will be silently truncated. However the chances are that sooner or
later someone will enter a longer input, and the error will be very noisily
flagged, probaly with a message like "segmentation fault". This is what you
want to happen, given that it is incorrect, it should produce no results.

Imagine that for a certain disease the clinically effective drug dose is
close to the lethal dose. We write a program to help out the doctor, taking
as input the patient's weight, severity of symptoms, and so on. The corect
does is 800mg
There are a few possible scenarios.
1) Computer prints ("Internal error");
Doctor: **** computers.
2) Computer prints "ugggle uggle uggle"
Doctor: **** program.
3) Computer prints "dose 100,000mg"
Doctor: ***** programmer. Is he trying to kill someone.
4) Computer prints "dose 1000mg"
Doctor: inject.

By replacing undefined behaviour with wrong but defined behaviour, you are
reducing the number of type 1 and type 2 errors, at the cost of more type 3
and type 4 errors. That's why Martin's advice was so badly wrong
 
K

Keith Thompson

Malcolm said:
Chris Torek said:
But observable, repeatable, predictable incorrect behavior.
Moreover, programs that use gets() in this same way also
silently truncate input:

gets(buf); /* XXX bad */
error = FALSE;
...

Since the "error" variable happens to alias &buf[sizeof buf], this
truncates the input, silently. In other words, the function you
claim to be "superior" has the same flaw as fgets(), plus additional
flaws.
If you're unlucky.
Undefined behaviour is always incorrect behaviour _by the program_, but
usually correct behaviour _by the computer_.

Depends on what you mean by "correct", I suppose.

fgets() can be used safely if you know what you're doing. With proper
use, it cannot cause a buffer overflow, and you can always tell
whether the input line was truncated by checking the last character of
the input string. It's not ideal; the non-standard but portably
implemented ggets()is arguably better, at least in some ways. If you
aren't able to use fgets() safely, you probably shouldn't be
programming in C (or in any other language).

(That last isn't directed at you personally. I presume you are
competent enough to avoid the pitfalls of fgets(). I'm atacking your
arguments, not your competence.)

getc() can be used safely if you know what you're doing. Integer
addition can be used safely if you know what you're doing. And so on,
for nearly every feature of the language.

gets() cannot be used safely. Undefined behavior is not safe. A
program that potentially trampling on other variables is not safe.
Living with possible crashes rather than learning how to use the
language properly is, frankly, not the mark of a competent programmer.
In this case, if the buffer overflow is only two or three bytes, then the
input will be silently truncated. However the chances are that sooner or
later someone will enter a longer input, and the error will be very noisily
flagged, probaly with a message like "segmentation fault". This is what you
want to happen, given that it is incorrect, it should produce no results.

Just write correct code in the first place. Code that uses gets() is
not, and cannot be, correct code. Yes, it's possible to write
incorrect code using fgets(). It's also possible (though perhaps
slightly inconvenient) to write correct code using fgets(). It's also
possible to write correct or incorrect code using ggets(), or getc(),
or nearly anything else.

[snip]
By replacing undefined behaviour with wrong but defined behaviour, you are
reducing the number of type 1 and type 2 errors, at the cost of more type 3
and type 4 errors. That's why Martin's advice was so badly wrong

Nobody has advocated replacing undefined behavior with wrong but
defined behavior. The trick is to replace undefined behavior with
correct and defined behavior. Nothing else is acceptable. Suggesting
otherwise makes it difficult to take you seriously.

Once again, yes, fgets() has problems. It's simply foolish to think
that gets() could be the solution.
 
F

Flash Gordon

Malcolm said:
Chris Torek said:
But observable, repeatable, predictable incorrect behavior.
Moreover, programs that use gets() in this same way also
silently truncate input:

gets(buf); /* XXX bad */
error = FALSE;
...

Since the "error" variable happens to alias &buf[sizeof buf], this
truncates the input, silently. In other words, the function you
claim to be "superior" has the same flaw as fgets(), plus additional
flaws.

If you're unlucky.
Undefined behaviour is always incorrect behaviour _by the program_, but
usually correct behaviour _by the computer_.

Except when it corrupts 10000 records and means the company can't get
its statutery accounts out on time.
In this case, if the buffer overflow is only two or three bytes, then the
input will be silently truncated.

Or corrupt something else causing other problems.
> However the chances are that sooner or
later someone will enter a longer input, and the error will be very noisily
flagged, probaly with a message like "segmentation fault". This is what you
want to happen, given that it is incorrect, it should produce no results.

So write your own wrapper that takes the buffer size and does an abort
if the line is too long.
Imagine that for a certain disease the clinically effective drug dose is
close to the lethal dose. We write a program to help out the doctor, taking
as input the patient's weight, severity of symptoms, and so on. The corect
does is 800mg

If I say gets being used on a safety critical project I would do my
damdest to get the developer off the project, including writing to the
QA director (if the company had one).
There are a few possible scenarios.
1) Computer prints ("Internal error");
Doctor: **** computers.
2) Computer prints "ugggle uggle uggle"
Doctor: **** program.
3) Computer prints "dose 100,000mg"
Doctor: ***** programmer. Is he trying to kill someone.
4) Computer prints "dose 1000mg"
Doctor: inject.

By replacing undefined behaviour with wrong but defined behaviour, you are
reducing the number of type 1 and type 2 errors, at the cost of more type 3
and type 4 errors. That's why Martin's advice was so badly wrong

My experience of buffer overruns is that they generally have not cause
the SW to crash until some (sometimes considerable) time later, by which
time the corrupted data can have caused all sorts of things to go wrong.
At least with defined but wrong behaviour you can then step through it
with a debugger, see what is wrong, and analyse what effects it will
have had.
 
R

Richard Heathfield

Malcolm said:
Chris Torek said:
But observable, repeatable, predictable incorrect behavior.
Moreover, programs that use gets() in this same way also
silently truncate input:

gets(buf); /* XXX bad */
error = FALSE;
...

Since the "error" variable happens to alias &buf[sizeof buf], this
truncates the input, silently. In other words, the function you
claim to be "superior" has the same flaw as fgets(), plus additional
flaws.
If you're unlucky.
Undefined behaviour is always incorrect behaviour _by the program_, but
usually correct behaviour _by the computer_.

You must live in a very different universe to me, then. I've seen computers
do some weird stuff for UB.

We had this argument a dozen times already, and you always lose it, and you
always forget you lost it.
Imagine that for a certain disease the clinically effective drug dose is
close to the lethal dose. We write a program to help out the doctor,
taking as input the patient's weight, severity of symptoms, and so on. The
corect does is 800mg
There are a few possible scenarios.
1) Computer prints ("Internal error");
Doctor: **** computers.
2) Computer prints "ugggle uggle uggle"
Doctor: **** program.
3) Computer prints "dose 100,000mg"
Doctor: ***** programmer. Is he trying to kill someone.
4) Computer prints "dose 1000mg"
Doctor: inject.

By replacing undefined behaviour with wrong but defined behaviour, you are
reducing the number of type 1 and type 2 errors, at the cost of more type
3 and type 4 errors. That's why Martin's advice was so badly wrong

Martin's advice was fine. Your example is flawed. Firstly, it is the doctor,
not the computer program, that is responsible for the patient's safety.
Secondly, NONE of the behaviours you demonstrate is preferable. The program
should accept the input completely, and process it completely, and produce
the completely correct output. This is hardly difficult.
 
M

Malcolm

Keith Thompson said:
Nobody has advocated replacing undefined behavior with wrong but
defined behavior.
You may not have done so. People frequently advocate using fgets() in an
unsafe manner. If you use the function, you must always check the newline
and consider what you need to do on a partial read (except in the unusual
case of needing to strip all newlines from input).
If you don't do this, fgets() is a highly dangerous function.
The trick is to replace undefined behavior with correct and defined
behavior. Nothing else
is acceptable. Suggesting otherwise makes it difficult to take you
seriously.
I'm not suggesting using gets(). I'm suggesting that replacing undefined
behaviour with defined but incorrect behaviour makes things worse. Thus
advice to replace gets() with fgets() must always be accompanied by a
warning to check the newline, because this is so likely to be overlooked.
That's what I am complaining of.
 
R

Richard Heathfield

Malcolm said:
I'm not suggesting using gets(). I'm suggesting that replacing undefined
behaviour with defined but incorrect behaviour makes things worse.

No, it doesn't. Reproducible bugs are always easier to find and fix than
bugs which may or may not be reproducible and which may or may not damage
the hardware.
Thus
advice to replace gets() with fgets() must always be accompanied by a
warning to check the newline, because this is so likely to be overlooked.

s/must/could/

The advice to replace gets() with fgets() is always sound. It is even sound
to advise replacing gets() with isupper()! This is because gets() cannot be
used safely, so any replacement is an improvement to the code.

Evolution of advice not to use gets():

1. Don't use gets().
2. Use fgets() instead.
3. stdin.
4. No, don't put 100 in there. Put sizeof array.
5. Only if it's an array. Duh.
6. Well, you knew how big it was when you malloc'd it. Don't forget!
7. Check the return value - it returns NULL if an error occurs or EOF is
encountered.
8. Check for a '\n' character in the string given you by fgets(). If you get
one, it means you read (the last part of) a complete line. Otherwise, it
means you read only part of a line.
9. How big should the array be? You can't know in advance. If you absolutely
must read an entire line all in one go, write a wrapper around fgets() and
do some realloc calls - or find such a wrapper.
10. Yes, some of us have already written such wrappers, and yes, they are
publicly available.

That's what I am complaining of.

Then say so, instead of wittering on about how nice it is to have a broken
program.
 
K

Keith Thompson

Malcolm said:
I'm not suggesting using gets().

Good. You've given that impression; I'm glad to see that it was
untinentional.
I'm suggesting that replacing undefined
behaviour with defined but incorrect behaviour makes things worse.

Wrong.
 
M

Malcolm

Keith Thompson said:
You assert that. But I've given a justification (see the doctor example). By
getting rid of the undefined behaviour your are suppressing most of the type
1 scenarios, the program can never produce the output "segmentation fault",
for example. But it is still incorrect, so you must be increasing the type
2, type 3 and type 4 scenarios.

Doctor example [ Malcolm, recopied ]
 
R

Richard Heathfield

Malcolm said:
You assert that. But I've given a justification (see the doctor example).

The example is broken, as has already been explained to you.
By getting rid of the undefined behaviour your are suppressing most of the
type 1 scenarios, the program can never produce the output "segmentation
fault", for example. But it is still incorrect, so you must be increasing
the type 2, type 3 and type 4 scenarios.

So is it your contention that, if one removes a source of error from a
program, other sources of error multiply to maintain the balance? Do you
have some kind of Theory of Conservation of Program Errors? Or is it more
likely that fixing a problem will /reduce/ the number of errors?
 
R

Richard Bos

Malcolm said:
You assert that. But I've given a justification (see the doctor example). By
getting rid of the undefined behaviour your are suppressing most of the type
1 scenarios, the program can never produce the output "segmentation fault",
for example. But it is still incorrect, so you must be increasing the type
2, type 3 and type 4 scenarios.

Thing is, hypothetical scenarios are easy to manufacture. I could just
as simply, and probably more plausibly, devise one in which the area for
the number can hold 4 digits (plus null); the result from fgets()
without checking for '\n' would be 1000 mg of some medicine instead of
800 mg, which is not correct but, with proper monitoring of the
patient's situation not dramatic; but using your preferred gets() would
clobber over the pointer that just happened to be situated after the
array, which points to the description of the medicine, which means that
instead of 1000 mgs of medicine X, he now gets 10000 mgs of medicine Y.
Problem is, for medicine Y, 10000 mgs is a perfectly normal amount,
unlike for X... but medicine Y will kill the patient.

Richard
 
M

Malcolm

Richard Heathfield said:
So is it your contention that, if one removes a source of error from a
program, other sources of error multiply to maintain the balance? Do you
have some kind of Theory of Conservation of Program Errors? Or is it more
likely that fixing a problem will /reduce/ the number of errors?
An awful lot of errors are introduced into programs in just that way. A bug
is discovered, and someone puts in a quick fix to suppress the symptoms,
without really getting the root of the problem.
 
F

Flash Gordon

Malcolm said:
An awful lot of errors are introduced into programs in just that way. A bug
is discovered, and someone puts in a quick fix to suppress the symptoms,
without really getting the root of the problem.

That's swapping bugs, and if the new bug is more likely to produce
reproducible symptoms, then you have at least reduced the average time
to debug the program.

However, if even *once* someone replaces a call to gets with correct
usage of fgets (incorrect usage does not increase the bug count since if
gets was used that is a bug by definition) then, on average, replacing
gets with fgets reduces the bug count.

So blanket advice to replace gets with fgets is still correct, although
it is advisable to at least point out there are pitfalls.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,755
Messages
2,569,536
Members
45,019
Latest member
RoxannaSta

Latest Threads

Top