Why the usage of gets() is dangerous.

jacob navia

Keith said:
It's true that gets() has been declared obsolescent and deprecated. This
is reflected in TC3 and in the latest standard draft, n1256.pdf. This
just happened within the last couple of months.

But please don't make the mistake of thinking that it "will disappear
shortly". It has not been removed from the C99 standard. In fact, any
conforming C99 implementation *must* provide gets(), undefined behavior
and all (though any implementation is free to warn about it).

Deprecation means that it will most likely be removed from the *next* C
standard, which is still a number of years away. Consider that the C99
standard is 8 years old, and still has not been fully implemented by the
vast majority of compilers. It will likely be decades, if ever, before
a significant number of implementations conform to a new C20YZ standard.
And even then, compilers will be free to continue to provide it in a
non-conforming mode, perhaps for backward compatibility.

I'm afraid that gets() is going to be around for a very long time. It's
still up to each of us, as programmers, to avoid using it.

jacob, if you really think gets() will "disappear shortly", I'd be
interested in your reasoning.

Nothing, just hopes that now that it is deprecated, people will
stop using it, and it will disappear in a few years.
 
Malcolm McLean

Paul Hsieh said:
No set of program control can prevent gets() from having undefined
behavior. In fact, basically all C compilers implement gets() to have
undefined behavior.
Undefined behaviour means "undefined by the standard". It is possible,
though rather difficult, to implement a safe gets(), that is to say one that
always terminates the program with an error message if the buffer is
exceeded.

What is not possible is to implement a safe fgets(), that is to say, one
that can be used safely given the limitations of the average human
programmer.
 
Eric Sosman

Paul Hsieh wrote On 11/16/07 16:53,:
Paul Hsieh wrote On 11/16/07 14:43,:




Isn't there a buffer overrun vulnerability in the
fgetstralloc() function? Look carefully at the second
argument of the first call to getInputFrag().


It's 64. getInputFrag(*,64,*,*,*) never writes to more than 64 chars
(the extra '\0' only comes when the input is <= 64 in length; unlike
strncat, this is ok because the length read is always explicitly
returned), and the buffer passed (char blk[64]) in is 64 chars in
length. So ... what am I missing?

Nothing, I guess. I must have been confused by the
convoluted style. (Well, if it confuses me then it *must*
be convoluted, right?)

Still seems an awfully arcane way to skip and count
characters, though.
 
Richard Heathfield

Malcolm McLean said:

It is possible,
though rather difficult, to implement a safe gets(), that is to say one
that always terminates the program with an error message if the buffer is
exceeded.

Show me.
What is not possible is to implement a safe fgets(), that is to say,
one that can be used safely given the limitations of the average human
programmer.

The fgets function is very easy to use safely.
 
Malcolm McLean

Richard Heathfield said:
We'll declare that a pointer consists of three values - the address, the start
of the object, and the end of the object.
Now in the write-to-array code we specify that if the address exceeds the
end of the object, the program is to terminate with an error message.

With this device we have a perfectly safe gets() function. It cannot return
an incorrect string, or corrupt another variable, or put little elves on
screen. It can only fill the buffer correctly or report that it has been
exceeded.

The fgets function is very easy to use safely.
Time after time it has been shown that this is not the case. Very often
people treat incomplete reads as full lines. So if the line contains a drug
dose your fgets() - enabled machine might deliver only one tenth of the
amount needed, given an off by one line length error.
 
Richard Heathfield

Malcolm McLean said:
We'll declare that a pointer consists of three values - the address, the
start of the object, and the end of the object.

I look forward to your reference implementation.

Time after time it has been shown that this is not the case.

You can misuse *anything* if you try hard enough. You have to try
reasonably hard to misuse fgets, whereas to misuse gets you only need call
it.
Very often
people treat incomplete reads as full lines.

Very often people drive at 40 in a 30. That does not mean it is difficult
to drive at 30.
 
Malcolm McLean

Harald van Dijk said:
We'll declare that a pointer consists of three values - the address, the
start of the object, and the end of the object.

So, in

struct S {
    char c[10];
    int i;
} s;

does a pointer to s.c store the end as &s.c[10]? Or does it store the end
as &s + 1? If the former, there are cases where it's simply not clear at
all where the buffer ends. If the latter, it doesn't prevent writing past
the end of the buffer.
A pointer to s.c would have to store the end as &s.c[10].
It is illegal to convert from a struct S * to a char *, except in the niggly
case of a char or char array being the first member, in which case it must
have the same address as the whole struct. So the compiler does in fact have
to be very clever.

however char *ptr = (char *) (void *) &s;

is I think still illegal. So you cannot defeat the system with an intricate
list of void * intermediates.
 
Harald van Dijk

Harald van Dijk said:
It is possible,
though rather difficult, to implement a safe gets(), that is to say
one that always terminates the program with an error message if the
buffer is exceeded.

Show me.

We'll declare that a pointer consists of three values - the address, the
start of the object, and the end of the object.

So, in

struct S {
    char c[10];
    int i;
} s;

does a pointer to s.c store the end as &s.c[10]? Or does it store the
end as &s + 1? If the former, there are cases where it's simply not
clear at all where the buffer ends. If the latter, it doesn't prevent
writing past the end of the buffer.
A pointer to s.c would have to store the end as &s.c[10].

Okay, so then you can't get back the original &s?

struct S {
    char c[10];
    int i;
} s[2];

char *p = &s[1].c[0];

The range for p would be &s[1].c[0] through &s[1].c[10], but I don't
believe there's anything non-standard about casting p to struct S *, and
subtracting 1. (With a stricter reading of the standard, you might need
to cast p to char(*)[10], and only then to struct S, but this doesn't
change anything important.)
It is illegal
to convert from a struct S * to a char *,

It's allowed for two reasons here. Firstly, *any* object can be addressed
as an array of char. Given int i, ((char *) &i) through (char *) &i +
sizeof i are all valid pointers. Given struct S s, (char *) &s through
(char *) &s + sizeof s are all valid pointers. The second reason you
mention below.
except in the niggly case of a
char or char array being the first member, in which case it must have
the same address as the whole struct. So the compiler does in fact have
to be very clever.

It has to be able to construct data that is no longer available.
however char *ptr = (char *) (void *) &s;

is I think still illegal. So you cannot defeat the system with an
intricate list of void * intermediates.

Well, this specific example is legal, but I think I get the point you're
making here, and I agreed already that bounded pointers, even while not
perfect, are useful.
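
To make the s[2] example concrete, here is a minimal sketch. The helper name previous_struct is made up for illustration, and the caller is assumed to guarantee that a preceding array element really exists. A pointer whose bounds cover only s[1].c ends up designating s[0], which lies entirely outside those bounds:

```c
struct S {
    char c[10];
    int i;
};

/* Hypothetical helper: `p` is assumed to point at the first byte of the
 * `c` member of some element of an array of struct S, and that element is
 * assumed not to be the first one. Because `c` is the first member, it
 * shares its address with the enclosing struct, so converting back and
 * stepping down by one lands on the previous element. */
struct S *previous_struct(char *p)
{
    struct S *owner = (struct S *)(void *)p;  /* same address as the struct */
    return owner - 1;                         /* the element before it */
}
```

A bounds-checking implementation that recorded only [&s[1].c[0], &s[1].c[10]) for p would have to widen those bounds again somewhere in this function.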
 
CBFalconer

Malcolm said:
.... snip ...

Undefined behaviour means "undefined by the standard". It is
possible, though rather difficult, to implement a safe gets(),
that is to say one that always terminates the program with an
error message if the buffer is exceeded.

Consider yourself challenged to post the appropriate code, in
standard C.
 
CBFalconer

Malcolm said:
We'll declare that a pointer consists of three values - the address,
the start of the object, and the end of the object. Now in the
write-to-array code we specify that if the address exceeds the
end of the object, the program is to terminate with an error
message.

No good. Pointers do not necessarily contain those components.
You have to make it safe within the guarantees provided by the C
standard.
 
Harald van Dijk

We'll declare that a pointer consists of three values - the address, the
start of the object, and the end of the object.

So, in

struct S {
    char c[10];
    int i;
} s;

does a pointer to s.c store the end as &s.c[10]? Or does it store the end
as &s + 1? If the former, there are cases where it's simply not clear at
all where the buffer ends. If the latter, it doesn't prevent writing past
the end of the buffer.

I do agree that bounded pointers would be useful, but C being what it is,
I don't believe it's possible to make it completely safe.
 
Keith Thompson

jacob said:
Keith Thompson wrote: [...]
I'm afraid that gets() is going to be around for a very long time.
It's still up to each of us, as programmers, to avoid using it.

jacob, if you really think gets() will "disappear shortly", I'd be
interested in your reasoning.

Nothing, just hopes that now that it is deprecated, people will
stop using it, and it will disappear in a few years.

Alas, hoping won't make it happen. As I said, any implementation that
claims to conform to any C standard, up to and including C99, *must*
provide gets().
 
Bill Reid

CJ said:
It's much more typing!

Leading to increased carpal tunnel syndrome!

Here's how to avoid some of the typing:

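#include <stdio.h>
#include <string.h>

#define GET_INPUT_STRING(string_array) \
    get_input_string(string_array, sizeof(string_array))

char *get_input_string(char *string_array, int array_size) {
    size_t string_size;

GetInput:
    /* Give up on end-of-file or a read error instead of testing stale data. */
    if (fgets(string_array, array_size, stdin) == NULL)
        return NULL;

    string_size = strlen(string_array);

    /* No trailing newline means the line did not fit in the array. */
    if (string_size == 0 || string_array[string_size - 1] != '\n') {
        fflush(stdin); /* NOT PORTABLE!!! */
        printf("\nCAUTION: input too long, should be less than %d characters",
               array_size);
        printf("\nTry again: ");
        goto GetInput;
    }

    string_array[string_size - 1] = '\0'; /* strip the newline */

    return string_array;
}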

Stick that in a little library that you link into all the programs that
require getting standard input (along with some automatic menu
generation routines, etc.), and if GET_INPUT_STRING is too much
typing you can just call it GIS() or something. The only problems
are: 1) you could wind up in an endless user idiocy loop, but it
beats crashing your program when the same idiot types too much
stuff into gets(), and 2) the fflush(stdin) works for my compiler but
of course is not guaranteed for yours...
 
santosh

jayapal said:
Can u explain the differences b/w the scanf() and gets() ..?

scanf() reads input and tries to do conversions according to any format
specifiers in its format string argument. gets() merely reads a line
from stdin and stores it in the buffer given to it.

The construct:

scanf("%s", foo);

has the same weakness as a call to gets(). Both can potentially write
past their buffers and there is _no_ way to stop it, if it happens.
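
For comparison, a %s conversion with an explicit field width cannot overrun its buffer, because the conversion stops after that many characters. A minimal sketch follows. The function name read_word and the size of 10 are arbitrary choices for illustration:

```c
#include <stdio.h>

/* Hypothetical helper: read one whitespace-delimited word of at most 9
 * characters from `stream`. The width in "%9s" leaves room for the
 * terminating '\0' in a 10-byte array. Returns 1 on success, EOF at end
 * of input. Characters beyond the width stay on the stream. */
int read_word(FILE *stream, char word[10])
{
    return fscanf(stream, "%9s", word);
}
```

Unlike gets(), this reads a word rather than a line, so it is a bound on the write and not a replacement for line input.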

Using Standard functions the preferred way to read a line from a stream
is with fgets().

fgets(buffer, BUFFER_SIZE, stream);

Once you have got the entire line, functions like sscanf(), strtod(),
strtol(), strtoul() etc. can be used to reliably convert the data. The
presence of a '\n' before the terminating '\0' indicates that fgets()
was able to read the entire line. Otherwise further calls to fgets() can
read the remaining portions; obviously, 'buffer' has to grow somehow to
accommodate the full line.
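
A minimal sketch of that approach, under some assumptions: the names read_full_line and parse_long are invented for illustration, the starting size of 64 is arbitrary, lines are assumed to contain no embedded '\0' characters, and the buffer is assumed to stay below INT_MAX bytes.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: read one whole line from `stream` into a malloc'd
 * buffer, calling fgets() repeatedly and doubling the buffer whenever it
 * fills up before a '\n' is seen. The '\n' is stripped. Returns NULL at
 * end of input or on allocation failure. The caller frees the result. */
char *read_full_line(FILE *stream)
{
    size_t cap = 64, len = 0;
    char *buf = malloc(cap);

    if (buf == NULL)
        return NULL;
    for (;;) {
        if (fgets(buf + len, (int)(cap - len), stream) == NULL)
            break;                        /* end of input or read error */
        len += strlen(buf + len);
        if (len > 0 && buf[len - 1] == '\n') {
            buf[len - 1] = '\0';          /* complete line */
            return buf;
        }
        if (len + 1 == cap) {             /* buffer full, line continues */
            char *tmp = realloc(buf, cap * 2);
            if (tmp == NULL) {
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap *= 2;
        }
    }
    if (len == 0) {                       /* nothing was read at all */
        free(buf);
        return NULL;
    }
    return buf;                           /* last line had no '\n' */
}

/* Hypothetical helper: convert an entire line to long with strtol().
 * Returns 1 on success, 0 if the text is empty, has trailing junk, or is
 * out of range. */
int parse_long(const char *text, long *out)
{
    char *end;
    long value;

    errno = 0;
    value = strtol(text, &end, 10);
    if (end == text || *end != '\0' || errno == ERANGE)
        return 0;
    *out = value;
    return 1;
}
```

A caller would do something like line = read_full_line(stdin), check for NULL, hand the line to parse_long(), and free it afterwards.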

Refer to your Standard C library's documentation or see:

<http://www.dinkumware.com/manuals/>
 
CBFalconer

jayapal said:
Can u explain the differences b/w the scanf() and gets() ..?

u hasn't posted here for some time. And black is basically the
lack of illumination, while white requires illumination. That
handles b/w. scanf() and gets() are specified in the C standard.
 
Malcolm McLean

Harald van Dijk said:
It's allowed for two reasons here. Firstly, *any* object can be addressed
as an array of char. Given int i, ((char *) &i) through (char *) &i +
sizeof i are all valid pointers. Given struct S s, (char *) &s through
(char *) &s + sizeof s are all valid pointers. The second reason you
mention below.
I nodded. Yes, it is illegal to convert from type x * to type y *, except

when x or y is void.
when y is unsigned char - char yes and no, you may trap when you dereference
the pointer.
when y is the first member of a struct of type x.
when y is a struct of which x is the first member.
[ problem is ]
struct S s[2];
char *ptr = &s[1].firstmember;
struct S *ptr2 = (struct S *) ptr;
ptr2--;
It has to be able to construct data that is no longer available.
I think you have managed to defeat it using the last rule. We can give the
ptr2 the bounds of s[1], not easily but not with too many problems. However
it is virtually impossible to give it the bounds of s. You'd have to store a
fourth pointer with every pointer giving the "mother" object. It becomes
totally unwieldy.
 
Flash Gordon

Malcolm McLean wrote, On 16/11/07 23:19:
We'll declare that a pointer consists of three values - the address, the
start of the object, and the end of the object.
Now in the write-to-array code we specify that if the address exceeds
the end of the object, the program is to terminate with an error message.

With this device we have a perfectly safe gets() function. It cannot
return an incorrect string, or corrupt another variable, or put little
elves on screen. It can only fill the buffer correctly or report that it
has been exceeded.

However, unless you manage to get all hosted C implementations changed
to use this it is *still* not safe to call gets because your code might
be ported to an implementation that does not do this.
Time after time it has been shown that this is not the case. Very often
people treat incomplete reads as full lines. So if the line contains a
drug dose your fgets() - enabled machine might deliver only one tenth of
the amount needed, given an off by one line length error.

As pointed out the last time you raised this example:
1) It would get spotted in a properly performed code review
2) It would get caught by properly done testing

So it would not hit production like that. Input validation is one of the
basics of safety-critical programming, it is even one of the basics in
writing SW for non-critical test equipment!

In any case, that people can use a function incorrectly does not mean it
cannot be used correctly easily.

Now let us take your use of gets with an implementation guaranteed to
abort the program into a similar situation...

The drug dispenser reads a file on a regular basis to check what it
should be dispensing. At 3AM it comes across an over-length line and the
program aborts. The patient then does not get the drugs keeping him/her
alive and dies.

So by using a "safe" gets you have just made it impossible to safely
handle out-of-range input whereas it is easy to do with fgets.
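
A minimal sketch of such handling with fgets, so that an over-length line is reported to the caller instead of being silently treated as a full line. The names read_line_checked and line_status are invented for illustration, and `size` is assumed to be at least 2:

```c
#include <stdio.h>
#include <string.h>

enum line_status { LINE_OK, LINE_TOO_LONG, LINE_EOF };

/* Hypothetical helper: read one line into buf[size]. On LINE_OK the buffer
 * holds the complete line without its '\n'. On LINE_TOO_LONG the buffer
 * holds only the first size-1 characters and the rest of the line has been
 * discarded, so the next call starts at the next line. LINE_EOF means end
 * of input or a read error. */
enum line_status read_line_checked(char *buf, size_t size, FILE *stream)
{
    size_t len;
    int ch;

    if (fgets(buf, (int)size, stream) == NULL)
        return LINE_EOF;

    len = strlen(buf);
    if (len > 0 && buf[len - 1] == '\n') {
        buf[len - 1] = '\0';
        return LINE_OK;
    }

    /* No newline was stored: look at what follows to tell apart "the line
     * fit exactly", "input ended here" and "the line is too long". */
    ch = getc(stream);
    if (ch == EOF || ch == '\n')
        return LINE_OK;

    while (ch != '\n' && ch != EOF)   /* portable way to skip the rest */
        ch = getc(stream);
    return LINE_TOO_LONG;
}
```

The caller can then reject the input, log it, or fall back to a safe default on LINE_TOO_LONG, and the program keeps running.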
 


Harald van Dijk

Consider yourself challenged to post the appropriate code, in standard
C.

gets is a standard library function. Standard library functions need not
be written in standard C, and may make use of highly implementation-
specific features.

If you disagree, please give even just a single example of an
implementation of fopen or longjmp written purely in standard C.
 
Keith Thompson

CBFalconer said:
No good. Pointers do not necessarily contain those components.
You have to make it safe within the guarantees provided by the C
standard.

No, he doesn't. You're asking for more than Malcolm claimed.

Malcolm didn't claim that it could be made safe within the guarantees
provided by the C standard. His claim is a much more modest one,
that it's possible for a (hypothetical) C implementation to provide a
"safe" gets() function, and I believe he's correct.

His solution requires the use of "fat pointers", which are not
widely implemented but are reasonably well understood. In such an
implementation, the char* parameter to gets() provides information
about the size of the buffer to which it points. (Portable C code
cannot make use of this information, but gets() needn't be implemented
in portable C.) If the size of the input line exceeds the size of the
buffer, the behavior is undefined. This means the implementation is
free to do whatever it likes, including terminating the program with
an error message (or discarding the remainder of the line, or leaving
the remainder of the line on the input stream).
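
To illustrate the idea only, here is a minimal sketch that emulates a fat pointer in ordinary C with a struct. In the hypothetical implementation the bounds would travel inside every char* invisibly; here the names fat_ptr, FAT_FROM_ARRAY and checked_gets are made up, and a stream parameter is added so the function is not tied to stdin:

```c
#include <stdio.h>
#include <stdlib.h>

/* Emulated fat pointer: the address plus the bounds of the object. */
struct fat_ptr {
    char *addr;    /* where writing begins */
    char *start;   /* first byte of the object (unused in this sketch) */
    char *end;     /* one past the last byte of the object */
};

/* Build a fat pointer from a real array (not from a plain pointer). */
#define FAT_FROM_ARRAY(arr) { (arr), (arr), (arr) + sizeof (arr) }

static void bounds_failure(void)
{
    fputs("checked_gets: buffer exceeded\n", stderr);
    exit(EXIT_FAILURE);
}

/* gets()-like read that terminates the program instead of writing past
 * the end of the object. Returns NULL if no characters could be read. */
char *checked_gets(struct fat_ptr p, FILE *stream)
{
    char *out = p.addr;
    int ch;

    if (p.end - out < 1)
        bounds_failure();          /* no room even for the '\0' */

    while ((ch = getc(stream)) != EOF && ch != '\n') {
        if (p.end - out < 2)
            bounds_failure();      /* need room for this char and the '\0' */
        *out++ = (char)ch;
    }
    if (ch == EOF && out == p.addr)
        return NULL;
    *out = '\0';
    return p.addr;
}
```

With char buf[8] and struct fat_ptr p = FAT_FROM_ARRAY(buf), a line of up to 7 characters is stored normally and anything longer ends the program with the error message.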

I know of no C implementations that actually use fat pointers; even if
there were, the possibility of making gets() safe in one implementation
does no good for code that is to be used with other implementations.

I believe Malcolm's claim as stated is correct. It's not particularly
useful, but he didn't claim that it was; I believe it was merely an
intellectual exercise, not a serious proposal.
 
