Why has "gets" not been deprecated yet?

Pete Becker

Rolf said:
Pete Becker wrote:




So, you think the correctness of a C++ program can depend on what the user
enters at run-time?

It's what the language definition says.
It's always potentially being ill-formed.

Whatever.



That way of thinking is the reason for quite a lot of security holes.

A quick and dirty one-off command line utility by definition isn't
secure, so security holes aren't important.
 
Mike Wahler

I think it's unfortunate that the phrase 'erroneous data'
is not elaborated upon. Does it mean only 'data embedded
in the program', or is it intended to include 'external input'?

Or perhaps the concept of 'data' is defined elsewhere,
somewhere I'm unaware of?

-Mike
 
tony_in_da_uk

Mike, perhaps you're missing my point, or perhaps you're trying to
illustrate the complexities of making the determination. Anyway, your
post highlights that some issues can and should be anticipated and
usefully handled. In contrast, other things can't be, or needn't be, given
the robustness requirements of a system.

Some examples may help (but I'm starting to wonder). For example,
there's rarely any point worrying about whether the file will be
corrupted during intra-host comms or hard disk I/O, as if that does
happen all bets about the integrity of your process and its operational
environment are off. Obviously, inter-host comms that doesn't perform
its own stream validation benefits from the care you prescribe. In
contrast, it's generally considered unnecessary to validate the
integrity of a TCP/IP comms stream, as the protocol detects errors and
coordinates resends as necessary, and anything reaching your app may be
deemed to be what was sent for all but the most extremely demanding of
purposes.

Consider that mistakes in array/vector indexing cause millions of bugs,
so why not have std::vector::operator[] checked? It is an issue that
has been considered and debated, and most people are happy with the
prioritisation of performance over robustness for this function, are
aware that at() is available for checked access, and can wrap vector<>
redirecting operator[]() to at() if desired. People can make an
informed choice based on their needs.
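
The wrapping idea mentioned above can be sketched roughly as follows. This is only a minimal illustration, assuming a hypothetical class named checked_vector and a hypothetical helper index_is_rejected (neither is a standard facility); only std::vector::at() and std::out_of_range come from the standard library.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical wrapper, not a standard class: operator[] forwards to
// std::vector::at(), so a bad index throws std::out_of_range instead of
// invoking undefined behavior.
template <typename T>
class checked_vector {
public:
    explicit checked_vector(std::size_t n, const T& value = T())
        : data_(n, value) {}

    T& operator[](std::size_t i) { return data_.at(i); }
    const T& operator[](std::size_t i) const { return data_.at(i); }

    std::size_t size() const { return data_.size(); }

private:
    std::vector<T> data_;
};

// Hypothetical helper: returns true if reading v[i] throws std::out_of_range.
template <typename T>
bool index_is_rejected(const checked_vector<T>& v, std::size_t i) {
    try {
        (void)v[i];
        return false;
    } catch (const std::out_of_range&) {
        return true;
    }
}

// Small usage example.
void checked_vector_demo() {
    checked_vector<int> v(3, 7);       // three elements, all equal to 7
    v[0] = 1;                          // in range: behaves like a vector
    assert(v[0] == 1 && v[2] == 7);
    assert(index_is_rejected(v, 3));   // one past the end: reported
}
```

The cost is a bounds comparison on every access, which is exactly the performance trade-off described above; code that needs the unchecked path can keep using std::vector directly.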

Similarly, you could argue that a text viewer should be written such
that it can view files larger than the available virtual RAM, but that
doesn't mean that it's not useful and sufficient in most cases to
implement one that can't.

In summary, I'm saying that there is an argument as follows: when you
know an approach is sufficiently robust for your requirements, why
shouldn't you be allowed to use it? You can agree or disagree, but I
can assure you that there will be many people out there who believe
passionately in such a position who you'll never convince otherwise.

Tony
 
Rolf Magnus

Josh said:
That's like saying that every value that you set is potentially
invalid. If I have a set output from another digital source on the
machine, it COULD be ill-formed if the machine doesn't work as it's
supposed to,

There is a difference between the machine not working as it's supposed to
and the program making assumptions that it's not supposed to make.
just as if a pointer set to a certain object could randomly
change from the OS not operating as it is supposed to and overwriting
that segment of memory.

Then that machine is not standard C++ compliant. However, if gets() attempts
to put 10 Terabytes into the buffer, that's perfectly fine with the C++
standard.
 
Josh Mcfarlane

Rolf said:
There is a difference between the machine not working as it's supposed to
and the program making assumptions that it's not supposed to make.

My point is, if you have program A, that outputs to a buffer that
Program B reads, there are certain assumptions you can make about the
stream assuming Programs A & B are packaged together. A very small case,
yes, but it is still a case in which you could be sure the input data
would be valid.
Then that machine is not standard C++ compliant. However, if gets() attempts
to put 10 Terabytes into the buffer, that's perfectly fine with the C++
standard.

Exactly my point. When you're dealing with knowns from another section
of the program or a helper program, you know what you're dealing with.
 
Pete Becker

Rolf said:
Then that machine is not standard C++ compliant. However, if gets() attempts
to put 10 Terabytes into the buffer, that's perfectly fine with the C++
standard.

If the buffer is smaller than 10 Terabytes the behavior is undefined.
That's not "perfectly fine with the C++ standard."
 
Marcus

(e-mail address removed) wrote:
Mike, perhaps you're missing my point, or perhaps you're trying to
Consider that mistakes in array/vector indexing cause millions of bugs,
so why not have std::vector::operator[] checked? It is an issue that
has been considered and debated, and most people are happy with the
prioritisation of performance over robustness for this function, are
aware that at() is available for checked access, and can wrap vector<>
redirecting operator[]() to at() if desired. People can make an
informed choice based on their needs.

If you are comparing std::vector::operator[] and gets() (I'm not sure
this is your point), I think the comparison is not valid. The program
creates the vector and knows its size (or uses vector::size()). On the
other hand, it's very hard to control standard input. Maybe if you are
doing interprocess communication (as was pointed out by Josh
Mcfarlane), but I don't think it's a compelling argument to keep
gets(). If gets() is removed in C++2020 (after deprecation in C++0x),
people who miss it may reimplement it. But I expect that in 2020
everybody will have changed their gets() to getline()...
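
For readers wondering what the getline() replacement looks like in practice, here is a minimal sketch. The function name read_lines is hypothetical; std::getline and std::string are the standard pieces. Because std::string grows as needed, there is no fixed buffer for the input to overrun (very long input can still exhaust memory, which surfaces as an exception rather than undefined behavior).

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: reads every line from `in` into a vector.
// std::getline resizes the std::string to fit each line, so the length
// of the input does not need to be known in advance.
std::vector<std::string> read_lines(std::istream& in) {
    std::vector<std::string> lines;
    std::string line;
    while (std::getline(in, line)) {
        lines.push_back(line);
    }
    return lines;
}

// Small usage example with an in-memory stream standing in for std::cin.
void read_lines_demo() {
    std::istringstream in("short\n" + std::string(5000, 'x') + "\nlast");
    std::vector<std::string> lines = read_lines(in);
    assert(lines.size() == 3);
    assert(lines[1].size() == 5000);   // far longer than a typical gets() buffer
    assert(lines[2] == "last");        // final line without a trailing newline
}
```

Passing std::cin instead of the string stream gives the usual "read standard input line by line" loop.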
 
Josh Mcfarlane

Marcus said:
If you are comparing std::vector::operator[] and gets() (I'm not sure
this is your point), I think the comparison is not valid. The program
creates the vector and knows its size (or uses vector::size()). On the
other hand, it's very hard to control standard input. Maybe if you are
doing interprocess communication (as was pointed out by Josh
Mcfarlane), but I don't think it's a compelling argument to keep
gets(). If gets() is removed in C++2020 (after deprecation in C++0x),
people who miss it may reimplement it. But I expect that in 2020
everybody will have changed their gets() to getline()...

Don't get me wrong, I'm not advocating using gets(), I'm just trying to
show those people that have it in their head that gets() always invokes
undefined behavior that their notion is ill-formed.
 
Marcus

Josh Mcfarlane wrote:
Don't get me wrong, I'm not advocating using gets(), I'm just trying to
show those people that have it in their head that gets() always invokes
undefined behavior that their notion is ill-formed.

I see :)

I don't know if this is really going to add to the discussion, but
let's restate the problem as:
"gets() doesn't allow any form of graceful failure"
Is this a better argument?

Or what about:
"It's embarrassing to explain to newbies that they should avoid gets(),
even though it's part of the standard library and seems very useful at
first". :p
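
To make the "graceful failure" point concrete, here is a rough sketch of a bounded read that can report an overlong line instead of overrunning the buffer. The names read_status and read_bounded_line are hypothetical; the sketch relies on the member function std::istream::getline(char*, std::streamsize), which stores at most one character fewer than the given size and sets failbit when the line does not fit.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <limits>
#include <sstream>
#include <string>

// Hypothetical result type for the sketch below.
enum class read_status { ok, too_long, end_of_input };

// Hypothetical helper: reads one line into a fixed-size buffer. Unlike
// gets(), it knows the buffer size and can tell the caller what happened.
template <std::size_t N>
read_status read_bounded_line(std::istream& in, char (&buf)[N]) {
    static_assert(N > 1, "buffer must hold at least one character");
    in.getline(buf, N);                 // stores at most N - 1 characters
    if (in) return read_status::ok;
    // Nothing extracted: end of input (or a stream error).
    if (in.gcount() == 0) return read_status::end_of_input;
    // failbit with characters stored: the line did not fit. Recover by
    // clearing the error and discarding the rest of that line.
    in.clear();
    in.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
    return read_status::too_long;
}

// Small usage example with an 8-byte buffer.
void bounded_line_demo() {
    std::istringstream in("hello\nthis line is too long\nnext\n");
    char buf[8];
    assert(read_bounded_line(in, buf) == read_status::ok);
    assert(std::string(buf) == "hello");
    assert(read_bounded_line(in, buf) == read_status::too_long);
    assert(read_bounded_line(in, buf) == read_status::ok);
    assert(std::string(buf) == "next");
    assert(read_bounded_line(in, buf) == read_status::end_of_input);
}
```

Whether to skip, truncate, or abort on an overlong line is a policy decision for the caller; the point is only that the caller gets to make it, which gets() never allows.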
 
Kaz Kylheku

Marcus said:
We all know that the "gets" function from the Standard C Library (which
is part of the Standard C++ Library) is dangerous. It provides no
bounds check, so it's easy to overwrite memory when using it, and
impossible to guarantee that it won't happen.

There is an infinite variety of things you can write in a C++ program
which will render it undefined and potentially dangerous in some
situation. Calling the gets() function is just one of these things.

Among harmful things, gets() is one of the easiest to diagnose. It's
easy for an implementation to detect that a program calls gets(), and
issue a diagnostic. Quite simply, the program's translation units
contain an unresolved reference to that function.
Therefore, i think it's surprising that this function has not been
deprecated.

Deprecated is only a status change that exists in the minds of the
members of the community. It has no practical impact on what's
happening in the actual software.

A C or C++ implementation is free to emit whatever diagnostics it
wants, and to support stricter modes of operation in which it rejects
some programs which are correct according to the standard.

For instance, if you run the GNU C compiler with '-Werror', it will
reject all programs for which it emits any kind of diagnostic. Even
something harmless like the suggestion of extra parentheses, or the
definition of a variable that is not used.

I know of one environment that provides a warning when a reference to gets
occurs among the translation units of a program being linked to produce
an image.

There are also environments that support bounds checking on objects,
such as compilers that use a "fat" representation for pointers, and C
interpreters. The gets() function is harmless, to the extent that if it
overruns the array, the condition will be detected and turned into a
diagnostic. I.e. there are conceivable situations in which gets() isn't
disastrous.

So it's basically up to implementors and their community: what they
care about.
Now, the C standard committee is working on safe functions (the ones
that end with "_s") for the C Standard Library. I don't know if they
are going to deprecate the dreaded "gets".

gets is not "dreaded". Only dumb programmers are "dreaded". Dumb
programmers will foil any attempt to provide a safe environment. The
only way to make the world nearly 100% safe from dumb programmers is to
put them on an island with no Internet connection.

If you "dread" gets, you have some emotional problem. Normal people
don't think about it, let alone regard it as some Bogey Man.
Getting rid of "gets" is not that hard, is it?

Yes, don't use it!
Programs that use it are broken anyway.

Not necessarily. Suppose I have a compiler application which is
separated into two programs, the compiler proper and an assembler.

The compiler generates assembly code, which the assembler reads from
its standard input.

My compiler never generates lines longer than 1023 characters, by
design. So the assembler can safely use gets() on a 1024 character
buffer to read the compiler's output.

The assembler is part of my compiler application; it's not meant to be
used alone. So the interface between the two is a private interface.

Years ago I did some Motorola 68000 programming using the GNU assembler
that served as the back-end for gcc. With that assembler, if I
mis-spelled the mnemonic name of an instruction opcode, there wasn't
any nice error message with a line number. Guess what, the assembler
crashed with a segfault! That wasn't a problem, because the assembler
didn't have to be designed to handle incorrect input. It was an
internal interface to be used by the compiler, which put out correct
opcodes. I got my assembly routines working anyway and life went on.

If you think that's a bad idea to have such an interface, well consider
that modules in C and C++ programs often have such "unsafe" internal
interfaces between them. It's not unusual for pointers to arrays to be
passed around without any size being mentioned anywhere, because all of
the modules just "know" the size. It is some manifest constant that
comes from a header file.

If you linked that compiler and assembler into one program, the gets()
would disappear. The compiler would just pass char * pointers directly
into the assembler, which would be understood to point to arrays of
1024 characters.

How are you going to mark /that/ type of practice as deprecated?
 
Kaz Kylheku

Mike said:
I think it's unfortunate that the phrase 'erroneous data'
is not elaborated upon. Does it mean only 'data embedded
in the program', or is it intended to include 'external input'?

An external input becomes data in the program once it is read. The
erroneous data in the case of gets() is not the text that is coming
from standard input, but rather the array index, or buffer pointer,
that is driven out of bounds by that text. We don't exactly know what
that is because it's an implementation detail in the library.

At some point, the gets() function will internally form an lvalue that
is one element past the end of the array and assign to it. The pointer
behind that lvalue is erroneous data, with respect to that assignment,
just like zero is erroneous data with respect to its use as a
denominator in division.

If you write a program that inputs two numbers from the user and
divides them, its behavior is undefined if the user inputs a zero
denominator. At some point, the zero value exists as perfectly good
data. It is not inherently "erroneous". It's scanned, assigned to a
variable of type double or int or whatever, and sits there, being a
perfectly good zero. Then, suddenly, it looks up and sees that it's
walking under a slash! Bad luck ...
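
The division example can be made concrete with a small sketch. The function name checked_divide is hypothetical; it tests the operands before the division is evaluated, since that is the point at which a perfectly ordinary zero becomes "erroneous data". The sketch also rejects INT_MIN / -1, the other integer division whose result is not representable.

```cpp
#include <cassert>
#include <climits>

// Hypothetical helper: divides two ints only when the operation is defined.
// Returns false (leaving `quotient` untouched) for the two cases in which
// evaluating numerator / denominator would be undefined behavior.
bool checked_divide(int numerator, int denominator, int& quotient) {
    if (denominator == 0) return false;                          // x / 0
    if (numerator == INT_MIN && denominator == -1) return false; // overflow
    quotient = numerator / denominator;
    return true;
}

// Small usage example.
void checked_divide_demo() {
    int q = 0;
    assert(checked_divide(6, 3, q) && q == 2);
    assert(!checked_divide(1, 0, q));   // zero is fine as data, not as a divisor
    assert(q == 2);                     // the earlier result is left untouched
}
```

The values themselves are never invalid; only their use in a particular operation is, which is the same relationship gets() has with a pointer pushed past the end of its buffer.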
 
Marcus

Kaz Kylheku wrote:
There is an infinite variety of things you can write in a C++ program
which will render it undefined and potentially dangerous in some
situation. Calling the gets() function is just one of these things.

Among harmful things, gets() is one of the easiest to diagnose. It's
easy for an implementation to detect that a program calls gets(), and
issue a diagnostic. Quite simply, the program's translation units
contain an unresolved reference to that function.

Yes. gets() is one of the easiest to diagnose. That's why I posted the
first message. Sure, complex things like rvalue references (random
example) are important, but these simple things are important too. They
change the "feel" of the language.
Deprecated is only a status change that exists in the minds of the
members of the community. It has no practical impact on what's
happening in the actual software.

This is "excuse number 3" according to a post by John Nagle on
comp.std.c++
(do links like this:
http://groups.google.com.br/group/comp.std.c++/msg/ef358f4b0fbe5842
work?)
BTW: Thanks, John!
A C or C++ implementation is free to emit whatever diagnostics it
wants, and to support stricter modes of operation in which it rejects
some programs which are correct according to the standard.

And the implementation is also free to provide no diagnostic. So, I
think it would be a good thing to encourage them to provide
diagnostics...
(...)
So it's basically up to implementors and their community: what they
care about.


gets is not "dreaded". Only dumb programmers are "dreaded". Dumb
programmers will foil any attempt to provide a safe environment. The
only way to make the world nearly 100% safe from dumb programmers is to
put them on an island with no Internet connection.

"Excuse number 5"... I mean, this is irrelevant. If we look at those
extremes, we'll come to no conclusion, because "any attempt to do
anything will fail, since dumb programmers..."
If you "dread" gets, you have some emotional problem. Normal people
don't think about it, let alone regard it as some Bogey Man.

Yes, I scream and cry every time I see a call to gets() or a forum
reply that says to a newbie: "just use gets() to read input". :pPPPPPP
:-D
Yes, don't use it!

I don't. Maybe that's the reason the K&R book starts showing how to
write a getline function. Luckily, C++ improved this aspect of C.
Not necessarily. Suppose I have a compiler application which is
separated into two programs, the compiler proper and an assembler.

The compiler generates assembly code, which the assembler reads from
its standard input.

My compiler never generates lines longer than 1023 characters, by
design. So the assembler can safely use gets() on a 1024 character
buffer to read the compiler's output.

The assembler is part of my compiler application; it's not meant to be
used alone. So the interface between the two is a private interface.

Years ago I did some Motorola 68000 programming using the GNU assembler
that served as the back-end for gcc. With that assembler, if I
mis-spelled the mnemonic name of an instruction opcode, there wasn't
any nice error message with a line number. Guess what, the assembler
crashed with a segfault! That wasn't a problem, because the assembler
didn't have to be designed to handle incorrect input. It was an
internal interface to be used by the compiler, which put out correct
opcodes. I got my assembly routines working anyway and life went on.

Wow. Would you like to see your grandchildren facing the same problems?
:p :-D

Let's do things right this time. :)
If you think that's a bad idea to have such an interface, well consider
that modules in C and C++ programs often have such "unsafe" internal
interfaces between them. It's not unusual for pointers to arrays to be
passed around without any size being mentioned anywhere, because all of
the modules just "know" the size. It is some manifest constant that
comes from a header file.

I don't think about interfaces between functions the same way I think
about interfaces between programs. The security implications are
different.
 
